FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-05-16 08:53:06 [INFO] Going to loop over dir imagenette2-160
2023-05-16 08:53:06 [INFO] Found total 13394 images to run on, 13394 train, 0 test, name list 13394, counter 13394
2023-05-16 08:53:20 [INFO] Found total 13394 images to run on
Estimated: 0 Minutes
Finished histogram 7.122
Finished bucket sort 7.177
2023-05-16 08:53:20 [INFO] Finished write_index() NN model
2023-05-16 08:53:20 [INFO] Stored nn model index file fastdup_imagenette/nnf.index
2023-05-16 08:53:21 [INFO] Total time took 14601 ms
2023-05-16 08:53:21 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-05-16 08:53:21 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-05-16 08:53:21 [INFO] Found a total of 16757 above threshold images (d>0.800), which are 62.55 %
2023-05-16 08:53:21 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %
2023-05-16 08:53:21 [INFO] Min distance found 0.476 max distance 0.969
2023-05-16 08:53:21 [INFO] Running connected components for ccthreshold 0.900000
########################################################################################
Dataset Analysis Summary:
Dataset contains 13394 images
Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data
Similarity: 3.11% (416) belong to 19 similarity clusters (components).
96.89% (12,978) images do not belong to any similarity cluster.
Largest cluster has 566 (4.23%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.8, connected component threshold used is 0.9).
Outliers: 6.23% (835) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.