FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-20 17:57:26 [INFO] Going to loop over dir imagenette2-160
2023-03-20 17:57:26 [INFO] Found total 13394 images to run on
2023-03-20 17:57:54 [INFO] Found total 13394 images to run on
2023-03-20 17:57:55 [INFO] 1657) Finished write_index() NN model
2023-03-20 17:57:55 [INFO] Stored nn model index file fastdup_imagenette/nnf.index
2023-03-20 17:57:56 [INFO] Total time took 30624 ms
2023-03-20 17:57:56 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-03-20 17:57:56 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-03-20 17:57:56 [INFO] Found a total of 16741 above threshold images (d>0.800), which are 41.66 %
2023-03-20 17:57:56 [INFO] Found a total of 1339 outlier images (d<0.050), which are 3.33 %
2023-03-20 17:57:56 [INFO] Min distance found 0.470 max distance 0.969
2023-03-20 17:57:56 [INFO] Running connected components for ccthreshold 0.900000
########################################################################################
Dataset Analysis Summary:
Dataset contains 13394 images
Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data
Similarity: 2.73% (366) belong to 20 similarity clusters (components).
97.27% (13,028) images do not belong to any similarity cluster.
Largest cluster has 40 (0.30%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.8, connected component threshold used is 0.9).
Outliers: 6.21% (832) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
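
The percentages in the summary above can be cross-checked against the reported counts. The following is a minimal sketch in plain Python, not part of fastdup: the counts are copied by hand from the summary, and the variable names and the `pct` helper are hypothetical, introduced only for this check.

```python
# Counts copied by hand from the "Dataset Analysis Summary" above.
total_images = 13394
clustered = 366          # images belonging to a similarity cluster
unclustered = 13028      # images not belonging to any cluster
largest_cluster = 40     # size of the largest cluster
possible_outliers = 832  # images flagged as possible outliers


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Clustered and unclustered images should account for the whole dataset.
assert clustered + unclustered == total_images

for label, count in [
    ("clustered", clustered),
    ("unclustered", unclustered),
    ("largest cluster", largest_cluster),
    ("possible outliers", possible_outliers),
]:
    print(f"{label}: {count} images = {pct(count, total_images):.2f}%")
```

All four figures use the full image count (13394) as the denominator, which matches the 2.73%, 97.27%, 0.30% and 6.21% values printed in the summary.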