FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-01 11:48:22 [INFO] Going to loop over dir food-101/images
2023-03-01 11:48:24 [INFO] Found total 101000 images to run on
2023-03-01 12:19:09 [INFO] Found total 101000 images to run on
2023-03-01 12:29:58 [INFO] 648922) Finished write_index() NN model
2023-03-01 12:29:58 [INFO] Stored nn model index file fastdup_food101/nnf.index
2023-03-01 12:32:14 [INFO] Total time took 2630145 ms
2023-03-01 12:32:14 [INFO] Found a total of 170 fully identical images (d>0.990), which are 0.06 %
2023-03-01 12:32:14 [INFO] Found a total of 88 nearly identical images(d>0.980), which are 0.03 %
2023-03-01 12:32:14 [INFO] Found a total of 5236 above threshold images (d>0.900), which are 1.73 %
2023-03-01 12:32:14 [INFO] Found a total of 10100 outlier images (d<0.050), which are 3.33 %
2023-03-01 12:32:14 [INFO] Min distance found 0.379 max distance 1.000
2023-03-01 12:32:14 [INFO] Running connected components for ccthreshold 0.900000
########################################################################################
Dataset Analysis Summary:
Dataset contains 101000 images
Valid images are 100.00% (101,000) of the data, invalid are 0.00% (0) of the data
Similarity: 1.70% (1,718) belong to 30 similarity clusters (components).
98.30% (99,282) images do not belong to any similarity cluster.
Largest cluster has 79 (0.08%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.9).
Outliers: 5.97% (6,029) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers(data=True)`.
########################################################################################
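
The `Found a total of ...` and `Total time took ...` lines above follow a regular pattern, so they can be pulled into structured records for checking. The following is a minimal sketch using only the Python standard library, not part of fastdup itself; the names `LOG_LINES`, `FOUND_RE`, `TIME_RE`, and `parse_log` are hypothetical, and the regular expressions are assumed from the lines shown in this log rather than from any documented log format.

```python
import re

# Lines copied verbatim from the log above.
LOG_LINES = [
    "2023-03-01 12:32:14 [INFO] Total time took 2630145 ms",
    "2023-03-01 12:32:14 [INFO] Found a total of 170 fully identical images (d>0.990), which are 0.06 %",
    "2023-03-01 12:32:14 [INFO] Found a total of 88 nearly identical images(d>0.980), which are 0.03 %",
    "2023-03-01 12:32:14 [INFO] Found a total of 5236 above threshold images (d>0.900), which are 1.73 %",
    "2023-03-01 12:32:14 [INFO] Found a total of 10100 outlier images (d<0.050), which are 3.33 %",
]

# Assumed patterns, inferred from this log only. The space before "(" is
# optional because one line prints "images(d>0.980)" without it.
FOUND_RE = re.compile(
    r"Found a total of (?P<count>\d+) (?P<label>.+?) images\s*"
    r"\(d(?P<op>[<>])(?P<threshold>[\d.]+)\), which are (?P<pct>[\d.]+) %"
)
TIME_RE = re.compile(r"Total time took (?P<ms>\d+) ms")


def parse_log(lines):
    """Return (list of 'Found a total of' records, total run time in ms or None)."""
    found = []
    total_ms = None
    for line in lines:
        match = FOUND_RE.search(line)
        if match:
            found.append({
                "label": match["label"],
                "count": int(match["count"]),
                "op": match["op"],
                "threshold": float(match["threshold"]),
                "pct": float(match["pct"]),
            })
            continue
        match = TIME_RE.search(line)
        if match:
            total_ms = int(match["ms"])
    return found, total_ms


if __name__ == "__main__":
    found, total_ms = parse_log(LOG_LINES)
    print(f"run time: {total_ms / 60000:.1f} min")  # 43.8 min
    for rec in found:
        # Denominator implied by the printed percentage (count / pct).
        implied = rec["count"] / (rec["pct"] / 100)
        print(f'{rec["label"]}: {rec["count"]} (d{rec["op"]}{rec["threshold"]}), '
              f'{rec["pct"]} % implies a base of about {implied:,.0f}')
```

Running this shows that the percentages on the `[INFO]` lines are consistent with a base of roughly 303,000 (about three times the 101,000 images) rather than the image count itself, whereas the summary percentages (for example 1,718 of 101,000 = 1.70%) are per image. That reading is inferred from the arithmetic only; the log does not state what the base is.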