FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-11 13:16:29 [INFO] Going to loop over dir images
2023-07-11 13:16:29 [INFO] Found total 7390 images to run on, 7390 train, 0 test, name list 7390, counter 7390
2023-07-11 13:16:29 [ERROR] Failed to read image images/Abyssinian_34.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_139.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_145.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_167.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_177.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_191.jpg
2023-07-11 13:16:45 [INFO] Found total 7390 images to run on
Finished histogram 1.707
Finished bucket sort 1.726
2023-07-11 13:16:45 [INFO] 138) Finished write_index() NN model
2023-07-11 13:16:45 [INFO] Stored nn model index file work_dir/nnf.index
2023-07-11 13:16:45 [INFO] Total time took 16247 ms
2023-07-11 13:16:45 [INFO] Found a total of 120 fully identical images (d>0.990), which are 0.81 %
2023-07-11 13:16:45 [INFO] Found a total of 8 nearly identical images(d>0.980), which are 0.05 %
2023-07-11 13:16:45 [INFO] Found a total of 1006 above threshold images (d>0.900), which are 6.81 %
2023-07-11 13:16:45 [INFO] Found a total of 739 outlier images (d<0.050), which are 5.00 %
2023-07-11 13:16:45 [INFO] Min distance found 0.597 max distance 1.000
2023-07-11 13:16:45 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 7390 images
Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data
For a detailed analysis, use `.invalid_instances()`.
Similarity: 1.00% (74) belong to 3 similarity clusters (components).
99.00% (7,316) images do not belong to any similarity cluster.
Largest cluster has 12 (0.16%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
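
The unreadable files and the valid/invalid percentages in the summary can also be cross-checked from the log text itself. The following is a minimal sketch that does not use fastdup's API. The regular expression is inferred from the `[ERROR] Failed to read image` lines above and is an assumption, not a documented log format. The helper names `failed_images` and `percent` are hypothetical.

```python
import re

# Assumed pattern, inferred from the "[ERROR] Failed to read image <path>" lines
# in this log; fastdup does not document this as a stable format.
FAILED_READ = re.compile(r"\[ERROR\] Failed to read image (\S+)")


def failed_images(log_text: str) -> list[str]:
    """Return the image paths reported as unreadable, in log order."""
    return FAILED_READ.findall(log_text)


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


# Three lines copied from the log above, used as sample input.
sample = """\
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_145.jpg
2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_167.jpg
2023-07-11 13:16:45 [INFO] Total time took 16247 ms
"""

print(failed_images(sample))
print(percent(6, 7390))     # invalid share: 6 of 7390 images
print(percent(7384, 7390))  # valid share: 7,384 of 7390 images
```

Run against the full log, `failed_images` should return one path per `[ERROR]` line. The two `percent` calls reproduce the 0.08% and 99.92% figures in the summary.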