Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-04 10:46:16 [INFO] Going to loop over dir images_dir/images
2023-07-04 10:46:17 [INFO] Found total 110000 images to run on, 110000 train, 0 test, name list 110000, counter 110000
Finished histogram 30.955
Finished bucket sort 31.144
2023-07-04 10:50:23 [INFO] 23053) Finished write_index() NN model
2023-07-04 10:50:23 [INFO] Stored nn model index file work_dir/nnf.index
2023-07-04 10:50:35 [INFO] Total time took 258529 ms
2023-07-04 10:50:35 [INFO] Found a total of 54 fully identical images (d>0.990), which are 0.02 %
2023-07-04 10:50:35 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-07-04 10:50:35 [INFO] Found a total of 12656 above threshold images (d>0.900), which are 5.75 %
2023-07-04 10:50:35 [INFO] Found a total of 11001 outlier images (d<0.050), which are 5.00 %
2023-07-04 10:50:35 [INFO] Min distance found 0.597 max distance 1.000
2023-07-04 10:50:35 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 110000 images
Valid images are 100.00% (110,000) of the data, invalid are 0.00% (0) of the data
Similarity: 0.03% (31) belong to 1 similarity clusters (components).
99.97% (109,969) images do not belong to any similarity cluster.
Largest cluster has 4 (0.00%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 6.36% (6,992) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
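
The percentages in the Dataset Analysis Summary can be re-derived from the counts it reports. The following is a minimal sketch of that arithmetic only; the helper name `pct` is hypothetical and is not part of fastdup. It assumes the denominator is the 110000 images stated in the summary.

```python
def pct(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals."""
    return f"{100 * count / total:.2f}%"


# Denominator taken from "Dataset contains 110000 images".
TOTAL_IMAGES = 110_000

# Counts copied from the summary lines above.
summary_counts = {
    "in similarity clusters": 31,
    "not in any cluster": 109_969,
    "largest cluster": 4,
    "possible outliers": 6_992,
}

for label, count in summary_counts.items():
    print(f"{label}: {pct(count, TOTAL_IMAGES)}")
```

The percentages on the [INFO] lines (0.02 %, 5.75 %, 5.00 %) do not match this denominator. They are numerically consistent with a total of 220000, but that is an inference from the numbers in this log, not documented behavior.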