Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-10-16 11:43:51 [INFO] Found total 32412 images to run on
Finished histogram 7.114
Finished bucket sort 7.176
2023-10-16 11:43:51 [INFO] 886) Finished write_index() NN model
2023-10-16 11:43:51 [INFO] Stored nn model index file work_dir/nnf.index
2023-10-16 11:43:52 [INFO] Total time took 1454 ms
2023-10-16 11:43:52 [INFO] Found a total of 9530 fully identical images (d>0.990), which are 14.70 % of total graph edges
2023-10-16 11:43:52 [INFO] Found a total of 5040 nearly identical images(d>0.980), which are 7.77 % of total graph edges
2023-10-16 11:43:52 [INFO] Found a total of 27522 above threshold images (d>0.900), which are 42.46 % of total graph edges
2023-10-16 11:43:52 [INFO] Found a total of 3241 outlier images (d<0.050), which are 5.00 % of total graph edges
2023-10-16 11:43:52 [INFO] Min distance found 0.105 max distance 1.000
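The four percentages above share one denominator, "total graph edges", which the log never prints. The sketch below reproduces every reported percentage under one assumption: the graph has two nearest-neighbor edges per image, for 2 × 32,412 = 64,824 edges in total. That figure is inferred from the numbers and is not reported by fastdup here.

```python
# Edge counts and reported percentages copied from the fastdup log above.
NUM_IMAGES = 32_412
EDGE_COUNTS = {
    "fully identical (d>0.990)": (9_530, "14.70"),
    "nearly identical (d>0.980)": (5_040, "7.77"),
    "above threshold (d>0.900)": (27_522, "42.46"),
    "outliers (d<0.050)": (3_241, "5.00"),
}

# ASSUMPTION: two edges per image; the log does not state the edge total.
EDGES_PER_IMAGE = 2
TOTAL_EDGES = NUM_IMAGES * EDGES_PER_IMAGE  # 64,824


def edge_share(count: int, total_edges: int) -> str:
    """Return count as a percentage of total_edges with two decimals, like the log."""
    return f"{100 * count / total_edges:.2f}"


if __name__ == "__main__":
    for label, (count, reported) in EDGE_COUNTS.items():
        computed = edge_share(count, TOTAL_EDGES)
        status = "matches" if computed == reported else "differs from"
        print(f"{label}: {count}/{TOTAL_EDGES} = {computed}% ({status} log value {reported}%)")
```

Because these figures count edges rather than distinct images, they need not match the image counts in the summary. For example, 3,241 outlier edges here versus 2,409 outlier images below presumably reflects several low-similarity edges touching the same image.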
2023-10-16 11:43:52 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 32412 images
Valid images are 100.00% (32,412) of the data, invalid are 0.00% (0) of the data
Similarity: 41.05% (13,304) belong to 29 similarity clusters (components).
58.95% (19,108) images do not belong to any similarity cluster.
Largest cluster has 86 (0.27%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 7.43% (2,409) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
########################################################################################
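Each percentage in the summary is a count divided by the 32,412 images in the dataset. This small sketch recomputes them from the counts in the log using the same two-decimal formatting. It also checks whether the reported cluster figures are mutually consistent.

```python
# Counts copied from the Dataset Analysis Summary above.
TOTAL_IMAGES = 32_412
IN_CLUSTERS = 13_304
NOT_IN_CLUSTERS = 19_108
NUM_CLUSTERS = 29
LARGEST_CLUSTER = 86
OUTLIERS = 2_409


def pct(count: int, total: int = TOTAL_IMAGES) -> str:
    """Format count/total as a percentage with two decimals, as in the summary."""
    return f"{100 * count / total:.2f}"


if __name__ == "__main__":
    # Clustered and unclustered images should partition the dataset.
    print("partition holds:", IN_CLUSTERS + NOT_IN_CLUSTERS == TOTAL_IMAGES)
    print("in clusters:", pct(IN_CLUSTERS))
    print("not in clusters:", pct(NOT_IN_CLUSTERS))
    print("largest cluster:", pct(LARGEST_CLUSTER))
    print("outliers:", pct(OUTLIERS))
    # Upper bound on how many images 29 clusters of at most 86 images can hold.
    print("max images in clusters:", NUM_CLUSTERS * LARGEST_CLUSTER)
```

All four percentages match the summary, and the clustered and unclustered counts sum to 32,412. However, 29 clusters capped at 86 images can hold at most 2,494 images, far fewer than 13,304. The 13,304 figure therefore presumably counts images under a different criterion than the 29 components, such as the 0.9 similarity threshold rather than the 0.96 connected-component threshold. The log alone does not say which.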