Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-13 19:22:31 [INFO] Going to loop over dir /tmp/tmpqm6imqyr.csv
2023-07-13 19:22:31 [INFO] Found total 13394 images to run on, 13394 train, 0 test, name list 13394, counter 13394
2023-07-13 19:23:04 [INFO] Found total 13394 images to run on. Estimated: 0 Minutes
Finished histogram 3.121
Finished bucket sort 3.151
2023-07-13 19:23:04 [INFO] Finished write_index() NN model
2023-07-13 19:23:04 [INFO] Stored nn model index file work_dir/nnf.index
2023-07-13 19:23:05 [INFO] Total time took 34024 ms
2023-07-13 19:23:05 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-07-13 19:23:05 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-07-13 19:23:05 [INFO] Found a total of 16764 above threshold images (d>0.800), which are 62.58 %
2023-07-13 19:23:05 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %
2023-07-13 19:23:05 [INFO] Min distance found 0.519 max distance 0.969
2023-07-13 19:23:05 [INFO] Running connected components for ccthreshold 0.900000
########################################################################################
Dataset Analysis Summary:
Dataset contains 13394 images
Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data
Similarity: 3.11% (416) belong to 18 similarity clusters (components).
96.89% (12,978) images do not belong to any similarity cluster.
Largest cluster has 562 (4.20%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.8, connected component threshold used is 0.9).
Outliers: 6.24% (836) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
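
The "Found a total of ..." lines in the log above all share one shape (a count, a label, a distance threshold, and a percentage), so they can be pulled into structured records for comparison across runs. The following is a minimal sketch, not part of fastdup: the names `SUMMARY_RE`, `parse_threshold_lines`, and `SAMPLE_LOG` are hypothetical, and the regular expression assumes the line format shown in this log, which may differ in other fastdup versions.

```python
import re

# Assumed line shape, taken from the log above:
#   "... Found a total of 1339 outlier images (d<0.050), which are 5.00 %"
# The space before "(" is optional because one log line omits it.
SUMMARY_RE = re.compile(
    r"Found a total of (?P<count>\d+) (?P<label>.+?)\s*"
    r"\(d(?P<op>[<>])(?P<threshold>[\d.]+)\), which are (?P<percent>[\d.]+) %"
)


def parse_threshold_lines(log_text):
    """Return one dict per 'Found a total of ...' line in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = SUMMARY_RE.search(line)
        if match is None:
            continue  # not a threshold summary line
        rows.append(
            {
                "label": match.group("label"),
                "count": int(match.group("count")),
                "op": match.group("op"),
                "threshold": float(match.group("threshold")),
                "percent": float(match.group("percent")),
            }
        )
    return rows


# Two lines copied from the log above, used only as sample input.
SAMPLE_LOG = """\
2023-07-13 19:23:05 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-07-13 19:23:05 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %
"""

if __name__ == "__main__":
    for row in parse_threshold_lines(SAMPLE_LOG):
        print(row)
```

Lines that do not match the pattern are skipped, so the full log can be passed in unchanged.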