Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-08-01 14:04:04 [INFO] Going to loop over dir test
2023-08-01 14:04:04 [INFO] Found total 39997 images to run on, 39997 train, 0 test, name list 39997, counter 39997
2023-08-01 14:05:22 [ERROR] Failed to read image test/scientific_publication/2500126531_2500126536.tif
2023-08-01 14:05:37 [INFO] Found total 39997 images to run on
Finished histogram 12.923
Finished bucket sort 12.997
2023-08-01 14:05:41 [INFO] 4180) Finished write_index() NN model
2023-08-01 14:05:41 [INFO] Stored nn model index file work_dir/nnf.index
2023-08-01 14:05:44 [INFO] Total time took 100130 ms
2023-08-01 14:05:44 [INFO] Found a total of 1392 fully identical images (d>0.990), which are 1.74 %
2023-08-01 14:05:44 [INFO] Found a total of 10177 nearly identical images(d>0.980), which are 12.72 %
2023-08-01 14:05:44 [INFO] Found a total of 73590 above threshold images (d>0.900), which are 91.99 %
2023-08-01 14:05:44 [INFO] Found a total of 4003 outlier images (d<0.050), which are 5.00 %
2023-08-01 14:05:44 [INFO] Min distance found 0.647 max distance 1.000
2023-08-01 14:05:44 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 39997 images
Valid images are 88.00% (39,996) of the data, invalid are 0.00% (1) of the data
For a detailed analysis, use `.invalid_instances()`.
Similarity: 52.47% (23,850) belong to 33 similarity clusters (components).
35.53% (16,147) images do not belong to any similarity cluster.
Largest cluster has 85,160 (187.37%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 5.09% (2,312) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
########################################################################################
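The log above reports one unreadable image on an `[ERROR]` line. As a minimal sketch, not part of fastdup, the snippet below pulls such paths out of captured log text. The names `LOG_LINE`, `FAILED_READ`, and `failed_images` are hypothetical, and the line layout (timestamp, bracketed level, message) is assumed from the output shown here rather than from any documented fastdup log format.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS [LEVEL] message"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")
# Assumed wording of the read-failure message, copied from the log above
FAILED_READ = re.compile(r"^Failed to read image (.+)$")


def failed_images(log_text):
    """Return the paths named on ERROR lines that report an unreadable image."""
    paths = []
    for line in log_text.splitlines():
        parsed = LOG_LINE.match(line.strip())
        if parsed is None or parsed.group(2) != "ERROR":
            continue  # skip untimestamped lines and non-error levels
        hit = FAILED_READ.match(parsed.group(3))
        if hit:
            paths.append(hit.group(1))
    return paths


sample = """\
2023-08-01 14:04:04 [INFO] Going to loop over dir test
2023-08-01 14:05:22 [ERROR] Failed to read image test/scientific_publication/2500126531_2500126536.tif
Finished histogram 12.923
"""

for path in failed_images(sample):
    print(path)
```

Lines without a timestamp, such as the histogram and bucket-sort messages, are skipped rather than treated as errors.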