Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-09-21 14:55:06 [INFO] Going to loop over dir pets-download-labelbox
2023-09-21 14:55:06 [INFO] Found total 7378 images to run on, 7378 train, 0 test, name list 7378, counter 7378
2023-09-21 14:55:27 [INFO] Found total 7378 images to run on
Finished histogram 1.799
Finished bucket sort 1.818
2023-09-21 14:55:28 [INFO] 125) Finished write_index() NN model
2023-09-21 14:55:28 [INFO] Stored nn model index file work_dir/nnf.index
2023-09-21 14:55:28 [INFO] Total time took 21225 ms
2023-09-21 14:55:28 [INFO] Found a total of 120 fully identical images (d>0.990), which are 0.81 % of total graph edges
2023-09-21 14:55:28 [INFO] Found a total of 8 nearly identical images(d>0.980), which are 0.05 % of total graph edges
2023-09-21 14:55:28 [INFO] Found a total of 1006 above threshold images (d>0.900), which are 6.82 % of total graph edges
2023-09-21 14:55:28 [INFO] Found a total of 739 outlier images (d<0.050), which are 5.01 % of total graph edges
2023-09-21 14:55:28 [INFO] Min distance found 0.597 max distance 1.000
2023-09-21 14:55:28 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 7378 images
Valid images are 100.00% (7,378) of the data, invalid are 0.00% (0) of the data
Similarity: 2.01% (148) belong to 3 similarity clusters (components).
97.99% (7,230) images do not belong to any similarity cluster.
Largest cluster has 12 (0.16%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 6.18% (456) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
########################################################################################
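
The percentages in the Dataset Analysis Summary can be cross-checked against the reported image counts. The following is a minimal sketch, not part of fastdup: the names `SUMMARY`, `parse_total`, `parse_pairs`, and `check_summary` are hypothetical, and the summary lines are copied by hand from the log above. It assumes each percentage is the count divided by the total number of images, rounded to two decimals.

```python
import re

# Summary lines copied from the log above (assumption: percentages are
# count / total images, rounded to two decimal places).
SUMMARY = """\
Dataset contains 7378 images
Similarity: 2.01% (148) belong to 3 similarity clusters (components).
97.99% (7,230) images do not belong to any similarity cluster.
Largest cluster has 12 (0.16%) images.
Outliers: 6.18% (456) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
"""


def _to_int(text: str) -> int:
    """Convert a count such as '7,230' to an integer."""
    return int(text.replace(",", ""))


def parse_total(text: str) -> int:
    """Extract the total image count from the 'Dataset contains N images' line."""
    match = re.search(r"Dataset contains ([\d,]+) images", text)
    if match is None:
        raise ValueError("total image count not found")
    return _to_int(match.group(1))


def parse_pairs(text: str) -> list[tuple[float, int]]:
    """Collect (percent, count) pairs written as 'P% (N)' or 'N (P%)'."""
    pairs = []
    for m in re.finditer(r"(\d+(?:\.\d+)?)% \(([\d,]+)\)", text):
        pairs.append((float(m.group(1)), _to_int(m.group(2))))
    for m in re.finditer(r"\b([\d,]+) \((\d+(?:\.\d+)?)%\)", text):
        pairs.append((float(m.group(2)), _to_int(m.group(1))))
    return pairs


def check_summary(text: str, tolerance: float = 0.005) -> list[tuple[float, int, bool]]:
    """Flag each reported percentage as consistent (True) or not (False)."""
    total = parse_total(text)
    results = []
    for percent, count in parse_pairs(text):
        recomputed = 100.0 * count / total
        # Small epsilon guards against floating-point noise at the boundary.
        results.append((percent, count, abs(recomputed - percent) <= tolerance + 1e-9))
    return results


if __name__ == "__main__":
    for percent, count, ok in check_summary(SUMMARY):
        print(f"{count:>6} images reported as {percent:.2f}%: {'ok' if ok else 'MISMATCH'}")
```

The same check also confirms that the clustered and unclustered counts (148 and 7,230) add up to the 7,378 images in the dataset.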