Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-09-26 15:54:05 [INFO] Going to loop over dir /tmp/tmpgtex405b.csv
2023-09-26 15:54:05 [INFO] Found total 23262 images to run on, 23262 train, 0 test, name list 23262, counter 23262
2023-09-26 15:54:19 [ERROR] Image images/cat/image_13685.jpg is too small, image size is 4 x 4, min_input_image_width=10
2023-09-26 15:54:50 [INFO] Found total 23262 images to run on
Finished histogram 6.086
Finished bucket sort 6.131
2023-09-26 15:54:51 [INFO] 1485) Finished write_index() NN model
2023-09-26 15:54:51 [INFO] Stored nn model index file work_dir/nnf.index
2023-09-26 15:54:53 [INFO] Total time took 47865 ms
2023-09-26 15:54:53 [INFO] Found a total of 50 fully identical images (d>0.990), which are 0.11 % of total graph edges
2023-09-26 15:54:53 [INFO] Found a total of 2 nearly identical images(d>0.980), which are 0.00 % of total graph edges
2023-09-26 15:54:53 [INFO] Found a total of 362 above threshold images (d>0.900), which are 0.78 % of total graph edges
2023-09-26 15:54:53 [INFO] Found a total of 2327 outlier images (d<0.050), which are 5.00 % of total graph edges
2023-09-26 15:54:53 [INFO] Min distance found 0.062 max distance 1.000
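
The "% of total graph edges" figures in the lines above can be re-derived from the logged counts. The sketch below is only a sanity check and rests on one assumption that the log does not state: that the similarity graph holds two nearest-neighbor edges per image, so 2 x 23262 = 46524 edges in total. The names `assumed_k`, `edge_counts` and `percent_of_edges` are made up for this illustration and are not part of fastdup.

```python
# Counts copied from the "Found a total of ..." log lines above.
num_images = 23262

# ASSUMPTION (not stated in the log): each image contributes 2
# nearest-neighbor edges, so the graph has 2 * num_images edges.
assumed_k = 2
total_edges = assumed_k * num_images

edge_counts = {
    "fully identical (d>0.990)": 50,
    "nearly identical (d>0.980)": 2,
    "above threshold (d>0.900)": 362,
    "outliers (d<0.050)": 2327,
}


def percent_of_edges(count, edges=total_edges):
    """Return count as a percentage of edges, rounded to two decimals."""
    return round(100.0 * count / edges, 2)


percentages = {label: percent_of_edges(n) for label, n in edge_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.2f} % of {total_edges} edges")
```

Under that assumption the rounded values agree with the four logged percentages (0.11, 0.00, 0.78 and 5.00), which suggests the percentages are taken over edges rather than over images.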
2023-09-26 15:54:53 [INFO] Running connected components for ccthreshold 0.960000
########################################################################################
Dataset Analysis Summary:
Dataset contains 23262 images
Valid images are 100.00% (23,261) of the data, invalid are 0.00% (1) of the data
For a detailed analysis, use `.invalid_instances()`.
Similarity: 0.24% (56) belong to 1 similarity clusters (components).
99.76% (23,206) images do not belong to any similarity cluster.
Largest cluster has 4 (0.02%) images.
For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.9, connected component threshold used is 0.96).
Outliers: 5.96% (1,386) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
For a detailed list of outliers, use `.outliers()`.
########################################################################################
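
The percentages in the summary above are shares of the 23262 images, unlike the earlier per-edge figures. As a rough cross-check, the sketch below recomputes each share from the count printed next to it. The names `reported`, `share` and `mismatches` are hypothetical helpers for this check only, and it uses no fastdup calls.

```python
total_images = 23262

# (count, percentage printed in the summary), copied from the lines above.
reported = {
    "valid": (23261, 100.00),
    "in similarity clusters": (56, 0.24),
    "not in any cluster": (23206, 99.76),
    "largest cluster": (4, 0.02),
    "possible outliers": (1386, 5.96),
}


def share(count, total=total_images):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Keep only the entries whose recomputed share differs from the printed one.
mismatches = {
    name: (share(count), printed)
    for name, (count, printed) in reported.items()
    if abs(share(count) - printed) > 0.005
}

print(mismatches or "all summary percentages match the printed counts")
```

The clustered and unclustered counts (56 and 23206) also add up to the full 23262 images, and the valid count is one short of the total, in line with the single invalid image the summary reports.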