FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-05-09 09:38:03 [INFO] Going to loop over dir ../../kaggle_datasets/shopee-product-matching
2023-05-09 09:38:03 [WARNING] Found a combination of images and tar files in the input folder. Please remove the tar/zip/tgz files from the input folder
fastdup supports running on webdata format with tar/tgz/zip files but only when there are no images in those input folders
If you want to run only on the tar files and ignore the images please run with turi_param='tar_only=1'
2023-05-09 09:38:03 [INFO] Found total 32415 images to run on, 32415 train, 0 test, name list 32415, counter 32415
2023-05-09 09:38:47 [INFO] Found total 32415 images to run on
Finished histogram 24.408
Finished bucket sort 24.490
2023-05-09 09:38:48 [INFO] 917) Finished write_index() NN model
2023-05-09 09:38:48 [INFO] Stored nn model index file ../../kaggle_datasets/my-fastdup-workdir/nnf.index
2023-05-09 09:38:49 [INFO] Total time took 45382 ms
2023-05-09 09:38:49 [INFO] Found a total of 8020 fully identical images (d>0.990), which are 12.37 %
2023-05-09 09:38:49 [INFO] Found a total of 3283 nearly identical images(d>0.980), which are 5.06 %
2023-05-09 09:38:49 [INFO] Found a total of 24447 above threshold images (d>0.900), which are 37.71 %
2023-05-09 09:38:49 [INFO] Found a total of 3241 outlier images (d<0.050), which are 5.00 %
2023-05-09 09:38:49 [INFO] Min distance found 0.515 max distance 1.000
2023-05-09 09:38:49 [INFO] Running connected components for ccthreshold 0.960000