%pip install -U fastdup
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Requirement already satisfied: fastdup in /Users/dannybickson/homebrew/lib/python3.8/site-packages (0.926)
Collecting fastdup
Downloading fastdup-1.0-cp38-cp38-macosx_11_0_arm64.whl (32.8 MB)
Requirement already satisfied: numpy in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (1.24.3)
Requirement already satisfied: tqdm in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (4.65.0)
Requirement already satisfied: pillow in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (9.5.0)
Requirement already satisfied: pyyaml in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (6.0)
Requirement already satisfied: pandas in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (2.0.1)
Requirement already satisfied: requests==2.28.1 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (2.28.1)
Requirement already satisfied: sentry-sdk in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (1.21.1)
Requirement already satisfied: packaging in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (23.1)
Requirement already satisfied: certifi in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (2022.12.7)
Requirement already satisfied: opencv-python-headless<=4.5.5.64 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from fastdup) (4.5.5.64)
Requirement already satisfied: charset-normalizer<3,>=2 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from requests==2.28.1->fastdup) (2.1.1)
Requirement already satisfied: idna<4,>=2.5 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from requests==2.28.1->fastdup) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from requests==2.28.1->fastdup) (1.26.15)
Requirement already satisfied: tzdata>=2022.1 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from pandas->fastdup) (2023.3)
Requirement already satisfied: pytz>=2020.1 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from pandas->fastdup) (2023.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from pandas->fastdup) (2.8.2)
Requirement already satisfied: six>=1.5 in /Users/dannybickson/homebrew/lib/python3.8/site-packages (from python-dateutil>=2.8.2->pandas->fastdup) (1.16.0)
Installing collected packages: fastdup
Attempting uninstall: fastdup
Found existing installation: fastdup 0.926
Uninstalling fastdup-0.926:
Successfully uninstalled fastdup-0.926
Successfully installed fastdup-1.0
Note: you may need to restart the kernel to use updated packages.
import fastdup
import numpy as np
# Change this to your image folder
input_dir = '/Users/dannybickson/visual_database/cxx/unittests/two_images/'
# Run fastdup on an input image folder to create embeddings
fd = fastdup.create(input_dir=input_dir, work_dir='out')
fd.run(overwrite=True, print_summary=False)
# Read the embeddings so they can be used from Python.
# There are two images in input_dir, so the embedding matrix is 2 x 576.
# Each row of the embedding matrix corresponds to one image.
flist, embedding_matrix = fastdup.load_binary_feature(filename='./out/atrain_features.dat')
print('Read embedding matrix of shape', embedding_matrix.shape)
print('Image filenames are')
print(flist)
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-05-17 15:11:02 [INFO] Going to loop over dir /Users/dannybickson/visual_database/cxx/unittests/two_images
2023-05-17 15:11:02 [INFO] Found total 2 images to run on, 2 train, 0 test, name list 2, counter 2
2023-05-17 15:11:03 [INFO] Found total 2 images to run on
Estimated: 0 Minutes
2023-05-17 15:11:03 [INFO] 19) Finished write_index() NN model
2023-05-17 15:11:03 [INFO] Stored nn model index file out/nnf.index
2023-05-17 15:11:03 [INFO] Total time took 1030 ms
2023-05-17 15:11:03 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-05-17 15:11:03 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-05-17 15:11:03 [INFO] Found a total of 0 above threshold images (d>0.900), which are 0.00 %
2023-05-17 15:11:03 [INFO] Found a total of 1 outlier images (d<0.050), which are 50.00 %
2023-05-17 15:11:03 [INFO] Min distance found 0.805 max distance 0.805
2023-05-17 15:11:03 [INFO] Running connected components for ccthreshold 0.960000 .0
Read a total of 2 images
Read embedding matrix of shape (2, 576)
Image filenames are
['/Users/dannybickson/visual_database/cxx/unittests/two_images/test_1234.jpg', '/Users/dannybickson/visual_database/cxx/unittests/two_images/train_1274.jpg']
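Once the embedding matrix is in memory it is an ordinary NumPy array, so it can be compared row by row. The sketch below computes a pairwise cosine similarity matrix on a random stand-in of the same shape (2 x 576). The helper name `cosine_similarity_matrix` is hypothetical, and choosing cosine similarity here is an assumption for illustration; it does not claim to reproduce the `d` values that fastdup prints in its log.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise cosine similarity between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Random stand-in for the matrix returned by fastdup.load_binary_feature
rng = np.random.default_rng(0)
demo_embeddings = rng.random((2, 576)).astype("float32")

similarity = cosine_similarity_matrix(demo_embeddings)
print(similarity.shape)
```

To try it on real data, pass the `embedding_matrix` loaded above in place of `demo_embeddings`; row `i` of the result then refers to `flist[i]`.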
import fastdup
import numpy as np
import os
input_dir = '/Users/dannybickson/visual_database/cxx/unittests/two_images/'
flist = os.listdir(input_dir)
flist = [os.path.join(input_dir, f) for f in flist]
# Replace the code below with the computation of your own features.
# The matrix needs one row per file in flist.
matrix = np.random.rand(len(flist), 576).astype('float32')
# Save the embeddings along with the filenames into a working folder
!mkdir -p embedding_input
fastdup.save_binary_feature('embedding_input', flist, matrix)
# run_mode=2 reads the stored features from work_dir instead of extracting them again
fastdup.run(input_dir, run_mode=2, work_dir='embedding_input')
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-05-17 15:11:12 [INFO] Found total 2 images to run on
2023-05-17 15:11:12 [INFO] 0) Finished write_index() NN model
2023-05-17 15:11:12 [INFO] Stored nn model index file embedding_input/nnf.index
2023-05-17 15:11:12 [INFO] Total time took 64 ms
2023-05-17 15:11:12 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-05-17 15:11:12 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-05-17 15:11:12 [INFO] Found a total of 0 above threshold images (d>0.900), which are 0.00 %
2023-05-17 15:11:12 [INFO] Found a total of 1 outlier images (d<0.050), which are 50.00 %
2023-05-17 15:11:12 [INFO] Min distance found 0.733 max distance 0.733
2023-05-17 15:11:12 [INFO] Running connected components for ccthreshold 0.960000 .0
0
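Before handing your own features to fastdup, it can help to check that the file list and the matrix agree with each other. The sketch below is a minimal pre-flight check under stated assumptions: `check_embeddings` is a hypothetical helper, not part of fastdup, and the expected dimension of 576 and the float32 dtype simply mirror the example above rather than a documented requirement.

```python
import os

import numpy as np


def check_embeddings(filenames, matrix, dim=576):
    """Hypothetical helper: raise ValueError if the inputs look inconsistent."""
    if matrix.ndim != 2:
        raise ValueError(f"expected a 2D matrix, got {matrix.ndim} dimensions")
    if matrix.shape[0] != len(filenames):
        raise ValueError(
            f"{matrix.shape[0]} rows but {len(filenames)} filenames"
        )
    if matrix.shape[1] != dim:
        raise ValueError(f"expected {dim} columns, got {matrix.shape[1]}")
    if matrix.dtype != np.float32:
        raise ValueError(f"expected float32, got {matrix.dtype}")
    relative = [f for f in filenames if not os.path.isabs(f)]
    if relative:
        raise ValueError(f"relative paths found: {relative}")
    return True


# Placeholder file names; the files do not need to exist for this check
demo_files = [os.path.abspath("img_a.jpg"), os.path.abspath("img_b.jpg")]
demo_matrix = np.random.rand(len(demo_files), 576).astype("float32")
ok = check_embeddings(demo_files, demo_matrix)
print(ok)
```

Calling `check_embeddings(flist, matrix)` right before saving or running would surface a row-count or path mismatch early, instead of after the run.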
# Note: the file list must contain absolute paths, not relative paths
import fastdup
import numpy as np
import os
input_dir = '/Users/dannybickson/visual_database/cxx/unittests/two_images/'
flist = os.listdir(input_dir)
flist = [os.path.join(input_dir, f) for f in flist]
# Replace the code below with the computation of your own features.
# The matrix needs one row per file in flist.
matrix = np.random.rand(len(flist), 576).astype('float32')
fd2 = fastdup.create(input_dir=input_dir, work_dir='output2')
fd2.run(annotations=flist, embeddings=matrix, print_summary=False, overwrite=True)
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-05-17 15:13:40 [INFO] Found total 2 images to run on
2023-05-17 15:13:40 [INFO] 0) Finished write_index() NN model
2023-05-17 15:13:40 [INFO] Stored nn model index file out3/nnf.index
2023-05-17 15:13:40 [INFO] Total time took 65 ms
2023-05-17 15:13:40 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-05-17 15:13:40 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-05-17 15:13:40 [INFO] Found a total of 0 above threshold images (d>0.900), which are 0.00 %
2023-05-17 15:13:40 [INFO] Found a total of 1 outlier images (d<0.050), which are 50.00 %
2023-05-17 15:13:40 [INFO] Min distance found 0.733 max distance 0.733
2023-05-17 15:13:40 [INFO] Running connected components for ccthreshold 0.960000 .0
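Both examples above build the file list with `os.listdir`, which returns entries in arbitrary order and also picks up non-image files such as `.DS_Store` on macOS. Since each matrix row has to line up with one filename, a sorted and filtered listing may be safer. The sketch below is one way to do that; `list_images` and the extension tuple are hypothetical choices for illustration, not fastdup settings.

```python
import os
import tempfile

# Assumed set of extensions for this sketch; adjust to your data
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(input_dir):
    """Hypothetical helper: sorted absolute paths of image files in input_dir."""
    names = sorted(os.listdir(input_dir))
    return [
        os.path.abspath(os.path.join(input_dir, name))
        for name in names
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]


# Demonstrate on a throwaway folder with two empty "images" and one stray file
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("train_1274.jpg", "test_1234.jpg", ".DS_Store"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_list = list_images(demo_dir)

print([os.path.basename(p) for p in demo_list])
```

With a listing like this, computing features by iterating over the returned list keeps row `i` of the matrix aligned with entry `i` of the file list.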
Next, feel free to check out the other tutorials.
If you prefer a no-code platform to inspect and visualize your dataset, try VL Profiler, our free cloud product. It is our first no-code commercial product and lets you visualize and inspect your dataset in your browser.
Sign up now, it's free.
As usual, feedback is welcome!
Questions? Drop by our Slack channel or open an issue on GitHub.