#!/usr/bin/env python
# coding: utf-8
# # Analyze Torchvision Datasets
#
# [![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-torchvision-datasets.ipynb)
#
# [![Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-torchvision-datasets.ipynb)
# [![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/analyzing-torchvision-datasets)
#
# This notebook shows how to analyze [Torchvision Datasets](https://pytorch.org/vision/main/datasets.html) for issues using fastdup.

# ## Installation
#
# First, let's install the necessary packages.

# In[ ]:


get_ipython().system('pip install -Uq fastdup torchvision')


# Now, test the installation. If there's no error message, we are ready to go.

# In[1]:


import fastdup
fastdup.__version__


# ## Download Dataset
# Torchvision provides many built-in datasets in the `torchvision.datasets` module. The datasets span various tasks such as image classification, object detection, and segmentation, to name a few.
#
# Let's download the [Caltech 256](https://data.caltech.edu/records/nyy15-4j048) dataset to our local directory.
#
# The Caltech 256 dataset consists of 256 object categories containing a total of 30,607 images for image classification.

# In[2]:


from torchvision.datasets import Caltech256

caltech256 = Caltech256(root='./', download=True)


# The dataset is downloaded into the `caltech256` folder in the root directory.

# In[3]:


caltech256.root


# ## Construct Annotation DataFrame
# Although you can run fastdup without annotations, specifying the labels lets us do more analysis with fastdup, such as inspecting mislabels.
# Since the dataset is labeled, let's make use of the labels and feed them into fastdup.
#
# fastdup expects the labels to be formatted into a Pandas `DataFrame` with the columns `filename` and `label`.
# Let's recursively search the directory for the filenames and labels, and format them into a DataFrame.
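# To make the expected `filename`/`label` layout concrete, here is a minimal sketch that runs on a throwaway folder rather than on Caltech 256. The helper name `collect_records`, the toy folder names, and the empty placeholder files are all made up for illustration; the sketch uses only the standard library and assumes, as in this dataset, that each image's parent folder name is its label.

```python
import tempfile
from pathlib import Path


def collect_records(root, extensions=(".jpg", ".jpeg")):
    """Return one {'filename', 'label'} record per image found under `root`.

    Hypothetical helper: the label is taken from the name of the image's
    parent folder, and the extension check is case-insensitive.
    """
    files = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    return [{"filename": str(p), "label": p.parent.name} for p in files]


# Toy layout (illustrative only): two class folders with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (
        "001.ak47/001_0001.jpg",
        "001.ak47/001_0002.jpg",
        "002.american-flag/002_0001.JPG",
    ):
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    records = collect_records(tmp)

for record in records:
    print(record["label"], Path(record["filename"]).name)
```

# Passing `records` to `pd.DataFrame(records)` would give a two-column frame with `filename` and `label`, which is the shape described above. The cell that follows builds the same kind of table for the real dataset with `glob`.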

# In[4]:


import glob
import os
import pandas as pd

# Define the path
path = "caltech256/"

# Define patterns for the JPEG images found in the dataset
patterns = ['*.jpg', '*.jpeg']

# Use glob to get all image filenames for both extensions
filenames = [f for pattern in patterns for f in glob.glob(path + '**/' + pattern, recursive=True)]

# Extract the parent folder name for each filename
label = [os.path.basename(os.path.dirname(filename)) for filename in filenames]

# Convert to a pandas DataFrame with the filename and label columns
df = pd.DataFrame({
    'filename': filenames,
    'label': label
})
df


# ## Run fastdup
# Once the dataset download completes, analyze the image folder where the dataset is stored.
#
# Point `input_dir` to the directory where the images are stored.

# In[5]:


fd = fastdup.create(input_dir="caltech256")
fd.run(annotations=df)


# ## View Galleries
#
# You can use all of fastdup's gallery methods to view duplicates, clusters, etc.
#
# ```python
# fd.vis.duplicates_gallery()    # create a visual gallery of duplicates
# fd.vis.outliers_gallery()      # create a visual gallery of anomalies
# fd.vis.component_gallery()     # create a visualization of connected components
# fd.vis.stats_gallery()         # create a visualization of images statistics (e.g. blur)
# fd.vis.similarity_gallery()    # create a gallery of similar images
# ```

# Let's view some of the image clusters in the dataset.

# In[6]:


fd.vis.component_gallery()


# And also inspect duplicates.

# In[7]:


fd.vis.duplicates_gallery()


# You can also see potential mislabels.

# In[8]:


fd.vis.similarity_gallery(slice='diff')


# ## Wrap Up
# In this tutorial, we showed how you can analyze datasets from Torchvision Datasets using fastdup.
#
# Next, feel free to check out other tutorials:
#
# + ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset, and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!
# + 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze a folder of images, clean it of potential issues, and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.
# + 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled ImageNet-style folder structure, have a go!
# + 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try.
#
# ## VL Profiler - A faster and easier way to diagnose and visualize dataset issues
#
# If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com). VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser.
#
# VL Profiler is free to get started. Upload up to 1,000,000 images for analysis at zero cost!
#
# [Sign up](https://app.visual-layer.com) now.
#
# [![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/github_banner_profiler.gif)](https://app.visual-layer.com)
#
# As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).