In this notebook, we look at a sample of the images and their descriptions.
import datasets
import ipyplot
dset = datasets.load_from_disk("../data/processed")
Sample row in the dataset
dset[0]
{'photo_id': 'XMyPniM9LF0', 'photo_url': 'https://unsplash.com/photos/XMyPniM9LF0', 'photo_image_url': 'https://images.unsplash.com/uploads/14119492946973137ce46/f1f2ebf3', 'photo_submitted_at': '2014-09-29 00:08:38.594364', 'photo_featured': 't', 'photo_width': 4272, 'photo_height': 2848, 'photo_aspect_ratio': 1.5, 'photo_description': 'Woman exploring a forest', 'photographer_username': 'michellespencer77', 'photographer_first_name': 'Michelle', 'photographer_last_name': 'Spencer', 'exif_camera_make': 'Canon', 'exif_camera_model': 'Canon EOS REBEL T3', 'exif_iso': 400.0, 'exif_aperture_value': '1.8', 'exif_focal_length': '50.0', 'exif_exposure_time': '1/100', 'photo_location_name': None, 'photo_location_latitude': None, 'photo_location_longitude': None, 'photo_location_country': None, 'photo_location_city': None, 'stats_views': 2375421, 'stats_downloads': 6967, 'ai_description': 'woman walking in the middle of forest', 'ai_primary_landmark_name': None, 'ai_primary_landmark_latitude': None, 'ai_primary_landmark_longitude': None, 'ai_primary_landmark_confidence': None, 'blur_hash': 'L56bVcRRIWMh.gVunlS4SMbsRRxr', 'description_final': 'Woman exploring a forest', 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x427>}
num_images = 20
images = dset[:num_images]["photo_image_url"]
labels = dset[:num_images]["description_final"]
ipyplot.plot_images(images, labels, max_images=num_images, img_width=150)
dset_top = dset.sort("stats_downloads", reverse=True).select(range(num_images))
Loading cached sorted indices for dataset at ../data/processed/cache-ce9a216c140b30a6.arrow
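# dset_top was built with the earlier num_images, so it may hold fewer than 25 rows
num_images = min(25, len(dset_top))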
images = dset_top[:num_images]["image"]
labels = dset_top[:num_images]["description_final"]
ipyplot.plot_images(images, labels, max_images=num_images, img_width=150)
/opt/conda/envs/workshop/lib/python3.7/site-packages/ipyplot/_utils.py:95: FutureWarning: The input object of type 'JpegImageFile' is an array-like implementing one of the corresponding protocols (`__array__`, `__array_interface__` or `__array_struct__`); but not a sequence (or 0-D). In the future, this object will be coerced as if it was first converted using `np.array(obj)`. To retain the old behaviour, you have to either modify the type 'JpegImageFile', or assign to an empty array created with `np.empty(correct_shape, dtype=object)`. return np.asarray(seq, dtype=type(seq[0]))
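The FutureWarning above is raised by NumPy inside ipyplot and does not affect the plot. One way to keep the cell output clean is to silence that warning category around the plotting call only. The sketch below is a minimal illustration: `noisy_plot` is a hypothetical stand-in for `ipyplot.plot_images`, and `plot_quietly` is a helper name introduced here, not part of any library.

```python
import warnings


def noisy_plot(images, labels):
    """Hypothetical stand-in for ipyplot.plot_images that emits a FutureWarning."""
    warnings.warn("array-like coercion will change", FutureWarning)
    return len(images)


def plot_quietly(plot_fn, *args, **kwargs):
    """Call plot_fn while ignoring FutureWarning; filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return plot_fn(*args, **kwargs)


# Placeholder inputs; in the notebook these would be the images and labels lists.
shown = plot_quietly(noisy_plot, ["a.jpg", "b.jpg"], ["first", "second"])
```

In the notebook itself, the call would presumably look like `plot_quietly(ipyplot.plot_images, images, labels, max_images=num_images, img_width=150)`. Scoping the filter with `catch_warnings` means warnings elsewhere in the notebook stay visible.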
The images are of very high quality and were taken by skilled photographers.
Unfortunately, the descriptions, whether provided by the photographers or supplemented by labelling services, can be a bit lacking. 😅
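To put a rough number on how thin the descriptions are, one could count words per caption. The sketch below is only an illustration: `description_stats` and its `short_threshold` cut-off are hypothetical names and values chosen here, and the toy list mixes two captions from the sample row with made-up entries.

```python
from statistics import mean, median


def description_stats(descriptions, short_threshold=3):
    """Summarise caption lengths in words.

    short_threshold is an arbitrary cut-off chosen for illustration.
    """
    # Treat None or whitespace-only captions as missing.
    cleaned = [d.strip() for d in descriptions if d and d.strip()]
    word_counts = [len(d.split()) for d in cleaned]
    return {
        "total": len(descriptions),
        "missing": len(descriptions) - len(cleaned),
        "short": sum(1 for n in word_counts if n <= short_threshold),
        "mean_words": mean(word_counts) if word_counts else 0.0,
        "median_words": median(word_counts) if word_counts else 0.0,
    }


# Toy captions: the first and third appear in the sample row; the rest are invented.
sample = [
    "Woman exploring a forest",
    None,
    "woman walking in the middle of forest",
    "  ",
    "Sunset",
]
stats = description_stats(sample)
print(stats)
```

On the real data, something like `description_stats(dset["description_final"])` should give the same summary for the whole dataset, assuming the column holds strings or `None`.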