To run any of Eden's notebooks, please check the guides on our Wiki page.
There you will find instructions on how to deploy the notebooks on your local system, on Google Colab, or on MyBinder, as well as other useful links, troubleshooting tips, and more.
Note: If you encounter any issues while executing the notebook, don't hesitate to open an issue on GitHub. We will try to reply as soon as possible.
In this notebook, the CLIP model will be used to extract meaningful visual features from agricultural data. A linear model is then trained on top of those features to create the final classifier (a linear probe). CLIP (Contrastive Language–Image Pre-training) is a model developed by OpenAI, designed to understand and relate content across both visual (images) and textual (language) modalities. It represents a significant step in the evolution of foundation models, particularly in the realm of multimodal AI.
Foundation models are a class of large-scale machine learning models trained on a broad range of data sources to acquire a wide set of capabilities. These models can then be adapted or fine-tuned for various specific tasks and applications.
Multimodality in machine learning refers to models that can understand, interpret, or generate data from multiple modalities, such as text, images, audio, and video.
A linear probe involves training a simple linear model, such as logistic regression, on the features extracted from one of the layers of a pre-trained neural network. The objective is to see how well those features transfer to a specific task, such as classification.
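To make the pattern concrete before diving into the notebook, here is a minimal, self-contained sketch of a linear probe. It uses dummy data, and a torchvision ResNet-18 stands in for the frozen encoder (the notebook below uses CLIP instead); all names and shapes here are illustrative only.
import torch
from torchvision.models import resnet18
from sklearn.linear_model import LogisticRegression

encoder = resnet18(weights="IMAGENET1K_V1")  # any pre-trained network works
encoder.fc = torch.nn.Identity()             # drop the classification head
encoder.eval()                               # freeze: the encoder is never updated

with torch.no_grad():
    images = torch.rand(16, 3, 224, 224)     # dummy batch of 16 RGB images
    feats = encoder(images).numpy()          # (16, 512) feature vectors

labels = torch.randint(0, 2, (16,)).numpy()  # dummy binary labels
probe = LogisticRegression(max_iter=1000).fit(feats, labels)  # the linear probe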
!pip install -q git+https://github.com/openai/CLIP.git
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import transforms
from pathlib import Path
import clip
import numpy as np
import os
import cv2
from tqdm import tqdm
from glob import glob
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
SEED = 2023
Check the docstrings for more information.
def read_data(path_list, im_size=(224, 224)):
    """
    Given the list of paths where the images are stored <path_list>,
    and the target size for resizing <im_size>, it returns two NumPy
    arrays with the images and their labels.

    Parameters:
        path_list (List[str]): The list of paths to the images.
        im_size (Tuple[int, int]): The target height and width.

    Returns:
        X (ndarray): Images
        y (ndarray): Labels
    """
    X = []
    y = []
    # Map each dataset folder name to an integer label
    tag2idx = {tag.split(os.path.sep)[-1]: i for i, tag in enumerate(path_list)}
    for path in path_list:
        for im_file in tqdm(glob(path + "*/*")):  # Read all files in path
            try:
                # os.path.sep is OS-agnostic (either '/' or '\'); [-2] grabs the folder name
                label = im_file.split(os.path.sep)[-2]
                im = cv2.imread(im_file, cv2.IMREAD_COLOR)
                # By default OpenCV reads in BGR format; convert to RGB
                im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
                # Resize to the target dimensions
                im = cv2.resize(im, im_size, interpolation=cv2.INTER_AREA)
                X.append(im / 255.0)  # Scale pixel values to [0, 1]
                y.append(tag2idx[label])  # Append the integer label to y
            except Exception as e:
                # Skip unreadable files (e.g. annotations or metadata)
                print(e)
    X = np.array(X)  # Convert list to NumPy array
    y = np.array(y).astype(np.uint8)
    return X, y
The following function uses CLIP to extract the visual features from the images.
def get_features(dataset, batch_size=32):
    """
    This function uses the (global) CLIP model to extract the visual
    features from the images in <dataset>, batch by batch.
    """
    all_features = []
    all_labels = []
    with torch.no_grad():  # No gradients needed: the encoder stays frozen
        for images, labels in tqdm(DataLoader(dataset, batch_size=batch_size)):
            features = model.encode_image(images.to(device))
            all_features.append(features)
            all_labels.append(labels)
    return torch.cat(all_features).cpu().numpy(), \
           torch.cat(all_labels).cpu().numpy()
class WeedIdentificationDataset(Dataset):
def __init__(self, x, y, transforms=None):
super(WeedIdentificationDataset, self).__init__()
self.x = x
self.y = y
self.transforms = transforms
def __len__(self):
return len(self.x)
def __getitem__(self, idx):
images = self.x[idx]
labels = self.y[idx]
if self.transforms:
images = self.transforms(images)
return images, labels
# Eden datasets we will work on
PATH_LIST = [
"Black nightsade-Solanum nigrum-220519-Weed-zz",
"Tomato-Solanum lycopersicum-240519-Healthy-zz"
]
IM_SIZE = (224, 224)
# Define the paths in an OS-agnostic way
for i, path in enumerate(PATH_LIST):
    PATH_LIST[i] = str(
        Path.cwd().parents[0].joinpath("eden_library_datasets").joinpath(path)
    )
x, y = read_data(PATH_LIST, IM_SIZE)
 85%|████████▌ | 105/124 [00:21<00:03, 4.78it/s]
OpenCV(4.8.1) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
100%|██████████| 124/124 [00:25<00:00, 4.87it/s]
Corrupt JPEG data: 65 extraneous bytes before marker 0xd9
OpenCV(4.8.1) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
100%|██████████| 201/201 [01:05<00:00, 3.06it/s]
# Using CUDA if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
cuda
x_train, x_test, y_train, y_test = train_test_split(
x, y, test_size = 0.25, shuffle=True, stratify=y, random_state=SEED
)
# Rearrange from NHWC (OpenCV/NumPy) to NCHW (PyTorch) layout
x_train = torch.tensor(x_train, dtype=torch.float32).permute(0, 3, 1, 2)
x_test = torch.tensor(x_test, dtype=torch.float32).permute(0, 3, 1, 2)
# Normalize with the mean/std statistics used during CLIP's pre-training
naive_preprocess = transforms.Compose([
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711))
])
train = WeedIdentificationDataset(x_train, y_train, naive_preprocess)
test = WeedIdentificationDataset(x_test, y_test, naive_preprocess)
model, preprocess = clip.load("ViT-B/32", device)
train_features, train_labels = get_features(train)
test_features, test_labels = get_features(test)
100%|██████████| 8/8 [00:00<00:00, 34.30it/s]
100%|██████████| 3/3 [00:00<00:00, 41.36it/s]
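As a side note, clip.load above also returned CLIP's own preprocess pipeline, which expects PIL images and handles resizing, cropping, and normalization itself. Here is a sketch of encoding a single image that way, as an alternative to the manual normalization used in this notebook (x[0] is simply the first image loaded earlier).
from PIL import Image

# Convert one image (float RGB array in [0, 1]) back to a uint8 PIL image
im_pil = Image.fromarray((x[0] * 255).astype(np.uint8))
im_tensor = preprocess(im_pil).unsqueeze(0).to(device)  # shape (1, 3, 224, 224)
with torch.no_grad():
    feat = model.encode_image(im_tensor)  # 512-dim feature vector for ViT-B/32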
# Perform logistic regression
classifier = LogisticRegression()
classifier.fit(train_features, train_labels)
LogisticRegression()
predictions = classifier.predict(test_features)
accuracy_score(test_labels, predictions)
1.0
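A perfect score on a small test set is encouraging, but it is worth looking beyond a single number. Below is a sketch of a slightly fuller evaluation; the class names passed to classification_report are assumptions based on the order of PATH_LIST.
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

print(confusion_matrix(test_labels, predictions))
print(classification_report(test_labels, predictions,
                            target_names=["black nightshade (weed)", "tomato (healthy)"]))

# Cross-validated accuracy on the training features guards against a lucky split
cv_scores = cross_val_score(LogisticRegression(max_iter=1000),
                            train_features, train_labels, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")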
We use the t-SNE algorithm to project the high-dimensional features into a 2- or 3-dimensional space. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a machine learning algorithm used mainly for visualizing high-dimensional data, and it is particularly useful for understanding complex datasets. t-SNE is a form of manifold learning: it captures the local structure of the high-dimensional space and reproduces it in a lower-dimensional one. Its key property is that it preserves local relationships, so points that are similar in the high-dimensional space stay close together in the projection. The resulting visualization often shows distinct clusters, which is extremely useful for exploratory data analysis, identifying patterns, or communicating findings.
tsne = TSNE(n_components = 3, random_state=0)
projections = tsne.fit_transform(train_features)
FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(projections[:,0], projections[:,1], projections[:,2],
c=train_labels, cmap='viridis', linewidth=0.5);
tsne = TSNE(n_components = 2, random_state=0)
projections = tsne.fit_transform(train_features)
fig = plt.figure()
plt.scatter(projections[:,0], projections[:,1],
c=train_labels, cmap='viridis', linewidth=0.5);
plt.show()
# For comparison, flatten the raw pixels to (N, H*W*C) and run t-SNE on them too
x_train_flattened = x_train.permute(0, 2, 3, 1).reshape(x_train.shape[0], -1)
tsne = TSNE(n_components = 3, random_state=0)
projections = tsne.fit_transform(x_train_flattened)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(projections[:,0], projections[:,1], projections[:,2],
c=train_labels, cmap='viridis', linewidth=0.5);
tsne = TSNE(n_components = 2, random_state=0)
projections = tsne.fit_transform(x_train_flattened)
fig = plt.figure()
plt.scatter(projections[:,0], projections[:,1],
c=train_labels, cmap='viridis', linewidth=0.5);
plt.show()
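To put a number on what the two sets of plots suggest visually, one option is to compare how well the class labels separate in each representation, for instance with the silhouette score (a sketch; higher means better-separated classes).
from sklearn.metrics import silhouette_score

# Class separability in CLIP feature space vs. raw pixel space
print("CLIP features:", silhouette_score(train_features, train_labels))
print("Raw pixels:   ", silhouette_score(x_train_flattened.numpy(), train_labels))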
The use of CLIP and related foundation models for feature extraction is worth exploring further, as it can boost performance on downstream tasks such as image classification or object detection.