The Semantic Router library can also be used to detect specific kinds of images or videos. In this walkthrough we demonstrate this by detecting Not Shrek For Work (NSFW) and Shrek For Work (SFW) images.
We start by installing the library:
!pip install -qU \
"semantic-router[vision]" \
datasets==2.17.0
!pip install datasets
Next, we download a multi-modal dataset: the aurelio-ai/shrek-detection dataset from Hugging Face.
from datasets import load_dataset
data = load_dataset("aurelio-ai/shrek-detection", split="train", trust_remote_code=True)
data[3]["image"]
We split the images into two lists according to their is_shrek label:
shrek_pics = [d["image"] for d in data if d["is_shrek"]]
not_shrek_pics = [d["image"] for d in data if not d["is_shrek"]]
print(f"We have {len(shrek_pics)} shrek pics, and {len(not_shrek_pics)} not shrek pics")
We have 5 shrek pics, and 20 not shrek pics
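The two list comprehensions above iterate over the dataset twice, which is fine for 25 rows. As a minimal alternative sketch, the split can be done in a single pass. The rows below are hypothetical stand-ins (plain dicts with the same `image` and `is_shrek` keys, and strings in place of images); since iterating a `datasets.Dataset` yields one dict per row, the same function should also accept `data` directly.

```python
# Hypothetical stand-in rows mimicking the dataset's "image" and "is_shrek" fields.
# In the walkthrough each "image" is a PIL image; here it is just a string.
rows = [
    {"image": "img_0", "is_shrek": True},
    {"image": "img_1", "is_shrek": False},
    {"image": "img_2", "is_shrek": False},
    {"image": "img_3", "is_shrek": True},
]


def split_by_label(records, key="is_shrek"):
    """Partition records into (positives, negatives) in a single pass."""
    positives, negatives = [], []
    for record in records:
        # Append the image to whichever list matches its boolean label
        (positives if record[key] else negatives).append(record["image"])
    return positives, negatives


shrek_pics, not_shrek_pics = split_by_label(rows)
print(f"We have {len(shrek_pics)} shrek pics, and {len(not_shrek_pics)} not shrek pics")
```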
Next, we define a Route for each category. Instead of example phrases, each route uses the images as its utterances, i.e. the examples that should trigger that route.
from semantic_router import Route
shrek = Route(
name="shrek",
utterances=shrek_pics,
)
Then we define a second route for the non-Shrek images:
not_shrek = Route(
name="not_shrek",
utterances=not_shrek_pics,
)
routes = [shrek, not_shrek]
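Now we initialize our embedding model. We use a CLIP encoder, which embeds both text and images into the same vector space, so text queries can be compared against image utterances: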
from semantic_router.encoders.clip import CLIPEncoder
encoder = CLIPEncoder()
Now we define the RouteLayer. When called, the route layer consumes a query (text, or, as we will see later, an image) and outputs the category (Route) it belongs to. To initialize a RouteLayer we need our encoder model and a list of routes.
from semantic_router.layer import RouteLayer
rl = RouteLayer(encoder=encoder, routes=routes)
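To make the idea of "outputting the category a query belongs to" concrete, here is a simplified, hypothetical sketch of similarity-based routing: embed the query, compare it with each route's example embeddings using cosine similarity, and return the best route, or `None` if nothing clears a threshold. The toy 3-d vectors stand in for CLIP embeddings, and the max-over-examples scoring and the 0.5 threshold are assumptions for illustration; Semantic Router's actual scoring and default thresholds may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def route(query_vec, route_vecs, threshold=0.5):
    """Return the route whose closest example is most similar to the query,
    or None if no example clears the threshold (hypothetical logic)."""
    best_name, best_score = None, threshold
    for name, examples in route_vecs.items():
        # Score a route by its single most similar example embedding
        score = max(cosine(query_vec, e) for e in examples)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-d "embeddings" standing in for CLIP vectors of the route utterances
route_vecs = {
    "shrek": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "not_shrek": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
print(route([0.95, 0.15, 0.05], route_vecs))  # shrek
print(route([0.0, 0.0, 1.0], route_vecs))     # None
```

The `None` case mirrors what the route layer returns for a query that matches none of the routes.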
Now we can test it with text queries to see whether they hit the routes we defined with images:
rl("don't you love politics?")
RouteChoice(name=None, function_call=None, similarity_score=None)
rl("shrek")
RouteChoice(name='shrek', function_call=None, similarity_score=None)
rl("dwayne the rock johnson")
RouteChoice(name='not_shrek', function_call=None, similarity_score=None)
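Because a query that matches no route comes back with `name=None` (as with the politics question above), downstream code needs a fallback. A minimal sketch of dispatching on the route name; the handler functions, the `HANDLERS` table, and their return strings are all hypothetical.

```python
from typing import Callable, Dict, Optional


# Hypothetical handlers keyed by route name
def on_shrek() -> str:
    return "flag: Shrek content"


def on_not_shrek() -> str:
    return "allow"


HANDLERS: Dict[str, Callable[[], str]] = {
    "shrek": on_shrek,
    "not_shrek": on_not_shrek,
}


def dispatch(route_name: Optional[str]) -> str:
    """Run the handler for a route name, falling back when no route matched."""
    handler = HANDLERS.get(route_name) if route_name is not None else None
    if handler is None:
        return "no route matched"
    return handler()


print(dispatch("shrek"))  # flag: Shrek content
print(dispatch(None))     # no route matched
```

In the walkthrough this would be called as `dispatch(rl("shrek").name)`.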
Everything is classified as expected. Now let's pull in some images we haven't seen before and see whether the route layer classifies them as NSFW or SFW.
test_data = load_dataset(
"aurelio-ai/shrek-detection", split="test", trust_remote_code=True
)
test_data
Dataset({ features: ['text', 'image', 'is_shrek'], num_rows: 11 })
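Since every test row carries an `is_shrek` label, the route layer can be scored over the whole split rather than by inspecting images one at a time. A minimal sketch, assuming a classifier callable that returns a route name (in the walkthrough, something like `lambda img: rl(img).name`). The `evaluate` helper is hypothetical, and a `None` prediction is counted as wrong. The stand-in rows and stub classifier below only exist so the snippet runs on its own.

```python
from typing import Callable, Iterable, Optional


def evaluate(rows: Iterable[dict], classify: Callable[[object], Optional[str]]) -> float:
    """Fraction of rows whose predicted route agrees with the is_shrek label."""
    correct = total = 0
    for row in rows:
        predicted = classify(row["image"])
        expected = "shrek" if row["is_shrek"] else "not_shrek"
        correct += predicted == expected  # a None prediction never matches
        total += 1
    return correct / total if total else 0.0


# Stand-in rows and a stub classifier; with the real data you would pass
# test_data and a function wrapping rl(...).name instead.
rows = [
    {"image": "green ogre", "is_shrek": True},
    {"image": "movie star", "is_shrek": False},
    {"image": "swamp", "is_shrek": False},
    {"image": "ogre", "is_shrek": True},
]


def stub(img: str) -> Optional[str]:
    return "shrek" if "ogre" in img else None


print(evaluate(rows, stub))  # 0.5
```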
test_data[0]["image"]
rl(test_data[0]["image"]).name
'shrek'
test_data[1]["image"]
rl(test_data[1]["image"]).name
'shrek'
test_data[2]["image"]