The HybridRouter in the Semantic Router library can improve routing performance, particularly for niche use cases with specific terminology, such as finance or medicine.
It lets us give more weight to routing based on the keywords contained in our utterances and user queries.
We start by installing the library:
#!pip install -qU semantic-router==0.1.0
Next, we define our routes. Each Route maps a name to example utterances that should trigger that route.
from semantic_router.route import Route
politics = Route(
name="politics",
utterances=[
"isn't politics the best thing ever",
"why don't you tell me about your political opinions",
"don't you just love the president",
"don't you just hate the president",
"they're going to destroy this country!",
"they will save the country!",
],
)
Let's define another for good measure:
chitchat = Route(
name="chitchat",
utterances=[
"how's the weather today?",
"how are things going?",
"lovely weather today",
"the weather is horrendous",
"let's go to the chippy",
],
)
routes = [politics, chitchat]
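The Route objects above boil down to a mapping from route name to example utterances. To make the role of those utterances concrete, here is a toy, dependency-free sketch that sends a query to the route owning its most similar utterance, using bag-of-words cosine similarity as a stand-in for real embeddings. The `ROUTES` dict and the `bag_of_words`, `cosine` and `toy_route` helpers are hypothetical illustrations, not part of semantic-router.

```python
import math
from collections import Counter

# Hypothetical plain-dict view of the two routes above: name -> utterances.
ROUTES = {
    "politics": [
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
    "chitchat": [
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
}


def bag_of_words(text: str) -> Counter:
    """Lowercase, replace punctuation (except apostrophes) with spaces, count tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch == "'" else " " for ch in text.lower())
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_route(query: str) -> tuple[str, float]:
    """Return (route name, score) for the single most similar utterance."""
    q = bag_of_words(query)
    best = ("", 0.0)
    for name, utterances in ROUTES.items():
        for utterance in utterances:
            score = cosine(q, bag_of_words(utterance))
            if score > best[1]:
                best = (name, score)
    return best


print(toy_route("how's the weather today?"))
print(toy_route("don't you love politics?"))
```

Real embeddings also match paraphrases that share no words, which this toy cannot do; it only shows how the example utterances act as the reference points for each route.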
Now we initialize our embedding models: a dense encoder from OpenAI and a sparse encoder from Aurelio. The AurelioSparseEncoder we use here is a remote sparse encoder that can significantly improve routing accuracy when combined with dense embeddings.
Semantic Router also supports local sparse encoders such as TfidfEncoder and BM25Encoder. Compared to these, the AurelioSparseEncoder:
We initialize both like so:
import os
from semantic_router.encoders import OpenAIEncoder, AurelioSparseEncoder
from getpass import getpass
# get OpenAI API key from https://platform.openai.com/
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
"Enter OpenAI API Key: "
)
dense_encoder = OpenAIEncoder(name="text-embedding-3-small", score_threshold=0.3)
# get Aurelio API key from https://platform.aurelio.ai
# use "SRHYBRIDROUTER" for free credits
os.environ["AURELIO_API_KEY"] = os.getenv("AURELIO_API_KEY") or getpass(
"Enter Aurelio API Key: "
)
# Using Aurelio's BM25 sparse encoder
sparse_encoder = AurelioSparseEncoder(name="bm25")
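The `AurelioSparseEncoder(name="bm25")` call computes BM25 sparse vectors remotely, and we don't know its exact tokenization or parameters. To show what a BM25 sparse vector looks like and why rare keywords dominate it, here is a textbook from-scratch sketch, assuming the common defaults `k1=1.5`, `b=0.75` and a Lucene-style idf. The helper names are hypothetical and this is not Aurelio's implementation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not alphanumeric or an apostrophe."""
    cleaned = "".join(ch if ch.isalnum() or ch == "'" else " " for ch in text.lower())
    return cleaned.split()


def bm25_doc_vectors(docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[dict[str, float]]:
    """Encode each document as a sparse {term: BM25 weight} dict."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        vec = {}
        for term, freq in tf.items():
            # Rare terms (low df) get a high idf; common terms like "the" get a low one.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            vec[term] = idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        vectors.append(vec)
    return vectors


def sparse_score(query: str, doc_vec: dict[str, float]) -> float:
    """Score a query against a document by summing the weights of shared terms."""
    return sum(doc_vec.get(term, 0.0) for term in set(tokenize(query)))


docs = [
    "don't you just love the president",
    "lovely weather today",
    "the weather is horrendous",
]
vecs = bm25_doc_vectors(docs)
scores = [sparse_score("is the weather horrendous today?", v) for v in vecs]
print(scores)
```

Because the idf term rewards words that appear in few utterances, a domain keyword such as a drug or ticker name can outweigh common filler words, which is the behaviour the hybrid approach relies on for niche terminology.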
Now we define the HybridRouter. When called, the router consumes text (a query) and outputs the category (Route) it belongs to. To initialize a HybridRouter we need our dense encoder, our sparse encoder, a list of routes, and an alpha value that balances the dense and sparse signals.
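Conceptually, a call such as `router(query)` scores the query against the stored utterances, groups those scores by route, and returns the best route only if it is confident enough. The sketch below shows that decision step in isolation. It assumes max aggregation per route and a single global threshold; both are choices made for illustration, and the library's own aggregation and per-route thresholds may differ. `choose_route` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Optional


def choose_route(scored_utterances: list[tuple[str, float]], threshold: float) -> Optional[str]:
    """Pick a route from (route_name, score) pairs, or None if nothing clears the threshold.

    Scores are aggregated per route by taking the maximum (an assumption for this sketch).
    """
    best_per_route: dict[str, float] = defaultdict(lambda: float("-inf"))
    for route_name, score in scored_utterances:
        best_per_route[route_name] = max(best_per_route[route_name], score)
    if not best_per_route:
        return None
    top_route = max(best_per_route, key=best_per_route.get)
    # Only return the route if its best score is high enough.
    return top_route if best_per_route[top_route] >= threshold else None


print(choose_route([("politics", 0.62), ("politics", 0.20), ("chitchat", 0.05)], threshold=0.3))
print(choose_route([("chitchat", 0.1)], threshold=0.3))
```

Returning no route below the threshold is what lets a router ignore off-topic queries instead of forcing them into the nearest category.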
from semantic_router.routers import HybridRouter
router = HybridRouter(
encoder=dense_encoder,
sparse_encoder=sparse_encoder,
routes=routes,
    alpha=0.5,  # weight of dense vs sparse: 1.0 = dense only, 0.0 = sparse only
)
2024-11-24 00:37:14 INFO semantic_router.utils.logger Downloading and initializing default sBM25 model parameters.
2024-11-24 00:37:22 INFO semantic_router.utils.logger Encoding route politics
2024-11-24 00:37:23 INFO semantic_router.utils.logger Encoding route chitchat
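The alpha argument controls how much each signal contributes. A common way to fuse the two is a convex combination that scales the dense score by alpha and the sparse score by 1 - alpha. Below is a minimal, dependency-free sketch of that idea; the function names and toy vectors are hypothetical, and the library may apply the weighting differently (for example by scaling the vectors themselves before the similarity search).

```python
def dense_dot(a: list[float], b: list[float]) -> float:
    """Dot product of two dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse {term: weight} vectors over shared terms."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def hybrid_score(
    dense_q: list[float],
    dense_d: list[float],
    sparse_q: dict[str, float],
    sparse_d: dict[str, float],
    alpha: float,
) -> float:
    """Convex combination: alpha weights the dense score, (1 - alpha) the sparse score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * dense_dot(dense_q, dense_d) + (1 - alpha) * sparse_dot(sparse_q, sparse_d)


# Toy vectors: dense similarity is 0.96, sparse (keyword) similarity is 1.2.
dq, dd = [0.6, 0.8], [0.8, 0.6]
sq, sd = {"weather": 1.0}, {"weather": 1.2, "today": 0.9}
for a in (0.0, 0.5, 1.0):
    print(a, hybrid_score(dq, dd, sq, sd, a))
```

Under this convention, raising alpha leans on semantic similarity and lowering it leans on exact keyword matches; alpha=0.5 weights both equally. Note that unnormalised sparse scores such as BM25 are not bounded to [0, 1], so the fused score can exceed 1.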
router("don't you love politics?")
RouteChoice(name='politics', function_call=None, similarity_score=1.1909239963848142)
router("how's the weather today?")
RouteChoice(name='chitchat', function_call=None, similarity_score=2.0)