There are many reasons users might choose to roll their own LLMs rather than use a third-party service. Whether it's due to cost, privacy, or compliance, Semantic Router supports the use of "local" LLMs through integrations such as llama.cpp and Ollama.
Below is an example of using Semantic Router with Ollama to run the OpenHermes LLM locally.
# !pip install -qU "semantic_router[local]==0.0.28"
from semantic_router.encoders import HuggingFaceEncoder
encoder = HuggingFaceEncoder()  # local embedding model used for route matching
from semantic_router import Route
# we could use this as a guide for our chatbot to avoid political conversations
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)
# this could be used as an indicator to our chatbot to switch to a more
# conversational prompt
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)
# we place both of our routes together into a single list
routes = [politics, chitchat]
from semantic_router.layer import RouteLayer
from semantic_router.llms.ollama import OllamaLLM
llm = OllamaLLM(
    llm_name="openhermes"
)  # change llm_name if you want to use a different LLM with dynamic routes
rl = RouteLayer(encoder=encoder, routes=routes, llm=llm)
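Note that this assumes an Ollama server is already running locally and that the model has been pulled (e.g. with ollama pull openhermes); by default OllamaLLM talks to the local Ollama API.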
rl("don't you love politics?").name
rl("how's the weather today?").name
rl("I'm interested in learning about llama 2").name
Dynamic routes work by associating a function with a route. If the input utterance is similar enough to the utterances of the route, such that the route is chosen by the semantic router, then this triggers a secondary process: the LLM we specified in the RouteLayer (we specified Ollama, which isn't strictly an LLM, but which defaults to using the OpenHermes LLM) is used to take a function_schema and the input utterance, and extract from the input utterance the values that can be used as arguments for the function described by the function_schema. The returned values can then be passed to the function to obtain an output.
So, in short, it's a way of generating function inputs from an utterance, if that utterance matches the route utterances closely enough.
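For intuition, here is a minimal sketch of what that extraction step produces. The variable name below is illustrative only, not the library's internals:
# illustrative only: the LLM reads the function_schema and the utterance,
# then emits the function's arguments as structured values, e.g. for a
# function get_time(timezone: str) and the utterance
# "what is the time in new york city?":
extracted_inputs = {"timezone": "America/New_York"}
# semantic router then calls the associated function with these values,
# i.e. get_time(**extracted_inputs)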
In the example below, the utterance "what is the time in new york city?" is used to trigger the "get_time" route, which has the function_schema of a likewise named get_time() function associated with it. Ollama is then used to run OpenHermes locally, and OpenHermes extracts the correctly formatted IANA timezone ("America/New_York") based on this utterance and the information we provide about the function in the function_schema. The returned string "America/New_York" can then be passed directly to the get_time() function to return the actual time in New York City.
from datetime import datetime
from zoneinfo import ZoneInfo
def get_time(timezone: str) -> str:
    """
    Finds the current time in a specific timezone.

    :param timezone: The timezone to find the current time in, should
        be a valid timezone from the IANA Time Zone Database like
        "America/New_York" or "Europe/London". Do NOT put the place
        name itself like "rome", or "new york", you must provide
        the IANA format.
    :type timezone: str
    :return: The current time in the specified timezone.
    """
    now = datetime.now(ZoneInfo(timezone))
    return now.strftime("%H:%M")
get_time("America/New_York")
from semantic_router.utils.function_call import get_schema
schema = get_schema(get_time)
schema
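The exact structure of the generated schema depends on the semantic-router version, but as a rough sketch it is built from the function's name, docstring, and signature:
# approximate shape only (fields vary across versions):
# {
#     "name": "get_time",
#     "description": "Finds the current time in a specific timezone. ...",
#     "signature": "(timezone: str) -> str",
#     "output": "<class 'str'>",
# }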
time_route = Route(
    name="get_time",
    utterances=[
        "what is the time in new york city?",
        "what is the time in london?",
        "I live in Rome, what time is it?",
    ],
    function_schemas=[schema],
)
rl.add(time_route)
out = rl("what is the time in new york city?")
print(out)
get_time(**out.function_call)
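If everything is wired up correctly, the printed RouteChoice names the get_time route and carries the extracted arguments (something like {'timezone': 'America/New_York'}), and the final call returns the current time in New York, e.g. "17:30". Be aware that in some versions function_call is a list of argument dictionaries, in which case you would call get_time(**out.function_call[0]) instead.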