Notebook

Building Semantic Memory with Embeddings¶

So far, we've mostly been treating the kernel as a stateless orchestration engine. We send text into a model API and receive text out.

In a previous notebook, we used kernel arguments to pass in additional text into prompts to enrich them with more data. This allowed us to create a basic chat experience.

However, if you solely relied on kernel arguments, you would quickly realize that eventually your prompt would grow so large that you would run into the model's token limit. What we need is a way to persist state and build both short-term and long-term memory to empower even more intelligent applications.

To do this, we dive into the key concept of Semantic Memory in the Semantic Kernel.

In [ ]:

#r "nuget: Microsoft.SemanticKernel, 1.11.1"
#r "nuget: Microsoft.SemanticKernel.Plugins.Memory, 1.11.1-alpha"
#r "nuget: System.Linq.Async, 6.0.1"

#!import config/Settings.cs

using Microsoft.SemanticKernel;
using Kernel = Microsoft.SemanticKernel.Kernel;

var builder = Kernel.CreateBuilder();

// Configure AI service credentials used by the kernel
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

if (useAzureOpenAI)
    builder.AddAzureOpenAIChatCompletion(model, azureEndpoint, apiKey);
else
    builder.AddOpenAIChatCompletion(model, apiKey, orgId);

var kernel = builder.Build();

In order to use memory, we need to instantiate the Memory Plugin with a Memory Storage and an Embedding backend. In this example, we make use of the VolatileMemoryStore which can be thought of as a temporary in-memory storage (not to be confused with Semantic Memory).

This memory is not written to disk and is only available during the app session.

When developing your app you will have the option to plug in persistent storage like Azure Cosmos Db, PostgreSQL, SQLite, etc. Semantic Memory allows also to index external data sources, without duplicating all the information, more on that later.

In [ ]:

using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Memory functionality is experimental
#pragma warning disable SKEXP0001, SKEXP0010, SKEXP0050

var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

var memoryBuilder = new MemoryBuilder();

if (useAzureOpenAI)
{
    memoryBuilder.WithAzureOpenAITextEmbeddingGeneration(
        "text-embedding-ada-002",
        azureEndpoint, 
        apiKey,
        "model-id");
}
else
{
    memoryBuilder.WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey);
}

memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

At its core, Semantic Memory is a set of data structures that allow you to store the meaning of text that come from different data sources, and optionally to store the source text too.

These texts can be from the web, e-mail providers, chats, a database, or from your local directory, and are hooked up to the Semantic Kernel through data source connectors.

The texts are embedded or compressed into a vector of floats representing mathematically the texts' contents and meaning.

You can read more about embeddings here.

Manually adding memories¶

Let's create some initial memories "About Me". We can add memories to our VolatileMemoryStore by using SaveInformationAsync

In [ ]:

const string MemoryCollectionName = "aboutMe";

await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "My name is Andrea");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "I currently work as a tourist operator");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "I currently live in Seattle and have been living there since 2005");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "I visited France and Italy five times since 2015");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "My family is from New York");

Let's try searching the memory:

In [ ]:

var questions = new[]
{
    "what is my name?",
    "where do I live?",
    "where is my family from?",
    "where have I travelled?",
    "what do I do for work?",
};

foreach (var q in questions)
{
    var response = await memory.SearchAsync(MemoryCollectionName, q).FirstOrDefaultAsync();
    Console.WriteLine("Q: " + q);
    Console.WriteLine("A: " + response?.Relevance.ToString() + "\t" + response?.Metadata.Text);
}

Let's now revisit our chat sample from the previous notebook. If you remember, we used kernel arguments to fill the prompt with a history that continuously got populated as we chatted with the bot. Let's add also memory to it!

This is done by using the TextMemoryPlugin which exposes the recall native function.

recall takes an input ask and performs a similarity search on the contents that have been embedded in the Memory Store. By default, recall returns the most relevant memory.

In [ ]:

using Microsoft.SemanticKernel.Plugins.Memory;

#pragma warning disable SKEXP0001, SKEXP0050

// TextMemoryPlugin provides the "recall" function
kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

In [ ]:

const string skPrompt = @"
ChatBot can have a conversation with you about any topic.
It can give explicit instructions or say 'I don't know' if it does not have an answer.

Information about me, from previous conversations:
- {{$fact1}} {{recall $fact1}}
- {{$fact2}} {{recall $fact2}}
- {{$fact3}} {{recall $fact3}}
- {{$fact4}} {{recall $fact4}}
- {{$fact5}} {{recall $fact5}}

Chat:
{{$history}}
User: {{$userInput}}
ChatBot: ";

var chatFunction = kernel.CreateFunctionFromPrompt(skPrompt, new OpenAIPromptExecutionSettings { MaxTokens = 200, Temperature = 0.8 });

The RelevanceParam is used in memory search and is a measure of the relevance score from 0.0 to 1.0, where 1.0 means a perfect match. We encourage users to experiment with different values.

In [ ]:

#pragma warning disable SKEXP0001, SKEXP0050

var arguments = new KernelArguments();

arguments["fact1"] = "what is my name?";
arguments["fact2"] = "where do I live?";
arguments["fact3"] = "where is my family from?";
arguments["fact4"] = "where have I travelled?";
arguments["fact5"] = "what do I do for work?";

arguments[TextMemoryPlugin.CollectionParam] = MemoryCollectionName;
arguments[TextMemoryPlugin.LimitParam] = "2";
arguments[TextMemoryPlugin.RelevanceParam] = "0.8";

Now that we've included our memories, let's chat!

In [ ]:

var history = "";
arguments["history"] = history;
Func<string, Task> Chat = async (string input) => {
    // Save new message in the kernel arguments
    arguments["userInput"] = input;

    // Process the user message and get an answer
    var answer = await chatFunction.InvokeAsync(kernel, arguments);

    // Append the new interaction to the chat history
    var result = $"\nUser: {input}\nChatBot: {answer}\n";

    history += result;
    arguments["history"] = history;
    
    // Show the bot response
    Console.WriteLine(result);
};

In [ ]:

await Chat("Hello, I think we've met before, remember? my name is...");

In [ ]:

await Chat("I want to plan a trip and visit my family. Do you know where that is?");

In [ ]:

await Chat("Great! What are some fun things to do there?");

Adding documents to your memory¶

Many times in your applications you'll want to bring in external documents into your memory. Let's see how we can do this using our VolatileMemoryStore.

Let's first get some data using some of the links in the Semantic Kernel repo.

In [ ]:

const string memoryCollectionName = "SKGitHub";

var githubFiles = new Dictionary<string, string>()
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/00-getting-started.ipynb"]
        = "Jupyter notebook describing how to get started with the Semantic Kernel",
    ["https://github.com/microsoft/semantic-kernel/tree/main/samples/plugins/ChatPlugin/ChatGPT"]
        = "Sample demonstrating how to create a chat plugin interfacing with ChatGPT",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Plugins/Plugins.Memory/VolatileMemoryStore.cs"]
        = "C# class that defines a volatile embedding store",
};

Let's build a new Memory.

In [ ]:

// Memory functionality is experimental
#pragma warning disable SKEXP0001, SKEXP0010, SKEXP0050

var memoryBuilder = new MemoryBuilder();

if (useAzureOpenAI)
{
    memoryBuilder.WithAzureOpenAITextEmbeddingGeneration(
        "text-embedding-ada-002",
        azureEndpoint, 
        apiKey,
        "model-id");
}
else
{
    memoryBuilder.WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey);
}

memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

Now let's add these files to our VolatileMemoryStore using SaveReferenceAsync.

In [ ]:

Console.WriteLine("Adding some GitHub file URLs and their descriptions to a volatile Semantic Memory.");
var i = 0;
foreach (var entry in githubFiles)
{
    await memory.SaveReferenceAsync(
        collection: memoryCollectionName,
        description: entry.Value,
        text: entry.Value,
        externalId: entry.Key,
        externalSourceName: "GitHub"
    );
    Console.WriteLine($"  URL {++i} saved");
}

In [ ]:

string ask = "I love Jupyter notebooks, how should I get started?";
Console.WriteLine("===========================\n" +
                    "Query: " + ask + "\n");

var memories = memory.SearchAsync(memoryCollectionName, ask, limit: 5, minRelevanceScore: 0.77);

i = 0;
await foreach (var memory in memories)
{
    Console.WriteLine($"Result {++i}:");
    Console.WriteLine("  URL:     : " + memory.Metadata.Id);
    Console.WriteLine("  Title    : " + memory.Metadata.Description);
    Console.WriteLine("  Relevance: " + memory.Relevance);
    Console.WriteLine();
}

Now you might be wondering what happens if you have so much data that it doesn't fit into your RAM? That's where you want to make use of an external Vector Database made specifically for storing and retrieving embeddings.

Stay tuned for that!