This notebook shows how to use Postgres as a memory store in Semantic Kernel.
The code below pulls the most recent papers from arXiv, creates embeddings from the paper abstracts, and stores them in a Postgres database.
Later in the notebook, we use the Postgres vector store to search for papers that are semantically similar to a query, and then wire that search into a chat completion flow.
import textwrap
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Annotated, Any
import numpy as np
import requests
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
AzureChatCompletion,
AzureChatPromptExecutionSettings,
AzureTextEmbedding,
OpenAIEmbeddingPromptExecutionSettings,
OpenAITextEmbedding,
)
from semantic_kernel.connectors.memory.postgres import PostgresCollection
from semantic_kernel.contents import ChatHistory
from semantic_kernel.data import (
DistanceFunction,
IndexKind,
VectorSearchOptions,
VectorStoreRecordDataField,
VectorStoreRecordKeyField,
VectorStoreRecordUtils,
VectorStoreRecordVectorField,
VectorStoreTextSearch,
vectorstoremodel,
)
from semantic_kernel.functions import KernelParameterMetadata
from semantic_kernel.functions.kernel_arguments import KernelArguments
You'll need to set up your environment to provide connection information to Postgres, as well as OpenAI or Azure OpenAI.
To do this, copy the .env.example file to .env and fill in the necessary information.
Note: If you're using VSCode to execute the notebook, the settings in .env in the root of the repository will be picked up automatically.
You'll need to provide a connection string to a Postgres database. You can use a local Postgres instance or a cloud-hosted one. You can either provide the connection string directly or supply the connection information through environment variables; see the .env.example file for the POSTGRES_ settings.
You can also use Docker to bring up a Postgres instance by following the steps below.
Create an init.sql file that has the following:
CREATE EXTENSION IF NOT EXISTS vector;
Now you can start a postgres instance with the following:
docker pull pgvector/pgvector:pg16
docker run --rm -it --name pgvector -p 5432:5432 -v ./init.sql:/docker-entrypoint-initdb.d/init.sql -e POSTGRES_PASSWORD=example pgvector/pgvector:pg16
Note: Use .\init.sql on Windows and ./init.sql on WSL or Linux/Mac.
You can then use the following connection string:
POSTGRES_CONNECTION_STRING="host=localhost port=5432 dbname=postgres user=postgres password=example"
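With the container running, you can optionally verify that the init script installed the pgvector extension. This is a quick sanity check (not part of the original setup) that assumes the container name and password from the docker run command above:

docker exec -it pgvector psql -U postgres -c "SELECT extname FROM pg_extension;"

You should see vector listed alongside the default plpgsql extension.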
You can use either the OpenAI or Azure OpenAI APIs. Provide the API key and other configuration in the .env file by setting either the OPENAI_ or AZURE_OPENAI_ settings.
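As a rough illustration, a filled-in .env for the Azure OpenAI path might look like the sketch below. The exact variable names here are assumptions; treat .env.example as the source of truth and adjust accordingly (for the OpenAI path you would instead set the OPENAI_ keys, e.g. OPENAI_API_KEY and the model IDs).

POSTGRES_CONNECTION_STRING="host=localhost port=5432 dbname=postgres user=postgres password=example"
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_API_KEY="<your-api-key>"
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="<your-chat-deployment>"
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME="<your-embedding-deployment>"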
# Path to the environment file
env_file_path = ".env"
Here we set some additional configuration.
# -- ArXiv settings --
# The search term to use when searching for papers on arXiv. All metadata fields for the papers are searched.
SEARCH_TERM = "RAG"
# The category of papers to search for on arXiv. See https://arxiv.org/category_taxonomy for a list of categories.
ARXIV_CATEGORY = "cs.AI"
# The maximum number of papers to search for on arXiv.
MAX_RESULTS = 300
# -- OpenAI settings --
# Set this flag to False to use the OpenAI API instead of Azure OpenAI
USE_AZURE_OPENAI = True
Here we define a vector store model. This model defines the table and column names for storing the embeddings. We use the @vectorstoremodel decorator to tell Semantic Kernel to create a vector store definition from the model. The VectorStoreRecordField annotations define the fields that will be stored in the database, including the key and vector fields.
@vectorstoremodel
@dataclass
class ArxivPaper:
id: Annotated[str, VectorStoreRecordKeyField()]
title: Annotated[str, VectorStoreRecordDataField()]
abstract: Annotated[str, VectorStoreRecordDataField(has_embedding=True, embedding_property_name="abstract_vector")]
published: Annotated[datetime, VectorStoreRecordDataField()]
authors: Annotated[list[str], VectorStoreRecordDataField()]
link: Annotated[str | None, VectorStoreRecordDataField()]
abstract_vector: Annotated[
np.ndarray | None,
VectorStoreRecordVectorField(
embedding_settings={"embedding": OpenAIEmbeddingPromptExecutionSettings(dimensions=1536)},
index_kind=IndexKind.HNSW,
dimensions=1536,
distance_function=DistanceFunction.COSINE_DISTANCE,
property_type="float",
serialize_function=np.ndarray.tolist,
deserialize_function=np.array,
),
] = None
@classmethod
def from_arxiv_info(cls, arxiv_info: dict[str, Any]) -> "ArxivPaper":
return cls(
id=arxiv_info["id"],
title=arxiv_info["title"].replace("\n ", " "),
abstract=arxiv_info["abstract"].replace("\n ", " "),
published=arxiv_info["published"],
authors=arxiv_info["authors"],
link=arxiv_info["link"],
)
Below is a function that queries the arXiv API for the most recent papers based on our search query and category.
def query_arxiv(search_query: str, category: str = "cs.AI", max_results: int = 10) -> list[dict[str, Any]]:
"""
Query the ArXiv API and return a list of dictionaries with relevant metadata for each paper.
Args:
search_query: The search term or topic to query for.
category: The category to restrict the search to (default is "cs.AI").
See https://arxiv.org/category_taxonomy for a list of categories.
max_results: Maximum number of results to retrieve (default is 10).
"""
response = requests.get(
"http://export.arxiv.org/api/query?"
f"search_query=all:%22{search_query.replace(' ', '+')}%22"
f"+AND+cat:{category}&start=0&max_results={max_results}&sortBy=lastUpdatedDate&sortOrder=descending"
)
root = ET.fromstring(response.content)
ns = {"atom": "http://www.w3.org/2005/Atom"}
return [
{
"id": entry.find("atom:id", ns).text.split("/")[-1],
"title": entry.find("atom:title", ns).text,
"abstract": entry.find("atom:summary", ns).text,
"published": entry.find("atom:published", ns).text,
"link": entry.find("atom:id", ns).text,
"authors": [author.find("atom:name", ns).text for author in entry.findall("atom:author", ns)],
"categories": [category.get("term") for category in entry.findall("atom:category", ns)],
"pdf_link": next(
(link_tag.get("href") for link_tag in entry.findall("atom:link", ns) if link_tag.get("title") == "pdf"),
None,
),
}
for entry in root.findall("atom:entry", ns)
]
We use this function to query papers and store them in memory as our model types.
arxiv_papers: list[ArxivPaper] = [
ArxivPaper.from_arxiv_info(paper)
    for paper in query_arxiv(SEARCH_TERM, category=ARXIV_CATEGORY, max_results=MAX_RESULTS)
]
print(f"Found {len(arxiv_papers)} papers on '{SEARCH_TERM}'")
Found 300 papers on 'RAG'
Create a PostgresCollection, which represents the table in Postgres where we will store the paper information and embeddings.
collection = PostgresCollection[str, ArxivPaper](
collection_name="arxiv_records", data_model_type=ArxivPaper, env_file_path=env_file_path
)
Create a Kernel and add the TextEmbedding service, which will be used to generate embeddings of the abstract for each paper.
kernel = Kernel()
if USE_AZURE_OPENAI:
text_embedding = AzureTextEmbedding(service_id="embedding", env_file_path=env_file_path)
else:
text_embedding = OpenAITextEmbedding(service_id="embedding", env_file_path=env_file_path)
kernel.add_service(text_embedding)
Here we use VectorStoreRecordUtils to add embeddings to our models.
records = await VectorStoreRecordUtils(kernel).add_vector_to_records(arxiv_papers, data_model_type=ArxivPaper)
Now that the models have embeddings, we can write them into the Postgres database.
async with collection:
await collection.create_collection_if_not_exists()
keys = await collection.upsert_batch(records)
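If you want to confirm the rows landed in Postgres outside of Semantic Kernel, you can peek at the table with psql. This is an optional sketch that assumes the Docker setup above and that the connector created the table under the default public schema using the collection name and field names:

docker exec -it pgvector psql -U postgres -c "SELECT id, title, published FROM public.arxiv_records LIMIT 3;"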
Here we retrieve the first few records from the database and print out their information.
async with collection:
results = await collection.get_batch(keys[:3])
if results:
for result in results:
print(f"# {result.title}")
print()
wrapped_abstract = textwrap.fill(result.abstract, width=80)
print(f"Abstract: {wrapped_abstract}")
print(f"Published: {result.published}")
print(f"Link: {result.link}")
print(f"PDF Link: {result.link}")
print(f"Authors: {', '.join(result.authors)}")
print(f"Embedding: {result.abstract_vector}")
print()
print()
# Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps

Abstract: Cloud Operations (CloudOps) is a rapidly growing field focused on the automated management and optimization of cloud infrastructure which is essential for organizations navigating increasingly complex cloud environments. MontyCloud Inc. is one of the major companies in the CloudOps domain that leverages autonomous bots to manage cloud compliance, security, and continuous operations. To make the platform more accessible and effective to the customers, we leveraged the use of GenAI. Developing a GenAI-based solution for autonomous CloudOps for the existing MontyCloud system presented us with various challenges such as i) diverse data sources; ii) orchestration of multiple processes; and iii) handling complex workflows to automate routine tasks. To this end, we developed MOYA, a multi-agent framework that leverages GenAI and balances autonomy with the necessary human control. This framework integrates various internal and external systems and is optimized for factors like task orchestration, security, and error mitigation while producing accurate, reliable, and relevant insights by utilizing Retrieval Augmented Generation (RAG). Evaluations of our multi-agent system with the help of practitioners as well as using automated checks demonstrate enhanced accuracy, responsiveness, and effectiveness over non-agentic approaches across complex workflows.
Published: 2025-01-14 16:30:10
Link: http://arxiv.org/abs/2501.08243v1
PDF Link: http://arxiv.org/abs/2501.08243v1
Authors: Kannan Parthasarathy, Karthik Vaidhyanathan, Rudra Dhar, Venkat Krishnamachari, Basil Muhammed, Adyansh Kakran, Sreemaee Akshathala, Shrikara Arun, Sumant Dubey, Mohan Veerubhotla, Amey Karan
Embedding: [ 0.01063822 0.02977918 0.04532182 ... -0.00264323 0.00081101 0.01491571]

# Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models

Abstract: Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
Published: 2025-01-14 16:38:33
Link: http://arxiv.org/abs/2501.08248v1
PDF Link: http://arxiv.org/abs/2501.08248v1
Authors: Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
Embedding: [-0.01305697 0.01166064 0.06267344 ... -0.01627254 0.00974741 -0.00573298]

# ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations

Abstract: The Alzheimer's Disease Analysis Model Generation 1 (ADAM) is a multi-agent large language model (LLM) framework designed to integrate and analyze multi-modal data, including microbiome profiles, clinical datasets, and external knowledge bases, to enhance the understanding and detection of Alzheimer's disease (AD). By leveraging retrieval-augmented generation (RAG) techniques along with its multi-agent architecture, ADAM-1 synthesizes insights from diverse data sources and contextualizes findings using literature-driven evidence. Comparative evaluation against XGBoost revealed similar mean F1 scores but significantly reduced variance for ADAM-1, highlighting its robustness and consistency, particularly in small laboratory datasets. While currently tailored for binary classification tasks, future iterations aim to incorporate additional data modalities, such as neuroimaging and biomarkers, to broaden the scalability and applicability for Alzheimer's research and diagnostics.
Published: 2025-01-14 18:56:33
Link: http://arxiv.org/abs/2501.08324v1
PDF Link: http://arxiv.org/abs/2501.08324v1
Authors: Ziyuan Huang, Vishaldeep Kaur Sekhon, Ouyang Guo, Mark Newman, Roozbeh Sadeghian, Maria L. Vaida, Cynthia Jo, Doyle Ward, Vanni Bucci, John P. Haran
Embedding: [ 0.03896349 0.00422515 0.05525447 ... 0.03374933 -0.01468264 0.01850895]
Now we can search for documents with VectorStoreTextSearch, which uses the embedding service to vectorize a query and search for semantically similar documents:
text_search = VectorStoreTextSearch[ArxivPaper].from_vectorized_search(collection, embedding_service=text_embedding)
The VectorStoreTextSearch object gives us the ability to retrieve semantically similar documents directly from a prompt.
Here we search for the top 5 arXiv abstracts in our database that are most similar to a query about chunking strategies in RAG applications. Because the vector field uses cosine distance, lower scores indicate closer matches:
query = "What are good chunking strategies to use for unstructured text in Retrieval-Augmented Generation applications?"
async with collection:
search_results = await text_search.get_search_results(
query, options=VectorSearchOptions(top=5, include_total_count=True)
)
print(f"Found {search_results.total_count} results for query.")
async for search_result in search_results.results:
title = search_result.record.title
score = search_result.score
print(f"{title}: {score}")
Found 5 results for query.
Advanced ingestion process powered by LLM parsing for RAG system: 0.38676463602221456
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization: 0.39733734194342085
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis: 0.3981809737466562
R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation: 0.4134050114864055
Enhancing Retrieval-Augmented Generation: A Study of Best Practices: 0.4144733752075731
We can enable chat completion to utilize the text search by creating a kernel function for searching the database...
plugin = kernel.add_functions(
plugin_name="arxiv_plugin",
functions=[
text_search.create_search(
# The default parameters match the parameters of the VectorSearchOptions class.
description="Searches for ArXiv papers that are related to the query.",
parameters=[
KernelParameterMetadata(
name="query", description="What to search for.", type="str", is_required=True, type_object=str
),
KernelParameterMetadata(
name="top",
description="Number of results to return.",
type="int",
default_value=2,
type_object=int,
),
],
),
],
)
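Before wiring the plugin into chat, you can optionally sanity-check it by invoking the search function directly through the kernel. This is only a sketch: it assumes the function created by create_search keeps its default name of "search"; inspect the returned plugin object if your version names it differently.

# Optional sanity check (assumes the generated function is named "search").
async with collection:
    sanity_result = await kernel.invoke(
        plugin["search"],
        query="chunking strategies for RAG",
        top=2,
    )
    print(sanity_result)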
...and then setting up a chat completion service that uses FunctionChoiceBehavior.Auto to automatically call the search function when appropriate for the user's query. We also create the chat function that will be invoked by the kernel.
# Create the chat completion service. This requires an Azure OpenAI completions model deployment and configuration.
chat_completion = AzureChatCompletion(service_id="completions")
kernel.add_service(chat_completion)
# Now we create the chat function that will use the chat service.
chat_function = kernel.add_function(
prompt="{{$chat_history}}{{$user_input}}",
plugin_name="ChatBot",
function_name="Chat",
)
# we set the function choice to Auto, so that the LLM can choose the correct function to call.
# and we exclude the ChatBot plugin, so that it does not call itself.
execution_settings = AzureChatPromptExecutionSettings(
function_choice_behavior=FunctionChoiceBehavior.Auto(filters={"excluded_plugins": ["ChatBot"]}),
service_id="chat",
max_tokens=7000,
temperature=0.7,
top_p=0.8,
)
Here we create a chat history with a system message and some initial context:
history = ChatHistory()
system_message = """
You are a chat bot. Your name is Archie and
you have one goal: help people find answers
to technical questions by relying on the latest
research papers published on ArXiv.
You communicate effectively in the style of a helpful librarian.
You always make sure to include the
ArXiV paper references in your responses.
If you cannot find the answer in the papers,
you will let the user know, but also provide the papers
you did find to be most relevant. If the abstract of the
paper does not specifically reference the user's inquiry,
but you believe it might be relevant, you can still include it
BUT you must make sure to mention that the paper might not directly
address the user's inquiry. Make certain that the papers you link are
from a specific search result.
"""
history.add_system_message(system_message)
history.add_user_message("Hi there, who are you?")
history.add_assistant_message(
"I am Archie, the ArXiV chat bot. I'm here to help you find the latest research papers from ArXiv that relate to your inquiries."
)
We can now invoke the chat function via the Kernel to get chat completions:
arguments = KernelArguments(
user_input=query,
chat_history=history,
settings=execution_settings,
)
result = await kernel.invoke(chat_function, arguments=arguments)
Printing the result shows that the chat completion service used our text search to locate relevant arXiv papers based on the query:
def wrap_text(text, width=90):
    # Split the text into paragraphs, wrap each line within a paragraph,
    # and join the wrapped paragraphs back together so paragraph breaks are preserved.
    paragraphs = text.split("\n\n")
    wrapped_paragraphs = [
        "\n".join(textwrap.fill(line, width=width) for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped_paragraphs)
print(f"Archie:>\n{wrap_text(str(result))}")
Archie:>
What an excellent and timely question! Chunking strategies for unstructured text are critical for optimizing Retrieval-Augmented Generation (RAG) systems since they significantly affect how effectively a RAG model can retrieve and generate contextually relevant information. Let me consult the latest papers on this topic from ArXiv and provide you with relevant insights.

---

Here are some recent papers that dive into chunking strategies or similar concepts for retrieval-augmented frameworks:

1. **"Post-training optimization of retrieval-augmented generation models"**
   *Authors*: Vibhor Agarwal et al.
   *Abstract*: While the paper discusses optimization strategies for retrieval-augmented generation models, there is a discussion on handling unstructured text that could apply to chunking methodologies. Chunking isn't always explicitly mentioned as "chunking" but may be referred to in contexts like splitting data for retrieval.
   *ArXiv link*: [arXiv:2308.10701](https://arxiv.org/abs/2308.10701)
   *Note*: This paper may not focus entirely on chunking strategies but might discuss relevant downstream considerations. It could still provide a foundation for you to explore how chunking integrates with retrievers.

2. **"Beyond Text: Retrieval-Augmented Reranking for Open-Domain Tasks"**
   *Authors*: Younggyo Seo et al.
   *Abstract*: Although primarily focused on retrieval augmentation for reranking, there are reflections on how document structure impacts task performance. Chunking unstructured text to improve retrievability for such tasks could indirectly relate to this work.
   *ArXiv link*: [arXiv:2310.03714](https://arxiv.org/abs/2310.03714)

3. **"ALMA: Alignment of Generative and Retrieval Models for Long Documents"**
   *Authors*: Yao Fu et al.
   *Abstract excerpt*: "Our approach is designed to handle retrieval and generation for long documents by aligning the retrieval and generation models more effectively." Strategies to divide and process long documents into smaller chunks for efficient alignment are explicitly discussed. A focus on handling unstructured long-form content makes this paper highly relevant.
   *ArXiv link*: [arXiv:2308.05467](https://arxiv.org/abs/2308.05467)

4. **"Enhancing Context-aware Question Generation with Multi-modal Knowledge"**
   *Authors*: Jialong Han et al.
   *Abstract excerpt*: "Proposed techniques focus on improving retrievals through better division of available knowledge." It doesn’t focus solely on text chunking in the RAG framework but might be interesting since contextual awareness often relates to preprocessing unstructured input into structured chunks.
   *ArXiv link*: [arXiv:2307.12345](https://arxiv.org/abs/2307.12345)

---

### Practical Approaches Discussed in Literature:
From my broad understanding of RAG systems and some of the details in these papers, here are common chunking strategies discussed in the research community:

1. **Sliding Window Approach**: Divide the text into overlapping chunks of fixed lengths (e.g., 512 tokens with an overlap of 128 tokens). This helps ensure no important context is left behind when chunks are created.
2. **Semantic Chunking**: Use sentence embeddings or clustering techniques (e.g., via Bi-Encoders or Sentence Transformers) to ensure chunks align semantically rather than naively by token count.
3. **Dynamic Partitioning**: Implement chunking based on higher-order structure in the text, such as splitting at sentence boundaries, paragraph breaks, or logical sections.
4. **Content-aware Chunking**: Experiment with LLMs to pre-identify contextual relevance of different parts of the text and chunk accordingly.

---

If you'd like, I can search more specifically on a sub-part of chunking strategies or related RAG optimizations. Let me know!