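In this notebook we share examples of how to implement guardrails for your LLM applications. A guardrail is a generic term for detective controls that aim to steer your application. Greater steerability is a common requirement given the inherent randomness of LLMs, so creating effective guardrails has become one of the most common areas of performance optimization when pushing an LLM from prototype to production.
Guardrails are highly diverse and can be deployed in virtually any context where you can imagine something going wrong with an LLM. This notebook aims to give simple examples that can be extended to meet your own use case, and to outline the trade-offs to consider when deciding whether to implement a guardrail and how to do it.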
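This notebook will focus on the following:
Note: This notebook treats guardrails as a generic term for detective controls around an LLM. For the official libraries that provide distributions of pre-built guardrail frameworks, please check out the following: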
import openai
GPT_MODEL = 'gpt-3.5-turbo'
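Input guardrails aim to prevent inappropriate content from reaching the LLM in the first place. Some common use cases are:
In all of these cases they act as a preventative control, running either before or in parallel with the LLM, and triggering your application to behave differently if one of these criteria is met.
When designing guardrails it is important to consider the trade-off between accuracy, latency, and cost: you want to achieve maximum accuracy with the least impact on your bottom line and on the user's experience.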
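We'll begin with a simple topical guardrail, which aims to detect off-topic questions and prevent the LLM from answering when triggered. This guardrail consists of a simple prompt and uses gpt-3.5-turbo, favouring low latency and cost over accuracy. If we wanted to optimize further, we could consider alternatives such as babbage-002 or open-source offerings such as Llama, which can perform quite well when given enough training examples. When using open-source offerings you can also tune the machines you use for inference to reduce either cost or latency.
This simple guardrail aims to ensure the LLM only answers questions on a predefined set of topics, and responds to out-of-bounds queries with a canned message.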
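To minimize latency, a common design is to send your guardrails asynchronously alongside your main LLM call. If a guardrail is triggered you send back its response; otherwise you send back the LLM response.
We'll use this approach, creating an execute_chat_with_guardrail function that runs the LLM's get_chat_response and the topical_guardrail guardrail in parallel, and returns the LLM response only if the guardrail returns allowed.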
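The pattern described above can be sketched without any network calls. This is a minimal illustration under stated assumptions, not the notebook's implementation: fake_guardrail, fake_chat, run_with_guardrail, and demo are hypothetical names, the keyword check stands in for a real guardrail prompt, and the sleeps simulate request latency. Unlike the polling loop used later in this notebook, the sketch simply awaits the guardrail verdict first while the chat task keeps running in the background.

```python
import asyncio


async def fake_guardrail(user_request: str) -> str:
    # Hypothetical stand-in for a guardrail call: keyword check plus a short simulated delay.
    await asyncio.sleep(0.01)
    text = user_request.lower()
    return "allowed" if ("cat" in text or "dog" in text) else "not_allowed"


async def fake_chat(user_request: str) -> str:
    # Hypothetical stand-in for the main LLM call, deliberately slower than the guardrail.
    await asyncio.sleep(0.05)
    return f"Answer to: {user_request}"


async def run_with_guardrail(user_request: str) -> str:
    # Start both calls at once so their latencies overlap.
    guardrail_task = asyncio.create_task(fake_guardrail(user_request))
    chat_task = asyncio.create_task(fake_chat(user_request))
    # Wait for the guardrail verdict; the chat task continues running meanwhile.
    verdict = await guardrail_task
    if verdict == "not_allowed":
        chat_task.cancel()  # discard the in-flight chat call
        return "I can only talk about cats and dogs."
    return await chat_task


def demo(user_request: str) -> str:
    # Convenience wrapper for running the sketch from a plain script.
    return asyncio.run(run_with_guardrail(user_request))


if __name__ == "__main__":
    print(demo("Tell me about dogs"))
    print(demo("I want to talk about horses"))
```

Inside a notebook, where an event loop is already running, you would call await run_with_guardrail(...) directly instead of demo(...).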
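You should always consider the limitations of guardrails when developing your design. A few of the key ones to be aware of are:
Combining guardrails with rules-based or more traditional machine learning models for detection can mitigate some of these risks. We've also seen customers use guardrails that only ever consider the latest message, to reduce the risk of the model being confused by a long conversation.
We also recommend a gradual roll-out with active monitoring of conversations, so that you can pick up instances of prompt injection or jailbreaking and either add more guardrails to cover these new types of behaviour or include them as training examples for your existing guardrails.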
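As a rough illustration of those two mitigations, the sketch below pairs a rule-based check with a helper that looks only at the latest user message. The patterns in BLOCKED_PATTERNS and the function names latest_user_message and rule_based_guardrail are assumptions made for this example; a real deployment would derive its rules from observed traffic.

```python
import re

# Hypothetical blocklist patterns, chosen only for illustration.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]


def latest_user_message(messages):
    """Return the content of the most recent user message, or "" if there is none."""
    for message in reversed(messages):
        if message["role"] == "user":
            return message["content"]
    return ""


def rule_based_guardrail(messages):
    """Return 'not_allowed' if the latest user message matches a blocked pattern."""
    text = latest_user_message(messages)
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return "not_allowed"
    return "allowed"
```

A check like this is cheap enough to run before the LLM-based guardrail, so obvious cases never incur a model call.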
system_prompt = "You are a helpful assistant."
bad_request = "I want to talk about horses"
good_request = "What are the best breeds of dog for people that like cats?"
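import asyncio


async def get_chat_response(user_request):
    print("Getting LLM response")
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]
    # Note: this synchronous client call blocks the event loop while it runs.
    # Awaiting the same call on an openai.AsyncOpenAI client would let the
    # guardrail and chat tasks actually overlap.
    response = openai.chat.completions.create(
        model=GPT_MODEL, messages=messages, temperature=0.5
    )
    print("Got LLM response")
    return response.choices[0].message.content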
async def topical_guardrail(user_request):
    print("Checking topical guardrail")
    messages = [
        {
            "role": "system",
            "content": "Your role is to assess whether the user question is allowed or not. The allowed topics are cats and dogs. If the topic is allowed, say 'allowed' otherwise say 'not_allowed'",
        },
        {"role": "user", "content": user_request},
    ]
    response = openai.chat.completions.create(
        model=GPT_MODEL, messages=messages, temperature=0
    )
    print("Got guardrail response")
    return response.choices[0].message.content
async def execute_chat_with_guardrail(user_request):
    topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))
    chat_task = asyncio.create_task(get_chat_response(user_request))

    while True:
        done, _ = await asyncio.wait(
            [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED
        )
        if topical_guardrail_task in done:
            guardrail_response = topical_guardrail_task.result()
            if guardrail_response == "not_allowed":
                # Guardrail tripped: drop the chat call and return the canned message.
                chat_task.cancel()
                print("Topical guardrail triggered")
                return "I can only talk about cats and dogs, the best animals that ever lived."
            elif chat_task in done:
                # Guardrail passed and the chat response is ready.
                chat_response = chat_task.result()
                return chat_response
        else:
            await asyncio.sleep(0.1)  # sleep for a bit before checking the tasks again
# Call the main function with the good request - this should go through
response = await execute_chat_with_guardrail(good_request)
print(response)
Checking topical guardrail
Got guardrail response
Getting LLM response
Got LLM response
If you're a cat lover considering getting a dog, it's important to choose a breed that typically has a more cat-like temperament. Here are some dog breeds that are known to be more cat-friendly:

1. Basenji: Known as the "barkless dog," Basenjis are independent, clean, and have a cat-like grooming habit.
2. Shiba Inu: Shiba Inus are often described as having a cat-like personality. They are independent, clean, and tend to be reserved with strangers.
3. Greyhound: Greyhounds are quiet, low-energy dogs that enjoy lounging around, much like cats. They are also known for their gentle and calm nature.
4. Bichon Frise: Bichon Frises are small, friendly dogs that are often compared to cats due to their playful and curious nature. They are also hypoallergenic, making them a good choice for those with allergies.
5. Cavalier King Charles Spaniel: These dogs are affectionate, gentle, and adaptable, making them a good match for cat lovers. They are known for their desire to be close to their owners and their calm demeanor.

Remember, individual dogs can have different personalities, so it's important to spend time with the specific dog you're considering to see if their temperament aligns with your preferences.
# Call the main function with the bad request - this should get blocked
response = await execute_chat_with_guardrail(bad_request)
print(response)
Checking topical guardrail
Got guardrail response
Getting LLM response
Got LLM response
Topical guardrail triggered
I can only talk about cats and dogs, the best animals that ever lived.
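It looks like our guardrail worked: the first question was allowed through, but the second was blocked for being off-topic. Now we'll extend this concept to also moderate the response we get from the LLM.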
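Output guardrails govern what the LLM comes back with. These can take many forms, with some of the most common being:
One such check applies to function calling: when a function_call is returned, it confirms that arguments contains the expected schema.
Here, we implement a moderation guardrail that uses a version of the G-Eval evaluation method to score the presence of unwanted content in the LLM's response. This method is demonstrated in more detail in one of our other notebooks.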
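A schema check on function-call arguments might look like the sketch below. It assumes the arguments arrive as a JSON string and that a flat mapping of argument names to Python types is enough to describe the schema; EXPECTED_ARGUMENTS and validate_function_arguments are hypothetical names introduced only for this example.

```python
import json

# Hypothetical schema: required argument names mapped to their expected Python types.
EXPECTED_ARGUMENTS = {"breed": str, "max_results": int}


def validate_function_arguments(raw_arguments, expected=EXPECTED_ARGUMENTS):
    """Return (is_valid, parsed_arguments_or_error_message) for a JSON string."""
    try:
        parsed = json.loads(raw_arguments)
    except json.JSONDecodeError as error:
        return False, f"invalid JSON: {error.msg}"
    if not isinstance(parsed, dict):
        return False, "arguments must be a JSON object"
    for name, expected_type in expected.items():
        if name not in parsed:
            return False, f"missing argument: {name}"
        if not isinstance(parsed[name], expected_type):
            return False, f"wrong type for argument: {name}"
    return True, parsed
```

When validation fails, the application can retry the LLM call or fail gracefully rather than pass malformed arguments downstream.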
To implement this, we'll build an extensible framework for moderating content that takes in a domain and applies criteria to a piece of content using a set of steps:
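Our output guardrail will assess the LLM's response and block anything scoring 3 or higher. Setting this threshold is a common area for optimization. We recommend building an evaluation set and grading the results with a confusion matrix to set the right tolerance for your guardrail. The trade-off here is generally:
For example, for jailbreaking you may want a very low threshold, because the risk to your business is very high if your LLM is hijacked and used to produce dangerous content that ends up on social media. For our use case, however, we're willing to accept a few false negatives, as the worst that could happen is that someone ends up with a Bichon Frise who might have been better suited to a Labrador, which, though sad, will probably not cause lasting damage to our business (we hope).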
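One way to explore that threshold is to tabulate confusion-matrix counts at each candidate value over a labelled evaluation set. The sketch below assumes moderation scores from 1 to 5 and a human label saying whether each item should have been blocked; the data in example_scores and example_labels is invented purely for illustration, and confusion_counts is a hypothetical helper.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP/FP/TN/FN when content scoring >= threshold is blocked.

    labels[i] is True when item i should have been blocked.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, should_block in zip(scores, labels):
        blocked = score >= threshold
        if blocked and should_block:
            counts["tp"] += 1
        elif blocked and not should_block:
            counts["fp"] += 1
        elif not blocked and should_block:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical evaluation set: moderation scores (1-5) and human labels.
example_scores = [1, 2, 3, 4, 5, 2, 4, 1]
example_labels = [False, False, True, True, True, True, False, False]

if __name__ == "__main__":
    for threshold in range(1, 6):
        print(threshold, confusion_counts(example_scores, example_labels, threshold))
```

Lowering the threshold trades false negatives for false positives, so the counts at each value show directly what a stricter or looser guardrail would cost on your own data.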
domain = "animal breed recommendation"
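animal_advice_criteria = """
Assess the presence of explicit recommendation of cat or dog breeds in the content.
The content should contain only general advice about cats and dogs, not specific breeds to purchase."""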
animal_advice_steps = """
1. Read the content and the criteria carefully.
2. Assess how much explicit recommendation of cat or dog breeds is contained in the content.
3. Assign an animal advice score from 1 to 5, with 1 being no explicit cat or dog breed advice, and 5 being multiple named cat or dog breeds.
"""
moderation_system_prompt = """
You are a moderation assistant. Your role is to detect content about {domain} in the text provided, and mark the severity of that content.
## {domain}
### Criteria
{scoring_criteria}
### Instructions
{scoring_steps}
### Content
{content}
### Evaluation (score only!)
"""
async def moderation_guardrail(chat_response):
    print("Checking moderation guardrail")
    mod_messages = [
        {"role": "user", "content": moderation_system_prompt.format(
            domain=domain,
            scoring_criteria=animal_advice_criteria,
            scoring_steps=animal_advice_steps,
            content=chat_response
        )},
    ]
    response = openai.chat.completions.create(
        model=GPT_MODEL, messages=mod_messages, temperature=0
    )
    print("Got moderation response")
    return response.choices[0].message.content
async def execute_all_guardrails(user_request):
    topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))
    chat_task = asyncio.create_task(get_chat_response(user_request))

    while True:
        done, _ = await asyncio.wait(
            [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED
        )
        if topical_guardrail_task in done:
            guardrail_response = topical_guardrail_task.result()
            if guardrail_response == "not_allowed":
                # Input guardrail tripped: drop the chat call and return the canned message.
                chat_task.cancel()
                print("Topical guardrail triggered")
                return "I can only talk about cats and dogs, the best animals that ever lived."
            elif chat_task in done:
                # Input guardrail passed: run the output (moderation) guardrail on the response.
                chat_response = chat_task.result()
                moderation_response = await moderation_guardrail(chat_response)

                if int(moderation_response) >= 3:
                    print(f"Moderation guardrail flagged with a score of {int(moderation_response)}")
                    return "Sorry, we're not permitted to give animal breed advice. I can help you with any general queries you might have."
                else:
                    print('Passed moderation')
                    return chat_response
        else:
            await asyncio.sleep(0.1)  # sleep for a bit before checking the tasks again
# Add a request that should pass both our topical guardrail and our moderation guardrail
great_request = 'What is some advice you can give to a new dog owner?'
tests = [good_request, bad_request, great_request]

for test in tests:
    result = await execute_all_guardrails(test)
    print(result)
    print('\n\n')
Checking topical guardrail
Got guardrail response
Getting LLM response
Got LLM response
Checking moderation guardrail
Got moderation response
Moderation guardrail flagged with a score of 5
Sorry, we're not permitted to give animal breed advice. I can help you with any general queries you might have.

Checking topical guardrail
Got guardrail response
Getting LLM response
Got LLM response
Topical guardrail triggered
I can only talk about cats and dogs, the best animals that ever lived.

Checking topical guardrail
Got guardrail response
Getting LLM response
Got LLM response
Checking moderation guardrail
Got moderation response
Passed moderation
As a new dog owner, here are some helpful tips:

1. Choose the right breed: Research different dog breeds to find one that suits your lifestyle, activity level, and living situation. Some breeds require more exercise and attention than others.
2. Puppy-proof your home: Make sure your home is safe for your new furry friend. Remove any toxic plants, secure loose wires, and store household chemicals out of reach.
3. Establish a routine: Dogs thrive on routine, so establish a consistent schedule for feeding, exercise, and bathroom breaks. This will help your dog feel secure and reduce any anxiety.
4. Socialize your dog: Expose your dog to different people, animals, and environments from an early age. This will help them become well-adjusted and comfortable in various situations.
5. Train your dog: Basic obedience training is essential for your dog's safety and your peace of mind. Teach commands like sit, stay, and come, and use positive reinforcement techniques such as treats and praise.
6. Provide mental and physical stimulation: Dogs need both mental and physical exercise to stay happy and healthy. Engage in activities like walks, playtime, puzzle toys, and training sessions to keep your dog mentally stimulated.
7. Proper nutrition: Feed your dog a balanced and appropriate diet based on their age, size, and specific needs. Consult with a veterinarian to determine the best food options for your dog.
8. Regular veterinary care: Schedule regular check-ups with a veterinarian to ensure your dog's health and well-being. Vaccinations, parasite prevention, and dental care are important aspects of their overall care.
9. Be patient and consistent: Dogs require time, patience, and consistency to learn and adapt to their new environment. Stay positive, be patient with their training, and provide clear and consistent boundaries.
10. Show love and affection: Dogs are social animals that thrive on love and affection. Spend quality time with your dog, offer praise and cuddles, and make them feel like an important part of your family.

Remember, being a responsible dog owner involves commitment, time, and effort. With proper care and attention, you can build a strong bond with your new furry companion.
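The int(moderation_response) call in execute_all_guardrails assumes the model replies with a bare integer. If it ever replies with something like "Score: 4", that call raises a ValueError. A more defensive parse could look like the sketch below; parse_moderation_score is a hypothetical helper, and defaulting to the most severe score on an unparseable reply is an assumption made here so that such replies are blocked instead of passed.

```python
import re


def parse_moderation_score(raw_response, default=5):
    """Extract the first standalone integer from 1 to 5 in the model's reply.

    Falls back to `default` (assumed to be the most severe score, so that
    unparseable replies are blocked) when no such digit is found.
    """
    match = re.search(r"\b[1-5]\b", raw_response)
    return int(match.group()) if match else default
```

Whether to fail closed (block) or fail open (allow) on an unparseable score is a design choice that depends on how costly a false negative is for your application.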
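Guardrails are a vibrant and evolving topic in LLMs, and we hope this notebook has given you an effective introduction to the core concepts around guardrails. To recap:
We look forward to seeing how you take this forward, and how thinking on guardrails evolves as the ecosystem matures.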