This notebook demonstrates how to use OpenAI DALL-E 3 to generate images, in combination with other LLM features like text and embedding generation.
Here, we use Chat Completion to generate a random image description and DALL-E 3 to create an image from that description, showing the image inline.
Lastly, the notebook asks the user to describe the image. The embedding of the user's description is compared with the embedding of the original description using cosine similarity. For these embeddings the score typically falls between 0 and 1, where 1 means an exact match.
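To make the scoring step concrete, here is a minimal sketch of cosine similarity written by hand. The notebook itself relies on `TensorPrimitives.CosineSimilarity`; the `CosineSimilarity` helper and the three-element vectors below are illustrative assumptions, not part of the notebook, and real embeddings have far more dimensions.

```csharp
using System;

// Hypothetical helper for illustration: dot(a, b) / (|a| * |b|).
// Assumes both vectors are non-zero and have the same length.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];     // accumulate the dot product
        normA += a[i] * a[i];   // accumulate the squared length of a
        normB += b[i] * b[i];   // accumulate the squared length of b
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Toy vectors standing in for real embeddings (made-up values).
float[] original = { 1f, 2f, 3f };
float[] sameDirection = { 2f, 4f, 6f };  // original scaled by 2
float[] orthogonal = { 3f, 0f, -1f };    // dot product with original is 0

Console.WriteLine($"Same direction: {CosineSimilarity(original, sameDirection):0.00}");
Console.WriteLine($"Orthogonal:     {CosineSimilarity(original, orthogonal):0.00}");
```

Vectors pointing in the same direction score 1 regardless of their length, and orthogonal vectors score 0. In general cosine similarity can also be negative, down to -1 for vectors pointing in opposite directions.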
// Usual setup: import the Semantic Kernel SDK and SkiaSharp (SkiaSharp is used to display images inline).
#r "nuget: Microsoft.SemanticKernel, 1.23.0"
#r "nuget: System.Numerics.Tensors, 8.0.0"
#r "nuget: SkiaSharp, 2.88.3"
#!import config/Settings.cs
#!import config/Utils.cs
#!import config/SkiaUtils.cs
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextToImage;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Numerics.Tensors;
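The notebook uses three AI services: chat completion to generate a random image description, DALL-E 3 to turn that description into an image, and text-embedding-ada-002 to compare your guess with the original description.
Note: For Azure OpenAI, your endpoint must have the DALL-E API enabled.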
using Kernel = Microsoft.SemanticKernel.Kernel;
#pragma warning disable SKEXP0001, SKEXP0010
// Load OpenAI credentials from config/settings.json
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();
// Configure the three AI features: text embedding (using Ada), chat completion, image generation (DALL-E 3)
var builder = Kernel.CreateBuilder();
if(useAzureOpenAI)
{
builder.AddAzureOpenAITextEmbeddingGeneration("text-embedding-ada-002", azureEndpoint, apiKey);
builder.AddAzureOpenAIChatCompletion(model, azureEndpoint, apiKey);
builder.AddAzureOpenAITextToImage("dall-e-3", azureEndpoint, apiKey);
}
else
{
builder.AddOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey, orgId);
builder.AddOpenAIChatCompletion(model, apiKey, orgId);
builder.AddOpenAITextToImage(apiKey, orgId);
}
var kernel = builder.Build();
// Get AI service instance used to generate images
var dallE = kernel.GetRequiredService<ITextToImageService>();
// Get AI service instance used to extract embedding from a text
var textEmbedding = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
genImgDescription is a semantic function that generates a random image description. It takes a random number as input to increase the diversity of its output.
The random image description is then passed to DALL-E 3, which creates an image from it.
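As a rough illustration of what the model receives, the sketch below fills the `{{$input}}` placeholder with plain string replacement. Semantic Kernel renders the template itself when the function is invoked; the variable names `template`, `number`, and `rendered` are assumptions made for this example only.

```csharp
using System;

// The prompt template used in this notebook; {{$input}} is the placeholder.
var template = @"
Think about an artificial object correlated to number {{$input}}.
Describe the image with one detailed sentence. The description cannot contain numbers.";

// Same range as the notebook: an integer from 0 to 199.
var number = new Random().Next(0, 200);

// Plain string replacement, used here only to show the rendered text.
// This is not how Semantic Kernel implements template rendering.
var rendered = template.Replace("{{$input}}", number.ToString());

Console.WriteLine(rendered.Trim());
```

Because the number changes on every run, the rendered prompt changes too, which nudges the model toward a different object each time.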
#pragma warning disable SKEXP0001
var prompt = @"
Think about an artificial object correlated to number {{$input}}.
Describe the image with one detailed sentence. The description cannot contain numbers.";
var executionSettings = new OpenAIPromptExecutionSettings
{
MaxTokens = 256,
Temperature = 1
};
// Create a semantic function that generates a random image description.
var genImgDescription = kernel.CreateFunctionFromPrompt(prompt, executionSettings);
var random = new Random().Next(0, 200);
var imageDescriptionResult = await kernel.InvokeAsync(genImgDescription, new() { ["input"] = random });
var imageDescription = imageDescriptionResult.ToString();
// Use DALL-E 3 to generate an image. In this case OpenAI returns a URL (a base64-encoded image can be requested instead).
var imageUrl = await dallE.GenerateImageAsync(imageDescription.Trim(), 1024, 1024);
await SkiaUtils.ShowImage(imageUrl, 1024, 1024);
Try to guess what the image shows by describing its content.
You'll get a score at the end 😉
// Prompt the user to guess what the image is
var guess = await InteractiveKernel.GetInputAsync("Describe the image in your words");
// Compare user guess with real description and calculate score
var origEmbedding = await textEmbedding.GenerateEmbeddingsAsync(new List<string> { imageDescription } );
var guessEmbedding = await textEmbedding.GenerateEmbeddingsAsync(new List<string> { guess } );
var similarity = TensorPrimitives.CosineSimilarity(origEmbedding.First().Span, guessEmbedding.First().Span);
Console.WriteLine($"Your description:\n{Utils.WordWrap(guess, 90)}\n");
Console.WriteLine($"Real description:\n{Utils.WordWrap(imageDescription.Trim(), 90)}\n");
Console.WriteLine($"Score: {similarity:0.00}\n\n");
// Uncomment the next line to see the URL provided by OpenAI
//Console.WriteLine(imageUrl);