There's more to generative AI than text generation: it's also possible to generate images from text descriptions. Images as a modality can be highly useful in a range of areas, from MedTech and architecture to tourism, game development, and more. In this chapter, we will look into the two most popular image generation models, DALL-E and Midjourney.
In this lesson, we will cover:

- What image generation is and why it's useful.
- What DALL-E and Midjourney are and how they work.
- How to build an image generation application.

After completing this lesson, you will be able to:

- Build an image generation application.
- Work with DALL-E and Midjourney using prompts.
Image generation applications are a great way to explore the capabilities of Generative AI. For example, they can be used for:

- **Image editing and synthesis**. You can generate images for a variety of use cases, such as editing an existing image or synthesizing an entirely new one.
- **A variety of industries**. They can also be used to generate images for industries like MedTech, tourism, game development, and more.
As part of this lesson, we will continue to work with our startup, Edu4All. The students will create images for their assessments; exactly what images is up to the students: they could be illustrations for their own fairy tale, a new character for their story, or visualizations of their ideas and concepts.
Here's what Edu4All's students could generate, for example, if they're working in class on monuments, using a prompt like:

> "Dog next to Eiffel Tower in early morning sunlight"
DALL-E and Midjourney are two of the most popular image generation models; they allow you to use prompts to generate images.
Let's start with DALL-E, which is a Generative AI model that generates images from text descriptions.
DALL-E is a combination of two models: CLIP and a diffusion model.

- **CLIP** is a model that generates embeddings, which are numerical representations of data, from images and text (illustrated in the sketch below).
- **The diffusion model** generates images from those embeddings.

DALL-E is trained on a dataset of images and text and can be used to generate images from text descriptions. For example, it can generate images of a cat in a hat or a dog with a mohawk.
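To make embeddings concrete, here's a minimal sketch that scores how well two captions match an image, using the open-source CLIP weights via the Hugging Face `transformers` library (not one of this lesson's dependencies):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the open-source CLIP model and its preprocessor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated-image.png")  # any local image

# Embed both the captions and the image in the same space
inputs = processor(
    text=["a cat in a hat", "a dog with a mohawk"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores:
# the higher the score, the better the caption matches the image
print(outputs.logits_per_image.softmax(dim=1))
```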
Midjourney works in a similar way to DALL-E: it generates images from text prompts. Midjourney can also be used to generate images using prompts like "a cat in a hat" or "a dog with a mohawk".
*Image credit: Wikipedia; image generated by Midjourney.*
First, DALL-E. DALL-E is a Generative AI model built on the transformer architecture, specifically an autoregressive transformer.
An autoregressive transformer defines how a model generates images from text descriptions: it generates one pixel at a time, then uses the pixels generated so far to produce the next one, passing through multiple layers of a neural network until the image is complete.
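Conceptually, the autoregressive loop looks something like this toy sketch; the `model` object and its `predict_next` method are hypothetical stand-ins, not DALL-E's real code:

```python
# Toy sketch of autoregressive generation: each new element of the
# image is predicted from the text plus everything generated so far.
def generate_image_tokens(model, text_tokens, image_length):
    image_tokens = []
    for _ in range(image_length):
        # Hypothetical call: pick the next token given the text and
        # the image tokens produced so far
        next_token = model.predict_next(text_tokens, image_tokens)
        image_tokens.append(next_token)
    return image_tokens  # a separate decoder turns these into pixels
```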
Through this process, DALL-E controls attributes, objects, characteristics, and more in the images it generates. DALL-E 2 and 3 give you even more control over the generated image.
So what does it take to build an image generation application? You need the following libraries:

- **python-dotenv**, to keep your secrets in a `.env` file, out of your code.
- **openai**, to interact with the OpenAI API.
- **pillow**, to work with images in Python.
- **requests**, to make HTTP requests.
Create a file called `.env` with the following content:

```text
OPENAI_API_KEY='<add your OpenAI key here>'
```
Collect the above libraries in a file called `requirements.txt` like so:

```text
python-dotenv
openai
pillow
requests
```
Next, create a virtual environment and install the libraries:

```bash
# create virtual env
python3 -m venv venv
# activate environment
source venv/bin/activate
# install libraries (or run `pip install -r requirements.txt` if you created that file)
pip install python-dotenv openai pillow requests
```
> [!NOTE]
> For Windows, use the following commands to create and activate your virtual environment:
>
> ```bash
> python3 -m venv venv
> venv\Scripts\activate.bat
> ```
Add the following code to a file called *app.py*:
```python
import os

import dotenv
import openai
import requests
from openai import OpenAI
from PIL import Image

# Load the environment variables from the .env file
dotenv.load_dotenv()

# Create the OpenAI client; it reads OPENAI_API_KEY from the environment
client = OpenAI()

try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        model="dall-e-3",
        prompt="Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils",  # Enter your prompt text here
        size="1024x1024",
        n=1,
    )

    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, "images")

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, "generated-image.png")

    # Retrieve the generated image
    print(generation_response)
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image

    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions raised by the OpenAI library
except openai.OpenAIError as err:
    print(err)
```
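Save the file, then run the app from the same directory as your `.env` file, with the virtual environment activated:

```bash
python app.py
```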
Let's explain this code:
First, we import the libraries we need, including the OpenAI library, the dotenv library, the requests library, and the Pillow library.
```python
import os

import dotenv
import openai
import requests
from openai import OpenAI
from PIL import Image
```
After that, we create the client object, which picks up the API key from your `.env` file via the `OPENAI_API_KEY` environment variable:

```python
# Load the environment variables from the .env file
dotenv.load_dotenv()

# Create the OpenAI client; it reads OPENAI_API_KEY from the environment
client = OpenAI()
```
Next, we generate the image:

```python
# Create an image by using the image generation API
generation_response = client.images.generate(
    model="dall-e-3",
    prompt="Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils",  # Enter your prompt text here
    size="1024x1024",
    n=1,
)
```
This call returns a response object that contains the URL of the generated image. We can use the URL to download the image and save it to a file.
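These are the corresponding lines in *app.py*:

```python
image_url = generation_response.data[0].url  # extract image URL from response
generated_image = requests.get(image_url).content  # download the image

with open(image_path, "wb") as image_file:
    image_file.write(generated_image)
```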
Lastly, we open the image and use the standard image viewer to display it:

```python
image = Image.open(image_path)
image.show()
```
Let's look at the code that generates the image in more detail:

```python
generation_response = client.images.generate(
    model="dall-e-3",
    prompt="Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils",  # Enter your prompt text here
    size="1024x1024",
    n=1,
)
```
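A few notes on the parameters, based on the OpenAI API documentation at the time of writing:

- `model` selects the model that generates the image, here `dall-e-3`.
- `prompt` is the text description of the image you want.
- `size` sets the dimensions of the generated image; `dall-e-3` supports `1024x1024`, `1792x1024`, and `1024x1792`.
- `n` is the number of images to generate; `dall-e-3` only supports `n=1`.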
There are more things you can do with images that we will cover in the next section.
You've seen so far how we can generate an image with just a few lines of Python. However, there's more you can do with images.
You can also do the following:
**Perform edits.** By providing an existing image, a mask, and a prompt, you can alter an image. For example, you can add something to a portion of an image. Imagine our bunny image: you can add a hat to the bunny. You do that by providing the image, a mask (identifying the part of the area for the change), and a text prompt saying what should be done.
```python
response = client.images.edit(
    image=open("base_image.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="An image of a rabbit with a hat on its head.",
    n=1,
    size="1024x1024",
)
image_url = response.data[0].url
```
The base image would only contain the rabbit, but the final image will have the hat on the rabbit.
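The edits endpoint repaints the areas where the mask is fully transparent. Here's a minimal sketch of how you could create `mask.png` with Pillow; the choice of the top third as the transparent region is just an assumption for illustration:

```python
from PIL import Image

# Start from the base image so the mask has the same dimensions
mask = Image.open("base_image.png").convert("RGBA")
width, height = mask.size

# Make the top third fully transparent: these are the pixels the
# edits endpoint is allowed to repaint (e.g. to draw the hat)
for x in range(width):
    for y in range(height // 3):
        mask.putpixel((x, y), (0, 0, 0, 0))

mask.save("mask.png")
```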
**Create variations.** The idea is that you take an existing image and ask for variations of it to be created. To create a variation, you provide an image and code like so (note that the variations endpoint takes no text prompt):
```python
response = client.images.create_variation(
    image=open("bunny-lollipop.png", "rb"),
    n=1,
    size="1024x1024",
)
image_url = response.data[0].url
```
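Note that the edits and variations endpoints don't support `dall-e-3`; at the time of writing, they default to the `dall-e-2` model.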