
Updated Document Layout Analysis¶

The previous results are reasonable, but training for more epochs and using YOLOv8s may improve accuracy.

Package installation¶

In [ ]:

%pip install ultralytics

Collecting ultralytics
  Downloading ultralytics-8.0.133-py3-none-any.whl (627 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 628.0/628.0 kB 12.2 MB/s eta 0:00:00
Requirement already satisfied: matplotlib>=3.2.2 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (3.7.1)
Requirement already satisfied: opencv-python>=4.6.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (4.7.0.72)
Requirement already satisfied: Pillow>=7.1.2 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (8.4.0)
Requirement already satisfied: PyYAML>=5.3.1 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (6.0)
Requirement already satisfied: requests>=2.23.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (2.27.1)
Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (1.10.1)
Requirement already satisfied: torch>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (2.0.1+cu118)
Requirement already satisfied: torchvision>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (0.15.2+cu118)
Requirement already satisfied: tqdm>=4.64.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (4.65.0)
Requirement already satisfied: pandas>=1.1.4 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (1.5.3)
Requirement already satisfied: seaborn>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics) (0.12.2)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from ultralytics) (5.9.5)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (4.40.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (1.4.4)
Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (23.1)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (3.1.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.2.2->ultralytics) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.1.4->ultralytics) (2022.7.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.23.0->ultralytics) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.23.0->ultralytics) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests>=2.23.0->ultralytics) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.23.0->ultralytics) (3.4)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (4.7.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.7.0->ultralytics) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.7.0->ultralytics) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.7.0->ultralytics) (16.0.6)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib>=3.2.2->ultralytics) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.7.0->ultralytics) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.7.0->ultralytics) (1.3.0)
Installing collected packages: ultralytics
Successfully installed ultralytics-8.0.133

In [ ]:

import os # for handling the directory
from google.colab import drive # to access the drive

Mount to Drive folder

In [ ]:

drive.mount('/content/drive')
# Pointing the directory to the shared project folder
os.chdir('/content/drive/MyDrive/DLA_project/')
cwd = os.getcwd() # cwd = current working directory

Mounted at /content/drive

Download pretrained YOLOv8s model and save to drive folder

In [ ]:

!wget -P /content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt

--2023-07-06 02:40:14--  https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/521807533/404b29b7-e374-406c-ab38-7d0796e5b627?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230706%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230706T024014Z&X-Amz-Expires=300&X-Amz-Signature=8fdd6a2595800291eb39283608399a50bbe1e115660e2fdcea5003bd6dc87c99&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=521807533&response-content-disposition=attachment%3B%20filename%3Dyolov8s.pt&response-content-type=application%2Foctet-stream [following]
--2023-07-06 02:40:14--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/521807533/404b29b7-e374-406c-ab38-7d0796e5b627?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230706%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230706T024014Z&X-Amz-Expires=300&X-Amz-Signature=8fdd6a2595800291eb39283608399a50bbe1e115660e2fdcea5003bd6dc87c99&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=521807533&response-content-disposition=attachment%3B%20filename%3Dyolov8s.pt&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22573363 (22M) [application/octet-stream]
Saving to: ‘/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt’

yolov8s.pt          100%[===================>]  21.53M  43.9MB/s    in 0.5s    

2023-07-06 02:40:14 (43.9 MB/s) - ‘/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt’ saved [22573363/22573363]

Train custom dataset¶

Note: Make sure to have the "DLA project" folder saved as a shortcut under "MyDrive".

Get path to labels and images folder

In [ ]:

dataset_folder = os.path.join(cwd, 'datasets/doclaynet_base') # Base dataset: 6910 train, 648 val, 499 test
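As a quick sanity check on the split sizes quoted in the comment above, the sketch below counts the image files per split. It assumes each split lives in `<dataset_folder>/<split>/images/`, which matches the `val/images` path in the validation log; the `train` and `test` folder names and the helper name `count_split_images` are assumptions made for illustration.

```python
import os

def count_split_images(dataset_folder, splits=("train", "val", "test"),
                       extensions=(".png", ".jpg", ".jpeg")):
    """Return {split: number of image files} for a YOLO-style dataset folder."""
    counts = {}
    for split in splits:
        images_dir = os.path.join(dataset_folder, split, "images")
        if not os.path.isdir(images_dir):
            counts[split] = 0  # missing split folder: report zero rather than fail
            continue
        counts[split] = sum(
            1 for name in os.listdir(images_dir)
            if name.lower().endswith(tuple(extensions))
        )
    return counts
```

If the folder matches the comment, `count_split_images(dataset_folder)` should report 6910, 648 and 499 images for train, val and test respectively.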

Out[ ]:

"\n@article{doclaynet2022,\n  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},\n  doi = {10.1145/3534678.353904},\n  url = {https://doi.org/10.1145/3534678.3539043},\n  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},\n  year = {2022},\n  isbn = {9781450393850},\n  publisher = {Association for Computing Machinery},\n  address = {New York, NY, USA},\n  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},\n  pages = {3743–3751},\n  numpages = {9},\n  location = {Washington DC, USA},\n  series = {KDD '22}\n}\n"

Training¶

Check available GPUs

In [ ]:

!nvidia-smi

Mon Jul 10 08:30:57 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

In [ ]:

import torch
torch.cuda.is_available()

Out[ ]:

True

Start training

In [ ]:

!yolo task=detect mode=train model=/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt data=/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/datasets/doclaynet.yaml epochs=100 imgsz=640 workers=8 batch=32 device=0
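The same run can also be launched from Python instead of the CLI. The following is a minimal sketch, assuming the `ultralytics` package installed above and its `YOLO` class; the names `TRAIN_ARGS`, `WEIGHTS` and `train_yolov8` are introduced here for illustration only, and the values are copied from the command above.

```python
# Hyperparameters copied from the CLI call above.
TRAIN_ARGS = {
    "data": "/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/datasets/doclaynet.yaml",
    "epochs": 100,
    "imgsz": 640,
    "workers": 8,
    "batch": 32,
    "device": 0,
}
WEIGHTS = "/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt"

def train_yolov8(weights=WEIGHTS, **overrides):
    """Fine-tune the pretrained checkpoint; keyword overrides replace TRAIN_ARGS entries."""
    # Imported lazily so the configuration can be inspected without the package.
    from ultralytics import YOLO
    args = {**TRAIN_ARGS, **overrides}
    model = YOLO(weights)  # the detection task is inferred from the checkpoint
    return model.train(**args)
```

Calling `train_yolov8()` should reproduce the CLI run, and a single setting can be changed per call, for example `train_yolov8(epochs=300)`.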

Inference¶

In [ ]:

!yolo task=detect mode=predict model=runs/detect/train2/weights/best.pt conf=0.5 source='test_images(2)/*.png'

Ultralytics YOLOv8.0.132 🚀 Python-3.10.12 torch-2.0.1+cu118 CPU
Model summary (fused): 168 layers, 11129841 parameters, 0 gradients

image 1/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.40.58 AM.png: 640x480 6 List-items, 2 Section-headers, 1 Table, 4 Texts, 456.7ms
image 2/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.08 AM.png: 640x480 38 List-items, 3 Texts, 423.2ms
image 3/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.28 AM.png: 640x480 1 Section-header, 4 Texts, 13 Titles, 667.6ms
image 4/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.39 AM.png: 640x512 1 Table, 5 Texts, 3 Titles, 740.5ms
image 5/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.52 AM.png: 640x512 1 Section-header, 6 Texts, 3 Titles, 439.8ms
image 6/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.07 AM.png: 640x512 6 Section-headers, 10 Texts, 411.6ms
image 7/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.19 AM.png: 640x544 7 Section-headers, 7 Texts, 438.5ms
image 8/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.40 AM.png: 640x544 6 List-items, 4 Section-headers, 4 Texts, 2 Titles, 432.9ms
image 9/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.50 AM.png: 640x512 2 List-items, 5 Section-headers, 8 Texts, 418.6ms
image 10/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.43.03 AM.png: 640x512 1 List-item, 3 Section-headers, 5 Texts, 2 Titles, 414.8ms
Speed: 3.8ms preprocess, 484.4ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 512)
Results saved to runs/detect/predict3

Show an example:

In [ ]:

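from PIL import Image
# Build the path with os.path.join instead of string concatenation
example = Image.open(os.path.join(cwd, 'runs/detect/predict3/Screenshot 2023-07-11 at 8.40.58 AM.png'))
display(example)  # render inline; Image.show() tries to open an external viewer, which usually fails in Colab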

Evaluate¶

In [ ]:

!yolo task=detect mode=val model=runs/detect/train2/weights/best.pt name=yolov8_updated_eval data=ultralytics/ultralytics/datasets/doclaynet.yaml imgsz=640

Ultralytics YOLOv8.0.133 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
Model summary (fused): 168 layers, 11129841 parameters, 0 gradients
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 17.2MB/s]
val: Scanning /content/drive/MyDrive/DLA_project/datasets/doclaynet_base/val/labels.cache... 648 images, 4 backgrounds, 1 corrupt: 100% 648/648 [00:00<?, ?it/s]
val: WARNING ⚠️ /content/drive/MyDrive/DLA_project/datasets/doclaynet_base/val/images/3507c388db887b90376c6325ed6221dc522720c3095d70842b33283b9003b9f3.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [     1.0014]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 41/41 [00:38<00:00,  1.06it/s]
                   all        647       9796      0.835      0.763      0.816      0.637
               Caption        647        214      0.897      0.815      0.875      0.761
              Footnote        647         48      0.745      0.365      0.392      0.279
               Formula        647        178       0.87       0.68      0.755      0.566
             List-item        647       1171       0.78      0.884      0.868      0.746
           Page-footer        647        541      0.923      0.893      0.937      0.539
           Page-header        647        635      0.941      0.699      0.905      0.607
               Picture        647        160      0.739      0.781      0.775      0.719
        Section-header        647       1707      0.869      0.831        0.9      0.587
                 Table        647        236      0.789      0.775      0.829      0.747
                  Text        647       4864      0.894      0.905      0.941      0.807
                 Title        647         42      0.737      0.762      0.806      0.649
Speed: 0.8ms preprocess, 9.8ms inference, 0.0ms loss, 2.6ms postprocess per image
Results saved to runs/detect/yolov8_updated_eval
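The warning in the log above reports one validation label with a coordinate of 1.0014, so that image was skipped (647 of 648 images were evaluated). A small scan like the one below can locate such rows before training or validation. It is a sketch that assumes YOLO-format label files (`class x_center y_center width height`, all normalized to [0, 1]) stored as `.txt` files in a `labels` folder next to `images`; `find_out_of_bounds_labels` is a hypothetical helper, not part of Ultralytics.

```python
import os

def find_out_of_bounds_labels(labels_dir, tol=0.0):
    """Return [(file name, line number, values)] for rows with a coordinate outside [0, 1]."""
    problems = []
    for name in sorted(os.listdir(labels_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(labels_dir, name)) as f:
            for line_no, line in enumerate(f, start=1):
                parts = line.split()
                if not parts:
                    continue  # skip blank lines
                coords = [float(v) for v in parts[1:]]  # parts[0] is the class id
                if any(c < -tol or c > 1 + tol for c in coords):
                    problems.append((name, line_no, coords))
    return problems
```

Running it on `os.path.join(dataset_folder, 'val/labels')` should list the offending file, whose values could then be clipped to 1.0 or the annotation corrected.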

Evaluation Metrics Breakdown¶

All metrics

In [ ]:

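from PIL import Image
# Build the path with os.path.join instead of string concatenation
results = Image.open(os.path.join(cwd, 'runs/detect/train2/results.png'))
display(results)  # render inline; Image.show() tries to open an external viewer, which usually fails in Colab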

For result evolution over 100 epochs, see this link

Box loss: measures how closely the predicted bounding boxes match the ground-truth boxes.

Class loss: measures how accurately the model assigns the correct class to each predicted box.

The graph and the logged results show that the validation box, class and DFL (distribution focal) losses all remain high, which indicates room for improvement in this model.

We could therefore train on a more robust dataset and for more epochs (300 is a reasonable starting point), so that the train and val losses converge toward similar values over time.
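To put a number on the train/val gap rather than reading it off the plot, the per-epoch log can be inspected directly. The sketch below assumes that the training run wrote a `results.csv` next to `results.png` (for example `runs/detect/train2/results.csv`) with columns named `train/box_loss`, `val/box_loss` and so on, possibly padded with spaces; `loss_gaps` is a hypothetical helper.

```python
import csv

def loss_gaps(results_csv_path):
    """Return {loss name: val loss - train loss} for the last epoch in results.csv."""
    with open(results_csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("results.csv contains no epochs")
    # Header names and values may be padded with spaces, so strip both.
    last = {
        k.strip(): v.strip()
        for k, v in rows[-1].items()
        if k is not None and v is not None
    }
    gaps = {}
    for loss in ("box_loss", "cls_loss", "dfl_loss"):
        gaps[loss] = float(last[f"val/{loss}"]) - float(last[f"train/{loss}"])
    return gaps
```

A gap that stays large and positive after longer training would suggest overfitting rather than under-training, in which case more data is likely to help more than more epochs.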

Considering the precision-recall curve

In [ ]:

precision_recall_curve = Image.open(os.path.join(cwd, 'runs/detect/yolov8_updated_eval/PR_curve.png'))
display(precision_recall_curve)  # render inline in the notebook

Considering mAP metrics:

The model is better at detecting Page-footer, Page-header, Table, Text and Caption.
Footnote needs improvement (possibly a data-scale issue: the validation split contains only 48 Footnote instances, so increasing the training set size could help).
However, for the general task at hand, we might prioritize recall over precision.

In that case, we might improve the model by increasing the image size used for training (with longer processing time as a trade-off).
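One way to make "recall over precision" measurable is the F-beta score, which weights recall more heavily when beta > 1. This metric is not computed anywhere in the notebook; it is a standard formula added here as an illustration, and the precision/recall pairs are copied from the validation table above.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# (precision, recall) pairs copied from the validation table above.
val_pr = {
    "all": (0.835, 0.763),
    "Footnote": (0.745, 0.365),
    "Page-header": (0.941, 0.699),
    "Text": (0.894, 0.905),
}

for name, (p, r) in val_pr.items():
    print(f"{name:12s} F1={f_beta(p, r, beta=1):.3f}  F2={f_beta(p, r, beta=2):.3f}")
```

Classes whose F2 falls well below their F1 (Footnote and Page-header here) are the ones where low recall is the main weakness.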

Next step¶

Consider training on a larger dataset so that each class gets comparable exposure during training
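To see how uneven class exposure currently is, the instance counts from the validation table can be turned into shares. This is only a sketch on the 648-image validation split; the training split is assumed, not verified, to have a similar distribution.

```python
# Instance counts per class, copied from the validation table.
val_instances = {
    "Caption": 214, "Footnote": 48, "Formula": 178, "List-item": 1171,
    "Page-footer": 541, "Page-header": 635, "Picture": 160,
    "Section-header": 1707, "Table": 236, "Text": 4864, "Title": 42,
}

total = sum(val_instances.values())
shares = {name: count / total for name, count in val_instances.items()}

# List classes from rarest to most common.
for name, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{name:15s} {val_instances[name]:5d}  {share:6.1%}")
```

Text accounts for roughly half of all instances, while Title and Footnote each make up less than 1%, which is consistent with the weak Footnote results.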

"Images per class. ≥1.5k images per class"

"Instances per class. ≥10k instances (labeled objects) per class total"

Consider adding background images to dataset to reduce false positives

"Background images. Background images are images with no objects that are added to a dataset to reduce False Positives (FP). We recommend about 0-10% background images to help reduce FPs (COCO has 1000 background images for reference, 1% of the total)."

Take into account the devices this application will run on to select the most suitable pretrained model

Larger models like YOLOv5x will produce better results in nearly all cases, but have more parameters and are slower to run. For mobile applications we recommend YOLOv5s/m, for cloud or desktop applications we recommend YOLOv5l/x.