The previous results are decent, but training for more epochs with YOLOv8s may improve accuracy.
%pip install ultralytics
Collecting ultralytics
  Downloading ultralytics-8.0.133-py3-none-any.whl (627 kB)
Installing collected packages: ultralytics
Successfully installed ultralytics-8.0.133
import os # for handling the directory
from google.colab import drive # to access the drive
Mount the Drive folder
drive.mount('/content/drive')
# Pointing the directory to the shared project folder
os.chdir('/content/drive/MyDrive/DLA_project/')
cwd = os.getcwd() # cwd = current working directory
Mounted at /content/drive
Download pretrained YOLOv8s model and save to drive folder
!wget -P /content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt
Length: 22573363 (22M) [application/octet-stream]
Saving to: ‘/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt’
yolov8s.pt 100%[===================>] 21.53M 43.9MB/s in 0.5s
2023-07-06 02:40:14 (43.9 MB/s) - ‘/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt’ saved [22573363/22573363]
Note: Make sure the "DLA_project" folder is saved as a shortcut under "MyDrive".
Get path to labels and images folder
dataset_folder = os.path.join(cwd, 'datasets/doclaynet_base') # Base dataset: 6910 train, 648 val, 499 test
@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
  doi = {10.1145/3534678.3539043},
  url = {https://doi.org/10.1145/3534678.3539043},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022},
  isbn = {9781450393850},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages = {3743–3751},
  numpages = {9},
  location = {Washington DC, USA},
  series = {KDD '22}
}
Check available GPUs
!nvidia-smi
Mon Jul 10 08:30:57 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
import torch
torch.cuda.is_available()
True
Start training
!yolo task=detect mode=train model=/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/yolo/yolov8s.pt data=/content/drive/MyDrive/DLA_project/ultralytics/ultralytics/datasets/doclaynet.yaml epochs=100 imgsz=640 workers=8 batch=32 device=0
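The same training run can also be started from Python instead of the CLI. The sketch below mirrors the arguments of the command above through the `ultralytics` Python API. `train_yolov8s` is a hypothetical wrapper name, and the import is deferred so the function can be defined before the package is installed.

```python
def train_yolov8s(weights, data_yaml, epochs=100, imgsz=640, workers=8, batch=32, device=0):
    """Fine-tune a YOLOv8 checkpoint with the same settings as the CLI call."""
    # Imported lazily so this helper can be defined without ultralytics present.
    from ultralytics import YOLO

    model = YOLO(weights)  # load the pretrained checkpoint (e.g. yolov8s.pt)
    return model.train(
        data=data_yaml,   # dataset YAML (e.g. doclaynet.yaml)
        epochs=epochs,
        imgsz=imgsz,
        workers=workers,
        batch=batch,
        device=device,    # 0 selects the first GPU
    )
```

Passing the `model=` and `data=` paths from the command above as `weights` and `data_yaml` should start a comparable run, and trying a longer schedule becomes a one-argument change (`epochs=300`, for example).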
!yolo task=detect mode=predict model=runs/detect/train2/weights/best.pt conf=0.5 source='test_images(2)/*.png'
Ultralytics YOLOv8.0.132 🚀 Python-3.10.12 torch-2.0.1+cu118 CPU
Model summary (fused): 168 layers, 11129841 parameters, 0 gradients
image 1/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.40.58 AM.png: 640x480 6 List-items, 2 Section-headers, 1 Table, 4 Texts, 456.7ms
image 2/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.08 AM.png: 640x480 38 List-items, 3 Texts, 423.2ms
image 3/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.28 AM.png: 640x480 1 Section-header, 4 Texts, 13 Titles, 667.6ms
image 4/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.39 AM.png: 640x512 1 Table, 5 Texts, 3 Titles, 740.5ms
image 5/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.41.52 AM.png: 640x512 1 Section-header, 6 Texts, 3 Titles, 439.8ms
image 6/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.07 AM.png: 640x512 6 Section-headers, 10 Texts, 411.6ms
image 7/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.19 AM.png: 640x544 7 Section-headers, 7 Texts, 438.5ms
image 8/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.40 AM.png: 640x544 6 List-items, 4 Section-headers, 4 Texts, 2 Titles, 432.9ms
image 9/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.42.50 AM.png: 640x512 2 List-items, 5 Section-headers, 8 Texts, 418.6ms
image 10/10 /content/drive/MyDrive/DLA_project/test_images(2)/Screenshot 2023-07-11 at 8.43.03 AM.png: 640x512 1 List-item, 3 Section-headers, 5 Texts, 2 Titles, 414.8ms
Speed: 3.8ms preprocess, 484.4ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 512)
Results saved to runs/detect/predict3
Show an example:
from PIL import Image
example = Image.open(cwd + '/runs/detect/predict3/Screenshot 2023-07-11 at 8.40.58 AM.png')
example.show()
!yolo task=detect mode=val model=runs/detect/train2/weights/best.pt name=yolov8_updated_eval data=ultralytics/ultralytics/datasets/doclaynet.yaml imgsz=640
Ultralytics YOLOv8.0.133 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
Model summary (fused): 168 layers, 11129841 parameters, 0 gradients
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 17.2MB/s]
val: Scanning /content/drive/MyDrive/DLA_project/datasets/doclaynet_base/val/labels.cache... 648 images, 4 backgrounds, 1 corrupt: 100% 648/648 [00:00<?, ?it/s]
val: WARNING ⚠️ /content/drive/MyDrive/DLA_project/datasets/doclaynet_base/val/images/3507c388db887b90376c6325ed6221dc522720c3095d70842b33283b9003b9f3.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1.0014]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% 41/41 [00:38<00:00, 1.06it/s]
                   all        647       9796      0.835      0.763      0.816      0.637
               Caption        647        214      0.897      0.815      0.875      0.761
              Footnote        647         48      0.745      0.365      0.392      0.279
               Formula        647        178       0.87       0.68      0.755      0.566
             List-item        647       1171       0.78      0.884      0.868      0.746
           Page-footer        647        541      0.923      0.893      0.937      0.539
           Page-header        647        635      0.941      0.699      0.905      0.607
               Picture        647        160      0.739      0.781      0.775      0.719
        Section-header        647       1707      0.869      0.831        0.9      0.587
                 Table        647        236      0.789      0.775      0.829      0.747
                  Text        647       4864      0.894      0.905      0.941      0.807
                 Title        647         42      0.737      0.762      0.806      0.649
Speed: 0.8ms preprocess, 9.8ms inference, 0.0ms loss, 2.6ms postprocess per image
Results saved to runs/detect/yolov8_updated_eval
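To see which classes hold the overall score back, the per-class numbers from the validation output can be ranked directly. The sketch below copies those numbers by hand (it does not read them from the run folder) and also checks that the `all` row matches the unweighted mean over the 11 classes. The variable and function names are our own.

```python
# Per-class (precision, recall, mAP50, mAP50-95), copied from the validation output.
per_class = {
    "Caption": (0.897, 0.815, 0.875, 0.761),
    "Footnote": (0.745, 0.365, 0.392, 0.279),
    "Formula": (0.87, 0.68, 0.755, 0.566),
    "List-item": (0.78, 0.884, 0.868, 0.746),
    "Page-footer": (0.923, 0.893, 0.937, 0.539),
    "Page-header": (0.941, 0.699, 0.905, 0.607),
    "Picture": (0.739, 0.781, 0.775, 0.719),
    "Section-header": (0.869, 0.831, 0.9, 0.587),
    "Table": (0.789, 0.775, 0.829, 0.747),
    "Text": (0.894, 0.905, 0.941, 0.807),
    "Title": (0.737, 0.762, 0.806, 0.649),
}


def macro_average(metrics, index):
    """Unweighted mean of one metric column across all classes."""
    values = [row[index] for row in metrics.values()]
    return sum(values) / len(values)


mean_map50 = macro_average(per_class, 2)      # compare with 0.816 in the "all" row
mean_map50_95 = macro_average(per_class, 3)   # compare with 0.637 in the "all" row

# Classes sorted from weakest to strongest by mAP50-95.
ranked = sorted(per_class, key=lambda name: per_class[name][3])
weakest = ranked[:3]
print(weakest)
```

Footnote stands out: it has only 48 validation instances and a recall of 0.365, so it is a natural first target when collecting more data.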
All metrics
from PIL import Image
results = Image.open(cwd + '/runs/detect/train2/results.png')
results.show()
For result evolution over 100 epochs, see this link
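Box loss: measures how closely the predicted bounding boxes match the ground-truth boxes.
Class loss: measures how well the model assigns each predicted box to the correct class (or to no class, for "background").
DFL loss (distribution focal loss): measures how well the predicted distribution over each box edge's position concentrates on the ground-truth location.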
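To make "how tight" concrete, the sketch below computes intersection over union (IoU) for two boxes in `(x1, y1, x2, y2)` corner format. This is a simplified stand-in: as far as we know, the Ultralytics box loss uses an IoU variant with extra penalty terms (CIoU) rather than plain IoU, and `box_iou` is a hypothetical helper name.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1 means the prediction coincides with the ground truth, and 0 means no overlap. An IoU-based loss is lower the closer this value gets to 1.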
The results graph and the logged metrics show that the validation box, class, and DFL losses remain high, which indicates room for improvement.
We could therefore train on a more robust dataset and for more epochs (300 is a reasonable starting point), so that the training and validation losses converge over time.
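The gap between training and validation loss can be read off numerically as well as from the plot. The sketch below assumes the training run wrote a `results.csv` next to `results.png`, with columns named like `train/box_loss` and `val/box_loss` (possibly padded with spaces). Both the file location and the column names are assumptions to verify against the actual file, and `loss_gaps` is a hypothetical helper.

```python
import csv


def loss_gaps(csv_file, loss_names=("box_loss", "cls_loss", "dfl_loss")):
    """Return {loss_name: val_loss - train_loss} for the last logged epoch."""
    reader = csv.reader(csv_file)
    # Column names may be padded with spaces, so strip them.
    header = [name.strip() for name in next(reader)]
    rows = [row for row in reader if row]  # skip blank lines
    last = dict(zip(header, (value.strip() for value in rows[-1])))
    return {
        name: float(last[f"val/{name}"]) - float(last[f"train/{name}"])
        for name in loss_names
    }
```

With the paths used in this notebook, a call might look like `loss_gaps(open(cwd + '/runs/detect/train2/results.csv'))`. Gaps that shrink toward zero as epochs are added would suggest the two losses are converging.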
Considering the precision-recall curve
precision_recall_curve = Image.open(cwd + '/runs/detect/yolov8_updated_eval/PR_curve.png')
precision_recall_curve.show()
Considering mAP metrics:
In that case, we might improve the model by increasing the image size during training, although the longer processing time must be weighed as a trade-off.
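A quick way to gauge that trade-off is to compare pixel counts. The sketch below assumes, as a rough rule of thumb only, that compute and memory per image grow roughly in proportion to the number of pixels. Real timings depend on the hardware and batch size, and `relative_pixel_cost` is a hypothetical helper.

```python
def relative_pixel_cost(new_imgsz, base_imgsz=640):
    """Ratio of pixels in a square image of side new_imgsz versus base_imgsz."""
    return (new_imgsz / base_imgsz) ** 2


for size in (640, 1024, 1280):
    print(size, round(relative_pixel_cost(size), 2))
```

Under this assumption, moving from `imgsz=640` to `imgsz=1280` means about four times as many pixels per image, so the batch size may need to shrink to fit in GPU memory.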
"Images per class. ≥1.5k images per class"
"Instances per class. ≥10k instances (labeled objects) per class total"
"Background images. Background images are images with no objects that are added to a dataset to reduce False Positives (FP). We recommend about 0-10% background images to help reduce FPs (COCO has 1000 background images for reference, 1% of the total)."
Larger models like YOLOv5x will produce better results in nearly all cases, but have more parameters and are slower to run. For mobile applications we recommend YOLOv5s/m, for cloud or desktop applications we recommend YOLOv5l/x.
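The dataset guidelines quoted above (images per class, instances per class, share of background images) can be checked against our own labels. The sketch below assumes YOLO-format labels: one `.txt` file per image, one `class x y w h` line per object. It treats an empty label file as a background image, so images with no label file at all are not counted. `label_stats` is a hypothetical helper.

```python
import os
from collections import Counter


def label_stats(labels_dir):
    """Count images and instances per class in a folder of YOLO label files."""
    images_per_class = Counter()
    instances_per_class = Counter()
    total_images = 0
    background_images = 0
    for name in sorted(os.listdir(labels_dir)):
        if not name.endswith(".txt"):
            continue
        total_images += 1
        with open(os.path.join(labels_dir, name)) as f:
            # The first token on each non-empty line is the class id.
            class_ids = [int(line.split()[0]) for line in f if line.strip()]
        if not class_ids:
            background_images += 1  # empty label file: no objects in the image
            continue
        instances_per_class.update(class_ids)
        images_per_class.update(set(class_ids))  # count each image once per class
    return {
        "images_per_class": dict(images_per_class),
        "instances_per_class": dict(instances_per_class),
        "background_fraction": background_images / total_images if total_images else 0.0,
    }
```

Running it on the training labels folder would show which classes fall short of the quoted thresholds, which is likely for rare classes such as Footnote and Title.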