In [ ]:

# all_slow

In [ ]:

from fastai2.basics import *
from fastai2.callback.all import *
from fastai2.vision.all import *
from fastai2.medical.imaging import *

import pydicom

import pandas as pd

In [ ]:

from nbdev.showdoc import *

Tutorial - Binary classification of chest X-rays

In this tutorial, we build a classifier that distinguishes chest X-rays with pneumothorax from chest X-rays without pneumothorax. The image data is loaded directly from the DICOM source files, so no separate DICOM preprocessing step is needed.

Download and import of X-ray DICOM files

First, we use the untar_data function to download the siim_small folder, which contains a subset (250 DICOM files, ~30MB) of the SIIM-ACR Pneumothorax Segmentation [1] dataset. The folder is stored in your ~/.fastai/data/ directory, and once the download is complete, the variable pneumothorax_source holds the absolute path to it.

In [ ]:

pneumothorax_source = untar_data(URLs.SIIM_SMALL)

The siim_small folder contains the labels.csv file and a train folder, in which the DICOM files are sorted into the subfolders No Pneumothorax and Pneumothorax.
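To view the folder layout on your own machine, a minimal sketch with pathlib can list the folders and files down to a chosen depth. The helper tree_lines and its max_depth parameter are hypothetical and not part of fastai; with the default depth of 2, the individual .dcm files are not listed, only the folders that hold them.

```python
from pathlib import Path

def tree_lines(root, max_depth=2):
    """Return an indented listing of `root` down to `max_depth` levels (hypothetical helper)."""
    root = Path(root)
    lines = [root.name + "/"]

    def walk(folder, depth):
        # Folders first, then files, each sorted by name, so the listing is deterministic
        entries = sorted(folder.iterdir(), key=lambda p: (not p.is_dir(), p.name))
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append("    " * depth + entry.name + suffix)
            # Only descend into subfolders while we are above the depth limit
            if entry.is_dir() and depth < max_depth:
                walk(entry, depth + 1)

    walk(root, 1)
    return lines
```

Usage: `print("\n".join(tree_lines(pneumothorax_source)))`.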

Plotting the DICOM data

To analyze our dataset, we load the paths to the DICOM files with the get_dicom_files function. When calling the function, we append train/ to the pneumothorax_source path to choose the folder where the DICOM files are located. We store the path to each DICOM file in the items list.

In [ ]:

items = get_dicom_files(pneumothorax_source/"train")
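Conceptually, get_dicom_files collects the paths of all DICOM files below a folder. The idea can be sketched with pathlib alone, assuming the files end in .dcm as they do in siim_small. The name find_dicom_files is hypothetical; fastai's own function may accept further extensions and options.

```python
from pathlib import Path

def find_dicom_files(folder, extensions=(".dcm",)):
    """Recursively collect files under `folder` whose suffix is in `extensions` (hypothetical helper)."""
    folder = Path(folder)
    # rglob("*") walks all subfolders; keep regular files with a matching (case-insensitive) suffix
    return sorted(p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in extensions)
```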

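Next, we randomly split the items into a training set (trn) and a validation set (val) with RandomSplitter. RandomSplitter() creates a splitting function; calling it on items returns two lists of indices into items rather than the file paths themselves, with 20% of the items assigned to the validation set by default. These indices are not used further below, because the DataBlock performs its own random split.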

In [ ]:

trn,val = RandomSplitter()(items)
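The idea behind a random split can be sketched in plain Python: shuffle the indices and reserve a fraction of them for validation. random_split is a hypothetical helper; its 0.2 default mirrors RandomSplitter's default validation fraction, but the shuffling details of fastai's implementation may differ.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle the indices 0..n_items-1 and cut off a validation fraction (a sketch, not fastai's code)."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    # The first `cut` shuffled indices form the validation set, the rest the training set
    return idxs[cut:], idxs[:cut]
```

With the 250 files of siim_small this yields 200 training and 50 validation indices, and `[items[i] for i in val]` recovers the corresponding paths.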

To plot an X-ray, we select an entry in the items list and load the DICOM file with dcmread. We can then display it with the show method provided by fastai2.medical.imaging.

In [ ]:

patient = 3
xray_sample = dcmread(items[patient])
xray_sample.show()

Next, we load the labels for the dataset. We read the labels.csv file with pandas and display the first five entries. The file column gives the relative path to each .dcm file, and the label column indicates whether the chest X-ray shows a pneumothorax.

In [ ]:

df = pd.read_csv(pneumothorax_source/"labels.csv")
df.head()

Out[ ]:

	file	label
0	train/No Pneumothorax/000000.dcm	No Pneumothorax
1	train/Pneumothorax/000001.dcm	Pneumothorax
2	train/No Pneumothorax/000002.dcm	No Pneumothorax
3	train/Pneumothorax/000003.dcm	Pneumothorax
4	train/Pneumothorax/000004.dcm	Pneumothorax
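Because each file sits in a folder named after its label, a quick consistency check is to compare the folder name in the file column with the label column and to count the classes. The sketch below rebuilds the five rows shown above in a small DataFrame (df_head, a name chosen for illustration) so it runs on its own; with the full data, the same lines can be applied to df.

```python
import pandas as pd

# The five rows displayed above, rebuilt here so the example is self-contained
df_head = pd.DataFrame({
    "file": ["train/No Pneumothorax/000000.dcm", "train/Pneumothorax/000001.dcm",
             "train/No Pneumothorax/000002.dcm", "train/Pneumothorax/000003.dcm",
             "train/Pneumothorax/000004.dcm"],
    "label": ["No Pneumothorax", "Pneumothorax", "No Pneumothorax",
              "Pneumothorax", "Pneumothorax"],
})

# Paths have the form "train/<label>/<name>.dcm", so the label folder is component 1
folder = df_head["file"].str.split("/").str[1]
labels_match = (folder == df_head["label"]).all()

# Number of rows per class
counts = df_head["label"].value_counts()
```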

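Now, we use the DataBlock class to prepare the DICOM data for training. ImageBlock(cls=PILDicom) opens each DICOM file as an image, and CategoryBlock treats the label as a category. Since we pass the rows of df (df.values) as items, get_x builds the full path from the first column and get_y reads the label from the second column. aug_transforms(size=224) adds data augmentation to each batch and sets the output image size to 224 pixels.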

In [ ]:

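pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),  # DICOM image in, category out
                         get_x=lambda x:pneumothorax_source/f"{x[0]}",     # row[0]: relative path to the .dcm file
                         get_y=lambda x:x[1],                              # row[1]: label string
                         batch_tfms=aug_transforms(size=224))              # batch augmentation, 224-pixel output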
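Because dataloaders receives df.values, each item is one row of the DataFrame as a NumPy array: x[0] is the relative path and x[1] is the label. The plain-Python sketch below applies the same two lambdas to one such row; root is a hypothetical stand-in for pneumothorax_source.

```python
from pathlib import Path
import numpy as np

# Hypothetical stand-in for pneumothorax_source
root = Path("/data/siim_small")

# One row of df.values: [file, label]
row = np.array(["train/Pneumothorax/000001.dcm", "Pneumothorax"], dtype=object)

get_x = lambda x: root / f"{x[0]}"   # relative path -> absolute Path to the .dcm file
get_y = lambda x: x[1]               # label string, later mapped to a class index by CategoryBlock

x_path, y_label = get_x(row), get_y(row)
```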

Next, we create the DataLoaders from the rows of df and plot a first batch with the specified transformations applied:

In [ ]:

dls = pneumothorax.dataloaders(df.values)
dls.show_batch(max_n=16)

Training

We then create a learner with cnn_learner, using a pretrained ResNet-34 and accuracy as the metric, and train it for one epoch with fit_one_cycle.

In [ ]:

learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fit_one_cycle(1)

epoch	train_loss	valid_loss	accuracy	time
0	1.250138	1.026524	0.560000	00:03
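fit_one_cycle trains with the one-cycle policy: the learning rate warms up from a low value to a maximum and is then annealed back down. A minimal sketch of such a schedule built from two cosine segments is shown below. The default values (lr_max=1e-3, div=25, div_final=1e5, pct_start=0.25) are assumptions based on our reading of fastai2 and should be checked against the documentation for your version; fastai also schedules momentum in the opposite direction, which this sketch omits.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress `pos` in [0, 1] (assumed defaults, see above)."""
    if pos < pct_start:
        # Warm-up phase: lr_max/div -> lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing phase: lr_max -> lr_max/div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))
```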

In [ ]:

learn.predict(pneumothorax_source/"train/Pneumothorax/000004.dcm")

Out[ ]:

('Pneumothorax', tensor(1), tensor([0.2858, 0.7142]))
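learn.predict returns the decoded label, the class index, and the probability of each class. Assuming the vocabulary is sorted alphabetically (consistent with index 1 mapping to 'Pneumothorax' above), the decoding step amounts to an argmax over the probabilities, as in this sketch using the numbers printed above. Note that this file lies in the train folder and, depending on the random split, may have been part of the training set.

```python
# Vocabulary in sorted order and the probabilities printed by learn.predict above
vocab = ["No Pneumothorax", "Pneumothorax"]
probs = [0.2858, 0.7142]

# The predicted class is the index of the largest probability
pred_idx = max(range(len(probs)), key=lambda i: probs[i])
pred_label = vocab[pred_idx]
```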

In [ ]:

tta = learn.tta(use_max=True)

100.00% [4/4 00:02<00:00]

100.00% [1/1 00:00<00:00]
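learn.tta (test-time augmentation) runs inference on the validation set several times with data augmentation and combines the results with the predictions on the un-augmented images. The NumPy sketch below illustrates two combination rules as we understand fastai2's implementation, applied to made-up probabilities: with use_max=True the element-wise maximum is taken; otherwise the average over augmented passes is blended with the plain predictions using a weight beta (0.25 by default). The helper combine_tta is hypothetical, and the exact rules are assumptions to check against the fastai documentation.

```python
import numpy as np

def combine_tta(plain, augmented, use_max=False, beta=0.25):
    """Combine plain predictions (n_items, n_classes) with augmented ones (n_aug, n_items, n_classes)."""
    plain = np.asarray(plain, dtype=float)
    augmented = np.asarray(augmented, dtype=float)
    if use_max:
        # Element-wise maximum over all augmented passes and the plain pass
        return np.maximum(augmented.max(axis=0), plain)
    # Average the augmented passes, then blend: (1 - beta) * augmented + beta * plain
    aug_mean = augmented.mean(axis=0)
    return (1 - beta) * aug_mean + beta * plain
```

With use_max=True, the combined scores for an image no longer necessarily sum to 1, which is harmless when only the argmax is used for accuracy.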

In [ ]:

learn.show_results(max_n=16)

In [ ]:

interp = Interpretation.from_learner(learn)

In [ ]:

interp.plot_top_losses(2)
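plot_top_losses shows the validation images with the highest loss, that is, the images where the model assigned the lowest probability to the true class. For a classifier trained with cross-entropy, the per-image loss is the negative log-probability of the true class. The NumPy sketch below ranks a few hypothetical predictions this way; top_losses is an illustrative helper, not fastai's implementation.

```python
import numpy as np

def top_losses(probs, targets, k):
    """Return (losses, indices) of the k items with the highest cross-entropy loss."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets)
    # Cross-entropy per item: negative log-probability assigned to the true class
    losses = -np.log(probs[np.arange(len(targets)), targets])
    # Indices sorted by decreasing loss, truncated to the k largest
    order = np.argsort(-losses)[:k]
    return losses[order], order
```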

Citations:

[1] Filice R et al. Crowdsourcing pneumothorax annotations using machine learning annotations on the NIH chest X-ray dataset. J Digit Imaging (2019). https://doi.org/10.1007/s10278-019-00299-9
