# all_slow
from fastai2.basics import *
from fastai2.callback.all import *
from fastai2.vision.all import *
from fastai2.medical.imaging import *
import pydicom
import pandas as pd
from nbdev.showdoc import *
In this tutorial we will build a classifier that distinguishes between chest X-rays with pneumothorax and chest X-rays without pneumothorax. The image data is loaded directly from the DICOM source files, so no prior DICOM data handling is needed.
First, we will use the untar_data function to download the siim_small folder, which contains a subset (250 DICOM files, ~30MB) of the SIIM-ACR Pneumothorax Segmentation [1] dataset.
The downloaded siim_small folder will be stored in your ~/.fastai/data/ directory. The variable pneumothorax_source will store the absolute path to the siim_small folder as soon as the download is complete.
pneumothorax_source = untar_data(URLs.SIIM_SMALL)
The siim_small folder has the following directory/file structure:
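One way to inspect that layout yourself is to print a directory tree. The following is a minimal sketch using only the standard library; tree_lines is a hypothetical helper and not part of fastai, and it truncates long file listings so that folders holding many .dcm files stay readable.

```python
from pathlib import Path

def tree_lines(root, max_files=3):
    """Return an indented listing of root, showing at most max_files files per folder."""
    root = Path(root)
    lines = [root.name + "/"]

    def walk(folder, depth):
        indent = "    " * depth
        # List sub-folders first, then files, each group in alphabetical order.
        entries = sorted(folder.iterdir(), key=lambda p: (p.is_file(), p.name))
        shown, skipped = 0, 0
        for entry in entries:
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)
            elif shown < max_files:
                lines.append(f"{indent}{entry.name}")
                shown += 1
            else:
                skipped += 1
        if skipped:
            lines.append(f"{indent}... ({skipped} more files)")

    walk(root, 1)
    return lines
```

Calling print("\n".join(tree_lines(pneumothorax_source))) after the download should list the folder's contents.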
To analyze our dataset, we load the paths to the DICOM files with the get_dicom_files function. When calling the function, we append train/ to the pneumothorax_source path to choose the folder where the DICOM files are located. We store the path to each DICOM file in the items list.
items = get_dicom_files(pneumothorax_source/"train")
Next, we split the items list into a training set (trn) and a validation set (val) using RandomSplitter. Both are lists of indices into items:
trn,val = RandomSplitter()(items)
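Conceptually, a random splitter shuffles the item indices and reserves a fraction of them for validation. The sketch below illustrates that idea in plain Python; it is not fastai's implementation. The random_split name and its seed handling are hypothetical, and the 20% validation fraction is an assumption about the RandomSplitter default.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) for a shuffled split of items."""
    idxs = list(range(len(items)))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * len(items))
    # The first `cut` shuffled indices go to validation, the rest to training.
    return idxs[cut:], idxs[:cut]

# 250 placeholder items, matching the number of DICOM files in siim_small.
trn_idxs, val_idxs = random_split(list(range(250)), valid_pct=0.2, seed=42)
print(len(trn_idxs), len(val_idxs))
```

Fixing the seed makes the split repeatable, which helps when comparing runs.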
To plot an X-ray, we can select an entry in the items list and load the DICOM file with dcmread. Then, we can plot it with the show method.
patient = 3
xray_sample = dcmread(items[patient])
xray_sample.show()
Next, we need to load the labels for the dataset. We import the labels.csv file using pandas and print the first five entries. The file column shows the relative path to the .dcm file and the label column indicates whether the chest X-ray has a pneumothorax or not.
df = pd.read_csv(pneumothorax_source/"labels.csv")
df.head()
| | file | label |
|---|---|---|
| 0 | train/No Pneumothorax/000000.dcm | No Pneumothorax |
| 1 | train/Pneumothorax/000001.dcm | Pneumothorax |
| 2 | train/No Pneumothorax/000002.dcm | No Pneumothorax |
| 3 | train/Pneumothorax/000003.dcm | Pneumothorax |
| 4 | train/Pneumothorax/000004.dcm | Pneumothorax |
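Before training, it can be useful to check the class balance and confirm that each label agrees with the folder named in the file path. The sketch below does both using only the standard library, on the five rows shown above; the variable names are illustrative.

```python
from collections import Counter
from pathlib import PurePosixPath

# The five rows printed by df.head() above, as (file, label) pairs.
rows = [
    ("train/No Pneumothorax/000000.dcm", "No Pneumothorax"),
    ("train/Pneumothorax/000001.dcm", "Pneumothorax"),
    ("train/No Pneumothorax/000002.dcm", "No Pneumothorax"),
    ("train/Pneumothorax/000003.dcm", "Pneumothorax"),
    ("train/Pneumothorax/000004.dcm", "Pneumothorax"),
]

# Tally how many files carry each label.
label_counts = Counter(label for _, label in rows)

# Collect files whose parent folder name disagrees with the label column.
mismatches = [f for f, label in rows if PurePosixPath(f).parent.name != label]

print(label_counts)
print(mismatches)
```

With the full DataFrame, df["label"].value_counts() gives the same tally.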
Now, we use the DataBlock class to prepare the DICOM data for training.
pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
                         get_x=lambda x:pneumothorax_source/f"{x[0]}",
                         get_y=lambda x:x[1],
                         batch_tfms=aug_transforms(size=224))
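The get_x and get_y arguments each receive one row of labels.csv: the first column becomes the image path and the second becomes the category. The sketch below shows that mapping with named functions in place of the lambdas. Here base is a stand-in for pneumothorax_source, and the function names are illustrative.

```python
from pathlib import Path

# Stand-in for pneumothorax_source, which is only known after the download.
base = Path("siim_small")

def get_x(row):
    """Build the DICOM file path from the first column of a labels.csv row."""
    return base / row[0]

def get_y(row):
    """Return the label stored in the second column of a labels.csv row."""
    return row[1]

row = ("train/Pneumothorax/000001.dcm", "Pneumothorax")
print(get_x(row), get_y(row))
```

Unlike lambdas, named module-level functions can be pickled, which can matter if the learner is serialized later.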
We then create the DataLoaders and plot a first batch with the specified transformations:
dls = pneumothorax.dataloaders(df.values)
dls.show_batch(max_n=16)
We can then use the cnn_learner function to create a learner and start the training.
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fit_one_cycle(1)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.250138 | 1.026524 | 0.560000 | 00:03 |
learn.predict(pneumothorax_source/"train/Pneumothorax/000004.dcm")
('Pneumothorax', tensor(1), tensor([0.2858, 0.7142]))
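The tuple returned by predict holds the decoded label, the index of the predicted class, and the probability for each class. The sketch below reproduces that relationship in plain Python from the numbers shown above. The class order in vocab is an assumption (labels sorted alphabetically), chosen because it is consistent with index 1 mapping to Pneumothorax.

```python
# Assumed class order: labels sorted alphabetically.
vocab = ["No Pneumothorax", "Pneumothorax"]

# Per-class probabilities copied from the prediction output above.
probs = [0.2858, 0.7142]

# The predicted class is the index with the highest probability.
pred_idx = max(range(len(probs)), key=probs.__getitem__)
pred_label = vocab[pred_idx]

print(pred_label, pred_idx, probs[pred_idx])
```

After only one epoch the model assigns about 71% probability to the correct class for this image, so the prediction is right but not very confident.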
tta = learn.tta(use_max=True)
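Test-time augmentation runs the model on several augmented copies of each validation image and combines the results with the prediction on the unaugmented image. The sketch below is a conceptual, plain-Python illustration with made-up probabilities, not fastai's implementation. combine_tta is a hypothetical helper, and the reading of use_max=True as a per-class maximum across all passes is an assumption.

```python
def combine_tta(plain_probs, augmented_probs, use_max=True):
    """Combine one image's plain prediction with predictions on augmented copies."""
    all_probs = [plain_probs] + augmented_probs
    n_classes = len(plain_probs)
    if use_max:
        # Keep, for each class, the highest probability seen in any pass.
        return [max(p[c] for p in all_probs) for c in range(n_classes)]
    # Otherwise take a simple mean over the passes (not fastai's weighting).
    return [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]

# Made-up probabilities for one image: [No Pneumothorax, Pneumothorax].
plain = [0.30, 0.70]
augmented = [[0.40, 0.60], [0.20, 0.80]]

combined = combine_tta(plain, augmented)
averaged = combine_tta(plain, augmented, use_max=False)
print(combined, averaged)
```

Note that per-class maxima need not sum to 1, so they are better read as scores than as probabilities.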
learn.show_results(max_n=16)
interp = Interpretation.from_learner(learn)
interp.plot_top_losses(2)
Citations:
[1] Filice R et al. Crowdsourcing pneumothorax annotations using machine learning annotations on the NIH chest X-ray dataset. J Digit Imaging (2019). https://doi.org/10.1007/s10278-019-00299-9