Histopathological Cancer Detection with Deep Neural Networks
Antonio de Perio
Apr 20
(Note: The related Jupyter notebook and original post can be found here: https://www.
com/post/histopathological-cancer-detection)
Being able to automate the detection of metastasised cancer in pathological scans with machine learning and deep neural networks is an area of medical imaging and diagnostics with promising potential for clinical usefulness.
Here we explore a particular dataset prepared for this type of analysis and diagnostics: the PatchCamelyon Dataset (PCam).
PCam is a binary classification image dataset containing approximately 300,000 labeled low-resolution images of lymph node sections extracted from digital histopathological scans.
Each image is labelled by trained pathologists for the presence of metastasised cancer.
The goal of this work is to train a convolutional neural network on the PCam dataset and achieve results close to the state of the art.
As we’ll see, with the Fastai library, we achieve 98.6% accuracy in predicting cancer in the PCam dataset.
We approach this by preparing and training a neural network with the following features:
1. Transfer learning with a convolutional neural net (Resnet50) as our backbone.
2. The following data augmentations: image resizing, random cropping, and horizontal and vertical image flipping.
3. The fit one cycle method to optimise learning rate selection for our training.
4. Discriminative learning rates to fine-tune.
In addition, we apply the following out-of-the-box optimisations throughout our training:
This notebook presents research and an analysis of this dataset using Fastai + PyTorch, and is provided as a reference, tutorial, and open-source resource for others.
It is not intended to be a production ready resource for serious clinical application.
We work here instead with low resolution versions of the original high-res clinical scans in the Camelyon16 dataset for education and research.
This provides useful ground on which to prototype and test the effectiveness of various deep learning algorithms.
The Data
[Figure: examples of a metastatic region (from Camelyon16)]
Original Source: Camelyon16
PCam is actually a subset of the Camelyon16 dataset: a set of high-resolution whole-slide images (WSI) of lymph node sections.
This dataset is made available by the Diagnostic Image Analysis Group (DIAG) and Department of Pathology of the Radboud University Medical Center (Radboudumc) in Nijmegen, The Netherlands.
The following is an excerpt from their website (https://camelyon16.org/Data/):
The data in this challenge contains a total of 400 whole-slide images (WSIs) of sentinel lymph node from two independent datasets collected in Radboud University Medical Center (Nijmegen, the Netherlands), and the University Medical Center Utrecht (Utrecht, the Netherlands).
The first training dataset consists of 170 WSIs of lymph node (100 Normal and 70 containing metastases) and the second 100 WSIs (including 60 normal slides and 40 slides containing metastases).
The test dataset consists of 130 WSIs which are collected from both Universities.
PatchCam (Kaggle)
PCam was prepared by Bas Veeling, a PhD student in machine learning for health from the Netherlands, specifically to help machine learning practitioners interested in working on this particular problem.
It consists of 327,680 colour images, each 96×96 pixels.
An excellent overview of the dataset can be found here: http://basveeling.nl/posts/pcam/. It is also available for download on GitHub, where there is further information on the data: https://github.com/basveeling/pcam
This particular dataset is downloaded directly from Kaggle through the Kaggle API, and is a version of the original PCam (PatchCamelyon) dataset with duplicates removed.
PCam is intended to be a good dataset to perform fundamental machine learning analysis.
As the name suggests, it’s a smaller version of the significantly larger Camelyon16 dataset used to perform similar analysis (https://camelyon16.org/Data/).
From the author’s words:
PCam packs the clinically-relevant task of metastasis detection into a straight-forward binary image classification task, akin to CIFAR-10 and MNIST.
Models can easily be trained on a single GPU in a couple hours, and achieve competitive scores in the Camelyon16 tasks of tumor detection and whole-slide image diagnosis.
Furthermore, the balance between task-difficulty and tractability makes it a prime suspect for fundamental machine learning research on topics as active learning, model uncertainty, and explainability.
Hardware
We perform our training on an Ubuntu 18 machine with a single RTX 2070 GPU using 16-bit precision.
Fast AI Imports
Kaggle SDK/API and downloading dataset
The data we are using lives on Kaggle.
We use Kaggle’s SDK to download the dataset directly from there.
To work with the Kaggle SDK and API you will need to create a Kaggle API token in your Kaggle account.
When logged into Kaggle, navigate to “My Account” then scroll down to where you can see “Create New API Token”.
This will download a JSON file to your computer with your username and token string.
Copy these contents to your ~/.
json token file.
Data Preparation
With our data now downloaded, we create an ImageDataBunch object to help us load the data into our model, set data augmentations, and split our data into training and validation sets.
ImageDataBunch wraps up a lot of functionality to help us get our data into a format that we can train on.
Let’s go through some of the key functions it performs.
Data Augmentation
By default, ImageDataBunch performs a number of modifications and augmentations to the dataset:
Centre crop the images. There’s also some randomness introduced in where and how it crops, for the purposes of data augmentation. It’s important that all the images are the same size for the model to be able to train on them.
Image Flipping
There are various other data augmentations we could also use, but one of the key ones that we activate is vertical image flipping.
For pathology scans this is a reasonable data augmentation to activate, as it matters little whether a scan is oriented along the vertical or the horizontal axis. By default fastai flips images horizontally, but we need to turn on vertical flipping ourselves.
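To make the crop and flip augmentations concrete, here is a minimal pure-Python sketch that treats an image as a list of pixel rows. It is not fastai's implementation: the function names and the 50% flip probability are assumptions for illustration, and in fastai v1 the corresponding switch for vertical flips is, as far as I know, the flip_vert argument of get_transforms().

```python
import random

def random_crop(img, size, rng):
    """Crop a size x size window at a random offset from a 2D image (list of rows)."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive at both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def flip_horizontal(img):
    """Mirror left-right by reversing each row."""
    return [list(reversed(row)) for row in img]

def flip_vertical(img):
    """Mirror top-bottom by reversing the order of the rows."""
    return [list(row) for row in reversed(img)]

def augment(img, size, rng, flip_vert=True):
    """Random crop, then each flip with an assumed probability of 0.5."""
    out = random_crop(img, size, rng)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if flip_vert and rng.random() < 0.5:
        out = flip_vertical(out)
    return out

rng = random.Random(0)
patch = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
print(augment(patch, size=3, rng=rng))  # a random 3x3 crop, possibly flipped
```

Every output has the same fixed size regardless of where the crop lands, which is what lets the images be batched together.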
Batch Size
We’ll be using the 1cycle policy (fit_one_cycle()) to train our network (more on this later).
This is a hyperparameter optimisation that allows us to use higher learning rates.
In the 1cycle policy, higher learning rates act as a form of regularisation.
Recall that a small batch size also adds regularisation; a large batch size adds less, which leaves room for the larger learning rates that 1cycle relies on to regularise training.
The recommendation here, when training with the 1cycle policy, is to use the largest batch size our GPU supports.
Training, validation and test sets
We specify the folder location of the data (where the train and test sub-folders exist along with the CSV data).
Under the hood, ImageDataBunch splits the images in the train sub-folder into a training set and a validation set (defaulting to an 80/20 split).
There are 176,020 images in the training set and 44,005 in the validation set.
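The split arithmetic can be checked with a small sketch. Assuming a 20% hold-out drawn at random from all labelled images, the two figures above sum to 220,025 images, and 20% of that is exactly 44,005. The function below is a hypothetical stand-in for what ImageDataBunch does internally, not its actual code.

```python
import random

def train_valid_split(items, valid_pct=0.2, seed=42):
    """Shuffle the indices reproducibly and hold out valid_pct of the items for validation."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_valid = int(len(items) * valid_pct)
    valid = [items[i] for i in idx[:n_valid]]
    train = [items[i] for i in idx[n_valid:]]
    return train, valid

# Stand-in for the 220,025 labelled image IDs (176,020 + 44,005).
image_ids = list(range(176_020 + 44_005))
train, valid = train_valid_split(image_ids)
print(len(train), len(valid))  # 176020 44005
```

Fixing the seed keeps the validation set identical between runs, so results from different training runs stay comparable.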
We also specify the location of the test sub-folder, which contains unlabelled images.
Because these test images are unlabelled, the accuracy and error rate our learner reports are measured against the validation set.
The CSV file containing the data labels is also specified.
Image size on base architecture and target architecture
Images in the target PCam dataset are square, 96×96 pixels.
However, the pre-trained ImageNet model we bring into our network was trained on larger images, so we need to set the size accordingly to respect the image sizes in that dataset.
We choose 224 for size as a good default to start with.
Normalising the images
Once we have set up the ImageDataBunch object, we also normalise the images.
Normalising the images uses the mean and standard deviation of the images to transform the image values into a standardised distribution that is more efficient for a neural network to train on.
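As a minimal sketch of the normalisation step, assuming per-channel statistics (one mean and one standard deviation for each of the red, green and blue channels), each value x becomes (x - mean) / std. The helper names are hypothetical. In fastai v1, DataBunch.normalize() can, as I understand it, either compute such statistics from a batch of the data or take precomputed ones such as imagenet_stats; which of those the original notebook used isn't shown here.

```python
import math

def channel_stats(pixels):
    """Per-channel mean and population standard deviation of (r, g, b) tuples."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n) for c in range(3)]
    return means, stds

def normalise(pixels, means, stds):
    """Standardise every channel value: (x - mean) / std."""
    return [tuple((p[c] - means[c]) / stds[c] for c in range(3)) for p in pixels]

# Toy RGB pixels with values in [0, 1].
pixels = [(0.0, 0.5, 1.0), (1.0, 0.5, 0.0), (0.5, 1.0, 0.5), (0.5, 0.0, 0.5)]
means, stds = channel_stats(pixels)
normed = normalise(pixels, means, stds)
print(means, stds)
```

After normalising with its own statistics, each channel has mean 0 and standard deviation 1, which keeps input values on a similar scale for every channel.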
Below we take a look at some random samples of the data so we can get some understanding of what we are feeding into our network.
This is a binary classification problem, so there are only two classes (0 and 1).
Learner (CNN Resnet50)
Once we have correctly set up the ImageDataBunch object, we can pass it, along with a pre-trained ImageNet model, to cnn_learner.
We will be using Resnet50 as our backbone.
Fastai wraps up a lot of state-of-the-art computer vision practice in its cnn_learner. It connects our pre-trained model to a new layer group of fully connected layers that uses:
- ReLU activations
- Batch normalisation
- Max pooling
- Dropout
Importantly, we also specify a backbone network that has been pre-trained on the ImageNet dataset, so that we can use transfer learning in our training.
Transfer learning
Starting with a backbone network from a well-performing model that was already pre-trained on another dataset is a method called transfer learning.
Transfer learning works on the premise that instead of training your model from scratch, you can use the learning (i.e. the learned weights) from another machine learning model as a starting point.
This is an incredibly effective method of training, and underpins current state-of-the-art practices in training deep neural networks.
When using pre-trained models we leverage, in particular, the learned features that the pre-trained model’s base dataset and the target dataset (PCam) have most in common.
For example, for models pre-trained on ImageNet such as Resnet50, training will leverage common features (such as lines, geometry, and patterns) that have already been learnt from the base dataset, particularly in the first few layers, to train on the target dataset.
For our model, we’ll be using Resnet50.
Resnet50 is a 50-layer residual neural network pre-trained on ImageNet data, and will provide a good starting point for our network.
Training and fit one cycle
Fit one cycle
We will be training our network with a method called fit one cycle.
This optimisation is a way of applying a variable learning rate across the total number of epochs in our training run for a particular layer group.
This has proven to be an extremely effective way to tune the learning rate hyperparameter for training.
Fit one cycle varies the learning rate from a minimum value at the first epoch (by default lr_max/div_factor), up to a pre-determined maximum value (lr_max), before descending again to a minimum across the remaining epochs.
This min-max-min learning rate variance is called a cycle.
An excellent overview can be found here in the fastai docs https://docs.
html, along with a more detailed explanation in the original paper by Leslie Smith, where this method of hyperparameter tuning was proposed.
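The min-max-min shape can be sketched in a few lines. This is a minimal illustration, not fastai's code: it assumes cosine interpolation in both phases, a warm-up covering the first 30% of steps (pct_start=0.3), div_factor=25, and a final learning rate of lr_max / (div_factor * 1e4). Those defaults mirror my understanding of fastai v1's OneCycleScheduler, so treat them as assumptions. Note that the schedule advances per batch (step) rather than per epoch.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lrs(lr_max, total_steps, div_factor=25.0, pct_start=0.3, final_div=None):
    """Learning rate for each step of a single min-max-min cycle (a sketch)."""
    if final_div is None:
        final_div = div_factor * 1e4  # assumed default
    lr_start = lr_max / div_factor    # minimum at the start of the cycle
    lr_end = lr_max / final_div       # much lower minimum at the end
    warm = max(1, round(total_steps * pct_start))  # steps spent ramping up
    lrs = []
    for step in range(total_steps):
        if step < warm:
            lrs.append(annealing_cos(lr_start, lr_max, step / warm))
        else:
            pct = (step - warm) / max(1, total_steps - warm - 1)
            lrs.append(annealing_cos(lr_max, lr_end, pct))
    return lrs

lrs = one_cycle_lrs(lr_max=1e-2, total_steps=100)
print(f"start {lrs[0]:.1e}, peak {max(lrs):.1e}, end {lrs[-1]:.1e}")
```

The long, gentle descent at the end is what lets the network settle into a good minimum after the high-learning-rate phase.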
So how do we determine the most suitable maximum learning rate for fit one cycle? We run fastai’s lr_find() method.
Running lr_find before unfreezing the network yields the graph below.
We want to choose a learning rate just before the loss starts to exponentially increase.
From a visual observation of the resulting learning rate plot, starting with a learning rate of 1e-02 seems to be a reasonable choice for an initial lr value.
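The idea behind lr_find() is an LR range test: the learning rate is increased exponentially from a tiny value over a number of mini-batches while the loss is recorded, and the sweep stops once the loss blows up. The sketch below shows the sweep and one common rule of thumb for reading the plot (take roughly a tenth of the learning rate at which the loss bottoms out). The defaults (1e-7 to 10, stopping when the loss exceeds four times the best loss) are my understanding of fastai v1's behaviour and should be treated as assumptions; fastai also smooths the loss, which is omitted here. The loss curve used below is synthetic, not data from this training run.

```python
import math

def lr_sweep(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates, one per mini-batch of the range test."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

def suggest_lr(lrs, losses, divergence=4.0):
    """Stop once the loss exceeds divergence x the best loss so far,
    then return one tenth of the learning rate where the loss was lowest."""
    best_loss, best_i = float("inf"), 0
    for i, loss in enumerate(losses):
        if loss > divergence * best_loss:
            break  # loss is exploding: ignore the rest of the sweep
        if loss < best_loss:
            best_loss, best_i = loss, i
    return lrs[best_i] / 10

# Synthetic loss curve (made up for illustration) with its minimum near lr = 1e-2.
lrs = lr_sweep(num_it=81)
losses = [(math.log10(lr) + 2) ** 2 + 0.1 for lr in lrs]
print(f"suggested lr ~ {suggest_lr(lrs, losses):.0e}")
```

Backing off from the minimum keeps the chosen rate on the steep, still-improving part of the curve rather than at the edge of divergence.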
Freeze
By default we start with our network frozen.
This means that the layers of our pre-trained Resnet50 model are not trainable (their weights are not updated), and only the new fully connected layer group is trained on the target dataset.
Because only that layer group is trainable, the learning rate we provide to fit_one_cycle() effectively applies only to it for this initial training run.
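Freezing simply means the optimiser skips the pre-trained weights. The following framework-free sketch, with hypothetical layer groups stored as plain dicts and a plain SGD update, shows a frozen body being left untouched while the head is updated. In fastai v1 the equivalent switches are Learner.freeze() and Learner.unfreeze(), which, as I understand it, toggle requires_grad on the underlying PyTorch parameters; the code below is only an analogy, not their implementation.

```python
def freeze(layer_groups):
    """Make only the last layer group (the new head) trainable."""
    for i, group in enumerate(layer_groups):
        group["trainable"] = i == len(layer_groups) - 1

def unfreeze(layer_groups):
    """Make every layer group trainable, as in fine-tuning."""
    for group in layer_groups:
        group["trainable"] = True

def sgd_step(layer_groups, grads, lr):
    """Plain SGD update that skips frozen layer groups."""
    for group, g in zip(layer_groups, grads):
        if not group["trainable"]:
            continue  # frozen: pre-trained weights stay as they are
        group["weights"] = [w - lr * gw for w, gw in zip(group["weights"], g)]

# Hypothetical two-group model: a pre-trained body and a new head.
groups = [{"name": "body", "weights": [1.0, 2.0]}, {"name": "head", "weights": [0.5]}]
grads = [[1.0, 1.0], [1.0]]
freeze(groups)
sgd_step(groups, grads, lr=0.1)
print(groups[0]["weights"], groups[1]["weights"])  # body unchanged, head updated
```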
Analysing first results
Analysing the graph of the initial training run, we can see that the training loss and validation loss both steadily decrease and begin to converge while the training progresses.
With the validation loss steadily decreasing, there are no clear signs of significant overfitting or underfitting.
Accuracy at the moment is 97.
We can learn more about this training run by using Fastai’s confusion matrix and plotting our top losses.
The confusion matrix is a handy tool to help us obtain more detail on the effectiveness of the training so far.
Specifically, we get some clarity on the amount of false positives and false negatives predicted by our neural net.
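For a binary problem the confusion matrix reduces to four counts. Here is a small sketch computing them, along with accuracy, sensitivity (the share of cancerous patches caught) and specificity, from hypothetical labels and predictions, assuming label 1 means metastatic tissue is present. fastai v1 produces the matrix itself through ClassificationInterpretation; this code only illustrates the arithmetic.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = metastatic tissue present."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # false alarm
        elif actual == 1 and predicted == 0:
            fn += 1  # missed cancer
        else:
            tn += 1
    return tn, fp, fn, tp

def summary(tn, fp, fn, tp):
    """Accuracy plus the two per-class rates that matter clinically."""
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Made-up labels and predictions for six patches.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, summary(*counts))
```

In a screening setting false negatives (missed cancer) are usually the costlier error, which is why sensitivity is worth reading alongside overall accuracy.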
Plotting our top losses allows us to examine specific images in more detail.
Fastai generates a heatmap of images that we predicted incorrectly.
The heatmap allows us to examine areas of images which confused our network.
It’s useful to do this so that we obtain better context around how our model is behaving on each run, and it can point us to clues about how to improve it.
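Plotting top losses starts by ranking images by their individual loss. The sketch below reproduces only that ranking step, assuming a binary cross-entropy loss on the predicted probability of the cancer class and made-up predictions; it does not reproduce fastai's heatmap.

```python
import math

def bce(p_pos, label, eps=1e-7):
    """Binary cross-entropy for one image, given the predicted probability of class 1."""
    p = min(max(p_pos, eps), 1 - eps)  # clamp to avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1 - p)

def top_losses(probs, labels, k):
    """(index, loss) pairs for the k highest-loss images, worst first."""
    losses = [bce(p, y) for p, y in zip(probs, labels)]
    worst = sorted(range(len(losses)), key=losses.__getitem__, reverse=True)
    return [(i, losses[i]) for i in worst[:k]]

# Hypothetical predicted probabilities of cancer, with the true labels.
probs = [0.9, 0.2, 0.6, 0.95]
labels = [1, 1, 0, 0]
for i, loss in top_losses(probs, labels, k=2):
    print(f"image {i}: loss {loss:.3f}")
```

Confident wrong predictions (like image 3 above, predicted 0.95 but labelled 0) rise to the top, which makes them the most informative images to inspect.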
Fine-tuning, unfreezing, and discriminative learning rates
Initial results are already good on the first training run.
But with some more fine-tuning, we can actually do a little better.
Transfer learning + Fine-tuning = Better Generalisation
Transfer learning alone brings us much further than training our network from scratch.
But this method is prone to optimisation difficulties caused by fragile co-adapted layers when connecting a pre-trained network to new layers.
We counter this by fine-tuning our model: making all the layers of our network, including the pre-trained Resnet50 layers, trainable.
When we unfreeze, we train across all of our layers. (See )
This leads to better results and a better ability to generalise to new examples.
Discriminative Learning Rates and 1cycle
With all of the layers in our network unfrozen and open for training, we can now also make use of discriminative learning rates in conjunction with fit_one_cycle to improve our optimisations even further.
Discriminative learning rates let us apply a different learning rate to each layer group in our network, tuned for that group.
Fit one cycle then operates on these values and uses them to vary learning rates according to the 1cycle policy.
html#Discriminative-layer-training)
How do we find the best range of learning rates to use with fit one cycle? We can use lr_find() to help us with that.
Analysing our lr plot above, we choose a range of learning rates just before the loss begins to increase sharply, and pass that range as a slice to fit_one_cycle.
From our plot above, it seems reasonable to select an upper-bound learning rate of 1e-4. As a rule of thumb for the lower bound, we can select a value 10x smaller than the upper bound, in this case 1e-5.
The lower bound rate will apply to the layers in our pre-trained Resnet50 layer group.
The weights here are already well learned so we can proceed with a slower learning rate for this group of layers.
The upper bound rate gets applied to the final layer group of fully connected layers previously trained in our last training run on the target dataset.
The layers in this group will benefit from a faster learning rate.
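How a slice becomes per-group rates: as I understand fastai v1, passing slice(1e-5, 1e-4) to fit_one_cycle spreads learning rates geometrically across the model's layer groups, so the earliest group gets the lower bound, the head gets the upper bound, and any groups in between get evenly multiplied values. The sketch assumes three layer groups (which I believe is what cnn_learner creates for a ResNet: two for the body, one for the head); the helper is modelled on fastai's even_mults but written independently here.

```python
def even_mults(start, stop, n):
    """n learning rates spaced geometrically from start to stop, inclusive."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

# Assumed three layer groups: early body, late body, fully connected head.
for group, lr in zip(["early body", "late body", "head"], even_mults(1e-5, 1e-4, 3)):
    print(f"{group}: {lr:.2e}")
```

Geometric rather than linear spacing means each group's rate is a constant multiple of the previous one, matching the intuition that earlier, more generic layers need progressively gentler updates.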
Final analysis
In the final fine-tuning training run, we can see that our training loss and validation loss begin to diverge from each other midway through training: the training loss improves at a much faster rate than the validation loss, decreasing steadily until it settles into a stable range of values in the final epochs of the run.
Any further increase in our validation loss while the training loss keeps decreasing would indicate overfitting, with the model failing to generalise well to new examples.
Training for too long would risk this.
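One way to act on this is to track, per epoch, where the validation loss bottoms out and to flag epochs in which it rises while the training loss is still falling. A minimal sketch with made-up loss values follows; fastai v1 also provides tracker callbacks (for example EarlyStoppingCallback and SaveModelCallback) that automate similar checks, though their exact options aren't covered here.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

def overfitting_epochs(train_losses, val_losses):
    """Epochs where validation loss rises while training loss keeps falling."""
    return [
        e for e in range(1, len(val_losses))
        if val_losses[e] > val_losses[e - 1] and train_losses[e] < train_losses[e - 1]
    ]

# Made-up per-epoch losses, for illustration only.
train = [0.50, 0.40, 0.30, 0.25, 0.20]
valid = [0.45, 0.38, 0.33, 0.34, 0.36]
print(best_epoch(valid), overfitting_epochs(train, valid))
```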
Finalising the model at this point in our training yields a fine-tuned accuracy of 98.6%, an improvement over our stage 1 training run result.
Conclusion
With an approach using deep convolutional neural networks, transfer learning, and fit one cycle optimisations, we can achieve 98.6% accuracy on detecting cancer in the PCam dataset.
Fastai + PyTorch provides an excellent framework to implement deep neural networks for computer vision problems.
References
Practical Deep Learning for Coders, v3.
“Rotation Equivariant CNNs for Digital Pathology”. 03962
Ehteshami Bejnordi et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA: The Journal of the American Medical Association, 318(22), 2199–2210. 14585
Camelyon16 Challenge https://camelyon16.
Histopathologic Cancer Detection — Identify metastatic tissue in histopathologic scans of lymph node sections https://www.
com/c/histopathologic-cancer-detection
Jason Yosinski. “How transferable are features in deep neural networks?”. LG]
Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay”.