Medical image segmentation: what do you do when there's not enough data?
The adoption of machine learning applications has continually gained momentum in recent years across multiple sectors: time series analysis for high-speed trading, recommendation algorithms for targeted advertising, and predictive maintenance to reduce downtime in manufacturing and logistics. In this blog, we will explore medical imaging (MI) and, more specifically, medical image segmentation.
What is image segmentation and why do we do it?
Image segmentation is the process of dividing images into segments and identifying the different elements within them. This could mean identifying which parts of an image are humans for a self-driving car's guidance algorithm (a form of object detection), enabling facial or iris detection, or identifying the lungs in a chest X-ray.
Some of the applications of these examples may seem obvious. The self-driving car is designed to recognize and avoid people, whilst facial and iris detection can help identify people. For the lungs, however, the application is a little more subtle.
If you can identify all the pixels (or voxels in 3D) of the image that are the lungs, then you may be able to estimate the patient's lung volume and track how it changes over time. This data can then be compared with other metrics recorded using spirometry.
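To make that concrete, once you have a 3D mask, estimating volume is just counting the segmented voxels and multiplying by the physical size of one voxel. Here is a toy sketch with made-up dimensions and voxel spacing (real values would come from the scan header):

```python
import numpy as np

# Toy 3D lung mask: True where a voxel was segmented as lung.
mask = np.zeros((100, 100, 50), dtype=bool)
mask[20:80, 20:80, 10:40] = True

# Voxel spacing in mm per voxel — hypothetical values for illustration.
spacing_mm = (0.8, 0.8, 1.5)
voxel_volume_ml = np.prod(spacing_mm) / 1000.0  # mm^3 → millilitres

# Volume = number of lung voxels × volume of one voxel.
lung_volume_ml = mask.sum() * voxel_volume_ml
print(round(lung_volume_ml))  # ≈ 104 ml for this toy mask
```

The same calculation repeated on scans taken months apart is what lets you track changes over time.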
What data do you need to segment images and how do you get it?
There are many factors to take into consideration, but generally speaking, you need lots of pictures of the exact thing you are trying to segment, together with a corresponding mask image for each one that marks out the object in the original. As is the case with a lot of machine learning applications, you need to tell the computer what it is that you want it to know, and tell it again and again with as many varied examples as you can find.
As for how you get these images, the difficulty varies. For self-driving cars, you can take a camera out to the street and gather a number of appropriate images relatively easily. You'll then need to draw the outline of each object to be segmented on a computer, which can be a long and intricate process.
For medical image segmentation, it's even trickier, as certain medical images require exposure to radiation (which can be dangerous in large amounts) and some are very expensive to acquire (PET, MRI, etc.). Human medical images are also highly private data that require careful legal precautions to disseminate appropriately. This means that the raw data (the images) tends to be very expensive to obtain for the purposes of training machine learning models.
There is the added challenge of labelling these images to make the masks you need to train your models. For some tasks, such as labelling what is or is not the lung, this may be a relatively obvious thing, but for others such as what is or is not a tumour, it is very difficult. Sometimes even expert radiologists or oncologists do not agree. This means that several trained experts need to label the same image and make sure that they agree before you can use the images. This need for multiple experts per image makes the labelling time-consuming in many cases, and so medical imaging datasets are smaller than corresponding datasets for other use cases.
What image segmentation models do you train and how long does it take?
There are many image segmentation algorithms available. Names like U-Net, YOLO, and variational autoencoder will not mean much if segmentation is a relatively new concept to you, but each of these algorithms works a little differently. Many of them work by trying to determine what's going on in different regions of your input image and whether the element you're interested in is present. While the workings of machine learning models are often mysterious, it may help to think of it a little like this:
Is it red-pink? Are there whitish-green droplet shapes on it? That might be a strawberry.
Is it grey-black? Is there a curved edge? It might be a tyre.
How effectively a particular model will determine these things will depend on how many varied examples it is shown, and how the model is set up in the first place.
The time it takes to train these models will depend again on the architecture of the model and how much data you provide it with. In some instances, it could take as little as 10 minutes for simple models like watershed or thresholding, or as much as a week if you are training U-Net models to process tens of thousands of 3D images.
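To give a sense of just how simple the simplest approach can be, here is a minimal thresholding sketch on a toy greyscale image: any pixel brighter than a cutoff is labelled as the object. There is no training at all, only a cutoff value to choose:

```python
import numpy as np

# A toy greyscale "scan": a bright object (~200) on a dark background (~30).
image = np.full((8, 8), 30, dtype=np.uint8)
image[2:6, 2:6] = 200

# Thresholding: every pixel brighter than the cutoff belongs to the object.
threshold = 100
mask = image > threshold

print(mask.sum())  # 16 pixels labelled as "object"
```

Real thresholding (and its smarter cousin, watershed) works on the same principle, just with more careful choices of cutoff; that simplicity is why such models can be ready in minutes rather than days.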
To conclude, there is a large choice of models and training time is variable; it could take less time than you took to read this article, or it could take a week!
How much data is enough for image segmentation, and how do you make up for what you lack?
This is the big question. How much data you need depends intimately on the type of model you are training and what your requirements are. Requirements for machine learning models are difficult to nail down; even which metric to measure against is a difficult question in many cases. Accuracy isn't the best metric, but it is one that is easy to understand, so let's use that.
If you need your model to have an accuracy rate of 99.9% then you will need a very clever model and a lot of data. If you need your model to have a 90% accuracy rate, you will require significantly less data.
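To see both why accuracy is easy to understand and why it isn't the best metric for segmentation, consider a toy example where a lazy "model" predicts background for every pixel:

```python
import numpy as np

# Ground-truth mask: a small 3x3 lesion in a 10x10 image.
truth = np.zeros((10, 10), dtype=bool)
truth[4:7, 4:7] = True

# A lazy "model" that predicts background everywhere.
prediction = np.zeros((10, 10), dtype=bool)

# Pixel accuracy: the fraction of pixels labelled correctly.
accuracy = (prediction == truth).mean()
print(accuracy)  # 0.91 — high, despite missing the lesion entirely
```

Because the object of interest usually occupies a small fraction of the image, a model can score high accuracy while finding nothing, which is why practitioners often prefer overlap-based metrics instead.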
If you want to distinguish multiple different objects in your images, you will need more data; if it is just a single object, you'll require less. Unfortunately, there is no short and simple answer. I know from experience that a few hundred 3D segmentation images may be enough to train a performant 3D model, but one hundred 2D segmentation images were so few that it was pointless trying with that dataset. State-of-the-art models have been trained on image segmentation datasets of around 330,000 2D images (the COCO dataset), but what if you don't have hundreds of thousands of labelled segmentation images at your disposal? Luckily, there are some tricks that might help.
The good news is that you don’t have to start from scratch. You can take a pre-existing trained model such as YOLO and tune it to your needs. Just as you can learn to detect a new type of object very quickly because you already know how to find objects in general, so too can machine learning models. This is called transfer learning and is a very powerful technique.
You can also try image augmentation. This can take a lot of different forms, but the general concept is to make your images look like other images by altering the brightness, changing the background or changing the colour of your object to try to help your model adapt to new images. This is something that humans can do intuitively. If I show you a picture of a skinny orange cat on a light background you can also probably identify an overweight grey cat on a dark background, because you can mentally transpose the image.
In medical image segmentation, you might have other aspects you can use to augment your training images, such as adding noise, rotating or switching around elements of your image. You can also retrain models that have been trained to identify organs from HRCT data to ophthalmological CT data.
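A simple augmentation pipeline can be sketched with plain NumPy. The key detail for segmentation is that geometric changes (flips, rotations) must be applied to the mask as well, while intensity changes (brightness, noise) apply only to the image:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Return a randomly augmented copy of an image and its mask."""
    if rng.random() < 0.5:                  # random horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = rng.integers(0, 4)                  # random 90-degree rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    image = image * rng.uniform(0.8, 1.2)   # brightness jitter (image only)
    image = image + rng.normal(0.0, 0.02, image.shape)  # additive noise
    return image, mask

# One labelled scan can yield many slightly different training examples.
image = rng.random((64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
aug_image, aug_mask = augment(image, mask)
```

Calling `augment` on the same labelled pair many times effectively multiplies the size of a small dataset, at the cost of the examples being correlated with each other.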
You will need a lot of images, but you can partially compensate for scarce data by repurposing previous work and maximizing the value of the images you do have.
Image segmentation is an interesting and challenging task that requires vast amounts of labelled data, which can be time-consuming and expensive to obtain. Here, we have only talked about supervised segmentation (where you have all the data labelled), but there are increasing numbers of unsupervised or semi-supervised models (where you have none or part of the data labelled) which are reporting exciting results.
Whether it is for identifying tyres, cats, lungs or tumours, image segmentation is a method that requires you to make the most of your data to produce the best results.