The human body is far more complicated than a game of Go, so healthcare AI solutions need to bring in expert knowledge early on.
This article is written by Mengling ‘Mornin’ Feng, who is an assistant professor at the National University of Singapore.
The Digital Mammography DREAM Challenge was nothing short of a data scientist’s dream come true. Organised by Sage Bionetworks and the Dialogue for Reverse Engineering Assessment and Methods (DREAM), the competition offered 640,000 de-identified mammograms and computational resources sponsored by IBM and Amazon Web Services for coders to use artificial intelligence (AI) to improve breast cancer detection.
But more than the opportunity to tackle a huge dataset with nearly unlimited computational power—not to mention US$1.2 million in prize money—my team and I were drawn to the prospect of solving real-world problems and making a difference to patients and doctors.
One in eight women is expected to develop breast cancer during her lifetime. Routine mammograms can help catch breast cancer early, but reading mammograms is a labour-intensive and time-consuming task. Because of relatively high error rates and the large variance between radiologist readings, the current breast cancer diagnosis protocol requires double-blind reads by two independent radiologists. If we could instead use AI to recognise cancerous lesions, it would greatly speed up diagnosis and might eventually even replace one of the readings.
With so much data at our disposal, we thought we would crack the problem easily, and got straight to work. Key teammates on this project include Dr Mikael Hartman, Dr Ngiam Kee Yuan, Du Hao, Dr Feng Jiashi, Jie Zequn and Wang Sijia. The network we designed was so complicated that it took us nearly two weeks to train. To our great disappointment, our AI agent turned out to be barely better than a coin flip at predicting the likelihood of breast cancer from a mammogram.
We were left scratching our heads: what went wrong?
To figure out what happened, we did what we should have done right at the start: we consulted the experts. After talking to radiologists and breast cancer surgeons, we realised that as computer scientists, we made a few crucial mistakes.
First, we had focused on the quantity of the data available to us, forgetting that quality is just as important. Mammograms are very large files: a single image contains over a million pixels. To fit these high-resolution images and our deep neural network into the GPU memory, we had to reduce the resolution of the files, and ended up removing information that was important for an accurate diagnosis. A human radiologist scrutinises a mammogram for abnormal features called microcalcifications, which appear as tiny white dots. When we reduced the resolution, these white dots were masked out and lost.
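To make the loss concrete, here is a minimal sketch in plain Python. It assumes block averaging as the resolution-reduction step, which may differ from how the images were actually shrunk, and it uses a toy 8x8 grid rather than a real mammogram. The function name downsample_mean is purely illustrative.

```python
def downsample_mean(image, factor):
    """Shrink a 2D list of intensities by averaging factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy "mammogram": dark background with one bright single-pixel dot,
# standing in for a microcalcification (an assumption for illustration).
image = [[0.0] * 8 for _ in range(8)]
image[2][5] = 1.0

small = downsample_mean(image, 4)  # 8x8 -> 2x2

peak_before = max(max(row) for row in image)
peak_after = max(max(row) for row in small)
print(peak_before, peak_after)
```

In this toy case the dot's intensity falls from 1.0 to one sixteenth of that value after a fourfold reduction, which makes it hard to tell apart from background noise.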
The second mistake we made was treating each mammogram as an independent image. In practice, each patient has four scans: two different views of each breast. Radiologists read the scans side by side to look for asymmetric changes between the breasts, and contrast the two views to get a sense of how solid the tumour mass is.
Finally, we treated the cancer diagnosis problem as a binary one when it is not easily classified in reality. The data given to us had been labelled cancer or non-cancer; however, radiologists look for more than a dozen different kinds of abnormal lesions before reaching a diagnosis.
If at first you don’t succeed
With a better understanding of the problem, we tried again. Instead of looking at the entire mammogram and coming up with a binary yes or no diagnosis, we used a divide and conquer strategy, cutting the full mammogram into smaller patches. This reduced the size of the files while maintaining the high resolution required to identify microcalcifications. The question was thus reduced from “Does this patient have cancer?” to “Does this patch have abnormal lesions associated with cancer, and if so, where?”
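The patch-cutting idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the team's pipeline: the patch size, the stride and the function name extract_patches are all hypothetical, and a nested list stands in for the image.

```python
def extract_patches(image, patch_size, stride):
    """Return (row, col, patch) for every full patch in a 2D list image.

    Each patch keeps the original pixel values, so no resolution is lost;
    only the area seen at one time gets smaller.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, stride):
        for c in range(0, cols - patch_size + 1, stride):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append((r, c, patch))
    return patches


# Toy 8x8 image whose pixel value encodes its own position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

tiles = extract_patches(image, patch_size=4, stride=4)    # non-overlapping
overlap = extract_patches(image, patch_size=4, stride=2)  # 50% overlap
print(len(tiles), len(overlap))
```

Keeping the (row, col) offset with each patch is what later allows per-patch predictions to be placed back at the right location in the full image. Overlapping patches are one common way to avoid splitting a lesion across a patch border, at the cost of more patches to process.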
We then stitched the individual pieces back together into the full mammogram, translating it into a heat map marking out abnormal lesions. Based on the number and intensity of the abnormal lesions found, we then calculated a risk score for cancer.
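As a rough sketch of this step, the code below places per-patch scores into a grid and derives a single number from it. The scoring rule is an assumption made for illustration: the formula actually used is not given here, so a simple "noisy-OR" over patches above a threshold stands in for it. The names build_heat_map and risk_score and the 0.5 threshold are hypothetical.

```python
def build_heat_map(patch_scores, grid_rows, grid_cols):
    """Place per-patch lesion scores back at their grid positions."""
    heat = [[0.0] * grid_cols for _ in range(grid_rows)]
    for (r, c), score in patch_scores.items():
        heat[r][c] = score
    return heat


def risk_score(heat_map, threshold=0.5):
    """Combine the number and intensity of suspicious patches.

    Hypothetical rule: treat each score at or above the threshold as an
    independent probability of a lesion and return the chance that at
    least one of them is real.
    """
    suspicious = [s for row in heat_map for s in row if s >= threshold]
    p_all_clear = 1.0
    for s in suspicious:
        p_all_clear *= (1.0 - s)
    return len(suspicious), 1.0 - p_all_clear


# Made-up scores for a 2x2 grid of patches.
scores = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.2, (1, 1): 0.6}
heat = build_heat_map(scores, grid_rows=2, grid_cols=2)
count, risk = risk_score(heat)
print(count, round(risk, 2))
```

With this rule, more suspicious patches and higher individual scores both push the risk upwards, which matches the intent described above even if the real calculation differs.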
To deal with the issue of non-independence, we used an ensemble learning framework to combine results from four AI agents, each looking at one view of a single breast, to reach a final conclusion. Finally, we supplemented the dataset with other open source data with detailed annotations, describing the lesions down to the pixel level.
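The simplest form of such an ensemble is a weighted average of the per-view scores, sketched below. This is an assumption for illustration, since the ensemble framework is not described in detail here. The view labels use the standard mammography abbreviations CC (craniocaudal) and MLO (mediolateral oblique), and the function name ensemble_score and the example scores are hypothetical.

```python
def ensemble_score(view_scores, weights=None):
    """Combine per-view risk scores into one patient-level score.

    view_scores maps a view name to that view's score in [0, 1].
    weights maps the same names to non-negative weights; if omitted,
    every view counts equally.
    """
    if weights is None:
        weights = {view: 1.0 for view in view_scores}
    total_weight = sum(weights[view] for view in view_scores)
    weighted_sum = sum(view_scores[view] * weights[view] for view in view_scores)
    return weighted_sum / total_weight


# Made-up scores from four single-view models for one patient.
views = {"L-CC": 0.2, "L-MLO": 0.4, "R-CC": 0.6, "R-MLO": 0.8}

equal = ensemble_score(views)
# Hypothetical weighting that trusts the right-breast models twice as much.
weighted = ensemble_score(
    views, {"L-CC": 1.0, "L-MLO": 1.0, "R-CC": 2.0, "R-MLO": 2.0}
)
print(round(equal, 3), round(weighted, 3))
```

A plain average does not capture asymmetry between the two breasts, which radiologists also look for; a learned combiner that takes all four scores as input would be one way to model that.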
With these improvements, we were able to increase the area under the receiver operating characteristic (ROC) curve from an initial 0.5, which is no better than chance, to 0.91, a figure close to human accuracy. This result puts us among the top five teams in the world, and we are now collaborating with the other teams to further improve accuracy.
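For readers unfamiliar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen cancer case receives a higher score than a randomly chosen non-cancer case. The sketch below computes it directly from that definition in plain Python; the function name roc_auc and the example labels and scores are illustrative and not taken from the challenge data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 (cancer) or 0 (non-cancer); scores are model outputs.
    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
auc_example = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])      # one pair misordered
auc_uninformed = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])    # same score for all
auc_perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])       # perfect ranking
print(auc_example, auc_uninformed, auc_perfect)
```

A model that gives every case the same score lands at 0.5, which is why that value corresponds to a coin flip, while a model that ranks every cancer case above every non-cancer case reaches 1.0.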
Lessons learnt and an open invitation
To summarise what we learnt from this experience, I would boil it down to three main points:
Quality, not just quantity. Although more healthcare data is becoming available, this data may not be enough compared to the complexity of human beings. We need high-quality, curated and well-annotated data—not just large amounts of raw data. This, however, requires a huge commitment from the physicians and healthcare workers, which brings me to my next point!
Involve experts early. Unlike other image classification or machine learning tasks, healthcare is very domain-specific, and it is often difficult for a layman to understand the pain points faced by medical practitioners. On the other hand, while physicians are very intelligent people, not all of them are trained in computer science or have an in-depth knowledge of AI. The challenge is for computer scientists and physicians to come together to understand clinical problems and translate them into AI-learning problems.
Never underestimate the complexity of the human body. Although AI agents like AlphaGo and AlphaZero have already surpassed human abilities, healthcare AI has a more challenging goal. Go may be a complicated game, but all its rules are well understood and explicitly known; in contrast, the mysteries of the human body are profound. We know much more today than a decade ago, but we are still trying to infer the ‘rules’ of the human body from the data.
I hope my sharing of these valuable lessons has been helpful, and I invite you to put what you have learnt into practice at the NUS-NUH-MIT Healthcare AI Datathon happening on 6-8 July 2018. Whether you are a physician with a problem that could be solved with AI, a data scientist looking for a clinical perspective, or an entrepreneur looking for fresh ideas, the Datathon will provide ample opportunities to meet collaborators.