Convolutional Neural Networks in Biomedical Image Processing: A Review
Abstract
Purpose of Review
The goal of this review is to investigate the current developments of Convolutional Neural Networks (CNNs) within biomedical imaging applications. The review will begin with an overview of what CNNs are and how they are different from other machine learning (ML) techniques.
Recent Findings
The most cutting-edge research involves implementing the attention mechanism in combination with other networks, such as U-Net (creating so-called “ensemble models”). These aim to combine the best aspects of two or more neural networks, leveraging the strengths of each to create one better model. The two leading ensemble networks in this space are UNETR and Swin UNETR. These techniques apply CNNs in combination with attention mechanisms to yield results that are currently state-of-the-art.
Summary
From this review, we can see that CNNs have demonstrated effectiveness in aiding radiologists and clinicians in a wide range of biomedical imaging tasks. Furthermore, we can see that even with the advent of new ML architectures, most feature a CNN backbone, highlighting the effectiveness of CNNs, both currently and likely for the future.
Introduction
Imaging forms a crucial part of our medical system. Both computed tomography (CT) and magnetic resonance (MR) imaging modalities allow clinicians to obtain a vast amount of data, such as imaging of tumors throughout the body and within various organs. While the sheer number of images is helpful, it also poses a challenge for the industry. On the one hand, the vast amounts of data mean that radiologists quickly become fatigued and overwhelmed. Up to 30% of radiological errors are attributed to radiologist fatigue1.
On the other hand, such large quantities of image data have provided researchers with large numbers of testing datasets. Many imaging datasets have been assembled as a result, such as The Cancer Imaging Archive (TCIA)2 and the Medical Segmentation Decathlon (MSD)3. The development of these large datasets has also fostered large competitions, leading to increased attention from researchers and the broader medical imaging community as a whole.
Convolutional Neural Networks
Motivation
Traditional machine learning (ML) techniques include support vector machines (SVMs), decision tree learning, and random forests4–6. The main disadvantage of these techniques is that they require the necessary features to be extracted and placed into the network before training. A challenge arises when ample data is available, but the extraction of such features is either unknown or difficult. An example of this is the usage of libraries such as PyRadiomics7. This approach involves passing images through many different feature extractors, such as first-order features, shape features, and the Gray Level Co-occurrence Matrix. These features are then passed into a network, such as a multi-layered perceptron8.
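As an illustration of this classical workflow, the following is a minimal sketch of extracting handcrafted features with PyRadiomics7. The feature classes mirror those named above, while the file paths are placeholders for a scan and its region-of-interest mask.

```python
from radiomics import featureextractor

# Configure a PyRadiomics extractor with a few classical feature classes.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")  # first-order intensity statistics
extractor.enableFeatureClassByName("shape")       # geometric descriptors
extractor.enableFeatureClassByName("glcm")        # Gray Level Co-occurrence Matrix

# Placeholder paths: the image and its region-of-interest mask.
features = extractor.execute("image.nrrd", "mask.nrrd")
for name, value in features.items():
    print(name, value)
```

Every one of these features must be chosen before training, which is precisely the weakness discussed next.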
However, this process has a significant weakness: it assumes that the filters the images are passed through capture learnable features (i.e. portions of the image that help the model make predictions). Additionally, each image requires significant pre-processing, which adds computational time that is not spent on learning. The chosen filters also run the risk of not being relevant to the final network at all, leading to wasted computational time.
This is where convolutional neural networks (CNNs) come in. They aim to solve these weaknesses by taking in the raw image only and learning filters dynamically. This leads to less image pre-processing time, as only the image is passed to the network instead of image features. Features are also extracted during the learning process, as opposed to extracting many features a priori and hoping that some of them capture relevant information.
CNN Architecture
In CNNs, the only input to the network is an image. Typically, these images are split up into tensors of dimension \(n \times m \times d\), with each picture being \(n \times m\) pixels and having a color channel of \(d\), which is one for a grayscale image and three for a typical red, green, blue (RGB) image, as can be seen in the popular VGG-x variety of CNNs10.
The Convolutional Layer
In the convolutional layer, a “kernel” or “filter” (a learnable parameter) is passed over each image during the forward pass step of model training. During this convolution step, each filter is slid across the height and width of each image. Since both the images and kernels are 2D arrays, a dot product is computed between the kernel and each image patch it covers. Once a single filter has been passed over a single image, this output is fed into an activation layer.
While the description above considers only a single filter, it is important to note that there are many convolutional layers, and therefore many different filters, that are passed over each image during the forward pass. This in turn directly solves one of the key issues brought up during the previous discussion on feature engineering: namely, whether the filters used during pre-processing are representative of the dataset. Since the filters are learned throughout the training process, not only does no pre-processing need to be computed, the specific filters do not need to be known a priori. Instead, one can specify the number of filters in each convolutional layer, which determines how many distinct features the network can learn10.
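To make the sliding-window dot product concrete, here is a minimal NumPy sketch of a single 2D convolution (strictly speaking, the cross-correlation that most frameworks implement as “convolution”). The hand-coded edge-detection kernel stands in for the values a CNN would learn on its own.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no-padding) 2D convolution: slide the kernel over the image
    and take a dot product with each patch it covers."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise product, then sum
    return out

# A classic hand-crafted example: a vertical edge detector. In a CNN the
# kernel values would instead be learned during training.
image = np.random.rand(8, 8)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
print(conv2d(image, kernel).shape)  # (6, 6)
```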
Activation Functions
To capture nonlinearities in data, various activation functions are used. Several are given here for completeness. One of the steps during model training is hyperparameter optimization, whereby several variants of a model are trained simultaneously, each with slight changes. An example would be otherwise identical models, each using a different activation function, such as those listed below.
Sigmoid
The sigmoid function is given as Equation 1:
\[ f(x) = \frac{1}{ 1 + e^{-x} } \tag{1}\] which takes a real number \(x\) and compresses it between 0 and 1.
Hyperbolic Tangent
The hyperbolic tangent (tanh) function is given as Equation 2:
\[ f(x) = \frac{1 - e^{-2x}}{ 1 + e^{-2x} } \tag{2}\] which takes any real number \(x\) and compresses it to the range between -1 and 1.
Rectified Linear Unit (ReLU)
The ReLU function is given as Equation 3:
\[ f(x) = \max \left( 0, x \right) \tag{3}\]
which takes any real number \(x\) and outputs \(x\) if \(x\) is positive, and 0 otherwise. The ReLU is the most common activation function in CNNs due to its fast computation time11.
Softmax
This activation function is primarily used for multi-class classification. It takes the raw network outputs and assigns the output \(y\) a probability of belonging to each class \(i\):
\[ P (y = i) = \frac{ e^{z_i} }{ \sum_{j=1}^{K} e^{z_j} } \tag{4}\] where \(z_i\) is the raw neural network output (i.e. output from the previous layer), and \(K\) is the total number of classes.
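For reference, Equations 1–4 translate directly into a few lines of NumPy. The max-subtraction in the softmax is a standard numerical-stability trick, not part of Equation 4 itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Equation 1: squashes x into (0, 1)

def tanh(x):
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))  # Equation 2: (-1, 1)

def relu(x):
    return np.maximum(0.0, x)  # Equation 3: zero for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()         # Equation 4: outputs sum to 1

logits = np.array([2.0, -1.0, 0.5])
print(softmax(logits))  # a valid probability distribution over three classes
```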
Fully Connected Layer
The fully-connected layer is similar to that in the multilayer perceptron (MLP)8. That is, each input node is connected to every output node, with each output computed as a weighted sum of the inputs plus a bias. This fully-connected layer is then connected to either 1.) a softmax activation function if performing multi-class classification, or 2.) a linear activation function if performing regression12,13.
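Putting the pieces together, the sketch below assembles the layers discussed so far (convolutions, ReLU activations, and a fully connected head) into a small, hypothetical PyTorch classifier. All layer sizes are illustrative choices rather than any published architecture.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """An illustrative grayscale-image classifier: two convolutional
    blocks followed by a fully connected head."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # 32x32 -> 16x16
        )
        # The fully connected layer maps flattened feature maps to class
        # logits; softmax is applied implicitly by nn.CrossEntropyLoss.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = SmallCNN()
logits = model(torch.randn(4, 1, 64, 64))  # batch of four 64x64 grayscale images
print(logits.shape)                        # torch.Size([4, 2])
```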
Data Preprocessing
While CNNs do not require the kind of feature engineering commonly found in MLPs, some amount of image preprocessing is typically done. Such pre-processing steps include random flips, rotations, scaling, and cropping14. When these so-called data augmentation pipelines are applied probabilistically, one can effectively multiply the size of a single dataset with very little additional effort.
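As a concrete example, a probabilistic augmentation pipeline of the kind described above might be assembled with torchvision; the specific probabilities and sizes here are illustrative.

```python
from torchvision import transforms

# Each transform is applied randomly, so every epoch sees a slightly
# different version of each image, effectively multiplying the dataset.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # scaling + cropping
    transforms.ToTensor(),
])
# Usage: tensor = augment(pil_image), typically inside a Dataset's __getitem__.
```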
Award-winning CNN Architectures
Typically, when a CNN (or any neural network) is applied to a new space, existing models are used and then trained on data for the specific use case. In this section, various award-winning CNN architectures are briefly examined.
Architecture | Year | Architecture | Year |
---|---|---|---|
LeNet-515 | 1998 | ResNet16 | 2016 |
AlexNet14 | 2012 | Xception17 | 2017 |
VGGNet18 | 2014 | DenseNet19 | 2017 |
GoogLeNet20 | 2014 | U-Net21 | 2015 |
It is important to note that this list is by no means exhaustive, and more network architectures emerge constantly. What follows are both improvements of the architectures listed in the table above as well as entirely unique architectures.
CNN Applications in Medical Imaging–Classification
COVID-19
Most of the studies in this section aim to solve a major issue with testing that was experienced during the COVID-19 pandemic. Namely, the reverse transcriptase-polymerase chain reaction (RT-PCR) test was the most ordered test22. However, this test is expensive, slow, and was in extremely high demand during the pandemic. As such, many of the following CNN papers attempted to circumvent this issue by using imaging, typically chest X-rays or CT scans of the chest, to diagnose patients much faster than RT-PCR allows.
Twice Transfer Learning
The first study23 under examination proposed a fine-tuned CNN pretrained on the ImageNet dataset using the DenseNet architecture19. After pretraining, the model was fine-tuned on the NIH ChestX-ray14 dataset24, with COVID-19-specific data coming from the authors’ own database, which is not publicly available. As is typical in machine learning, the authors explored many variations of the network, specifically different fine-tuning schemes (e.g. training with only ImageNet, ImageNet and the NIH ChestX-ray14 database, etc.). From this empirical experimentation, the authors showed that transfer learning on the NIH ChestX-ray14 database yielded the best results. While the authors’ claim of nearly 100% accuracy applies only to their training dataset, and therefore says little about generalization, it does show good convergence. The authors do note that more data should be used to better analyze the resulting model’s generalizability.
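A minimal sketch of this twice-transfer-learning setup, assuming torchvision’s pretrained DenseNet-121 (the paper’s exact model variant and class labels may differ):

```python
import torch.nn as nn
from torchvision import models

# Stage 1 "transfer": start from a DenseNet-121 pretrained on ImageNet.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional backbone so only the new head trains.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the ImageNet head with a task-specific one, e.g. three classes
# (hypothetically: normal / other pneumonia / COVID-19).
model.classifier = nn.Linear(model.classifier.in_features, 3)
# Stage 2 would then fine-tune this model on chest X-ray data with a
# standard training loop.
```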
Transfer Learning with AlexNet
Here we examine one of the first studies on using X-ray and CT imaging to help diagnose COVID-1925. This paper used the AlexNet architecture14 with transfer learning to allow the network to learn how to differentiate between patient X-rays and CT scans that do and do not show clinically diagnosed COVID-19. Unlike the previous study, the authors here also release their training dataset as part of their work. In addition, the authors propose a custom CNN that is simpler than AlexNet, ideally to allow for increased throughput. The accuracy of both networks was good (>90%); however, the dataset featured a small number of X-ray images (100), which could lead to the model overfitting and memorizing training data.
Heart Disease
CNN and LSTM
Two or more networks that are combined to make one larger model are called “ensemble networks”26. Here, the authors used a 24-layer CNN in combination with a Bidirectional Long Short-Term Memory (BiLSTM) network to extract features from electrocardiogram (ECG) data. The network was designed to detect atrial fibrillation, which has been shown to increase the risk of heart failure27. The authors used the 2017 PhysioNet/CINC dataset28 and were able to achieve an accuracy of 89.3%. It is important to note that the PhysioNet/CINC dataset is synthetic and the authors employed a custom loss function. While synthetic datasets can be realistic, they can also lead to model results that are not representative of reality. Additionally, custom loss functions are generally not advised due to the resulting difficulty in comparing performance between models (e.g. one with a custom loss function and one with a standard loss function).
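The general shape of such a CNN + BiLSTM ensemble is sketched below in PyTorch. The layer counts, channel sizes, and four-class output are illustrative stand-ins, not the authors’ 24-layer configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Hypothetical CNN + BiLSTM sketch for single-lead ECG classification."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        # 1D convolutions extract local waveform morphology from the ECG.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The BiLSTM models longer-range rhythm (temporal) dependencies.
        self.lstm = nn.LSTM(input_size=64, hidden_size=64,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, x):              # x: (batch, 1, samples)
        feats = self.cnn(x)            # (batch, 64, samples // 4)
        feats = feats.transpose(1, 2)  # (batch, time, channels) for the LSTM
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])     # classify from the final time step

model = CNNBiLSTM()
print(model(torch.randn(8, 1, 3000)).shape)  # torch.Size([8, 4])
```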
ResNets and S12L-ECG
While the previous section only dealt with single-lead ECGs, the authors of this study29 utilized the short-duration, standard, 12-lead ECG (S12L-ECG), which is the typical ECG found in clinical settings. The authors also utilized a dataset with over 2 million labeled S12L-ECG exams. The fundamental network is a variation of the ResNet described in16, with the model capable of classifying six types of ECG abnormalities. The authors were able to achieve an F1 score (a measure of accuracy for multi-class classification tasks) of above 80%. While this result is significant, and the authors did utilize a much larger (and more realistic) dataset than the authors of28, they used their validation dataset exclusively for tuning the network. Typically, it is best practice to utilize a different portion of the dataset for training, tuning, and testing to ensure that the model is not over-fitting and the performance metrics are accurate30.
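That best practice amounts to carving the data into three disjoint subsets, as in this scikit-learn sketch (the array shapes are placeholders standing in for labeled ECG exams):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for labeled ECG exams.
X = np.random.rand(1000, 4096)           # 1000 exams, 4096 samples each
y = np.random.randint(0, 6, size=1000)   # six abnormality classes

# First hold out a test set that is never touched during development...
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
# ...then split the remainder into training and validation (tuning) sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.15, stratify=y_dev, random_state=42)
```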
CNN Applications in Medical Imaging–Segmentation
Broadly, segmentation in medical imaging is defined as being able to isolate a region of interest (ROI) from a patient scan or image. Unlike classification, segmentation focuses on where something is in addition to whether it is present31. Consider, for example, a tumor classification network like those examined in the previous sections: it can only tell you whether or not an image contains a tumor, not where that tumor may be. Segmentation, on the other hand, goes further. Not only does it tell you whether there is a tumor, it outlines (or segments) the tumor from the rest of the image. This is extremely helpful, as combinations of different segmentation planes from the same patient (e.g. axial, sagittal, and coronal scans from CT or MRI) allow for the region of interest (like a tumor) to be visualized as a 3D model, which can then be used in other analysis techniques, such as finite element analysis or computational fluid dynamics.
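The distinction is easiest to see in the shapes of the outputs: classification yields one label per image, while segmentation yields a label per pixel. A toy illustration:

```python
import numpy as np

image = np.random.rand(256, 256)        # a placeholder grayscale scan

# Classification: a single label for the whole image ("tumor present?").
classification_output = 1

# Segmentation: a label for every pixel, outlining where the tumor is.
segmentation_mask = np.zeros((256, 256), dtype=np.uint8)
segmentation_mask[100:140, 90:150] = 1  # hypothetical tumor region

print(f"Tumor covers {int(segmentation_mask.sum())} of "
      f"{segmentation_mask.size} pixels")
```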
Note: Thus far, most of the studies considered have utilized architectures that were not specifically designed for medical image segmentation. An example of this can be found in the previous section, where the authors utilized a ResNet architecture29. While this approach is valid, many of the following architectures utilize U-Net as a backbone, which is a CNN that was designed from the ground up for biomedical image segmentation21. Many of these networks also utilize transformers, which were first described in a landmark paper32 and underpin much of the current public interest in ML. While a description of this network is outside the scope of this review, the interested reader is directed to the corresponding breakthrough paper, as well as the first vision transformer (ViT) paper33.
UNETR
The UNETR architecture34 is one of the first to combine a CNN-based network (U-Net) with a ViT. One of the weaknesses that the authors found in architectures such as U-Net is the inability to learn long-range spatial structures for 3D medical image segmentation. The structure of the ViT33 leads to it naturally being context aware. Therefore, the authors elected to use image patch embeddings (similar to tokens in current large language models) and merge them with a U-Net-based decoder to produce the segmentation output. This allows the U-Net portion to capture local image dependencies, while the transformer portion of the network captures global trends. The authors showed significant progress on multiple datasets for various organ segmentations. While the jump in accuracy is important, transformers do require more compute than traditional CNNs, as highlighted by the significant number of computations in the transformer network32.
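For readers who want to experiment, an off-the-shelf UNETR implementation is available in MONAI. The sketch below is a minimal instantiation with illustrative settings (a single-channel CT volume and 14 output classes), not the paper’s exact configuration.

```python
import torch
from monai.networks.nets import UNETR

model = UNETR(
    in_channels=1,           # e.g. a single-channel CT volume
    out_channels=14,         # e.g. 13 organs + background
    img_size=(96, 96, 96),   # 3D patch fed to the ViT encoder
    feature_size=16,
)

patch = torch.randn(1, 1, 96, 96, 96)  # one 96^3 sub-volume
print(model(patch).shape)              # torch.Size([1, 14, 96, 96, 96])
```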
Swin UNETR
Shifted window (Swin) transformers were proposed as a hierarchical ViT that allows for more efficient computation than traditional ViTs35,36. Here, the authors proposed utilizing a Swin ViT to replace the conventional ViT37. The authors highlighted the various morphological and textural inhomogeneities of brain tumors, which motivated the use of Swin ViTs. The “shifted window” portion of the Swin ViT allows the model to capture local details with small windows (sections) of an image and global dependencies via larger windows. These features are then passed to the U-Net portion of the network, which helps preserve small details and construct the segmentation mask. Additionally, Swin UNETR is designed to handle 3D MRI images, meaning that 3D transformers and 3D convolutions are utilized throughout the network. The authors demonstrated that the Swin UNETR model was able to outperform many other models on the BraTS 2021 dataset38. While Swin UNETR’s performance is exceptional, it is a specialized network designed to perform 3D brain tumor segmentation from 3D MRI images. A potential limitation of this network is its inability to generalize to other organs or imaging modalities; for such tasks, a network such as UNETR34 would likely be better suited.
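MONAI also ships a Swin UNETR implementation. The sketch below pairs it with sliding-window inference, the usual way a full MRI volume (too large for GPU memory in one shot) is segmented patch by patch. Sizes follow a BraTS-style setup (four MRI modalities in, three tumor sub-regions out), and argument names may vary across MONAI versions.

```python
import torch
from monai.inferers import sliding_window_inference
from monai.networks.nets import SwinUNETR

model = SwinUNETR(img_size=(128, 128, 128), in_channels=4, out_channels=3)

volume = torch.randn(1, 4, 240, 240, 155)  # one multi-modal brain MRI
model.eval()
with torch.no_grad():
    # Slide a 128^3 window across the scan and stitch predictions together.
    logits = sliding_window_inference(volume, roi_size=(128, 128, 128),
                                      sw_batch_size=1, predictor=model)
print(logits.shape)  # torch.Size([1, 3, 240, 240, 155])
```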
MedSAM
The so-called Medical Segment Anything Model (MedSAM)39 takes inspiration from Meta AI’s “Segment Anything Model” (SAM)40, but with applications specific to medical imaging. Inspiration from SAM also includes prompt-based segmentation, where the user can provide an input to the model besides an image, allowing for bounding boxes, points, or text throughout the segmentation process. This user-in-the-loop interaction is in stark contrast to nearly all of the studies thus far, highlighting the flexibility of the model in handling various medical tasks. MedSAM was trained on a large (and crucially, image-modality-diverse) dataset of over 1 million image-mask pairs, including modalities such as CT, MRI, and X-rays. Dataset annotations also include anatomical and pathological features in addition to the raw imaging. The authors showed that MedSAM’s performance not only exceeds that of many existing state-of-the-art segmentation foundation models, but also outperforms many specialist models across over 50 internal and external validation tasks. While this is impressive, the authors also point out MedSAM’s imaging modality imbalance, as a majority of the dataset consists of MRI, CT, and endoscopy images. All things considered, the overall performance of MedSAM showcases its capability across a wide breadth and depth of segmentation tasks, and gives confidence that further fine-tuning may lead to even better results.
The Data and Reproducibility Problems
While all of these studies highlighted the application of CNNs (and other networks, such as ViTs, in later sections), it is also crucial to highlight the importance of data in this process, as well as some of the potential issues in this current wave of, bluntly, hype in the ML space. There are two main issues worth noting in this section: datasets and reproducibility.
Dataset Issues
One of the primary issues faced in the ML space is that of the datasets themselves: where they come from, how they are curated, and the potential biases that they contain41. There have been multiple examples of accidental bias in large-scale ML applications. A famous example is Google Photos classifying people with dark skin as “gorillas”42. While this is a relatively harmless example, it is not difficult to extend this to harmful areas, such as mortgage companies turning to ML algorithms to approve or deny borrowers based on their credit score, predictive policing systems inadvertently perpetuating racial bias, or annotation disparities in medical imaging datasets. These are not toy examples; all of these situations have occurred already43–46. As a result, knowing details of datasets that are widely used is extremely important.
Fortunately, there are some techniques in place to help combat such biases. Multiple studies have examined potential standards for categorizing and managing biases in ML41,47. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was used to identify types of biases (specifically biases in data, algorithms, and user interaction) in the literature from 2017 to 202241. Furthermore, the authors pointed out other tools, such as Aequitas and AI Fairness 360, that allow for bias mitigation.
On the government side, authors from the National Institute of Standards and Technology (NIST)47 proposed a socio-technical framework to mitigate bias in ML systems. This framework combines both societal and technical aspects, which is crucial. Bias was grouped into three categories: systemic bias, statistical bias, and human bias. Guidance on reducing these biases and identifying their origins is also given, providing tools for both technical and non-technical individuals. Furthermore, whereas many technical fields focus only on technical solutions, systems that touch both social and technological aspects (which ML systems do) should have a corresponding socio-technical solution, which these authors provide.
Reproducibility
One other major issue with the current ML landscape is that of reproducibility. An example of this is the response by Haibe-Kains and colleagues49 to a study from Google Health48. The response’s lead author, Benjamin Haibe-Kains, noted that the Google Health team gave so little information about their code and its implementation that it seemed more like an advertisement than a paper. Furthermore, the Google Health study was not an isolated incident. Rather, it is part of an observed trend in the industry that many experts agree is worrying50. This, in combination with the fact that many large models are considered black boxes, leads to difficulty in reproducibility and trust, especially when many datasets are proprietary50.
There are some potential solutions on the horizon. Yoshua Bengio, widely considered a leader in the field, organized a reproducibility challenge, where participants try to replicate studies from the top conferences in the ML field (e.g. NeurIPS, ACL, ICML, etc.)50,51. This is also starting to make an impact. As an example, the Swin UNETR paper37 has a page on Papers With Code, an online repository that many researchers use to host competitions and datasets52.
Future Outlooks
The cutting edge of ML in the medical imaging space is currently ViTs, typically in combination with CNNs37. The advent of the transformer architecture has drastically transformed the field as a whole, and as such, it is being applied wherever possible. And for good reason: the Swin UNETR network is the top-ranked multi-organ CT segmentation algorithm as of this writing53.
The field is also at a crossroads with regard to large foundation models versus smaller, more specialized ones. There are popular large language models, such as OpenAI’s ChatGPT and Anthropic’s Claude (among others), that are able to perform a stunning number of tasks. We also saw the emergence of the segmentation equivalent of such a model with MedSAM39. On the other hand, we have models such as Swin UNETR37 that are more specialized. There is still no consensus as to which direction is better, and this remains an active area of research54.
Conclusions
Overall, CNNs form a crucial backbone in biomedical image processing. From their inception to today’s state-of-the-art models, they continue to be used, from basic classification to more advanced 3D image segmentation, and given their demonstrated effectiveness, their use shows no sign of declining. Even with the breakthrough of the transformer architecture, we see CNNs still being used.
The ability of CNNs to learn hierarchical features from raw pixel data removes the need for feature engineering, as the network itself learns important features from its training data. This has led to their widespread adoption in various medical areas, such as imaging, pathology, radiology, and (more recently) radiomics. The versatility of CNNs has also led to many specialized architectures tailored to specific tasks, as was seen in previous sections. Examples include networks tuned to process CT, MRI, or X-ray images. These allow for more personalized medicine, to say nothing of drastically faster disease diagnosis.
In summary, the use of CNNs in biomedical image processing has only expanded over time, and it most likely will continue to expand. As we have seen, the best uses of CNNs have been in ensemble models combined with other network architectures such as vision transformers. While CNNs are only going to be used more frequently in this space, their usage in ensemble models specifically will be a large driver of their adoption, ultimately to help improve the lives of all.
References
Citation
@online{gregory2024,
author = {Gregory, Josh},
title = {Convolutional {Neural} {Networks} in {Biomedical} {Image}
{Processing:} {A} {Review}},
date = {2024-12-24},
url = {https://joshgregory42.github.io/posts/2024-12-24-ml-review/},
langid = {en}
}