Summary: | Many state-of-the-art deep learning models for accurate image segmentation are based on U-Net, a convolutional neural network architecture originally designed for biomedical image segmentation. Despite its widespread adoption in the medical imaging community, U-Net has two major limitations. First, because of its depth and the large number of filters it uses, it has many parameters and requires many floating-point operations (FLOPs). This results in high computational complexity and a large memory footprint, which makes real-time implementation and deployment of U-Net models challenging. Second, the base U-Net model uses a single kernel size (3×3) for every convolution throughout the network. Extracting features at a single spatial extent is adequate only if the size, location, and shape of the salient regions are consistent across all images in a dataset, which is not necessarily true in medical imaging. To address these two limitations, we propose the Efficient Multi-Encoder-Decoder based UNet (EMED-UNet), a novel architecture for efficient medical image segmentation. We evaluated our network on four medical imaging datasets: Montgomery County, Shenzhen CXR, COVID-19 CT LS, and BraTS (Brain Tumor Segmentation). EMED-UNet outperforms U-Net and its variants in accuracy while requiring approximately 77% fewer parameters, 60% fewer FLOPs, and 79.2% less memory than U-Net. These results demonstrate that EMED-UNet is a lightweight and accurate segmentation model that substantially improves upon the base U-Net and, given its lower computational cost, is more practical to deploy.
|
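The following Python sketch makes the first limitation concrete. It counts the learnable weights and approximate FLOPs of standard convolution layers. The layer sizes follow the contracting path of the original U-Net: a single-channel input, 64 filters doubling to 1024, and two 3×3 convolutions with bias per level. The helper names `conv_params`, `conv_flops`, and `unet_encoder_params` are hypothetical. FLOPs are approximated as twice the multiply-accumulate count for a stride-1, "same"-padded convolution. This is an illustrative sketch, not code from EMED-UNet.

```python
# Illustrative sketch (assumptions: stride-1 2D convolutions with bias, "same"
# padding, FLOPs approximated as 2 x multiply-accumulates). Not EMED-UNet code.
from typing import List


def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of learnable weights in a k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv_flops(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Approximate FLOPs of a k x k convolution producing an h x w output map."""
    # One multiply-accumulate per weight per output pixel, counted as 2 FLOPs.
    macs = k * k * c_in * c_out * h * w
    return 2 * macs


def unet_encoder_params(in_channels: int = 1, base: int = 64, depth: int = 5) -> List[int]:
    """Parameters of each double-3x3-conv block on a U-Net-style contracting
    path, with the filter count doubling at every level (64, 128, ..., 1024)."""
    counts = []
    c_in = in_channels
    for level in range(depth):
        c_out = base * 2 ** level
        block = conv_params(3, c_in, c_out) + conv_params(3, c_out, c_out)
        counts.append(block)
        c_in = c_out
    return counts


if __name__ == "__main__":
    blocks = unet_encoder_params()
    for level, n in enumerate(blocks):
        print(f"level {level}: {64 * 2 ** level:5d} filters, {n:,} parameters")
    print(f"total: {sum(blocks):,}")
    # Cost grows with c_in * c_out, so the deepest block dominates.
    print(f"share of deepest block: {blocks[-1] / sum(blocks):.1%}")
    # A single 3x3, 64->64 convolution on a 256x256 feature map.
    print(f"{conv_flops(3, 64, 64, 256, 256):,} FLOPs")
```

Under these assumptions, the five encoder blocks hold about 18.8 million weights. The 1024-filter block alone accounts for roughly 75% of them, because each layer's parameter count scales with the product of its input and output channels. A single 64-to-64 3×3 convolution on a 256×256 map already costs about 4.8 GFLOPs.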