Summary: We propose an encoder–decoder architecture that uses wide and deep convolutional layers combined with different aggregation modules for the segmentation of medical images. First, we obtain a rich representation of features spanning low to high levels and small to large scales by stacking multiple <i>k</i> × <i>k</i> kernels, where each <i>k</i> × <i>k</i> kernel operation is split into <i>k</i> × 1 and 1 × <i>k</i> convolutions. We then introduce two feature-aggregation modules, multiscale feature aggregation (<i>MFA</i>) and hierarchical feature aggregation (<i>HFA</i>), to better fuse information across the layers of the end-to-end network. The <i>MFA</i> module progressively aggregates features and enriches the feature representation, whereas the <i>HFA</i> module merges features iteratively and hierarchically to learn richer combinations of the feature hierarchy. Furthermore, because residual connections are advantageous for assembling very deep networks, we employ <i>MFA</i>-based long residual connections to avoid vanishing gradients along the aggregation paths. A guided block with multilevel convolution also applies effective attention to the features copied from the encoder to the decoder, helping to recover spatial information. The proposed method, combining the feature-aggregation modules with guided skip connections, thus improves segmentation accuracy, achieving a high similarity index with respect to the ground-truth segmentation maps. Experimental results indicate that the proposed model outperforms conventional skin-lesion segmentation methods, with an average accuracy score of 0.97 on the ISIC-2018, PH2, and UFBA-UESC datasets.
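To make the factorized convolution concrete, the following is a minimal sketch assuming PyTorch; the class name <i>FactorizedConv</i>, the channel widths, and the batch-normalization/ReLU placement are illustrative assumptions, not the authors' exact configuration. It shows a <i>k</i> × <i>k</i> kernel split into a <i>k</i> × 1 convolution followed by a 1 × <i>k</i> convolution, with blocks stacked to enlarge the effective receptive field.

```python
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    """A k x k convolution approximated as a k x 1 followed by a 1 x k
    convolution (spatially separable); layer choices are illustrative."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        self.conv = nn.Sequential(
            # k x 1 convolution: pad only along the height dimension
            nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(pad, 0)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            # 1 x k convolution: pad only along the width dimension
            nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, pad)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# Stacking several blocks yields features at progressively larger scales,
# since each block widens the effective receptive field.
stack = nn.Sequential(FactorizedConv(3, 32), FactorizedConv(32, 64))
out = stack(torch.randn(1, 3, 128, 128))  # -> shape (1, 64, 128, 128)
```

One rationale for this factorization is efficiency: a single <i>k</i> × <i>k</i> kernel has <i>k</i>² spatial weights per channel pair, while the <i>k</i> × 1 plus 1 × <i>k</i> pair has only 2<i>k</i>, and the intermediate nonlinearity adds representational depth.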