Abstract
Low-complexity convolutional neural networks have been shown to be sufficient for segmentation of cardiac US images in A2C and A4C views. The performance of 24 varying-complexity implementations of U-Net and DeepLabV3+ (popular segmentation architectures) has been tested on cardiac US data (CAMUS data set) and street view data (Cityscapes data set). The inference speed of the models has also been measured before and after post-training optimization. The models systematically differed in their structural components: the number of layers and convolutional filters as well as the receptive field size. All models were trained to maximize the Dice Coefficient. The Dice Coefficient was consistently high (0.86-0.90) on CAMUS data and low (0.48-0.67) on Cityscapes data for all models. Each ten-fold reduction in the number of model parameters tended to reduce the score by ≈0.01 on CAMUS and by 0.03-0.05 on Cityscapes. Likewise, low-parameter models, especially those based on U-Net, yielded predictions with higher (worse) Hausdorff Distance values. Increasing the receptive field size of the models partially mitigated this effect. Without post-training optimization, the inference speed varied mostly with the number of layers in the networks. The least complex U-Net model was 83% faster than the most complex one; for the DeepLabV3+ models the difference was 53%. With post-training optimization, any reduction in the number of parameters led to increased speed: up to more than 700% for both architecture types.
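The Dice Coefficient used above as both the training objective and the evaluation score can be sketched as follows (a minimal NumPy illustration of the standard metric, not the paper's implementation; the function and variable names are ours):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Coefficient for binary masks: 2*|A ∩ B| / (|A| + |B|).

    Ranges from 0 (no overlap) to 1 (perfect overlap); eps guards
    against division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy 4x4 masks: the prediction overlaps the ground truth on 3 of 4
# foreground pixels, so Dice = 2*3 / (4 + 4) = 0.75.
pred = np.array([[0, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 0]])
target = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]])
print(round(dice_coefficient(pred, target), 3))  # → 0.75
```

In practice, segmentation networks are trained on a differentiable "soft" variant of this score computed on predicted probabilities rather than hard masks.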