Intra-inter Modal Attention Blocks for RGB-D Semantic Segmentation
Soyun Choi, Youjia Zhang, and Sungeun Hong
Inha University
ICMR 2023
Abstract
In this paper, we introduce a novel approach to effectively utilizing both RGB and depth information for semantic segmentation. Our approach, Intra-inter Modal Attention (IMA) blocks, considers both intra-modal and inter-modal aspects of the information and produces better results than prior methods, which focused primarily on inter-modal relationships. The IMA blocks consist of a cross-modal non-local module and an adaptive channel-wise fusion module. The cross-modal non-local module captures both intra-modal and inter-modal variations at the spatial level through inter-modality parameter sharing, while the adaptive channel-wise fusion module refines the spatially correlated features. Experimental results on RGB-D benchmark datasets demonstrate consistent performance improvements over various baseline segmentation networks when the IMA blocks are used. Our in-depth analysis provides comprehensive results on the impact of intra-, inter-, and intra-inter modal attention on RGB-D segmentation.
Overall Framework
Overview of the pluggable IMA module in two-stream networks for RGB-D segmentation. The module has two components: SIM-NL, the cross-modal non-local module, captures the intra- and inter-modal dependencies between RGB and depth images at the spatial level, while ACF, the adaptive channel-wise fusion module, performs channel-wise recalibration. The IMA module can be integrated with various single-stream or two-stream baseline networks.
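The two components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: this page does not give the exact formulation of SIM-NL or ACF, so the function names (`shared_nonlocal`, `channel_fusion`), the scaled dot-product attention, the residual connection, and the squeeze-and-excitation-style gate are all assumptions chosen for illustration. The sketch shows only the general idea: one set of projection weights is shared by both modalities, attention over the concatenated RGB and depth positions covers intra- and inter-modal pairs at once, and a per-channel gate then recalibrates the result.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_nonlocal(rgb, depth, w_q, w_k, w_v):
    """Hypothetical non-local block with weights shared across modalities.

    rgb, depth: (N, C) feature maps flattened over N = H * W positions.
    w_q, w_k: (C, D) projections; w_v: (C, C) so the residual add is valid.
    """
    n = rgb.shape[0]
    tokens = np.concatenate([rgb, depth], axis=0)            # (2N, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v       # same weights for both
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (2N, 2N)
    out = tokens + attn @ v                                  # residual connection
    return out[:n], out[n:], attn

def channel_fusion(rgb, depth, w1, w2):
    """Hypothetical SE-style gate that mixes the two modalities per channel."""
    pooled = np.concatenate([rgb.mean(axis=0), depth.mean(axis=0)])  # (2C,)
    hidden = np.maximum(pooled @ w1, 0.0)                    # ReLU, (R,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))              # sigmoid, (C,)
    return gate * rgb + (1.0 - gate) * depth                 # (N, C)

# Toy usage with random features and weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
H, W, C, D, R = 4, 4, 8, 4, 4
n = H * W
rgb = rng.standard_normal((n, C))
depth = rng.standard_normal((n, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1
w1 = rng.standard_normal((2 * C, R)) * 0.1
w2 = rng.standard_normal((R, C)) * 0.1

rgb_att, depth_att, attn = shared_nonlocal(rgb, depth, w_q, w_k, w_v)
fused = channel_fusion(rgb_att, depth_att, w1, w2)

# For RGB queries: attention mass placed on RGB keys (intra) vs depth keys (inter).
intra_mass = attn[:n, :n].sum(axis=1)
inter_mass = attn[:n, n:].sum(axis=1)
```

In this sketch, the top-left and bottom-right blocks of `attn` hold the intra-modal weights and the off-diagonal blocks hold the inter-modal weights, so `intra_mass + inter_mass` equals 1 for every RGB position.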
Comparison With State-of-the-Art Methods
Qualitative Results
BibTeX
@InProceedings{choi2023intra,
title={Intra-inter Modal Attention Blocks for RGB-D Semantic Segmentation},
author={Choi, Soyun and Zhang, Youjia and Hong, Sungeun},
booktitle={Proceedings of the 2023 ACM International Conference on Multimedia Retrieval},
pages={217--225},
year={2023}
}