Structure-aware multimodal feature fusion for RGB-D scene classification and beyond

While convolutional neural networks (CNNs) have been excellent for object recognition, the greater spatial variability in scene images typically means that the standard full-image CNN features are suboptimal for scene classification. In this article, we investigate a framework allowing greater spati...

Bibliographic Details
Main Authors:	Wang, Anran; Cai, Jianfei; Lu, Jiwen; Cham, Tat-Jen
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2020
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Feature Fusion; Multimodal Analytics
Online Access:	https://hdl.handle.net/10356/138263

Similar Items

Towards robust and efficient multimodal representation learning and fusion
by: Guo, Xiaobao
Published: (2025)

Fusing pairwise modalities for emotion recognition in conversations
by: Fan, Chunxiao, et al.
Published: (2024)

Multimodal sentiment analysis using hierarchical fusion with context modeling
by: Majumder, Navonil, et al.
Published: (2020)

Feature learning for RGB-D scene understanding
by: Wang, Anran
Published: (2016)

Feature fusion with covariance matrix regularization in face recognition
by: Lu, Ze, et al.
Published: (2018)

Multimodal fusion for in-car human action recognition
by: He, Hao
Published: (2024)

Multi-modal sensor fusion-based deep neural network for end-to-end autonomous driving with scene understanding
by: Huang, Zhiyu, et al.
Published: (2022)

Visiting the Invisible: layer-by-layer completed scene decomposition
by: Zheng, Chuanxia, et al.
Published: (2023)

KnowleNet: knowledge fusion network for multimodal sarcasm detection
by: Yue, Tan, et al.
Published: (2023)

Knowledge-based multimodal information fusion for role recognition and situation assessment by using mobile robot
by: Yang, Chule, et al.
Published: (2020)

Pluralistic image completion
by: Zheng, Chuanxia, et al.
Published: (2020)

Boundary-aware feature propagation for scene segmentation
by: Ding, Henghui, et al.
Published: (2020)

Structure-aware fusion network for 3D scene understanding
by: Yan, Haibin, et al.
Published: (2022)

Weakly-supervised 3D hand pose estimation from monocular RGB images
by: Cai, Yujun, et al.
Published: (2020)

Data efficient deep multimodal learning
by: Shen, Meng
Published: (2025)

A novel context-aware multimodal framework for Persian sentiment analysis
by: Dashtipour, Kia, et al.
Published: (2022)

Pluralistic free-form image completion
by: Zheng, Chuanxia, et al.
Published: (2023)

EduBrowser : a multimodal automated monitoring system for co-located collaborative learning
by: Chua, Victoria Yi Han, et al.
Published: (2021)

Multimodal data fusion for object detection under rainy conditions
by: Liu, Ting Tao
Published: (2022)

Autonomous soundscape augmentation with multimodal fusion of visual and participant-linked inputs
by: Ooi, Kenneth, et al.
Published: (2023)

Large multimodal models for visual reasoning
by: Duong, Ngoc Yen
Published: (2024)

Real-time shadow-aware portrait relighting in virtual backgrounds for realistic telepresence
by: Song, Guoxian, et al.
Published: (2023)

A generative model for depth-based robust 3D facial pose tracking
by: Sheng, Lu, et al.
Published: (2020)

Towards unbiased visual emotion recognition via causal intervention
by: Chen, Yuedong, et al.
Published: (2023)

Bridging global context interactions for high-fidelity image completion
by: Zheng, Chuanxia, et al.
Published: (2023)

FASFLNet: feature adaptive selection and fusion lightweight network for RGB-D indoor scene parsing
by: Qian, Xiaohong, et al.
Published: (2023)

Multimodal few-shot classification without attribute embedding
by: Chang, Jun Qing, et al.
Published: (2024)

Sem2NeRF: converting single-view semantic masks to neural radiance fields
by: Chen, Yuedong, et al.
Published: (2023)

Multiple consumer-grade depth camera registration using everyday objects
by: Deng, Teng, et al.
Published: (2020)

Radar gesture recognition using deep learning: a multi-feature fusion approach
by: Wu, Huan
Published: (2025)

Enhanced feature fusion through irrelevant redundancy elimination in intra-class and extra-class discriminative correlation analysis
by: Wu, Zuobin, et al.
Published: (2020)

Shading‐based surface recovery using subdivision‐based representation
by: Deng, Teng, et al.
Published: (2020)

RGB-NIR image fusion
by: Pan, Liangyi
Published: (2021)

An Autonomous Agent for Learning Spatiotemporal Models of Human Daily Activities
by: Gao, Shan, et al.
Published: (2016)

Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing
by: Yu, Zitong, et al.
Published: (2024)

Recovering facial reflectance and geometry from multi-view images
by: Song, Guoxian, et al.
Published: (2023)

Look, read and feel : benchmarking ads understanding with multimodal multitask learning
by: Zhang, Huaizheng, et al.
Published: (2021)

Multi-view fusion-based 3D object detection for robot indoor scene perception
by: Wang, Li, et al.
Published: (2020)

Parallelized two-stage object detection in cluttered RGB-D scenes
by: Popović, Sanja, M. Eng. Massachusetts Institute of Technology
Published: (2014)

A multi-image dataset based on social media
by: Gan, Junjie
Published: (2025)