Research Article | Open Access
Volume 74 | Issue 3 | Year 2026 | Article Id. IJETT-V74I3P122 | DOI: https://doi.org/10.14445/22315381/IJETT-V74I3P122

Multimodal Person Re-Identification using a Lightweight Residual Self-Organizing Maps InceptionNet Framework
Badireddygari Anurag Reddy, Deepika Ghai, Danvir Mandal
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 28 Jul 2025 | 31 Jan 2026 | 06 Feb 2026 | 28 Mar 2026 |
Citation:
Badireddygari Anurag Reddy, Deepika Ghai, Danvir Mandal, "Multimodal Person Re-Identification using a Lightweight Residual Self-Organizing Maps InceptionNet Framework," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 3, pp. 311-335, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I3P122
Abstract
Person Re-Identification (ReID) is a critical task in surveillance and security systems, aiming to match individuals across non-overlapping camera views. Conventional approaches struggle under varying modality inputs, lighting conditions, and occlusion. Multimodal learning has significantly improved ReID performance by incorporating complementary visual, infrared, and skeletal features. Existing models such as DMIRL (Deep Multimodal InceptionNet Representation Learning) achieve improved accuracy through multimodal fusion, but suffer from computational overhead and a lack of adaptability in dynamic real-world settings. Moreover, DMIRL’s reliance solely on Inception-based feature extraction may miss topological feature distributions and inter-modal contextual relationships. This paper introduces RSI-Net, a lightweight yet powerful deep learning framework for person ReID that combines Residual Learning, Self-Organizing Maps (SOMs), and Inception Learning for more effective multimodal feature extraction. The model uses Inception modules to capture scale-variant features, Residual blocks to enable deeper networks, and SOMs to spatially organize latent features across modalities. Attention-based multimodal fusion is applied before training, with joint cross-entropy and triplet loss objectives. RSI-Net is evaluated on the benchmark datasets Market-1501, DukeMTMC-reID, and CUHK03, and its performance is compared with the existing DMIRL model and the baseline using Rank-1 accuracy and mAP while reducing model complexity. The proposed model addresses the limitations of DMIRL, reducing training time by 25% and improving fusion stability with less modality loss.
The proposed model is suitable for real-time deployment in surveillance applications.
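The abstract states that SOMs are used to spatially organize latent features across modalities. The paper's own SOM configuration is not given here, so the following is a minimal NumPy sketch of the general technique: feature vectors are mapped onto a small 2D grid of prototype nodes, with a Gaussian neighbourhood and decaying learning rate. The grid size, schedules, and all hyperparameters below are illustrative assumptions, not RSI-Net's settings.

```python
import numpy as np

def train_som(features, grid_h=4, grid_w=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small SOM grid to latent feature vectors (one row per sample).

    Hypothetical illustration: grid size and learning-rate/neighbourhood
    schedules are arbitrary choices, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = rng.normal(size=(grid_h * grid_w, d))
    # (row, col) coordinate of every grid node, used by the neighbourhood kernel
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)              # linearly decayed learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5  # shrinking neighbourhood radius
        for x in features:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            # Gaussian neighbourhood centred on the BMU's grid position
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def bmu_index(weights, x):
    """Map a feature vector to its best-matching grid node."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

After training, features from the same identity or modality cluster onto nearby grid nodes, which is one way to realise the "topological organization" of latent features that the abstract attributes to the SOM component.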
Keywords
Person Re-Identification, Multimodal Deep Learning, Residual Learning, Self-Organizing Maps, Inception Networks.
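The joint objective mentioned in the abstract combines an identity classification term (cross-entropy) with a metric-learning term (triplet loss). A minimal NumPy sketch of that combination follows; the margin and weighting factor are illustrative assumptions, not values reported for RSI-Net.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample's class logits against an integer label."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinged triplet loss on embedding distances (margin is an assumed value)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(logits, label, anchor, positive, negative, lam=1.0):
    """Weighted sum of the identity and metric terms; lam is a hypothetical weight."""
    return softmax_cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative)
```

The cross-entropy term pushes each embedding toward its identity class, while the triplet term enforces that an anchor lies closer to a same-identity sample than to a different-identity sample by at least the margin; training on both jointly is a standard recipe in ReID pipelines.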
References
[1] Yaobin Zhang et al., “Graph based Spatial-Temporal Fusion for Multi-Modal Person Re-Identification,” Proceedings of the 31st ACM International Conference on Multimedia, pp. 3736-3744, 2023.
[2] Zhen Sun et al., “FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification,” arXiv preprint, pp. 1-22, 2025.
[3] Cuiqun Chen, Mang Ye, and Ding Jiang, “Towards Modality-Agnostic Person Re-Identification with Descriptive Query,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, pp. 15128-15137, 2023.
[4] Can Su et al., “Robust Indoor Person Re-Identification with Multimodal Training,” IEEE Internet of Things Journal, vol. 12, no. 14, pp. 26289-26302, 2025.
[5] Changshuo Wang et al., “Looking Clearer with Text: A Hierarchical Context Blending Network for Occluded Person Re-Identification,” IEEE Transactions on Information Forensics and Security, vol. 20, pp. 4296-4307, 2025.
[6] Shutao Bai, Hong Chang, and Bingpeng Ma, “Incorporating Texture and Silhouette for Video-based Person Re-Identification,” Pattern Recognition, vol. 156, 2024.
[7] Mulham Fawakherji et al., “TextAug: Test Time Text Augmentation for Multimodal Person Re-Identification,” 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, pp. 320-329, 2024.
[8] Xi Yang et al., “TIENet: A Tri-Interaction Enhancement Network for Multimodal Person Reidentification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 6, pp. 9852-9863, 2025.
[9] Moncef Boujou et al., “In-Depth Analysis of GAF-Net: Comparative Fusion Approaches in Video-based Person Re-Identification,” Algorithms, vol. 17, no. 8, pp. 1-26, 2024.
[10] Zi Wang et al., “Heterogeneous Test-Time Training for Multi-Modal Person Re-Identification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, pp. 5850-5858, 2024.
[11] Aihua Zheng et al., “Robust Multi-Modality Person Re-Identification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, pp. 3529-3537, 2021.
[12] Suncheng Xiang et al., “Deep Multimodal Representation Learning for Generalizable Person Re-Identification,” Machine Learning, vol. 113, no. 4, pp. 1921-1939, 2023.
[13] Xiangtian Zheng et al., “Multi-Modal Person Re-Identification based on Transformer Relational Regularization,” Information Fusion, vol. 103, 2024.
[14] Di Wu et al., “LRMM: Low Rank Multi-Scale Multi-Modal Fusion for Person Re-Identification based on RGB-NI-TI,” Expert Systems with Applications, vol. 263, 2025.
[15] Guang Han et al., “Text-To-Image Person Re-Identification based on Multimodal Graph Convolutional Network,” IEEE Transactions on Multimedia, vol. 26, pp. 6025-6036, 2024.
[16] Yongkang Ding et al., “Decoupling Feature-Driven and Multimodal Fusion Attention for Clothing-Changing Person Re-Identification,” Artificial Intelligence Review, vol. 58, no. 8, pp. 1-26, 2025.
[17] Shizhou Zhang et al., “Prompt-based Modality Alignment for Effective Multi-Modal Object Re-Identification,” IEEE Transactions on Image Processing, vol. 34, pp. 2450-2462, 2025.
[18] He Li et al., “All in One Framework for Multimodal Re-Identification in the Wild,” 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 17459-17469, 2024.
[19] Mingfu Xiong et al., “RFFR-Net: Robust Feature Fusion and Reconstruction Network for Clothing-Change Person Re-Identification,” Information Fusion, vol. 118, 2025.
[20] Yongkang Ding et al., “Attention-Enhanced Multimodal Feature Fusion Network for Clothes-Changing Person Re-Identification,” Complex and Intelligent Systems, vol. 11, no. 1, pp. 1-15, 2024.
[21] Cosimo Patruno et al., “Multimodal People Re-identification using 3D Skeleton, Depth and Color Information,” IEEE Access, vol. 12, pp. 174689-174704, 2024.
[22] Qianqian Wang et al., “Towards Unified Bijective Image-Text Generation for Text-to-Image Person Re-Identification,” Knowledge-based Systems, vol. 325, 2025.
[23] Yongkang Ding et al., “Disentangled Body Features for Clothing Change Person Re-Identification,” Multimedia Tools and Applications, vol. 83, no. 27, pp. 69693-69714, 2024.
[24] Yanbing Chen et al., “Person Re-Identification in Special Scenes based on Deep Learning: A Comprehensive Survey,” Mathematics, vol. 12, no. 16, pp. 1-19, 2024.
[25] Chang Liu, and Shibao Zheng, “Exploring Cross-Domain Techniques in Person Re-Identification: Challenges and Emerging Trends,” 2024 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), Shenzhen, China, pp. 2013-2018, 2024.
[26] Badireddygari Anurag Reddy, Danvir Mandal, and Bhaveshkumar C. Dharmani, “Multimodal Feature-based Deep Learning Framework for Person Re-Identification: Enhancing Models with InceptionNet Representation,” International Journal of Engineering Trends and Technology, vol. 73, no. 7, pp. 34-51, 2025.