Research Article | Open Access
Volume 74 | Issue 3 | Year 2026 | Article Id. IJETT-V74I3P122 | DOI: https://doi.org/10.14445/22315381/IJETT-V74I3P122

Multimodal Person Re-Identification using a Lightweight Residual Self-Organizing Maps InceptionNet Framework
Badireddygari Anurag Reddy, Deepika Ghai, Danvir Mandal
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 28 Jul 2025 | 31 Jan 2026 | 06 Feb 2026 | 28 Mar 2026 |
Citation:
Badireddygari Anurag Reddy, Deepika Ghai, Danvir Mandal, "Multimodal Person Re-Identification using a Lightweight Residual Self-Organizing Maps InceptionNet Framework," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 3, pp. 311-335, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I3P122
Abstract
Person Re-Identification (ReID) is a critical task in surveillance and security systems, aiming to match individuals across non-overlapping camera views. Conventional approaches struggle under varying modality inputs, lighting conditions, and occlusion. Multimodal learning has significantly improved ReID performance by incorporating complementary visual, infrared, and skeletal features. Existing models such as DMIRL (Deep Multimodal InceptionNet Representation Learning) achieve improved accuracy through multimodal fusion, but suffer from computational overhead and a lack of adaptability in dynamic real-world settings. Moreover, DMIRL’s reliance solely on Inception-based feature extraction may miss topological feature distributions and inter-modal contextual relationships. This paper introduces RSI-Net, a lightweight yet powerful deep learning framework for person ReID that combines Residual Learning, Self-Organizing Maps (SOMs), and Inception Learning for more effective multimodal feature extraction. The model uses Inception modules to capture scale-variant features, Residual blocks to enable deeper networks, and SOMs to spatially organize latent features across modalities. Attention-based multimodal fusion is applied before training, with joint cross-entropy and triplet loss objectives. RSI-Net is evaluated on the benchmark datasets Market-1501, DukeMTMC-reID, and CUHK03, and its performance is compared with the existing DMIRL model and the baseline using Rank-1 accuracy and mAP while reducing model complexity. The proposed model addresses the limitations of DMIRL, reducing training time by 25% and improving fusion stability with less modality loss.
The proposed model is suitable for real-time deployment in surveillance applications.
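The abstract states that SOMs are used to spatially organize latent features across modalities. The paper's own SOM configuration is not given here, so the following is a minimal NumPy sketch of the general technique: feature vectors are mapped onto a small 2D grid of prototype nodes, with a Gaussian neighbourhood and decaying learning rate. The grid size, schedules, and all hyperparameters below are illustrative assumptions, not RSI-Net's settings.

```python
import numpy as np

def train_som(features, grid_h=4, grid_w=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small SOM grid to latent feature vectors (one row per sample).

    Hypothetical illustration: grid size and learning-rate/neighbourhood
    schedules are arbitrary choices, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = rng.normal(size=(grid_h * grid_w, d))
    # (row, col) coordinate of every grid node, used by the neighbourhood kernel
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)              # linearly decayed learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5  # shrinking neighbourhood radius
        for x in features:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            # Gaussian neighbourhood centred on the BMU's grid position
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def bmu_index(weights, x):
    """Map a feature vector to its best-matching grid node."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

After training, features from the same identity or modality cluster onto nearby grid nodes, which is one way to realise the "topological organization" of latent features that the abstract attributes to the SOM component.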
Keywords
Person Re-Identification, Multimodal Deep Learning, Residual Learning, Self-Organizing Maps, Inception Networks.
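The joint objective mentioned in the abstract combines an identity classification term (cross-entropy) with a metric-learning term (triplet loss). A minimal NumPy sketch of that combination follows; the margin and weighting factor are illustrative assumptions, not values reported for RSI-Net.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample's class logits against an integer label."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinged triplet loss on embedding distances (margin is an assumed value)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(logits, label, anchor, positive, negative, lam=1.0):
    """Weighted sum of the identity and metric terms; lam is a hypothetical weight."""
    return softmax_cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative)
```

The cross-entropy term pushes each embedding toward its identity class, while the triplet term enforces that an anchor lies closer to a same-identity sample than to a different-identity sample by at least the margin; training on both jointly is a standard recipe in ReID pipelines.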
References
[1] Yaobin Zhang et al., “Graph based Spatial-Temporal Fusion for Multi-Modal Person Re-Identification,” Proceedings of the 31st ACM International Conference on Multimedia, pp. 3736-3744, 2023.
[2] Zhen Sun et al., “FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification,” arXiv preprint, pp. 1-22, 2025.
[3] Cuiqun Chen, Mang Ye, and Ding Jiang, “Towards Modality-Agnostic Person Re-Identification with Descriptive Query,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, pp. 15128-15137, 2023.
[4] Can Su et al., “Robust Indoor Person Re-Identification with Multimodal Training,” IEEE Internet of Things Journal, vol. 12, no. 14, pp. 26289-26302, 2025.
[5] Changshuo Wang et al., “Looking Clearer with Text: A Hierarchical Context Blending Network for Occluded Person Re-Identification,” IEEE Transactions on Information Forensics and Security, vol. 20, pp. 4296-4307, 2025.
[6] Shutao Bai, Hong Chang, and Bingpeng Ma, “Incorporating Texture and Silhouette for Video-based Person Re-Identification,” Pattern Recognition, vol. 156, 2024.
[7] Mulham Fawakherji et al., “TextAug: Test Time Text Augmentation for Multimodal Person Re-Identification,” 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, pp. 320-329, 2024.
[8] Xi Yang et al., “TIENet: A Tri-Interaction Enhancement Network for Multimodal Person Reidentification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 6, pp. 9852-9863, 2025.
[9] Moncef Boujou et al., “In-Depth Analysis of GAF-Net: Comparative Fusion Approaches in Video-based Person Re-Identification,” Algorithms, vol. 17, no. 8, pp. 1-26, 2024.
[10] Zi Wang et al., “Heterogeneous Test-Time Training for Multi-Modal Person Re-Identification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, pp. 5850-5858, 2024.
[11] Aihua Zheng et al., “Robust Multi-Modality Person Re-Identification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, pp. 3529-3537, 2021.
[12] Suncheng Xiang et al., “Deep Multimodal Representation Learning for Generalizable Person Re-Identification,” Machine Learning, vol. 113, no. 4, pp. 1921-1939, 2023.
[13] Xiangtian Zheng et al., “Multi-Modal Person Re-Identification based on Transformer Relational Regularization,” Information Fusion, vol. 103, 2024.
[14] Di Wu et al., “LRMM: Low Rank Multi-Scale Multi-Modal Fusion for Person Re-Identification based on RGB-NI-TI,” Expert Systems with Applications, vol. 263, 2025.
[15] Guang Han et al., “Text-To-Image Person Re-Identification based on Multimodal Graph Convolutional Network,” IEEE Transactions on Multimedia, vol. 26, pp. 6025-6036, 2024.
[16] Yongkang Ding et al., “Decoupling Feature-Driven and Multimodal Fusion Attention for Clothing-Changing Person Re-Identification,” Artificial Intelligence Review, vol. 58, no. 8, pp. 1-26, 2025.
[17] Shizhou Zhang et al., “Prompt-based Modality Alignment for Effective Multi-Modal Object Re-Identification,” IEEE Transactions on Image Processing, vol. 34, pp. 2450-2462, 2025.
[18] He Li et al., “All in One Framework for Multimodal Re-Identification in the Wild,” 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 17459-17469, 2024.
[19] Mingfu Xiong et al., “RFFR-Net: Robust Feature Fusion and Reconstruction Network for Clothing-Change Person Re-Identification,” Information Fusion, vol. 118, 2025.
[20] Yongkang Ding et al., “Attention-Enhanced Multimodal Feature Fusion Network for Clothes-Changing Person Re-Identification,” Complex and Intelligent Systems, vol. 11, no. 1, pp. 1-15, 2024.
[21] Cosimo Patruno et al., “Multimodal People Re-identification using 3D Skeleton, Depth and Color Information,” IEEE Access, vol. 12, pp. 174689-174704, 2024.
[22] Qianqian Wang et al., “Towards Unified Bijective Image-Text Generation for Text-to-Image Person Re-Identification,” Knowledge-based Systems, vol. 325, 2025.
[23] Yongkang Ding et al., “Disentangled Body Features for Clothing Change Person Re-Identification,” Multimedia Tools and Applications, vol. 83, no. 27, pp. 69693-69714, 2024.
[24] Yanbing Chen et al., “Person Re-Identification in Special Scenes based on Deep Learning: A Comprehensive Survey,” Mathematics, vol. 12, no. 16, pp. 1-19, 2024.
[25] Chang Liu, and Shibao Zheng, “Exploring Cross-Domain Techniques in Person Re-Identification: Challenges and Emerging Trends,” 2024 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), Shenzhen, China, pp. 2013-2018, 2024.
[26] Badireddygari Anurag Reddy, Danvir Mandal, and Bhaveshkumar C. Dharmani, “Multimodal Feature-based Deep Learning Framework for Person Re-Identification: Enhancing Models with InceptionNet Representation,” International Journal of Engineering Trends and Technology, vol. 73, no. 7, pp. 34-51, 2025.