Incorporation of Reinforcement Learning in Ant Colony Optimization Algorithm: Mathematical Analysis

Nami Susan Kurian; B.Rajesh Shyamala Devi

DOI: https://doi.org/10.14445/22315381/IJETT-V74I5P110

Research Article | Open Access

Volume 74 | Issue 5 | Year 2026 | Article Id. IJETT-V74I5P110

Received	Revised	Accepted	Published
06 Oct 2025	17 Feb 2026	28 Feb 2026	30 May 2026

Citation:

Nami Susan Kurian, B.Rajesh Shyamala Devi, "Incorporation of Reinforcement Learning in Ant Colony Optimization Algorithm: Mathematical Analysis," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 5, pp. 148-168, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I5P110

Abstract

This research investigates the incorporation of Reinforcement Learning (RL) techniques into the metaheuristic Ant Colony Optimization algorithm tuned for the Traveling Salesman Problem (ACO-TSP), applied to data collection in Wireless Sensor Networks (WSNs) using a Mobile Sink (MS). The aim is to enhance adaptiveness, intelligence, and decision-making efficiency. RL is a machine learning approach in which an agent learns through rewards and penalties, making decisions through repeated interaction with its environment. The primary constraint of a WSN is its limited energy, which makes implementation challenging, so resources must be used efficiently to ensure network longevity. Traditional WSN methods rely on scheduled sleep and predefined routes, and they offer low adaptability and no learning capability. Reinforcement learning, by contrast, maximizes network lifetime, improves data collection, and learns from the environment to handle dynamic topologies, which in turn reduces the need for human intervention. This article presents a mathematical analysis of how Q-learning can be used within ACO to find the optimal path. It also analyzes how the mobile sink traverses the best path scheduled by Q-learning. The proposed approach, Ant Colony Optimization with Mobile Sink and Q-learning (ACOMS-Q), is compared with prior research on several metrics and is found to be effective. In ACOMS-Q, the reinforcement learning algorithm learns over time which nodes are active, based on node behavior such as buffer occupancy and energy level, thereby reducing the tour length of the mobile sink, ensuring network longevity, and reducing delay.
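As a rough illustration of the reward-and-penalty learning described above, the following minimal sketch applies the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. It is not the ACOMS-Q formulation from the article. The function name `q_update`, the node labels, the reward value, and the settings `alpha` and `gamma` are all assumptions chosen for the example.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the new Q(s, a).

    q is a nested dict: q[state][action] -> estimated value.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Best value reachable from the next state (0.0 if it has no known actions).
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-node example: states are sensor nodes and an action is
# the next node the mobile sink visits. The reward of 1.0 is an assumption.
q_table = {"A": {"B": 0.0}, "B": {"A": 0.0}}
new_value = q_update(q_table, "A", "B", reward=1.0, next_state="B")
```

In a setting like the one the abstract describes, the reward could plausibly depend on quantities such as buffer occupancy or residual energy, but that mapping is left unspecified here because the abstract does not define it.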

Keywords

Ant Colony Optimization, Mobile Sink, Pheromone Level, Reinforcement Q-Learning, Traveling Salesperson Problem.
