Research Article | Open Access
Volume 74 | Issue 5 | Year 2026 | Article Id. IJETT-V74I5P110 | DOI : https://doi.org/10.14445/22315381/IJETT-V74I5P110
Incorporation of Reinforcement Learning in Ant Colony Optimization Algorithm: Mathematical Analysis
Nami Susan Kurian, B.Rajesh Shyamala Devi
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 06 Oct 2025 | 17 Feb 2026 | 28 Feb 2026 | 30 May 2026 |
Citation :
Nami Susan Kurian, B.Rajesh Shyamala Devi, "Incorporation of Reinforcement Learning in Ant Colony Optimization Algorithm: Mathematical Analysis," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 5, pp. 148-168, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I5P110
Abstract
This research investigates the incorporation of Reinforcement Learning (RL) techniques into the metaheuristic Ant Colony Optimization tuned Traveling Salesman Problem (ACO-TSP) algorithm for data collection in Wireless Sensor Networks (WSNs) using a Mobile Sink (MS), with the aim of enhancing adaptiveness, intelligence, and decision-making efficiency. RL is an approach to machine learning in which an agent learns through rewards and punishments, making decisions through repeated interactions with its environment. The primary constraint of a WSN is its limited energy, which makes implementation challenging, so resources must be used efficiently to ensure network longevity. Traditional WSN methods rely on scheduled sleep and predefined routes, which offer low adaptability and no learning capability. Reinforcement learning can extend network lifetime, improve data collection, and learn from the environment to handle dynamic topologies, which in turn reduces human intervention. This article presents a mathematical analysis of how reinforcement Q-learning can be used in ACO to find the optimal path. It also analyzes how the mobile sink traverses the best path scheduled by Q-learning. The suggested approach, Ant Colony Optimization with Mobile Sink and Q-learning algorithm (ACOMS-Q), is compared with prior research across several metrics and is found to be effective. In ACOMS-Q, the reinforcement learning algorithm learns over time which nodes are active, based on node behavior such as buffer occupancy and energy level, thereby reducing the tour length of the mobile sink, ensuring network longevity, and reducing delay.
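As an illustration of the reward-based learning the abstract describes, the following is a minimal sketch of the standard tabular Q-learning update applied to a single sensor node. It is not the ACOMS-Q formulation from the article: the state encoding (a bucketed energy level paired with a bucketed buffer occupancy), the two actions, the reward value, and the names `q_update`, `greedy_action`, and `ACTIONS` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical action set for a sensor node (assumption, not from the article).
ACTIONS = ("sleep", "active")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state):
    """Pick the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Assumed state: (energy level, buffer occupancy), each reduced to a label.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
state = ("high_energy", "full_buffer")
next_state = ("medium_energy", "empty_buffer")

# Assumed reward: +1 when a node with a full buffer is active for collection.
new_value = q_update(q_table, state, "active", reward=1.0, next_state=next_state)
chosen = greedy_action(q_table, state)
```

Under these assumptions, nodes whose learned value favors `"active"` would be the candidates handed to the ACO tour construction, so the mobile sink visits fewer nodes per round.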
Keywords
Ant Colony Optimization, Mobile Sink, Pheromone Level, Reinforcement Q-Learning, Traveling Salesperson Problem.