Origin Aware Dynamic Load Balancing Algorithm for Performance Enhancement of NUMA Multiprocessor Systems

D. A. Mehta; Priyesh Kanungo

doi:https://doi.org/10.14445/22315381/IJETT-V73I2P102

Research Article | Open Access

Volume 73 | Issue 2 | Year 2025 | Article Id. IJETT-V73I2P102

Received: 02 Aug 2024 | Revised: 11 Dec 2024 | Accepted: 17 Dec 2024 | Published: 21 Feb 2025

Citation:

D. A. Mehta, Priyesh Kanungo, "Origin Aware Dynamic Load Balancing Algorithm for Performance Enhancement of NUMA Multiprocessor Systems," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 2, pp. 9-21, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I2P102

Abstract

The process selection policy of the Linux load balancer ignores the origin of processes when selecting them for migration in NUMA multiprocessor systems. Consequently, migrated processes may experience large memory latencies, and the load balancer may degrade system performance, particularly when the number of memory access levels is large. This paper proposes a novel load balancing algorithm for NUMA multiprocessors that attempts to keep processes on or near their originating nodes, thereby eliminating or minimizing memory access overheads. The result is a significant performance gain over the existing load balancer, ranging from 7% to 23% across various NUMA systems.
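
The idea of origin-aware selection can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in the abstract; it only shows one plausible way to prefer migration candidates whose originating node is close to the destination node. The `Process` record, the `pick_for_migration` helper, and the 4-node distance table are all hypothetical, and the table assumes the common convention in which a distance of 10 denotes local memory.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    origin_node: int   # assumption: node where the process started and holds its memory
    current_node: int  # node whose run queue currently holds the process


def pick_for_migration(candidates, dest_node, distance):
    """Return the candidate whose origin node is closest to dest_node.

    Ties are broken by pid so the choice is deterministic.
    Returns None when there is nothing to migrate.
    """
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: (distance[p.origin_node][dest_node], p.pid))


# Hypothetical 4-node NUMA system with two memory access levels beyond local.
# distance[i][j] is the relative cost for node j to reach memory on node i.
DISTANCE = [
    [10, 20, 30, 30],
    [20, 10, 30, 30],
    [30, 30, 10, 20],
    [30, 30, 20, 10],
]

# Three processes queued on the overloaded node 0, each with a different origin.
busy_queue = [Process(101, 0, 0), Process(102, 2, 0), Process(103, 3, 0)]

# Node 2 is idle: the process that originated on node 2 returns home.
chosen = pick_for_migration(busy_queue, dest_node=2, distance=DISTANCE)
print(chosen.pid)
```

In this sketch an origin-unaware policy could just as easily move process 101 to node 2, leaving it at the largest distance from its memory, whereas the origin-aware choice sends process 102 back to the node it started on.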

Keywords

Dynamic load balancing, Load balancer, NUMA, Scheduling domain, Process migration, Memory access level, Memory access overhead.
