Application of Text Mining in Categorizing Complaints Related to Teaching Materials at XYZ University

© 2025 by IJETT Journal
Volume-73 Issue-11
Year of Publication : 2025
Author : Hanson Geraldi Pardede, Tuga Mauritsius
DOI : 10.14445/22315381/IJETT-V73I11P115

How to Cite?
Hanson Geraldi Pardede, Tuga Mauritsius, "Application of Text Mining in Categorizing Complaints Related to Teaching Materials at XYZ University", International Journal of Engineering Trends and Technology, vol. 73, no. 11, pp. 193-207, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I11P115

Abstract
XYZ University relies on Bahan Ajar (BA), its teaching materials, as the primary learning medium, and BA is mandatory for every student. Despite this, numerous BA-related complaints continue to be reported. Complaint handling at XYZ University currently depends on manual categorization by the customer service team, which leads to delayed complaint resolution, inaccurate problem handling, and potential damage to the university's reputation. This research aims to design and evaluate a model that automatically categorizes BA-related complaints from students. Following the CRISP-DM framework, the study proposes a novel approach that integrates Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) with the Naive Bayes (NB) machine learning algorithm and applies customized hyperparameter combinations to the Neural Network (NN) and Support Vector Machine (SVM) algorithms. The results show that the NN algorithm achieved the best performance with the following hyperparameters: four hidden layers of 512, 256, 128, and 64 neurons in sequence; a dropout rate of 0.4 on each hidden layer; batch normalization applied to each layer; a learning rate of 0.0005; ReLU activation; softmax on the output layer; CrossEntropyLoss as the loss function; the Adam optimizer; and 200 epochs. Model evaluation yielded an accuracy of 0.9196, a precision of 0.9200, a recall of 0.9196, and an F1 score of 0.9196.
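
The four evaluation scores reported above can be illustrated with a minimal sketch. The abstract does not state how per-class scores were averaged across complaint categories; because the reported recall equals the reported accuracy (0.9196), support-weighted averaging is one plausible reading, and it is assumed here. The function name `weighted_metrics`, the category labels, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are weighted by the number of true
    examples in each class; the abstract does not confirm this choice.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        # Guard against empty classes to avoid division by zero.
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical complaint categories, for illustration only.
y_true = ["content", "content", "delivery", "delivery", "delivery", "access"]
y_pred = ["content", "delivery", "delivery", "delivery", "access", "access"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

Under support weighting, recall reduces to the fraction of correctly classified complaints, so it always coincides with accuracy, while precision and F1 can differ slightly, as they do in the reported figures.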

Keywords
Text mining, Machine Learning, Categorization, Hyperparameters, CRISP-DM.
