Application of Text Mining in Categorizing Complaints Related to Teaching Materials at XYZ University

Hanson Geraldi Pardede; Tuga Mauritsius

doi:https://doi.org/10.14445/22315381/IJETT-V73I11P115

Research Article | Open Access

Volume 73 | Issue 11 | Year 2025 | Article Id. IJETT-V73I11P115

Received	Revised	Accepted	Published
25 Jul 2025	30 Oct 2025	10 Nov 2025	25 Nov 2025

Citation:

Hanson Geraldi Pardede, Tuga Mauritsius, "Application of Text Mining in Categorizing Complaints Related to Teaching Materials at XYZ University," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 11, pp. 193-207, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I11P115

Abstract

XYZ University relies on Bahan Ajar (BA, teaching materials) as its primary learning medium, and BA is mandatory for every student. In practice, however, numerous BA-related complaints continue to be reported. Complaints are currently categorized manually by the customer service team, which leads to delayed resolution, inaccurate problem handling, and potential damage to the university's reputation. This research aims to design and evaluate a model that automatically categorizes BA-related complaints from students. Following the CRISP-DM framework, the study proposes a novel approach that integrates Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) with the Naive Bayes (NB) machine learning algorithm and applies customized hyperparameter combinations to the Neural Network (NN) and Support Vector Machine (SVM) algorithms. The NN algorithm achieved the best performance with the following configuration: four hidden layers of 512, 256, 128, and 64 neurons; a dropout rate of 0.4 on each hidden layer; batch normalization applied to each layer; a learning rate of 0.0005; ReLU activation; softmax on the output layer; CrossEntropyLoss as the loss function; the Adam optimizer; and 200 epochs. This model reached an accuracy of 0.9196, a precision of 0.9200, a recall of 0.9196, and an F1 score of 0.9196.
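
The abstract does not state how the per-class scores were averaged across complaint categories. The reported recall matches the accuracy (both 0.9196), which is consistent with support-weighted averaging, since weighted recall always equals accuracy. The following is a minimal sketch of the four metrics under that assumption; the function name `weighted_scores` and the category labels are hypothetical and do not come from the study.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged by class support
    (the abstract does not name the averaging scheme).
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label in labels:
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[label] / n  # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical complaint categories, invented for illustration only.
y_true = ["content", "content", "delivery", "delivery", "delivery", "access"]
y_pred = ["content", "delivery", "delivery", "delivery", "access", "access"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

On this toy data, four of six complaints are classified correctly, so accuracy and weighted recall coincide, while weighted precision differs because it depends on how many predictions each category receives.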

Keywords

Text Mining, Machine Learning, Categorization, Hyperparameters, CRISP-DM.
