Toward Stacking Ensemble Based Bipartite Sentiment Classification of Hindi Movie Review Text
© 2024 by IJETT Journal
Volume-72 Issue-5
Year of Publication: 2024
Authors: Ankita Sharma, Udayan Ghose
DOI: 10.14445/22315381/IJETT-V72I5P134
How to Cite?
Ankita Sharma, Udayan Ghose, "Toward Stacking Ensemble Based Bipartite Sentiment Classification of Hindi Movie Review Text," International Journal of Engineering Trends and Technology, vol. 72, no. 5, pp. 332-345, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I5P134
Abstract
Sentiment analysis has progressed significantly in resource-rich languages such as English, but research in Hindi is still maturing. Despite being the third most spoken language globally, Hindi remains resource-constrained. However, the growing use of technology and Hindi interfaces has produced abundant Hindi text on the web, giving researchers an opportunity to extract valuable insights. The present work evaluates the effectiveness of ensemble learning methods for bipartite (binary) sentiment classification of Hindi Movie Reviews (HMRs), an area that has received relatively little attention from researchers. The study involves manually creating a binary HMR dataset comprising 6,000 reviews, followed by preprocessing and feature extraction. Several individual classification models are first applied to the HMRs; their predictions are then combined through a hard voting ensemble; finally, an integrated two-layered stacking ensemble architecture is proposed and implemented. In the first classification stage, the preprocessed dataset is classified using SVM, RF, DT, and KNN models. The decisions from these four classifiers are then combined to build and optimize the second-level estimators, SVM and MLP. Finally, the meta-classifier provides the final prediction of the bipartite sentiment labels. The results show that the proposed model achieves the highest performance among the evaluated models. The outcomes were further assessed statistically using the Friedman test. The proposed framework achieved the highest rank on both the HMR and IIT-P movie review datasets, which supports the obtained results.
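The layered architecture summarized above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic numeric features stand in for the extracted review features, all hyperparameters are library defaults or arbitrary choices, and the logistic-regression meta-classifier is an assumption, since the abstract does not name the meta-classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for extracted review features (NOT the HMR dataset).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First-stage classifiers named in the abstract: SVM, RF, DT, KNN.
# scikit-learn clones these on fit, so the list can be shared safely.
first_level = [
    ("svm", SVC(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# Baseline: hard (majority) voting over the four first-stage classifiers.
hard_voting = VotingClassifier(estimators=first_level, voting="hard")

# Second level: SVM and MLP trained on the first-level decisions.
# ASSUMPTION: logistic regression as the final meta-classifier.
second_level = StackingClassifier(
    estimators=[
        ("svm2", SVC(random_state=0)),
        ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Full stack: first-level class decisions feed the second-level ensemble.
stacking = StackingClassifier(
    estimators=first_level,
    final_estimator=second_level,
    stack_method="predict",
    cv=3,
)

for name, model in [("hard voting", hard_voting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.3f}")
```

Nesting one `StackingClassifier` as the `final_estimator` of another is the way scikit-learn expresses multiple stacking layers. Passing `stack_method="predict"` forwards hard class decisions from the first stage, which is one possible reading of "decisions" in the abstract.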
Notably, this study is the first to apply a statistical test for supplementary validation in the Hindi review sentiment classification task.
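A Friedman test of the kind mentioned in the abstract could be run as follows. This is a hedged sketch using SciPy: the score matrix holds placeholder values invented purely for illustration (they are not results from the paper), and the model names are hypothetical.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Placeholder accuracies, NOT results from the paper.
# Rows are evaluation blocks (e.g. datasets or folds); columns are models.
models = ["model_a", "model_b", "model_c"]
scores = np.array([
    [0.70, 0.75, 0.80],
    [0.68, 0.74, 0.79],
    [0.72, 0.76, 0.83],
    [0.69, 0.73, 0.81],
])

# Friedman test: one sample per model, all measured on the same blocks.
statistic, p_value = friedmanchisquare(*scores.T)

# Mean rank per model, where rank 1 is the highest score within a block.
ranks = rankdata(-scores, axis=1)
mean_ranks = dict(zip(models, ranks.mean(axis=0)))

print(f"Friedman statistic = {statistic:.3f}, p-value = {p_value:.4f}")
for name, rank in sorted(mean_ranks.items(), key=lambda item: item[1]):
    print(f"{name}: mean rank = {rank:.2f}")
```

A small p-value indicates that the models do not all perform alike across the blocks, and the mean ranks then show which model tends to come out on top.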
Keywords
Ensemble Learning, Hindi, Sentiment Analysis, Statistical Tests, Machine Learning, Movie Reviews.