Novel Method to Annotate Text Properties for Indic Language Kannada

© 2025 by IJETT Journal
Volume-73 Issue-1
Year of Publication : 2025
Author : Vivekananda, K C Ravishankar
DOI : 10.14445/22315381/IJETT-V73I1P126

How to Cite?
Vivekananda, K C Ravishankar, "Novel Method to Annotate Text Properties for Indic Language Kannada," International Journal of Engineering Trends and Technology, vol. 73, no. 1, pp. 313-323, 2025.

Natural Language Processing (NLP) includes tasks such as Machine Translation and Transliteration, which help transfer knowledge across language barriers. To produce good results, language processing tools and algorithms must first analyse the source language, so source language analysis is a preliminary step in most such tasks. Analysing text to understand its syllable structure is an important part of that step. It is more challenging for agglutinative and abugida languages because their syllable structures are complex. In agglutinative languages, multiple morphemes are joined to build long words with complex meanings. In abugida languages, letters are built from sequences of consonants and vowels. Syllabification and syllable segmentation expose these complex syllable structures. This paper proposes a novel approach to annotate and classify the text syllables of Kannada, an abugida-type language, by building a Syntax Directed Translation (SDT) and implementing it using PLY (Python Lex-Yacc).
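
To make the idea of syllable segmentation and annotation concrete, the following is a minimal sketch only. It is not the paper's SDT or its PLY grammar, neither of which is given in this abstract. A plain regular expression from Python's standard library stands in for the lexer and grammar, the function names `segment`, `classify`, and `annotate` are hypothetical, and the labels (`V`, `CV`, `CCV`, `C`, with `M` for anusvara or visarga) are an assumed scheme rather than the authors' classes. The code point ranges come from the Kannada Unicode block (U+0C80 to U+0CFF); nukta and other rare signs are ignored for brevity.

```python
import re

# Character classes from the Kannada Unicode block (simplified assumption).
CONS = "\u0C95-\u0CB9\u0CDE"        # consonants ka..ha, plus the archaic letter U+0CDE
VOWEL = "\u0C85-\u0C94\u0CE0\u0CE1"  # independent vowels
SIGN = "\u0CBE-\u0CCC\u0CD5\u0CD6"   # dependent vowel signs and length marks
VIRAMA = "\u0CCD"                    # halant: suppresses the inherent vowel
MOD = "\u0C82\u0C83"                 # anusvara and visarga

# One orthographic syllable:
#   (consonant + virama)* consonant, then either a closing virama
#   or optional vowel signs and an optional modifier;
#   or an independent vowel with an optional modifier.
SYLLABLE = re.compile(
    f"(?:[{CONS}]{VIRAMA})*[{CONS}](?:{VIRAMA}|[{SIGN}]*[{MOD}]?)"
    f"|[{VOWEL}][{MOD}]?"
)


def segment(text):
    """Split Kannada text into orthographic syllables; other characters are skipped."""
    return SYLLABLE.findall(text)


def classify(syllable):
    """Return an assumed structural label such as 'V', 'CV', 'CCV', or 'C'."""
    consonants = len(re.findall(f"[{CONS}]", syllable))
    if consonants == 0:
        label = "V"                     # independent vowel
    elif syllable.endswith(VIRAMA):
        label = "C" * consonants        # dead consonant(s), no vowel
    else:
        label = "C" * consonants + "V"  # inherent or explicit vowel
    if syllable[-1] in MOD:
        label += "M"                    # anusvara or visarga attached
    return label


def annotate(text):
    """Pair every syllable in the text with its label."""
    return [(s, classify(s)) for s in segment(text)]


if __name__ == "__main__":
    # The word "Kannada" written in Kannada script: ka, na + virama + na, da.
    for syllable, label in annotate("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"):
        print(syllable, label)
```

For the word ಕನ್ನಡ this sketch yields three syllables labelled `CV`, `CCV`, and `CV`; the middle one is a conjunct formed by a consonant, a virama, and a second consonant. A grammar-based implementation such as the PLY one the paper describes would attach these annotations through semantic actions on grammar rules instead of a single pattern.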

NLP, Syllabification, Kannada text processing, Syllable annotation, Syllable classification, Syllable segmentation.
