Automatically generated language learning exercises for Finno-Ugric languages
Abstract
Morphologically rich languages pose a considerable challenge for language learners. Learners must be able to interpret the information encoded in the different word forms of the same root, and to produce the correct form to express a given syntactic function or grammatical relation by conjugating a verb or declining a noun, adjective or pronoun. One way to improve these skills is through exercises that target specific aspects of grammar. This paper presents a language learning application intended to help two groups of learners: learners of Finnish whose L1 is Hungarian, and learners of Hungarian whose L1 is Finnish. The application helps them acquire new vocabulary and practice grammar points that surveys identify as difficult for such learners, while removing the need to create these exercises manually. The application is the result of an ongoing research project in which bilingual translation pairs and additional monolingual data were collected; with the help of automatic methods, these data can be used to build language learning exercises and an online bilingual dictionary. Several linguistic patterns and rules were defined to automatically select example sentences that focus on a given aspect of the target language, and these sentences were automatically annotated with language processing tools. Because the collected data sets are large, only a subset of the analyzed sentences and of the bilingual translation pairs has been manually evaluated to date. The results of this evaluation are discussed in order to estimate the precision of the methodology. To ensure that the information is accurate and the application reliable, only manually validated data is displayed.
Continuous data validation is planned for the project, as each newly validated batch adds more examples and vocabulary items that learners can benefit from.
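The abstract describes selecting example sentences with linguistic patterns and rules over automatically annotated text, and estimating precision from a manually evaluated subset. The following is a minimal Python sketch of that idea, not the project's implementation. The token format with Universal Dependencies-style feature names (e.g. `Case=Par`), the helper names (`feature_rule`, `select_sentences`, `make_gap_exercise`, `precision`) and the gap-fill exercise layout are all hypothetical, assumed here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Token:
    """One annotated token. Hypothetical format: UD-style POS tags and features."""
    form: str
    lemma: str
    pos: str
    feats: Dict[str, str] = field(default_factory=dict)


Sentence = List[Token]
Rule = Callable[[Token], bool]


def feature_rule(pos: str, **feats: str) -> Rule:
    """Return a predicate matching tokens with the given POS tag and features."""
    def match(tok: Token) -> bool:
        return tok.pos == pos and all(tok.feats.get(k) == v for k, v in feats.items())
    return match


def select_sentences(corpus: List[Sentence], rule: Rule) -> List[Sentence]:
    """Keep only sentences containing at least one token that matches the rule."""
    return [sent for sent in corpus if any(rule(tok) for tok in sent)]


def make_gap_exercise(sentence: Sentence, rule: Rule) -> Optional[Dict[str, str]]:
    """Blank out the first matching token and give its lemma as a hint."""
    for i, tok in enumerate(sentence):
        if rule(tok):
            words = [t.form for t in sentence]
            words[i] = "____"
            return {"prompt": " ".join(words), "hint": tok.lemma, "answer": tok.form}
    return None  # no token in this sentence exercises the target feature


def precision(judgements: List[bool]) -> float:
    """Share of manually evaluated selections that were judged correct."""
    if not judgements:
        raise ValueError("no manually evaluated items")
    return sum(judgements) / len(judgements)


if __name__ == "__main__":
    # "Juon kahvia." ('I drink coffee.'): kahvia is the partitive singular of kahvi.
    sentence = [
        Token("Juon", "juoda", "VERB", {"Person": "1", "Number": "Sing"}),
        Token("kahvia", "kahvi", "NOUN", {"Case": "Par", "Number": "Sing"}),
        Token(".", ".", "PUNCT"),
    ]
    partitive = feature_rule("NOUN", Case="Par")
    print(len(select_sentences([sentence], partitive)))  # 1
    print(make_gap_exercise(sentence, partitive))
    # {'prompt': 'Juon ____ .', 'hint': 'kahvi', 'answer': 'kahvia'}
    print(precision([True, True, False, True]))  # 0.75
```

Because each rule is a plain predicate over one annotated token, further grammar points (for example a Hungarian case suffix or a Finnish verb form) could be added as separate rules over the same annotation. Precision here is simply the fraction of evaluated selections that a human judged correct.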
Keywords:
natural language processing, computer-assisted language learning, virtual flashcards, Finno-Ugric languages
Pázmány Péter Catholic University https://orcid.org/0000-0002-2696-7143