Word-based largest chunks for Agreement Groups processing: Cross-linguistic observations

László Drienkó

SZSZC-Jáky Székesfehérvár, Hungary


The present study reports results from a series of computer experiments seeking to combine word-based Largest Chunk (LCh) segmentation and Agreement Groups (AG) sequence processing. The AG model is based on groups of similar utterances that enable combinatorial mapping of novel utterances. LCh segmentation is concerned with cognitive text segmentation, i.e. with detecting word boundaries in a sequence of linguistic symbols. Our observations are based on the text of Le petit prince (The little prince) by Antoine de Saint-Exupéry in three languages: French, English, and Hungarian. The data suggest that word-based LCh segmentation is not very efficient with respect to utterance boundaries; however, it can provide useful word combinations for AG processing. Typological differences between the languages are also reflected in the results.
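The two components named above can be illustrated with a minimal Python sketch. It is not the implementation used in the study, and every definition in it is a simplifying assumption made for illustration: a "largest chunk" is taken to be the longest word sequence starting at the current position that also starts at some other position in the text, an "agreement group" is taken to be the set of equal-length utterances that differ from a base utterance in at most one word, and a novel utterance counts as mappable when each of its words occurs in the same position in some group member. The function names `largest_chunks`, `agreement_group`, and `is_covered` are hypothetical.

```python
def largest_chunks(words):
    """Greedy word-based segmentation (illustrative assumption).

    At each position, take the longest word sequence that also starts at
    some other position in the text; a word with no repeated continuation
    forms a chunk on its own.
    """
    chunks, i, n = [], 0, len(words)
    while i < n:
        best = 1
        for j in range(n):
            if j == i:
                continue
            k = 0
            # Count how many consecutive words match from positions i and j.
            while i + k < n and j + k < n and words[i + k] == words[j + k]:
                k += 1
            best = max(best, k)
        chunks.append(tuple(words[i:i + best]))
        i += best
    return chunks


def agreement_group(base, utterances):
    """Utterances of the same length as `base` that differ from it in at
    most one word (illustrative assumption; the base itself qualifies)."""
    return [
        u for u in utterances
        if len(u) == len(base) and sum(a != b for a, b in zip(u, base)) <= 1
    ]


def is_covered(novel, group):
    """True if every word of `novel` occurs in the same position in some
    member of `group` (illustrative notion of combinatorial mapping)."""
    if not group or any(len(u) != len(novel) for u in group):
        return False
    return all(any(u[p] == w for u in group) for p, w in enumerate(novel))


# Toy data invented for this example, not taken from the study.
text = "the little prince saw the little fox".split()
chunks = largest_chunks(text)

training = [
    ("the", "prince", "laughed"),
    ("the", "fox", "laughed"),
    ("the", "prince", "slept"),
    ("a", "rose", "grew"),
]
group = agreement_group(("the", "prince", "laughed"), training)
covered = is_covered(("the", "fox", "slept"), group)
not_covered = is_covered(("a", "fox", "slept"), group)
```

In this toy run the repeated sequence "the little" is returned as a chunk both times it occurs, and the unseen utterance "the fox slept" is mappable because each of its words appears in the matching position somewhere in the group, whereas "a fox slept" is not.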


cognitive computer modelling, segmentation, syntactic processing, language acquisition



Drienkó, L. (2020). Word-based largest chunks for Agreement Groups processing: Cross-linguistic observations. Linguistics Beyond and Within (LingBaW), 6(1), 60–73. https://doi.org/10.31743/lingbaw.11831

László Drienkó 
SZSZC-Jáky Székesfehérvár http://orcid.org/0000-0002-6749-2017


Contributions to this journal are published under a Creative Commons license.
