Largest-chunking and group formation: Two basic strategies for a cognitive model of linguistic processing
Abstract
The present study aims to shed further light on how Agreement Groups (AG) processing (e.g. Drienkó 2020a) and Largest Chunk (LCh) segmentation (e.g. Drienkó 2018a) can be combined to model the emergence of language. The AG model is based on groups of similar utterances that enable the combinatorial mapping of novel utterances. LCh segmentation is concerned with cognitive text segmentation, i.e. with detecting word boundaries in a sequence of linguistic symbols. Previous cross-linguistic research on French, English, and Hungarian texts (Drienkó 2020b) demonstrated that LCh segmentation is not efficient when words are the basic segmentation units and utterances are the target sequences. However, almost all utterance boundaries were identified, albeit at the expense of inserting relatively many extra boundaries. These extra boundaries delineated recurring fragments that can serve as building blocks for longer utterances. The present analysis of English mother-child data confirms the previous finding that, despite the relatively low efficiency of word-based LCh segmentation with respect to utterance boundaries, LCh segments can still prove to be useful word combinations for AG processing. Furthermore, compared with the previous experiments, the data suggest higher boundary precision (42%) and higher coverage (85%). These findings support the claim that LCh fragments can be useful in linguistic processing (with AGs) and are in line with the view that mother-child language facilitates processing more than other speech contexts do.
Keywords:
Cognitive computer modelling, segmentation, syntactic processing, language acquisition
References
Bagou, O., C. Fougeron, and U. H. Frauenfelder. 2002. Contribution of prosody to the segmentation and storage of “words” in the acquisition of a new mini-language. Speech Prosody 2002, Aix-en-Provence, France, April 11–13, 2002.
Bahlmann, G., and A. D. Friederici. 2006. Hierarchical and linear sequence processing: An electrophysiological exploration of two different grammar types. Journal of Cognitive Neuroscience 18(11): 1829–1842.
Bannard, C., and D. Matthews. 2008. Stored word sequences in language learning. Psychological Science 19(3): 241–248.
Cameron-Faulkner, Th., E. Lieven, and M. Tomasello. 2003. A construction based analysis of child directed speech. Cognitive Science 27: 843–873.
Cutler, A., and D. M. Carter. 1987. The predominance of strong initial syllables in English vocabulary. Computer Speech and Language 2: 133–142.
Cutler, A., and D. G. Norris. 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14: 113–121.
Drienkó, L. 2013a. Distributional cues for language acquisition: a cross-linguistic agreement groups analysis. Poster presentation for the 11th International Symposium of Psycholinguistics, Tenerife, Spain 20–23 March, 2013.
Drienkó, L. 2013b. Agreement groups coverage of mother-child language. Talk presented at the Child Language Seminar, Manchester, UK, 23–25 June, 2013.
Drienkó, L. 2014. Agreement groups analysis of mother-child discourse. In G. Rundblad, A. Tytus, O. Knapton, and C. Tang (eds.), Selected Papers from the 4th UK Cognitive Linguistics Conference, 52–67. London: UK Cognitive Linguistics Association. http://www.uk-cla.org.uk/proceedings/volume_2_36/36-32
Drienkó, L. 2015. Discontinuous coverage of English mother-child speech. Talk presented at the Budapest Linguistics Conference, Budapest, Hungary, 18–20 June, 2015.
Drienkó, L. 2016a. Discovering utterance fragment boundaries in small unsegmented texts. In A. Tanács, V. Varga, and V. Vincze (eds.), XII. Magyar Számítógépes Nyelvészeti Konferencia (12th Hungarian Computational Linguistics Conference), 273–281. http://rgai.inf.u-szeged.hu/mszny2016/
Drienkó, L. 2016b. Agreement groups coverage of English mother-child utterances for modelling linguistic generalisations. Journal of Child Language Acquisition and Development – JCLAD 4(3): 113–158.
Drienkó, L. 2017. Largest chunks as short text segmentation strategy: a cross-linguistic study. In A. Wallington, A. Foltz, and J. Ryan (eds.), Selected Papers from the 6th UK Cognitive Linguistics Conference, 273–292. The UK Cognitive Linguistics Association. http://www.uk-cla.org.uk/files/downloads/15_drienko_273_292.pdf
Drienkó, L. 2018a. Largest-Chunk strategy for syllable-based segmentation. Language and Cognition 10(3): 391–407.
Drienkó, L. 2018b. The effects of utterance-boundary information on Largest-Chunk segmentation. Talk presented at the 20th Summer School of Psycholinguistics, Balatonalmádi, Hungary, 10–14 June, 2018.
Drienkó, L. 2020a. Agreement Groups and dualistic syntactic processing. In A. Haselow and G. Kaltenböck (eds.), Grammar and cognition: Dualistic models of language structure and language processing, 310–354. John Benjamins Publishing Company.
Drienkó, L. 2020b. Word-based largest chunks for Agreement Groups processing: Cross-linguistic observations. Linguistics Beyond and Within (LingBaW) 6(1): 60–73. https://doi.org/10.31743/lingbaw.11831
Finch, S., N. Chater, and M. Redington. 1995. Acquiring syntactic information from distributional statistics. In J. P. Levy, D. Bairaktaris, J. A. Bullinaria, and P. Cairns (eds.), Connectionist models of memory and language, 229–242. London: UCL Press.
Ganger, J., and M. R. Brent. 2004. Reexamining the Vocabulary Spurt. Developmental Psychology 40(4): 621–632.
Harris, Z. S. 1951. Methods in structural linguistics. Chicago, IL, US: University of Chicago Press.
Harris, Z. S. 1952. Discourse analysis. Language 28(1): 1–30.
Harris, Z. S. 1955. From phoneme to morpheme. Language 31: 190–222.
Kiss, G. R. 1973. Grammatical word classes: A learning process and its simulation. Psychology of Learning and Motivation 7: 1–41.
MacWhinney, B. 2000. The CHILDES Project: Tools for analyzing talk. 3rd Edition. Vol. 2: The Database. Mahwah, NJ: Lawrence Erlbaum Associates.
Mattys, S. L., L. White, and J. F. Melhorn. 2005. Integration of multiple speech segmentation cues: a hierarchical framework. Journal of Experimental Psychology: General 134(4): 477–500.
Mintz, T. H. 2003. Frequent frames as a cue for grammatical categories in child directed speech. Cognition 90(1): 91–117.
Newport, E. L. 1990. Maturational constraints on language learning. Cognitive Science 14: 11–28.
Peters, A. 1983. The units of language acquisition. Cambridge: Cambridge University Press.
Redington, M., N. Chater, and S. Finch. 1998. Distributional Information: A Powerful Cue for Acquiring Syntactic Categories. Cognitive Science 22(4): 425–469.
Saffran, J. R., R. N. Aslin, and E. L. Newport. 1996. Statistical learning by 8-month-old infants. Science 274(5294): 1926–8.
Seidl, A., and E. K. Johnson. 2006. Infant word segmentation revisited: edge alignment facilitates target extraction. Developmental Science 9: 565–573.
Sidtis, J. J., D. V. Sidtis, V. Dhawan, and D. Eidelberg. 2018. Switching Language Modes: Complementary Brain Patterns for Formulaic and Propositional Language. Brain Connectivity 8(3): 189–196.
St. Clair, M. C., P. Monaghan, and M. H. Christiansen. 2010. Learning grammatical categories from distributional cues: Flexible frames for language acquisition. Cognition 116(3): 341–360.
Stoll, S., K. Abbot-Smith, and E. Lieven. 2009. Lexically Restricted Utterances in Russian, German, and English Child-Directed Speech. Cognitive Science 33: 75–103.
Strauss, S. 1982. Ancestral and descendent behaviours: The case of U-shaped behavioural growth. In T. G. Bever (ed.), Regressions in mental development: Basic phenomena and theories, 191–220. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Theakston, A. L., E. V. Lieven, J. M. Pine, and C. F. Rowland. 2001. The role of performance limitations in the acquisition of verb-argument structure: an alternative account. Journal of Child Language 28(1): 127–52.
Thiessen, E. D., and J. R. Saffran. 2007. Learning to Learn: Infants’ Acquisition of Stress-Based Strategies for Word Segmentation. Language Learning and Development 3(1): 73–100.
Van Lancker Sidtis, D. 2009. Formulaic and novel language in a ‘dual process’ model of language competence: evidence from surveys, speech samples, and schemata. In R. L. Corrigan, E. A. Moravcsik, H. Ouali, and K. M. Wheatley (eds.), Formulaic Language: Volume 2. Acquisition, loss, psychological reality, functional applications, 151–176. Amsterdam: Benjamins Publishing Co.
Wang, H., and T. H. Mintz. 2010. From linear sequences to abstract structures: Distributional information in infant-directed speech. In J. Chandlee, K. Franich, K. Iserman, and L. Keil (eds.), Proceedings Supplement of the 34th Boston University Conference on Language Development 34 Online Proceedings Supplement. Somerville, MA: Cascadilla Press. http://www.bu.edu/bucld/proceedings/supplement/vol34/
Weisleder, A., and S. R. Waxman. 2010. What’s in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English. Journal of Child Language 37(5): 1089–108.