Acoustic analysis of monophthongs, diphthongs, and triphthongs in Mandarin for 3- to 5-year-old children with articulatory phonological disorders

Ten 3- to 5-year-old children (5M, 5F) who had been diagnosed with articulatory phonological disorders (CWAPD) and were attending a therapy program were recruited to participate in a ‘repeat-after-her’ experiment. They were asked to produce a total of 85 real Mandarin words, including 28 monophthongs, 41 diphthongs, and 16 triphthongs. The results indicated that CWAPD have no problem producing monophthongs. However, attempts to articulate diphthongs and triphthongs induced more errors. CWAPD showed more errors when producing words with 1st sonorant diphthongs than words with 2nd sonorant diphthongs; this is because the least sonorant segment in the last position is prone to distortion. Similar phenomena were found in triphthongs, except with /iai/ and /iou/, which did not show deviant pronunciation. Comparing our results with the information provided by two therapists showed that the participating CWAPD encountered difficulties in producing multi-vowel syllables, where both position and sonority matter. In addition, our results reveal a vowel acquisition order among CWAPD similar to that among normal children.


Introduction
Learning language is an important developmental phase for human beings. All children need to go through and master the process of phonological development in order to acquire their first language (L1). Children who do not succeed in acquiring L1 usually need medical intervention from speech therapists. In addition, children who cannot communicate linguistically might also display some mental or physical issues. Conditions such as cerebral palsy or various kinds of intellectual disability may reduce a child's learning ability, and they go beyond problems with the articulatory organs, which include: the size and shape of the oral cavity; the length of the tongue and other issues with the frenulum; the arrangement of the teeth; the bite of the upper and lower jaws; cleft lip and palate; and poor coordination of lips and tongue. The articulatory phonological disorders examined in this research exclude cases attributable to nerve damage, cognitive or mental impairment, or abnormal physiological structures. Only children who showed symptoms of articulatory phonological disorders were assessed, and it was suggested that they receive further language therapy. As such, participation in this research was limited to children with functional articulation disorders as defined by Bernthal, Bankson, and Flipsen (2009). Children with impaired hearing, cerebral palsy, cleft lip and palate, or intellectual disability were not a part of this study.
This study is organized as follows: relevant studies on CWAPD are presented in Section 2; methodology and experimental design illustrating data collection, procedures, and analysis are presented in Section 3; the findings are presented in Section 4; and Section 5 offers interpretation of the results. Conclusions and suggestions for further research are presented in the final section.

Previous studies
Problems of speech processing involve pronunciation, place, speed, intensity, and coordination. Articulatory disorders can be defined as disorders involving difficulty in controlling the action of speaking and in voicing some sounds. Van Riper (1978) pointed out that the most important cause of such issues is that the speaker identifies the sound poorly: the speaker cannot distinguish between correct and erroneous pronunciation through hearing alone. According to the American Speech-Language-Hearing Association (ASHA, 1993), the most frequent types of articulatory disorders are substitution, omission, distortion, and addition. In 'substitution,' speakers replace one sound with another; 'omission' involves the deletion of sounds in words and sentences; in 'addition,' one or more extra sounds are inserted into a word. The most complex phenomenon is 'distortion,' where the sound is produced partially correctly, meaning that some feature of it is distorted. A number of studies (on English (Dodd et al., 1989); Cantonese; Spanish (Goldstein, 1996); Turkish (Topbas, 1997); German (Fox, 1997); and Putonghua (Mandarin) (Zhu and Dodd, 2000b)) have investigated subgroups of speech disorders, such as articulation, delay, and (in)consistent disorders. All of these studies revealed similar developmental processes among children with speech disorders and normal children from similar language backgrounds. For example, Zhu et al. (2000b) analyzed 33 Putonghua-speaking children with speech disorders and compared their data across language backgrounds. One of their conclusions was that the saliency of the components in the language system determines the order of acquisition. A similar acquisition process was found among normal Putonghua-speaking children. However, none of the above studies has focused solely on the behavior of children with articulatory phonological disorders (CWAPD).
Children come to understand and communicate with language during early childhood, and the development of their language skills follows a particular process. This process also provides a basis for growth in a child's cognitive learning, human interactions, emotional development, and social adaptation. The articulation of vowels is usually considered to be easiest and is acquired earlier than that of consonants. Concerning Mandarin vowels, Zheng-Fen Zhang and Yu-Mei Zhong (1986) demonstrated that a child's learning of phonemes begins with simple vowels. Vowel development starts with monophthongs, which are followed by diphthongs, and the vowel system may be sufficiently mature by the age of 3. In terms of consonantal development, most oral and nasal stops are acquired before the age of 3; laterals and most fricatives are acquired by age 4; and affricates are acquired after the age of 4. To sum up, the capacity to sufficiently articulate all vowels and consonants has developed by the age of 7. Children's consonantal systems, involving the lips, teeth, and tongue, develop after their vowel systems. Since vowel acquisition is complete by the age of 3, and stop consonants are acquired first, we chose stop consonants to construct meaningful syllables for the study of vowel articulation among CWAPD, so as to minimize any unnecessary influence of the consonant on the vowels. Many studies (Wang et al., 1984; Zhang et al., 1986; Zhu et al., 2000a) have revealed that simple consonants, such as stops, both aspirated and unaspirated, are acquired before the age of 3. Furthermore, Zhu et al. (2000a) found that triphthongs and diphthongs induced more systematic errors.
It is well known that the first formant, F1, corresponds to vowel openness. F1 is inversely related to tongue height: open (low) vowels have high F1 frequencies, while closed (high) vowels have low F1 frequencies. The second formant, F2, corresponds to vowel frontness. Back vowels have low F2 frequencies, while front vowels have high F2 frequencies (Pickett, 1999). Since the midpoint of the vowel segment is a steady region for F1 and F2, unaffected by tone, the length of the form-word, and speech tempo (Jeng, 2005), it is reasonable to measure the values of F1 and F2 there in order to examine whether the target (the openness or the backness of the vowel) has been achieved. Furthermore, Cao (2007) reported a perceptual experiment in which the stimuli, 13 diphthongs and triphthongs, were read by 2 males and 2 females from Beijing. She found that a 6:4 duration ratio in the 1st sonorous diphthongs, where the first vowel is more sonorous than the second vowel, was most recognizable. The ratio in the 2nd sonorous diphthongs was 4:6, and that in the triphthongs was 4:4:2.
The primary goal of this study was to see if acoustic evidence drawn from studying CWAPD, especially on the characteristics of the vowels, revealed a pattern. It would be of interest to further examine CWAPD between the ages of 3 and 5 who speak Mandarin as their first language in terms of both vowels and relatively easy consonants. It was observed that CWAPD needed to generalize the way they produced a particular sound during the therapy process to other settings. This means they need to remember how they produce a particular sound through practice and mimicry. Because CWAPD are at the stage of generalization, their pronunciation is unstable and this may explain differences in their performance between the experiment and therapy. As such, it is reasonable to predict that the least salient element within a syllable may see the largest errors among CWAPD.

Participants and Procedure
Fourteen participants were recruited for the experiment: two speech therapists; two normal children (1M (age 5), 1F (age 4.5)); and 10 CWAPD. The CWAPD (5M, 5F, age 3-5) were accompanied by a parent and were asked to repeat after the researcher (the second author) in a quiet room. The stimuli produced by the children were recorded using the Praat computer software package at a sampling rate of 44,100 Hz. Each stimulus was produced twice, and the better example was chosen by the second author. The two normal children, who had no significant abnormalities in oral structure or function and who could speak clearly, underwent the same procedure to provide a control. The two speech therapists, who were responsible for 9 of the 10 CWAPD, were interviewed about the language deficiency of each CWAPD. The language background of the participants is shown in Appendix B.

Measurement
The vowels were manually labelled according to acoustic and perceptual cues, from the onset of F2 to the offset of F1 (Peterson et al., 1960). The duration of each vowel was divided evenly into 10 sections, giving 11 data points. Following Cao (2007) on the duration ratios in the 1st sonorous diphthongs, 2nd sonorous diphthongs, and triphthongs (6:4, 4:6, and 4:4:2 respectively), the method used for the monophthongs was applied to the diphthongs and triphthongs to obtain the midpoint of each vowel segment. Several paired t-tests were run to compare the vowels articulated by each CWAPD with those of a normal child of the same gender.
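The segmentation described above can be illustrated with a minimal sketch. It assumes that each multi-vowel nucleus is split strictly according to the duration ratios cited from Cao (2007) and that the midpoint of each resulting segment is the measurement point; the function names and the example times are hypothetical and are not taken from the study's own scripts.

```python
# Duration ratios cited from Cao (2007) in the text.
RATIOS = {
    "first_sonorous_diphthong": (6, 4),
    "second_sonorous_diphthong": (4, 6),
    "triphthong": (4, 4, 2),
}


def sample_times(start, end, sections=10):
    """Divide [start, end] evenly into `sections` parts and return the
    sections + 1 boundary times (11 data points for 10 sections)."""
    step = (end - start) / sections
    return [start + i * step for i in range(sections + 1)]


def segment_midpoints(start, end, ratio):
    """Split [start, end] according to `ratio` and return the midpoint
    time of each vowel segment."""
    total = sum(ratio)
    duration = end - start
    midpoints = []
    elapsed = 0  # ratio units consumed so far
    for part in ratio:
        seg_start = start + duration * elapsed / total
        seg_end = start + duration * (elapsed + part) / total
        midpoints.append((seg_start + seg_end) / 2)
        elapsed += part
    return midpoints


if __name__ == "__main__":
    # Hypothetical vowel nucleus labelled from 0.10 s to 0.40 s.
    print(sample_times(0.10, 0.40))
    print(segment_midpoints(0.10, 0.40, RATIOS["triphthong"]))
```

Under this reading, the segment midpoints fall at 30% and 80% of the vowel duration for a 6:4 diphthong, at 20% and 70% for a 4:6 diphthong, and at 20%, 60%, and 90% for a 4:4:2 triphthong, so each midpoint coincides with one of the 11 evenly spaced data points.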

Results
The results show that CWAPD have no difficulty pronouncing monophthongs, as there were no significant differences between their vowels and those of normal children. However, a number of errors occurred with diphthongs and triphthongs. When a diphthong has [i] or [u] as its last segment, the least sonorant segment within the syllable, it is pronounced differently from the way normal children pronounce it. By contrast, the diphthongs /ia/, /ie/, /ua/, /uo/, and /ye/ are produced in a similar fashion to those of normal children. Among the triphthongs, /iai/ and /iou/ did not induce any errors, whereas /iau/, /uai/, and /uei/ were problematic. What remains of interest is that the relatively less sonorant segment was usually distorted, whether there were two or three vowels.
In the following figures the blue lines represent the data from either Normal I or Normal II, depending on the gender of the CWAPD, while the red lines represent the children: A, B, C, D, E, F, G, H, I, and J. None of the participants showed any problems producing monophthongs and only the problematic diphthongs and triphthongs are reported.

Monophthongs
Children, even those with articulatory phonological disorders, acquire monophthongs earlier than diphthongs and triphthongs, and we found no significant difference between the normal subjects and the CWAPD in their articulation of monophthongs. This agrees with what a number of scholars (Xu, 1987; Zhu et al., 2000a and 2000b) have observed in normal and abnormal children. As such, this study suggests that CWAPD follow a vowel developmental process similar to that of normal children.

Diphthongs
In analyzing the diphthongs, the slope of each vowel's mid-point (F1 and F2) served as the dependent variable and was compared to that of either Normal I or Normal II, depending on the child's gender.
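As a reading aid for the statistics reported in this section, the following sketch shows how a paired t statistic relates to its summary values. It assumes that each reported M and SD are the mean and sample standard deviation of the paired differences between a CWAPD and the same-gender control, and that t(6) implies seven paired measurements; the source does not state either point explicitly, and the function names are hypothetical.

```python
import math
import statistics


def paired_t(child, control):
    """Return (t, degrees of freedom) for a paired t-test on two
    equal-length lists of measurements (e.g. formant values in Hz)."""
    if len(child) != len(control):
        raise ValueError("paired samples must have the same length")
    diffs = [c - n for c, n in zip(child, control)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def t_from_summary(mean_diff, sd_diff, n):
    """Recover t from the mean and SD of the paired differences."""
    return mean_diff / (sd_diff / math.sqrt(n))


if __name__ == "__main__":
    # Summary values reported for child A's F1 in [ai]; n = 7 is an
    # assumption inferred from the reported degrees of freedom, t(6).
    print(round(t_from_summary(329.714, 119.568, 7), 3))
```

Under these assumptions, the summary values for child A's F1 in [ai] (M=329.714, SD=119.568) reproduce the reported t(6)=7.296 to three decimal places, which suggests that this is how the figures in this section should be read.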
In the production of [ai]: A had F1 of (M=329.714, SD=119.568), t(6)=7.296; B had F2 of (M=1.295, SD=1362.686), t(6)=2.515; C had F1 of (M=235, SD=229.284), t(6)=2.712, and F2 of (M=127, SD=134.284), t(6)=2.712; H had F2 of (M=854.429, SD=643.552), t(6)=3.513; and I had F1 of (M=-380.857, SD=305.233), t(6)=-3.301, and F2 of (M=736.714, SD=599.009), t(6)=3.254, all p<.05. C and I showed difficulty moving upwards and forwards; B and H showed difficulty moving forwards; and only A showed difficulty moving upwards alone. For example, Figure 1 shows that C needs to move the tongue further forward when producing the second segment, [i]. A similar issue was found in the performances of B and H. I's [i] seemed to be deleted, while A's [i] was not produced as high as it should be. Overall, their production did not show a consistent pattern, but rather a distortion of the second vowel. To sum up, 5 of the 10 children had problems producing [ai].

Figure 1: [ai] produced by A, B, C, H, and I (left to right)
In the production of [au]: B had F1 of (M=-326, SD=79.603), t(6)=-10.835, and D had F1 of (M=81.00, SD=52.898), t(3)=4.211, all p<.05. Both B and D had problems in moving upwards. However, Figure 2 indicates that D seemed to have problems with both [a] and [u], while B seems to move more upwards and backwards when producing [u].

Figure 2: [au] produced by B and D
In the production of [ei]: F had F1 of (M=-220, SD=126.815), t(3)=-3.470, p<.05. This means F has a problem moving upward. However, Figure 3 shows that F seems to have problems with both [e] and [i].

Figure 3: [ei] produced by F
In the production of [ou]: B had F1 of (M=-174.167, SD=92.244), t(5)=-4.625, and F2 of (M=-304.167, SD=283.03), t(5)=-2.632; C had F1 of (M=-124, SD=113.37), t(5)=-2.679, and F2 of (M=327, SD=104.384), t(6)=2.822; and F had F1 of (M=-214.167, SD=123.543), t(5)=-4.246, and F2 of (M=-375.333, SD=244.979), t(5)=-3.753, all p<.05. This means that F had a problem in moving upwards and backwards, and B and C in moving upwards. However, C did not produce the back vowels far enough back. Neither [o] nor [u] was produced as it should have been. Similar difficulties with [ou] were shown by B and F, as shown in Figure 4.
To sum up, the slope of the vowel only gives us information on dynamic movement, and a more detailed description of the children's vowel performance is needed. It is hard to generalize a difficulty scale for diphthongs from our 10 CWAPD. It seems that [ai] is more problematic than the other diphthongs, since half of the CWAPD made mistakes with it. However, plotting F1/F2 still reveals a pattern: some small changes need to be made to the articulation of the second vowel. Nevertheless, the CWAPD seemed to be able to acquire [ai] earlier than the other problematic diphthongs.

Triphthongs
In triphthongs, the slope of each vowel's mid-point (F1 and F2) served as the dependent variable and was compared to that of either Normal I or Normal II, depending on the child's gender. This means that two separate paired t-tests covered the performance of one triphthong. Take [...] -14.977; and G in [ai] had F1 of (M=750, SD=45.530), t(3)=3.967, all p<.05. All children had problems moving upwards in the last segment; B also had difficulty moving forwards. Figure 6 shows that A, B, C, and G need to practice more on the height of the last segment within the triphthong [uai].
The slope of the vowel gives us some information, for example that the last segment is usually problematic. However, it is difficult to guide the participating CWAPD to move their tongues in the correct direction, either further forwards or further backwards, solely on the basis of the F1/F2 plots generated in this study, and we suggest that further examination is required. We could only locate the difficulty of a triphthong in its last segment, which is always the least sonorant segment within the syllable. In addition, it seems that [uei] was very problematic, with more than half of the CWAPD making mistakes.

Discussion
Acoustic analysis of CWAPD from 3 to 5 years old has shown that vowels are acquired in order from monophthongs to multiphthongs. This process parallels that of normal children, from a single vowel to several different ones within a syllable in Mandarin (Zhu et al., 2000a). In addition, the least salient sound within a syllable does not necessarily cause errors by itself; rather, errors arise when it is located in the last position, which is usually the least salient position. This means that the primary challenge that CWAPD face centers on the least salient components, both in terms of the nature of the segment and its position. This study also observes that the deviant pronunciations of CWAPD cannot easily be generalized, for example as to which dimension of tongue movement needs more practice.
Omission might seem to explain this phenomenon adequately: a sound in a word is completely deleted, for example [ue] for [uei] (喂), and this is what experienced therapists usually diagnose. In distortion, however, the sound is produced partially correctly and only some feature is distorted, such as [ɕan] for [san] (山). The difference between omission and distortion lies in the observation that the distortion patterns show clearly that the speakers have correctly acquired the metrical structure of diphthongs and triphthongs, i.e. they know that there are two or three vowels/moras involved. When omission occurs, the metrical structure is not fully acquired, meaning the tone is likely not maintained. This study offers some acoustic data to show that CWAPD from 3 to 5 years old tend to distort rather than delete the least sonorous sounds in the last position, even though the therapists had not noted this. As such, emphasizing the distorted sound, i.e. encouraging CWAPD to perceive the difference between the correct sound and the distorted one, may be helpful. Once the diagnosis is correct, therapists can then use more methods to help CWAPD of 3 to 5 years old.
One may argue that the distinction between omission and distortion is important when correcting CWAPD. Our study has shown that different CWAPD may have different problems in producing the relevant sounds. Once one particular distorted sound has been detected, it is easier for therapists to emphasize this sound. For example, if a CWAPD distorts the sound [ai], as shown in Figure 1 with A, focusing on that sound is probably more helpful for him. As such, the contribution of this study is not only to highlight the difficulties shown by CWAPD from 3 to 5 years old, but also to provide acoustic evidence for therapists to help them choose the methods best suited to individual subjects.
Furthermore, this study aims to provide acoustic evidence to examine whether CWAPD reveal a pattern. Table 1 lists all the problematic diphthongs and triphthongs analyzed in our study. F, G, H, and I (all female) made fewer mistakes than A, B, C, D, and E (all male); the female CWAPD seemed to make mistakes in [uei], while only one male CWAPD did (C). Some mirror errors were found, e.g. A and B having problems with [iau] and [uai]. The relationship between the last two segments of a triphthong and a diphthong was also observed. For example, B could not produce [uai] [...] [uei] showed significant differences compared to those of normal children. Their therapists did not consider these sounds to be problematic. Conversely, our study could not successfully pinpoint C's [iau], G's [ai], and H's [ei], which were diagnosed as problematic sounds by their therapists. Generally speaking, our study indicates more problematic sounds than the therapists did.
One might guess that the inconsistency lies in the fact that the production of CWAPD is not stable: when the CWAPD are under the supervision of their therapist, they may be able to pronounce the sound correctly. Once they leave the clinic, the CWAPD may forget how to pronounce the sound correctly, or may not be required to do so, resulting in more errors in our study. However, this does not explain why some sounds diagnosed by therapists as problematic were not seen in our study. [...] [uai] lies in the fact that the latter structure is more complex than the former. Thus, it seems that our analysis provides a way to identify the location of errors, rather than offering an explanation. On the other hand, regarding the difference between the researcher's and the therapists' analyses, the therapists judged whether the sounds were correct in terms of the children's performance, while the analysis with Praat dealt with acoustic signals, reflecting the trajectories of the tongue. One should note that in Peterson and Barney (1952) [...], and [a]: all F1 values were much higher than for normal children and 10- to 12-year-old English L1 children. It seems that children have difficulties in controlling the vocal organs to articulate all vowels at preschool age. One should also note that their F2 value ranges were similar to those of the normal children. This means that CWAPD from 3 to 5 years old may hyperarticulate the sound by over-raising or over-lowering the tongue, while controlling the tongue's extension seems not to be a problem. The results of the analysis with Praat provide some details that describe the subtle differences made by CWAPD of 3 to 5 years old, and these data can provide a reference to help therapists in their diagnoses.