Coarticulation as synchronized dimension-specific sequential target approximation

The term ‘coarticulation’ (‘Koartikulation’) was first proposed by Menzerath and de Lacerda (1933) for the phenomenon whereby the articulatory movement toward the vowel in a consonant-vowel (CV) sequence begins at the same time as that of the consonant (Kühnert & Nolan, 1999).
This classic definition implies that the articulation of C fully overlaps with that of V at the beginning of the syllable; in other words, the two are co-produced (Fowler, 1980).
The C-V overlap, however, implies an inevitable conflict between the articulatory goals of C and V, and this conflict probably underlies coarticulatory resistance (Bladon & Al-Bamerni, 1976; Recasens, 1985), a problem that remains unresolved (Iskarous et al., 2013).

Our hypothesis: Dimension-specific sequential target approximation (DSSTA)

Coarticulation is due to the synchronized onset (co-onset) of consonant and vowel at the beginning of the syllable.
Despite the C-V overlap caused by co-onset, each individual articulatory dimension follows the principle of sequential target approximation: it approaches only one target at a time.
The conflict between articulatory goals during co-onset is resolved by letting each individual dimension of an articulator be controlled by either C or V.
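Sequential target approximation within a single dimension can be sketched numerically. The sketch below assumes, purely for illustration, that each target is approached by a critically damped third-order linear system (the form used in the quantitative Target Approximation, qTA, model), and that position, velocity and acceleration carry over unchanged when the dimension switches from one target to the next. The function names, rate constant, targets and durations are hypothetical; VocalTractLab's own target-approximation model is not reproduced here.

```python
import math
from typing import List, Tuple

State = Tuple[float, float, float]  # (position, velocity, acceleration)


def ta_segment(x0: float, v0: float, a0: float, target: float,
               rate: float, duration: float, dt: float = 0.001) -> Tuple[List[float], State]:
    """Approach one static target with a critically damped third-order system.

    x(t) = target + (c1 + c2*t + c3*t**2) * exp(-rate*t), with c1..c3 chosen so
    that x, x' and x'' at t = 0 equal the incoming state (x0, v0, a0).
    """
    c1 = x0 - target
    c2 = v0 + rate * c1
    c3 = (a0 + 2 * rate * c2 - rate ** 2 * c1) / 2
    n = round(duration / dt)
    xs = [target + (c1 + c2 * (i * dt) + c3 * (i * dt) ** 2) * math.exp(-rate * i * dt)
          for i in range(n + 1)]
    # Analytic velocity and acceleration at the segment end, so the next
    # target starts from exactly this state (no jumps in the trajectory).
    t = n * dt
    e = math.exp(-rate * t)
    p, dp, ddp = c1 + c2 * t + c3 * t * t, c2 + 2 * c3 * t, 2 * c3
    v = (dp - rate * p) * e
    a = (ddp - 2 * rate * dp + rate ** 2 * p) * e
    return xs, (xs[-1], v, a)


def sequential_ta(x0: float, targets: List[Tuple[float, float]], rate: float) -> List[float]:
    """Approach targets strictly one at a time; targets = [(value, duration_s), ...]."""
    state: State = (x0, 0.0, 0.0)
    trajectory: List[float] = []
    for value, duration in targets:
        xs, state = ta_segment(*state, value, rate, duration)
        # Skip the first sample of later segments: it duplicates the previous end.
        trajectory.extend(xs if not trajectory else xs[1:])
    return trajectory


# Hypothetical single dimension: C target 1.0 for 100 ms, then V target 0.4 for 200 ms.
traj = sequential_ta(0.0, [(1.0, 0.1), (0.4, 0.2)], rate=60.0)
```

The dimension first approaches the C target and, from the end of C onward, the V target; at no point does it pursue both at once, which is the property the hypothesis attributes to each individual dimension.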

For example, in a /gV/ sequence, tongue body height can be controlled by the consonant to achieve a velar closure, while tongue body frontness is controlled by the vowel to achieve the appropriate vocal tract shape. This DSSTA strategy may explain why the location of velar contact varies with vowel context (Dembowski et al., 1998).
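The /gV/ example can be made concrete with a toy two-dimensional model. In the sketch below, tongue body height is driven by /g/ toward closure and tongue body frontness is driven by the vowel, both starting at the syllable onset; contact location is read off as the frontness value at the moment height first reaches a closure threshold. All values, rates, the threshold, and the first-order exponential approach are hypothetical simplifications for illustration, not parameters of the actual model.

```python
import math
from typing import Tuple

# Hypothetical normalized values: tb_height 1.0 = velar closure;
# tb_frontness 1.0 = fully front, 0.0 = fully back.
START = {"tb_height": 0.3, "tb_frontness": 0.5}
RATE = {"tb_height": 40.0, "tb_frontness": 20.0}    # 1/s, assumed
C_TARGET = {"tb_height": 1.0}                       # /g/ controls height only
V_TARGET = {"i": {"tb_frontness": 1.0},             # the vowel controls frontness only
            "u": {"tb_frontness": 0.0}}
CLOSURE_THRESHOLD = 0.95                            # height at which contact is made


def approach(x0: float, target: float, rate: float, t: float) -> float:
    """First-order exponential approach to a target (a deliberate simplification)."""
    return target + (x0 - target) * math.exp(-rate * t)


def contact_location(vowel: str, dt: float = 0.0005) -> Tuple[float, float]:
    """Return (time of closure, frontness at closure); both dimensions start at t = 0."""
    t = 0.0
    while approach(START["tb_height"], C_TARGET["tb_height"],
                   RATE["tb_height"], t) < CLOSURE_THRESHOLD:
        t += dt
    frontness = approach(START["tb_frontness"], V_TARGET[vowel]["tb_frontness"],
                         RATE["tb_frontness"], t)
    return t, frontness


for vowel in ("i", "u"):
    t_closure, place = contact_location(vowel)
    print(f"/g{vowel}/: closure at {t_closure * 1000:.1f} ms, contact frontness {place:.2f}")
```

Because height is controlled only by the consonant, closure is reached at the same moment before /i/ and /u/; frontness, controlled only by the vowel, has by then moved toward the vowel target, so the contact lands further forward before /i/ than before /u/.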

Testing DSSTA by acoustically trained articulatory synthesis

VocalTractLab (VTL; Birkholz, 2013), a state-of-the-art articulatory synthesizer, is trained to learn the articulatory parameters of consonants and vowels automatically, using only the acoustic signals of various CV syllables as training data. The process is illustrated in the flow chart below.

B. Specific articulatory dimensions are hypothetically assigned to either C or V, and target approximation in all dimensions starts uniformly at the syllable onset, except in those dimensions whose control shifts to V only after the end of C (top tier in B).
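One reading of step B can be expressed as a target schedule. The sketch below assumes that dimensions assigned to C approach the C target from the syllable onset and hand over to their V target at the end of C, while dimensions assigned to V pursue the V target from the onset; each dimension then has a list of (start, end, target) intervals and is approaching exactly one target at any moment. The dimension names, target values and timing are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float, float]  # (start_s, end_s, target)


def build_schedule(assignment: Dict[str, str], c_targets: Dict[str, float],
                   v_targets: Dict[str, float], c_end: float,
                   syllable_end: float) -> Dict[str, List[Interval]]:
    """Target intervals per dimension under one reading of the DSSTA set-up.

    Every dimension starts approaching its first target at the syllable onset
    (t = 0). "C" dimensions hand over to their V target at c_end; "V" dimensions
    pursue the V target from onset to the end of the syllable.
    """
    schedule: Dict[str, List[Interval]] = {}
    for dim, owner in assignment.items():
        if owner == "C":
            schedule[dim] = [(0.0, c_end, c_targets[dim]),
                             (c_end, syllable_end, v_targets[dim])]
        elif owner == "V":
            schedule[dim] = [(0.0, syllable_end, v_targets[dim])]
        else:
            raise ValueError(f"unknown owner {owner!r} for dimension {dim!r}")
    return schedule


def active_target(schedule: Dict[str, List[Interval]], dim: str, t: float) -> float:
    """Return the single target a dimension is approaching at time t (seconds)."""
    for start, end, target in schedule[dim]:
        if start <= t < end:
            return target
    return schedule[dim][-1][2]  # hold the last target after the syllable ends


# Hypothetical /gi/ set-up; dimension names and values are illustrative only.
assignment = {"tb_height": "C", "tb_frontness": "V", "lip_aperture": "V"}
c_targets = {"tb_height": 1.0}
v_targets = {"tb_height": 0.7, "tb_frontness": 0.9, "lip_aperture": 0.4}
sched = build_schedule(assignment, c_targets, v_targets, c_end=0.08, syllable_end=0.35)
```

Each interval list can then drive a target-approximation trajectory per dimension, so the co-onset of C and V never requires one dimension to serve two targets simultaneously.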

C. In each learning cycle, VTL synthesizes a CV syllable from a random set of parameters, and the mel-frequency cepstral coefficient (MFCC) matrix of the synthetic signal is compared with that of the target natural syllable.
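The comparison in step C can be sketched as a frame-by-frame sum of squared differences between two MFCC matrices. The sketch assumes that MFCCs have already been extracted with identical analysis settings for the synthetic and the natural token (rows are frames, columns are coefficients), and it applies nearest-frame time normalization when the two differ in length; that alignment choice and the function names are our assumptions, not necessarily those used in the project.

```python
from typing import List

Matrix = List[List[float]]  # rows = analysis frames, columns = MFCC coefficients


def time_normalize(m: Matrix, n_frames: int) -> Matrix:
    """Nearest-frame resampling of an MFCC matrix to n_frames rows (assumed alignment)."""
    if len(m) == n_frames:
        return m
    scale = (len(m) - 1) / max(n_frames - 1, 1)
    return [m[min(len(m) - 1, round(i * scale))] for i in range(n_frames)]


def mfcc_sse(synthetic: Matrix, natural: Matrix) -> float:
    """Sum of squared differences between corresponding MFCCs of two tokens."""
    syn = time_normalize(synthetic, len(natural))
    if any(len(a) != len(b) for a, b in zip(syn, natural)):
        raise ValueError("both matrices need the same number of coefficients per frame")
    return sum((a - b) ** 2 for fa, fb in zip(syn, natural) for a, b in zip(fa, fb))


natural = [[1.0, 2.0], [3.0, 4.0]]
print(mfcc_sse([[1.0, 2.0], [3.0, 5.0]], natural))  # prints 1.0
```

A more careful alignment (for instance dynamic time warping) could replace `time_normalize` without changing the rest of the comparison.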

D. The sum of squared errors (SSE) between the two MFCC matrices determines whether the current parameter set yields enough improvement over the previous best set to be adopted.
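Steps C and D together form a random-search loop. The sketch below assumes a relative-improvement rule (a candidate is adopted only if it lowers the best SSE by at least 1%) and proposes random parameter sets as Gaussian perturbations of the current best set, clipped to their bounds. Both choices, the parameter names, and the toy cost function that stands in for "synthesize with VTL, extract MFCCs, compute SSE" are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def accept(candidate_sse: float, best_sse: float, min_rel_gain: float = 0.01) -> bool:
    """Adopt a candidate only if it lowers the best SSE by at least min_rel_gain."""
    return candidate_sse < best_sse * (1.0 - min_rel_gain)


def random_search(cost: Callable[[Params], float], bounds: Dict[str, Tuple[float, float]],
                  n_cycles: int, step: float = 0.1,
                  seed: int = 0) -> Tuple[Params, float, List[float]]:
    """Analysis-by-synthesis loop: propose random parameters, keep clear improvements."""
    rng = random.Random(seed)
    best = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
    best_sse = cost(best)
    history = [best_sse]
    for _ in range(n_cycles):
        # Gaussian perturbation of the current best set, clipped to the bounds.
        cand = {k: min(hi, max(lo, best[k] + rng.gauss(0.0, step * (hi - lo))))
                for k, (lo, hi) in bounds.items()}
        sse = cost(cand)  # stands in for: synthesize, extract MFCCs, compare
        if accept(sse, best_sse):
            best, best_sse = cand, sse
        history.append(best_sse)
    return best, best_sse, history


# Toy cost with a known optimum, in place of the real synthesis-and-compare step.
TRUE = {"p1": 0.3, "p2": 0.7}


def toy_cost(p: Params) -> float:
    return sum((p[k] - TRUE[k]) ** 2 for k in TRUE)


best, best_sse, history = random_search(toy_cost, {"p1": (0.0, 1.0), "p2": (0.0, 1.0)}, 500)
```

The recorded `history` is the best-so-far SSE per cycle, which is the quantity one would monitor for convergence.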

This analysis-by-synthesis learning typically takes many cycles. Once learning has converged, that is, once the SSE has dropped substantially and stabilized, the final synthetic CV syllable is evaluated perceptually for both intelligibility and naturalness.
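A convergence test of the kind described here can be sketched on the best-so-far SSE history. The function name and thresholds below (an overall reduction of at least 50%, and less than 1% relative change over the last 200 cycles) are illustrative assumptions, not the project's documented criteria.

```python
from typing import Sequence


def has_converged(history: Sequence[float], window: int = 200,
                  min_reduction: float = 0.5, tol: float = 0.01) -> bool:
    """Heuristic convergence test on a best-so-far SSE history (assumed criteria).

    Significant reduction: the latest SSE is at most (1 - min_reduction) times
    the first SSE. Stabilization: over the last `window` cycles the SSE changed
    by less than `tol` relative to its value at the start of that window.
    """
    if len(history) <= window or history[0] <= 0:
        return False
    reduced = history[-1] <= (1.0 - min_reduction) * history[0]
    start = history[-window - 1]
    stable = start == 0 or abs(start - history[-1]) / start < tol
    return reduced and stable


decaying_then_flat = [0.9 ** i for i in range(50)] + [0.9 ** 49] * 300
still_dropping = [0.99 ** i for i in range(400)]
print(has_converged(decaying_then_flat), has_converged(still_dropping))  # True False
```

Requiring both conditions avoids stopping early on a plateau that has not yet improved much, and avoids stopping while the SSE is still falling steadily.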

Sample synthetic sounds

/bV/ syllables
Synthetic and natural tokens: bead, bid, bed, bad, bod, booed, bud

/dV/ syllables
Synthetic and natural tokens: deed, did, dead

/gV/ syllables
Synthetic and natural tokens, each with a spectrogram: get, good, god

References

Bladon, R. A. W. and Al-Bamerni, A. (1976). Coarticulation resistance of English /l/. Journal of Phonetics 4: 135-150.
Dembowski, J., Lindstrom, M. J. and Westbury, J. R. (1998). Articulator point variability in the production of stop consonants. In M. P. Cannito, K. M. Yorkston and D. R. Beukelman (Eds.), Neuromotor speech disorders: Nature, assessment, and management. Baltimore: Paul H. Brookes, pp. 27-46.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics 8: 113-133.
Iskarous, K., Mooshammer, C., Hoole, P., Recasens, D., Shadle, C. H., Saltzman, E. and Whalen, D. H. (2013). The coarticulation/invariance scale: Mutual information as a measure of coarticulation resistance, motor synergy, and articulatory invariance. Journal of the Acoustical Society of America 134(2): 1271-1282.
Kühnert, B. and Nolan, F. (1999). The origin of coarticulation. In W. J. Hardcastle and N. Hewlett (Eds.), Coarticulation: Theory, Data and Techniques. Cambridge: Cambridge University Press, pp. 7-30.
Menzerath, P. and de Lacerda, A. (1933). Koartikulation, Steuerung und Lautabgrenzung. Berlin and Bonn: Ferd. Dümmlers.
Recasens, D. (1985). Coarticulatory patterns and degrees of coarticulatory resistance in Catalan CV sequences. Language and Speech 28: 97-114.

© 2019 AAVL Project Team