qTAtrainer (previously PENTAtrainer1) ---- A Praat script for automatic analysis and synthesis of intonation based on the qTA model, working on individual sound files (Version 1.2) [Download]

by Yi Xu and Santitham Prom-on

An interactive Praat script that allows you to:

[Examples: Original* and Resynthesis]

* Original sound from ToBI training web site: www.ling.ohio-state.edu/research/phonetics/E_ToBI/


Explanation

This script is for automatic extraction of pitch target parameters. A pitch target is the ideal f0 trajectory associated with a segmental unit, and it is defined by three parameters: slope, height and strength. The target notion is the core of the PENTA model (Parallel Encoding and Target Approximation, cf. Xu, 2005). The current script is based on the implementation of Prom-on, Xu and Thipakorn (2009).

In qTA, a target is defined by the linear equation f0 = mt + b, where f0 is the surface f0, m is the slope of the target and b is the height of the target defined as the intercept of the target offset with the y-axis. The surface f0 is the outcome of sequential asymptotic approximation of successive pitch targets based on a critically damped 3rd-order linear system.
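One way to write the response of such a system within a single target interval is sketched below, following the closed form usually given for qTA (cf. Prom-on, Xu & Thipakorn, 2009). Here t is time from the interval onset, λ is the strength parameter (the rate of target approximation), and the coefficients c1 to c3 are fixed by the f0 level, velocity and acceleration carried over from the end of the preceding interval. The symbol names are assumptions for illustration and may differ from those used inside the script.

```latex
\begin{align*}
x(t)   &= m t + b \\
f_0(t) &= x(t) + \left(c_1 + c_2 t + c_3 t^2\right) e^{-\lambda t} \\
c_1    &= f_0(0) - b \\
c_2    &= f_0'(0) + c_1 \lambda - m \\
c_3    &= \tfrac{1}{2}\left(f_0''(0) + 2 c_2 \lambda - c_1 \lambda^2\right)
\end{align*}
```

As t grows, the exponential term vanishes and f0(t) approaches the target line mt + b; a larger λ means the target is approached faster.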

The extraction of target parameters in this script is done by analysis-by-synthesis. For each target interval, the script steps through all combinations of the three parameters within the search range, at a fixed step size, and generates an f0 contour for each combination based on qTA. The difference between each synthesized contour and the original contour is computed as the sum of squared errors (SSE), and the parameter set with the smallest SSE is chosen as the target of the interval.
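The search described above can be illustrated with a minimal Python sketch. It is not the code of learnqta: the function names, the grid ranges and the zero initial velocity and acceleration are assumptions made for illustration, and the contour is generated with the closed form f0(t) = (mt + b) + (c1 + c2·t + c3·t²)·exp(-λt) usually given for qTA.

```python
import itertools
import math


def qta_f0(t, m, b, lam, y0, v0=0.0, a0=0.0):
    """f0 at time t (from interval onset) for target slope m, height b, strength lam.

    y0, v0, a0 are the f0 level, velocity and acceleration at the interval onset.
    """
    c1 = y0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    return (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)


def grid(lo, hi, step):
    """Inclusive list of values from lo to hi at the given step size."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def fit_target(times, f0, y0, m_range, b_range, lam_range):
    """Exhaustive search; each *_range is (min, max, step).

    Returns (sse, m, b, lam) for the combination with the smallest SSE.
    """
    best = None
    for m, b, lam in itertools.product(
        grid(*m_range), grid(*b_range), grid(*lam_range)
    ):
        sse = sum((qta_f0(t, m, b, lam, y0) - y) ** 2 for t, y in zip(times, f0))
        if best is None or sse < best[0]:
            best = (sse, m, b, lam)
    return best


# Usage: recover the parameters of a synthetic contour (hypothetical values).
times = [i * 0.01 for i in range(21)]  # 0 to 0.2 s
observed = [qta_f0(t, 20.0, 100.0, 40.0, 90.0) for t in times]
sse, m, b, lam = fit_target(
    times, observed, 90.0, (-50, 50, 10), (80, 120, 5), (10, 80, 10)
)
print(sse, m, b, lam)
```

An exhaustive search is simple and cannot get stuck in a local minimum, but its cost grows with the product of the three grid sizes, which is why the step size and the search ranges matter in practice.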

The target intervals are defined by the user, who marks their boundaries and enters a label in the top tier of the TextGrid. No targets are extracted from unlabeled intervals.

The target search ranges can be restricted by the user in a number of ways:

  1. In the startup window, users can change the global search ranges defined by the maximum and minimum parameter values.
  2. For each sound file, blank intervals in the target tier (2) are given full search ranges defined by the maximum and minimum parameter values.
  3. Tier 2 intervals labeled as H, M, L, h, m or l are given a fixed 0 slope.
  4. Tier 2 intervals labeled as R or r are searched only for positive slopes.
  5. Tier 2 intervals labeled as F or f are searched only for negative slopes.
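The labeling rules above amount to a mapping from a Tier 2 label to a slope search range. The Python sketch below illustrates that mapping only; the function name is hypothetical, and two details are assumptions not stated here: that zero is included in the "positive" and "negative" ranges, and that any label outside the listed ones gets the full range.

```python
FLAT = {"H", "M", "L", "h", "m", "l"}   # static targets: slope fixed at 0
RISING = {"R", "r"}                     # only non-negative slopes searched
FALLING = {"F", "f"}                    # only non-positive slopes searched


def slope_search_range(label, m_min, m_max):
    """Return (lowest, highest) slope to search for a Tier 2 label.

    m_min and m_max are the global limits set in the startup window.
    """
    label = label.strip()
    if label in FLAT:
        return (0.0, 0.0)
    if label in RISING:
        return (max(0.0, m_min), m_max)
    if label in FALLING:
        return (m_min, min(0.0, m_max))
    # Blank label: full range. Unlisted labels: full range (assumption).
    return (m_min, m_max)


for lab in ["", "H", "r", "F"]:
    print(repr(lab), slope_search_range(lab, -100, 100))
```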

qTAtrainer is useful not only for resynthesizing f0 contours of individual sentences, but also as a research tool. Here are some examples:

  1. Determining pitch targets corresponding to specific communicative functions, e.g., lexical contrast marked by tone. Targets can be determined by extracting target parameters from many tokens of a functional unit, and the average values of the parameters can be considered as characteristic of the target (cf. Prom-on, Xu & Thipakorn, 2009).
  2. Identifying contributions of different communicative functions by varying functional specificity when averaging the target parameters, e.g., based on focus condition, position in sentence or phrase, or sentence type (statement vs. question), etc. (cf. Prom-on, Xu & Thipakorn, 2009).
  3. Exploring what is the best target interval, e.g., voiced section, syllable, word, accent or phrase. Our initial testing shows that the syllable is the best target interval for English.
  4. Testing hypotheses about pitch targets of a language. For example, to determine whether a tone is high or rising, one may compare the RMSE and correlation values obtained when restricting the target slope to either 0 or positive values (reported in output files X.means and targets.txt).
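The two goodness-of-fit measures mentioned in item 4 can be illustrated with a short Python sketch. The script reports these values itself; the code below only shows what the measures compute, and the function names and sample contours are hypothetical.

```python
import math


def rmse(original, synthesized):
    """Root-mean-square error between two equally sampled f0 contours."""
    n = len(original)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(original, synthesized)) / n)


def pearson_r(original, synthesized):
    """Pearson correlation between two equally sampled f0 contours."""
    n = len(original)
    mo = sum(original) / n
    ms = sum(synthesized) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(original, synthesized))
    var_o = sum((o - mo) ** 2 for o in original)
    var_s = sum((s - ms) ** 2 for s in synthesized)
    return cov / math.sqrt(var_o * var_s)


# Hypothetical contours: one original and two fits under different slope restrictions.
original = [90.0, 93.0, 96.5, 99.0, 101.0]
fit_static = [90.0, 94.5, 97.0, 98.2, 98.8]   # slope fixed at 0
fit_rising = [90.0, 93.2, 96.1, 98.9, 101.3]  # positive slope allowed

for name, fit in [("static", fit_static), ("rising", fit_rising)]:
    print(name, round(rmse(original, fit), 3), round(pearson_r(original, fit), 3))
```

A lower RMSE together with a higher correlation under one restriction is evidence in favor of that target type, ideally checked over many tokens rather than a single utterance.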

Instructions

  1. qTAtrainer consists of two files: qTAtrainer.praat, a Praat script, and learnqta.exe (learnqta for Mac), an executable called by the script. See Download.
  2. Put both files in the folder containing the sound files to be analyzed, and launch Praat.
  3. Select Open Praat Script... from the top menu.
  4. Locate qTAtrainer.praat in the dialogue window and select it.
  5. When the script window opens in Praat, select Run from the Run menu (or press the keyboard shortcut Command-R or Control-R).
  6. In the startup window, check or uncheck the boxes according to your need, and set appropriate values in the text fields or simply use the default values. Select the task by checking the appropriate radio button.
  7. Click OK and three windows will appear. The first window (PointProcess) displays the waveform together with vocal cycle marks (vertical lines) generated by Praat. This is where you can manually add the missing marks and delete the redundant ones. You need to do this only for the named intervals, as explained next.
  8. The second window (TextGrid) displays the waveform and spectrogram of the current sound together with optional pitch track and formant tracks in the spectrogram panel, and vocal pulse marks in the waveform panel. (These tracks and marks cannot be changed manually, so you can hide them through the corresponding menu to reduce processing time.)
  9. At the bottom of this window are three TextGrid tiers, where you can insert interval boundaries (Tier 1) and define search restrictions (Tier 2). For any interval that you want to have results saved, a label in Tier 1 is required. The label can be as simple as a, b, c or 1, 2, 3.
  10. You can make qTAtrainer skip a voiceless region by assigning it a blank interval. Any blank interval with duration < minimum_pause_duration will be treated as a syllable-initial voiceless consonant. Note, however, that skipping voiceless consonants is not obligatory for parameter extraction.
  11. The third window (qTAtrainer) displays pitch targets (green dashed straight lines) and synthesized f0 (red solid curve) against the original f0 (blue dashed curve). The thickness of a target line represents its strength. The grey vertical lines indicate interval boundaries. When there are no labeled intervals, only the original f0 is displayed.
  12. After labeling the intervals, press "Replot" on the left side of the window and you will see both synthesized and original f0 contours.
  13. The qTAtrainer window allows you to inspect the f0 contours in various ways: zooming in and out, scrolling left and right, and playing part or the whole of the original or resynthesized signal. The window also allows you to move to the next or previous sound file.
  14. When you click "Next" or "Previous" in the qTAtrainer window, the TextGrid and PointProcess windows will be refreshed, displaying the spectrogram, waveform and vocal cycle marks of the next sound. You can repeat this process until all the sounds in the folder are processed. Or you can finish any time by clicking "Exit".
  15. To modify the automatically learned parameters, or use your own parameters, change them in the TextGrid window, after which you can press the Replot button in the qTAtrainer window to see and hear the newly synthesized f0 contours.

Output

Each time you press "Next" in the qTAtrainer window, various analysis results are saved for the current sound as text files (Red ones are directly relevant for target extraction):

If you want to change certain analysis parameters after processing all the sound files, you can rerun the script, set the "Input File No" to 1 in the startup window and check the button "Process all sounds without pause" before pressing "OK". The script will then run by itself and cycle through all the sound files in the folder one by one.

After the analysis of all the individual sound files is done, you can gather the analysis results into a number of ensemble files by running the script again and checking the button "Get ensemble results" in the startup window. The following ensemble files will be saved:

  1. targets.txt
  2. means.txt
  3. normf0.txt
  4. normactutime.txt
  5. samplef0.txt
  6. f0velocity.txt
  7. maxf0.txt
  8. minf0.txt
  9. excursionsize.txt
  10. meanf0.txt
  11. duration.txt
  12. maxvelocity.txt
  13. finalf0.txt
  14. finalvelocity.txt
  15. meanintensity.txt

Note that you can generate the ensemble files only if you have analyzed at least one sound following the steps described earlier.

An example

Click here for the files used in creating the graph shown on top of the page: bloomingdales1.wav, bloomingdales1.pulse and bloomingdales1.target. Put the files in the same folder as qTAtrainer.praat and learnqta, and run qTAtrainer from within Praat.

Download

Need more help?

Detailed instructions can also be found at the beginning of the script. If you are still stuck, please contact me at yi.xu at ucl.ac.uk.

How to cite

Xu, Y. & Prom-on, S. (2010-present year). qTAtrainer.praat. Available from: http://www.homepages.ucl.ac.uk/~uclyyix/qTAtrainer/.

Published research making use of qTAtrainer (or its predecessor PENTAtrainer1)

    2011

  1. Cheng, C., Prom-on, S. and Xu, Y. (2011). Modelling extreme tonal reduction in Taiwan Mandarin based on target approximation. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 468-471.
  2. Prom-on, S., Liu, F. and Xu, Y. (2011). Functional modeling of tone, focus and sentence type in Mandarin Chinese. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 1638-1641.
  3. Barbosa, P. A., Mixdorff, H. and Madureira, S. (2011). Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese. In Proceedings of Interspeech 2011, Florence, Italy.
  4. Li, A., Fang, Q. and Dang, J. (2011). Emotional intonation in a tone language: Experimental evidence from Chinese. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 1198-1201.

    2012

  5. Prom-on, S., Liu, F. and Xu, Y. (2012). Post-low bouncing in Mandarin Chinese: Acoustic analysis and computational modeling. Journal of the Acoustical Society of America. 132: 421-432.
  6. Barbosa, P., Mixdorff, H. and Madureira, S. (2012). Cross-linguistic analysis of two speaking styles in Brazilian Portuguese and German by using the quantitative Target Approximation model. In Proceedings of the VII GSCP International Conference: Speech and Corpora, Belo Horizonte, Brazil.
  7. Li, A., Fang, Q., Jia, Y. and Dang, J. (2012). More targets? Simulating emotional intonation of Mandarin with PENTA. In Proceedings of the 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012). IEEE: 271-275.

    2013

  8. Liu, F., Xu, Y., Prom-on, S. and Yu, A. C. L. (2013). Morpheme-like prosodic functions: Evidence from acoustic analysis and computational modeling. Journal of Speech Sciences 3(1): 85-140.

    2014

  9. Van Niekerk, D. R. and Barnard, E. (2014). Predicting utterance pitch targets in Yoruba for tone realisation in speech synthesis. Speech Communication 56: 229-242.
  10. Lee, A., Xu, Y. and Prom-on, S. (2014). Modeling Japanese F0 contours using the PENTAtrainers and AMtrainer. TAL 2014. Nijmegen: 164-167.
  11. Liu, H. and Xu, Y. (2014). A Simplified Method of Learning Underlying Articulatory Pitch Target. In Proceedings of Speech Prosody 2014, Dublin: 1017-1021.
  12. Liu, H. and Xu, Y. (2014). Learning Model-based F0 Production through Goal-directed Babbling. 9th International Symposium on Chinese Spoken Language Processing. Singapore.

    2015

  13. Liu, H. and Xu, Y. (2015). Simulating online compensation for pitch-shifted auditory feedback with the target approximation model. In Proceedings of The 18th International Congress of Phonetic Sciences, Glasgow, UK: ISBN 978-0-85261-941-4. Paper number 0437.1-5.


