qTAtrainer (previously PENTAtrainer1) ---- A Praat script for automatic analysis and synthesis of intonation based on the qTA model, working on individual sound files
(Version 1.3) [Download]
by Yi Xu and Santitham Prom-on
An interactive Praat script that allows you to:
Automatically extract pitch target parameters (slope, height, strength) based on qTA (Prom-on, Xu & Thipakorn, 2009)
Resynthesize F0 contours based on the extracted target parameters
Resynthesize F0 contours based on user-modified or any arbitrary target parameters
Specify target location and restrict direction of target slope
Manually rectify vocal pulse markings for accurate f0 tracking
Exhaustively process all wav files in a folder
Perform the same f0 analysis as ProsodyPro
Collect extracted parameters of all sounds in a folder and save them in ensemble files
[Audio examples: Original* | Resynthesis]
* Original sound from ToBI training web site: www.ling.ohio-state.edu/research/phonetics/E_ToBI/
Explanation
This script is for automatic extraction of pitch target parameters. A pitch target is the ideal f0 trajectory associated with a segmental unit, which is defined by three parameters: slope, height and strength. The target notion is the core of the PENTA model (Parallel Encoding and Target Approximation, cf. Xu, 2005). The current script is based on the implementation of Prom-on, Xu and Thipakorn (2009).
In qTA, a target is defined by the linear equation f0 = mt + b, where f0 is the surface f0, m is the slope of the target and b is the height of the target, defined as the y-intercept of the target line. The surface f0 is the outcome of sequential asymptotic approximation of successive pitch targets by a critically damped 3rd-order linear system.
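The closed-form solution of this system, following Prom-on, Xu & Thipakorn (2009), can be sketched in Python. The function below is an illustration of the math, not the script's actual implementation; parameter names are chosen for readability:

```python
import math

def qta_f0(t, m, b, lam, f0_0, v0=0.0, a0=0.0):
    """F0 at time t (s) for one qTA target interval.

    Target line: x(t) = m*t + b (slope m, height b).
    lam: rate of target approximation (related to strength).
    f0_0, v0, a0: f0, velocity and acceleration at interval onset,
    carried over from the previous interval for continuity.
    """
    # Transient coefficients from the onset conditions of the
    # critically damped 3rd-order system
    c1 = f0_0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * lam * c2 - lam ** 2 * c1) / 2.0
    # Surface f0 = target line + exponentially decaying transient
    return (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
```

At t = 0 the function returns the onset f0 exactly; as t grows the transient decays and the surface f0 asymptotically approaches the target line mt + b.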
The extraction of target parameters in this script is done by analysis-by-synthesis. For each target interval, the script uses all possible combinations of the three parameters within the search ranges, at a certain step size, to generate f0 contours based on qTA, and the difference between each synthesized contour and the original is computed as the sum of squared errors (SSE). The parameter set with the least SSE is chosen as the target of the interval.
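This exhaustive search can be sketched as a toy in Python. The parameter grids below are hypothetical, and the qTA function is the closed-form solution restated for self-containedness; the real search is performed by the learnqta executable:

```python
import math

def qta_f0(t, m, b, lam, f0_0, v0=0.0, a0=0.0):
    # Closed-form qTA solution (critically damped 3rd-order system)
    c1 = f0_0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * lam * c2 - lam ** 2 * c1) / 2.0
    return (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)

def fit_target(times, f0_obs, f0_0, slopes, heights, strengths):
    # Analysis-by-synthesis: try every (slope, height, strength)
    # combination on the grid and keep the one with the least SSE
    best = None
    for m in slopes:
        for b in heights:
            for lam in strengths:
                sse = sum((qta_f0(t, m, b, lam, f0_0) - y) ** 2
                          for t, y in zip(times, f0_obs))
                if best is None or sse < best[0]:
                    best = (sse, m, b, lam)
    return best  # (sse, slope, height, strength)

# Toy demo: recover a known target from a synthetic 300 ms contour
times = [i * 0.01 for i in range(30)]
truth = [qta_f0(t, 0.0, 100.0, 30.0, 92.0) for t in times]
sse, m, b, lam = fit_target(
    times, truth, 92.0,
    slopes=[-20.0, 0.0, 20.0],
    heights=[90.0, 95.0, 100.0, 105.0],
    strengths=[10.0, 30.0, 50.0])
```

Because the toy contour was generated from grid values, the search recovers them with zero error; with real f0 the least-SSE set is only the best grid approximation.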
The target intervals are defined by the user by marking their boundaries and entering a label in the top tier of the TextGrid. No targets are extracted from intervals without labels.
The target search ranges can be restricted by the user in a number of ways:
In the startup window, users can change the global search ranges defined by the maximum and minimum parameter values.
For each sound file, blank intervals in the target tier (2) are given full search ranges defined by the maximum and minimum parameter values.
Tier 2 intervals labeled as H, M, L, h, m or l are given a fixed 0 slope.
Tier 2 intervals labeled as R or r are searched only for positive slopes.
Tier 2 intervals labeled as F or f are searched only for negative slopes.
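The restrictions above amount to clamping the slope search range according to the Tier 2 label. A minimal sketch of that mapping follows; the numeric bounds are illustrative placeholders, not the script's actual defaults:

```python
def slope_range(label, global_min=-80.0, global_max=80.0):
    """Illustrative mapping from a Tier 2 label to a slope search
    range (st/s). Bounds are hypothetical examples."""
    if label in ("H", "M", "L", "h", "m", "l"):
        return (0.0, 0.0)            # static target: slope fixed at 0
    if label in ("R", "r"):
        return (0.0, global_max)     # rising: positive slopes only
    if label in ("F", "f"):
        return (global_min, 0.0)     # falling: negative slopes only
    return (global_min, global_max)  # blank: full search range
```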
qTAtrainer is useful not only for resynthesizing f0 contours of individual sentences, but also as a research tool. Here are some examples:
Determining pitch targets corresponding to specific communicative functions, e.g., lexical contrast marked by tone. Targets can be determined by extracting target parameters from many tokens of a functional unit, and the average values of the parameters can be considered as characteristic of the target (cf. Prom-on, Xu & Thipakorn, 2009 ).
Identifying contributions of different communicative functions by varying functional specificity when averaging the target parameters, e.g., based on focus condition, position in sentence or phrase, or sentence type (statement vs. question) (cf. Prom-on, Xu & Thipakorn, 2009).
Exploring which unit makes the best target interval, e.g., voiced section, syllable, word, accent or phrase. Our initial testing shows that the syllable is the best target interval for English.
Testing hypotheses about pitch targets of a language. For example, to determine whether a tone is high or rising, one may compare the rmse and correlation values obtained when restricting the target slope to either 0 or positive values (reported in the output files X.means and targets.txt).
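The rmse and correlation measures used for such comparisons are the standard definitions; a straightforward Python rendering (not taken from the script itself) is:

```python
import math

def rmse(orig, synth):
    # Root mean squared error between original and synthesized f0
    n = len(orig)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(orig, synth)) / n)

def correlation(orig, synth):
    # Pearson correlation between original and synthesized f0
    n = len(orig)
    mo = sum(orig) / n
    ms = sum(synth) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(orig, synth))
    vo = sum((o - mo) ** 2 for o in orig)
    vs = sum((s - ms) ** 2 for s in synth)
    return cov / math.sqrt(vo * vs)
```

A lower rmse and a higher correlation for one slope restriction than the other indicates which target hypothesis fits the data better.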
Instructions
qTAtrainer consists of qTAtrainer.praat, a Praat script, and learnqta.exe (learnqta for Mac), an executable called by the script. See Download.
Put both files in the folder containing the sound files to be analyzed, and launch Praat;
Select Open Praat Script... from the top menu;
Locate qTAtrainer.praat in the dialogue window and select it;
When the script window opens in Praat, select Run from the Run menu (or type the key shortcut command-r or control-r);
In the startup window, check or uncheck the boxes according to your need, and set appropriate values in the text fields or simply use the default values. Select the task by checking the appropriate radio button.
Click OK and three windows will appear. The first window (PointProcess) displays the waveform together with vocal cycle marks (vertical lines) generated by Praat. This is where you can manually add the missing marks and delete the redundant ones. You need to do this only for the named intervals, as explained next.
The second window (TextGrid) displays the waveform and spectrogram of the current sound together with optional pitch track and formant tracks in the spectrogram panel, and vocal pulse marks in the waveform panel. (These tracks and marks cannot be manually changed. So you can hide them to reduce processing time by using the corresponding menu.)
At the bottom of this window are three TextGrid tiers, where you can insert interval boundaries (Tier 1) and define search restrictions (Tier 2). For any interval that you want to have results saved, a label in Tier 1 is required. The label can be as simple as a, b, c or 1, 2, 3.
You can make qTAtrainer skip a voiceless region by assigning it a blank interval. Any blank interval with duration < minimum_pause_duration will be treated as a syllable-initial voiceless consonant. Note, however, the skipping of voiceless consonants is not obligatory for parameter extraction.
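The duration test behind this treatment can be sketched as follows; the threshold default used here is an assumption for illustration, not the script's actual default value:

```python
def classify_blank(duration, minimum_pause_duration=0.075):
    """Illustrative classification of a blank (unlabeled) interval.
    The 0.075 s default is an assumed placeholder value."""
    if duration < minimum_pause_duration:
        # Short gap: treated as a syllable-initial voiceless consonant
        return "syllable-initial voiceless consonant"
    return "pause"
```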
The third window (qTAtrainer) displays pitch targets (green dashed straight lines) and synthesized f0 (red solid curve) against the original f0 (blue dashed curve). The thickness of a target line represents its strength. The grey vertical lines indicate interval boundaries. When there are no labeled intervals, only the original f0 is displayed.
After labeling the intervals, press "Replot" on the left side of the window and you will see both synthesized and original f0 contours.
The qTAtrainer window allows you to inspect the f0 contours in various ways: zooming in and out, scrolling left and right, and playing part or the whole of the original or resynthesized signal. The window also allows you to move to the next or previous sound file.
When you click "Next" or "Previous" in the qTAtrainer window, the TextGrid and PointProcess windows will be refreshed, displaying the spectrogram, waveform and vocal cycle marks of the next sound. You can repeat this process until all the sounds in the folder are processed. Or you can finish any time by clicking "Exit".
To modify the automatically learned parameters, or use your own parameters, change them in the TextGrid window, after which you can press the Replot button in the qTAtrainer window to see and hear the newly synthesized f0 contours.
Output
Each time you press "Next" in the qTAtrainer window, various analysis results are saved for the current sound as text files (red ones are directly relevant for target extraction):
X.rawf0 -- raw f0 with real time computed directly from the pulse markings
X.f0 -- smoothed f0 with the trimming algorithm (Xu, 1999 )
X.samplef0 -- f0 values at fixed time intervals specified by "f0 sample rate"
X.timenormf0 -- time-normalized f0. The f0 in each interval is divided into the same number of points (default = 10).
X.actutimenormf0 -- time-normalized f0 with each interval divided into the same number of points (default = 10), but on the original time scale; the onset time of interval 1 is set to 0 when the "Set initial time to 0" box in the startup window is checked.
X.f0velocity -- velocity profile (instantaneous rates of F0 change) of f0 contour in semitone/s at fixed time intervals specified by "f0 sample rate"
X.means -- contains the following values (in the order of the columns):
maxf0
minf0
excursion size
finalf0 -- Indicator of target height (taken at a point specified by "Final offset" in the startup window)
mean intensity
duration
max_velocity
final_velocity -- Indicator of target slope (taken at a point earlier than the interval offset by the time specified by "Final offset" in the startup window)
initialf0 -- Initial f0 of the first labeled interval (used in target search)
target_slope
target_height
strength
duration (in seconds)
rmse (root mean squared error between original and synthesized f0)
correlation (between original and synthesized f0)
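The time normalization used for X.timenormf0 (resampling each labeled interval to the same number of evenly spaced points) can be sketched with linear interpolation. This is an illustration of the idea, not the script's code:

```python
def time_normalize(times, f0, n_points=10):
    """Resample an interval's f0 track to n_points evenly spaced
    points between its first and last sample, by linear interpolation."""
    t0, t1 = times[0], times[-1]
    out = []
    for i in range(n_points):
        t = t0 + (t1 - t0) * i / (n_points - 1)
        # Find the pair of original samples bracketing t
        j = 0
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        a, b = times[j], times[j + 1]
        w = 0.0 if b == a else (t - a) / (b - a)
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return out
```

With the default of 10 points per interval, every labeled interval contributes the same number of rows, which makes contours from tokens of different durations directly comparable and averageable.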
If you want to change certain analysis parameters after processing all the sound files, you can rerun the script, set the "Input File No" to 1 in the startup window and check the box "Process all sounds without pause" before pressing "OK". The script will then run by itself and cycle through all the sound files in the folder one by one.
After the analysis of all the individual sound files is done, you can gather the analysis results into a number of ensemble files by running the script again and checking the box "Get ensemble results" in the startup window. The following ensemble files will be saved:
targets.txt
means.txt
normf0.txt
normactutime.txt
samplef0.txt
f0velocity.txt
maxf0.txt
minf0.txt
excursionsize.txt
meanf0.txt
duration.txt
maxvelocity.txt
finalf0.txt
finalvelocity.txt
meanintensity.txt
Note that you can generate the ensemble files only if you have analyzed at least one sound following the steps described earlier.
An example
Click here for the files used in creating the graph shown on top of the page: bloomingdales1.wav, bloomingdales1.pulse and bloomingdales1.target. Put the files in the same folder as qTAtrainer.praat and learnqta, and run qTAtrainer from within Praat.
Need more help?
Detailed instructions can also be found at the beginning of the script. If you are still stuck, please contact me at yi.xu at ucl.ac.uk.
How to cite
Xu, Y. & Prom-on, S. (2010-present year). qTAtrainer.praat. Available from: http://www.homepages.ucl.ac.uk/~uclyyix/qTAtrainer/.
2011
Cheng, C., Prom-on, S. and Xu, Y. (2011). Modelling extreme tonal reduction in Taiwan Mandarin based on target approximation. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 468-471.
Prom-on, S., Liu, F. and Xu, Y. (2011). Functional modeling of tone, focus and sentence type in Mandarin Chinese. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 1638-1641.
Barbosa, P. A., Mixdorff, H. and Madureira, S. (2011). Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese. In Proceedings of Interspeech 2011, Florence, Italy.
Li, A., Fang, Q. and Dang, J. (2011). Emotional intonation in a tone language: Experimental evidence from Chinese. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 1198-1201.
2012
Prom-on, S., Liu, F. and Xu, Y. (2012). Post-low bouncing in Mandarin Chinese: Acoustic analysis and computational modeling. Journal of the Acoustical Society of America. 132: 421-432.
Barbosa, P., Mixdorff, H. and Madureira, S. (2012). Cross-linguistic analysis of two speaking styles in Brazilian Portuguese and German by using the quantitative Target Approximation model. In Proceedings of VII GSCP International Conference: Speech and Corpora, Belo Horizonte, Brazil.
Li, A., Fang, Q., Jia, Y. and Dang, J. (2012). More targets? Simulating emotional intonation of Mandarin with PENTA. In Proceedings of the 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012). IEEE: 271-275.
2013
Liu, F., Xu, Y., Prom-on, S. and Yu, A. C. L. (2013). Morpheme-like prosodic functions: Evidence from acoustic analysis and computational modeling. Journal of Speech Sciences 3(1): 85-140.
2014
Van Niekerk, D. R. and Barnard, E. (2014). Predicting utterance pitch targets in Yoruba for tone realisation in speech synthesis. Speech Communication 56: 229-242.
Lee, A., Xu, Y. and Prom-on, S. (2014). Modeling Japanese F0 contours using the PENTAtrainers and AMtrainer. TAL 2014. Nijmegen: 164-167.
Liu, H. and Xu, Y. (2014). A Simplified Method of Learning Underlying Articulatory Pitch Target. In Proceedings of Speech Prosody 2014, Dublin: 1017-1021.
Liu, H. and Xu, Y. (2014). Learning Model-based F0 Production through Goal-directed Babbling. 9th International Symposium on Chinese Spoken Language Processing. Singapore.
2015
Liu, H. and Xu, Y. (2015). Simulating online compensation for pitch-shifted auditory feedback with the target approximation model. In Proceedings of The 18th International Congress of Phonetic Sciences, Glasgow, UK: ISBN 978-0-85261-941-4. Paper number 0437.1-5.
Yi's other tools