ProsodyPro* ---- A Praat script for large-scale systematic analysis of continuous prosodic events (Version 5.7.8)

by Yi Xu

Version 6.1.3 beta now available for testing (New features). Please inform me of any bugs or undesirable features.

An interactive Praat script that allows you to:

Motivation and brief history

ProsodyPro was developed as a convenient tool for our own research. It allows us to process large amounts of speech data systematically and with high precision. It minimizes human labor by automating tasks that do not require human judgment, such as locating and opening sound files, taking measurements, and saving raw results in formats ready for further graphical and statistical analysis. At the same time, it allows human intervention in processes that are prone to error in automatic algorithms, such as pitch detection and segmentation.

The f0 trimming and time-normalization algorithms, which form the core of the script, were developed in my PhD research (Xu 1993). They were first implemented in a C program working in conjunction with xwaves, which, like Praat, generates vocal cycle marks automatically and thus saves most of the human labor of marking the cycles manually, as was done in my dissertation. The arrival of Praat, thanks to the brilliant invention of Paul Boersma and David Weenink, made it possible to put these algorithms together in a single script that runs on all major computer platforms. It also solved the problem of having to write a different C program for each new experiment.
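The time-normalization idea can be illustrated with a short sketch (this is not ProsodyPro's actual code, and the function and variable names here are made up for illustration): the f0 track of each labeled interval is resampled at a fixed number of evenly spaced points, so that contours from utterances of different durations become directly comparable point by point.

```python
import numpy as np

def time_normalize(times, f0, boundaries, points_per_interval=10):
    """Resample an f0 track at a fixed number of evenly spaced
    points within each labeled interval (illustrative sketch only)."""
    norm_f0 = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        # evenly spaced sample points inside this interval
        sample_times = np.linspace(start, end, points_per_interval)
        # linear interpolation of f0 at those points
        norm_f0.extend(np.interp(sample_times, times, f0))
    return np.array(norm_f0)

# Two "utterances" of different durations but the same underlying shape:
t_fast = np.linspace(0.0, 0.5, 51)
t_slow = np.linspace(0.0, 1.0, 101)
f0_fast = 200 + 20 * np.sin(2 * np.pi * t_fast / 0.5)
f0_slow = 200 + 20 * np.sin(2 * np.pi * t_slow / 1.0)

a = time_normalize(t_fast, f0_fast, [0.0, 0.25, 0.5])
b = time_normalize(t_slow, f0_slow, [0.0, 0.5, 1.0])
print(len(a), len(b))  # same number of points despite different durations
```

After normalization the two contours can be averaged or compared directly, which is what makes averaging across repetitions and speakers possible.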

The first version of the script was made public in 2005. Since then it has been used in a growing number of research projects. Some are listed here.

Why time-normalization? -- Justifications you may need when responding to questions

Instructions (中文说明)

  1. Put ProsodyPro.praat in the folder containing the sound files to be analyzed, and launch Praat;
  2. Select Open Praat Script... from the top menu;
  3. Locate ProsodyPro.praat in the dialogue window and select it;
  4. When the script window opens in Praat, select Run from the Run menu (or type the key shortcut command-r or control-r);
  5. In the startup window, check or uncheck the boxes according to your need, and set appropriate values in the text fields or simply use the default values. Select the task by checking the appropriate radio button.
  6. Click OK and three windows will appear. The first window (PointProcess) displays the waveform together with vocal cycle marks (vertical lines) generated by Praat. This is where you can manually add the missing marks and delete the redundant ones. You need to do this only for the named intervals, as explained next.
  7. The second window (TextGrid) displays the waveform and spectrogram of the current sound together with optional pitch track and formant tracks in the spectrogram panel, and vocal pulse marks in the waveform panel. (These tracks and marks cannot be changed manually, so you can hide them to reduce processing time by using the corresponding menus.)
  8. At the bottom of this window are two TextGrid tiers, where you can insert interval boundaries (Tier 1) and add comments (Tier 2). For any interval that you want to have results saved, a label in Tier 1 is required. The label can be as simple as a, b, c or 1, 2, 3.
  9. The third window (Pause) allows you to control the progression of the analysis. To bring up the next sound to be analyzed, change the number (or leave it as is) in the current_file box and press "Continue". The number indicates the order in the String object "list" in the Object window (a hard copy is also saved in the current folder). The next sound will be 1 + current_file (so type 0 to open sound 1).
  10. To end the progression of the current analysis session, press "Finish" in the Pause window, and the last sound analyzed will be shown in the Praat Info window. You can use that number as a starting point in your next analysis session.
  11. After processing individual files, you can run the script again to get ensemble files by checking the third radio button from the top.
  12. You can also change various parameters after processing individual files by running the script again with the radio button "Process all sounds without pause" checked. Just watch the script run through all the files on its own.
  13. You can also generate mean normf0 contours averaged** across repetitions of identical sentences. To do this, set the value of Nrepetitions in the opening window according to the number of repetitions in your data set when you run the script with the "Get ensemble files" button checked. Make sure that the number of labeled intervals is identical across the repetitions.
  14. To force ProsodyPro to skip extra repetitions, you need to check "Ignore extra repetition" and also name your sound files with a final digit that indicates repetition.
  15. To average across unequal numbers of repetitions, you can create a text file (default = repetition_list.txt) in which sound-file names are listed in a single column, with blank lines separating the repetition groups. You can create this file by renaming "FileList.txt", which is always generated by ProsodyPro, and then inserting blank lines and deleting the names of any sounds that you want to exclude from your final analysis.
  16. You can also generate mean normf0 contours averaged** across speakers. To do this, first create a text file (speaker_folders.txt) containing the speaker folder names arranged in a single column. Then run ProsodyPro with the 4th task, "Average across speakers", checked. The script will read mean_normf0.txt from all the speaker folders, average the f0 values on a logarithmic scale, and then convert them back to Hz. The grand averages are saved in "mean_normf0_cross_speaker.txt". In the Start window, you also need to tell ProsodyPro where the speaker folder file is. The default location is the current directory: "./". If it is in an upper directory, enter "../" instead.
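The repetition-list format described in step 15, a single column of sound-file names with blank lines separating groups, is easy to parse. The following sketch shows one way to read it (the function name and the demo file name are made up for illustration; only the file format comes from the text above):

```python
def read_repetition_groups(path):
    """Split a single-column file of sound names into groups
    separated by blank lines (the repetition_list.txt format)."""
    groups, current = [], []
    with open(path) as f:
        for line in f:
            name = line.strip()
            if name:
                current.append(name)
            elif current:          # blank line closes the current group
                groups.append(current)
                current = []
    if current:                    # file may not end with a blank line
        groups.append(current)
    return groups

# Write a tiny example file in that format, then parse it:
sample = "s1_1.wav\ns1_2.wav\n\ns2_1.wav\ns2_2.wav\ns2_3.wav\n"
with open("repetition_list_demo.txt", "w") as f:
    f.write(sample)

groups = read_repetition_groups("repetition_list_demo.txt")
print(groups)  # two groups of unequal size
```

Note that the groups may have different sizes, which is precisely what the feature in step 15 is for.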


Each time you press "Continue" in the Pause window, various analysis results are saved for the current sound as text files:

If you want to change certain analysis parameters after processing all the sound files, you can rerun the script, set the "Input File No" to 1 in the startup window and check the button "Process all sounds without pause" before pressing "OK". The script will then run by itself and cycle through all the sound files in the folder one by one.

After the analysis of all the individual sound files is done, you can gather the analysis results into a number of ensemble files by running the script again and checking the button "Get ensemble results" in the startup window. The following ensemble files will be saved:

  1. normf0.txt (Hz)
  2. normtime_semitonef0.txt (semitones)
  3. normtime_f0velocity.txt (semitones/s)
  4. normtimeIntensity.txt (dB)
  5. normactutime.txt (s)
  6. maxf0.txt (Hz)
  7. minf0.txt (Hz)
  8. excursionsize.txt (semitones)
  9. meanf0.txt (Hz)
  10. duration.txt (ms)
  11. maxvelocity.txt (semitones/s)
  12. finalvelocity.txt (semitones/s)
  13. finalf0.txt (Hz)
  14. meanintensity.txt (dB)
  15. samplef0.txt (Hz)
  16. f0velocity.txt (semitones/s)
  17. maxf0_loc_ms.txt (ms)
  18. maxf0_loc_ratio.txt (ratio)

    If Nrepetitions > 0, the following files will also be saved:

  19. mean_normf0.txt (Hz)
  20. mean_normtime_semitonef0.txt (semitones)
  21. mean_normtime_f0velocity.txt (semitones/s)
  22. mean_normtimeIntensity.txt (dB)
  23. mean_normactutime.txt (s)
  24. mean_maxf0.txt (Hz)
  25. mean_minf0.txt (Hz)
  26. mean_excursionsize.txt (semitones)
  27. mean_meanf0.txt (Hz)
  28. mean_duration.txt (ms)
  29. mean_maxvelocity.txt (semitones/s)
  30. mean_finalvelocity.txt (semitones/s)
  31. mean_finalf0.txt (Hz)
  32. mean_meanintensity.txt (dB)
  33. mean_maxf0_loc_ms.txt (ms)
  34. mean_maxf0_loc_ratio.txt (ratio)

    If Task 4 "Average across speakers" is selected, the following files will also be saved:

  35. mean_normf0_cross_speaker.txt
  36. mean_normactutime_cross_speaker.txt
  37. mean_normtime_f0velocity_cross_speaker.txt
  38. mean_normtime_semitonef0_cross_speaker.txt
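Several of these files report f0 in semitones rather than Hz. The conversion is the standard one, 12 · log2(f0 / ref); the reference frequency of 1 Hz used below is only a placeholder assumption, not necessarily the value ProsodyPro uses:

```python
import math

def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert f0 in Hz to semitones relative to ref_hz:
    12 * log2(f0 / ref).  ref_hz = 1.0 is a placeholder choice."""
    return 12.0 * math.log2(f0_hz / ref_hz)

# A doubling of frequency is always one octave, i.e. 12 semitones,
# regardless of the reference chosen:
print(hz_to_semitones(200.0) - hz_to_semitones(100.0))
```

Because the scale is logarithmic, differences in semitones are independent of the reference frequency, which is why excursion sizes and velocities are reported in semitones.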

BID (Bio-informational Dimensions) measurements

A set of emotion-relevant measurements has been available since version 5.6. These measurements were proposed in Xu, Kelly & Smillie (2013), based on Morton (1977) and Ohala (1984), as well as our own experimental work.

  1. h1-h2 (dB) -- Amplitude difference between 1st and 2nd harmonics
  2. h1*-h2* (dB) -- Formant-adjusted h1-h2 (Iseli, Shue & Alwan 2007)
  3. H1-A1 (dB) -- Amplitude difference between 1st harmonic and 1st formant
  4. H1-A3 (dB) -- Amplitude difference between 1st harmonic and 3rd formant
  5. cpp -- Cepstral Peak Prominence (Hillenbrand et al., 1994)
  6. center_of_gravity (Hz) -- Spectral center of gravity
  7. Hammarberg_index (dB) -- Difference in maximum energy between 0-2000 Hz and 2000-5000 Hz
  8. energy_below_500Hz (dB) -- Energy of voiced segments below 500Hz
  9. energy_below_1000Hz (dB) -- Energy of voiced segments below 1000Hz
  10. Formant_dispersion1_3 (Hz) -- Average distance between adjacent formants up to F3
  11. F_dispersion1_5 (Hz) -- Average distance between adjacent formants up to F5
  12. median_pitch (Hz) -- Median pitch in Hertz
  13. jitter -- Mean absolute difference between consecutive periods, divided by mean period
  14. shimmer -- Mean absolute difference between amplitudes of consecutive periods, divided by mean amplitude
  15. harmonicity (dB) -- Harmonics-to-Noise Ratio (HNR): The degree of acoustic periodicity
  16. energy_profile (dB) -- Fifteen signal energy values computed from overlapping spectral bands of 500-Hz bandwidth: 0–500, 250–750, 500–1000, ... 3250–3750, 3500–4000
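The jitter and shimmer definitions in items 13 and 14 translate directly into code. The following is an illustrative sketch of those definitions, not ProsodyPro's (or Praat's) own implementation, and the example period and amplitude values are invented:

```python
import numpy as np

def jitter(periods):
    """Mean absolute difference between consecutive periods,
    divided by the mean period (local jitter, as defined above)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Mean absolute difference between amplitudes of consecutive
    periods, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Example: a slightly irregular voice around 100 Hz (periods near 10 ms)
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amps = [0.80, 0.82, 0.79, 0.81, 0.80]
print(jitter(periods), shimmer(amps))
```

Both measures are dimensionless ratios; a perfectly periodic signal yields zero for both.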

Note that you can generate the ensemble files only if you have analyzed at least one sound following the steps described earlier.


The following examples show how functional contrasts can be easily brought out by time-normalized f0 contours, whether plotted on normalized time or mean time.

[Figures: time-normalized mean f0 contours. Data from Xu (1999) and from Xu & Xu (2005).]


Need more help?

Detailed instructions can also be found at the beginning of the script.

For more information, take a look at the FAQ, and if you are still stuck, please write to me (yi.xu at

Bug reports, suggestions on improvement and new features are also welcome.

How to cite

Xu, Y. (2013). ProsodyPro — A Tool for Large-scale Systematic Prosody Analysis. In Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), Aix-en-Provence, France. 7-10.

Published research making use of ProsodyPro (or its predecessor TimeNormalizeF0)

  1. Ambrazaitis, G. and Frid, J. (2012). The prosody of contrastive topics in Southern Swedish. In Proceedings of FONETIK 2012
  2. Arnhold, A., Vainio, M., Suni, A. and Jarvikivi, J. (2010). Intonation of Finnish Verbs. In Proceedings of Interspeech 2010.
  4. Berger, S., Marquard, C. and Niebuhr, O. (2016). INSPECTing read speech – How different typefaces affect speech prosody. In Proceedings of Speech Prosody 2016, Boston, USA: 514-517.
  5. Blanchette, F. and Nadeu, M. (2018). Prosody and the meanings of English negative indefinites. Journal of Pragmatics 129: 123-139.
  6. Chan, K. W. and Hsiao, J. H. (in press). Hemispheric asymmetry in processing low- and high-pass filtered Cantonese speech in tonal and non-tonal language speakers. Language & Cognitive Processes.
  7. Chang, H.-C., Lee, H.-J., Tzeng, O. J. and Kuo, W.-J. (2014). Implicit Target Substitution and Sequencing for Lexical Tone Production in Chinese: An fMRI Study. PloS one 9(1): e83126.
  8. Chen, T.-Y. and Tucker, B. V. (2013). Sonorant onset pitch as a perceptual cue of lexical tones in Mandarin. Phonetica 70: 207-239.
  9. Chen, S.-w. and Tsay, J. (2010). Phonetic realization of suffix vs. non-suffix morphemes in Taiwanese. In Proceedings of Speech Prosody 2010, Chicago.
  10. Chong, C. S., Kim, J. and Davis, C. (2018). Disgust expressive speech: The acoustic consequences of the facial expression of emotion. Speech Communication 98: 68-72.
  11. Choudhury, A. and Kaiser, E. (2012). Prosodic focus in Bangla: A psycholinguistic investigation of production and perception. In Proceedings of LSA2012
  12. Das, K. and Mahanta, S. (2016). Focus marking and Pitch Register modification in Boro. In Proceedings of Speech Prosody 2016, Boston, USA: 864-868.
  13. Ding, H., Hoffmann, R. and Jokisch, O. (2011). An Investigation of Tone Perception and Production in German Learners of Mandarin. Archives of Acoustics 36(3): 509-518.
  14. Ding, H., Hoffmann, R. and Hirst, D. (2016). Prosodic Transfer: A Comparison Study of F0 Patterns in L2 English by Chinese Speakers. In Proceedings of Speech Prosody 2016, Boston, USA: 756-760.
  15. Ding, H., Jokisch, O. and Hoffmann, a. R. (2010). Perception and Production of Mandarin Tones by German Speakers. In Proceedings of Speech Prosody 2010, Chicago
  16. Franich, K. (2015). The effect of cognitive load on tonal coarticulation. In Proceedings of The 18th International Congress of Phonetic Sciences, Glasgow, UK
  17. Gieselman, S., Kluender, R. and Caponigro, I. (2011). Pragmatic Processing Factors in Negative Island Contexts. In Proceedings of The thirty-ninth Western Conference On Linguistics (WECOL 2011), Fresno, CA: 65-76.
  18. Greif, M. (2010). Contrastive Focus in Mandarin Chinese. In Proceedings of Speech Prosody 2010, Chicago.
  19. Hamlaoui, F. and Makasso, E.-M. (2012). An Experimental Investigation of the Prosodic Expression of Focus and Givenness in Bàsàa Declaratives. Colloque du Réseau Français de Phonologie. Paris.
  20. Holt, C. M., Lee, K. Y. S., Dowell, R. C. and Vogel, A. P. (2018). Perception of Cantonese Lexical Tones by Pediatric Cochlear Implant Users. Journal of Speech, Language, and Hearing Research 61(1): 174-185.
  21. Hosono, M. (2010). Scandinavian Object Shift from the Intonational Perspective. In Proceedings of Western Conference On Linguistics, Vancouver, Canada
  22. Hsieh, F.-f. and Kenstowicz, M. J. (2008). Phonetic knowledge in tonal adaptation: Mandarin and English loanwords in Lhasa Tibetan. Journal of East Asian Linguistics 17: 279-297.
  23. Hwang, H. K. (2011). Distinct types of focus and wh-question intonation. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 922-925.
  24. Hwang, H. K. (2012). Asymmetries between production, perception and comprehension of focus types in Japanese. In Proceedings of Speech Prosody 2012, Shanghai: 326-329.
  25. Ito, C. and Kenstowicz, M. (2009). Mandarin Loanwords in Yanbian Korean II: Tones. Language Research 45: 85-109.
  26. Jörg Peters, Jan Michalsky, Judith Hanssen (2012) Intonatie op de grens van Nederland en Duitsland: Nedersaksisch en Hoogduits. Internationale Neerlandistiek, 50e jaargang, nr.1.
  27. Kenstowicz, M. (2008). On the Origin of Tonal Classes in Kinande Noun Stems. Studies in African Linguistics 37: 115-151.
  28. Lai, C. (2012). Response Types and The Prosody of Declaratives. In Proceedings of Speech Prosody 2012, Shanghai.
  29. Lai, C. (2012). Rises All the Way Up: The Interpretation of Prosody, Discourse Attitudes and Dialogue Structure, University of Pennsylvania.
  30. Lai, L.-F. and Gooden, S. (2016). Acoustic cues to prosodic boundaries in Yami: A first look. In Proceedings of Speech Prosody 2016, Boston, USA: 624-628.
  31. Lee, Y.-c. and Nambu, S. (2010). Focus-sensitive operator or focus inducer: always and only. In Proceedings of Interspeech 2010.
  32. Li, V. G. (2016). Pitching in tone and non-tone second languages: Cantonese, Mandarin and English produced by Mandarin and Cantonese speakers. In Proceedings of Speech Prosody 2016, Boston, USA: 548-552.
  33. Ling, B. and Liang, J. (2018). The nature of left-and right-dominant sandhi in Shanghai Chinese—Evidence from the effects of speech rate and focus conditions. Lingua.
  34. Liu, F. (2010). Single vs. double focus in English statements and yes/no questions. In Proceedings of Speech Prosody 2010, Chicago.
  35. McDonough, J., O'Loughlin, J. and Cox, C. (2013). An investigation of the three tone system in Tsuut'ina (Dene). Proceedings of Meetings on Acoustics 133: 3571-3571.
  36. Nambu, S. and Lee, Y.-c. (2010). Phonetic Realization of Second Occurrence Focus in Japanese. In Proceedings of Interspeech 2010
  37. Ouyang, I. and Kaiser, E. (2012). Focus-marking in a tone language: Prosodic cues in Mandarin Chinese. In Proceedings of LSA2012
  38. Peters, J., Hanssen, J. and Gussenhoven, C. (2014). The phonetic realization of focus in West Frisian, Low Saxon, High German, and three varieties of Dutch. Journal of Phonetics 46(0): 185-209.
  39. Sherr-Ziarko, E. (2018). Prosodic properties of formality in conversational Japanese. Journal of the International Phonetic Association: 1-22.
  40. Shih, S.-h. (2018). On the existence of sonority-driven stress in Gujarati. Phonology 35(2): 327-364.
  41. Soderstrom, M., Ko, E.-S. and Nevzorova, U. (2011). It's a question? Infants attend differently to yes/no questions and declaratives. Infant Behavior and Development 34(1): 107-110.
  42. Simard, C., Wegener, C., Lee, A., Chiu, F. and Youngberg, C. (2014). Savosavo word stress: a quantitative analysis. In Proceedings of Speech Prosody 2014, Dublin: 512-514.
  43. Steinmetzger, K. and Rosen, S. (2015). The role of periodicity in perceiving speech in quiet and in background noise. Journal of the Acoustical Society of America 138(6): 3586-3599.
  44. Tompkinson, J. and Watt, D. (2018). Assessing the abilities of phonetically untrained listeners to determine pitch and speaker accent in unfamiliar voices. Language and Law / Linguagem e Direito 5(1): 19-37.
  45. Wong, P. (2012). Acoustic characteristics of three-year-olds' correct and incorrect monosyllabic Mandarin lexical tone productions. Journal of Phonetics 40: 141-151.
  46. Wu, W. L. (2009). Sentence-final particles in Hong Kong Cantonese: Are they tonal or intonational? In Proceedings of Interspeech 2009.
  47. Xing, L. and Xiaoxiang, C. (2016). The Acquisition of English Pitch Accents by Mandarin Chinese Speakers as Affected by Boundary Tones. In Proceedings of Speech Prosody 2016, Boston, USA: 956-960.
  48. Yan, M., Luo, Y. and Inhoff, A. W. (2014). Syllable articulation influences foveal and parafoveal processing of words during the silent reading of Chinese sentences. Journal of Memory and Language 75(0): 93-103.
  49. Yang, X. and Liang, J. (2012). Declarative and Interrogative Intonations by Brain-damaged Speakers of Uygur and Mandarin Chinese. In Proceedings of Speech Prosody 2012, Shanghai: 286-289.
  50. Zerbian, S. (2011). Intensity in narrow focus across varieties of South African English. In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 2268-2271.
  51. Zhang, J. and Meng, Y. (2016). Structure-dependent tone sandhi in real and nonce disyllables in Shanghai Wu. Journal of Phonetics 54: 169-201.
  52. Zhang, J. and Liu, J. (2011). Tone Sandhi and Tonal Coarticulation in Tianjin Chinese. Phonetica 68: 161-191.
  53. Zhang X. A Comparison of Cue-Weighting in the Perception of Prosodic Phrase Boundaries in English and Chinese. PhD dissertation, University of Michigan; 2012.
  54. Zhao, Y. and Jurafsky, D. (2009). The effect of lexical frequency and Lombard reflex on tone hyperarticulation. Journal of Phonetics 37(2): 231-247.
  55. Zhu, Y. and Mok, P. P. K. (2016). Intonational Phrasing in a Third Language – The Production of German by Cantonese-English Bilingual Learners. In Proceedings of Speech Prosody 2016, Boston, USA: 751-755.
  56. 髙橋 康徳 (2012). 上海語変調ピッチ下降部の音声実現と音韻解釈 [Phonetic realization and phonological interpretation of the pitch-fall portion in Shanghai tone sandhi]. コーパスに基づく言語学教育研究報告 [Research Reports on Corpus-based Linguistics and Language Education] No. 8, 51-72.
  57. 王玲、尹巧云、王蓓、刘岩 (2010). 德昂语布雷方言中焦点的韵律编码方式 [Prosodic focus in Bulei dialect of Deang]. Proceedings of The 9th Phonetics Conference of China (PCC2010), Tianjin.
  58. 尹巧云、王玲、杨文华、王蓓、刘岩 (2010). 德昂语中焦点和疑问语气在语调上的共同编码 [Parallel encoding of focus and interrogative modality in Deang]. Proceedings of The 9th Phonetics Conference of China (PCC2010).

* Before 2012: TimeNormalizeF0.praat

** All f0 averaging is done on a logarithmic scale: mean_f0 = exp((ln(f0_1) + ... + ln(f0_n)) / n), i.e., the geometric mean of the n values.
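In code, this log-scale averaging is simply a geometric mean. The sketch below is illustrative Python, not ProsodyPro's own Praat-script implementation:

```python
import math

def mean_f0_log(f0_values):
    """Average f0 on a logarithmic scale and convert back to Hz:
    exp(mean of ln(f0)), i.e. the geometric mean of the values."""
    n = len(f0_values)
    return math.exp(sum(math.log(f) for f in f0_values) / n)

# Averaging 100 Hz and 400 Hz on a log scale gives 200 Hz,
# not the 250 Hz an arithmetic mean would give.
print(mean_f0_log([100.0, 400.0]))
```

Averaging on a log scale matches pitch perception, which is roughly logarithmic, and prevents high-f0 tokens from dominating the mean.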

*** The velocity profiles of F0 are generated according to:

F0'(ti) = (F0st(ti+1) – F0st(ti-1)) / (ti+1 – ti-1)

where F0st(ti) is the f0 value in semitones at time ti,

which yields the discrete first derivative of F0. Computing the velocity from every two neighboring points in this way is known as central differencing, and is commonly used in data analysis because of its speed, simplicity, and accuracy (Bahill, A. T., Kallman, J. S. and Lieberman, J. E. (1982). Frequency limitations of the two-point central difference differentiation algorithm. Biological Cybernetics 45: 1-4).
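The two-point central difference can be written in a few lines. This is an illustrative sketch (with f0 assumed to be already converted to semitones), not the script's own code:

```python
import numpy as np

def f0_velocity(times, f0_st):
    """Two-point central difference: the velocity at t_i uses the
    neighbors t_{i-1} and t_{i+1}; the two endpoints are excluded."""
    times = np.asarray(times, dtype=float)
    f0_st = np.asarray(f0_st, dtype=float)
    return (f0_st[2:] - f0_st[:-2]) / (times[2:] - times[:-2])

# A linear rise of 10 semitones over 1 s has a constant
# velocity of 10 st/s everywhere:
t = np.linspace(0.0, 1.0, 11)
f0 = 10.0 * t
print(f0_velocity(t, f0))
```

Note that the result has two fewer points than the input, since the first and last samples have no neighbor on one side.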


Morton, E. W. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. American Naturalist 111: 855-869.

Ohala, J. J. (1984). An ethological perspective on common cross-language utilization of F0 of voice. Phonetica 41: 1-16.

Xu, Y., Lee, A., Wu, W.-L., Liu, X. and Birkholz, P. (2013). Human vocal attractiveness as signaled by body size projection. PLoS ONE 8(4): e62397.

Chuenwattanapranithi, S., Xu, Y., Thipakorn, B. and Maneewongvatana, S. (2008). Encoding emotions in speech with the size code -- A perceptual investigation. Phonetica 65: 210-230.

Noble, L. and Xu, Y. (2011). Friendly Speech and Happy Speech – Are they the same? In Proceedings of The 17th International Congress of Phonetic Sciences, Hong Kong: 1502-1505.

Liu, X. and Xu, Y. (2014). Body size projection and its relation to emotional speech—Evidence from Mandarin Chinese. Proceedings of Speech Prosody 2014, Dublin: 974-977.

Hsu, C. and Xu, Y. (2014). Can adolescents with autism perceive emotional prosody? Proceedings of Interspeech 2014, Singapore.

Yi's other tools