The beta version (6.1.3) lets users leave individual sentences in the original recording instead of extracting them into separate sound files. However, the beta version has not been updated recently and therefore lacks many of the newer features of the standard version. Please use it with caution.
An interactive Praat script that allows you to:
Get accurate f0 tracks using a method that combines automatic vocal pulse marking by Praat, manual correction by yourself, a trimming algorithm that removes spikes and sharp edges (cf. Appendix 1 in Xu 1999), and a triangular smoothing function
Get continuous f0 velocity (= first derivative of f0) curves (for labeled intervals only)
Segment and label intervals for each sound (.wav) file
Cycle through all sound files in a folder without using menu commands
Get time-normalized f0 (for labeled intervals only) (cf. Xu 1997), f0 velocity and intensity. Useful if you want to plot these curves averaged** across multiple repetitions of the same word or sentence
Get time-normalized f0, f0 velocity and intensity with original time preserved (cf. Xu & Xu 2005). Useful if you want to plot these curves with averaged original time for each interval
Get rectified, trimmed f0 as PitchTier objects which can replace the pitch tier in Manipulation objects
Get sampled f0 (for labeled intervals only) -- f0 at fixed time intervals as determined by F0_sample_rate (number of points per second)
Get maxf0, minf0, excursion size (semitones), meanf0, mean intensity, duration, max velocity, final velocity, final f0, maxf0_loc_ms and maxf0_loc_ratio from each labeled interval
Get results in ensemble files: normf0.txt, normtimeIntensity.txt, samplef0.txt, f0velocity.txt, maxf0.txt, minf0.txt, excursionsize.txt, meanf0.txt, maxvelocity.txt, duration.txt, finalvelocity.txt, finalf0.txt, meanintensity.txt, maxf0_loc_ms.txt and maxf0_loc_ratio.txt
Get mean_normf0.txt, which contains time-normalized f0 contours averaged** across repetitions of identical sentences
Get mean_normf0_cross_speaker.txt, which contains time-normalized f0 contours averaged** across identical sentences produced by multiple speakers
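The f0-tracking steps listed above can be sketched in Python roughly as follows. These steps are pulse marking, sampling at F0_sample_rate, and triangular smoothing. This is a hedged illustration, not ProsodyPro's own code. The function names are hypothetical, and f0 is assumed to be the reciprocal of each inter-pulse period, stamped at the period's midpoint. The trimming algorithm (Xu 1999) is omitted, and the 1-2-1 triangular weights are an assumed window size.

```python
import math


def raw_f0_from_pulses(pulse_times):
    """Convert vocal-pulse times (s) into (time, f0 in Hz) pairs.

    Assumption: f0 = 1 / period, stamped at the midpoint of each period.
    """
    points = []
    for t0, t1 in zip(pulse_times, pulse_times[1:]):
        period = t1 - t0
        if period > 0:  # skip duplicate pulse marks
            points.append(((t0 + t1) / 2, 1.0 / period))
    return points


def sample_f0(points, sample_rate):
    """Linearly interpolate (time, f0) pairs at sample_rate points per second."""
    if len(points) < 2:
        return list(points)
    start, end = points[0][0], points[-1][0]
    step = 1.0 / sample_rate
    n = int(math.floor((end - start) / step + 1e-9)) + 1
    samples, j = [], 0
    for k in range(n):
        t = start + k * step
        # advance to the pair of raw points that brackets t
        while j < len(points) - 2 and points[j + 1][0] < t:
            j += 1
        (ta, fa), (tb, fb) = points[j], points[j + 1]
        samples.append((t, fa + (t - ta) / (tb - ta) * (fb - fa)))
    return samples


def triangular_smooth(values, half_width=1):
    """Weighted moving average with triangular weights (1-2-1 when half_width=1)."""
    offsets = range(-half_width, half_width + 1)
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for d in offsets:
            if 0 <= i + d < len(values):  # weights renormalize at the edges
                w = half_width + 1 - abs(d)
                num += w * values[i + d]
                den += w
        smoothed.append(num / den)
    return smoothed
```

For example, sample_f0(raw_f0_from_pulses(pulses), 100) would give one f0 value every 10 ms across the voiced stretch.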
Motivation and brief history
ProsodyPro was developed as a convenient tool for our own research. It allows us to process large amounts of speech data systematically and with high precision. It minimizes human labor by automating tasks that do not require human judgment, such as locating and opening sound files, taking measurements, and saving raw results in formats ready for further graphical and statistical analysis. At the same time, it allows human intervention in processes that are prone to error in automatic algorithms, such as pitch detection and segmentation.
The f0 trimming and time-normalization algorithms, which form the core of the script, were developed in my PhD research (Xu 1993). They were then implemented in a C program that worked in conjunction with xwaves. Like Praat, xwaves generates automatic vocal cycle marks and so saves most of the labor of marking the cycles manually, as I had done in my dissertation. The arrival of Praat, thanks to the brilliant work of Paul Boersma and David Weenink, made it possible to combine these algorithms in a single script that runs on all major computer platforms. It also removed the need to write a new C program for each new experiment.
The first version of the script was made public in 2005. Since then it has been used in a growing number of research projects. Some are listed here.
Why time-normalization? -- Justifications you may need when responding to questions
First, time-normalized contours are generated only for the purpose of making graphical comparisons. The specific measurements also generated by ProsodyPro, such as maxf0, minf0, meanf0, etc., are all taken from non-time-normalized contours. So, nothing is lost when time-normalized contours are presented in addition to the specific measurements.
Second, in the common practice of reporting only specific measurements, readers are always left wondering what the rest of the f0 contours might look like. There are, of course, occasional presentations of full f0 contours of exemplary sentences, but one can never be sure how representative they are. Time-normalization allows f0 contours to be averaged across repetitions and even speakers. This removes most of the random variation while retaining the full detail of continuous f0 contours, leaving little to guesswork. If there is a concern that averaging across speakers may hide individual differences, one can always average only across repetitions and present speaker-specific contours separately.
Third, a major advantage of time-normalization is that it allows us to clearly see the locations and manners of the maximum differences between experimental conditions by plotting the mean f0 contours in overlaid graphs, like those shown below. This in turn allows us to find measurements that potentially best reflect the real differences between experimental conditions.
Fourth, time-normalization does not carry more assumptions than other forms of data presentation. When only a single measurement, say, maxf0, is reported for a syllable or word, the assumption is that maxf0 fully represents the f0 of that syllable or word. If measurements are taken at fixed or relative time points in a domain, this is also time-normalization. Examples include the middle of the domain, points near its beginning and end, or three evenly spaced points. But such measurements give a time resolution of only 1-3 points per domain. The default time resolution in ProsodyPro, in contrast, is 10 points per domain.
Finally, time-normalization does, of course, carry assumptions. When the syllable is used as the domain of normalization, for example, the assumption is that speakers produce syllable-sized contours consistently (see Xu & Wang 2001 and Xu & Liu 2006 for some empirical basis). But ProsodyPro also allows other units, e.g., words or even phrases, to serve as the normalization domain, if the assumption is instead that speakers produce word-sized or phrase-sized f0 contours consistently.
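To make the "10 points per domain" idea concrete, here is a minimal Python sketch of time-normalization. Each labeled interval, whatever its duration, contributes the same number of f0 values, taken by linear interpolation. The function names are hypothetical. Placing the points at the centers of equal slices is an assumption; the exact sampling positions ProsodyPro uses are not stated here.

```python
from bisect import bisect_left


def interp(t, times, values):
    """Piecewise-linear interpolation, clamped to the first/last value."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_left(times, t)  # first index with times[j] >= t
    t0, t1 = times[j - 1], times[j]
    return values[j - 1] + (t - t0) / (t1 - t0) * (values[j] - values[j - 1])


def time_normalize(times, f0, intervals, n_points=10):
    """Take n_points f0 values per (start, end) interval, regardless of its duration.

    Assumption: the points sit at the centers of n_points equal slices.
    """
    contour = []
    for start, end in intervals:
        step = (end - start) / n_points
        contour.extend(interp(start + (k + 0.5) * step, times, f0)
                       for k in range(n_points))
    return contour
```

Because every interval yields the same number of points, contours from repetitions with different durations line up point by point and can be averaged directly.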
Put ProsodyPro.praat in the folder containing the sound files to be analyzed, and launch Praat;
Select Open Praat Script... from the top menu;
Locate ProsodyPro.praat in the dialogue window and select it;
When the script window opens in Praat, select Run from the Run menu (or type the key shortcut command-r or control-r);
In the startup window, check or uncheck the boxes according to your need, and set appropriate values in the text fields or simply use the default values. Select the task by checking the appropriate radio button.
Click OK and three windows will appear. The first window (PointProcess) displays the waveform together with the vocal cycle marks (vertical lines) generated by Praat. This is where you can manually add missing marks and delete redundant ones. You need to do this only for the labeled intervals, as explained next.
The second window (TextGrid) displays the waveform and spectrogram of the current sound. It can also show pitch and formant tracks in the spectrogram panel and vocal pulse marks in the waveform panel. (These tracks and marks cannot be edited manually, so you can hide them with the corresponding menu to reduce processing time.)
At the bottom of this window are two TextGrid tiers, where you can insert interval boundaries (Tier 1) and add comments (Tier 2). For any interval that you want to have results saved, a label in Tier 1 is required. The label can be as simple as a, b, c or 1, 2, 3.
The third window (Pause) lets you control the progression of the analysis. To bring up the next sound to be analyzed, change the number in the current_file box (or leave it as is) and press "Continue". The number refers to the position in the Strings object "list" in the Objects window (a copy is also saved in the current folder). The next sound opened will be current_file + 1 (so, type 0 to open sound 1).
To end the current analysis session, press "Finish" in the Pause window. The number of the last sound analyzed will be shown in the Praat Info window, and you can use it as the starting point in your next analysis session.
After processing individual files, you can run the script again to get ensemble files by checking the third radio button from the top.
You can also change various parameters after processing individual files by running the script again with the radio button "Process all sounds without pause" checked. The script will then run through all the files on its own.
You can also generate mean normf0 contours averaged** across repetitions of identical sentences. To do this, set Nrepetitions in the opening window to the number of repetitions in your data set when you run the script with the "Get ensemble files" button checked. Make sure that the number of labeled intervals is identical across the repetitions.
To force ProsodyPro to skip extra repetitions, you need to check "Ignore extra repetition" and also name your sound files with a final digit that indicates repetition.
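As a hedged illustration of the naming convention described above, this Python sketch groups sound files by sentence using the final digit of the file name as the repetition number. The helper is hypothetical and is not ProsodyPro's own grouping logic. It assumes that "extra" repetitions are those whose digit exceeds Nrepetitions. Names without a final digit are simply skipped.

```python
from collections import defaultdict


def group_repetitions(sound_names, n_repetitions, ignore_extra=True):
    """Group file names by sentence, using the final digit as the repetition number.

    Hypothetical helper: files whose repetition digit exceeds n_repetitions are
    dropped when ignore_extra is True; names without a final digit are skipped.
    """
    groups = defaultdict(list)
    for name in sound_names:
        stem = name.rsplit(".", 1)[0]  # "sentA2.wav" -> "sentA2"
        if not stem or stem[-1] not in "0123456789":
            continue
        sentence, rep = stem[:-1], int(stem[-1])
        if ignore_extra and rep > n_repetitions:
            continue
        groups[sentence].append(name)
    return dict(groups)
```

With Nrepetitions = 2, a third recording named sentA3.wav would be left out of the sentA group.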
To average across unequal numbers of repetitions, you can create a text file (default = repetition_list.txt) in which sound-file names are listed in a single column, with blank lines separating the repetition groups. You can create this file by renaming the "FileList.txt" that ProsodyPro always generates and then inserting blank lines between the groups. Any sound whose name you delete from this file is skipped, which lets you exclude sounds from your final analysis.
You can also generate mean normf0 contours averaged** across speakers. To do this, first create a text file (speaker_folders.txt) containing the speaker folder names arranged in a single column. Then run ProsodyPro with the 4th task, "Average across speakers", checked. The script reads mean_normf0.txt from all the speaker folders, averages the f0 values on a logarithmic scale, and converts them back to Hz. The grand averages are saved in "mean_normf0_cross_speaker.txt". In the Start window, you also need to tell ProsodyPro where the speaker folder file is. The default location is the current directory, "./"; if the file is in the parent directory, enter "../".
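The repetition_list.txt format described above can be read with a few lines of Python. This sketch assumes only what the format specifies: one sound name per line, with one or more blank lines closing a group. The function name is hypothetical.

```python
def parse_repetition_list(text):
    """Split a single-column list of sound names into groups at blank lines."""
    groups, current = [], []
    for line in text.splitlines():
        name = line.strip()
        if name:
            current.append(name)
        elif current:  # a blank line closes the current group
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

For example, parse_repetition_list(open("repetition_list.txt").read()) would return one list of file names per repetition group. Consecutive blank lines are treated as a single separator.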
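The cross-speaker step can be sketched in Python as two parts: locating each speaker's mean_normf0.txt and averaging equal-length contours on a log scale. This is an illustration under stated assumptions, not ProsodyPro's code. It assumes the folder names in speaker_folders.txt are relative to the location entered in the Start window. Reading the numbers out of mean_normf0.txt is left out, because that file's column layout is not described here.

```python
import math
from pathlib import Path


def speaker_mean_files(folder_list_text, base="./"):
    """Paths to each speaker's mean_normf0.txt.

    Assumption: folder names in speaker_folders.txt are relative to `base`,
    the location entered in the Start window ("./" or "../").
    """
    return [Path(base) / name.strip() / "mean_normf0.txt"
            for name in folder_list_text.splitlines() if name.strip()]


def average_across_speakers(contours):
    """Point-by-point average of equal-length f0 contours (Hz) on a log scale."""
    n = len(contours)
    return [math.exp(sum(math.log(c[i]) for c in contours) / n)
            for i in range(len(contours[0]))]
```

Averaging logs and exponentiating the result returns values in Hz, matching the "convert back to Hz" step described above.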
By far the most common problems are caused by special characters such as spaces and hyphens in your file names or file paths (folder names). Please make sure your file and folder names consist only of letters, numbers and underscores, e.g., my_sounds/speaker1/sentence_A1.wav.
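Before running the script, a quick check like the following Python sketch can flag folder or file names that break the letters-numbers-underscores rule above. The helper name is hypothetical. It restricts "letters" to ASCII as a conservative assumption, and it ignores the file extension's dot.

```python
import re
from pathlib import PurePosixPath

SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def unsafe_parts(path):
    """Folder names and file stem in `path` that contain anything besides
    ASCII letters, digits and underscores."""
    p = PurePosixPath(path)
    parts = list(p.parent.parts) + [p.stem]
    return [part for part in parts
            if part not in ("/", "..") and not SAFE_NAME.fullmatch(part)]
```

For example, unsafe_parts("my sounds/speaker-1/s1.wav") reports both folder names, while the example path given above passes cleanly.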
Each time you press "Continue" in the Pause window, various analysis results are saved for the current sound as text files:
X.rawf0 (Hz) -- raw f0 with real time computed directly from the pulse markings
X.f0 (Hz) -- f0 smoothed with the trimming algorithm (Xu, 1999)
X.samplef0 (Hz) -- f0 values at fixed time intervals specified by "f0 sample rate"
X.smoothf0 (Hz) -- samplef0 smoothed by a triangular window
X.timenormf0 (Hz) -- time-normalized f0. The f0 in each interval is divided into the same number of points (default = 10).
X.timenormIntensity (dB) -- time-normalized intensity. The intensity in each interval is divided into the same number of points (default = 10).
X.actutimenormf0 (Hz) -- time-normalized f0 with each interval divided into the same number of points (default = 10). But the time scale is the original, except that the onset time of interval 1 is set to 0, unless the "Set initial time to 0" box in the startup window is unchecked.
X.f0velocity (semitones/s) -- velocity profile (instantaneous rate of f0 change) of the f0 contour at fixed time intervals specified by "f0 sample rate" ***
X.means -- Containing the following values (in the order of the columns):
maxf0 (Hz)
minf0 (Hz)
meanf0 (Hz)
excursion size (semitones)
finalf0 (Hz) -- Indicator of target height (taken at a point specified by "Final offset" in the startup window)
maxf0_loc_ratio -- Relative location of the f0 peak as a proportion of the duration of the interval
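The per-interval values listed above can be sketched in Python for a single labeled interval. This is a hedged approximation, not ProsodyPro's code. It assumes meanf0 is the arithmetic mean of the f0 samples and that excursion size is the max/min ratio expressed in semitones (12 · log2). finalf0 is omitted because the "Final offset" rule is not specified here. The function name is hypothetical.

```python
import math


def interval_measures(times, f0, start, end):
    """Summary f0 measures for one labeled interval (times in s, f0 in Hz).

    Assumptions: meanf0 is the arithmetic mean of the samples, and excursion
    size is the max/min ratio expressed in semitones.
    """
    i_max = max(range(len(f0)), key=lambda i: f0[i])
    f_max, f_min = f0[i_max], min(f0)
    peak_offset = times[i_max] - start  # seconds from interval onset
    return {
        "maxf0": f_max,
        "minf0": f_min,
        "meanf0": sum(f0) / len(f0),
        "excursionsize": 12 * math.log2(f_max / f_min),
        "duration": (end - start) * 1000,  # ms
        "maxf0_loc_ms": peak_offset * 1000,
        "maxf0_loc_ratio": peak_offset / (end - start),
    }
```

A peak halfway through a 200 ms interval would give maxf0_loc_ms = 100 and maxf0_loc_ratio = 0.5.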
If you want to change certain analysis parameters after processing all the sound files, you can rerun the script, set the "Input File No" to 1 in the startup window and check the button "Process all sounds without pause" before pressing "OK". The script will then run by itself and cycle through all the sound files in the folder one by one.
After all the individual sound files have been analyzed, you can gather the results into a number of ensemble files by running the script again and checking the button "Get ensemble results" in the startup window. The following ensemble files will be saved:
normf0.txt (Hz)
normtime_semitonef0.txt (semitones)
normtime_f0velocity.txt (semitones/s)
normtimeIntensity.txt (dB)
normactutime.txt (s)
maxf0.txt (Hz)
minf0.txt (Hz)
excursionsize.txt (semitones)
meanf0.txt (Hz)
duration.txt (ms)
maxvelocity.txt (semitones/s)
finalvelocity.txt (semitones/s)
finalf0.txt (Hz)
meanintensity.txt (dB)
samplef0.txt (Hz)
f0velocity.txt (semitones/s)
maxf0_loc_ms.txt (ms)
maxf0_loc_ratio.txt (ratio)
If Nrepetitions > 0, the following files will also be saved:
mean_normf0.txt (Hz)
mean_normtime_semitonef0.txt (semitones)
mean_normtime_f0velocity.txt (semitones/s)
mean_normtimeIntensity.txt (dB)
mean_normactutime.txt (s)
mean_maxf0.txt (Hz)
mean_minf0.txt (Hz)
mean_excursionsize.txt (semitones)
mean_meanf0.txt (Hz)
mean_duration.txt (ms)
mean_maxvelocity.txt (semitones/s)
mean_finalvelocity.txt (semitones/s)
mean_finalf0.txt (Hz)
mean_meanintensity.txt (dB)
mean_maxf0_loc_ms.txt (ms)
mean_maxf0_loc_ratio.txt (ratio)
If Task 4 "Average across speakers" is selected, the following file will also be saved:
mean_normf0_cross_speaker.txt (Hz)
H1-A1 (dB) -- Amplitude difference between 1st harmonic and 1st formant
H1-A3 (dB) -- Amplitude difference between 1st harmonic and 3rd formant
cpp -- Cepstral Peak Prominence (Hillenbrand et al., 1994)
center_of_gravity (Hz) -- Spectral center of gravity
Hammarberg_index (dB) -- Difference in maximum energy between 0-2000 Hz and 2000-5000 Hz
energy_below_500Hz (dB) -- Energy of voiced segments below 500Hz
energy_below_1000Hz (dB) -- Energy of voiced segments below 1000Hz
Formant_dispersion1_3 (Hz) -- Average distance between adjacent formants up to F3
F_dispersion1_5 (Hz) -- Average distance between adjacent formants up to F5
median_pitch (Hz) -- Median pitch in Hertz
jitter -- Mean absolute difference between consecutive periods, divided by mean period
shimmer -- Mean absolute difference between amplitudes of consecutive periods, divided by mean amplitude
harmonicity (dB) -- Harmonics-to-Noise Ratio (HNR): The degree of acoustic periodicity
energy_profile (dB) -- Fifteen signal energy values computed from overlapping spectral bands of 500-Hz bandwidth: 0–500, 250–750, 500–1000, ... 3250–3750, 3500–4000
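The jitter and shimmer definitions above, and the band layout of the energy profile, translate directly into the following Python sketch. It follows only the definitions as worded in this list. Praat's own jitter and shimmer commands additionally apply period limits (shortest/longest period, maximum period factor), so values computed this way may differ. The function names are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by their mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter(periods):
    """Jitter as defined above, from a sequence of period durations."""
    return local_perturbation(periods)


def shimmer(amplitudes):
    """Shimmer as defined above, from a sequence of per-period amplitudes."""
    return local_perturbation(amplitudes)


def energy_bands(width=500, step=250, top=4000):
    """Overlapping (low, high) band edges in Hz: (0, 500), (250, 750), ... (3500, 4000)."""
    return [(lo, lo + width) for lo in range(0, top - width + 1, step)]
```

With a 500 Hz width and a 250 Hz step up to 4000 Hz, energy_bands() yields exactly the fifteen bands listed for energy_profile.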
Note that you can generate the ensemble files only if you have analyzed at least one sound following the steps described earlier.
The following examples show how functional contrasts can be easily brought out by time-normalized f0 contours, whether plotted on normalized time or mean time.
[Figure: time-normalized f0 contours. Data from Xu (1999)]
[Figure: time-normalized f0 contours. Data from Xu & Xu (2005)]
** All the F0 averaging is done on a logarithmic scale: $\mathrm{mean\_f0} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln F0_i\right)$, i.e., the geometric mean of the n values.
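A worked example of the log-scale average in this footnote, using two hypothetical f0 values of 100 Hz and 400 Hz:

```latex
\text{arithmetic: } \frac{100 + 400}{2} = 250\ \text{Hz},
\qquad
\text{log-scale: } \exp\left(\frac{\ln 100 + \ln 400}{2}\right) = \sqrt{100 \times 400} = 200\ \text{Hz}
```

200 Hz lies one octave (12 semitones) above 100 Hz and one octave below 400 Hz, i.e., midway on a semitone scale. The arithmetic mean of 250 Hz, by contrast, is pulled toward the higher value.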
*** The velocity profiles of F0 are generated according to:
$F0'_i = \dfrac{F0^{st}_{i+1} - F0^{st}_{i-1}}{t_{i+1} - t_{i-1}}$, where $F0^{st}_i$ is f0 in semitones at time $t_i$,
which yields the discrete first derivative of F0. Computing the velocity at each point from its two neighboring points is known as central differentiation. It is commonly used in data analysis because of its speed, simplicity, and accuracy (Bahill, A. T., Kallman, J. S. and Lieberman, J. E. (1982). Frequency limitations of the two-point central difference differentiation algorithm. Biological Cybernetics 45: 1-4).
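The central-difference formula can be sketched in Python as follows. f0 is first converted from Hz to semitones, then each interior point's velocity is computed from its two neighbors. The semitone reference of 1 Hz is an arbitrary assumption; it cancels in the difference, so it does not affect the velocity. The first and last points have no central difference, and how ProsodyPro handles those endpoints is not stated here. The function names are hypothetical.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert Hz to semitones relative to ref_hz (the reference cancels in velocity)."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz]


def f0_velocity(times, f0_hz):
    """Central-difference f0 velocity (semitones/s) at each interior sample."""
    st = hz_to_semitones(f0_hz)
    return [(st[i + 1] - st[i - 1]) / (times[i + 1] - times[i - 1])
            for i in range(1, len(st) - 1)]
```

For instance, f0 doubling every 100 ms (100, 200, 400 Hz at 0, 0.1, 0.2 s) rises 24 semitones over 0.2 s, giving a velocity of 120 semitones/s at the middle point.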
