Research Committee | Current Projects

Laboratory reference data on long-term formant distributions

Michael Jessen, Bundeskriminalamt Wiesbaden, Department of Speaker Identification and Audio Analysis

In recent years a growing body of research has shown that formant frequencies contain important speaker-specific information and that formant analysis is feasible even under typical forensic conditions. Nolan and Grigoras (2005, International Journal of Speech, Language and the Law 12: 143-173) introduced Long Term Formant Distribution (LTF) as a specific way of capturing formant structure. While LTFs should not replace more traditional ways of measuring formants - where different vowels are identified and measured separately - LTF analysis has a number of specific advantages. One advantage is that LTF analysis can be applied to cases in languages not spoken by the expert, where vowel selection and segmentation for traditional formant analysis would be extremely difficult or impractical but where LTF analysis is still feasible, given a number of safety precautions.

Casework experience has shown that the degree of similarity in formant structure between different speech samples can be captured quite well with LTF analysis. However, a voice comparison must capture more than similarity: it also needs to be known how rare or frequent different formant values are in a representative sample of the relevant population of speakers. Multispeaker data of this sort are rare even for traditional individual-vowel-based formant analysis, although large-scale studies are currently in progress. The situation is worse for LTF analysis, which has only been introduced recently. Obtaining LTF reference data from a large number of speakers is the main goal of the current research project.

A secondary goal of this project is to determine whether a net duration threshold value of available speech material can be established beyond which LTFs are saturated, i.e. where more data within the same recording do not change the formant values significantly. Such a threshold value would help to determine how much case material is necessary in order to use LTF evidence in a reliable manner.
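The saturation idea can be illustrated with a minimal Python sketch. It is not the project's actual procedure: it assumes the formant track is already available as a plain list of per-frame values in Hz, and it replaces a proper significance test with a fixed tolerance. The function name, the 10 ms frame step, the 1 s checkpoint spacing and the 5 Hz tolerance are all hypothetical choices made for illustration.

```python
import statistics


def saturation_duration(formant_hz, frame_step_s=0.01, block_s=1.0, tolerance_hz=5.0):
    """Return the shortest net duration (in seconds) from which the cumulative
    mean stays within tolerance_hz of the full-recording mean.

    Returns None for an empty track. All parameter defaults are illustrative
    assumptions, not values taken from the project.
    """
    if not formant_hz:
        return None
    final_mean = statistics.fmean(formant_hz)
    frames_per_block = max(1, round(block_s / frame_step_s))

    # Cumulative mean at every block boundary and at the very last frame.
    checkpoints = []
    running_sum = 0.0
    for i, value in enumerate(formant_hz, start=1):
        running_sum += value
        if i % frames_per_block == 0 or i == len(formant_hz):
            checkpoints.append((i * frame_step_s, running_sum / i))

    # Walk backwards: the saturation point is the earliest checkpoint after
    # which no later checkpoint deviates by more than the tolerance.
    saturated_at = checkpoints[-1][0]
    for duration, mean in reversed(checkpoints):
        if abs(mean - final_mean) > tolerance_hz:
            break
        saturated_at = duration
    return saturated_at


if __name__ == "__main__":
    # Synthetic track: one second at 600 Hz followed by nine seconds at 500 Hz.
    track = [600.0] * 100 + [500.0] * 900
    print(saturation_duration(track))
```

The same check could be run on the cumulative standard deviation, or separately for F1, F2 and F3. A real analysis would probably replace the fixed tolerance with a statistical criterion.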

Another secondary goal is to study the effects of vowel space on LTF. From previous studies and casework experience it is known that slower/more careful speaking generally leads to more peripheral formant values than faster/less careful speaking. With LTF analysis the formant data for different vowel categories are mixed and it is not clear whether the notion of vowel peripherality can still be captured. One hypothesis is that speech with more peripheral vowel targets has a larger standard deviation of (especially) F2 than speech with more central vowel targets. In order to test this hypothesis, data from read speech will be compared with data from spontaneous speech, with more peripherality expected in read than in spontaneous speech.
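One way to look at this hypothesis is a paired, per-speaker comparison of F2 standard deviation across the two speech styles. The Python sketch below assumes each speaker's F2 values are available as two plain lists of Hz values (read and spontaneous). The function name, the data layout and the example numbers are hypothetical, and no significance test is included.

```python
import statistics


def compare_f2_spread(speakers):
    """Summarise per-speaker differences in F2 standard deviation.

    `speakers` maps a speaker ID to a pair (read_f2, spontaneous_f2), each a
    list of F2 values in Hz with at least two entries. This layout is an
    assumption made for illustration.
    """
    diffs = []
    for read_f2, spont_f2 in speakers.values():
        # Positive difference: larger F2 spread in read speech, as hypothesised.
        diffs.append(statistics.stdev(read_f2) - statistics.stdev(spont_f2))
    return {
        "n": len(diffs),
        "mean_diff_hz": statistics.fmean(diffs),
        "n_read_larger": sum(d > 0 for d in diffs),
    }


if __name__ == "__main__":
    # Invented values, for demonstration only.
    demo = {
        "s1": ([1000, 1500, 2000], [1300, 1500, 1700]),
        "s2": ([1200, 1500, 1800], [1400, 1500, 1600]),
    }
    print(compare_f2_spread(demo))
```

The per-speaker differences could then be passed to a paired test of the analyst's choice.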

A related aspect to be investigated is the direct relation between speech tempo and vowel peripherality. Tsao et al. (2006; JASA 119: 1074-1082) present the interesting finding that although vowel space correlates with tempo for within-speaker comparisons, it does not for between-speaker comparisons. This is good news for forensic phonetics, since, if these results turn out to be stable, formant measurements and tempo measurements would be essentially independent speaker parameters. For the speech data to be used in the current project, tempo measurements have been made in a separate study, so that it will be possible to correlate speaker differences in tempo with speaker differences in LTF values, especially F2 standard deviation.
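The between-speaker relation could be examined with a Pearson correlation across speakers, with one tempo value and one F2 standard deviation per speaker. The Python sketch below assumes both measures have already been computed. The function name and the example numbers are hypothetical, and the choice of Pearson's r is an assumption rather than the project's stated method.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented per-speaker values, for demonstration only.
    tempo_syll_per_s = [4.8, 5.1, 5.6, 6.0, 6.3]
    f2_sd_hz = [310.0, 295.0, 320.0, 300.0, 305.0]
    print(pearson_r(tempo_syll_per_s, f2_sd_hz))
```

A coefficient near zero across speakers would be in line with the between-speaker result reported by Tsao et al., though a sample of 100 speakers would still need a proper significance assessment.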

The project is based on "Pool 2010", a labspeech database of 100 German-speaking men, which has proven useful in other forensically motivated investigations. The speech signals will be edited manually so that only vocalic portions with clear formant structure remain. LTF analysis is based on the output of the formant tracker of the Wavesurfer software, applied to these edited data. Essentially, the outcome of this project will take the form of histograms of long-term F1, F2 and F3 among 100 speakers, as well as statistical results on the difference between the two speech conditions, on the saturation issue, and on the relation between speech tempo and LTF.
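As an illustration of how such histograms might be built, the Python sketch below bins frame-level formant values into fixed-width frequency bands. It assumes the tracker output has already been converted into a list of (F1, F2, F3) tuples in Hz. It does not read Wavesurfer files, and the function names and the 50 Hz bin width are hypothetical.

```python
from collections import Counter


def split_formants(frames):
    """Turn a list of (F1, F2, F3) tuples into three per-formant lists."""
    f1, f2, f3 = zip(*frames)
    return list(f1), list(f2), list(f3)


def ltf_histogram(values_hz, bin_width_hz=50.0):
    """Count formant values per frequency bin.

    Returns a dict mapping each bin's lower edge (Hz) to the number of frames
    in that bin, ordered by frequency. Empty bins are omitted.
    """
    counts = Counter(int(v // bin_width_hz) for v in values_hz)
    return {index * bin_width_hz: counts[index] for index in sorted(counts)}


if __name__ == "__main__":
    # Invented frames, for demonstration only.
    frames = [(480, 1450, 2500), (510, 1520, 2480), (520, 1390, 2610)]
    f1, f2, f3 = split_formants(frames)
    for name, values in (("F1", f1), ("F2", f2), ("F3", f3)):
        print(name, ltf_histogram(values))
```

Pooling the frames of one speaker gives that speaker's long-term distribution. Pooling per-speaker means or modes across all speakers would give a population-level histogram of the kind described above.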