HTK Study Group, 1/2006
This page posts the configuration files and a training script that can be
used, together with the HTK tools, to build a Switchboard telephone-band
conversational speech recognizer with at least 50% word recognition
accuracy (WRA). Informal lectures describing the steps in the development
of this recognizer are posted on the Wiki.
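For readers unfamiliar with the metric, here is a minimal sketch of how a word accuracy figure can be computed from alignment counts. It assumes that WRA corresponds to the "Acc" figure reported by HTK's HResults tool, which is 100 * (N - D - S - I) / N for N reference words, D deletions, S substitutions, and I insertions. The function name and the sample counts are illustrative and are not taken from the train.pl log.

```python
def word_accuracy(n_ref, deletions, substitutions, insertions):
    """Percent word accuracy: 100 * (N - D - S - I) / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    # Words recognized correctly, before penalizing insertions.
    hits = n_ref - deletions - substitutions
    return 100.0 * (hits - insertions) / n_ref


# Illustrative counts only (not results from train.pl).
acc = word_accuracy(n_ref=1000, deletions=100, substitutions=300, insertions=50)
print(f"{acc:.1f}% word accuracy")
```

Because insertions are subtracted, this figure can fall below the plain percent-correct score and can even go negative for a badly tuned recognizer.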
In order to run these tools, you will need:
- The MS98 Switchboard transcriptions.
- The Switchboard speech data, which is available from the LDC. If your institution is an LDC member, then I am allowed to send you speech data for the 12-hour sub-corpus.
- The training script, train.pl.
- The dictionary, ISLE_03Jan2006.dict.
- train.pl will create all other configuration files for you. If you wish
to debug the script, however, or to write your own HTK training scripts,
you may want a TGZ of the configuration files.
- Here is the log file produced by train.pl when I ran it.
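Before launching a long training run, it may help to confirm that the downloaded files are where you expect them. The sketch below checks only the two files that this page names explicitly (train.pl and ISLE_03Jan2006.dict). The assumption that both sit in a single working directory is mine, and the transcriptions and speech data are not checked because their file names are not given here.

```python
from pathlib import Path

# File names come from the list above; keeping them in one directory is an assumption.
REQUIRED_FILES = ["train.pl", "ISLE_03Jan2006.dict"]


def missing_prerequisites(workdir):
    """Return the required files that are not present in workdir."""
    root = Path(workdir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("train.pl and the dictionary are in place.")
```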
A few of the recognizer's features are listed below. The absolute gain in
word recognition accuracy contributed by each feature is given in
parentheses. These percentages are based on the development chain
outlined in train.pl; this is not a carefully controlled scientific
experiment.
- Features
  - Clustered allophone acoustic models (worth about 10% absolute)
  - Lexical stress as a QS context variable (worth about 7% absolute)
  - Split from 1 to 5 Gaussians per state (worth about 8% absolute)
  - Unsupervised MLLR adaptation to target talker (worth about 1% absolute)
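As a rough tally, the gains quoted above can be summed. This is only a sketch: each figure was measured at a different point in the development chain, so the total is an informal summary and not an additive decomposition of the final accuracy. The dictionary keys are shortened labels of my own.

```python
# Approximate absolute WRA gains (percentage points) quoted in the list above.
gains = {
    "clustered allophone models": 10,
    "lexical stress as QS context variable": 7,
    "split from 1 to 5 Gaussians per state": 8,
    "unsupervised MLLR adaptation": 1,
}

total = sum(gains.values())

# Print the features from largest to smallest quoted gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * gain / total
    print(f"{name:40s} {gain:3d}  ({share:.0f}% of listed gains)")
print(f"{'total':40s} {total:3d}")
```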
Page created by Mark Hasegawa-Johnson, jhasegaw at uiuc.