
Discriminatively learning factorized finite-state pronunciation models from dynamic Bayesian networks

Preethi Jyothi, 10/18/2013, 2-3pm, BI 2369

Standard pronunciation models are derived from existing dictionaries, with limited means of addressing the many ways a single word can be pronounced in casual speech. These phone-based models have been found lacking due to the coarseness of their atomic phone units; modeling speech instead as multiple streams of linguistically motivated sub-phonetic features has been explored as an effective way of handling pronunciation variability. Our work explores one such model, which uses dynamic Bayesian networks (DBNs) to relate the movements of a speaker's articulators (i.e., lips, tongue, etc.) to the sounds produced, in the form of loosely coupled feature streams. We present a general approach for transforming such DBN models into a finite-state representation, allowing for more flexible models that can be further trained to improve the accuracy of the recognizer. Experimental results on a phonetically transcribed subset of the Switchboard corpus show that the proposed approach performs significantly better than the original DBN model.
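To give a flavor of the finite-state view, here is a minimal, hypothetical sketch (not the talk's actual construction): two toy "feature stream" acceptors, one for lip aperture and one for tongue height, are intersected into a single machine whose states are state pairs and whose arc weights are summed negative log-probabilities. The stream names, symbols, and probabilities are all invented for illustration.

```python
import math

# Toy feature-stream acceptors (hypothetical values):
# each maps (state, symbol) -> (next_state, probability).
lips = {
    (0, "closed"): (1, 0.9),
    (0, "open"):   (1, 0.1),
    (1, "open"):   (2, 0.8),
    (1, "closed"): (2, 0.2),
}
tongue = {
    (0, "high"): (1, 0.7),
    (0, "low"):  (1, 0.3),
    (1, "low"):  (2, 0.6),
    (1, "high"): (2, 0.4),
}

def intersect(a, b):
    """Pair up transitions from the two streams, advancing in
    lockstep: states become tuples and weights become summed
    negative log-probabilities (the tropical semiring product)."""
    combined = {}
    for (sa, xa), (na, pa) in a.items():
        for (sb, xb), (nb, pb) in b.items():
            if sa != sb:  # toy streams share a time-step index
                continue
            combined[((sa, sb), (xa, xb))] = (
                (na, nb), -math.log(pa) - math.log(pb))
    return combined

fst = intersect(lips, tongue)
# Weight of jointly taking "closed" + "high" from state (0, 0):
w = fst[((0, 0), ("closed", "high"))][1]
```

In the full model the streams are only loosely coupled, so the real construction must also score desynchronized paths; libraries such as OpenFst provide the composition and weight-pushing machinery that this sketch fakes with dictionaries.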
