soft@UIUC
Our central location is the UIUC-SST GitHub Group.
Mark supports the UIUC G2Ps and associated phonecode converters, which are ports of work started at Jelinek WS15. He is also trying to port image2speech from the Jelinek WS17 group; he's got the first eight stages or so (of perhaps eleven) working.
Camille supports ASR24, as described in our arXiv paper. He also supports MCASR and PTgen, based on the work of the Jelinek WS15 group. He also wrote timeliner, which is kind of like Audacity, but with way more channels.
Kaizhi supports the AutoVC repo, based on his ICML paper.
Yijia wrote the infant-vocalize classifier.
Data Formats
Acoustic Features
Acoustic Modeling
Pronunciation Modeling
- Arthur Kantor and Mark Hasegawa-Johnson, HMM-based Pronunciation Dictionary Generation, Workshop on New Tools and Methods for Very Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011 (NSF 0703624, 0913188; Wiki page, Software).
- Arthur Kantor, Pronunciation modeling for large vocabulary speech recognition, Ph.D. Thesis 2010, University of Illinois (NSF 0703624, 0913188; Software).
Prosody
- Jennifer Cole, Timothy Mahrt, and Jose I. Hualde, Listening for sound, listening for meaning: Task effects on prosodic transcription, Speech Prosody 2014, Dublin, May 2014 (LMEDS software).
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, and Jennifer Cole, A Maximum Likelihood Prosody Recognizer, Speech Prosody 2004, Nara, Japan, March 2004, 509-512 (NSF 0132900; Illinois CRI; Software).
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Sarah Borys, and Jennifer Cole, Prosody Dependent Speech Recognition with Explicit Duration Modelling at Intonational Phrase Boundaries, Interspeech, September 2003, 393-396 (Illinois CRI; Software diffs, TGZ, ZIP).
Multimedia Analytics
- Shengze Wang, Interlocutor-aware AAC
- Camille Goudeseune, 2012. Effective browsing of long audio recordings. ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices, 2012 (NSF 0807329; Software on GitHub, Software as a TGZ).
- David Cohen, Camille Goudeseune and Mark Hasegawa-Johnson. 2009. Efficient Simultaneous Multi-Scale Computation of FFTs. Technical report GT-FODAVA-09-01 (NSF 0807329; Software).
- Mark Hasegawa-Johnson, Jul Cha, Shamala Pizza and Katherine Haker, CTMRedit: A case study in human-computer interface design, International Conference On Public Participation and Information Tech., Lisbon, pp. 575-584, 1999 (NIH DC0032301; Software).
- Robin Bargar, Insook Choi, Sumit Das, Camille Goudeseune. 1994. Model-based interactive sound for an immersive virtual environment. Proc. Intl. Computer Music Conf., 471-474, Aarhus, Denmark. (software, tutorial)
Audio Enhancement
- Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson and Paris Smaragdis, Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, IEEE Trans. Audio, Speech and Language Processing, accepted for publication (Software, Examples).
- Po-Sen Huang, Scott Deeann Chen, Paris Smaragdis, and Mark Hasegawa-Johnson, Singing-Voice Separation from Monaural Recordings using Robust Principal Component Analysis, ICASSP 2012 (ARO W911NF-09-1-0383) (Software, Examples).
- Lae-Hoon Kim, Kyung-Tae Kim, and Mark Hasegawa-Johnson, Robust Automatic Speech Recognition with Decoder Oriented Ideal Binary Mask Estimation, Proceedings of Interspeech 2010, pp. 2066-2069 (NSF 0913188; Software).
Voice Activity Detection
Formant Tracking