Fall 2017 SST Group Meetings
- Monday, November 27, 4:00-5:00, BI 2169
Amit
Das, End-to-End Automatic Speech Recognition
- Monday, November 13, 4:00-5:00, BI 2169
Leda Sari, Speaker-Adaptive Fusion for
Audio-Visual Speech Recognition
Spring 2017 SST Group Meetings
- Tuesday, January 24, 4:00-5:00, BI 2169
Organization
- Tuesday, January 31, 4:00-5:00, BI 2169
No Meeting
- Tuesday, February 7, 4:00-5:00, BI 2169
Xiang Kong: I will talk about the landmark-based
consonant voicing detector and the progress on landmark-based
ASR that Xuesong and I made last semester. Any time after
February (except March 7) is fine for me to present.
Here are the URLs related to my presentation.
(Stevens, 2002),
(Stevens and Klatt, 1974),
(Abdelatty Ali, van der Spiegel and Mueller, 2001)
- Tuesday, February 14, 4:00-5:00, BI 2169
Mary Pietrowicz, "Discovering Dimensions of
Perceived Vocal Expression in Semi-Structured, Unscripted Oral
History Accounts," ICASSP practice talk.
- Tuesday, February 21, 4:00-5:00, BI 2169
Meeting Cancelled
- Tuesday, February 28, 4:00-5:00, BI 2169
Wenda Chen: unsupervised and supervised learning
for clusters of zero-resource language data. In this
talk, I will first review general unsupervised
learning techniques (PhD thesis, Kamper,
https://arxiv.org/pdf/1701.00851.pdf)
and the task of learning probabilistic transcriptions
from mismatched crowdsourcing data (Jyothi and
Hasegawa-Johnson,
http://www.isle.illinois.edu/sst/pubs/2014/jyothi14aaai.pdf). Then
I will continue with my own work (Chen
et al., http://aclweb.org/anthology/W/W16/W16-3714.pdf)
and discuss recent approaches. Thanks a lot!
- Tuesday, March 7, 4:00-5:00, BI 2169
Meeting cancelled
- Tuesday, March 14, 4:00-5:00, BI 2169
Meeting cancelled
- Tuesday, March 28, 4:00-5:00, BI 2169
Kaizhi Qian: I will talk about speech enhancement
using WaveNet, a deep generative neural network.
https://arxiv.org/pdf/1609.03499.pdf.
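For background before the talk, the building block of WaveNet is the dilated causal convolution. The sketch below (an illustrative numpy implementation, not code from the talk or the paper) shows the causality property and how stacking dilations 1, 2, 4, ... grows the receptive field:

```python
# A minimal numpy sketch of a dilated causal 1-D convolution, the core
# operation in WaveNet. Filter weights and signal values are made up.
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """output[t] depends only on x[t], x[t-d], x[t-2d], ... (no future samples)."""
    taps = len(w)
    pad = dilation * (taps - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so the filter is causal
    return np.array([
        sum(w[k] * xp[t + pad - k * dilation] for k in range(taps))
        for t in range(len(x))
    ])

# Stacking layers with dilations 1, 2, 4, 8 grows the receptive field
# exponentially: with 2 taps per layer it covers 1 + 1 + 2 + 4 + 8 = 16 samples.
x = np.random.randn(16)
y = x
for d in (1, 2, 4, 8):
    y = causal_dilated_conv(y, w=np.array([0.5, 0.5]), dilation=d)
```

Causality is what lets WaveNet be used autoregressively: changing a future sample never changes the output at earlier times.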
- Tuesday, April 4, 4:00-5:00, BI 2169, CANCELLED
- Tuesday, April 11, CANCELLED
- Tuesday, April 18, 4:00-5:00, BI 2169
Yang Zhang, Title: RNN-TA: F0 Model with Semantics
Abstract: F0 models with deep learning structures are widely used in speech synthesis systems. The most common paradigm of these models is to fit the F0 contour or its state-wise statistics directly. One limitation of this approach is that much of the memory and modeling power is used to capture local F0 movement, and little is left to capture the long-term semantic information encoded in F0. As a result, most F0 models mainly focus on modeling phonetic, lexical and syntactic information. The recently proposed RNN-TA provides a promising alternative. It frees the RNN's memory and modeling power by introducing the pitch target model for local F0 movement. This talk discusses our work investigating RNN-TA's ability to capture long-term relations and semantic information by directly feeding word embedding features.
- Tuesday, April 25, 4:00-5:00, BI 2169
Leda Sari: Audiovisual speech recognition using
CNN+HMM: (Mroueh,
Marcheret and Goel, 2015). Speaker normalization
using fMLLR: Mark
Gales, Maximum
Likelihood Linear Transformations for HMM-Based Speech
Recognition.
- Tuesday, May 2, 4:00-5:00, BI 2169
Mary Pietrowicz: The function of laughter in the
veterans history corpus and its relationship to emotion,
voice quality, prosody, etc., and how it functions within
the different expressive dimensions we have found. An LSA
paper would be useful: Thomas K. Landauer, Peter W. Foltz,
Darrell Laham, "An Introduction to Latent Semantic
Analysis," Discourse Processes, 25(2,3): 259-284, 1998.
Useful LSA Tutorial I found on the web: Alex
Thomo, Latent
Semantic Analysis (Tutorial). Papers on laughter
detection would be useful as background (but we will
probably do something different): Lakshmish Kaushik,
Abhijeet Sangwan, and John H.L. Hansen, "Laughter and
Filler Detection in Naturalistic Audio," INTERSPEECH 2015.
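For anyone new to the Landauer et al. paper above, LSA boils down to a truncated SVD of a term-document matrix. The sketch below is an illustrative numpy example (the tiny count matrix and rank k=2 are made-up assumptions, not data from the veterans history corpus):

```python
# A minimal latent semantic analysis (LSA) sketch using only numpy.
import numpy as np

# Rows = terms, columns = documents (toy raw counts, for illustration only).
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

# LSA: keep the top-k singular components of the term-document matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def cosine(a, b):
    """Cosine similarity between two documents in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents that share latent "topics" end up close in the reduced space even when they share few exact terms, which is what makes LSA useful for relating laughter contexts to expressive dimensions.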