Tuesday, January 24, 4:00-5:00, BI 2169 Organization
Tuesday, January 31, 4:00-5:00, BI 2169 No Meeting
Tuesday, February 7, 4:00-5:00, BI 2169 Xiang Kong: I want to talk about the landmark-based
consonant voicing detector and the progress on landmark-based
ASR that Xuesong and I made last semester. Any time after
February (except March 7th) is fine for me to present.
Here are the references related to my presentation:
(Stevens, 2002),
(Stevens and Klatt, 1974),
(Abdelatty Ali, van der Spiegel and Mueller, 2001)
Tuesday, February 14, 4:00-5:00, BI 2169
Mary Pietrowicz, "Discovering Dimensions of
Perceived Vocal Expression in Semi-Structured, Unscripted Oral
History Accounts," ICASSP practice talk.
Tuesday, February 21, 4:00-5:00, BI 2169 Meeting Cancelled
Tuesday, February 28, 4:00-5:00, BI 2169 Wenda Chen: Unsupervised and supervised learning
for clusters of zero-resource language data. In this
talk, I will first review general unsupervised
learning techniques (PhD thesis, Kamper,
https://arxiv.org/pdf/1701.00851.pdf)
and the task of learning probabilistic transcriptions
from mismatched crowdsourcing data (Jyothi
and Hasegawa-Johnson). Then
I will continue with my own work (Chen
et al., http://aclweb.org/anthology/W/W16/W16-3714.pdf)
and discuss recent approaches. Thanks a lot!
Tuesday, March 7, 4:00-5:00, BI 2169 Meeting cancelled
Tuesday, March 14, 4:00-5:00, BI 2169 Meeting cancelled
Tuesday, March 28, 4:00-5:00, BI 2169 Kaizhi Qian: I will talk about speech enhancement
using WaveNet, a deep generative model of raw audio
(https://arxiv.org/pdf/1609.03499.pdf). A minimal sketch of
WaveNet's core building block follows this entry.
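The sketch below (Python/PyTorch, chosen here purely for illustration) shows the core WaveNet building block from the linked paper: a stack of dilated causal 1-D convolutions with gated activations and residual connections. The channel sizes, dilation schedule, and single-channel output are illustrative assumptions, not the enhancement system to be presented.

    # Minimal dilated-causal-convolution stack in the spirit of WaveNet
    # (https://arxiv.org/pdf/1609.03499.pdf). Sizes are illustrative only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedCausalBlock(nn.Module):
        def __init__(self, channels, dilation):
            super().__init__()
            self.dilation = dilation
            self.filter = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
            self.gate = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
            self.residual = nn.Conv1d(channels, channels, kernel_size=1)

        def forward(self, x):                   # x: (batch, channels, time)
            pad = F.pad(x, (self.dilation, 0))  # left-pad so the convolution stays causal
            z = torch.tanh(self.filter(pad)) * torch.sigmoid(self.gate(pad))  # gated activation
            return x + self.residual(z)         # residual connection

    class TinyWaveNet(nn.Module):
        def __init__(self, channels=32, dilations=(1, 2, 4, 8, 16)):
            super().__init__()
            self.input = nn.Conv1d(1, channels, kernel_size=1)
            self.blocks = nn.ModuleList([DilatedCausalBlock(channels, d) for d in dilations])
            self.output = nn.Conv1d(channels, 1, kernel_size=1)  # e.g. an enhanced waveform

        def forward(self, wav):                 # wav: (batch, 1, time)
            h = self.input(wav)
            for block in self.blocks:
                h = block(h)
            return self.output(h)

    y = TinyWaveNet()(torch.randn(2, 1, 1000))
    print(y.shape)                              # torch.Size([2, 1, 1000])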
Tuesday, April 4, 4:00-5:00, BI 2169, CANCELLED
Tuesday, April 11, CANCELLED
Tuesday, April 18, 4:00-5:00, BI 2169 Yang Zhang, Title: RNN-TA: F0 Model with Semantics
Abstract: F0 models with deep learning structures are widely used in speech synthesis systems. The most common paradigm for these models is to fit the F0 contour or its state-wise statistics directly. One limitation of this approach is that much of the memory and modeling power is used to capture local F0 movement, and little is left to capture the long-term semantic information encoded in F0. As a result, most F0 models focus mainly on modeling phonetic, lexical, and syntactic information. The recently proposed RNN-TA provides a promising alternative: it frees the RNN's memory and modeling power by introducing a pitch-target model for local F0 movement. This talk discusses our work investigating RNN-TA's ability to capture long-term relations and semantic information by directly feeding in word embedding features.
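To make the division of labor in the abstract concrete, here is a simplified sketch (Python/PyTorch): an RNN maps word embeddings to per-syllable pitch-target parameters, and a plain first-order target-approximation loop renders the frame-level F0 contour. The dimensions, the LSTM choice, the three-parameter target, and the first-order dynamics are all assumptions made for illustration, not the actual RNN-TA model.

    # Sketch: RNN predicts pitch-target parameters from word embeddings;
    # a first-order target-approximation loop handles local F0 movement.
    import torch
    import torch.nn as nn

    class PitchTargetRNN(nn.Module):
        def __init__(self, embed_dim=300, hidden_dim=128):
            super().__init__()
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            # Predict (target height, target slope, approach rate) per syllable.
            self.head = nn.Linear(hidden_dim, 3)

        def forward(self, word_embeddings):     # (batch, syllables, embed_dim)
            h, _ = self.rnn(word_embeddings)
            return self.head(h)                 # (batch, syllables, 3)

    def render_f0(targets, frames_per_syllable=20, f0_init=0.0):
        """First-order target approximation: F0 decays toward a moving linear target."""
        batch, n_syll, _ = targets.shape
        f0 = torch.full((batch,), f0_init)
        contour = []
        for s in range(n_syll):
            height = targets[:, s, 0]
            slope = targets[:, s, 1]
            rate = torch.sigmoid(targets[:, s, 2])
            for t in range(frames_per_syllable):
                goal = height + slope * (t / frames_per_syllable)
                f0 = f0 + rate * (goal - f0)    # exponential approach to the target line
                contour.append(f0)
        return torch.stack(contour, dim=1)      # (batch, n_syll * frames_per_syllable)

    model = PitchTargetRNN()
    emb = torch.randn(2, 8, 300)                # 2 utterances, 8 syllables, 300-d embeddings
    f0_contour = render_f0(model(emb))
    print(f0_contour.shape)                     # torch.Size([2, 160])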
Tuesday, May 2, 4:00-5:00, BI 2169 Mary Pietrowicz: The function of laughter in the
veterans history corpus, its relationship to emotion,
voice quality, prosody, etc., and how it functions within the
different expressive dimensions we have found. An LSA
paper would be useful: Thomas K. Landauer, Peter W. Foltz,
and Darrell Laham, "An Introduction to Latent Semantic
Analysis," Discourse Processes, 25(2-3): 259-284, 1998.
A useful LSA tutorial I found on the web: Alex
Thomo, Latent Semantic Analysis (Tutorial). (A minimal
LSA example is sketched after this entry.) Papers on laughter
detection would be useful as background (but we will
probably do something different): Lakshmish Kaushik,
Abhijeet Sangwan, and John H.L. Hansen, "Laughter and
Filler Detection in Naturalistic Audio," INTERSPEECH 2015.
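A minimal Latent Semantic Analysis sketch (Python) to go with the Landauer et al. reading above. It uses scikit-learn's TfidfVectorizer and TruncatedSVD, one standard way to realize LSA (an SVD of the term-document matrix); the tiny corpus is made up, and this is not the procedure used in the veterans-history work itself.

    # LSA in two steps: weighted term-document matrix, then truncated SVD.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "he laughed while telling the story",
        "the story was told with a quiet voice",
        "laughter broke the tension in the interview",
        "the interview covered his years of service",
    ]

    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(docs)           # document-term matrix (TF-IDF weighted)
    lsa = TruncatedSVD(n_components=2)      # keep 2 latent semantic dimensions
    doc_vectors = lsa.fit_transform(X)      # documents projected into the latent space

    print(doc_vectors.shape)                # (4, 2): each document as a 2-d semantic vector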
Monday, November 13, 4:00-5:00, BI 2169
Leda Sari, Speaker-Adaptive Fusion for
Audio-Visual Speech Recognition
Monday, November 27, 4:00-5:00, BI 2169 Amit Das,
End-to-End Automatic Speech Recognition