"Microsoft With You" (《微软与你同行》) Academic Seminar

Published: 2009-11-23

Creating a Photo-Realistic Talking Head

-- The magic touch of data and its statistical model

Time: Tuesday, November 24, 2009, 7:00 PM

Venue: Conference Room 215, Xindian Building (信电楼; up the path to the left of the library)

Speaker: Dr. Frank Soong

Manager, Speech Research Group

Microsoft Research Asia, Beijing

- Co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package

- Associate Editor, IEEE Transactions on Speech and Audio Processing

- Co-Director of the MSRA-CUHK Joint Research Lab; visiting professor at the Chinese University of Hong Kong (CUHK)

- More than 200 published papers; co-editor of "Automatic Speech and Speaker Recognition: Advanced Topics," Kluwer, 1996

Abstract:

A universal principle, in which the statistics embodied in a Maximum Likelihood (ML) trained Hidden Markov Model (HMM) guide the search for an optimal sequence of real samples (examples), is applied to creating both a high-quality Text-to-Speech (TTS) system and a photo-realistic talking head. Both systems have many human-machine interface applications, e.g. reading email, news stories, or eBooks aloud, or acting as an intelligent agent or a language learning tutor, and a lively, lip-synced talking head can make such an interface even more engaging. This talk concentrates on the photo-realistic talking head, whose construction consists of two parts: training and synthesis. In training, an audio/visual database is first collected and used to train a statistical HMM. In synthesis, the trained HMM is used to generate (synthesize) smooth mouth trajectories from given natural or TTS-synthesized speech. The rendered talking head is photo-realistic. Additionally, facial expressions such as eye blinking and gazing, eyebrow movement, smiles, or other emotions can be learned and rendered from collected and labeled data.
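As a rough illustration of the idea of searching for an optimal sequence of real samples under the guidance of a model-generated trajectory, the sketch below uses dynamic programming to pick one recorded sample per frame. It is not the speaker's implementation: the function name `select_samples`, the one-dimensional "mouth feature", the squared-error costs, and the `concat_weight` parameter are all assumptions made for this example, and the `target` list merely stands in for a trajectory that a trained HMM might produce.

```python
def select_samples(target, candidates, concat_weight=1.0):
    """Choose one real sample per frame by dynamic programming.

    target: model-predicted trajectory, one float per frame (hypothetical
            stand-in for an HMM-generated mouth trajectory).
    candidates: for each frame, a list of real sample values to choose from.
    concat_weight: assumed weight on the smoothness (concatenation) cost.
    Returns the chosen sample value for each frame.
    """
    n = len(target)
    # cost[t][j]: lowest total cost of any path ending at candidate j of frame t
    cost = [[(c - target[0]) ** 2 for c in candidates[0]]]
    back = [[None] * len(candidates[0])]

    for t in range(1, n):
        row_cost, row_back = [], []
        for c in candidates[t]:
            # Best predecessor: accumulated cost plus a penalty for jumps
            best_i = min(
                range(len(candidates[t - 1])),
                key=lambda i: cost[t - 1][i]
                + concat_weight * (c - candidates[t - 1][i]) ** 2,
            )
            jump = concat_weight * (c - candidates[t - 1][best_i]) ** 2
            # Add the distance between this sample and the model's target
            row_cost.append(cost[t - 1][best_i] + jump + (c - target[t]) ** 2)
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = []
    for t in range(n - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path


if __name__ == "__main__":
    trajectory = [0.0, 1.0, 2.0]
    samples = [[0.1, 5.0], [0.9, -3.0], [2.2, 9.0]]
    print(select_samples(trajectory, samples))
```

In this toy setup the target cost keeps the chosen samples close to the model's trajectory, while the concatenation cost discourages abrupt jumps between consecutive samples. A real system would work with image or speech segments and much richer feature vectors.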

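Jointly organized by Microsoft Research Asia and the Doctoral Student Association of the Department of Information Science and Electronic Engineering, Zhejiang University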