"Microsoft Walks with You" Academic Seminar

Published: 2009-11-23

Creating a Photo-Realistic Talking Head

-- The magic touch of data and its statistical model

 

Time: Tuesday, November 24, 2009, 7:00 PM

Venue: Conference Room 215, Xindian Building (信电楼), up the path on the left side of the library

Speaker: Dr. Frank Soong

Manager, Speech Research Group

Microsoft Research Asia, Beijing

 

• Co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package

• Associate Editor, IEEE Transactions on Speech and Audio Processing

• Co-Director of the MSRA-CUHK Joint Research Lab and visiting professor at the Chinese University of Hong Kong (CUHK)

• More than 200 published papers; co-editor of "Automatic Speech and Speaker Recognition: Advanced Topics," Kluwer, 1996

 

Abstract:

 

A universal principle, in which the statistics embodied in a Maximum Likelihood (ML) Hidden Markov Model (HMM) are used to search for the optimal sequence of real samples (examples), is applied to creating both a high-quality Text-to-Speech (TTS) system and a photo-realistic talking head. Both systems have many human-machine interface applications, e.g. reading email, news stories, or eBooks aloud, or acting as an intelligent agent or a language learning tutor, and a lively, lip-synced talking head can make such an interface even more engaging. In this talk we will concentrate on the photo-realistic talking head, whose construction consists of two parts: training and synthesis. In training, an audio/visual database is first collected to train a statistical HMM. In synthesis, the trained HMM is used to generate (synthesize) smooth mouth trajectories from given natural or TTS-synthesized speech. The rendered talking head is photo-realistic. Additionally, facial expressions such as eye blinking and gazing, eyebrow movement, smiles, or other emotions can be learned and rendered from collected and labeled data.
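
The idea of letting a model-generated trajectory guide a search over real samples can be illustrated with a small dynamic-programming sketch. This is not the speaker's system: the function name `select_samples`, the one-dimensional "mouth opening" feature, the squared-error costs, and the `smooth_weight` parameter are all assumptions made for illustration. The sketch assumes that a trained HMM has already produced a target value per frame and that a few candidate real samples are available for each frame.

```python
from typing import List, Sequence


def select_samples(targets: Sequence[float],
                   candidates: Sequence[Sequence[float]],
                   smooth_weight: float = 1.0) -> List[int]:
    """Choose one candidate index per frame with a Viterbi-style search.

    targets[t]     -- model-predicted feature value for frame t (hypothetical)
    candidates[t]  -- feature values of the real samples available for frame t
    Total cost = sum of (sample - target)^2
               + smooth_weight * sum of (sample_t - sample_{t-1})^2
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("targets and candidates must be non-empty and equally long")
    if any(len(frame) == 0 for frame in candidates):
        raise ValueError("every frame needs at least one candidate")

    # cost[j]: lowest total cost of any path ending at candidate j of the current frame
    cost = [(c - targets[0]) ** 2 for c in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of candidate j at frame t

    for t in range(1, len(targets)):
        prev = candidates[t - 1]
        new_cost, ptr = [], []
        for c in candidates[t]:
            # Cheapest way to reach sample c from any sample in the previous frame
            best_k = min(range(len(prev)),
                         key=lambda k: cost[k] + smooth_weight * (c - prev[k]) ** 2)
            reach = cost[best_k] + smooth_weight * (c - prev[best_k]) ** 2
            new_cost.append(reach + (c - targets[t]) ** 2)
            ptr.append(best_k)
        cost = new_cost
        back.append(ptr)

    # Trace the cheapest path back from the final frame
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    targets = [0.0, 1.0, 2.0]
    candidates = [[0.1, 5.0], [4.0, 0.9], [2.2, -3.0]]
    print(select_samples(targets, candidates))
```

Raising `smooth_weight` favors sequences whose neighboring samples resemble each other, while lowering it favors samples that sit close to the model trajectory frame by frame.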

 

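Jointly organized by Microsoft Research Asia and the Doctoral Student Association of the Department of Information Science and Electronic Engineering, Zhejiang University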