"Microsoft With You" Academic Seminar
Creating a Photo-Realistic Talking Head
-- The magic touch of data and its statistical model
Time: Tuesday, November 24, 2009, 7:00 PM
Venue: Conference Room 215, Xindian Building (go up along the left side of the library)
Speaker: Dr. Frank Soong
Manager, Speech Research Group
Microsoft Research Asia, Beijing
- Co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package
- Associate Editor, IEEE Transactions on Speech and Audio Processing
- Co-Director of the MSRA-CUHK Joint Research Lab; Visiting Professor at the Chinese University of Hong Kong (CUHK)
- More than 200 published papers; co-editor of "Automatic Speech and Speaker Recognition: Advanced Topics," Kluwer, 1996
Abstract:
We present a universal principle, in which the statistics embodied in a Maximum Likelihood (ML) trained Hidden Markov Model (HMM) guide the search for the optimal sequence of real samples (examples), and apply it to create a high-quality Text-to-Speech (TTS) system and a photo-realistic talking head. Both systems have many human-machine interface applications, e.g. reading email, news stories, or eBooks, or acting as an intelligent agent or a language learning tutor, and a lively, lip-synced talking head can make such an interface even more engaging. In this talk we will concentrate on the photo-realistic talking head, which is built in two stages: training and synthesis. In training, an audio/visual database is first collected to train a statistical HMM. In synthesis, the trained HMM is used to generate (synthesize) smooth mouth trajectories from given natural or TTS-synthesized speech. The rendered talking head is photo-realistic. Additionally, facial expressions such as eye blinking and gazing, eyebrow movement, smiles, or other emotions can be learned and rendered from collected and labeled data.
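The idea of letting a statistical model guide the search for an optimal sequence of real samples can be illustrated with a toy dynamic-programming search. This is only a minimal sketch under stated assumptions, not the speaker's actual system: the target trajectory stands in for what a trained HMM might generate, each "sample" is reduced to a single hypothetical mouth-opening value, and the function name `select_samples` and the parameter `smooth_weight` are invented for illustration.

```python
def select_samples(target, samples, smooth_weight=1.0):
    """Choose one real sample per frame with a Viterbi-style search.

    target        -- per-frame values, assumed to come from a trained model
    samples       -- library of real sample values (hypothetical 1-D features)
    smooth_weight -- penalty on jumps between consecutive chosen samples
    Returns the list of chosen sample indices, one per frame.
    """
    n_samples = len(samples)
    # cost[j]: lowest total cost of any path that ends in sample j
    cost = [(samples[j] - target[0]) ** 2 for j in range(n_samples)]
    back = []  # back[t - 1][j]: best predecessor of sample j at frame t

    for t in range(1, len(target)):
        new_cost, pointers = [], []
        for j in range(n_samples):
            # Best predecessor balances accumulated cost against the jump i -> j
            best_i = min(
                range(n_samples),
                key=lambda i: cost[i] + smooth_weight * (samples[i] - samples[j]) ** 2,
            )
            jump = smooth_weight * (samples[best_i] - samples[j]) ** 2
            new_cost.append(cost[best_i] + jump + (samples[j] - target[t]) ** 2)
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final sample
    j = min(range(n_samples), key=lambda k: cost[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


# Toy data: three real samples and a four-frame model trajectory
library = [0.0, 0.5, 1.0]
trajectory = [0.1, 0.45, 0.9, 0.6]
print(select_samples(trajectory, library, smooth_weight=0.0))
print(select_samples(trajectory, library, smooth_weight=100.0))
```

With `smooth_weight=0.0` the search simply picks the sample nearest to the model trajectory in each frame; a large weight favors a smooth sequence even when it follows the trajectory less closely. A real system would use high-dimensional audio/visual features and HMM likelihoods in place of these squared distances.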
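Jointly organized by Microsoft Research Asia and the Doctoral Students' Association of the Department of Information Science and Electronic Engineering, Zhejiang University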