Audio-Visual Speech Processing for Robust Human-Computer Interaction
|Speaker||Gerasimos Potamianos, Manager of the Multimodal Conversational Solutions Department, Human Language Technologies, IBM T. J. Watson Research Center, USA|
|Title||Audio-Visual Speech Processing for Robust Human-Computer Interaction|
|Date||Monday 25/02/2008, 15:00|
|Venue||Saratsi Amphitheatre, Delmouzos Building, Papastratos Seaside Complex|
|Address||Argonafton & Filellinon, Volos|
Gerasimos (Makis) Potamianos received the Diploma degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 1988, and the M.S.E. and Ph.D. degrees in Electrical and Computer Engineering from Johns Hopkins University, Baltimore, Maryland, in 1990 and 1994, respectively. His thesis work focused on statistical models for image processing. From 1994 to 1996 he was a Postdoctoral Fellow with the Center for Language and Speech Processing, and from 1996 to 1999 a Senior Member of Technical Staff with the Speech and Image Processing Services Laboratory at AT&T Labs-Research. In 1999, he joined the Human Language Technologies department at the IBM Thomas J. Watson Research Center as a Research Staff Member, where he is currently Manager of the Multimodal Conversational Solutions Department. His research interests span multimodal speech processing and human-computer interaction, with particular emphasis on audio-visual speech processing, automatic speech recognition, multimedia signal processing and fusion, and computer vision for human detection and tracking. He has published over 75 articles in these areas, which have received over 400 citations, and holds six granted patents. He is a member of the IEEE and of the Technical Chamber of Greece.
More information can be found at:
This talk is structured in two parts. In the first half, I will provide an overview of my group's activities, with emphasis on recent work conducted as part of three FP6 EU projects: CHIL, DICIT, and NETCARITY. Among them, CHIL ("Computers in the Human Interaction Loop") is a recently completed, technology-driven integrated project focusing on the development of robust audio-visual perception technologies for human interaction during meetings and lectures inside smart rooms. The second part of the talk will delve more deeply into a specific class of audio-visual perceptual technologies, namely audio-visual speech processing, with emphasis on automatic bimodal speech recognition. This line of work aims to exploit visual speech information to improve speech-recognition robustness in noisy environments, in a process akin to human lipreading. I will discuss my work in this field in detail, with emphasis on visual feature extraction in realistic environments and on ongoing research in audio-visual fusion.