Speech-in-Noise Comprehension with DNN-Generated Talking Face

Academic Research

November 2022

Overview

This project investigates how a deep-neural-network (DNN) system that synthesizes video of a talking face can supplement an acoustic-only speech signal to improve comprehension in noisy environments. The study shows that DNN-generated visual speech cues significantly improve speech-in-noise comprehension, with direct relevance to evaluating hearing-aid noise-reduction features.

Key Findings

  • Designed and evaluated an experiment showing that DNN-synthesized visual speech cues significantly improve speech comprehension in noise.
  • Measured gains in speech comprehension across a range of environmental noise levels, and how those gains interacted with the DNN-generated visual cues.
  • Results demonstrate the potential of AI-generated visual augmentation as a tool for hearing assistance.
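Speech-in-noise experiments like this one typically present speech mixed with background noise at controlled signal-to-noise ratios (SNRs). The sketch below is not the paper's stimulus pipeline; it is a minimal illustration, assuming NumPy, of how a speech waveform might be mixed with noise at a target SNR (the function name `mix_at_snr` and the tone stand-in for speech are hypothetical).

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then mix."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Example: 1 s of a 440 Hz tone standing in for speech, white noise at -3 dB SNR
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(fs)
mixture = mix_at_snr(speech, noise, snr_db=-3.0)
```

Varying `snr_db` across trials is one common way to probe how comprehension changes with environmental noise level.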

Shan, T., Wenner, C.E., Xu, C., Duan, Z. & Maddox, R.K. Speech-in-noise comprehension is improved when viewing a deep-neural-network-generated talking face. Trends in Hearing, 26 (2022). https://doi.org/10.1177/23312165221136934