Convert face images into speech waveforms

AI News Clips by Morris Lee: News to help your R&D

1 min read · Jul 27, 2021

Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
arXiv paper abstract https://arxiv.org/abs/2107.12003
arXiv PDF paper https://arxiv.org/pdf/2107.12003.pdf
Project Web page https://realanonymousiccv.github.io/

… synthesize speaker-specific speech waveforms by conditioning on videos of an individual’s face.

… method directly converts face images into speech waveforms under an end-to-end training framework.

The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model.

… can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images.
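The data flow described above can be sketched as a toy example: a per-frame linguistic stream from the lip region, a single speaker vector from the face, and a generator conditioned on both. This is only an illustration, not the Facetron implementation. The random weight matrices stand in for trained networks, and every name and dimension here (`linguistic_features`, `speaker_embedding`, `synthesize`, `FRAME_DIM`, and so on) is hypothetical. The cross-modal training against a pre-trained acoustic model is not shown.

```python
import random

# All sizes are hypothetical, chosen only to keep the example small.
FRAME_DIM = 8          # flattened image frame
LING_DIM = 4           # linguistic feature vector per frame
SPK_DIM = 3            # speaker characteristic vector per clip
SAMPLES_PER_FRAME = 5  # waveform samples generated per video frame


def make_matrix(rows, cols, rng):
    """Random weights standing in for a trained network (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def linguistic_features(lip_frames, weights):
    """Stand-in for the lip-reading model: one feature vector per frame."""
    return [matvec(weights, frame) for frame in lip_frames]


def speaker_embedding(face_frames, weights):
    """Stand-in for the face encoder: average the frames, then project once."""
    n = len(face_frames)
    mean_face = [sum(column) / n for column in zip(*face_frames)]
    return matvec(weights, mean_face)


def synthesize(ling_feats, spk_emb, weights):
    """Stand-in for the waveform generator, conditioned on both streams."""
    waveform = []
    for feat in ling_feats:
        # Concatenate per-frame content with the fixed speaker vector.
        conditioned = feat + spk_emb
        waveform.extend(matvec(weights, conditioned))
    return waveform


rng = random.Random(0)
w_lip = make_matrix(LING_DIM, FRAME_DIM, rng)
w_face = make_matrix(SPK_DIM, FRAME_DIM, rng)
w_dec = make_matrix(SAMPLES_PER_FRAME, LING_DIM + SPK_DIM, rng)

# Two fake clips of six frames each; real inputs would be video frames.
video = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(6)]
other_face = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(6)]

ling = linguistic_features(video, w_lip)
wave_a = synthesize(ling, speaker_embedding(video, w_face), w_dec)
# Same lip content, different face: only the speaker vector changes.
wave_b = synthesize(ling, speaker_embedding(other_face, w_face), w_dec)
```

Here `wave_a` and `wave_b` share the same linguistic features but are conditioned on different speaker vectors, which is the sense in which the output varies with the input face.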

Therefore, our method can be regarded as a multi-speaker face-to-speech waveform model.

We show the superiority of our proposed model over conventional methods in terms of both objective and subjective evaluation results. …

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b
