Convert face images into speech waveforms
Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
arXiv paper abstract https://arxiv.org/abs/2107.12003
arXiv PDF paper https://arxiv.org/pdf/2107.12003.pdf
Project Web page https://realanonymousiccv.github.io/
… synthesize speaker-specific speech waveforms by conditioning on videos of an individual’s face.
… method directly converts face images into speech waveforms under an end-to-end training framework.
The linguistic features are extracted from lip movements using a lip-reading model, and the speaker characteristic features are predicted from face images using cross-modal learning with a pre-trained acoustic model.
… can flexibly synthesize speech waveforms whose speaker characteristics vary depending on the input face images.
Therefore, our method can be regarded as a multi-speaker face-to-speech waveform model.
We show the superiority of our proposed model over conventional methods in terms of both objective and subjective evaluation results. …
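To make the data flow in the abstract concrete, here is a toy sketch of how the two feature streams might be combined. It is not the authors' code. Every function name, vector size, and number is hypothetical, mean pooling stands in for the face encoder, and the squared-distance loss is only one plausible way to pull a face-derived speaker vector toward one produced by a pre-trained acoustic model.

```python
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average per-frame vectors into one utterance-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def condition_frames(linguistic: List[Vector], speaker: Vector) -> List[Vector]:
    """Append the same speaker vector to every per-frame linguistic vector."""
    return [frame + speaker for frame in linguistic]


def squared_distance(a: Vector, b: Vector) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical lip-reading output: 3 video frames, 2 linguistic features each.
lip_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Hypothetical per-image face features: 2 face images, 3 features each.
face_features = [[1.0, 0.0, 2.0], [3.0, 0.0, 4.0]]

# One speaker vector per utterance, predicted from the face images alone.
speaker_from_face = mean_pool(face_features)

# What a waveform decoder would receive: content per frame plus speaker identity.
decoder_input = condition_frames(lip_features, speaker_from_face)

# Assumed cross-modal target: a speaker vector from a pre-trained acoustic model.
speaker_from_audio = [2.0, 1.0, 3.0]
cross_modal_loss = squared_distance(speaker_from_face, speaker_from_audio)

print(len(decoder_input), len(decoder_input[0]))  # frames, features per frame
print(cross_modal_loss)
```

Swapping in a different set of face features changes only the speaker part of each decoder input frame, while the lip-derived part stays the same. That is the sense in which the speaker characteristics vary with the input face images.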
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b