Meta launches new AI voice cloning tool

Proof-of-concept technology can also generate sound effects from text prompts

Social media giant Meta has released new AI-powered voice cloning tools, as part of a proof-of-concept for audio generation models being developed by researchers at the company.

The new Audiobox tool, created by teams within the Facebook AI Research division, can generate speech or sound effects from text prompts, as well as clone existing voices from a short recorded sample. The program is a limited demo intended to illustrate the model’s full capabilities and to encourage developers to build additional technologies on top of it.

The research paper published alongside the model does not state the source of the data on which the model was trained, which includes 160,000 hours of speech, 20,000 hours of music and 6,000 hours of sound samples. However, the company has detailed the safety and ethics measures that have been put in place to ensure responsible use.

“This demo implements several safety and integrity-based guardrails and filters,” the company said. “These include filtering prompts based on protected categories such as race, gender, sex, and religion, harmful associations, violence, and other sensitive topics. We also filter some location-based queries which are susceptible to stereotyping.”

“Moreover, a user is limited to modifying only their own voice when using this demo. Our voice verification process seeks to prevent adversarial uploads of voices other than the users' by displaying an arbitrary text phrase in the demo that a user must speak in order to upload their voice.”

Audio samples generated via the tool also contain an ‘audio watermark’, which is “mostly indistinguishable to the human ear” but can be detected by Facebook’s technology.

The model currently supports only English-language speech generation and cannot be used for commercial purposes. It also cannot be used in the states of Illinois or Texas.

AI development is rapidly gaining pace, and the technology is increasingly being used within podcasting, although according to a recent poll, a significant proportion of podcasters still have apprehensions about its impact on the industry.