Roadrunner, the documentary about Anthony Bourdain, contains a scene in which the epicurean speaks words from letters he wrote to the artist David Choe. That would be unremarkable, were it not for the fact that Bourdain never read the letters aloud. Rather, the clips were generated by a company that director Morgan Neville hired to model Bourdain’s voice.
Synthetic media, or AI-generated likenesses and voices, has almost crossed the uncanny valley. Earlier this month, Sonantic, a UK-based company that clones voices for actors and studios, released an AI-generated recording of a voice inspired by actor Val Kilmer. The imitation of Kilmer’s natural voice, which he lost after throat cancer surgery in 2015, faithfully reflects the actor’s intonation.
The rise of synthetic media has raised concerns about deepfakes, or AI-generated media used for fraud and other criminal activities. Ethical questions abound: the voice in Roadrunner was created without Bourdain’s permission. But if used responsibly, synthetic media has the potential to cut costs while allowing actors to focus on more interesting work.
To create synthetic voices and videos, companies use a combination of artificial intelligence and machine learning techniques, including generative adversarial networks (GANs). GANs are two-part machine learning models that consist of a generator, which creates samples, and a discriminator, which attempts to differentiate between those samples and real-world ones. High-performance GANs can create realistic portraits of people who don’t exist, or even snapshots of fictional apartment buildings.
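To make the generator-versus-discriminator idea concrete, here is a minimal sketch of the two competing objectives in their commonly used cross-entropy form. It is an illustration only, not any company's implementation: the function names and the example scores are hypothetical, and the networks themselves are left out.

```python
import math


def bce(prediction: float, label: float) -> float:
    """Binary cross-entropy for one probability and one 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(scores_real, scores_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    real_term = sum(bce(s, 1.0) for s in scores_real) / len(scores_real)
    fake_term = sum(bce(s, 0.0) for s in scores_fake) / len(scores_fake)
    return real_term + fake_term


def generator_loss(scores_fake):
    """Low when the discriminator mistakes generated samples for real ones
    (the widely used non-saturating form of the generator objective)."""
    return sum(bce(s, 1.0) for s in scores_fake) / len(scores_fake)


# Hypothetical discriminator outputs: probability that a sample is real.
scores_on_real_clips = [0.9, 0.8]
scores_on_generated_clips = [0.2, 0.1]

print(discriminator_loss(scores_on_real_clips, scores_on_generated_clips))
print(generator_loss(scores_on_generated_clips))
```

In training, the two losses are minimized in alternation: each improvement in the discriminator raises the generator's loss, and vice versa, which is what pushes the generated samples toward realism.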
It takes only a few seconds or minutes of audio for AI to mimic a person’s prosody. Baidu’s latest Deep Voice service can clone a voice with just 3.7 seconds of audio samples, and WellSaid Labs, which began as a research project at the Allen Institute for Artificial Intelligence, can create a 10-second audio file starting from approximately 4 seconds of speech.
As R&D refines the technology and makes it more scalable, media synthesis is transforming from a novelty into an expanding market. Companies like Amazon, Microsoft, Papercup, Deepdub, and Synthesia have created projects like ad campaigns featuring an AI-generated Snoop Dogg and David Beckham’s voice translated into nearly a dozen languages. They have also partnered with news organizations such as Sky News, Discovery, and Reuters to develop prototypes for automated news and sports reports.
Synthetic media platforms provide different capabilities depending on their approach. For example, Synthesia allows clients to choose from a variety of “voice avatars” and create voiceovers directly from a script, with one or more voices depending on style, genre, and type of production. Amazon, on the other hand, pairs customers with its engineers to build AI-generated voices that represent certain people.
Startups like Alethea AI, Genies, and Possible Reality fall into a separate category of synthetic media generation. From just a few images, their tools can generate high-fidelity, expressive, and photorealistic avatars. Possible Reality is leveraging its technology to turn images of people into 3D avatars within video games and virtual worlds. And Genies is generating 2D avatars in the form of celebrity cartoons for social media.
Challenges and opportunities
As pandemic restrictions make conventional filming complicated and risky, the benefits of AI-generated video have been magnified. According to Dogtown Media, a corporate training campaign under normal circumstances could require up to 20 different scripts to target a global workforce, with each video costing tens of thousands of dollars. Synthetic media can reduce expenses to a lump sum of around $100,000.
Brand voices like Progressive’s Flo, played by comedian Stephanie Courtney, are often tasked with recording phone trees for interactive voice response systems or e-learning scripts for corporate training videos. Synthesis could boost actors’ productivity by reducing ancillary recordings and pickups (recording sessions to address errors, changes, or additions to voice-over scripts) while freeing them up for creative work and allowing them to collect residuals.
Additionally, synthetic media platforms give creators, product developers, and brands the ability to power experiences with a wide range of voice styles, accents, and languages. Resemble AI CEO Zohaib Ahmed envisions game developers creating actor voices during pre-production for scratch recordings and iteration, as well as voices tailored to suit a character’s personality and the sonic tastes of voice assistants and apps.
There is also the translation aspect. Because quality dubbing is prohibitively expensive (costs for a 90-minute program range from $30,000 to $100,000), most of the world’s videos have been recorded in a single language. (In the first week of 2019, 33% of popular YouTube videos were in English.) Statista found that 59% of American adults said they would rather watch foreign language movies dubbed into English than watch the original feature with subtitles, highlighting the demand for synthetic media translation technologies.
Experts have raised concerns that synthetic media tools could be co-opted to create deepfakes; the fear is that these forgeries could be used to do things like sway opinion during an election or implicate a person in a crime. Deepfakes have already been abused to generate pornographic material of actors and to defraud a major energy producer.
Fighting deepfakes is likely to remain a challenge, especially as media generation techniques continue to improve. Earlier this year, deepfake videos of Tom Cruise posted to an unverified TikTok account racked up 11 million views on the app and millions more on other platforms. And when scanned with several of the best publicly available deepfake detection tools, they avoided discovery, according to Vice.
Some companies have taken steps to prevent misuse of their platforms. For example, Synthesia says it vets its clients and their scripts and requires a person’s formal consent before synthesizing their appearance, and the company refuses to touch political content. WellSaid also does not create voice avatars without the permission of the actors and subscribes to the “Hippocratic Oath for AI” proposed by Microsoft executives Brad Smith and Harry Shum. As for Resemble AI, it released an open source tool that detects deepfakes by deriving high-level representations of speech samples and predicting whether they are real or generated.
Founders like Ahmed think the pros outweigh the possible cons. As he told VentureBeat in a recent interview, “We set out to create a product that helps creatives overcome the hurdle of creating audio content. With more audio content being produced year after year – smart speakers … AirPods, podcasts, audiobooks and digital characters in virtual and augmented reality – there is a great and growing need for fast and accurate voice cloning.”