New techniques expand the possibilities for fake video and audio
Will Clark Gable and Vivien Leigh star in Gone With the Wind 2035? What about a new Beatles album with John Lennon and George Harrison rejoining the group posthumously?
These scenarios are among the limitless possibilities as artificial intelligence and machine learning techniques help to manipulate video and audio in new ways. Techniques using machine learning will enable producers to clone people into footage and easily edit out glitches in video interviews.
If you think fake news is a problem now, brace for a flood of near-perfect phony videos online. We’re on the precipice of an age of incredibly authentic-looking fake videos starring real people saying outrageous things. And we’ll end up not trusting video.
Made without complex editing techniques or expensive equipment, these videos will portray people perfectly.
Toby Walsh, professor of artificial intelligence at the University of NSW and the CSIRO’s Data61, says it is only a matter of months before you’ll find it impossible to distinguish fake and real video footage, and even computer algorithms will struggle to tell the difference.
“You can’t fake very well the blinking … there are a few telltale signs, but I have already said publicly that before the end of the year I’m sure all those telltale signs will be eliminated,” he says.
Television producers will be able to touch up video footage in the same way that a glossy magazine uses Photoshop.
Instead of a human newsreader, a fake could explain the latest events. The only giveaway may be a lack of spontaneity. “You won’t have to do 10 takes. You do one take and you can fix whatever problems happen afterwards with technology,” Walsh says.
If the public loses faith in video, a system of secure digital signatures will be needed to ensure the authenticity of footage.
“We’re going to increasingly want a security certificate like we do for websites for video and audio so that we can ensure it has not been tampered with,” Walsh says.
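The idea of a tamper-evident certificate for footage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system Walsh describes: it hashes the video bytes with SHA-256 and tags the digest with an HMAC under a shared secret key, which stands in here for the public-key signature a real certificate scheme would use. The function names `sign_video` and `verify_video` are hypothetical.

```python
import hashlib
import hmac


def sign_video(video_bytes: bytes, key: bytes) -> str:
    """Return a hex tag tied to this exact footage and this key.

    Assumption: HMAC with a shared secret stands in for a real
    public-key signature issued by a trusted authority.
    """
    digest = hashlib.sha256(video_bytes).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_video(video_bytes: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_video(video_bytes, key)
    return hmac.compare_digest(expected, tag)
```

If even one byte of the footage changes after signing, `verify_video` returns `False`, which is the property a viewer would rely on to detect tampering.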
It’s already possible to duplicate the voice of a person — that was how Apple’s Siri and a host of synthetic voices were created. Now it can be done more readily and accurately with AI and ML.
A machine learns to reproduce any sentence fed to it with the same pitch, intonation, phrasing, pauses and idiosyncratic pronunciation as you would use.
DeepMind, a Google acquisition, says its WaveNet capability can generate speech that mimics any human voice and sounds more natural than the best text-to-speech systems.
Video reproduction is also becoming more realistic. AI technology now goes beyond re-creating the shape of a person’s mouth as they pronounce words. It also replicates the way a speaker moves their facial muscles and eyebrows and tilts their head.
Last year the University of Washington demonstrated a chilling research project called Synthesizing Obama after teaching a machine the former US president’s facial mannerisms and other speaking habits.
This wasn’t done laboriously by noting detailed face movements one by one and entering them into a database. A computer system modelled on the human brain called a neural network was fed 17 hours of footage of Barack Obama from his weekly addresses. The project’s website says machine learning can map raw audio features to mouth shapes and other personal attributes.
A research paper by the Synthesizing Obama team says: “We synthesise high-quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.”
The project’s authors, Steven Seitz and Ira Kemelmacher-Shlizerman from the university and Supasorn Suwajanakorn, who is now with Google Brain, say the paper is the first attempt to solve the problem of turning audio speech into video speech by analysing a large amount of video data of a single person.
It shows how easy it is to fake a video of a public figure when plenty of interviews, speeches and news stories are in the public domain to train the neural network.
By combining these video capabilities with accurate voice synthesis, humanity will be able to manufacture fake videos to the point that video could lose its status as a source of the truth.
Primitive video certificate systems exist now but they need to be more sophisticated.
“There are some challenges with that in terms of, if you want to chop a segment out or want to compress media but not destroy the certificate, we’ve still got some things to deal with that,” Walsh says.
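The editing problem can be made concrete. A single signature over a whole file breaks as soon as any segment is cut. One possible workaround, sketched here purely as an illustration, is to sign a manifest of per-segment hashes so that the segments surviving an edit can still be checked. The segment size, the function names and the use of HMAC in place of a real public-key signature are all assumptions.

```python
import hashlib
import hmac


def segment_hashes(video: bytes, size: int) -> list:
    """Hash each fixed-size segment of the footage separately."""
    return [hashlib.sha256(video[i:i + size]).hexdigest()
            for i in range(0, len(video), size)]


def sign_manifest(hashes: list, key: bytes) -> str:
    """Tag the full list of segment hashes (HMAC stands in for a signature)."""
    manifest = "\n".join(hashes).encode()
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def verify_clip(clip: bytes, size: int, hashes: list, tag: str, key: bytes) -> bool:
    """Accept a trimmed clip if the manifest is authentic and every
    segment of the clip appears in it."""
    if not hmac.compare_digest(sign_manifest(hashes, key), tag):
        return False
    known = set(hashes)
    return all(h in known for h in segment_hashes(clip, size))
```

This sketch only works when cuts fall on segment boundaries, and it does not detect reordered segments. It also does nothing for the compression case, because re-encoding changes every byte and therefore every hash.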
Blockchain also could keep a video signature authentic.
“But you do wonder how long (it will be) before an election gets significantly disrupted by the emergence of some fake video,” Walsh says.
It may be too late to expose the fake before the vote.
The ability to mix the real and fake in video will have interesting applications. If a neural network can comb through hours of footage of Obama to create a fake version, it’s odds on that neural networks soon will ingest years of footage of Hollywood movies so that a new film can replicate actors from the past and cast them into new movies.
On Android, the Shadow Avatars app can already take a person’s face scanned with the Sony 3D Creator App and place it in movie trailers.
Alan Blair, from UNSW’s school of computer science and engineering, says AI-aided computers in future may create personalised movies starring whoever the user wants.
“Instead of watching a pre-made movie, people will synthesise their own movies,” Blair says. “If they want to watch a western movie with Cary Grant in it, the computer will chug away and generate a brand-new film.”
Blair notes that in the 1990s there was an attempt to create a voice print of US president John F. Kennedy, but at the time the technology was primitive.
In March this year, it was announced that audio had been created in Kennedy’s voice of the speech he was to give on the day of his assassination. It was created by text-to-speech firm CereProc.
The possibilities are limitless. If neural networks can replicate a human voice speaking, it won’t be long before they replicate a human voice singing. New songs could be composed with your favourite artist as the lead vocalist, even if they died many years ago. We may see singers copyrighting their voices, as well as their songs, so they can’t be reproduced without permission.
But storing a print of your voice will be useful. If you’re an author and your voice has been banked, a computer accessing your voice print could create an audiobook from the text of your book. You won’t need to read out the book yourself.
As we have reported previously, a sufferer of motor neurone disease can create a voice print so they can communicate through a machine that uses their voice after they have lost the ability to speak.
It’s amazing technology, but the casualty is going to be our trust in video and audio. The fake world is a real possibility.