Citing concerns about enabling deepfake content, Microsoft researchers unveiled a new artificial intelligence tool that can produce remarkably lifelike human avatars, but they did not provide a timeline for when it will be made public.
The AI model, known as VASA-1 (for “visual affective skills”), can produce an animated video of a person speaking with synchronized lip movements from just a single image and a spoken audio clip.
In a crucial election year, disinformation experts are concerned about the widespread misuse of AI-powered applications to produce “deep fake” images, videos, and audio samples.
The authors of the VASA-1 report, which Microsoft Research Asia released this week, declared, “We are opposed to any behavior to create misleading or harmful contents of real persons.”
“We are dedicated to developing AI responsibly, with the goal of advancing human well-being,” they stated.
“We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”
According to Microsoft researchers, the technology can capture a broad range of realistic head motions and facial characteristics.
Researchers stated in the report that “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”
Microsoft states that VASA can work with creative images, music, and non-English speech.
Researchers highlighted the technology’s potential advantages, such as providing virtual teachers for children or supporting people in need of therapy.
“It is not intended to create content that is used to mislead or deceive,” they stated.
The post says that VASA videos still retain “artifacts” that reveal they are AI-generated.
Ben Werdmuller, head of technology at ProPublica, stated he’d be “excited to hear about someone using it to represent them in a Zoom meeting for the first time.”
“Like, how did it go? Did anyone notice?” he asked on the social media platform Threads.
In March, OpenAI, the creator of ChatGPT, unveiled “Voice Engine,” a voice-cloning technology that can closely mimic a person’s speech from a 15-second audio sample.
However, the company noted that it was “taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.”
Earlier this year, a consultant working for a long-shot Democratic presidential candidate acknowledged that he was behind a robocall to New Hampshire voters that impersonated Joe Biden, saying he intended to draw attention to the risks of artificial intelligence.
The call, which featured what sounded like Biden’s voice advising people not to vote in the state’s January primary, heightened experts’ concerns about a wave of AI-powered deepfake misinformation in the 2024 presidential election.