Google's Big Leap Forward for Real-Time Translations Is Deepfaking Your Voice

I can’t count the number of times I’ve been promised the Next Big Thing in app-enabled translation. Since the dawn of Google Translate, it’s felt like tech companies (not just Google, but Samsung and Apple, too) have been teasing a future where speech can be translated instantaneously, allowing for near-seamless communication between people from *sarcastic SpongeBob voice* across the world. But the truth is, that future, however enticing on paper, hasn’t exactly panned out.

As incredible as apps have gotten at translating speech and text with a fairly high degree of accuracy, they haven’t quite risen to the speed and cadence of real-life conversations. Designing a translation tool that can keep pace with our mouths (like, actually talking) isn’t an easy feat. We talk fast, and we expect even faster responses, which makes live translation less of a marathon than a sprint, or I guess more accurately, a sprint that could be a marathon in length.

Given that long promise of snappy, useful, real-time translation, I'm conditioned to roll an eye or two whenever live translation comes up in a keynote, which is exactly what I did during Google's annual Pixel hardware event. This year, though, that eye roll might not be warranted. At its Made by Google keynote, Google showcased a feature that not only translates your speech in real time but also deepfakes your actual voice (also in near real time), so the person on the other end hears you speaking in their native language. And yes, it works in reverse, too. That's right, just two deepfakes talking to each other; nothing to see here, folks.

And the extra wild part is Google was so confident in its new live translation feature that it offered up a live demo, which, I’m not going to lie… it kind of nailed? Gizmodo’s Senior Editor, Consumer Tech, Raymond Wong, captured the whole thing live at Google’s keynote. For your viewing pleasure, Jimmy Fallon’s voice deepfaked into Spanish:

I was also watching along from home during this segment, and my partner, who’s Spanish-speaking and bilingual, confirmed that Google’s new AI translate feature seemed to ace the assignment, inflections and all. Don’t get me wrong, I still want to test those translation features for myself, but from the looks of it, Google is off to a pretty amazing start here.

Powering those translation abilities are Gemini Nano, a compact version of Google's increasingly iterated-upon large language model, and the Pixel 10's Tensor G5 chip. Google says its Nano model and the translation feature run on-device in this case, which means that nothing, including your calls, gets teleported to the cloud while you're translating. That makes its new feature less icky, and I say "less" in this case because, let's be honest, this thing is still deepfaking your voice.

Truthfully, if Google's new translation feature weren't happening on-device, I might be a little worried. As cool as it is, the thought of a facsimile of your voice sitting on a server somewhere is a bad one, given that people rely on biometrics, voice included, for all sorts of important digital security, banking among them. And in a way, on-device or not, the feature is still creepy. Apparently we're at the stage of instantaneous voice deepfakes. Just imagine what AI can do with a little bit of time and training.

But more than anything, I’m impressed by what Google showed off today, especially as someone who’s watched tech companies overpromise on translation features for years at this point. It’s still too early to declare that Google has hit the Holy Grail of real-time translation, but for once, I’m left thinking that the idea of seamless, phone-enabled translations has actually taken a major leap forward. So, consider my eye roll officially rescinded, Google.
