We’re developing AI for real-time speech translation to break down language barriers in the physical world and in the metaverse. https://t.co/PC0BzLCHAr pic.twitter.com/wW8NlyndmM
— Meta Newsroom (@MetaNewsroom) October 19, 2022
Meta has released an AI translation model for languages with limited or no written form. Researchers at the tech giant demonstrated the now open-source software by translating Hokkien, a language related to Mandarin and spoken by around 46 million people in southern mainland China, Taiwan, Singapore, and parts of Malaysia and the Philippines. Hokkien is largely oral, with no standardized written version, and thus a challenge for direct translation.
As seen in the video at the top, the AI translation for Hokkien is at least a partial success. The researchers, working under the umbrella of Meta’s Universal Speech Translator (UST) project, built the model by first relying on Mandarin as an intermediate language between Hokkien and English, then refining the speech-to-speech translations with feedback from native Hokkien speakers. For all its accuracy, the model translates only one sentence at a time and remains far more limited than comparable software for languages with large written corpora to draw from. That’s partly why Meta made the model open-source: to encourage developers to improve it and to build similar tools for other languages with few translation technology options. Until now, AI translation has mainly focused on written languages. Yet Hokkien is far from unique in its oral emphasis. Languages with no standard or common writing system account for an estimated half of the more than 7,000 living languages spoken today.
“It can be a barrier to confidence, fluency, and authenticity,” Meta researcher and linguistic anthropologist Laura Brown said. “We know at Meta that there are tons of people all over the world who have their interface set to English, who use English on our platforms — even though they are much more confident in other languages and writing systems. As soon as we give them the ability to do audio in their own language, their comfort and confidence in the digital space shoot way up.”
The oral-only model is Meta’s latest in a series of AI translation projects. The company shared a model claiming particularly high-quality translations among 200 languages in July, named NLLB-200 for its roots in Meta’s No Language Left Behind (NLLB) project. The NLLB-200 model came only a couple of years after a 100-language translation model, the first to work without requiring English as an intermediary language between any of the other tongues. The latest effort also dovetails with the two giant conversational AI datasets Meta released last year, partly to widen digital technology access to those who don’t speak the most common online languages.
Meta is hardly alone in exploring AI interpreters for various purposes. Mozilla Foundation launched its Common Voice project a few years ago to support tech developers without access to proprietary data. The Common Voice database boasts more than 9,000 hours of recordings across 60 different languages and claims to be the world’s largest public domain voice dataset. Nvidia made a $1.5 million investment in Mozilla Common Voice and began working with Mozilla on voice AI and speech recognition. The project’s global focus led to a $3.4 million investment this spring to create such a resource for Kiswahili, known elsewhere as Swahili.