Back in August 2023, Meta unveiled an ‘all-in-one’ AI translation model capable of understanding close to 100 different languages.
Dubbed SeamlessM4T (Massively Multilingual and Multimodal Machine Translation), this is Meta’s attempt to create a ‘universal translator’ similar to the Babel Fish in Douglas Adams’ classic sci-fi series The Hitchhiker’s Guide to the Galaxy.
The team behind SeamlessM4T has now described its work in an article in the journal Nature, presenting a system that delivers an all-in-one solution for text-to-text, speech-to-text, speech-to-speech and text-to-speech translation across a growing range of languages.
Over 400 years of raw sound
SeamlessM4T, which is used to automatically dub videos on Facebook and Instagram, among other things, currently supports speech-to-speech translation from 101 languages into 36, speech-to-text translation from 101 languages into 96, text-to-text translation for 96 languages, text-to-speech translation from 96 languages into 36 and automatic speech recognition for 96 languages. This unified approach overcomes the limitations of traditional cascaded systems, which often require separate subsystems for speech recognition, translation, and text-to-speech synthesis.
By streamlining these processes, Meta says SeamlessM4T outperforms existing models, achieving up to 23% higher BLEU (Bilingual Evaluation Understudy) scores in translation accuracy and demonstrating impressive resilience to background noise and speaker variations.
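For readers unfamiliar with BLEU, the metric scores a machine translation by how many of its word sequences (n-grams) also appear in a human reference translation, with a penalty for output that is too short. The sketch below is a simplified, single-reference, sentence-level version for illustration only. The function names and example sentences are hypothetical, and this is not the evaluation setup Meta used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty. No smoothing, so any
    n-gram order with zero overlap gives a score of 0."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count at its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: translations shorter than the reference are penalised.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

A perfect match scores 1.0 and a translation with one wrong word scores lower. Published BLEU figures are normally computed over a whole test set with standardised tokenisation, so scores from a toy function like this are not comparable to the numbers reported for SeamlessM4T.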
To create SeamlessM4T, Meta started with 4 million hours (over 400 years) of multilingual raw audio derived from a publicly available repository of crawled web data. The team developed SeamlessAlign, a multimodal corpus containing over 470,000 hours of aligned speech, and combined the dataset with cutting-edge machine learning techniques, including SONAR (Sentence-level Multimodal and Language-Agnostic Representations) embeddings, which enable multilingual, modality-agnostic encoding of text and speech.
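The "over 400 years" figure can be checked with a quick conversion from hours of audio to years of continuous playback. This is a back-of-the-envelope sketch; the 365.25-day year and the helper name are assumptions made here, not figures from Meta.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumes an average year of 365.25 days


def hours_to_years(hours):
    """Convert hours of audio into years of non-stop playback."""
    return hours / HOURS_PER_YEAR


raw_audio_years = hours_to_years(4_000_000)  # raw multilingual audio
aligned_years = hours_to_years(470_000)      # SeamlessAlign aligned speech

print(f"Raw audio: about {raw_audio_years:.0f} years")
print(f"Aligned speech: about {aligned_years:.0f} years")
```

The raw audio works out to roughly 456 years, consistent with "over 400 years", while the aligned SeamlessAlign corpus alone corresponds to more than 50 years of speech.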
Meta says that by addressing social and ethical challenges through built-in safety measures, SeamlessM4T can be a valuable tool for global communication. These safeguards reduce gender bias – errors in grammatical gender determination – and address the problem of added toxicity – where offensive words appear in translations but not in the original source.