What just happened? Google Translate has added support for 110 new languages in what the company calls its largest expansion ever. The newly supported languages, which include Afar, Cantonese, Manx, N'Ko, Punjabi (Shahmukhi), Tamazight (Amazigh), and Tok Pisin, represent more than 614 million speakers, or around 8 percent of the world's population.
Google is using AI to expand the number of supported languages as part of its "1,000 Languages Initiative," which was announced back in 2022. The company says it is committed to building AI models that will support the 1,000 most-spoken languages around the world.
To add the new languages, Google used its PaLM 2 large language model, which also powers generative AI features such as email summarization in Gmail and rewriting in Google Docs. PaLM 2 was trained on parallel multilingual text and, according to Google, helps the translation service learn closely related languages more efficiently.
We're using AI to add over 100 new languages to Google Translate, our largest expansion ever. Learn more ↓ https://t.co/jLGouceAIG
– Google (@Google) June 27, 2024
The newly added languages range from major ones used by more than 100 million people to others spoken by small Indigenous communities. A few have almost no native speakers, but Google hopes the update will support efforts to revitalize them.
This is also Google Translate's largest expansion of African languages to date, with almost a quarter of the new languages, like Afar, Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof, coming from the continent.
On the other end of the spectrum is Cantonese, which is spoken by tens of millions of people in mainland China, Hong Kong, and Macau. Despite being one of the most requested languages, it was missing from Google Translate until now because written Cantonese often overlaps with Mandarin, making it difficult to gather data and train models.
The previous major expansion came in 2022, when the service added 24 new languages using Zero-Shot Machine Translation. That update added languages such as Mizo, which is spoken by around 800,000 people in northeast India, and Lingala, used by more than 45 million people across Central Africa. It also introduced support for several Indigenous languages of the Americas, such as Quechua, Guarani, and Aymara, as well as Krio, an English-based creole from Sierra Leone.