argia.eus
Google Translate: 24 more languages with a new model
  • Google Translate is adding 24 languages to its machine translation system. Among them are languages such as Guarani, Aymara, Bambara or the Ewe of Ghana, which colonialism crushed but did not kill.
Sustatu, 16 May 2022

Technically, Google has reached a new milestone with this expansion, as explained in this note. The ability to translate these languages has been achieved through Zero-Shot Machine Translation, an artificial intelligence technique whose defining feature is that the system works without a bilingual corpus. That is, Google has managed (for example) to build a model of Aymara using only Aymara texts and to make it usable for translation.

This model seems to us to be related to the work and thesis of the Basque computer scientist Mikel Artetxe (Unsupervised Machine Translation), in which he developed a machine translation procedure for minority languages without bilingual corpora. Artetxe now works in the artificial intelligence division of Facebook-Meta, not at Google.

We tried it out, translating a text from Google's announcement into Aymara and then into Basque. Here are the screenshots:

One phrase came out a little odd in Basque, "if you want to help the night in the next update...", but on the whole the result is passable.

We have seen that the new languages are already available at https://translate.google.es/, but not yet in the smaller Translate window integrated into the search engine's home page.

Here are the new languages added by Google Translate:

  • Assamese, 25 million speakers in India.
  • Aymara, 2 million speakers, mainly in Bolivia.
  • Bambara, 14 million speakers in Mali and Senegal.
  • Bhojpuri, 50 million speakers in India, Nepal and the diaspora.
  • Maldivian or Dhivehi, 300,000 speakers, national language of the Maldives.
  • Dogri, 3 million speakers in India and Pakistan.
  • Ewe, 7 million speakers in Togo and Ghana.
  • Guarani, 7 million speakers, indigenous and national language of Paraguay.
  • Ilocano, 10 million speakers in the northern Philippines.
  • Konkani, 2 million speakers in India, around Goa.
  • Krio, a creole, the main language of Sierra Leone.
  • Kurdish (Sorani variant), 15 million speakers in Iraq and Iran.
  • Lingala, 45 million speakers, the main language of Congo, also spoken in neighbouring countries.
  • Luganda, 20 million speakers in Uganda and Rwanda.
  • Maithili, 34 million speakers in India.
  • Manipuri, 2 million speakers in India.
  • Mizo, 830,000 speakers in India.
  • Oromo, 37 million speakers in Ethiopia and Kenya.
  • Quechua, 10 million speakers in Peru, in the Andes in general and in the diaspora.
  • Sanskrit, the ancient classical language of India (its "Latin"), which may have up to 20,000 speakers.
  • Sepedi or Pedi, 14 million speakers in South Africa.
  • Tigrinya, 8 million speakers in Eritrea and Ethiopia.
  • Tsonga, 7 million speakers in South Africa and neighbouring countries.
  • Twi, 11 million speakers in Ghana.

Basque has been on Google Translate for 12 years, having been added in 2010. Its quality was reasonable at the time and has improved a lot since, but we believe that tools such as Elia.eus or Batua.eus, created in Euskal Herria, are better than Google's.