Elhuyar's Aditu Technology Turns Speech Into Text in an Instant

  • Speak into the microphone of your computer or mobile phone and it gives you written text back. It does the same with pre-recorded audio and video files. The Elhuyar Foundation is offering the technology in several languages: the Basque and Spanish versions are ready, and the English and French versions will be available soon.

01 April 2020 - 17:30
Aditu is able to turn spontaneous speech into text.
What does Aditu do?

Aditu is a speech recognizer. It takes audio and video files and returns written text. It also accepts links to video and audio on the Internet, so it does the same with content from YouTube, Facebook and Instagram.

It returns the plain text of the transcription, ready-made subtitle files, and a transcription with a timestamp for each word, so that videos can be searched not only by word but also by timestamp.
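To illustrate why word-level timestamps are useful, here is a minimal sketch in Python. It is not Aditu's actual output format or API, which the article does not describe: the `Word` record, the function names and the sample words are all assumptions made for the example. It shows how a list of timestamped words could be searched to find the moment a word is spoken, and how the same list could be grouped into SubRip-style (`.srt`) subtitle cues.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical record: one recognized word with its timing in seconds."""
    text: str
    start: float
    end: float


def find_word(words, query):
    """Return the start time of every occurrence of `query` (case-insensitive)."""
    q = query.lower()
    return [w.start for w in words if w.text.lower().strip(".,;:!?") == q]


def to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words, max_words=7):
    """Group words into numbered subtitle cues of at most `max_words` words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        number = len(cues) + 1
        timing = f"{to_srt_time(chunk[0].start)} --> {to_srt_time(chunk[-1].end)}"
        text = " ".join(w.text for w in chunk)
        cues.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(cues)


# Made-up sample data, for illustration only.
sample = [Word("Kaixo", 0.0, 0.4), Word("mundua", 0.5, 1.1)]
print(find_word(sample, "mundua"))
print(to_srt(sample))
```

Fixed-size grouping is the simplest possible choice here; a real subtitle generator would more likely break cues at pauses or punctuation.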

The jump from speech to writing is fast, so it can also be done live. You can speak into the microphone and read what you have said an instant later.

In what situations is it most useful?

It is useful for many kinds of audiovisual content. It can be used for news, documentaries and interviews; in such cases, Aditu would create the subtitles. It also serves to transcribe radio recordings. For example, Elhuyar is working with Antxetamedia. Although it is a radio station, Antxeta also publishes interviews and reports on its website, where the broadcast can be followed live. The intention is to put some of these audio recordings into writing and offer readers a written version as well.

Another area of application is public institutions: municipal council meetings, parliamentary sessions and similar proceedings can be put in writing. The text can be displayed instantly or saved. Given the way we communicate today, combining speech recognition with other technologies opens up new opportunities. Take an example: a municipal council meeting is being followed over the internet, the written text appears as the municipal representatives speak, and that text can also be translated into Spanish using the neural translator that Elhuyar recently presented. Going a step further, the resulting Spanish text could be converted into audio. This application has not yet been developed, but it is feasible.

It is a useful service for deaf people and people with physical disabilities. Audiovisual content, any kind of talk in the local events hall, municipal council meetings… deaf people could follow all of these, because the speech would become text. Anyone who cannot write can use the system the other way round: with a good-quality microphone, their speech would be captured and turned into written text.

People who have no disability but cannot write in certain situations, and would like to take the opportunity to speak instead, are also following this technology closely. Igor Leturia, the head of Aditu, tells us that someone told him they would use it in the car. To get ahead with work reports while driving, they would speak to the mobile phone and the phone would turn the speech into text. In other words, instead of writing the report at the computer, you can use the car journey to dictate it to the phone and have the text in hand a moment later. They have not run any real tests, and they do not know what the outcome would be. In fact, a speech recognizer needs a number of conditions to be met in order to do its job.

When does it not work well?

Elhuyar has posted several demos on the Aditu website so that anyone can check the speech recognizer's results. They include the year-end address by the lehendakari of the Basque Autonomous Community, a municipal council meeting, the television programme Teknopolis, a short audiovisual piece by Imanol Pagola and a live exercise. The result is very tidy, although there are still mistakes. Paste the text. In any case, recordings of this kind give good results, but the system has limitations and does not always match the results of those demos.

The audio must be of good quality; the results suffer noticeably if a good microphone is not used, if there is echo, or if more than one person is talking. If the speaker talks in bursts, leaves words half-finished, backtracks or stops and starts, that kind of speech is hard to capture well. The example of a journalist preparing an interview for publication is a good one. The result would be very different if the journalist interviewed one person in a living room with a good recorder than if they interviewed two or three people in a bar with music playing. To get a good result, you have to pay attention to the audio quality and also to the way the speaker talks. The system is better prepared for Euskara Batua (standard Basque) and does not capture dialects as well.

What is available now, and what do they want to improve?

For now, Elhuyar has opened the Aditu website and anyone can try the speech recognizer, errors and all. Free trials are available, and there is also an area for clients. From now on, specific applications will be built according to customer requirements. Looking ahead, Leturia believes that combining it with machine translation and speech synthesis opens the door to several options, and they intend to develop them. One of them would be the example mentioned earlier: transcribing speech into written text, translating that text into Spanish and converting the translation into audio. If all this technology were applied to a live speech, the result would be spoken simultaneous translation: the speaker would talk in Basque and, thanks to the technology, the listeners could follow the talk in Spanish without the need for a translator or professional interpreter.

