Eighty percent of the world's population communicates in a mother tongue that isn't English. So why aren't more CS leaders offering their customers communication in their language of choice? And how could a new form of conversational AI called Natural Language Understanding (NLU) challenge the status quo?
Why is so much conversational AI in English?
Our COO at Ultimate, Sarah Al-Hussaini, explains: Many of the world's tech leaders, including Google, Apple, and Microsoft, are companies based in the United States. Innovations in conversational AI technology are usually based on analyses of natural language. Because AI models are extremely expensive to build, much of the data used to train them is skewed towards English, simply because that's the language spoken by the people paying for them.
“In regions with fewer inhabitants on average — like Scandinavia or the Benelux countries — many technical innovations cannot be found in natural language.”
- Sarah Al-Hussaini, Co-Founder and COO, Ultimate
Can NLU AI go global?
According to NLU expert Prof. Maya Popovich, the less common a language, the harder it is to create AI models for it. That's because many of these languages have rich morphologies but very small data sets, which makes it hard to train accurate NLU models for them. Take Serbian, for example: because it is bi-alphabetical (it uses both Latin and Cyrillic scripts), it can be tough to parse for AI models used to dealing with only one of the two.
But there have been some efforts to change that.
According to Prof. Popovich, “Out of Eastern European languages, one of the best-supported is Czech — a strong research group has already been working for many years on several NLP aspects such as syntax and parsing, POS tagging, and machine translation.”
Russian, too, has been increasingly well supported in recent years. This is thanks, in large part, to investment in R&D by Russian search engine provider Yandex.
Northern European languages, like the Baltic tongues, have also received some attention from the AI community, primarily from universities, but also from Tilde, an organisation working to enable multilingual communication in tech innovation.
In fact, we at Ultimate were one of the first AI companies to develop NLU AI models for Finnish — one of the most complex languages in the world.
There's a potential solution to the unique challenge with bi-alphabetical languages like Serbian, too. Serbian is quite similar to Croatian, so combining data from the two languages in an appropriate way has proven very helpful for training AI models. The same can be said for other less-represented languages.
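As a rough illustration of what "combining data in an appropriate way" could involve, the sketch below converts Serbian Cyrillic text to the Latin script so that it can sit in the same training pool as Croatian text, which is written in Latin script. This is an assumption about one possible preprocessing step, not a description of how Prof. Popovich or Ultimate actually do it. The names `to_latin` and `pool_corpora` are hypothetical; the letter mapping is the standard one-to-one correspondence between the two Serbian alphabets.

```python
# Standard mapping from Serbian Cyrillic to Serbian Latin (lowercase letters).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # Uppercase Cyrillic letters become title-cased Latin (Љ -> Lj).
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            out.append(ch)
    return "".join(out)


def pool_corpora(serbian: list[str], croatian: list[str]) -> list[str]:
    """Hypothetical helper: one Latin-script pool from both languages."""
    return [to_latin(sentence) for sentence in serbian] + list(croatian)


print(to_latin("Добар дан"))  # Dobar dan
print(pool_corpora(["Добар дан"], ["Dobar dan, kako ste?"]))
```

Script normalization only removes the alphabet mismatch. Vocabulary and grammar differences between the two languages would still need to be handled when the pooled data is used for training.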
Another potential solution is for governments to start taking action. As Sarah Al-Hussaini points out:
“Government action can play a key role in democratizing AI. First, government funding can promote research and development of technologies in national languages. For example, grants, subsidies, and tax benefits can be given to companies that use and/or create applied AI using natural language.”
- Sarah Al-Hussaini, Co-Founder and COO, Ultimate
Sarah has another suggestion: Governments could help collect large data sets in non-English-speaking markets and make them publicly accessible. Lowering these obstacles to AI development could be crucial in closing the innovation gap between English and non-English NLU and AI technologies.
Looking beyond Europe for innovations in NLU AI
Finally, no article on investment in AI R&D in complex languages would be complete without at least touching on China. The Asian superpower dwarfs the US in non-military AI R&D spending, at 5.7 billion USD compared to the US's 1 billion USD. As such, it probably goes without saying that Mandarin and Cantonese are well supported in the NLU community. So while investment in conversational AI has not been spread evenly across languages in the past, we might inch closer to an even spread with a dash of government action, a slice of creative linguistic training, and a splash of R&D investment by non-English-speaking tech innovators.
And we're proud to say we're one of them — offering multilingual AI in 109 languages, including Arabic, Hindi and Mandarin.