Unbabel says its new TowerLLM AI model beats OpenAI's GPT-4 at translation.

Unbabel, a tech company that provides both machine- and human-based translation services for businesses, has created a new AI model that it says beats OpenAI's GPT-4o and other commercially available AI systems at translating between English and six commonly spoken European and Asian languages.

Translation has been one of the more attractive business use cases for large language models (LLMs), the kind of AI systems that underpin chatbots like OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude. And to date, GPT-4o, the latest version of OpenAI's most powerful AI model, has outperformed all competitors when it comes to translating languages with large amounts of digital text. (GPT-4's performance on “low-resource languages,” which have little digital documentation for training, has never been very good.)

Unbabel tested its AI model, which it calls TowerLLM, against GPT-4o and the original GPT-4, as well as OpenAI's GPT-3.5 and competing models from Google and language translation company DeepL. It looked at translations from English to Spanish, French, German, Portuguese, Italian, and Korean. In almost every case, TowerLLM narrowly outperformed GPT-4o and GPT-4. TowerLLM's widest margin came in English-to-Korean translation, where it beat OpenAI's best models by about 1.5%. Only on English-to-German translation were GPT-4 and GPT-4o a fraction of a percentage point better.

Unbabel also tested its model on translating documents for specific professional domains such as finance, medicine, law, and technical writing. Here again, TowerLLM outperformed OpenAI's best models by between 1% and 2%.

Unbabel's results have not been independently verified, but if confirmed, the fact that GPT-4 has now been bested at translation may indicate that the model, which debuted 15 months ago (an eternity in the fast-paced world of AI development) and has remained a top-performing LLM on most language benchmarks, may now be vulnerable to being surpassed by new AI systems trained in different ways. OpenAI is reportedly training a more powerful LLM, though its release date is uncertain.

Unbabel, which is headquartered in both San Francisco and Lisbon, said TowerLLM was trained to be multilingual from the start, on a large public dataset of multilingual text. This means the model also performs better on reasoning tasks in multiple languages than some competing open-source AI models of similar size created by companies such as Meta and French AI startup Mistral.

TowerLLM was then fine-tuned on a carefully curated dataset of high-quality translations between language pairs. Unbabel used another AI model it trained to evaluate translation quality, called COMETKiwi, to help validate this fine-tuning dataset.
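For readers curious what that kind of quality filtering could look like in practice, here is a minimal sketch using Unbabel's open-source COMET library (installed with pip install unbabel-comet). The sentence pairs and the 0.8 cutoff are illustrative assumptions, not details of Unbabel's actual pipeline, and the COMETKiwi checkpoint on Hugging Face is gated, so access must be requested first.

# A hedged sketch of reference-free quality filtering with COMETKiwi.
# Assumptions: unbabel-comet is installed and the gated checkpoint
# "Unbabel/wmt22-cometkiwi-da" is accessible; the sentence pairs and
# the 0.8 cutoff are illustrative, not Unbabel's real pipeline values.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)

# COMETKiwi scores source/translation pairs without a human reference,
# which is what makes it usable for cleaning large training corpora.
candidates = [
    {"src": "The contract must be signed by both parties.",
     "mt": "El contrato debe ser firmado por ambas partes."},
    {"src": "The contract must be signed by both parties.",
     "mt": "El contrato firmado partes."},  # deliberately poor translation
]

output = model.predict(candidates, batch_size=8, gpus=0)

threshold = 0.8  # hypothetical quality cutoff
kept = [pair for pair, score in zip(candidates, output.scores)
        if score >= threshold]
print(f"kept {len(kept)} of {len(candidates)} candidate pairs")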

Unbabel Chief Technology Officer João Graça told Fortune that most other LLMs have a high proportion of English-language text in their initial training set and acquire the ability to translate incidentally. TowerLLM, by contrast, was trained on a dataset specifically designed to include multilingual text. He also noted that fine-tuning on a small, curated dataset of high-quality translations was key to the resulting model's high performance.

This is one of several recent examples in which small AI models have equaled or exceeded the performance of much larger models when trained on better-quality datasets. For example, Microsoft created a small language model called Phi-3, with only 3.8 billion parameters (the tunable variables in the model), that was trained on a “textbook-quality” dataset Microsoft curated, and it outperformed models twice its size. “Phi's insight is that people should focus on data quality,” Graça said. He noted that all AI companies are now using the same basic algorithmic design with some subtle variations. What differentiates the models is the data. “It's all about the data and the training curriculum, the way you feed the model,” he said.

TowerLLM is currently available in two sizes, one with 7 billion parameters and the other with 13 billion. An earlier version of the model, launched in January, came close to GPT-4's performance but didn't exceed it, and it worked for only 10 language pairs. The new model surpasses GPT-4 and supports 18 language pairs.

TowerLLM has only been tested against GPT-4o on translation, meaning OpenAI's model may still have an advantage in other tasks such as reasoning, coding, writing, and summarization.

Unbabel plans to expand the number of languages TowerLLM supports, adding 10 more soon, Graça said. The model is also being fine-tuned to work on the very specific translation tasks that businesses often care most about, such as translating complex legal documents or patent and copyright information. And it is being trained to get better at “transcreation,” the skill of translating a piece of material not just word for word, but in a way that picks up on very subtle cultural nuances, such as colloquial expressions or sayings that only native speakers of a particular language would use, Graça said.
