Cohere Launches New Aya Expanse Models to Enhance Multilingual AI Capabilities
Cohere, a leader in AI research, has unveiled two new open-weight models under its innovative Aya project, designed to bridge the language divide in foundational models. The Aya Expanse 8B and 35B models, now accessible on Hugging Face, boast enhanced performance across 23 languages, furthering the company’s mission to democratize AI technology globally.
According to a blog post by Cohere, the 8B parameter model brings cutting-edge multilingual research within reach of more users worldwide, while the 35B version offers state-of-the-art multilingual capabilities. This move is part of the Aya initiative, launched by Cohere for AI last year to broaden the reach of foundational models beyond English.
The Aya Expanse models build upon the groundwork laid by the Aya 101 large language model (LLM), released earlier this year. The Aya 101 model, featuring 13 billion parameters, supports 101 languages and was accompanied by the introduction of the Aya dataset, aimed at improving access for model training in diverse languages.
Cohere attributes the advancements in Aya Expanse to a concerted research effort focused on how AI can better serve a multitude of languages. This initiative included breakthroughs in data arbitrage, preference training for improved performance and safety, and innovative model merging strategies. “Our research agenda has been dedicated to bridging the language gap,” the company noted, highlighting the critical steps taken to refine its models’ capabilities.
Aya Expanse Outperforms Competitors
In benchmark tests, both Aya Expanse models demonstrated superior performance compared to similarly sized models from industry giants like Google, Meta, and Mistral. Notably, the 35B model outperformed Gemma 2 27B and Llama 3.1 70B, while the 8B model surpassed Gemma 2 9B and Llama 3.1 8B, marking a significant achievement for Cohere.
The Aya models are developed using an innovative data sampling method known as data arbitrage, which mitigates the production of nonsensical outputs often generated by models reliant on synthetic data. Many existing models utilize synthetic training data from “teacher” models; however, Cohere’s approach aims to avoid the pitfalls typically associated with low-resource languages, where high-quality teacher models are scarce.
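Cohere has not published the full details of its arbitrage pipeline, but the core idea described here, sampling candidate completions from a pool of teacher models and keeping only the best-scored one per prompt, can be sketched roughly as follows. The teacher pool, quality scores, and threshold below are all hypothetical placeholders, not Cohere's actual data or method.

```python
# Hypothetical pool of "teacher" completions per prompt; in a real pipeline
# these would come from several multilingual LLMs of varying quality, with
# scores assigned by a judge model rather than hard-coded.
TEACHER_POOLS = {
    "Translate 'good morning' to German": [
        ("Guten Morgen", 0.95),  # (completion, quality score)
        ("Gut Morgen", 0.40),
        ("Bonjour", 0.05),
    ],
}

def arbitrage_sample(prompt, pools, min_score=0.5):
    """Keep the highest-scoring teacher completion, or drop the prompt
    entirely if no teacher clears the quality bar -- the point is to avoid
    training on nonsensical synthetic text from weak teachers."""
    candidates = pools.get(prompt, [])
    if not candidates:
        return None
    best_completion, best_score = max(candidates, key=lambda c: c[1])
    return best_completion if best_score >= min_score else None

print(arbitrage_sample("Translate 'good morning' to German", TEACHER_POOLS))
```

For low-resource languages, where no single strong teacher exists, a scheme like this lets the pipeline pull whichever teacher happens to be strongest for each individual prompt rather than committing to one model.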
Cohere has also focused on understanding and incorporating “global preferences” to address the varying cultural and linguistic contexts of users around the world. This means that while ensuring baseline safety, the models are trained to account for diverse cultural nuances—an important step given the predominance of Western-centric datasets in existing safety protocols.
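The blog post does not specify the exact preference-training objective, but a common approach for training on chosen-versus-rejected response pairs is a direct-preference-optimization-style loss. The toy function below, with hypothetical scalar log-probabilities standing in for real model outputs, shows the basic shape: the policy is rewarded for widening its gap over a frozen reference model in favor of the preferred response.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Toy DPO-style loss on scalar log-probabilities: the policy model is
    pushed to prefer the chosen response relative to a frozen reference.
    Inputs here are illustrative numbers, not real model outputs."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# The loss is smaller when the policy favors the chosen answer more than
# the reference does, and larger when it favors the rejected answer.
print(dpo_loss(-1.0, -5.0, -2.0, -4.0))
print(dpo_loss(-5.0, -1.0, -4.0, -2.0))
```

In a multilingual setting, the preference pairs themselves would need to be sourced across languages and cultures, which is where the “global preferences” framing comes in.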
Expanding Global Language Capabilities
The Aya project emphasizes the importance of enhancing research and performance in languages other than English. While many large language models are eventually made available in other languages, comprehensive training data in those languages is often lacking. English remains the dominant language of government, finance, and online communication, making English-language data far easier to gather.
Additionally, accurate performance benchmarking for models in multiple languages poses its own set of challenges, largely due to the quality of translations. In response, developers across the industry are dedicating efforts to create robust datasets that bolster research into non-English AI models. For instance, OpenAI recently released the Multilingual Massive Multitask Language Understanding Dataset on Hugging Face, which enhances testing capabilities for language models in 14 languages, including Arabic and German.
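One reason per-language benchmarks matter is that an aggregate score can hide weak performance in low-resource languages behind strong English results. A minimal sketch of per-language scoring over benchmark records (the example records below are made up, not real Aya or MMMLU results):

```python
from collections import defaultdict

# Hypothetical benchmark records: (language code, answered correctly?)
# pairs, as a multilingual MMLU-style evaluation harness might emit.
results = [
    ("ar", True), ("ar", False),
    ("de", True), ("de", True), ("de", False),
]

def per_language_accuracy(records):
    """Accuracy computed separately for each language, so weak
    low-resource performance is visible rather than averaged away."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for lang, ok in records:
        totals[lang] += 1
        correct[lang] += int(ok)
    return {lang: correct[lang] / totals[lang] for lang in totals}

print(per_language_accuracy(results))
```

Translation quality adds a further wrinkle: if the Arabic items are poor translations of the English originals, a low Arabic score may reflect the benchmark rather than the model.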
Cohere’s recent activities haven’t gone unnoticed either, as the company also launched image search capabilities for its Embed 3 enterprise product this week and improved fine-tuning features for the Command R 08-2024 model.
With the rollout of the Aya Expanse models, Cohere takes a significant step forward in making AI accessible to non-English speakers, marking a pivotal moment in the evolution of multilingual technology. As research and innovation in this space continue to flourish, the gap in language accessibility is poised to close, paving the way for a more inclusive digital future.