AI21 CEO Warns: Transformers May Not Be Suitable for AI Agents Because of Error Propagation Issues

By Aiden Techtonic

The Future of Enterprise AI: A Shift Towards Efficient Architectures

As enterprises move toward an "agentic" future, one significant obstacle may be the limitations of current AI model architectures. According to Ori Goshen, CEO of AI21, a leading player in enterprise AI development, exploring alternative model architectures is essential for unlocking the full potential of AI agents.

In an interview with VentureBeat, Goshen pointed out shortcomings of Transformers, currently the most widely used model architecture, that he says may limit the viability of multi-agent ecosystems. He also highlighted a growing industry trend: a shift away from Transformers toward more efficient architectures.

"Transformers can create an overwhelming number of tokens, leading to increased costs," Goshen noted. This cost problem has led AI21 to advocate for a broader range of model architectures rather than relying solely on Transformers. To that end, the company is developing its JAMBA architecture (Joint Attention and Mamba Architecture), a hybrid that combines Transformer attention layers with layers based on Mamba, an architecture developed by researchers at Princeton and Carnegie Mellon. The design aims to deliver faster inference and support longer context windows.
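To make the cost argument concrete, the sketch below counts how work and memory grow with sequence length under each approach. It is a simplified back-of-the-envelope model, not AI21's analysis: it counts per layer and per attention head, ignores constant factors and model width, and uses a hypothetical fixed state size of 16 for the recurrent layer. Full self-attention scores every token against every other token, so that count grows quadratically. A decoder's key/value cache grows linearly. A Mamba-style recurrent layer carries a fixed-size state during inference.

```python
def attention_score_count(seq_len: int) -> int:
    """Pairwise query-key scores computed by full self-attention (one layer, one head)."""
    return seq_len * seq_len


def kv_cache_entries(seq_len: int) -> int:
    """Key and value vectors a decoder keeps cached after seq_len tokens (one layer, one head)."""
    return 2 * seq_len


def ssm_state_entries(state_size: int) -> int:
    """A recurrent state-space layer keeps a fixed-size state, independent of seq_len."""
    return state_size


# Illustrative comparison; the state size of 16 is an assumption, not a JAMBA setting.
for n in (1_000, 10_000, 100_000):
    print(n, attention_score_count(n), kv_cache_entries(n), ssm_state_entries(16))
```

Going from 1,000 to 100,000 tokens multiplies the attention score count by 10,000 while the recurrent state stays the same size, which is the intuition behind claims of faster inference and longer context.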

Goshen further explained that emerging architectures such as Mamba and JAMBA not only reduce operating costs but also improve memory performance. That matters for agents, particularly those designed to interact with other AI models. In his view, AI agents remain at an early stage in the market largely because Transformer-based large language models (LLMs) behave inconsistently. "The main reason agents haven't fully entered production is due to their lack of reliability," he remarked, pointing to the stochastic nature of these models: when an agent chains several model calls together, an error at one step can propagate through every step that follows.
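A simple way to see why per-step unreliability matters for agents: suppose an agent completes a task through a chain of model calls and, as a simplifying assumption, each step succeeds independently with the same probability. The chance that the whole chain succeeds is then that probability raised to the number of steps. The function name and the 95% figure below are illustrative assumptions, not numbers from AI21.

```python
def chain_success_probability(step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Errors compound multiplicatively: one failure anywhere sinks the whole chain.
    return step_success ** steps


# Hypothetical 95% per-step reliability across chains of increasing length.
for steps in (1, 5, 10, 20):
    print(steps, round(chain_success_probability(0.95, steps), 3))
```

With a 95% per-step success rate, a ten-step chain succeeds only about 60% of the time. Real errors are rarely independent, so this is an intuition aid rather than a model of any particular system.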

The Rise of Enterprise AI Agents

AI agents have become one of the most talked-about trends in enterprise AI this year. Several companies, including ServiceNow, Salesforce, and Slack, have recently rolled out AI platforms aimed at simplifying agent development. ServiceNow added a library of AI agents to its Now Assist AI platform, Salesforce unveiled its Agentforce suite, and Slack has begun enabling integrations with agents from various providers, including Cohere, Workday, Asana, and Adobe.

Goshen welcomes the growing interest in agents, especially as model architectures improve. He argued that many current applications, such as chatbots that answer questions, are essentially enhanced search, whereas true intelligence lies in retrieving and synthesizing information from diverse sources.

AI21 is actively exploring new offerings for building AI agents, with the aim of making them more useful in practice.

Alternative Architectures Making Waves

Goshen is a strong advocate for alternatives to pure Transformer models, such as the Mamba and JAMBA architectures, citing the high operating costs of Transformers. Mamba replaces the attention mechanism with a selective state-space model that decides which information to keep in a fixed-size memory state, and its implementation is designed to make efficient use of GPU hardware. JAMBA is a hybrid that retains some attention layers alongside its Mamba layers.
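The fixed-size memory idea can be sketched with a toy diagonal linear state-space recurrence: each new input updates a small state vector, and the output is read from that state, so memory does not grow with sequence length. This is a deliberate simplification and not Mamba itself. In Mamba, the quantities that play the role of `a` and `b` depend on the current input (the "selective" part that lets the model prioritize data), and the real implementation uses a hardware-aware parallel scan on GPUs. The function `diagonal_ssm_scan` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def diagonal_ssm_scan(
    inputs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Toy, non-selective diagonal state-space recurrence over a scalar input sequence.

    h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    y_t    = sum_i c[i] * h_t[i]
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length (the state size)")
    state = [0.0] * len(a)  # fixed-size memory, independent of sequence length
    outputs: List[float] = []
    for x in inputs:
        # Decay the old state and mix in the new input, one state element at a time.
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        # Read the output as a weighted sum of the current state.
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


print(diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

With `a = [0.5]`, a single impulse decays through the state as 1.0, 0.5, 0.25. How quickly the state forgets is fixed here; Mamba's selective mechanism instead makes that decay depend on each input token.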

The Mamba architecture is gaining traction among developers, and several organizations have released open Mamba-based models, such as Mistral's Codestral Mamba 7B and the Technology Innovation Institute's Falcon Mamba 7B.

Despite the emergence of these alternatives, Transformers remain the dominant architecture for foundation models; OpenAI's GPT models are a prominent example.

Ultimately, enterprises care more about reliability than about flashy demos. Goshen cautions that despite impressive demonstrations of AI capabilities, there is still a gap between a compelling concept and a viable product. "It's a great time to leverage AI for research purposes," he stated, "but we are not yet at the stage where these technologies can support critical decision-making within enterprises."

As organizations look for smarter, more efficient AI, the key may lie in embracing a wider range of model architectures beyond the Transformer, a shift that could reshape enterprise AI.
