Cohere Launches Multimodal Embeddings for Enhanced Enterprise Search
Cohere, a leader in AI language technology, has unveiled an update to its embedding models, introducing multimodal embeddings. The feature lets enterprises use both images and text in retrieval-augmented generation (RAG) style searches, significantly expanding how businesses can retrieve information.
The new model, branded Embed 3, is the latest in Cohere's line of embedding models, which convert a wide range of data into numerical representations. These embeddings are essential for enterprises, serving as numerical maps of their documents: given a user prompt, the model compares the prompt's embedding against that map to retrieve the most relevant information.
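To make the retrieval mechanics concrete, here is a minimal sketch of embedding-based search using Cohere's Python SDK. The model name, the sample documents, and the cosine-similarity ranking are illustrative assumptions, not Cohere's published pipeline.

```python
# Minimal sketch of embedding-based retrieval. The model name
# "embed-english-v3.0" and the ranking logic below are illustrative
# assumptions, not Cohere's published pipeline.
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

documents = [
    "Q3 revenue grew 12% year over year.",
    "The design team shipped a new onboarding flow.",
    "Server costs dropped after the migration to spot instances.",
]

# Map each document to a vector -- the "numerical map" of the corpus.
doc_vecs = np.array(
    co.embed(
        texts=documents,
        model="embed-english-v3.0",
        input_type="search_document",
    ).embeddings
)

# Embed the user prompt the same way, then rank by cosine similarity.
query_vec = np.array(
    co.embed(
        texts=["How did revenue change last quarter?"],
        model="embed-english-v3.0",
        input_type="search_query",
    ).embeddings[0]
)

scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(documents[int(np.argmax(scores))])  # best-matching document
```

In a production RAG system the top-ranked documents would then be passed to a generative model as context, but the vector comparison above is the core of the retrieval step.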
Announcing the release on social media, Aidan Gomez, co-founder and CEO of Cohere, wrote, "Your search can see now. We’re excited to release fully multimodal embeddings for folks to start building with!" He shared a performance graph showing a marked improvement in image search with Embed 3 across a range of categories.
Cohere describes Embed 3 as "the most generally capable multimodal embedding model on the market," highlighting its ability to process both text and images seamlessly. The company asserts that this advancement allows organizations to harness valuable insights from their data stored in different formats, including complex reports and product catalogs, ultimately driving workforce productivity.
A significant advantage of the new system is that it broadens the data accessible through RAG searches. Many enterprises have limited their searches to structured and unstructured text, ignoring the wealth of information locked in other formats, such as charts, graphs, and design files. By accommodating these file types, businesses can put their entire data repository to work.
Cohere’s Embed 3 encodes text and images into a single unified latent space, enabling a more integrated approach to search than models that require separate databases for text and images. The company notes that many competing systems cluster textual and visual data into separate silos, producing search results that skew heavily toward text. Embed 3 aims to remove this bias by ranking on the contextual meaning of the data, regardless of its modality, as the sketch below illustrates.
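Because text and image vectors land in the same space, one index can rank both modalities against a single text query. The sketch below assumes the SDK's embed endpoint accepts base64-encoded images via an images parameter with input_type="image", per Cohere's multimodal announcement; treat the parameter names, file names, and helper functions as illustrative.

```python
# Sketch of cross-modal search in one unified latent space. Parameter
# names such as images= and input_type="image" follow Cohere's
# multimodal embed API as announced; treat them as assumptions here.
import base64
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

def embed_image(path: str) -> np.ndarray:
    """Embed a local image by sending it as a base64 data URI."""
    with open(path, "rb") as f:
        data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()
    resp = co.embed(
        images=[data_uri],
        model="embed-english-v3.0",
        input_type="image",
    )
    return np.array(resp.embeddings[0])

def embed_text(text: str, input_type: str) -> np.ndarray:
    """Embed a text snippet into the same vector space as the images."""
    resp = co.embed(
        texts=[text],
        model="embed-english-v3.0",
        input_type=input_type,
    )
    return np.array(resp.embeddings[0])

# One index holds both modalities -- no separate image database needed.
index = [
    ("q3_revenue_chart.png",
     embed_image("q3_revenue_chart.png")),          # hypothetical file
    ("earnings_summary",
     embed_text("Q3 revenue grew 12% year over year.", "search_document")),
]

# A single text query ranks text and image entries together.
query = embed_text("quarterly revenue trend", "search_query")
ranked = sorted(
    index,
    key=lambda item: float(
        item[1] @ query / (np.linalg.norm(item[1]) * np.linalg.norm(query))
    ),
    reverse=True,
)
print([name for name, _ in ranked])
```

The design point is that a chart and a paragraph describing the same figures should score similarly against the same query, which is what keeping both modalities in one latent space makes possible.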
Supporting more than 100 languages, Embed 3 is currently available on Cohere’s platform as well as Amazon SageMaker, broadening its reach and usability for enterprises around the globe.
As businesses increasingly recognize the utility of multimodal searches—echoing the trends set by consumer platforms like Google and advanced chat interfaces like ChatGPT—Cohere is positioning itself as a key player in this rapidly evolving landscape. The demand for fast, accurate, and secure multimodal embedding models is on the rise, and Cohere is stepping up to meet that challenge.
The competition in multimodal embeddings is intensifying, with major players such as Google and OpenAI investing in similar technologies. Cohere, founded by some of the researchers behind the Transformer architecture, has revamped its APIs to ease the transition for customers moving from competitors' models to its own, aiming to bolster its presence in the enterprise segment.
As enterprises adapt to the digital age, solutions like Cohere’s Embed 3 stand to revolutionize how organizations interact with their data, contributing to smarter and faster decision-making processes.