Apple is reportedly exploring a new generation of artificial intelligence (AI) models built on Ajax, its internal large language model framework. Following those reports, the company has presented its own AI model, ReALM (Reference Resolution As Language Modeling). The announcement comes after news of a potential partnership with Google to integrate Gemini into iOS 18 and talks with OpenAI, the company behind ChatGPT. Apple has also acquired a series of AI-related startups.
According to its developers, ReALM outperforms ChatGPT in certain operations, making it faster and more efficient. The model is expected to be showcased at the WWDC 2024 developer conference in June, where it will demonstrate the ability to understand references to visual elements on-screen. It can also convert on-screen content into a text representation, enabling more natural, conversational interactions with voice assistants.
The ReALM research team reports that the AI is faster and more efficient than GPT-4 at assimilating contextual data. They have also shared some of the principles behind their language model, noting that understanding context, including references, is crucial for a conversational assistant. This lets users ask questions about what they see on the screen, a key step toward a truly hands-free experience with voice assistants.
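The core idea of treating reference resolution as a language-modeling task can be pictured with a small sketch. The entity format and prompt wording below are illustrative assumptions, not Apple's actual implementation:

```python
# Illustrative sketch: framing reference resolution as a text task.
# Candidate entities the assistant can see are serialized as numbered
# text lines, and a language model would be asked which entity a
# user's phrase ("the bottom one", "that number") refers to.

def build_prompt(entities, user_request):
    """Turn candidate entities and a user request into one text prompt."""
    lines = ["Entities on screen:"]
    for i, entity in enumerate(entities, start=1):
        lines.append(f"{i}. {entity}")
    lines.append(f'User request: "{user_request}"')
    lines.append("Which entity number does the request refer to?")
    return "\n".join(lines)

entities = [
    "phone number: 415-555-0100",
    "business name: Joe's Pizza",
    "address: 123 Market St",
]
print(build_prompt(entities, "call the bottom one"))
```

Because the entire task is expressed as text, a fine-tuned language model can answer it without any image input, which is what makes the approach lightweight.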
Computers cannot interpret images the way humans do. Apple's answer in ReALM is to reconstruct the entire context of the screen as text. The smallest ReALM model promises performance comparable to GPT-4 while using far fewer parameters, and its larger variants become more capable still as their parameter counts grow.
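One way to picture "reconstructing the screen as text" is to sort UI elements by their positions and emit them row by row. This is purely an illustrative sketch under assumed data structures, not Apple's published algorithm:

```python
# Illustrative sketch: serializing on-screen elements into plain text.
# Each element carries a label and a (top, left) position; sorting by
# position yields a text layout that a language model can read, with
# elements sharing a row separated by a delimiter.

def screen_to_text(elements):
    """Render positioned UI elements as text, top-to-bottom, left-to-right."""
    ordered = sorted(elements, key=lambda e: (e["top"], e["left"]))
    rows = {}
    for e in ordered:
        rows.setdefault(e["top"], []).append(e["text"])
    return "\n".join(" | ".join(texts) for _, texts in sorted(rows.items()))

screen = [
    {"text": "Contact: Alice", "top": 0, "left": 0},
    {"text": "415-555-0100", "top": 1, "left": 0},
    {"text": "Email", "top": 1, "left": 10},
]
print(screen_to_text(screen))
# -> Contact: Alice
#    415-555-0100 | Email
```

The appeal of this kind of representation is that the spatial layout survives the conversion, so a reference like "the number under Alice" remains resolvable from text alone.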
According to the development team, ReALM delivered major improvements over existing systems with similar functionality across a range of reference types, with the smallest model achieving an absolute gain of more than 5% on on-screen references. The larger models perform significantly better than GPT-4. ReALM still has limitations with more complex visual references, however, such as distinguishing between multiple images.
In conclusion, ReALM looks very promising and could bring real benefits to Siri and other components of Apple's operating systems. Apple still has plenty of work ahead, but the initial results are impressive.