A new research paper published by Apple has revealed how the company can get artificial intelligence (AI) to run locally on its iPhones, including older models.
According to Android Authority, Apple’s paper details a solution for running large language models (LLMs) on devices with limited RAM. Instead of loading the entire model into RAM, the company stores the model parameters in flash storage and pulls only the portions it needs at a given moment into the device’s RAM.
The paper claims that this method allows running models that require up to twice the RAM an iPhone actually has, while still delivering inference speeds 4–5 times faster on the CPU and 20–25 times faster on the GPU compared with naive loading methods.
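The flash-to-RAM streaming idea can be sketched with a memory-mapped weight file: the operating system keeps the parameters on storage and only the slice actually touched is copied into RAM. The file name, layer shapes, and per-layer loop below are illustrative assumptions for a toy model, not Apple’s actual implementation.

```python
# Sketch: keep model weights in (simulated) flash storage and load
# only one layer at a time into RAM, instead of the whole model.
import os
import tempfile
import numpy as np

# Pretend this file lives in flash storage and holds 4 layers of
# 256x256 float32 weights -- a tiny stand-in for a much larger LLM.
n_layers, rows, cols = 4, 256, 256
path = os.path.join(tempfile.gettempdir(), "toy_weights.bin")
np.arange(n_layers * rows * cols, dtype=np.float32).tofile(path)

# Memory-map the file: no parameters are read into RAM yet.
weights = np.memmap(path, dtype=np.float32,
                    mode="r", shape=(n_layers, rows, cols))

def run_layer(i, x):
    """Copy only layer i's parameters into RAM, use them, discard."""
    w = np.array(weights[i])   # materializes just this layer in RAM
    return x @ w

x = np.ones(rows, dtype=np.float32)
for i in range(n_layers):
    x = run_layer(i, x)
print(x.shape)  # the model ran without ever holding all 4 layers in RAM
```

The peak working set here is one layer plus activations, which is the essential trade Apple’s paper makes: extra reads from flash in exchange for a much smaller RAM footprint.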
Running generative AI on devices with plenty of RAM is beneficial because RAM offers fast read/write speeds. Speed matters for on-device AI: faster inference means users don’t have to wait tens of seconds (or more) for a response or final result. It lets an on-device AI assistant run at conversational speed, generate images and text much faster, and summarize articles more quickly. Apple’s solution, however, means a device doesn’t necessarily need a lot of RAM to respond quickly to AI tasks.
Apple’s approach could allow both old and new iPhones to offer generative AI features right on the device. That’s important because Apple’s iPhones typically offer less RAM than high-end Android phones. For example, the iPhone 11 series offers only 4 GB of RAM, while even the regular iPhone 15 has only 6 GB.
Apple isn’t the only mobile company trying to shrink LLMs. Recent flagship chips from Qualcomm and MediaTek both support INT4 precision to reduce the size of these models. Either way, companies are looking for new ways to lower the system requirements of on-device AI, so that even low-end phones can offer the feature.
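To make the INT4 idea concrete, here is a minimal sketch of symmetric 4-bit weight quantization: each float weight is mapped to an integer in [-8, 7] with a single shared scale, cutting storage per weight from 32 bits to 4. The scheme and values are illustrative, not any vendor’s actual implementation.

```python
# Sketch: symmetric INT4 quantization of a small weight vector.
import numpy as np

def quantize_int4(w):
    """Map float weights to integers in [-8, 7] using one scale factor."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT4 codes."""
    return q.astype(np.float32) * scale

w = np.array([-1.5, -0.3, 0.0, 0.7, 1.4], dtype=np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

print(q)  # small integers in [-8, 7] instead of 32-bit floats
# Round-to-nearest keeps the reconstruction error within half a step:
print(np.abs(w - w_hat).max() <= scale / 2)
```

Real INT4 schemes quantize per channel or per group rather than with one global scale, but the core trade is the same: roughly 8x smaller weights at the cost of some precision.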