Apple Unveils Open AI Models Rivaling Mistral and Outperforming Hugging Face’s SmolLM

Wed Jul 24, 2024 - 4:37am GMT+0000

Apple has released DCLM, a new family of open-source AI models, on Hugging Face. The release includes two primary models, one with 7 billion parameters and one with 1.4 billion parameters, both of which perform strongly on benchmarks such as MMLU.

Details of Apple DCLM Models:

The DCLM-7B model, featuring 7 billion parameters and a 2K context window (extendable to 8K), was trained on 2.5 trillion tokens and achieved a 5-shot accuracy of 63.7% on the MMLU benchmark. It is available under Apple’s Sample Code License. The DCLM-1.4B model, with 1.4 billion parameters and a 2K context window, was trained on 2.6 trillion tokens and scored 41.9% on the same benchmark. This smaller model is released under the Apache 2.0 license, allowing for commercial use and modification.

Apple’s new DCLM models were developed as part of the DataComp for Language Models project, a collaborative effort involving researchers from Apple, the University of Washington, Tel Aviv University, and Toyota Research Institute. The project aims to create high-quality datasets for training AI models, focusing on data curation strategies that improve model performance.

At 63.7% on 5-shot MMLU, DCLM-7B outperforms previous state-of-the-art open-data language models such as MAP-Neo and comes close to leading open models such as Mistral-7B-v0.3 and Llama3 8B. The model’s capabilities were further improved by extending its context window to 8K through additional training on the same dataset.

The DCLM-1.4B model, trained jointly with Toyota Research Institute, also shows strong results. It scored 41.9% on the MMLU 5-shot test, outperforming other models in its category, including Hugging Face’s SmolLM and Qwen-1.5B.

The two models carry different licenses: the larger is available under Apple’s Sample Code License, while the smaller is released under Apache 2.0, which allows commercial use and modification. An instruction-tuned version of the 7B model is also available on Hugging Face.