Optimizing and Running LLaMA2 on Intel® CPU
Last Updated: Mar 24, 2025
Large Language Models (LLMs) are deep learning models that have attracted significant attention in recent years for their impressive performance on natural language processing (NLP) tasks. Deploying LLM applications in production, however, poses several challenges: hardware-specific limitations, the availability of software toolkits that support LLMs, and software optimization for specific hardware platforms. In this whitepaper, we demonstrate how to apply hardware platform-specific optimizations to improve the inference speed of the LLaMA2 model on the llama.cpp framework.