Optimizing and Running LLaMA2 on Intel® CPU
Last Updated: Mar 24, 2025
Large Language Models (LLMs) are deep learning models that have attracted significant attention in recent years for their impressive performance on natural language processing (NLP) tasks. Deploying LLM applications in production, however, poses several challenges: hardware-specific limitations, the availability of software toolkits that support LLMs, and software optimization for specific hardware platforms. In this whitepaper, we demonstrate how to apply hardware platform-specific optimizations to improve the inference speed of the LLaMA2 model on the llama.cpp framework.