This article was originally published on cloudera.com.
Cloudera and Intel have a long history of innovation, driving big data analytics and machine learning into the enterprise with unparalleled performance and security. We are pleased to build on that direction through our collaboration on Intel® Optane™ DC persistent memory. Available to customers running 2nd Generation Intel® Xeon® Scalable processors, Intel Optane DC persistent memory can significantly enhance the performance of real-time and streaming applications. Data-driven organizations, particularly those using IoT, require low-latency, high-performance compute to process data at the edge and make faster, smarter decisions.
Intel Optane DC persistent memory is a new tier in the memory and storage hierarchy, located between DRAM and solid-state drives, with latency closer to that of DRAM (see Figure 1). It uniquely combines high capacity, low latency, and non-volatility in a single package. It is offered in 128GB, 256GB, and 512GB modules and is expected to provide larger memory capacity at a lower cost per gigabyte than large DRAM DIMMs.
Intel Optane DC persistent memory was designed with ease of adoption in mind and can be configured in either of two operating modes: memory mode or app direct mode. Memory mode is enabled with a simple BIOS change, after which the operating system sees the persistent memory as a large pool of volatile memory, similar to DRAM. When data is requested, the memory controller first checks DRAM; if the data is not there, it reads it from Intel Optane DC persistent memory, with only slightly longer latency. In memory mode, data is not retained in the event of a power loss. In app direct mode, applications can use DRAM for operations that need low latency but not persistence, and use Intel Optane DC persistent memory to hold large data structures at memory bus speeds with persistence. It is also possible to assign a portion of the persistent memory to app direct mode while the remainder runs in memory mode.
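To make the memory-mode lookup order concrete, here is a minimal toy model, assuming DRAM behaves as a small cache in front of a larger persistent-memory tier. The class `MemoryModeSim` and its fields are hypothetical. In real hardware the memory controller does this transparently, and the least-recently-used (LRU) eviction used below is a simplifying assumption, not the controller's actual policy.

```python
from collections import OrderedDict


class MemoryModeSim:
    """Toy model of memory mode: DRAM acts as a cache in front of persistent memory.

    Hypothetical illustration only. Real memory mode is handled by the memory
    controller; the LRU eviction policy here is a simplifying assumption.
    """

    def __init__(self, dram_lines: int):
        self.dram_lines = dram_lines
        self.dram = OrderedDict()  # address -> value, kept in LRU order
        self.pmem = {}             # large backing tier (still volatile in memory mode)
        self.dram_hits = 0
        self.pmem_reads = 0

    def write(self, addr: int, value: int) -> None:
        # Simplification: write to the backing tier and refresh the DRAM copy.
        self.pmem[addr] = value
        self._fill(addr, value)

    def read(self, addr: int) -> int:
        if addr in self.dram:
            # Fast path: the data is already in DRAM.
            self.dram.move_to_end(addr)
            self.dram_hits += 1
            return self.dram[addr]
        # Miss: fetch from persistent memory (slightly longer latency), then cache it in DRAM.
        self.pmem_reads += 1
        value = self.pmem[addr]
        self._fill(addr, value)
        return value

    def _fill(self, addr: int, value: int) -> None:
        self.dram[addr] = value
        self.dram.move_to_end(addr)
        if len(self.dram) > self.dram_lines:
            self.dram.popitem(last=False)  # evict the least recently used line
```

For example, with room for two lines in DRAM, writing addresses 0 through 3 leaves only 2 and 3 cached. Reading address 3 is then served from DRAM, while reading address 0 falls through to persistent memory and displaces address 2.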
Cloudera customers who want more flexibility in how and where they run their applications can benefit from Intel Optane DC persistent memory as well. A key characteristic of an enterprise data cloud is its ability to run multiple workloads on shared data without encountering “noisy neighbor” problems. This is achieved through an architecture that fundamentally separates compute from storage. Intel Optane DC persistent memory mitigates the I/O bottlenecks associated with these bifurcated environments.
Apache HBase® is one of many analytics applications that benefit from the capabilities of Intel Optane DC persistent memory. HBase is a distributed, scalable NoSQL database that enterprises use to power applications that need random, real-time read/write access to semi-structured data. Enterprises use HBase for low-latency storage, to power custom applications that require real-time access, to apply machine learning and artificial intelligence to real-world problems, and to support real-time data analysis. It supports a wide variety of use cases, from powering web and mobile applications to operationalizing IoT data. From a technical perspective, data read from the Hadoop Distributed File System (HDFS) is cached in HBase's BucketCache.
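The value of caching HDFS reads depends heavily on how much of the working set fits in the cache. The sketch below is a generic read-through LRU block cache, not HBase's BucketCache implementation. The names `ReadThroughBlockCache` and `run_workload` are hypothetical, and the `load_block` callable stands in for an HDFS read.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughBlockCache:
    """Toy read-through LRU block cache in front of a slower store.

    Hypothetical illustration of the caching idea only; this is not
    HBase's BucketCache implementation.
    """

    def __init__(self, capacity_blocks: int, load_block: Callable[[int], bytes]):
        self.capacity_blocks = capacity_blocks
        self.load_block = load_block  # stands in for a read from HDFS (assumption)
        self.blocks = OrderedDict()   # block id -> block bytes, in LRU order
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        self.misses += 1
        data = self.load_block(block_id)  # slow path: go to the underlying store
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def run_workload(capacity_blocks: int, working_set: int, passes: int) -> int:
    """Scan the same working set several times and return the number of cache hits."""
    cache = ReadThroughBlockCache(capacity_blocks, lambda b: f"block-{b}".encode())
    for _ in range(passes):
        for block_id in range(working_set):
            cache.get(block_id)
    return cache.hits
```

In this deliberately simple workload, a 10-block working set is scanned three times. A 5-block cache gets no hits at all, because LRU evicts each block just before it is needed again. A 10-block cache hits on all 20 reads after the first pass. Real access patterns are less extreme, but the effect illustrates why cache capacity matters.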
The BucketCache is a memory management implementation used to reduce latency for random reads, compared with reading data directly from disk, and to provide higher throughput. In a typical deployment, the BucketCache size on a node is limited by the amount of available DRAM. In app direct mode, Intel Optane DC persistent memory can serve as an alternate target for the HBase BucketCache, providing a much larger cache than is possible with DRAM. Analysis¹²³ shows that, with equivalent capacities of Intel Optane DC persistent memory and DRAM, the persistent memory configuration delivers only 5% lower performance while potentially providing a 21% cost savings compared with DRAM. “Based on our initial testing, customer applications on HBase can see an efficiency boost by having bucket cache implementation take advantage of Intel Optane DC persistent memory,” stated Amit Virmani, Engineering Manager, Cloudera.
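The two figures above can be combined into a rough performance-per-cost comparison. This minimal sketch assumes that the 5% performance difference and the 21% cost savings apply to the same pair of systems and the same cost basis. The article does not break down which costs the 21% figure includes. The function name `relative_perf_per_cost` is hypothetical.

```python
def relative_perf_per_cost(perf_delta: float, cost_savings: float) -> float:
    """Return performance per unit cost of system B relative to system A.

    perf_delta: fractional performance drop of B vs. A (0.05 means 5% lower).
    cost_savings: fractional cost reduction of B vs. A (0.21 means 21% cheaper).
    Assumes both fractions are measured against the same baseline system A.
    """
    relative_perf = 1.0 - perf_delta    # B delivers this fraction of A's performance
    relative_cost = 1.0 - cost_savings  # B costs this fraction of A
    return relative_perf / relative_cost


ratio = relative_perf_per_cost(0.05, 0.21)
print(f"{ratio:.2f}")  # 0.95 / 0.79, prints 1.20
```

Under these assumptions, the persistent memory configuration would deliver roughly 20% more performance per unit cost than the DRAM-only configuration.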
Cloudera is planning to support Intel Optane DC persistent memory as an alternate target for the HBase BucketCache in an upcoming release.
Intel and Cloudera are working relentlessly together to bring value and performance to our customers with innovative technologies that deliver on our enterprise data cloud vision. We are evaluating additional applications where Intel Optane DC persistent memory can accelerate our customers' capabilities and drive new solutions.
For more information, learn how to get data insights faster from bigger workloads with Intel® Optane™ DC persistent memory.
¹Intel® Optane™ DC persistent memory is available on servers equipped with 2nd Generation Intel® Xeon® Gold processors and Intel® Xeon® Platinum processors.
²System configurations: System A: DRAM 768GB/socket, 2nd Generation Intel® Xeon® Scalable Gold 6254, 18-core, 2-socket. System B: Intel Optane DC persistent memory 768GB/socket + DRAM 96GB/socket, 2nd Generation Intel® Xeon® Scalable Gold 6254, 18-core, 2-socket. Testing was also conducted on Hewlett Packard Enterprise servers and Google Cloud Platform™.
³Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any difference in your system hardware, software, or configuration may affect your actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Cloudera and Intel engineers performed HBase testing. Cost reduction scenarios described are intended as examples of how a given Intel based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.