I recently gave a talk at SNIA’s annual Storage Developer Conference (SDC), filling in for a colleague who was unable to travel to the event. The talk, “Big Data Analytics on Object Storage—Hadoop Over Ceph Object Storage with SSD Cache,” highlighted work Intel is doing in the open source community to enable end users to fully realize the benefits of non-volatile memory technologies employed in object storage platforms.
In this post, I will walk through some of the key themes from the SNIA talk, with one caveat: This discussion is inherently technical. It is meant for people who enjoy diving down into the details of storage architectures.
Let’s begin with the basics. Object storage systems, such as Amazon Web Services’ Simple Storage Service (S3) or the Hadoop Distributed File System (HDFS), consist of two components: an access model that adheres to a well-specified interface and a non-POSIX-compliant file system.
In the context of this latter component, storage tiering has emerged as a required capability. For the purpose of this discussion, I will refer to two tiers: hot and warm. The hot tier is used to service all end-user-initiated I/O requests while the warm tier is responsible for maximizing storage efficiency. This data center storage design is becoming increasingly popular.
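To make the hot/warm split concrete, here is a minimal, purely illustrative sketch in Python. It assumes a hypothetical `TwoTierStore` in which a small, bounded hot tier services every end-user request and least-recently-used objects are demoted to a warm tier, which is just a plain dictionary here. The efficiency mechanisms a real warm tier would apply, such as erasure coding or compression, are deliberately left out, and LRU demotion is only one of many possible policies.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoTierStore:
    """Hypothetical two-tier layout: a bounded hot tier services all end-user
    I/O, and a larger warm tier holds whatever has been demoted from it."""

    def __init__(self, hot_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self.hot: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order, oldest first
        self.warm: Dict[str, bytes] = {}  # stand-in for capacity-optimized media

    def _admit(self, key: str, value: bytes) -> None:
        """Place an object in the hot tier, demoting LRU objects if it overflows."""
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # least recently used
            self.warm[old_key] = old_value

    def write(self, key: str, value: bytes) -> None:
        self.warm.pop(key, None)  # the new hot copy becomes the only copy
        self._admit(key, value)

    def read(self, key: str) -> Optional[bytes]:
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh recency
            return self.hot[key]
        if key in self.warm:
            value = self.warm.pop(key)  # promote on access, then serve from hot
            self._admit(key, value)
            return value
        return None
```

Writing `a`, `b`, and `c` into a store with `hot_capacity=2` demotes `a` to the warm tier. A subsequent read of `a` promotes it back and demotes `b` in its place.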
Storage services in a data center are concerned with the placement of data on the increasingly diverse storage media available to them. Such data placement is an economic decision based on the performance and capacity requirements of an organization’s workloads. In terms of cost per unit of performance, there is often a significant advantage to placing data in DRAM or on a storage medium such as 3D XPoint that has near-DRAM speed. In terms of cost per unit of capacity, there is strong motivation to place infrequently accessed data on the least expensive media: typically rotational disk drives, though 3D NAND is increasingly an option.
With media as diverse as DRAM, 3D XPoint, NAND, and rotational disk, placing frequently accessed, so-called “hot” data on higher-performing media while moving less frequently accessed, “warm” data to less expensive media is increasingly important. How is data classified in terms of its “access frequency”? How are data placement actions carried out based on that information? We’ll look more closely at the “how” in a subsequent post. The focus of this discussion is data classification, specifically within the context of distributed block and file storage systems deployed in a data center.
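One simple way to classify data by access frequency is to count accesses per object over a sliding window of recent requests. The Python sketch below assumes a hypothetical `AccessFrequencyClassifier` whose `window` and `hot_threshold` parameters are illustrative choices, not values taken from any of the systems discussed here. Production systems typically use decayed counters or sampled statistics rather than an exact window.

```python
from collections import Counter, deque
from typing import Deque, Hashable


class AccessFrequencyClassifier:
    """Label objects "hot" or "warm" by how many times they were accessed
    within the most recent `window` requests (hypothetical sketch)."""

    def __init__(self, window: int = 1000, hot_threshold: int = 3) -> None:
        self.window = window
        self.hot_threshold = hot_threshold
        self._recent: Deque[Hashable] = deque()  # access order, oldest first
        self._counts: Counter = Counter()        # per-object counts within the window

    def record(self, obj_id: Hashable) -> None:
        """Record one access; once the window is full, forget the oldest access."""
        self._recent.append(obj_id)
        self._counts[obj_id] += 1
        if len(self._recent) > self.window:
            oldest = self._recent.popleft()
            self._counts[oldest] -= 1
            if self._counts[oldest] == 0:
                del self._counts[oldest]

    def classify(self, obj_id: Hashable) -> str:
        """Return "hot" if the object met the threshold inside the window, else "warm"."""
        return "hot" if self._counts[obj_id] >= self.hot_threshold else "warm"
```

With `window=5` and `hot_threshold=2`, the access sequence `a, b, a, c, d` classifies `a` as hot and `b` as warm. One more access to any other object slides the first `a` out of the window, and `a` drops back to warm.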
Consider Google applications such as F1, a distributed relational database system built to support Google’s AdWords business, and Megastore, a storage system developed to support online services such as Google App Engine and Compute Engine [2,3]. These applications are built on top of Spanner and Bigtable, respectively. In turn, Spanner and Bigtable store their data (B-tree-like files and write-ahead logs) in Colossus and the Google File System, respectively. In the case of the Colossus File System (CFS), Google has written about using “Janus” to partition CFS’s flash storage tier among the workloads that benefit most from the higher-performing media. That work characterizes workloads and differentiates them by a “cacheability” metric that measures cache hit rates. More recently, companies such as CloudPhysics and Coho Data have published papers along similar lines, focusing on characterizations that efficiently produce a cache “miss ratio curve” (MRC) [6-8]. As with Google, the goal is to keep “hot” data on higher-performing media while moving less frequently accessed data to lower-cost media.
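For readers unfamiliar with miss ratio curves, the sketch below computes an exact MRC for an LRU cache using the classic Mattson stack-distance algorithm. It is not SHARDS or Counter Stacks. Those techniques exist precisely because this exact approach costs O(N·M) time and memory on large traces, and they approximate the same curve far more cheaply. The function name `lru_miss_ratio_curve` is hypothetical, and Janus’s “cacheability” metric is not reproduced here.

```python
from typing import Dict, Hashable, List, Sequence


def lru_miss_ratio_curve(trace: Sequence[Hashable], max_size: int) -> List[float]:
    """Exact LRU miss ratio curve via Mattson's stack algorithm.

    Returns mrc, where mrc[c] is the miss ratio of an LRU cache holding
    c objects, for c = 0 .. max_size.
    """
    stack: List[Hashable] = []  # LRU stack, most recently used at the end
    hist: Dict[int, int] = {}   # hist[d] = number of accesses with stack distance d
    for key in trace:
        if key in stack:
            # Stack distance: distinct keys touched since the last access, inclusive.
            depth = len(stack) - stack.index(key)
            hist[depth] = hist.get(depth, 0) + 1
            stack.remove(key)
        # First-time accesses are compulsory misses at every cache size.
        stack.append(key)

    total = len(trace)
    mrc: List[float] = []
    hits = 0
    for c in range(max_size + 1):
        hits += hist.get(c, 0)  # an LRU cache of size c hits every access with distance <= c
        mrc.append(1.0 - hits / total if total else 0.0)
    return mrc
```

For the trace `a, b, a, c, b, a`, the curve is 1.0 at sizes 0 and 1, 5/6 at size 2, and 0.5 at size 3. One pass over the trace yields the miss ratio for every cache size at once, which is what makes the MRC useful for deciding how much fast media a workload deserves.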
What feature of the data-center-wide storage architecture enables such characterization? In both Google’s and Coho’s approaches, the distinction between servicing incoming end-user I/O requests for storage services and accessing backend storage is fundamental. In Google’s case, applications such as F1 and Megastore sit indirectly on top of the distributed storage platform, and the Janus abstraction is transparently interposed between those applications and the file system. Similarly, Coho presents a storage service, such as NFS or HDFS mount points, to end-user applications via a network virtualization layer [9,10]. This approach allows a processing pipeline to be inserted between incoming end-user I/O requests and Coho’s distributed storage backend.
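The interposition idea can be sketched as a wrapper that exposes the same interface as the backend but passes every request through a chain of observer hooks first. This is how characterization can happen without modifying either the application or the storage backend. All names below (`Backend`, `InMemoryBackend`, `InterposedStore`) are hypothetical and do not describe Janus’s or Coho’s actual interfaces.

```python
from typing import Callable, Dict, List, Protocol


class Backend(Protocol):
    """The interface both the real backend and the interposed layer expose."""

    def read(self, key: str) -> bytes: ...
    def write(self, key: str, data: bytes) -> None: ...


class InMemoryBackend:
    """Hypothetical stand-in for a distributed storage backend."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        return self._data[key]

    def write(self, key: str, data: bytes) -> None:
        self._data[key] = data


class InterposedStore:
    """Same read/write interface as the backend; every request first flows
    through the observer pipeline (e.g. access counters, MRC builders)."""

    def __init__(self, backend: Backend, observers: List[Callable[[str, str], None]]) -> None:
        self._backend = backend
        self._observers = observers

    def _notify(self, op: str, key: str) -> None:
        for observe in self._observers:
            observe(op, key)

    def read(self, key: str) -> bytes:
        self._notify("read", key)
        return self._backend.read(key)

    def write(self, key: str, data: bytes) -> None:
        self._notify("write", key)
        self._backend.write(key, data)
```

Because `InterposedStore` satisfies the same `Backend` protocol it wraps, an application holding a reference to it cannot tell that a pipeline has been inserted.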
One can imagine incoming I/O operation requests (create, delete, open, close, read, write, snapshot, record, and append) encoded in a well-specified form. Classifying incoming operations by workload, operation type, and so on then becomes analogous to request logging in an HTTP/web farm. And, as in such web farms, read/access requests are readily directed toward caching facilities, while write/mutation operations can be appended to a log.
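A minimal sketch of that routing pattern follows, assuming a hypothetical `IORequest` schema that carries only read, write, and delete operations. Each request is counted by (workload, operation) much like a web access log. Reads are served from a cache, falling back to the backend on a miss. Mutations are appended to a log and reflected in the cache so later reads see them, and a `flush` step replays the log against the backend. A plain dictionary stands in for the backend.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class OpType(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


@dataclass(frozen=True)
class IORequest:
    """An incoming operation in a well-specified, loggable form (hypothetical schema)."""
    op: OpType
    key: str
    workload: str
    payload: Optional[bytes] = None


class RequestRouter:
    """Classify requests like a web-farm access log, serve reads from a cache,
    and append mutations to a log for later application to the backend."""

    def __init__(self, backend: Dict[str, bytes]) -> None:
        self.backend = backend                       # stand-in for the storage backend
        self.cache: Dict[str, Optional[bytes]] = {}  # None marks a deleted/absent key
        self.log: List[IORequest] = []               # pending mutations, arrival order
        self.stats: Counter = Counter()              # request counts per (workload, op)

    def handle(self, req: IORequest) -> Optional[bytes]:
        self.stats[(req.workload, req.op)] += 1
        if req.op is OpType.READ:
            if req.key not in self.cache:  # cache miss: fall back to the backend
                self.cache[req.key] = self.backend.get(req.key)
            return self.cache[req.key]
        # Mutation: append to the log and update the cache so reads stay consistent.
        self.log.append(req)
        self.cache[req.key] = req.payload if req.op is OpType.WRITE else None
        return None

    def flush(self) -> None:
        """Replay logged mutations against the backend, then truncate the log."""
        for req in self.log:
            if req.op is OpType.WRITE:
                self.backend[req.key] = req.payload
            else:
                self.backend.pop(req.key, None)
        self.log.clear()
```

Keeping mutations in an append-only log until `flush` lets the backend absorb them in batches. The `stats` counter is the raw material for the workload classification discussed above.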
In other words, from the end user’s perspective storage is just another data-center-resident distributed service, one of many running over shared infrastructure.
And what about the requisite data management features of the storage platform? While the end user interacts with a storage service, the backend storage platform is no longer burdened with this type of processing. It is now free to focus on maximizing storage efficiency and providing stewardship over the life of an organization’s ever-growing stream of data.
[1] Zhou et al., “Big Data Analytics on Object Storage - Hadoop Over Ceph Object Storage with SSD Cache,” 2015.
[2] Shute et al., “F1: A Distributed SQL Database That Scales,” 2013. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf
[3] Baker et al., “Megastore: Providing Scalable, Highly Available Storage for Interactive Services,” 2011. https://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
[4] Corbett et al., “Spanner: Google’s Globally-Distributed Database,” 2012 (see Section 2.1, “Spanserver Software Stack,” for a discussion of how Spanner uses Colossus). https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
[5] Albrecht et al., “Janus: Optimal Flash Provisioning for Cloud Storage Workloads,” 2013.
[6] Waldspurger et al., “Efficient MRC Construction with SHARDS,” 2015. https://cdn1.virtualirfan.com/wp-content/uploads/2013/12/shards-cloudphysics-fast15.pdf
[7] Warfield, “Some experiences building a medium sized storage system,” 2015 (see specifically slide 21, “Efficient workload characterization”).
[8] Wires et al., “Characterizing Storage Workloads with Counter Stacks,” 2014.
[9] Cully et al., “Strata: High-Performance Scalable Storage on Virtualized Non-volatile Memory,” 2014.
[10] Warfield et al., “Converging Enterprise Storage and Business Intelligence: A Reference Architecture for Data-Centric Infrastructure,” 2015.
[11] Chen et al., “Client-aware Cloud Storage,” 2014. https://www.csc.lsu.edu/~fchen/publications/papers/msst14-cacs.pdf