StorPool released a performance test report presenting 13.8 million IOPS of NVMe-class storage performance, measured inside VMs.
In this performance test, StorPool measured 13.8M IOPS on a 12-node cluster built on Intel® Xeon® Scalable processor-based servers configured with Intel® DC SSDs. This is impressive storage performance for a hyper-converged infrastructure (HCI).
The test was conducted on a 12-server KVM hyper-converged system set up at the Builders Construction Zone facility under the Intel® Data Center Builders program. The tests showed this 12-node hyper-converged environment delivering 13.8M random read IOPS at a latency of 404 microseconds. Under low load, read and write latencies are 137 and 95 microseconds, respectively. These are exceptional results.
Test results

IOPS and MB/s numbers in this table were rounded down to 3 significant digits for clarity of presentation. The backend IOPS and backend GB/s calculation estimates the load on the storage servers and the underlying NVMe drives. Write operations are counted 3 times because of the 3x replication; read operations are counted as 1 backend operation each. For example, 20,800 MB/s of writes measured by FIO translate to 62,400 MB/s of “backend” writes on the underlying drives.
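As a hedged illustration of that accounting (the function names below are ours; the figures plugged in are the ones quoted above), the backend load can be estimated like this:

```python
# Sketch of the backend-load estimate described above: with 3x replication,
# every write measured by FIO lands on three drives, while a read is served
# by a single replica. Names are illustrative, not taken from the report.

REPLICATION_FACTOR = 3

def backend_writes(frontend_writes):
    """Writes reported by FIO, counted once per replica."""
    return frontend_writes * REPLICATION_FACTOR

def backend_reads(frontend_reads):
    """Reads are served from one replica, so they count once."""
    return frontend_reads

print(backend_writes(20_800))  # 20,800 MB/s of FIO writes -> 62,400 MB/s on the drives
```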
In this 12-node StorPool-powered hyper-converged set-up, StorPool ran 96 VMs, each executing a single FIO job. From just 48 Intel SSD DC P4510 drives, StorPool delivers 143,000 IOPS to each of the 96 VMs simultaneously.
In the sequential tests, 64.6 GB/s is equal to 86% of the theoretical network throughput (75 GB/s).
In the random read/write tests, 13.8M IOPS × 4 KiB (about 56.5 GB/s) is equal to 75% of the theoretical network throughput. The relatively small number of NVMe drives used (48 Intel SSD DC P4510 in total) is the limiting factor in this benchmark.
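For reference, a quick back-of-the-envelope check of those utilisation figures (assuming the theoretical 75 GB/s comes from 12 nodes × 2 × 25 GbE, as the hardware description below suggests):

```python
# Rough sanity check of the network-utilisation figures quoted above.
nodes, ports_per_node, port_speed_gbps = 12, 2, 25
theoretical_gbs = nodes * ports_per_node * port_speed_gbps / 8   # 600 Gbit/s = 75 GB/s

sequential_gbs = 64.6                          # measured sequential throughput
random_gbs = 13.8e6 * 4 * 1024 / 1e9           # 13.8M IOPS x 4 KiB ~= 56.5 GB/s

print(f"sequential: {sequential_gbs / theoretical_gbs:.0%}")   # ~86%
print(f"random:     {random_gbs / theoretical_gbs:.0%}")       # ~75%
```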
There is a performance test by Microsoft on a similar 12-node hardware set-up, titled “The new HCI industry record: 13.7 million IOPS with Windows Server 2019 and Intel® Optane™ DC persistent memory”. In the Microsoft S2D/Hyper-V/Optane test, because of the very small active set (3.0 TB) and caching, almost all storage operations are served from RAM or Intel Optane memory, not from the underlying NVMe storage drives. This is a different approach from the one StorPool took, as Chris M. Evans explains. The StorPool system is not only faster, with IO being processed by the actual NVMe drives, but it also uses less storage hardware.
The 95-microsecond write latency includes three network transfers and three NVMe drives committing the write. It is ten times faster than the “sub-millisecond” latency advertised by traditional all-flash arrays. For the first time, one can get a shared storage system that is as fast as locally attached drives (DAS).
Description of the environment
The setup consists of 12 servers with identical hardware configuration, provided by the Builders Construction Zone facility under the Intel Data Center Builders program. Each server is connected to an Ethernet switch via dual 25G Ethernet. In this setup there is a single switch for simplicity; production-grade environments typically use two switches for redundancy.
Each of the 12 servers has the following hardware configuration:
- 56 cores / 112 threads (Two 2nd Gen Intel Xeon Scalable processors)
- 384 GB RAM (Twelve x 32 GB RDIMMs)
- 32 TB NVMe (Four Intel SSD DC P4510 8TB NVMe drives)
- Intel XXV710 dual-port 25G Ethernet Adapters
The software installed on each server is:
- Operating system – CentOS 7.6, Linux kernel 3.10.0-957
- Hypervisor – KVM, libvirt and qemu-kvm-ev from CentOS Virt-SIG
- Storage software (SDS) – StorPool v19
- Storage benchmarking software – FIO
The following diagram illustrates how the hardware and software components interoperate. For simplicity, it shows just one of the 12 servers.

Each host runs 8 VMs; each VM has 4 vCPUs, 1 GiB RAM and a 500 GiB virtual disk on StorPool. The VMs are used only for running the benchmark tools.
In this test scenario StorPool uses approximately 11% of the total CPU resources. The remaining 89% of the CPU cores are free for running VMs.
To process more than 1M IOPS per node, StorPool uses 6 CPU cores (StorPool server and StorPool client combined).
Memory usage:
- 25 GiB per node (6.5%) for StorPool (client and server combined), and
- 359 GiB per node (93.5%) remain available for the host operating system and VMs
The memory usage for StorPool is small – less than 1 GB of RAM per 1 TB of raw storage – and it is mostly used for metadata.
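These per-node overhead figures can be cross-checked with a few lines of arithmetic (a rough sketch using the hardware numbers from the previous section; the exact accounting is StorPool's):

```python
# Back-of-the-envelope check of the CPU and memory overhead figures above.
cores_per_node = 56                    # two 2nd Gen Xeon Scalable CPUs
storpool_cores = 6                     # StorPool server + client
print(f"CPU used by StorPool: {storpool_cores / cores_per_node:.0%}")      # ~11%

ram_per_node_gib = 384
storpool_ram_gib = 25
print(f"RAM used by StorPool: {storpool_ram_gib / ram_per_node_gib:.1%}")  # ~6.5%

raw_storage_tb = 4 * 8                 # four 8 TB NVMe drives per node
print(f"RAM per TB of raw storage: {storpool_ram_gib / raw_storage_tb:.2f} GiB")  # < 1
```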
Testing methodology
We use the FIO client-server functionality to perform the benchmark. A FIO server is installed in each of the 96 VMs. A FIO client is configured to connect to the 96 FIO servers. The FIO servers perform IO operations using the test profile specified by the FIO client. The FIO client collects and summarizes the benchmark results.
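As an illustration only (the host names, job file and driver script below are hypothetical; the actual test profile is in the full report), the client/server topology described above could be driven from a single control node roughly like this:

```python
# Hypothetical driver for the FIO client/server setup described above.
# Prerequisite: "fio --server" is already running inside each of the 96 VMs.
import subprocess

VM_HOSTS = [f"vm{i:02d}" for i in range(1, 97)]   # placeholder names for the 96 VMs
JOB_FILE = "randread-4k.fio"                      # placeholder FIO job definition

cmd = ["fio"]
for host in VM_HOSTS:
    cmd += [f"--client={host}", JOB_FILE]         # one FIO job per VM, as in the test
cmd += ["--output-format=json", "--output=results.json"]  # aggregated results

subprocess.run(cmd, check=True)
```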

Quality controls for test results
- No data locality: Data for each VM was specifically placed on non-local drives to avoid any positive skew caused by the data being locally stored.
- No caching: There is no data caching anywhere in the stack. Each read or write operation is processed by the NVMe drives. The number of operations processed by the underlying drives was compared to the number of operations counted by the benchmark tool; the numbers were identical (see the sketch after this list).
- Active set size: The active set was sufficiently large to exclude effects from caching, even if it were used. The 48,000 GiB active set is equal to 44% of the system capacity.
- No write buffering: All write operations were submitted by the test tools with a “sync” flag, instructing the storage system to persist them to power-loss protected devices (NVMe drives) before completing the write operations.
- Repeatability and steady state: Tests were run for sufficient duration, such that running them for longer does not influence the result.
- End-to-end measurement: IOPS, MB/s and latency are measured by the test tool (FIO) inside the virtual machines, so the presented results are end-to-end (including all overheads across the storage and virtualization stacks).
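The “no caching” check in the list above – comparing drive-level counters with what the benchmark tool reports – can be sketched roughly as follows (device names are examples; the report does not publish the exact script):

```python
# Rough sketch of the "no caching" verification: snapshot the kernel's
# per-drive I/O counters on a storage node before and after the run and
# compare the deltas, summed over all nodes, with the FIO-reported counts.
from pathlib import Path

NVME_DEVICES = ["nvme0n1", "nvme1n1", "nvme2n1", "nvme3n1"]   # 4 drives per node

def drive_ops(device):
    """Completed (reads, writes) from /sys/block/<device>/stat."""
    fields = Path(f"/sys/block/{device}/stat").read_text().split()
    return int(fields[0]), int(fields[4])     # reads completed, writes completed

before = {dev: drive_ops(dev) for dev in NVME_DEVICES}
# ... run the benchmark ...
after = {dev: drive_ops(dev) for dev in NVME_DEVICES}

node_reads = sum(after[d][0] - before[d][0] for d in NVME_DEVICES)
node_writes = sum(after[d][1] - before[d][1] for d in NVME_DEVICES)
# With no caching, the cluster-wide read total should equal FIO's read count,
# and the write total should be 3x FIO's write count (triple replication).
print(node_reads, node_writes)
```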
The full report, with more tables and details on the test methodology, can be found here: “The IOPS challenge is over. StorPool holds the new world record – 13.8 mln IOPS”.
Why these numbers matter
Storage has been one of the main limiting factors in modern IT. There has always been a trade-off between speed and high availability. Here we prove that with a best-of-breed software-defined storage solution and standard hardware, any company can eliminate this trade-off altogether. A highly available shared storage system with the performance of local Intel SSD DC drives is now possible.
By using this technology any public and private cloud builder can deliver unmatched performance for their applications, VMs and containers. If you aim to build a powerful public or private cloud, this solution can permanently solve all your storage performance issues.
If you want to learn more or would like a personal demo, contact us at info@storpool.com.