With analysts reporting that up to one billion devices will be HEVC-enabled in 2017, the industry will quickly adopt high efficiency video coding (HEVC) as the preferred next-generation codec. To meet the market demand for higher-density HEVC encoding solutions, Beamr has launched a highly scalable HEVC software encoding solution for the Intel® Xeon® Platinum 8180 processor.
Clearly, encoding speed plays a critical role in channel density. For this reason, Intel and Beamr optimized the Beamr 5 HEVC software encoder in order to utilize the advanced features of the Intel Xeon Platinum processor, including more cores, memory and I/O bandwidth as well as support for faster memory speed. We are excited to demonstrate Beamr 5 on the new Intel® Xeon® Scalable platform in Intel’s booth #B65 in Hall 5 at the upcoming IBC show in Amsterdam. We will also be showcasing Beamr 5 solutions optimized for edge datacenter deployments and leveraging the on-chip graphics capability of the power efficient Intel® Xeon® processor E3-1585L v5 with integrated Intel® Iris® Pro graphics P580.
Today, cloud architectures can make a virtually unlimited number of CPUs available for video encoding, but the cost of such use escalates very quickly. And given the explosive growth of online video, it is increasingly important to prioritize density when evaluating software video encoders(1).
How Beamr 5 Achieves Its Speed
Beamr 5 is a highly optimized HEVC software encoder that has been in commercial use by one of the world’s largest Over the Top (OTT) streaming services for the past three years. Many technical innovations contribute to the performance of Beamr 5, but the two we wish to highlight are its patented motion estimation process and its multi-threading functionality.
For a video encoder, the motion estimation function is one of the most critical as it directly relates to the viewer’s perception of video quality. You may recall the trick of early HEVC encoding demonstrations where the demonstration video was slow moving and the camera was fixed. Such a configuration minimizes the motion estimation requirements of the encoder to provide a satisfactory result. But if you provide that same encoder with a fast-moving scene where most of the pixels are in motion, the quality degrades quickly.
Beamr 5 analyzes incoming frames and determines the complexity of the scene, calculates the rough motion vectors, and estimates the bit demand of the encoded frames. The estimates guide the second stage of the encoder and allow those activities to focus on the meaningful aspects of each frame to refine the previous estimates. By partitioning the encoding process, unproductive calculations can be avoided.
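The general idea of letting a rough first estimate narrow the work of a second stage can be illustrated with a toy block-matching search. This is a minimal sketch and not Beamr's patented method: the synthetic frame, the block size, the search radius, and the function names (`sad`, `search`, `two_stage_search`) are all assumptions made for illustration. The first pass samples the search window sparsely to obtain a rough motion vector, and the second pass examines only a small neighborhood around it.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(size)
        for x in range(size)
    )


def search(cur, ref, bx, by, size, center, radius, step=1):
    """Return (best_vector, best_cost, candidates_evaluated) for candidates
    within `radius` of `center`, sampled every `step` pixels."""
    height, width = len(ref), len(ref[0])
    cx, cy = center
    best, best_cost, evaluated = None, None, 0
    for dy in range(cy - radius, cy + radius + 1, step):
        for dx in range(cx - radius, cx + radius + 1, step):
            # Skip candidates whose block would fall outside the frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            evaluated += 1
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost, evaluated


def two_stage_search(cur, ref, bx, by, size, radius=8, coarse_step=4):
    """Stage 1: sparse search for a rough vector. Stage 2: dense search in a
    small window around that rough vector."""
    rough, rough_cost, n1 = search(cur, ref, bx, by, size, (0, 0), radius, coarse_step)
    refined, cost, n2 = search(cur, ref, bx, by, size, rough, coarse_step // 2)
    return refined, cost, rough_cost, n1 + n2


# Hypothetical test data: a smooth 32x32 reference frame and a current frame
# whose content is the reference displaced by (5, -3).
W = H = 32
ref = [[(x - 16) ** 2 + 2 * (y - 16) ** 2 for x in range(W)] for y in range(H)]
true_dx, true_dy = 5, -3
cur = [
    [ref[min(max(y + true_dy, 0), H - 1)][min(max(x + true_dx, 0), W - 1)]
     for x in range(W)]
    for y in range(H)
]

full_vec, full_cost, full_n = search(cur, ref, 12, 12, 8, (0, 0), 8)
fast_vec, fast_cost, rough_cost, fast_n = two_stage_search(cur, ref, 12, 12, 8)
print("exhaustive:", full_vec, full_cost, "candidates:", full_n)
print("two-stage: ", fast_vec, fast_cost, "candidates:", fast_n)
```

With these settings the exhaustive search evaluates 289 candidate vectors and the two-stage search evaluates 50. The trade-off is that a sparse first pass can settle on the wrong neighborhood when the content is not smooth, which is why a production encoder needs a far more careful first stage than this toy.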
For as long as Intel has manufactured multi-core server processors, software encoders have relied on parallelized, multi-threaded operation to increase their throughput. Parallelization by slices was introduced by the earlier H.264 standard. The design of the HEVC standard was even more heavily influenced by the need to parallelize encoding activities because it requires many more operations. Traditionally, slices and tiles are the two common means of breaking the larger encoding process into smaller tasks. However, the throughput gained this way typically comes at the expense of encoding quality, because quality depends on the size of the search space, which shrinks when a frame is split into independently encoded areas. Also, differing levels of luminosity between adjacent, independently encoded areas (and the visible “seams” between them) can lead to objectionable visual effects.
Beamr 5 builds upon a long heritage of real-time software encoders and relies on micro-level parallelization that chains portions of the encoding tasks in a controlled manner. By staging execution in Beamr 5, each micro task starts when its data is ready, still in cache, so as not to waste power and time writing out and fetching data from memory. Careful design of the micro tasks assures that they execute efficiently and evenly across the whole frame. This ensures that all cores are kept uniformly busy, and none are left waiting too long for their next task.
The Beamr encoder does not rely on the OS to manage the execution threads. Instead, it controls their execution based on the availability of pipelined data. These techniques allow the encoder to deliver outstanding quality while utilizing all the available cores efficiently. To achieve 6 channels on the dual-socket Intel Xeon Platinum 8180 processor, the Beamr 5 encoder loaded 108 of the 112 available threads at an impressive 90 percent or higher utilization.
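The rule that a micro task starts as soon as its input data is ready can be sketched as a small dependency-driven scheduler. This is a generic illustration, not Beamr's scheduler: the function `run_task_graph`, the task names, and the toy task bodies are hypothetical. It also uses Python's standard thread pool, which does rely on OS threads, so it shows only the ordering rule and says nothing about cache residency or real encoder performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_task_graph(tasks, deps, workers=4):
    """Run `tasks` (name -> callable taking a dict of dependency results) so
    that each task is submitted the moment its last dependency finishes.

    Sketch only: no error handling; the graph must be non-empty and acyclic.
    """
    remaining = {name: len(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, parents in deps.items():
        for parent in parents:
            dependents[parent].append(name)

    results, order = {}, []
    lock = threading.Lock()
    all_done = threading.Event()

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def execute(name):
            # All dependencies finished before this task was submitted.
            inputs = {p: results[p] for p in deps.get(name, ())}
            value = tasks[name](inputs)
            ready = []
            with lock:
                results[name] = value
                order.append(name)
                for child in dependents[name]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.append(child)
                if len(results) == len(tasks):
                    all_done.set()
            for child in ready:
                pool.submit(execute, child)

        # Collect the roots first so no task can be submitted twice.
        roots = [name for name, count in remaining.items() if count == 0]
        for name in roots:
            pool.submit(execute, name)
        all_done.wait()

    return results, order


# Hypothetical micro tasks for one frame: analysis feeds two motion tasks,
# which both feed the final entropy-coding task.
tasks = {
    "analyze": lambda inputs: {"complexity": 3},
    "motion_top": lambda inputs: inputs["analyze"]["complexity"] + 1,
    "motion_bottom": lambda inputs: inputs["analyze"]["complexity"] + 2,
    "entropy": lambda inputs: inputs["motion_top"] + inputs["motion_bottom"],
}
deps = {
    "motion_top": ["analyze"],
    "motion_bottom": ["analyze"],
    "entropy": ["motion_top", "motion_bottom"],
}
results, order = run_task_graph(tasks, deps)
print(order, results["entropy"])
```

No worker waits on a barrier between stages. The two motion tasks become runnable as soon as the analysis result exists, and the final task becomes runnable as soon as the second of them completes.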
With appropriate settings for the Intel Xeon Platinum 8180 processor, Beamr 5 achieved the following reproducible performance results:
[Note that all results are for video encoding only from raw YUV video that resides in memory. For full transcoding from HEVC we estimate a 30 percent additional overhead.]
- Three (3) HDR 10-bit 4Kp60 channels on a single socket Intel Platinum 8180 with broadcast quality.
- Six (6) HDR 10-bit 4Kp60 channels on a dual-socket Intel Platinum 8180 with broadcast quality.
- One (1) HDR 4Kp60 channel on a dual-socket Intel Platinum 8180 with studio quality.
- Eleven (11) individual encoding profiles created by a single channel on a dual-socket Intel Platinum 8180 processor with broadcast quality:
  - 3840×2160 60 FPS @ 25 Mbps
  - 3840×2160 60 FPS @ 15 Mbps
  - 1920×1080 60 FPS @ 8 Mbps
  - 1280×720 60 FPS @ 4 Mbps
  - 1920×1080 30 FPS @ 4 Mbps
  - 1280×720 30 FPS @ 2 Mbps
  - 960×540 30 FPS @ 1.5 Mbps
  - 768×432 30 FPS @ 1.1 Mbps
  - 640×360 30 FPS @ 730 kbps
  - 480×270 30 FPS @ 365 kbps
  - 416×234 30 FPS @ 145 kbps
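To get a feel for the eleven-profile ladder listed above, it can be written down as data and summarized. The sketch below is only arithmetic on the published resolutions, frame rates, and bitrates; the `Profile` class and the bits-per-pixel metric are illustrative choices rather than anything Beamr 5 exposes, and it assumes 1 Mbps = 1000 kbps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    width: int
    height: int
    fps: int
    kbps: int

    @property
    def bits_per_pixel(self):
        """Average encoded bits spent per pixel per frame."""
        return self.kbps * 1000 / (self.width * self.height * self.fps)


# The eleven profiles from the list above (bitrates converted to kbps).
LADDER = [
    Profile(3840, 2160, 60, 25000),
    Profile(3840, 2160, 60, 15000),
    Profile(1920, 1080, 60, 8000),
    Profile(1280, 720, 60, 4000),
    Profile(1920, 1080, 30, 4000),
    Profile(1280, 720, 30, 2000),
    Profile(960, 540, 30, 1500),
    Profile(768, 432, 30, 1100),
    Profile(640, 360, 30, 730),
    Profile(480, 270, 30, 365),
    Profile(416, 234, 30, 145),
]

total_kbps = sum(p.kbps for p in LADDER)
for p in LADDER:
    print(f"{p.width}x{p.height}@{p.fps}: {p.kbps} kbps, "
          f"{p.bits_per_pixel:.3f} bits/pixel")
print(f"total: {total_kbps / 1000:.2f} Mbps across {len(LADDER)} profiles")
```

The combined output of the ladder is 61.84 Mbps, and the per-profile bits-per-pixel figures give a rough way to compare how aggressively each rung is compressed.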
Why Speed Matters
We encourage you to think about the impact of reaching more users with lower operational cost and better device compatibility. Faster video encoding with Beamr 5 on Intel® Xeon® Scalable processors not only reduces the number of servers required, it also significantly reduces the power consumption per channel, and increases your flexibility to respond to customer demand for HDR, OTT and linear TV delivery, lowering your operating costs.
You will no longer be limited to garden-variety encoders intended for all devices. The performance increases of Beamr 5 on Intel Xeon Scalable platforms enable you to create higher-quality profiles that are closely matched to your viewers’ devices and delivery experience. This improvement in UX and quality translates into higher engagement, longer viewing times, and an overall increase in viewer satisfaction – all elements that directly contribute to growing your video service subscribers and revenue.
For more information on Beamr and Beamr 5, please visit our website at: http://beamr.com
(1) Data reflects performance measurements by Beamr in June 2017. Configuration: 2-socket Intel® Xeon® processor Platinum 8180, 2.5GHz, 28 cores, turbo off / HT on, BIOS “Intel Corporation SE5C620.86B.01.00.0412.020920172159”, SMBIOS v2.8, 192 GB total memory, 12 slots / 16 GB / 2666 MT/s / DDR4 DIMM, 800GB INTEL SSDSC2BA800G4, Ubuntu 16.04.2 LTS kernel 4.4.0-78-generic