Streaming Summer Bootcamp Session 2: Producing H.264
https://streaminglearningcenter.com/codecs/streaming-summer-bootcamp-session-2-producing-h-264.html
Mon, 01 Jul 2024

Announcing session 2 of the Streaming Summer Bootcamp: Learn to Produce H.264 Video.
Thu, Jul 11, 2024, 11:00 AM – 12:30 PM.

The session will be produced on LinkedIn Live: click here to register.

The free webinar will cover:

• What H.264 is and its origins
• What x264 is and why it’s important
• How x264 compares to other H.264 codecs
• Choosing a preset for x264: VOD edition
• Preset customization options (B-frames/reference frames)
• Choosing the optimal CPU on AWS for producing x264 with FFmpeg (AMD, Graviton, Intel)

This is the second in a series of free webinars this summer covering streaming fundamentals; encoding with H.264, HEVC, AV1, VVC, and LCEVC; and encoding and transcoding for live streaming. The classes should benefit newbies and experienced compressionists alike.

You can access the first session on Streaming Fundamentals here: watch

Click here to download the handout: Streaming Summer Bootcamp Session 1: Introduction to Video Compression and Video Quality Metrics

If you’re new to streaming video, you should check these out before attending this session on Producing H.264.

By way of background, I’m updating my book, Video Encoding by the Numbers. When I wrote the first edition, I backed all decisions with quality metrics like PSNR or VQM, which are “the numbers” referred to in the title. In the upcoming update, I will re-test H.264, HEVC, VP9, AV1, LCEVC, and VVC and compute quality with VMAF.

While the testing required for each updated chapter will generate multiple pages of spreadsheets and analysis, only a few tables from that data will be included in the book. During the webinars, I’ll share and explain all that data, make it available for download, and provide other information relevant to the analysis.

Who Will Benefit?

  • Novices: Will learn how to encode using various codecs and gain a solid foundation in video compression and practical encoding techniques.
  • Experienced Compressionists: Will receive detailed data that will only be summarized in the book. This will allow you to confirm or reassess your current or planned encoding strings and ensure you use the most efficient and effective methods.

I’m calling the series the Streaming Summer Bootcamp. All sessions will be free and available on LinkedIn.

Here’s the preliminary schedule for the other five sessions.

Session 3: Encoding with HEVC
Thursday, July 15, 2024, 11:00 AM EST

  • What HEVC is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What x265 is and why it’s important
    • How x265 compares to other HEVC codecs
  • Choosing a preset for x265
    • VOD
    • Live
  • HEVC profiles and levels
  • About HDR
  • Additional details to be provided later

Session 4: Encoding with AV1
Thursday, August 15, 2024, 11:00 AM EST

  • What AV1 is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What SVT-AV1 is and why it’s important
  • SVT-AV1 vs LibAOM
  • How SVT-AV1 compares to other AV1 codecs
  • Choosing a preset for SVT-AV1
    • VOD
    • Live
  • AV1 and HDR
  • Additional details to be provided later

Session 5: Encoding with VVC
Thursday, August 29, 2024, 11:00 AM EST

  • What VVC is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What VVenC is and why it’s important
  • How VVenC compares to other VVC codecs
  • Additional details to be provided later

Session 6: Encoding with LCEVC
Thursday, September 5, 2024, 11:00 AM EST

  • What LCEVC is and its origins
  • How LCEVC works
  • The royalty picture
  • The playback compatibility picture
  • Encoding LCEVC with V-Nova tools
  • Encoding LCEVC with FFmpeg
  • Additional details to be provided later

Session 7: All about Live
Thursday, September 26, 2024, 11:00 AM EST

  • Live encoding overview
  • Live encoding for origination
    • hardware and software options
  • Live transcoding for distribution
    • Hardware and software options (CPU/GPU/FPGA/ASIC)
    • Choosing the optimal preset
  • Working with low-latency

Again, all dates are subject to change.

Simplify Your Workflow: Command-Line Variables in FFmpeg Batch Files
https://streaminglearningcenter.com/encoding/simplify-your-workflow-command-line-variables-in-ffmpeg-batch-files.html
Fri, 28 Jun 2024

Creating batch files with variables is one of the more efficient ways to run FFmpeg. However, most producers build their batch files with the values hard-coded directly into the file, which means that you have to customize each batch file for different source files or encoding parameters. This is particularly inefficient when you’re encoding multiple files with different encoding parameters; in essence, you have to create a separate batch file for each file.

However, as you’ll learn in this blog post, you can build your batch files to accept variables supplied by the external command line. This allows a single batch file to customize encodes for any number of files by creating separate command lines in a master batch file.

You can see this in the three lines below, which each execute the script contained in encode.bat but encode a different source file with different encoding parameters to a different codec.

encode.bat animals.mp4 libx264 60 6000k animals_264.mp4
encode.bat Elektra.mp4 libx265 50 3600k Elektra_265.mp4
encode.bat freedom.mp4 libvpx-vp9 60 4800k freedom_VP9.webm

By passing variables through the command line, you can create flexible and reusable scripts that streamline your workflow. This post shows you how and provides a practical example you can start using today.

Why Use Command-Line Variables in Batch Files?

As background, I’m updating my book, Video Encoding by the Numbers, which uses the results from dozens of encoding comparisons to recommend different encoding configurations. When I wrote the book back in 2016, I was just learning FFmpeg and didn’t use scripts with variables. When I tested x264 presets, for example, I had to create a batch file with 80 lines: eight clips times ten presets, from ultrafast to placebo.

I encoded each file with unique encoding parameters, so I often had to change the file name, GOP size, target bitrate, maxrate, and VBV buffer. While I could and did copy and paste from file to file, this was time-consuming and error-prone.

Today, I create one batch file and customize the encoding parameters for each file in the command string. So, it’s one batch file plus a master batch file with eight command lines, composed solely of the parameters that I need to adjust. Faster, and much less error-prone.

In this fashion, using command-line variables in batch files allows you to customize the behavior of your script each time you run it. This technique is particularly useful for:

  • Bulk Processing: When you need to encode multiple files with the same or slightly different settings.
  • Consistency: Ensuring the same encoding settings are applied across different files, reducing the chance of human error.
  • Efficiency: Automating repetitive tasks so you can focus on more creative aspects of video production.

Example: Encoding with Different Parameters

Let’s walk through a practical example. Suppose you need to encode multiple files using different codecs, GOP sizes, bitrates, and output file names. Let’s start with the batch file.

Create the Batch File

Save the following script as encode.bat. This batch file accepts variables for the input file, codec, GOP size, target bitrate, and output file name, and it encodes the video using FFmpeg with the specified parameters.

:: encode.bat -- encodes one file with the parameters supplied on the command line
:: %1 = input file, %2 = codec, %3 = GOP size, %4 = target bitrate, %5 = output file
set input_file=%1
set codec=%2
set gop_size=%3
set target_bitrate=%4
set output_file=%5
ffmpeg -y -i "%input_file%" -c:v %codec% -b:v %target_bitrate% -g %gop_size% "%output_file%"

How It Works

In our batch file, we use variables in the FFmpeg command string as normal. But rather than hard-coding the values in the batch file itself, we assign them from placeholders like %1 and %2, which Windows fills with the arguments supplied on the command line. Here’s how this works:

When you run the batch file with this command

encode.bat animals.mp4 libx264 60 6000k animals_264.mp4

  • set input_file=%1 assigns the value of the first argument in the command string to the variable input_file. So, %1 is replaced by animals.mp4.
  • set codec=%2 assigns the value of the second argument to the variable codec. So, %2 is replaced by libx264.
  • set gop_size=%3 assigns the value of the third argument to the variable gop_size. So, %3 is replaced by 60.
  • set target_bitrate=%4 assigns the value of the fourth argument to the variable target_bitrate. Here, %4 is replaced by 6000k.
  • set output_file=%5 assigns the value of the fifth argument to the variable output_file. In this case, %5 is replaced by animals_264.mp4.

By using these placeholders, you can reuse the same script to encode different files with different settings. So, you don’t have to create multiple batch files for different encoding tasks, streamlining your workflow and ensuring consistency.

Running the Batch File

To use the batch file, open a command prompt and run the script with the appropriate parameters. Here’s our example from above:

encode.bat animals.mp4 libx264 60 6000k animals_264.mp4

As noted, this command will encode animals.mp4 using the libx264 codec with a GOP size of 60 and a target bitrate of 6000k. The output file will be saved as animals_264.mp4.

Batch Processing Multiple Files

To encode multiple files with different settings, you can create another batch file that calls encode.bat with the required parameters for each file. You have to use the call command because you’re running a separate batch file from within a batch file; without call, control never returns to the master batch file after the first encode.bat finishes.

call encode.bat animals.mp4 libx264 60 6000k animals_264.mp4
call encode.bat Elektra.mp4 libx265 50 3600k Elektra_265.mp4
call encode.bat freedom.mp4 libvpx-vp9 60 4800k freedom_VP9.webm

I’ve saved this in a file named call.bat. This script will run each encoding task consecutively, ensuring that each file is processed with its specific parameters.

This is obviously a simple example, but as far as I know, any batch file you create with variables can be run via the command line in this fashion, as sketched below. The batch file that inspired this article performs the preset testing described above using 2-pass encoding: I input one command line per file, and the script generates ten output files, from ultrafast to placebo.
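For reference, here’s a minimal sketch of how such a preset-sweep script might work. The file name preset_sweep.bat, its arguments, and the details of the FFmpeg string are my illustrative assumptions, not the exact script described above:

:: preset_sweep.bat -- illustrative sketch: 2-pass x264 encodes across all ten presets
:: Usage: preset_sweep.bat input_file gop_size target_bitrate output_basename
set input_file=%1
set gop_size=%2
set target_bitrate=%3
set basename=%4
for %%p in (ultrafast superfast veryfast faster fast medium slow slower veryslow placebo) do (
  ffmpeg -y -i "%input_file%" -c:v libx264 -preset %%p -b:v %target_bitrate% -g %gop_size% -pass 1 -an -f mp4 NUL
  ffmpeg -y -i "%input_file%" -c:v libx264 -preset %%p -b:v %target_bitrate% -g %gop_size% -pass 2 -an "%basename%_%%p.mp4"
)

One command line per source file then yields ten output files, one per preset.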

Download the Script and Command Strings

To help you get started quickly, I’ve zipped encode.bat and call.bat into a file called encode_call.zip, which you can download here.

These are for Windows, but the technique should work on any OS supported by FFmpeg with minor changes to the two files. By integrating this batch file into your workflow, you can significantly streamline your video encoding process, ensuring consistency and saving valuable time.
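For example, a rough bash equivalent of encode.bat for Linux or macOS might look like this untested sketch; bash uses $1 through $5 where Windows batch uses %1 through %5:

#!/bin/bash
# encode.sh -- sketch of encode.bat for Linux/macOS
input_file=$1
codec=$2
gop_size=$3
target_bitrate=$4
output_file=$5
ffmpeg -y -i "$input_file" -c:v "$codec" -b:v "$target_bitrate" -g "$gop_size" "$output_file"

You’d invoke it as ./encode.sh animals.mp4 libx264 60 6000k animals_264.mp4 after making it executable.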

Streaming Learning Center to Host Streaming Summer Bootcamp Series
https://streaminglearningcenter.com/articles/streaming-summer-bootcamp.html
Thu, 06 Jun 2024

Jan Ozer here. I’m hosting a series of free webinars this summer covering streaming fundamentals; encoding with H.264, HEVC, AV1, VVC, and LCEVC; and encoding and transcoding for live streaming. The classes should benefit newbies and experienced compressionists alike.

By way of background, I’m updating my book, Video Encoding by the Numbers. When I wrote the first edition, I backed all decisions with quality metrics like PSNR or VQM, which are “the numbers” referred to in the title. In the upcoming update, I will re-test H.264, HEVC, VP9, AV1, LCEVC, and VVC and compute quality with VMAF.

While the testing required for each updated chapter will generate multiple pages of spreadsheets and analysis, only a few tables from that data will be included in the book. During the webinars, I’ll share and explain all that data, make it available for download, and provide other information relevant to the analysis.

Who Will Benefit?

  • Novices: Will learn how to encode using various codecs and gain a solid foundation in video compression and practical encoding techniques.
  • Experienced Compressionists: Will receive detailed data that will only be summarized in the book. This will allow you to confirm or reassess your current or planned encoding strings and ensure you use the most efficient and effective methods.

I’m calling the series the Streaming Summer Bootcamp. All sessions will be free and available on LinkedIn.

Here’s the schedule:

Session 1: Introduction to Encoding and Video Quality Metrics (Done and Available for On-Demand Viewing below)
Thursday, June 20, 2024, 11:00 AM EST

This session will teach newbies what they need to know to understand later sessions. This includes:

  • Compression and codecs
  • Encoding and packaging
  • Basic file parameters (Resolution/bitrate/color depth/GOP/Frame types)
  • What video quality metrics are, and how we use them

Click here to watch the video: watch

Click here to download the handout: Streaming Summer Bootcamp Session 1: Introduction to Video Compression and Video Quality Metrics

Session 2: Encoding with H.264 (now open for registration)
Thursday, July 11, 2024, 11:00 AM EST

  • What H.264 is and its origins
  • What x264 is and why it’s important
    • How x264 compares to other H.264 codecs
  • Choosing a preset for x264
    • VOD
    • Live
  • Preset adjustments to consider
    • B-frames
    • Reference frames
  • Encoding on AWS
    • Best CPU (AMD, Graviton, Intel)
    • Best encoding strategy for VOD

Session 3: Encoding with HEVC
Thursday, July 15, 2024, 11:00 AM EST

  • What HEVC is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What x265 is and why it’s important
    • How x265 compares to other HEVC codecs
  • Choosing a preset for x265
    • VOD
    • Live
  • HEVC profiles and levels
  • About HDR
  • Additional details to be provided later

Session 4: Encoding with AV1
Thursday, August 15, 2024, 11:00 AM EST

  • What AV1 is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What SVT-AV1 is and why it’s important
  • SVT-AV1 vs LibAOM
  • How SVT-AV1 compares to other AV1 codecs
  • Choosing a preset for SVT-AV1
    • VOD
    • Live
  • AV1 and HDR
  • Additional details to be provided later

Session 5: Encoding with VVC
Thursday, August 29, 2024, 11:00 AM EST

  • What VVC is and its origins
  • The royalty picture
  • The playback compatibility picture
  • What VVenC is and why it’s important
  • How VVenC compares to other VVC codecs
  • Additional details to be provided later

Session 6: Encoding with LCEVC
Thursday, September 5, 2024, 11:00 AM EST

  • What LCEVC is and its origins
  • How LCEVC works
  • The royalty picture
  • The playback compatibility picture
  • Encoding LCEVC with V-Nova tools
  • Encoding LCEVC with FFmpeg
  • Additional details to be provided later

Session 7: All about Live
Thursday, September 26, 2024, 11:00 AM EST

  • Live encoding overview
  • Live encoding for origination
    • hardware and software options
  • Live transcoding for distribution
    • Hardware and software options (CPU/GPU/FPGA/ASIC)
    • Choosing the optimal preset
  • Working with low-latency

Again, all dates are subject to change.

The Quality Cost of Low-Latency Transcoding
https://streaminglearningcenter.com/codecs/the-quality-cost-of-low-latency-transcoding.html
Tue, 12 Mar 2024

While low-latency transcoding sounds desirable, low-latency transcode settings can reduce quality and may not noticeably impact latency.

Reducing latency has been a major focus for many live producers, and appropriately so, particularly for events that viewers can watch via other media, like sporting events available through satellite or cable TV. However, it’s important to understand that transcoding latency contributes minimally to overall latency in ABR applications and that low-latency transcode settings reduce video quality. Unless you’re running ultra-low latency applications like gambling, auctions, or conferencing over technologies like WebRTC or HESP, you should strongly consider not using the lowest possible latency settings.

The image above shows the components of overall glass-to-glass latency for a live event delivered via adaptive bitrate technologies. By far, the largest component is the ABR packaging. WebRTC and similar technologies don’t use this form of packaging, which is how they deliver sub-1-second latency.

If you’re distributing live events via a low-latency ABR technology like LL HLS, LL DASH, or LL CMAF, you’re probably in the 5-8 second latency range. The highest transcoding-only latency I’ve seen is around 500 ms to 750 ms, and the lowest is around 50 ms. So, if you’re in the 5-8 second range, transcoding with ultra-low latency settings doesn’t reduce latency significantly but can cost you quality-wise, particularly with x264. I also measured with x265 and found the quality of zero-latency and normal-latency output roughly equivalent, though low throughput makes x265 transcoding very expensive.

The Quality Cost of Low-Latency Transcoding – x264

To test the quality of low and normal latency videos, I encoded four files with FFmpeg using the following command string.

ffmpeg -i soccer.mp4 -c:v libx264 -b:v 5000k -minrate 5000k -maxrate 5000k -bufsize 10000k -preset medium -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4

I removed -tune zerolatency and encoded again, adjusting the bitrates until file sizes were within 1%. You can see the results in Table 1 for harmonic mean VMAF and low-frame quality (the score of the lowest-quality frame in the file, an indicator of the potential for transient quality issues).

Table 1. VMAF harmonic mean and low-frame quality with and without -tune zerolatency using the x264 codec.
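For those who want to replicate the measurements, I computed VMAF using FFmpeg’s libvmaf filter. Assuming an FFmpeg build that includes libvmaf (option names vary somewhat by version), the command looks something like this, with the encoded file as the first input and the source as the second:

ffmpeg -i soccer_zerolatency.mp4 -i soccer.mp4 -lavfi libvmaf=log_path=vmaf.json:log_fmt=json -f null -

The JSON log contains the per-frame scores behind the harmonic mean and low-frame numbers.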

For harmonic mean VMAF, zerolatency costs about 2.33 VMAF points on the top-quality stream in your encoding ladder. You can look at this in two different ways. The first is that most viewers can’t discern a 3 VMAF point differential, so don’t worry, be happy. The glass-half-empty view is that you’d have to boost the bitrate of the zero-latency stream by between 500 kbps and 1 Mbps to achieve the same quality as a stream encoded using the normal latency settings.

Let’s visualize the difference using the Riverplate soccer clip, which showed the greatest Harmonic Mean and low-frame delta. Figure 1 shows the Results Plot from the Moscow State University Video Quality Measurement Tool with the zero latency file in red and normal latency in green. To be fair, most of the really low zones in red were crowd shots that few viewers would notice. Still, better quality is always better, and the frequent red drops in quality are meaningful.

Figure 1. Results Plot comparing the VMAF frame scores with -tune zerolatency in red and without in green.

A quick comparison of the switches used for zero latency (on the right in Table 2) and normal latency settings when using the medium preset revealed a host of differences that could impact quality. For example, B-frames drop from 3 to 0, while reference frames drop from 3 to 1. Certainly, reducing lookahead from 40 to 0 would impact the encoder’s ability to detect scene changes; hence the reduced low-frame scores, particularly in clips with lots of scene changes like the Riverplate clip.

Table 2. Switches impacted by -tune zerolatency compared to the medium x264 preset.

I’m not going to fully explore the difference between threads and sliced threads here but may do so down the road. Very briefly, frame-based threading increases latency because each thread encodes a different frame, so multiple frames must be in flight at once; the more threads, the greater the latency.

In contrast, sliced threading divides each frame into slices that are handled by separate threads, so only one frame is in flight at a time. This may reduce quality slightly, but it improves throughput, which may allow you to use a higher-quality preset. That’s why sliced threads are enabled for zero latency and not for normal encoding (see here for a full explanation).
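If you want to experiment, sliced threads can be enabled explicitly through x264’s private options; here’s a sketch, with the thread count illustrative:

ffmpeg -i input.mp4 -c:v libx264 -x264-params sliced-threads=1:threads=8 -b:v 5000k output.mp4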

The Latency Cost of x265 – Not So Bad

I ran the same tests using the x265 codec and the command string below, again with and without the -tune zerolatency option. I used the superfast preset rather than medium to achieve faster-than-30-fps throughput on my test workstation.

ffmpeg -y -i soccer.mp4 -c:v libx265 -b:v 3580k -minrate 3580k -maxrate 3580k -bufsize 7160k -preset superfast -tune zerolatency -force_key_frames "expr:gte(t,n_forced*2)" -an soccer_zerolatency.mp4

As you can see in Table 3, the results were much closer. If you’re transcoding with x265 using a high-speed preset, you may not experience the same quality penalty as with x264. In fact, low-frame quality is actually a bit higher.

Table 3. VMAF harmonic mean and low-frame quality with and without -tune zerolatency using the x265 codec.

Table 4 shows why the quality delta may not be that significant, as the values for the superfast preset aren’t that different from the zerolatency values. Beyond the switches shown, the zerolatency tune doesn’t control reference frames, but x265 uses only a single reference frame for the superfast preset, and that value carries through when the tune is applied. The bottom line is that the superfast encoding switches are already so constrained that tuning for zerolatency doesn’t further degrade output quality.

Table 4. Switches impacted by -tune zerolatency compared to the superfast x265 preset.

Of course, if you encode using a higher-quality preset, it likely won’t improve quality significantly anyway since the zerolatency tune would likely eliminate many of the high-quality configurations. Since you’d probably have to deploy multiple threads to support a higher-quality preset, you’d also be boosting latency. Any way you look at it—quality, throughput, or latency—encoding with x265 in software appears suboptimal.

The Bottom Line

The bottom line is to recognize that deploying a low-latency transcoding setting may impact video quality, particularly if you’re encoding with x264. When the target latency is sub 1 second, say for conferencing, auctions, gambling, and other interactive applications, you really have no option. However, when encoding for distribution via any low latency ABR application, you may want to consider opting for higher quality as opposed to lower latency.

Five Codec-Related Techniques to Cut Bandwidth Costs
https://streaminglearningcenter.com/codecs/five-codec-related-techniques-to-cut-bandwidth-costs.html
Thu, 01 Feb 2024

The mandate for streaming producers hasn’t changed since we delivered RealVideo streams targeted at 28.8 modems; that is, we must produce the absolute best quality video at the lowest possible bandwidth. With cost control top of mind for many streaming producers, let’s explore five codec-related options to cut bandwidth costs while maintaining quality.

For each, I’ll consider the factors summarized in Table 1 below: cost, potential bitrate savings, the addressable targets, the impact on storage and caching, the ease of implementation, and technology risk. Some comments on these factors:

The rating system is necessarily subjective, and reasonable minds can differ. If you strongly disagree with a rating, send me a note at janozer@gmail.com or leave a comment on the article or one of the social media posts related to it.

Assessing the potential bitrate savings is challenging. If you’re delivering most streams from the top of your encoding ladder, changing from H.264 to AV1 should drop your bandwidth costs by 50%. If you’re primarily delivering to the middle of the ladder, your bandwidth savings will be minimal, though QoE should improve. I’m assuming the former in these estimates.

Cache impact refers to how the technique impacts your cache. Adding a new codec reduces the effectiveness of your cache because you’ll be splitting it over two sets of files. Most of the others reduce the bitrate of your video so you can fit more streams in the same cache, which has a positive impact.

Tech ease measures the difficulty of implementing the technique, including necessary testing to ensure reliability. Technology risk assesses the likelihood that you’ll break something despite your testing.

On cost, I hate to be the bearer of bad news, but the Avanci Video pool covers “the latest video technologies, AV1, H.265 (HEVC), H.266 (VVC), MPEG-DASH, and VP9.” This means H.264, which comes off patent starting in 2023, is safe, but later codecs aren’t. I have no idea what the costs are or even if this pool will ultimately succeed. At this point, however, if you’re considering supplementing your H.264 encodes with a newer codec, you have to consider the potential for content royalties.

Table 1. Rating the techniques that cut bandwidth costs.

Option 1: Deploying a New Codec

Opting for a codec like HEVC or AV1 can dramatically reduce file sizes, minimizing storage and bandwidth expenses. The chart below is from a presentation that I gave at Streaming Media East in May 2023 (download the handout here). Borrowing a technique started by the great folks at Moscow State University, I normalized quality on H.265 at 100%.

Some notes to reduce blood pressure among the readers: the AV1 and H.265 results are open-source versions of the codecs, and there are optimized proprietary versions that deliver better quality. LCEVC performance will depend on the base layer that you choose, while VVC was the Fraunhofer version, which is capable, but higher quality versions also exist (see the MSU reports here). There are also other quality observations in the presentation download.

All that said, if you’re still encoding with H.264 (x264), you can shave about 33% for files encoded with HEVC (x265) and 58% with AV1 (Libaom). Again, the benefit is mostly in the top rung; most lower rungs will deliver improved QoE but not bandwidth savings.

Figure 1. Relative efficiency of modern codecs.

Looking at the table, the cost is the highest of all alternatives. That’s because:

  • You’ll have to learn how to encode with a completely new codec and integrate that into your encoding infrastructure.
  • All encoding costs will be additive to H.264, since not all target players are compatible with these new codecs.
  • Player and related testing will be extensive and expensive.

Moving through the table, adding a codec adds to your storage expenses because you will still be producing and storing H.264. The new codec will have a negative impact on your CDN caching because you’ll have to split the cache between H.264 and HEVC files. This means either a higher caching cost if you add to your cache, or higher bandwidth costs and potentially lower QoE because less data is cached at the edge.

As discussed, this alternative involves a lot of work; hence, the three brains and moderate technology risk that you obviously can mitigate with rigorous testing. Don’t get me wrong; hundreds of companies now produce in HEVC and AV1 and made it through unscathed; just don’t minimize the required effort. There’s also a difference between implementing a new codec yourself and flipping a switch in your Brightcove or JWPlayer console to add a new codec, with the end-to-end transcoding to playback already tested by earlier pioneers.

All that said, I’ll share these observations:

  • By far, most new codec adoption is to address new markets, primarily high dynamic range in the living room. This made HEVC table stakes for premium content producers.
  • Most other advanced codec adoption is by companies at the tippy top of the pyramid, like YouTube, Meta, Netflix (VP9/AV1), and Tencent (VVC), who have the scale to really leverage the bandwidth savings and the in-house expertise to minimize the risks and costs.

Most smaller companies, not distributing premium content, still stream H.264, either predominantly or exclusively. We love to talk about new codecs, but we seem to loathe adding them to our encoding mix to achieve bandwidth savings.

2. Per-Title Encoding: Customizing Your Bitrate Ladder

Per-title encoding, also called content-adaptive encoding (and many other things), analyzes individual video content to generate custom bitrate ladders for each. You can get a good overview of the technology in this StreamingMedia article: The Past, Present, and Future of Per-Title Encoding.

Unless you’re Netflix-sized, you’re probably best off either licensing per-title technology from a third party or using a per-title feature supplied by your cloud encoding vendor (we’ll explore a DIY option next).

How much bandwidth will a per-title technology save? Like you, I’ve experimented and tested various ways to integrate AI into my writing, with mixed results. I do trust AI services to have a better grasp of the overall available data than I do, so I present the following responses to the question, “What’s the approximate bandwidth savings that streaming publishers can expect by switching to per-title (or content-adaptive) encoding?”

You see the results in Table 2. Take them with however much salt you’re giving AI today; I’m sure they incorporated far more data than any of my guesses could.

Table 2. Bandwidth savings from adopting per-title transcoding.

Otherwise, Bitmovin claims up to 72% savings here, while Netflix claims a 20% savings for the top rung here.

In August 2022, I published two reports analyzing the per-title encoding features of five service providers. At a high level, I created a per-title encoding ladder for 21 test files using x264/x265 and the slow preset and compared the results to per-title H.264/HEVC output produced by AWS, Azure, Bitmovin, Brightcove, and Tencent. Azure has since exited the market.

I compared each service to the x264/x265 ladders but not to a fixed encoding ladder, so I have no number to compare to Table 2. I analyzed the bandwidth savings produced by each service using H.264 and HEVC for three distribution patterns: top-heavy (viewers watched mostly top rungs but some middle and lower rungs), mobile (mostly middle rungs), and IPTV (viewers watched only the top two rungs). You can see the recommendations for HEVC in Table 3.

Table 3. HEVC per-title ratings from my per-title encoding report.

How does per-title stack up to the other alternatives? Looking at our table, it will cost a bit more since per-title is a premium option for all service providers. You see the bitrate savings, which will vary with your distribution pattern.

Since you’re not adding a new codec, the addressable market is all existing targets for that codec, which is good, and a lower storage cost because the bitrates will be lower. These lower bitrates have a positive impact on caches since you can fit more files in a fixed cache size and satisfy more viewers.

It’s very simple to implement technically, particularly if you’re already using a service like Bitmovin or AWS. Just flip a switch, and you’re outputting in per-title format. The technology risk is also minimal because you’re using the same codec that you’ve always used.

I’ve been a huge advocate of per-title since Netflix debuted it in 2015, and if it’s not a technology that you’ve explored to date, you’re behind the curve.

3. Deploying Capped Constant Rate Factor Transcoding

Capped CRF is an eminently DIY per-title method that is also unquestionably less effective than the techniques discussed in the previous section. Still, it’s an encoding mode that can shave serious bandwidth costs off your top rung and can be very simple and risk-free to implement.

As explained here, capped CRF is an encoding technique available for most open-source codecs, including x264, x265, SVT-AV1, libaom, and many others, including all codecs deployed in the NETINT Quadra. With capped CRF, you choose a quality level via CRF commands and a cap, and the relevant commands might look like this:

ffmpeg -i input_file -c:v libx264 -crf 23 -maxrate 6750k -bufsize 6750k output_file

In the command string, I’m encoding with the x264 codec and choosing a quality level of 23, which generally hits around 93 VMAF points. I’m also setting a cap of 6.75 Mbps. During easy-to-encode segments in a video, the x264 codec can achieve the CRF quality level far below the cap, generating bandwidth savings. During high-motion sequences, the cap controls, and you’re no worse off than you would have been using VBR or CBR. You produce capped CRF output in a single pass, a slight savings over 2-pass encoding, and you can use the technique for live video as well as VOD.

Looking at our ratings table, capped CRF is inexpensive to deploy but won’t save as much as more sophisticated per-title techniques (see here for why). You’re not changing your codec, so you can address all existing targets, and it will lower your storage requirements and improve cache efficiency. It’s easy to implement, and the only tech risk is a slightly greater chance of transient quality issues, particularly in high-motion footage. It’s a very solid technique used by more top-tier streaming publishers than you might think.

4. Different Ladders for Different Targets: Mobile Matters

It doesn’t take as high a quality video stream to look good on your mobile phone as it does to look good on your 70” smart TV. So, you shouldn’t make the same streams available to mobile viewers as you do to desktop viewers. You can accomplish this by creating separate manifests for mobile and smart TV devices and identifying the device before you supply the manifest, as sketched below.
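As a simplified illustration, the mobile master playlist might simply omit the top rung that the desktop/TV manifest includes. The rung resolutions and bitrates below are hypothetical:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=3500000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=960x540
540p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=842x480
480p.m3u8

The desktop/TV manifest would add a 1080p rung above these; the player never requests a rung the manifest doesn’t offer.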

You see two examples of this in Figure 2, which is from an article appropriately entitled You Are Almost Certainly Wasting Bandwidth Streaming to Mobile Phones. To produce the two charts, I encoded each movie at 1080p, 720p, 540p, and 480p and measured VMAF quality twice, using both the default model (in blue) and the phone model (in red).

As you can see, using the phone model, all four rungs are above the 93 VMAF target many publishers use for top-rung quality, and the two top rungs are at or close to 100 VMAF points. Multiple studies have proven that few viewers can discern any quality differential above 93-95, making the bandwidth associated with these higher-quality files a total waste.

Figure 2. Quality scores show that you can send lower-quality files to phone viewers without any reduction in QoE.

Looking at our scoring in Table 1, this approach should be inexpensive to implement and could save significant bandwidth. It works on all devices that you’re currently serving, with no impact on storage since you’re creating the same ladder, just not sending the top rung to mobile viewers. It has minimal impact on the cache because you’re distributing the same files. I rate the tech ease and tech risk as slightly higher than capped CRF, but this is a well-known technique used by many larger streaming shops.

5. Use a Higher Quality Preset

Most streaming producers associate preset with encoding cost, not bandwidth. As I explained in my article The Correct Way to Choose a Preset, bandwidth and encoding costs are simply two sides of the same coin. Figure 3 shows you why.

Specifically, the blue line in Figure 3 shows how much you have to boost your bitrate to match the quality produced by the very slow preset. The red line shows how much using the lower-quality preset saves in encoding time. If you use the default medium preset, you cut encoding time/cost by 74% as compared to very slow, which is great. But you have to boost your bitrate by 11% to achieve the same quality as the very slow preset. In this fashion, choosing the optimal preset involves balancing encoding costs against bandwidth costs.

As the Correct Way article explains, depending upon your bandwidth costs, if your files are viewed more than a few hundred times, you save money by encoding using the highest-quality preset. This is why YouTube started deploying AV1 even when encoding times were hundreds of times longer than H.264 or VP9. It’s why Netflix debuted their per-title encoding technique using a brute force convex hull-based encoding technique that was hideously expensive but delivered the highest possible quality at the lowest possible bitrate. When your files are watched tens or hundreds of millions of times, the encoding cost becomes irrelevant.

Figure 3. Balancing encoding and bandwidth costs.

As you can read in the Correct Way article, if your bandwidth costs are $0.08 per GB (Amazon CloudFront’s highest rate) and your video is watched for only 50 hours, veryslow is the most cost-effective preset. So, if you’re using the medium preset or even the slower preset, check out the article and reconsider.
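To make the math concrete with illustrative numbers: a 5 Mbps top rung streams about 2.25 GB per hour, so 50 viewing hours is roughly 112.5 GB, or about $9.00 at $0.08 per GB. An 11% bitrate savings from the slower preset is then worth about $0.99 for that one file, so if the extra encoding time costs less than a dollar, the slower preset pays for itself, and the savings scale linearly with viewing hours.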

Obviously, changing your preset is very inexpensive to implement, benefits all existing targets, lowers your storage costs, improves cache efficiency, is easy to implement, and has a negligible tech risk. Unfortunately, it also offers the least potential savings.

That’s it; if you’re already doing most of these, congrats, you’re on top of your game. If not, hopefully, you’ve got some new ideas about how to reduce your bandwidth costs (and you see this before your boss does).

My FFmpeg-Related New Year’s Resolution: Document Before I Test
https://streaminglearningcenter.com/encoding/my-ffmpeg-new-years-resolution-document-before-i-test.html
Sun, 03 Dec 2023

My typical workflow for testing-related articles and reports is to create the command strings, run the tests, analyze the results, and then write the article or create the presentation. Since encoding is often so time-consuming, and I’m always in a hurry, I tend to create the command strings quickly with minimal thought, then run the tests, analyze the results, and start to write.

The problem with this approach is that I don’t really think about the command strings until I write about them or start creating the presentation. As you know, there are an infinite number of configurations, all of which produce different results. FFmpeg has a terrible habit of producing precisely what the command string tells it to, not necessarily what you want it to produce.

Figure 1. The typical project workflow. Ready > Fire > Aim.

The latest example involved producing x264 and x265 output. The x264 command string was exactly what I wanted, but when I started encoding to HEVC, I simply changed -c:v libx264 to -c:v libx265. Bad idea. One of the switches in the x264 command line was lookahead, which converts to rc-lookahead for x265. FFmpeg didn’t stop the encoding; it simply displayed a yellow warning message for a millisecond or two during the encoding, which I missed. GOP controls are also expressed differently, another yellow message that I missed.
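For the record, the x265 equivalents must be passed through FFmpeg’s -x265-params option rather than reused from the x264 string; here’s a sketch with illustrative values:

ffmpeg -i input.mp4 -c:v libx265 -b:v 3600k -x265-params rc-lookahead=40:keyint=60:min-keyint=60 output.mp4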

None of this hit me until I pasted the x265 command string into Google Docs and started writing about the individual switches. The nickel dropped; I recognized the error and had to re-run the tests and the analysis, which cost me several hours, though past errors have cost me days if not weeks.

Figure 2. Document your command strings before you run the tests.

The cure? My New Year’s Resolution. I resolve not to run any encodes until I document the command string in the article or presentation. This should move the thinking up and minimize, though probably not eliminate, wasted cycles. I’ve considered this approach before, but wanted to document it to help ensure that I actually implement it.

If you’re one of those folks who fully thinks things through before starting the time-consuming task, good for you. If not, find the structure that moves your thinking ahead of the time-consuming processing and analysis and consider making a similar New Year’s Resolution.

Understanding the Economics of Transcoding
https://streaminglearningcenter.com/encoding/understanding-the-economics-of-transcoding.html
Tue, 21 Nov 2023

Whether your business model is FAST or subscription-based premium content, your success depends upon your ability to deliver a high-quality viewing experience while relentlessly reducing costs. Transcoding is one of the most expensive production-related costs and the ultimate determinant of video quality, so it obviously plays a huge role on both sides of this equation. This article identifies the most relevant metrics for ascertaining the true cost of transcoding and then uses these metrics to compare the relative cost of the available methods for live transcoding.

Cost Metrics

There are two potential cost categories associated with transcoding: capital costs and operating costs. Capital costs arise when you buy your own transcoding gear, while operating costs apply when you operate this equipment or use a cloud provider. Let’s discuss each in turn.

CAPEX

The simplest way to compare transcoders is to normalize capital and operating costs using the cost per stream or cost per ladder, which simplifies comparing disparate systems with different costs and throughput. The cost per stream applies to services inputting and delivering a single stream, while the cost per ladder applies to services inputting a single stream and outputting an encoding ladder.

We’ll present real-world comparisons once we introduce the available transcoding options, but for the purposes of this discussion, consider the simple example in Table 1. The top line shows that System B costs twice as much as System A, while line 2 shows that it also offers over six times the capacity of System A. On a cost-per-stream basis, System B is actually cheaper.

CAPEX                    System A    System B
System cost              $10,000     $20,000
1080p30 stream output    50          320
Cost per stream          $200        $63
Required streams         640         640
Required systems         13          2
Total CAPEX              $130,000    $40,000

Table 1. A simple cost-per-stream analysis.

The next few lines use this data to compute the number of required systems for each approach and the total CAPEX. Assuming that your service needs 640 simultaneous streams, the total CAPEX for System A dwarfs that of System B. Clearly, just because a particular system costs more than another doesn’t make it the more expensive option.

For the record, the throughput of a particular server is also referred to as density, and it obviously impacts OPEX charges. System B delivers over six times the streams from the same 1RU of rack space as System A, so it is much denser, which will directly impact both power consumption and storage charges.

Details Matter

Several factors complicate the otherwise simple analysis of cost per stream. First, you should analyze using the output codec or codecs you plan to deploy, current and future. Many systems output H.264 quite competently but choke considerably with the much more complex HEVC codec. If AV1 is in your future plans, you should prioritize a transcoder that outputs AV1 and compare cost per stream against all alternatives.

The second requirement is to use consistent output parameters. Some vendors quote throughput at 30 fps, some at 60 fps. Obviously, you need to use the same value for all transcoding options. As a rough rule of thumb, if a vendor quotes 60 fps, you can double the throughput for 30 fps, so a system that can output 8 1080p60 streams can likely output 16 1080p30 streams. Obviously, you should verify this before buying.

If a vendor quotes in streams and you’re outputting encoding ladders, it’s more complicated. Encoding ladders involve scaling to lower resolutions for the lower-quality rungs. If the transcoder performs scaling on-board, throughput should be greater than systems that scale using the host CPU, and you can deploy a less capable (and less expensive) host system.

The last consideration involves the concept of “operating point,” or the encoding parameters that you would likely use for your production, and the throughput and quality at those parameters. To explain, most transcoders include encoding options that trade off quality vs throughput much like presets do for x264 and x265. Choosing the optimal setting for your transcoding hardware is often a balance of throughput and bandwidth costs. That is, if a particular setting saves 10% bandwidth, it might make economic sense to encode using that setting even if it drops throughput by 10% and raises your capital cost accordingly. So, you’d want to compute your throughput numbers and cost per stream at that operating point.
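To illustrate with hypothetical numbers: if a higher-quality operating point cuts bandwidth by 10% on a $50,000-per-year CDN bill, it saves $5,000 per year. If it also cuts throughput by 10%, so that a $20,000 transcoder purchase grows by $2,000, that’s only $400 per year amortized over five years, and the slower, higher-quality operating point is the clear winner.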

In addition, many transcoders produce lower throughput when operating in low latency mode. If you’re transcoding for low-latency productions, you should ascertain whether the quoted figures in the spec sheets are for normal or low latency.

For these reasons, completing a thorough comparison requires a two-step analysis. Use spec sheet numbers to identify transcoders that you’d like to consider and acquire them for further testing. Once you have them in your labs you can identify the operating point for all candidates, test at these settings, and compare them accordingly.

OPEX – Power

Now, let’s look at OPEX, which has two components: power and storage costs. Table 2 continues our example, looking at power consumption.

Unfortunately, ascertaining power consumption may be complicated if you’re buying individual transcoders rather than a complete system. That’s because while transcoder manufacturers often list the power consumption of their devices, you can only run these devices in a complete system. Within the system, power consumption will vary by the number of units configured in the system and the specific functions performed by the transcoder.

Note that the most significant contributor to overall system power consumption is the CPU. Referring back to the previous section, a transcoder that scales onboard will require a lower CPU contribution than a system that scales using the host CPU, reducing overall CPU consumption. Along the same lines, a system without a hardware transcoder uses the CPU for all functions, maxing out CPU utilization and likely consuming about the same energy as a system loaded with transcoders that collectively might consume 200 watts.

Again, the only way to achieve a full apples-to-apples comparison is to configure the server as you would for production and measure power consumption directly. Fortunately, as you can see in Table 2, stream throughput is a major determinant of overall power consumption. Even if you assume that systems A and B both consume the same power, System B’s throughput makes it much cheaper to operate over a five year expected life, and much kinder to the environment.

Power Costs                         System A    System B
System power consumption (watts)    400         500
1080p30 stream output               50          320
Watts per stream                    8           1.5625
Required streams                    640         640
Total watts                         5,120       1,000
Kilowatt hours per year             44,851.2    8,760
Five-year cost @ $0.08/kWh          $17,940     $3,504

Table 2. Computing the watts per stream of the two systems.
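Walking through the System A column: 640 streams at 8 watts per stream is 5,120 watts; running 24/7, that’s 5,120 watts × 8,760 hours ÷ 1,000, or 44,851.2 kilowatt hours per year; over five years at $0.08 per kWh, the total is $17,940.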

Storage Costs

Once you purchase the systems, you’ll have to house them. While these costs are easiest to compute if you’re paying for a third-party co-location service, you’ll have to estimate costs even for in-house data centers. Table 3 continues the five-year cost estimates for our two systems, and the denser System B proves much cheaper to house as well as power.

Co-location Costs                       System A    System B
Required output streams                 640         640
Streams per system                      50          320
Required systems                        13          2
Five-year cost (@ $50/month per 1RU)    $39,000     $6,000

Table 3. Computing the storage costs for the two systems.

Transcoding Options

Those are the cost fundamentals; now let’s explore them within the context of different encoding architectures.

There are three general transcoding options: CPU-only, GPU, and ASIC-based. There are also FPGA-based solutions, though these will probably be supplanted by cheaper-to-manufacture ASIC-based devices over time. Briefly:

  • CPU-based transcoding, also called software-based transcoding, relies on the host central processing unit, or CPU, for all transcoding functions.
  • GPU-based transcoding refers to Graphic Processing Units, which are developed primarily for graphics-related functions but may also transcode video. These are added to the server in add-in PCIe cards.
  • ASICs are Application-Specific Integrated Circuits designed specifically for transcoding. These are added to the server as add-in PCIe cards or devices that conform to the U.2 form factor.

Real-World Comparison

NETINT manufactures ASIC-based transcoders and video processing units. Recently, we published a case study where a customer, Mayflower, rigorously and exhaustively compared these three alternatives, and we’ll share the results here.

By way of background, Mayflower’s use case needed to input 10,000 incoming simultaneous streams and distribute over a million outgoing simultaneous streams worldwide at a latency of one to two seconds. Mayflower hosts a worldwide service available 24/7/365.

Mayflower started with 80-core bare metal servers and tested CPU-based transcoding, then GPU-based transcoding, and then two generations of ASIC-based transcoding. Table 4 shows the net/net of their analysis, with NETINT’s Quadra T2 delivering the lowest cost per stream and the greatest density, which contributed to the lowest co-location and power costs.

Table 4. A real-world comparison of the cost per stream and OPEX associated with different transcoding techniques.

As you can see, the T2 delivered an 85% reduction in CAPEX with ~90% reductions in OPEX as compared to CPU-based transcoding. CAPEX savings as compared to the NVIDIA T4 GPU were about 57%, with OPEX savings around ~70%.

Table 5 shows the five-year cost of the Mayflower T2-based solution using the Cyprus electricity cost of $0.335 per kWh. As you can see, the total is $2,225,241, a number we’ll return to in a moment.

CAPEX                                 $1,444,000
Five-year co-location                 $285,000
Kilowatt hours per year               296,263
Five-year power cost @ $0.335/kWh     $496,241
Total Mayflower five-year cost        $2,225,241

Table 5. Five-year cost of the Mayflower transcoding facility.

Just to close a loop, Tables 1, 2, and 3 compare the cost and performance of a Quadra Video Server equipped with ten Quadra T1U VPUs (Video Processing Units) with CPU-based transcoding on the same server platform. You can read more details on that comparison here.

Table 6 shows the total cost of both solutions. In terms of overall outlay, meeting the transcoding requirements with the Quadra-based System B costs 73% less than the CPU-based system. If that sounds like a significant savings, keep reading.

Total Cost       System A    System B
System           $130,000    $40,000
Power            $17,940     $3,504
Co-location      $39,000     $6,000
Total            $186,940    $49,504

Table 6. Total cost of the CPU-based System A and Quadra T2-based System B.

Cloud Comparison

If you’re transcoding in the cloud, all of your costs are OPEX. With AWS, you have two alternatives: producing your streams with Elemental MediaLive or renting EC2 instances and running your own transcoding farm. We considered the MediaLive approach here, and it appears economically unviable for 24/7/365 operation.

Using Mayflower’s numbers, the CPU-only approach required 500 80-core Intel servers running 24/7. The closest CPU in the Amazon EC2 pricing calculator was the 64-core c6i.16xlarge, which, under the EC2 Instance Savings plan with a 3-year commitment and no upfront payment, costs $1,125.84/month.

Figure 1. The annual cost of the Mayflower system if using AWS.

We used Amazon’s pricing calculator to roll these numbers out to 12 months and 500 simultaneous servers, and you see the annual result in Figure 1. Multiply this by five to get to the five-year cost of $33,775,056, which is 15 times the cost of the Quadra T2 solution shown in Table 5.
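As a rough sanity check: $1,125.84 per month × 12 months × 500 servers is about $6,755,040 per year, and five years of that yields the roughly $33.8 million figure.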

We ran the same calculation on the 13 systems required for the Quadra Video Server analysis shown in Tables 1-3, which was powered by a 32-core AMD CPU. Assuming a c6a.8xlarge CPU with a 3-year commitment and no upfront payment, this produced an annual charge of $79,042.95, or $395,214.75 for the five-year period, which is about 8 times more costly than the Quadra-based solution.

Figure 2. The annual cost of an AWS system per the example schema presented in Tables 1-3.

Cloud services are an effective means for getting services up and running, but are vastly more expensive than building your own encoding infrastructure. Service providers looking to achieve or enhance profitability and competitiveness should strongly consider building their own transcoding systems. As we’ve shown, building a system based on ASICs will be the least expensive option.

In August, NETINT held a symposium on Building Your Own Live Streaming Cloud. The on-demand version is available for any video engineer seeking guidance on which encoder architecture to acquire, the available software options for transcoding, where to install and run your encoding servers, and progress made on minimizing power consumption and your carbon footprint.

The Impact of GOP Size on Video Quality
https://streaminglearningcenter.com/articles/the-impact-of-gop-size-on-video-quality.html
Fri, 17 Nov 2023

This freely downloadable report measures the qualitative impact of GOP size on animated, general entertainment, sports, and office footage for H.264 and HEVC.

The impact of GOP size on VMAF quality. Click to view at full resolution.

One of the most fundamental encoding decisions is GOP size, or the frequency of I-frames in our encoded files. I-frames, also called keyframes, start each “group of pictures,” which comprises I-, B-, and P-frames. Most of the time, our GOP size, or I-frame interval, is dictated by adaptive bitrate considerations, like choosing a GOP size that divides evenly into your segment size. But even then, you have multiple options. In addition, what GOP size should you use when encoding a single file for disk-based playback or progressive download?
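As a practical example, for 2-second segments at 30 fps, a fixed 60-frame GOP is commonly produced with x264 along these lines, where the bitrate is illustrative and -sc_threshold 0 disables scene-change keyframes so the interval stays exact:

ffmpeg -i input.mp4 -c:v libx264 -b:v 5000k -g 60 -keyint_min 60 -sc_threshold 0 output.mp4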

This report helps you decide. It measures:

  • VMAF quality
  • Of 13 files in four genres (animations, general entertainment, sports, office)
  • With keyframe intervals ranging from 1/2 second to 20 seconds
  • Encoded with x264 and x265

No registration is required; click here to download the report. GOP-Size_report_11_16.pdf (1320 downloads )

B-Frames, Ultra Low-Latency Encoding, and Parking Lot Rules https://streaminglearningcenter.com/codecs/b-frames-ultra-low-latency-encoding-and-parking-lot-rules.html https://streaminglearningcenter.com/codecs/b-frames-ultra-low-latency-encoding-and-parking-lot-rules.html#respond Sun, 02 Jul 2023 22:17:55 +0000 https://streaminglearningcenter.com/?p=17742 One of my sweetest memories of bringing up our two daughters was weekly trips to the grocery store. Each got a $5.00 bribe for accompanying their father, which they happily invested in various tchotchkes that seldom lasted the week. When we exited the car, “parking lot rules” always applied, which meant that each daughter held one of Daddy’s hands for the walk to the store. Two girls, two hands, no running around the busy parking lot.

Parking lot rules came to mind as we debugged a decoding latency issue when testing a new server product called the Quadra Video Server. Initial tests revealed a decoding latency of up to 200 milliseconds in some high-volume configurations. Given that the encoding latency was under 20 milliseconds, the decoding numbers were uncomfortably high.

Eliminate B-Frames from the Origination Stream

After we raised the issue, our testing team implemented a fix that dropped decoding latency to under 20 milliseconds and decreased encoding latency as well. The fix is the parking-lot-rules corollary for live streamers: for ultra-low latency, eliminate B-frames from your live streaming workflow. For most live encoders and transcoders, disabling B-frames for AVC or HEVC should be simple in the GUI or via a change to your command string.
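In FFmpeg, for instance (your encoder’s syntax may differ, and the file names and bitrates below are placeholders), disabling B-frames is a single switch per codec:

ffmpeg -i input.mp4 -c:v libx264 -bf 0 -b:v 4200k out_avc.mp4
ffmpeg -i input.mp4 -c:v libx265 -x265-params bframes=0 -b:v 3500k out_hevc.mp4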

A quick glance at Figure 1 reveals why B-frames blow up decoding latency (shoutout to OTTverse, where we grabbed the image). B-frames, of course, incorporate redundancies from frames before and after the frame being encoded, so they are packed and decoded out of order. Any frame decoded out of order adds latency; the further out of order, the greater the latency.

Figure 1. B-frames are packed out of order and can increase decode latency.

Will eliminating B-frames (or using the Baseline H.264 profile) reduce the quality of the incoming stream? Only minimally, if at all. These streams are typically produced at a relatively high bit rate, so B-frames and higher-quality profiles deliver minimal additional quality. It’s even less likely that any decrease in quality would be noticeable in the output stream (see here).

B-Frames and Latency

Let’s pause for a moment and reflect on the bigger picture. Figure 2 shows the typical live-streaming workflow. We’ve been talking about B-frames in the on-premise encoder’s output increasing decoding latency in the transcoding server. What about B-frames added by the transcoding server when encoding streams for delivery to viewers?

Figure 2. B-frames from the on-premise transcoder will increase latency from the transcoding server.

Predictably, the result is the same. B-frames introduce the same latency during encoding for delivery, and for the same reason: packing frames out of order introduces delays. This is why, when implementing low-latency mode with the NETINT Quadra Video Processing Unit and T408 transcoder, you must use a GOP preset that encodes with consecutive frames.

When you get things right, with incoming and outgoing streams both free of B-frames, the results are transformative. Let’s have a look.

True Low Latency Transcoding

Table 1 below shows the actual testing results. This use case involves scaling 1080p AVC input down to 720p for delivery, which is common for interactive gaming, auction sites, and conferencing; the server can produce 320 such streams while encoding AVC, HEVC, and AV1. I don’t have the original data for the input file with B-frames, but as I recall, decoder latency averaged 150-200 ms, a noticeable break in a live conversation. Even worse, unlike encoder latency, it didn’t drop significantly in low-delay mode.

As you see in the table, after the fix, total latency is around 160 ms for all outputs in normal (latency-tolerant) mode. Working with an input file without B-frames and outputting streams without B-frames, combined encoder and decoder latency plummets to around 22 ms, well under a single frame duration (33 ms for 30 fps video). That’s low enough for even the most latency-sensitive applications.


Table 1. Encode/decode latency in normal and low-delay mode (with a properly formatted input file).

How much will the lack of B-frames impact quality in the output encoding ladder? Once again, B-frames have delivered surprisingly little value in the tests that I’ve performed. You can read a good article on the subject here and access updated data here (see page 22), which shows less than a 1% quality difference between streams with and without B-frames. The bottom line, of course, is that if your application needs ultra-low latency, you have to prioritize that over any potential quality loss, though it’s good to know that few, if any, viewers will notice it.

Returning to the thoughts that prompted this article: when my daughters have kids of their own, my enduring wish is that they implement parking lot rules on all relevant shopping trips. Given their progress to date, this may not occur in my lifetime. If you’re a live-streaming engineer, you have no such excuse to ignore the corollary. If latency is critical, make sure you eliminate B-frames from your live-streaming workflows.

Again, the server referenced is the Quadra Video Server, which combines ten Quadra video processing units (VPUs) in a SuperMicro chassis driven by a 32-core CPU. The total cost should be around $20,000 for this configuration. Stay tuned for more details.

(Author’s note: this article was edited after publication to remove the recommendation to use the Baseline profile to eliminate B-frames from H.264 streams. As several LinkedIn commenters pointed out, a better solution is to use the High profile and simply disable B-frames in the GUI or command string.)

Which is the Best AWS CPU for FFmpeg? https://streaminglearningcenter.com/codecs/best-aws-cpu-for-ffmpeg.html https://streaminglearningcenter.com/codecs/best-aws-cpu-for-ffmpeg.html#comments Sun, 02 Jul 2023 14:28:24 +0000 https://streaminglearningcenter.com/?p=17707 If you encode with FFmpeg on AWS, you probably know that you have three CPU options: AMD, Graviton, and Intel. Which is the best AWS CPU for FFmpeg? This article reveals all.

For those in a hurry, it’s Graviton for x264 and AMD for x265, often by a significant margin. But the devil is always in the details, and if you want to learn how we tested and how big a difference your CPU selection makes, you can follow the narrative or hopscotch through the fancy charts below. We conclude with a look at the optimal core count for those encoding with AMD CPUs.

Here’s a short video describing the tests and findings.

Testing the AWS CPUs

Let me start by saying that this was my first extended foray into CPU testing on AWS, and while it appears straightforward, some unconsidered complexity may have skewed the results. If you see any errors or other factors worth considering, please drop me a note at janozer@gmail.com.

Second, your source clip and command string may produce different results than those shown below. If you’re spending big to encode with FFmpeg on AWS, don’t consider my results the final word; instead, consider them as evidence that your CPU choice really does matter and as motivation to perform your own tests.

Those caveats aside, let’s dig into the testing.

Codecs/Configurations/Command Strings

I evaluated three test cases:

  • 8-bit 1080p30 with x264
  • 8-bit 1080p30 with x265
  • 10-bit 4K60p with x265

I present the command strings at the bottom of this article. Note that I used the veryslow preset for x264, slower for x265 at 1080p30, and slow for the 4K60 HEVC encodes. Why such demanding presets? Because, based on the total cost of distribution (encoding plus bandwidth), using a high-quality preset is the optimal economic decision whenever view counts will exceed 10,000.

Remember, presets don’t determine quality; your quality expectations do. Most compressionists target a VMAF score between 93 and 95 points for the top rung of their encoding ladders. Using the veryslow preset, you might achieve that at, say, 3 Mbps. Using ultrafast, you might need a bit rate as high as 5 Mbps to achieve the same quality. Ultrafast might cut your encoding time and cost by 90%, but you pay that only once, while you pay bandwidth costs for each video view. Even at a cost per GB of $0.02, it takes fewer than 10,000 views for the veryslow preset to break even based on lower bandwidth costs.
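To make the bandwidth side of that math concrete, here is a rough sketch; the 10-minute duration, the 2 Mbps delta (5 Mbps versus 3 Mbps), and the $0.02/GB rate are illustrative assumptions, not test results:

echo "2 * 600 / 8" | bc                         # MB saved per view at a 2 Mbps delta over 10 minutes: 150
echo "scale=4; 150 * 0.02 / 1024" | bc          # dollars saved per view at $0.02/GB: .0029
echo "scale=2; 150 * 0.02 * 10000 / 1024" | bc  # dollars saved over 10,000 views: 29.29

Whether roughly $29 per 10,000 views covers the one-time encoding-cost difference depends on your instance pricing, which is exactly what the tests below measure.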

Instances and Pricing

I tested using the 8-core instances and on-demand pricing shown in Table 1, with all systems running Ubuntu 22.04. Note that the cost delta between Intel and AMD is ten percent, a number I’ll refer to below.

Table 1: Instances and on-demand pricing tested.

Encoding Procedure

As you’ll see in the charts below, I started with a single FFmpeg encode and kept adding simultaneous encodes until the cost per stream began to increase, indicating that spinning up another instance was more cost-effective than adding encodes to the same system.
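As a rough illustration of the procedure, here is a hypothetical harness, not the actual scripts used for these tests; the clip and x264 settings mirror the test strings at the end of this article:

#!/bin/bash
# Launch N simultaneous x264 encodes and time the batch.
N=6
time (
  for i in $(seq 1 "$N"); do
    ffmpeg -y -i Orchestra.mp4 -c:v libx264 -preset veryslow -b:v 4200k -f mp4 /dev/null &
  done
  wait
)
# Cost per stream-hour is roughly: instance $/hr * (wall-clock time / source duration) / N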

FFmpeg Versions

Here’s where things get a bit complicated. My premise was that I would produce the optimal results using FFmpeg versions compiled specifically for each CPU tested. I downloaded builds for Graviton, AMD, and Intel from https://johnvansickle.com/ffmpeg/ and happily contributed via PayPal. However, I was also in touch with MulticoreWare, who requested that I test with an advanced version of their x265 codec that was optimized for Graviton.

Figure 1. I tested with CPU-specific versions of FFmpeg 6.0 from https://johnvansickle.com/ffmpeg/.

Before testing, I compared the performance of the stock version of FFmpeg (Version 4.4) with the CPU-specific versions from Vansickle on the AMD and Intel platforms and for x264 on Graviton. In all cases, the Vansickle version produced the same or better throughput with identical quality.

Note that in other tests on different AMD instances with core counts ranging from 2 to 32, the Vansickle version was not always the best performer. So, if you try the Vansickle versions or your own CPU-specific compiled versions, verify that they outperform the native version in all relevant use cases.
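A simple way to run that check (the binary path and clip name here are hypothetical; -f null - discards the output so the timing isolates encoding throughput):

time ./ffmpeg-vansickle -y -i test.mp4 -c:v libx265 -preset slower -f null -
time ffmpeg -y -i test.mp4 -c:v libx265 -preset slower -f null -

Run the same command with each build and compare the elapsed times; identical settings on identical clips keep the comparison apples to apples.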

Note that the MulticoreWare version of FFmpeg performed much better on the Graviton system than the generic version of 4.4 or the Vansickle version, though still far behind Intel and particularly AMD. As you’ll see clearly below, if you’re running x265 on a Graviton system using high-quality presets, you’re missing a great opportunity to shave your costs.

For the record, I tried upgrading the stock version of FFmpeg on the Ubuntu system to version 6.0 but ran into multiple issues that ultimately corrupted the system and forced me to start over from scratch. Unfortunately, Ubuntu operation and maintenance are not core strengths of mine, but since I ran all tests using version 6.0, whether supplied by Vansickle or MulticoreWare, the results should be representative.

Table 2 shows the different versions of FFmpeg that I ran on the three systems for the three test cases.

Test case            AMD         Graviton        Intel
8-bit 1080p30/x264   Vansickle   Vansickle       Vansickle
8-bit 1080p30/x265   Vansickle   MulticoreWare   Vansickle
10-bit 4K60p/x265    Vansickle   MulticoreWare   Vansickle

Table 2. The FFmpeg versions deployed on the three systems for the three test cases.

Results

Here are the results for the three test cases.

Best AWS CPU for FFmpeg @ 1080p x264

Figure 2 shows the cost per hour to produce a 1080p30 stream using FFmpeg and the x264 codec. One of the more interesting results was that the combination of FFmpeg and Ubuntu handled multiple simultaneous instances of FFmpeg with minimal overhead, particularly on the Graviton CPU. You see this in the cost per hour for Graviton remaining consistent through twelve instances, while it increased slightly for Intel after ten instances and for AMD after twelve.

In all cases, you see the cost per stream drop significantly when moving from a single encode to multiple simultaneous encodes. If you’re performing a single 1080p x264 encode on an 8-core system, you’re probably wasting money.

On the other hand, once each CPU hits its lowest cost per stream, it’s time to consider adding another instance rather than more encodes. On a Graviton system, running twelve simultaneous encodes instead of six doubles the encoding time per clip at almost exactly the same cost per hour. Spin up a second 8-core system and run six encodes on each, and your cost stays almost identical while your throughput doubles.

Figure 2. Cost per hour to produce a single 1080p stream using the x264 codec and FFmpeg. Graviton is clearly the most cost-effective.

Best AWS CPU for FFmpeg @ 1080p x265

What a difference a codec makes. Where Graviton was the clear leader for x264, it’s the clear laggard for x265. Again, I produced the Graviton results shown in Figure 3 using a version of FFmpeg supplied by x265 developer MulticoreWare; the results would have been much worse with either the Vansickle version or the stock version. As you may know, Graviton is an Arm-based CPU that uses a different instruction set than Intel or AMD CPUs. While the x264 codec was Arm-friendly, the x265 codec was decidedly the reverse, at least using the high-quality presets that I used in my tests.

Interestingly, for both Intel and AMD, we realized the lowest cost per stream at relatively low simultaneous stream counts: two for Intel, and two and three for AMD. If your testing confirms this, you should consider adding instances once you reach this threshold rather than adding encodes to existing instances.

Figure 3. Cost per hour to produce a single 1080p stream using the x265 codec and FFmpeg.

Comparing the lowest-cost Intel ($6.60) to the lowest-cost AMD ($5.49) shows a cost delta of about 17%. As shown in Table 1, 10% of this relates to pricing, leaving about a 7% performance delta.

For the record, note that an Amazon engineer ran similar tests here and found that Graviton was faster for both x264 and x265. Note, however, that the author used the ultrafast preset, while I used higher quality presets for the stated reasons. Have a look and draw your own conclusions.

Best AWS CPU for FFmpeg @ 4K60 x265

In 4K60p testing, Graviton was clearly overwhelmed from both a cost and a performance perspective, unable to complete even three simultaneous encodes. The overall cost delta between Intel and AMD narrowed slightly, dropping to 13.7%, with 10% relating to pricing. The actual throughput delta between the two in these tests is 3.7%.

Figure 4. Cost per hour to produce a single 4K60p stream using the x265 codec and FFmpeg.

This 4K60 test stressed memory much more than the 1080p tests, limiting successful simultaneous transcodes to two for Graviton and four for AMD and Intel. Interestingly, in these tests, AMD produced the lowest cost per stream while running a single encode, and Intel did so at two. With these challenging encodes, you may want to spin up new machines after only one or two encodes rather than attempting more simultaneous encodes. Or, perhaps, try a machine with more cores. Hold that thought until the last section.

For reference, Table 3 summarizes the lowest cost per hour for the three test cases.

Test case        AMD      Graviton   Intel
x264-1080p/30    $0.70    $0.61      $0.91
x265-1080p/30    $5.49    $8.63      $6.60
x265-4K60        $15.65   $35.39     $18.14

Table 3. Cost per hour for the three test cases on the three tested CPUs.

This leads us to the last section.

What’s the Optimal Number of Cores for FFmpeg?

AWS offers multiple core counts in all three CPU flavors, so what’s the optimal core count? To evaluate this, I ran tests on multiple AMD instances for all three test cases and present the results below.

Let’s talk about expectations first. AWS charges linearly for cores, so an 8-core system costs twice as much as a 4-core system and a quarter as much as a 32-core system. Given the results presented above, where FFmpeg/Ubuntu proved highly efficient at processing multiple instances, I expected a similar cost per hour across all core counts. The results were close.

With x264, 2-core and 8-core systems were slightly more affordable than 16-core, though a 32-core system finally caught up at 32 simultaneous transcodes. If you’re going to run a 32-core system for 1080p30/x264 encodes, you need to be running quite a few simultaneous encodes to achieve the optimal cost per stream.

Figure 5. x264 encoding cost for the CPU core counts shown.

With x265 encoding at 1080p, the results were closer to what I expected, though again, the 2-core and 8-core systems were slightly more affordable. Unlike with x264, the 32-core system became slightly more expensive as the number of simultaneous encodes increased, making eight simultaneous streams the most affordable.

Figure 6. x265 encoding cost for 1080p30 encodes, and the CPU core counts shown.

When encoding 4K videos, the phrase “go big or go home” comes to mind. Here, 32 cores delivered the lowest cost, though only by a fraction, and only at four simultaneous encodes. After that, the cost per hour increases slightly through eight encodes and then starts a more serious climb.

Figure 7. x265 encoding cost for 4K60 encodes, and the CPU core counts shown.

As you can see, all these results are highly specific to the codec and source material. The most important takeaway from this article should not be that Graviton is best for x264 and AMD for x265. It should be that real differences exist in the performance of these CPUs, and those differences can translate to significant cost differentials. If you’re spending even a few thousand dollars a month on AWS for FFmpeg encoding, it makes sense to run tests like these to identify the most cost-effective CPU and core count.

Test Strings

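All six commands below are two-pass encodes: pass 1 analyzes the clip and discards the output to /dev/null, while pass 2 produces the deliverable, with VBV maxrate and bufsize set to twice the target bitrate.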
1080p30 x264:

ffmpeg -y -i Orchestra.mp4 -c:v libx264 -profile:v high -preset veryslow -g 60 -keyint_min 60 -sc_threshold 0 -b:v 4200k -pass 1 -f mp4 /dev/null

ffmpeg -y -i Orchestra.mp4 -c:v libx264 -preset veryslow -g 60 -keyint_min 60 -sc_threshold 0 -b:v 4200k -maxrate 8400k -bufsize 8400k -pass 2 orchestra_x264_output.mp4

1080p30 x265:

ffmpeg -y -i Football_short.mp4 -c:v libx265 -preset slower -x265-params keyint=60:min-keyint=60:scenecut=0:bitrate=3500:pass=1 -f mp4 /dev/null

ffmpeg -y -i Football_short.mp4 -c:v libx265 -preset slower -x265-params keyint=60:min-keyint=60:scenecut=0:bitrate=3500:vbv-maxrate=7000:vbv-bufsize=7000:pass=2 Football_x265_HD_output.mp4

4K60 x265:

ffmpeg -y -i Football_4K60.mp4 -c:v libx265 -preset slow -x265-params keyint=120:min-keyint=120:scenecut=0:bitrate=12500K:pass=1 -f mp4 /dev/null

ffmpeg -y -i Football_4K60.mp4 -c:v libx265 -preset slow -x265-params keyint=120:min-keyint=120:scenecut=0:bitrate=12500K:vbv-maxrate=25000K:vbv-bufsize=25000K:pass=2 Football_4K_output.mp4
