
RED + NVIDIA Announce Realtime 8K R3D GPU Decoding

Same price as the Titan Xp ($1,200), 1 GB less memory, and a whole 3.3% faster at de-Bayering at base clock. It had better be good at other things.

Got benches? Haven't seen any that show debayering.
 

De-Bayer is FP32 (of all people on this forum you should know, having tested so many GPUs).
With ProRes RAW it's a different story, because it's only 12 bits, so half precision should be enough.

Titan Xp: 11.366 TFLOPS single precision at base clock
RTX 2080 Ti: 11.75 TFLOPS single precision at base clock
 

What I've learned most of all from testing so many GPUs is that you need to actually test the cards in real-world use cases to determine their performance, especially when comparing different generations.

Spreadsheet benchmarking is pretty useless, especially for video or .r3d workflows.
 

De-Bayer for ARRI, BMD, RED, Sony, etc. all ends up at 16 bits, de-Bayered with FP32 on a GPU.
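Purely as a toy illustration of that point (not any vendor's actual pipeline), here is a half-resolution de-Bayer of an RGGB mosaic in numpy: each 2x2 sensor tile collapses to one RGB pixel, with the math done in FP32 and the result quantized to 16 bits at the end.

```python
import numpy as np

def halfres_debayer(mosaic):
    """Toy half-resolution de-Bayer of an RGGB mosaic.

    Each 2x2 tile (R G / G B) becomes one RGB pixel; the two greens
    are averaged. Arithmetic is FP32, output is quantized to 16-bit.
    """
    m = mosaic.astype(np.float32)
    r = m[0::2, 0::2]                          # red sites
    g = (m[0::2, 1::2] + m[1::2, 0::2]) * 0.5  # average the two greens
    b = m[1::2, 1::2]                          # blue sites
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(rgb, 0, 65535).astype(np.uint16)

# 4x4 mosaic of 12-bit sensor values -> 2x2 RGB image
mosaic = np.array([[4095,  100, 4095,  100],
                   [ 200, 3000,  200, 3000],
                   [4095,  100, 4095,  100],
                   [ 200, 3000,  200, 3000]], dtype=np.uint16)
print(halfres_debayer(mosaic)[0, 0])  # R, G, B for the top-left tile: 4095, 150, 3000
```

A real demosaic interpolates at full resolution with much fancier filters, but the FP32-in, 16-bit-out shape is the same.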
 

Not sure you can compare these one-to-one solely based on FLOPS performance (by the way, the pure floating-point compute of the SMs is 13.4 TFLOPS, not 11.75, on the 2080 Ti). They have added a completely concurrent fixed-point compute engine (which all addressing is based on), as well as higher memory bandwidth. And that doesn't even address the 110 TFLOPS of fused multiply-add operations you can get out of the tensor cores, which, based on Jarred's press release, they very well may have harnessed.
 

You're completely right. TFLOPS is generic... it doesn't account for the fact that a lot of the new hardware in the Turing generation is dedicated and custom for certain tasks, which makes it invariably faster at those tasks. This isn't just a case of "we added more computing power"; it's "we added dedicated processors for X and Y in addition to more raw computing power," where in this case X and Y are ray tracing and AI.

You can compare one Turing variant to another based on generics like TFLOPS to help decide which one to purchase, but beyond that, comparing against a current-generation variant is going to be quite misleading, since TFLOPS doesn't account for the tensor and RT cores.
 

It's 11.75 TFLOPS FP32 at the 1350 MHz base clock. Higher memory bandwidth will help (after the decrypt/decode/decompress) with de-Bayer and other things in Resolve. For the reduser's sake, I hope the other parts (RT cores, tensor cores, INT16, etc.) will help with the decrypt/decode/decompress of the .R3D, so the CUDA cores can focus on the "old" GPU things.
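For what it's worth, both figures appear to be consistent: peak FP32 throughput is just CUDA cores x clock x 2 (one fused multiply-add per core per cycle), so 11.75 vs 13.4 TFLOPS looks like base clock vs boost clock. A quick sanity check, using NVIDIA's published RTX 2080 Ti specs (4352 CUDA cores, 1350 MHz base, 1545 MHz reference boost):

```python
# Peak FP32 throughput = CUDA cores * clock * 2 ops (one FMA per core per cycle).
def peak_tflops(cuda_cores, clock_mhz):
    return cuda_cores * clock_mhz * 1e6 * 2 / 1e12

base = peak_tflops(4352, 1350)   # ~11.75 TFLOPS, the base-clock figure
boost = peak_tflops(4352, 1545)  # ~13.4 TFLOPS, the boost-clock figure
print(f"base: {base:.2f} TFLOPS, boost: {boost:.2f} TFLOPS")
```

Real sustained throughput depends on memory bandwidth and occupancy, of course, which is the whole point about spreadsheet benchmarking above.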
 
I'm hopeful that some of the cores/architecture intended to accelerate specific tasks have the capacity to be "repurposed" for operations that require similar processing. Perhaps that's part of the reason the new code can get more oomph out of the Kepler cards.

IAC, I really hope that Apple and NVIDIA can end their feud and work together on the 2019 mMP. AFAIK there is nothing super-special about CUDA cores, but at this point many of the key applications we use have been written to leverage them, so here we are.

Cheers - #19
 

OpenCL support may come in a year or two... but not until CUDA is completely fleshed out. AMD does have a pretty fast solution with their high-end workstation cards that can get you to realtime the old way, as long as you have a ninja Xeon or two, or four.

It makes sense that RED and NVIDIA went into cooperation: NVIDIA wants to sell only GPUs (not strengthen the competition, AMD and Intel), and RED wants to sell only RED cameras (without GPU companies focusing on other codecs). AMD wants to sell a CPU + GPU, Apple wants to protect its codecs (ProRes and ProRes RAW) and sell lifestyle computers that can handle those codecs, iPhones, etc., Cavium's ThunderX2 is too immature, and Intel is having a hard time fixing all the leaks in its CPUs.

Let's hope the cooperation between RED and NVIDIA lives up to the hype (the Windows and Linux users hope it will be a success).
 
I hope so too. If the RT/tensor/INT16 hardware handles the decrypt/decode/decompress of the .R3D, that may be what Jarred was referring to when he said there may be enough headroom for FX processing. Maybe the CUDA cores will be freed up to do all the realtime FX work.
 
This is insanely beautiful news!

So a question:

Currently, when I encode through Premiere or AME, I usually get only 5-10% utilization on the GPU, sometimes peaking at around 65-85% when hitting certain plugins. This is mostly because the R3D decoding happens on the CPU, but will that change now, with the GPU being utilized more in the process?

Man... I can't even describe how long I've yearned for this!!!
 
The P2000, with only 2.4 TFLOPS FP32, is a bit on the low side; it's the same speed as the MacBook Pro's Radeon Pro 560X. Great for Full HD or 2K DCI, and usable for UHD or 4K with a codec that is not too heavy.

I have ordered one, and the 2080ti for egpu, both supposed to arrive around Sept 20th. I will test it then to see how it shakes out.
 

Use the P2000 for the 10-bit color overlay in OpenGL (Adobe software) and the eGPU for de-Bayer and effects in Resolve (set the P2000 to GUI only). When you want to color grade in Resolve, get an extra eGPU box for a DeckLink card and a 10-bit monitor/TV. Put an external RAID on one of the USB 3.1 Gen 1 ports for data storage, and use the remaining USB 3.1 Gen 1 port with a USB hub to connect your mouse, control panel, RED STATION RED MINI-MAG reader, etc.
 
Maybe I'm daydreaming, but it dawned on me, after watching the Gamescom NVIDIA RTX 2080 announcement over and over again and reading Jarred's comments over and over again (to pass the time), that RED and NVIDIA might be doing at least two things with the "code thing" to improve performance on RED R3D footage.

1) "Allowing GPU-based wavelet decode" on the existing Pascal architecture, by figuring out a way to parallelize the decode process using CUDA cores.

This is most likely, in my mind at least, what Jarred meant when he said "everyone on Nvidia will eventually win...", considering he did say that they were in fact "breaking 24fps on current cards with CUDA".

And even though he seemed to insinuate that the announcement of real-time 8K just happens to coincide with NVIDIA's new hardware announcement, I believe the engineers on both sides are working on something much more than that...

2) "Allowing GPU-based wavelet decode" on the new Turing architecture, using some combination of parallelization on the new dedicated processing cores (via a bunch of computer-science, algorithmic stuff). If successful, I can only imagine we'd see hundreds of 8K frames processed per second.

Again, I'm probably just daydreaming, but I think if these guys can figure out 10 gigarays/sec, they can eventually do 10 gigaframes/sec. Hell, the possibilities are endless at this point. Maybe RED can learn a new strategy to optimize the encode process to allow for more frames per second and smaller file sizes using AI. Who knows?
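RED's wavelet codec is proprietary, so purely as a toy sketch of why wavelet decode parallelizes so well: in a one-level inverse Haar transform, every output pair depends only on its own (average, detail) pair, so thousands of pairs can be reconstructed independently, which is exactly the shape of work CUDA cores eat up. A minimal numpy version:

```python
import numpy as np

def inverse_haar_1level(avg, detail):
    """Inverse one-level Haar transform along the last axis.

    Each output pair (a + d, a - d) depends only on its own
    (avg, detail) coefficients, so every pair could be handled
    by an independent GPU thread -- the property that makes
    wavelet decode a good fit for massive parallelism.
    """
    out = np.empty(avg.shape[:-1] + (avg.shape[-1] * 2,), dtype=avg.dtype)
    out[..., 0::2] = avg + detail
    out[..., 1::2] = avg - detail
    return out

signal = np.array([2.0, 4.0, 6.0, 8.0])
avg = (signal[0::2] + signal[1::2]) / 2      # forward transform: averages
detail = (signal[0::2] - signal[1::2]) / 2   # forward transform: details
print(inverse_haar_1level(avg, detail))      # reconstructs [2. 4. 6. 8.]
```

An actual codec stacks many levels of a fancier wavelet plus entropy decoding (the hard-to-parallelize part), but the transform stage itself is embarrassingly parallel like this.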
 
Will the 2080 cards be able to do realtime 8K playback or is that only on the Quadro ones?
 
I'm getting the RTX 2080 (the Ti was "just" a bit too much money for now; I'll drop that kind of cash when 7nm happens), so I'll be happy to finally get real-time 8K decoding! Sometimes working at the lower decode settings freaks me out, with noise showing that won't be there in the final render. The faster rendering also sounds exciting!
 