Request : IPP2 Support for Red Rocket-X

Antony Newman

@Red,

Even on beefy workstations with high-end CPUs and GPUs, a considerable amount of system resources is consumed debayering 8K R3Ds.

Top-end Mac systems, and systems with AMD GPUs, still benefit greatly from RED-ROCKET-X acceleration.

NLE and colour grading applications (Pr, FCPX, Avid, DaVinci) still support accelerated workflows incorporating the RRX, because it makes their tools more effective: frame rates go up, and the CPU and GPUs are freed for effects processing, which is resource-hungry in its own right.

RED currently sells new RED-ROCKET-Xs on its storefront. ( http://www.red.com/store/products/red-rocket-x )

As the RED community migrates from 6K LEGACY to 8K IPP2 - is there a case to upgrade the RRX - one last time - to support IPP2?

AJ
 
For the price of one RRX you can buy two eGPU boxes + two Titan V GPUs, or three eGPU boxes + three GTX 1080 Tis (or three Vega FEs).
 
Thank you for the suggestion.

The RRX already has the heavy-lifting horsepower that I need.

Rather than OBSOLETE the RED-ROCKET-X before its time - why not reconfigure TWO of the THREE banks so that it can process IPP2 footage, and let the hardware support the growing community that is adopting IPP2?

AJ
 
GPU : NVIDIA Quadro GP100 : 16GB HBM2 : £7,489 (about the price of an RRX)
CPU : 2 x 12-Core : 3GHz XEON

Blackmagic reckons that it takes more than 24 XEON cores working flat out just to DECOMPRESS 8K R3Ds.
On his system, the GPU would have been able to do more DEBAYERING (i.e. use more than 46% of the GPU) if the CPUs were not already bottlenecking.

March 2018 : https://forum.blackmagicdesign.com/viewtopic.php?f=21&t=70646#p395688

My read on this: if a RED-ROCKET-X were used in that system, and also used to DEBAYER, both the CPU and GPU load would be greatly reduced - enabling higher frame rates and freeing system resources elsewhere (e.g. to show effects in real time).
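
As a back-of-envelope illustration of that argument (a Python sketch; every rate below is a made-up assumption, not a measurement), the achievable frame rate is capped by the slowest stage, and moving the debayer onto dedicated hardware moves that cap:

```python
# Toy model of an R3D playback pipeline. A frame must be decompressed
# (CPU) and then debayered before it can be shown, so the slowest
# stage sets the ceiling on frames per second. All rates are assumed.

cpu_decompress_fps = 30.0   # CPU de-wavelet rate (assumed)
gpu_debayer_fps    = 25.0   # GPU debayer rate for 8K IPP2 (assumed)
rrx_debayer_fps    = 50.0   # hypothetical IPP2-capable RRX debayer rate

# Without offload, the GPU carries the debayer as well as grades/effects.
no_offload = min(cpu_decompress_fps, gpu_debayer_fps)

# With the debayer offloaded, the GPU drops out of the decode path and is
# free for effects; the ceiling moves to the next-slowest stage.
with_offload = min(cpu_decompress_fps, rrx_debayer_fps)

print(f"without offload: {no_offload:.0f} fps (GPU busy debayering)")
print(f"with offload:    {with_offload:.0f} fps (GPU free for effects)")
```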

AJ
 
From August 13th 2018 you can get yourself an AMD TR2 2990WX with 32 cores at 3.4-4 GHz for around $1500.
The Titan V (14.8 TFlops fp32) is around $3000 and a lot faster than the GP100 (10.3 TFlops fp32).
At these kinds of GPU speeds, the memory bandwidth of the GPUs becomes the limiting factor with GTX 1080 Tis, Titan X(p)s, Vega FEs, etc...
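
To see why bandwidth rather than TFlops becomes the wall, here is a crude ceiling estimate (Python; the bytes-per-pixel and passes-per-frame figures are assumptions, and real pipelines make many more passes, so the absolute numbers are optimistic - the point is that the ceiling scales linearly with bandwidth):

```python
# Crude memory-bandwidth ceiling for debayering 8K frames on various GPUs.
# bytes_per_px and passes are illustrative assumptions; real kernels touch
# memory far more often, so treat the ratios (not the absolutes) as the point.

width, height = 8192, 4320        # 8K VV
bytes_per_px  = 4 * 2             # RGBA, 16 bits per channel (assumed)
passes        = 4                 # read+write passes per frame (assumed)

bytes_per_frame = width * height * bytes_per_px * passes

for name, gb_per_s in [("GTX 1080", 320), ("GTX 1080 Ti", 484), ("Titan V", 653)]:
    fps_ceiling = gb_per_s * 1e9 / bytes_per_frame
    print(f"{name:12s} {gb_per_s:3d} GB/s -> ~{fps_ceiling:.0f} fps ceiling")
```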

If you want to use a Resolve version that only supports 32 threads with R3D, you can turn off SMT (simultaneous multithreading) on the 32c/64t Threadripper.
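
If you'd rather not toggle SMT in the BIOS, the same effect can be approximated per-process on Linux by pinning the application to one hardware thread per core. A sketch (the 0-31 mapping is an assumption - check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list for your board):

```python
import os

# Pin a process to one hardware thread per physical core instead of
# disabling SMT globally. On many Linux layouts, logical CPUs 0-31 of a
# 32c/64t Threadripper are one thread per core, but verify the topology
# under /sys/devices/system/cpu/ before relying on this mapping.
one_thread_per_core = set(range(32))   # assumption for this sketch

pid = os.getpid()                      # or the PID of the decoding app
os.sched_setaffinity(pid, one_thread_per_core)
print(f"pid {pid} restricted to {len(one_thread_per_core)} logical CPUs")
```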

The thread you mentioned is one thing; the answer from the BMD guy is what worries me.
 
It is great news that AMD are torpedoing Intel's desire to keep on selling high-priced single-socket CPUs with deliberately crippled performance : https://www.nextplatform.com/2018/07/11/why-intel-must-respond-to-amds-single-socket-threat/

As Apple have committed to releasing a new machine in 2019, it might even have enough GPU memory (12GB is not enough in R15b6) to debayer a transition between two 8K VV R3D clips. It will also have to be competitive with a 32-48(?) core Rome - which is also fantastic.
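
A rough footprint estimate shows why a dissolve between two 8K VV clips can blow past 12GB (Python; the working format, decode-ahead depth, and scratch-buffer count are all assumptions for illustration):

```python
# Rough GPU memory footprint for a transition between two 8K VV clips.
width, height = 8192, 4320
bytes_per_px  = 4 * 2                 # RGBA16 working format (assumed)
frame_mb = width * height * bytes_per_px / 2**20   # ~270 MB per frame

streams          = 2                  # a dissolve decodes both clips at once
frames_in_flight = 8                  # decode-ahead per stream (assumed)
intermediates    = 2                  # extra scratch buffers per frame (assumed)

total_gb = streams * frames_in_flight * frame_mb * (1 + intermediates) / 1024
print(f"~{frame_mb:.0f} MB/frame -> ~{total_gb:.1f} GB before grades and effects")
```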

However - if any of these configurations were able to process faster by OFFLOADING the processing of IPP2 R3D footage, and you had a RED-ROCKET-X ... would you want to leverage the dedicated hardware acceleration it could offer?

My point is that if the RED-ROCKET-X is technically able to debayer 8K VV Monstro footage, then this IS something that would benefit the RED community.

AJ
 
Our Red Rocket-X works with Monstro 8K footage using IPP2 just fine.
 
DJ Mayer,

My (likely limited) understanding is that the RED-ROCKET-X performs the following steps:
1) Decrypting the data (as the R3D format is protected)
2) Decompressinging the wavelets
3) DEBAYERing ONLY DRAGON Footage (but NOT IPP2 used by Monstro & Helium)

This first two stages are CPU heavy.
The DEBAYERING stage is GPU heavy

After the R3D are debayered into a 2K or 4K timeline - there is no extra processing overhead when ingesting 8K IPP2 or 6K DRAGON footage.

However as 8K IPP2 R3D have >80% more pixels to DEBAYER than the old 6K DRAGON, and this not currently being handled by the RED-ROCKET-X, it causes a lot of GPU resources and memory to be tied up debayering.
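
The pixel-count claim is easy to verify:

```python
# 8K VV (Monstro) versus 6K (Dragon) photosite counts.
monstro_8k = 8192 * 4320
dragon_6k  = 6144 * 3160

print(f"{monstro_8k / dragon_6k:.2f}x")   # ~1.82 -> ~82% more pixels to debayer
```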

If the Red Rocket could ONLY support one of DRAGON debayering or IPP2 DEBAYERING, it could be argued that IPP2 has the greater need (due to there being 80% more pixels), and it would also promote adoption of RED's latest and greatest colour pipeline, as that is the workflow that is fully accelerated in all the NLEs.

AJ
 
As the RED community migrates from 6K LEGACY to 8K IPP2 - is there a case to upgrade the RRX - one last time - to support IPP2?

Perhaps, but I suspect it's not economical for RED to pursue ... also, the RRX currently handles the really hard part of an R3D decode (the de-wavelet) happily... so in reality only laptop TB3 systems would significantly benefit from a new IPP2 RRX.

As I am not a computer lover, I am always slightly confused by how successfully sexy GPU marketing has convinced everyone that more GPU is the solution to R3D decoding. If you built a computer today that, when decoding Monstro files, pegged both the GPU and CPU at near 100%, the spend on GPU would be peanuts - and spending more is only of value for complex grading duties...

Until the R3D codec is changed to something else, or the de-waveleting is moved to the GPU, we're stuck with big CPU bills or slow decoding. No amount of computer fetishising will change that!

I have some ARRI LF footage here and it decodes very, very fast with nearly no CPU needed... and for that, sexy 8K GPUs actually would make things faster.

RED's initial greatest asset, 'Redcode', is still going strong, but due to unfortunately only modest increases in CPU performance (against massive bandwidth and GPU advances) over the last 10 years, it is not the asset I suspect RED had hoped for...

My hope is that RED are (behind the scenes) spending serious development resources on improvements to R3D decoding speeds, to make R3D the wonder feature it once was...
 
Over the last 10 years you might be right, but over the last 2 years CPU speed has increased dramatically:

- Q2 2016: Intel i7-6950X, 10 cores (Cinebench R15 MC: 1792 points), $1700
- Q3 2018: AMD TR2 2990WX, 32 cores (CB R15 MC: 6200 points), $1700
- Q2 2018: AMD R7 2700X, 8 cores (CB R15 MC: 1817 points), $330

That is 3.5 times the speed in a bit more than 2 years for around the same price, or the same speed for 1/5th of the price.
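
Checking those ratios against the scores above:

```python
# Cinebench R15 multi-core scores quoted above.
i7_6950x, tr2_2990wx, r7_2700x = 1792, 6200, 1817

print(f"{tr2_2990wx / i7_6950x:.2f}x the speed")   # ~3.46x at the same $1700
print(f"{330 / 1700:.2f} of the price")            # ~0.19 -> ~1/5th, same speed
```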

The new TR2 should be able to handle any R3D up to 8K in realtime or faster.

8k.R3D is great as it is.
 
Michael,

Thank you for chiming in on this.

In further testing, I have found:
+) Resolve is about 5% slower processing IPP2 than LEGACY (8K VV footage : Debayer Quality = Half Res Good)
+) Resolve at Half Res Good with an RRX is considerably faster on a TRASHCAN Mac Pro: 24fps playback with, vs 7fps without.

Based on this, I agree with your view that the RRX is still doing the heavy lifting, and that the debayer is not the most costly part of processing (at Half Res Good).

<snip> RRX is currently happy with the really hard part of a r3d decode (the de-wavelet).

OTHER TESTING

It seems that RESOLVE has some problems with certain IPP2 clips (I had not noticed this with LEGACY).

When it struggles, the frame rate drops in playback. In delivery, those slowdowns are (sometimes) rendered as time reversal (or random data). Most problems seem to occur when transitioning between two 8K VV clips.

Also : recreating a similar transition in FCP-X shows no such slowdown.

AJ
 
The new TR2 should be able to handle any R3D up to 8K in realtime or faster.

I've only had mine for about 3 or 4 days, with little time to test it out, but yes, I can confirm a significant increase in performance over the TR1 2950X; I've just recently been able to achieve 8K 8:1 @ full res in realtime. I consider realtime to be where the playhead never catches up to the frame buffer and neither the CPU nor the GPU utilization is maxed out. In the TR1 2950X configuration, I only had my 1080Ti plugged in, and the CPU was pegged at 100% in "Full" res while the GPU was barely doing any work (I don't remember the utilization).

With the TR2 2990WX (under optimal settings), neither the CPU nor the GPU ever really got above 70% utilization. I can get smooth, realtime playback @ "Full" res in RedcineX Pro thus far, but it's still not quite as instantaneous/responsive as 1/2 or 1/4 res, and still not stutter-free 100% of the time. It took a lot of tweaking to figure out the best settings, but as it stands, using a combination of three of my Nvidia 1080-series cards in the following configuration, I can get smooth, realtime playback with moderate stress on both the CPU and GPUs while the playhead stays comfortably behind the frame buffer.

GPU Configuration:
1) 1080Ti (Slot 1 - x16)
2) 1080 (Slot 2 - x8) - Dual Displays connected
3) 1080 (Slot 3 - x16) - Dedicated PhysX GPU *** Important ***
4) Decklink Mini Monitor 4K (Slot 4 - x8)

Edit:

My realtime Redcine-X Pro settings by resolution are:
1) 8k Full res - Fairly Responsive and Fairly Smooth - 2 GPUs selected (2X x16); 36-40 frames processed simultaneously - currently playing a 1.5 minute clip in a loop over and over again as I write this. The frame buffer never fills up and keeps processing.
2) 8k Full res - Fairly Responsive and Fairly Smooth - 3 GPUs selected (2X x16, 1X x8); 40-46 frames processed simultaneously
3) 8k Half res - Fairly Responsive and Smooth - 2 GPUs selected; 24-100+ frames processed simultaneously
4) 8K Quarter res - Super Responsive and Butter Smooth - 1 GPU selected; 14-100+ frames processed simultaneously

Observations:
1) The lower the resolution, the higher the number of frames I could process simultaneously and vice versa. I could also process fewer frames than the clip frame rate and still achieve realtime playback at 1/4 resolution and lower. Surprisingly I could process 100+ frames at a time in 1/2 res.
2) The more GPUs I used, the more frames I could process simultaneously and vice versa.
3) The fewer frames I processed simultaneously (down to the clip frame rate at 1/2 and full res), the smoother the playback was overall and the more responsive it was when initially hitting the play button. Vice versa was also true.
4) Processing any number of frames less than the clip frame rate resulted in less than realtime performance for 1/2 and full res.
5) I think using my GPU in the x8 lane alongside any of the others dropped the overall performance of all the GPUs, but it still worked well enough to see an increase in processed frames even at full res.
6) For my lower than Full res configurations, I could process 64 or more frames simultaneously with no problem as well as use more than one GPU with no problem. I only listed the minimum # of GPUs needed to achieve realtime smoothly. Responsiveness varied by configuration.
7) The TR2 2990WX CPU could process up to 100 simultaneous frames with no problem at all lower resolutions, and up to 64 at full res. The GPUs, however, could not process 64 frames smoothly at full res, but it didn't matter, since I could still process fewer frames simultaneously without the playhead catching up to the frame buffer.

The biggest factors affecting realtime performance that I've noticed are:

1) Which Nvidia GPU is set as the dedicated PhysX GPU. This was the biggest killer of performance! If it is set to any card that is driving a display, playback will get choppy, even at lower resolutions. I would never use a single-GPU setup with any resolution of R3D file again now that I've seen the difference.

2) The number of simultaneous frames to process. This has to be proportional to the hardware being used and the clip frame rate, compression, etc.

3) Whether I export to my Decklink Mini Monitor or not. Latency is a problem. Essentially, my Decklink card is useless in Redcine at anything higher than 1/4 res, but 1/4 res works perfectly. At anything higher I get a delay between my computer monitor and my program monitor, and the lag gets exponentially worse over time, to the point where eventually the whole system slows to a crawl and playback on all displays stops (though I can still pause the playback). Must be some kind of memory buffer problem on the Decklink or the PCIe slot I have it in.

What I learned is that for the best performance and overall system responsiveness during playback in Redcine-X, I want to process the fewest number of frames simultaneously and still allow the frame buffer to fill up fast enough to keep the playhead from ever catching up.
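
That rule of thumb is easy to see in a toy model (Python; the decode rates are invented - only the shape of the result matters): playback stays realtime as long as frames are decoded at least as fast as they are consumed, while a smaller batch just means the first frames arrive sooner.

```python
# Toy playback model: the playhead consumes one frame per tick at the clip
# frame rate while the decoder refills the buffer. Playback is "realtime"
# if the playhead never catches the buffer. Decode rates are assumptions.

def is_realtime(decode_fps, clip_fps=24.0, prebuffered=8, seconds=60):
    buffered = float(prebuffered)
    for _ in range(int(seconds * clip_fps)):
        buffered += decode_fps / clip_fps   # frames decoded per frame shown
        buffered -= 1.0                     # playhead consumes one frame
        if buffered < 0:
            return False                    # playhead caught the buffer
    return True

for fps in (20, 23, 24, 25, 30):
    print(f"decode at {fps:2d} fps -> realtime: {is_realtime(fps)}")
```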

I'd love to know if anyone else has had any luck getting realtime performance with the new TR2 2990WX. I'm comfortable using full res for long clips, but not all the time, due to the initial responsiveness of the app at that resolution. I prefer the ultra-responsiveness of the lower res settings. I think I'll stick to 1/2 res for most of my previews and 1/4 res when I'm running through clips quickly.
 
Jason, some tricks that might help you get your workstation up to speed.

- Not knowing how many threads are at 100%, but guessing only half of them: try turning SMT off so that you have 32c/32t. This should get you to around 100% CPU utilization on all 32 cores, with lower latencies.
- With SMT off, the Threadrippers can overclock higher, with lower Vcore voltages and less heat.
- Use at least 4 x 16 GB RAM sticks of at least DDR4-2933 CL14 (Zen loves fast memory).
- Tweak your memory: our DDR4-3200 CL14 128 GB runs at 2933 CL14 at 1.35 V and at 3200 CL14 at 1.55 V (DDR4 can easily handle the voltage); with 4x16 GB you mostly don't have to tweak it
(a friend of mine clocked the same memory, 4x16GB, to 3200 CL12 at 1.55 V on an i9-7900X).
- Try to sell those two GTX 1080s and buy an extra GTX 1080 Ti ($650..700): the memory bandwidth of the 1080 is only 320 GB/s whereas the 1080 Ti's is 484 GB/s, and memory bandwidth is a big limit on performance.
I overclocked our (Vega) GPUs' memory from 484 GB/s (945 MHz) to 564 GB/s (1100 MHz) and that was a big help in Resolve. (I don't know if you can overclock the Ti's memory.)
- Try to use a separate scratch disk (ours is overkill: 7 GB/s R/W, 4x 960 Pro 512 GB in RAID 0 on a HighPoint SSD72??? over 8 PCIe lanes). 2x 1TB 970 Pros in RAID 0 should be great (5 GB/s seq. write and 7 GB/s seq. read), or buy cheaper ones if the budget doesn't stretch - a quick way to sanity-check whatever you use is sketched below the list.
- Use Linux instead of Windows; Linux handles all those threads better.
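
For the scratch disk bullet, a quick-and-dirty sequential-write check (Python; the path is hypothetical - point it at the disk under test, and note that a 1 GiB burst mostly measures cache/SLC behaviour rather than sustained rates):

```python
import os
import time

SCRATCH = "/mnt/scratch/throughput.tmp"   # hypothetical path: disk under test
CHUNK   = 64 * 2**20                      # 64 MiB writes
TOTAL   = 2**30                           # 1 GiB total

buf = os.urandom(CHUNK)                   # incompressible data
t0 = time.perf_counter()
with open(SCRATCH, "wb", buffering=0) as f:
    for _ in range(TOTAL // CHUNK):
        f.write(buf)
    os.fsync(f.fileno())                  # force data to the device
elapsed = time.perf_counter() - t0
os.remove(SCRATCH)

print(f"sequential write: {TOTAL / 2**30 / elapsed:.2f} GiB/s")
```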

No workstation budget for us this year - I hope you can get it running at its full potential.
 
Sweet! Thanks for the feedback. I'll definitely play around with some of these suggestions and see what works for my system. I'm no expert in any of this, but I can get it done.

- I'm definitely curious about the SMT setting. I believe I can only change that from the AMD Ryzen Master software, as I don't recall seeing it in the BIOS. I was wondering why none of the auto-overclocking software would clock better than 10% over base. The highest I can get is 3.3GHz on all cores.
- Also, I do use 8x16GB 2666 memory, but only 96GB is showing up, so I believe a pair of my memory slots is not working. Need to test that out. I did notice that by default the Asus ROG Zenith Extreme BIOS had the memory set to 2133MHz; I set it to 2666MHz last night, just before running those tests above. I'll double-check.
- I've got two RTX 2080 Ti FEs coming next month, so that should help GPU performance. But hopefully this is all moot once the NLE software is updated in December to use the new Turing architecture and such.
- I was wondering whether scratch disks were still a thing now that RAID SSDs are so fast, but it wouldn't hurt. Overkill works for me. lol I'll try using a pair of those 960s separately from my media RAIDs.
- I'm slowly using Linux at work for software development, so it's growing on me. I may do a reinstall at some point and try it out.

I'll let you know if I can get better performance with some of these suggestions. Thanks again!
 
On the "memory not showing up" issue, this is a common encounter with TR4 boards. The retention design of the socket is among the worst I've ever seen so its challenging to get perfectly even torque across the entire chip. If you do not, some of the memory controller pins do not connect, and some of your RAM vanishes. So before you send RAM back, try re-doing the CPU retention.
 
On the "memory not showing up" issue, this is a common encounter with TR4 boards. The retention design of the socket is among the worst I've ever seen so its challenging to get perfectly even torque across the entire chip. If you do not, some of the memory controller pins do not connect, and some of your RAM vanishes. So before you send RAM back, try re-doing the CPU retention.

So good to know. The same thing happened with the TR 2950X: memory disappeared and then suddenly, after pulling the sticks out and putting them back in, it reappeared, over and over. I'll redo my thermal compound while I'm at it.
 
Having only one ASRock Fatal1ty X399 Professional Gaming board, and no problems with it, I don't know whether this is a common problem; since the same thing happened with the TR 2950X, two things are possible: bad RAM or a bad motherboard. Normally ASUS boards are highly rated.

If you want to know more about overclocking with your ASUS motherboard, have a look over here

If you want to read more about overclocking the TR in general, have a look here https://www.guru3d.com/articles_pages/amd_ryzen_threadripper_2990wx_review,31.html or here https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/13

With Resolve, 4x16GB should be enough (with Fusion, 128 GB works a lot better).
 
The good news is that once you have it locked in well, it tends to be fine. But yeah, after having done around 30 Threadripper builds, it is something that happens.
 