
Workstation Benchmark Rankings

Jon Thomasberg (Cashburn, VA)
[Set aside for future Rankings List info here]
 
...also known as measuring up, mine-is-bigger-than-yours, etc. I move that we all have a nice, friendly competition for the fastest workstation. I will (attempt to) keep the Rankings List.

This will serve 2 purposes:
1) To have fun with it and see how your machine stacks up to others here on the REDUSER forum; and,
2) To shed some light, based on the systems that perform better than others, for anyone making new hardware purchasing decisions.

I know there are other leaderboards covering every system in the world, each based on a single benchmark app. This one is just for the REDUSER community. Feel free to add any benchmarks you'd like included to the suggestion box.

I'll start with the system I just built for editing + color grading. I've been all-Mac for the last 5 yrs, so this is my first Windows PC in as many years:

i7-3930k (3.2GHz, overclocked stable to 5.0GHz), Corsair H100 closed-loop hydro CPU cooler
Asus P9X79 Deluxe motherboard
32 GB (4x8) Corsair Vengeance quad-channel DDR3-1866 CL9 RAM
***MSI N580GTX Lightning Extreme 3GB video card (arriving next week; until then it's an old nVidia GeForce 7900 GS w/ 256MB from 4 yrs ago)
No RedRocket
2x OCZ Vertex 3 240GB SSD (in RAID 0 config -- benching sustained 1000MB/sec) as OS drive
1x 500GB WD SATA2 7200RPM, until prices come back to reality for multiple WD RE4 2TBs
Pioneer BDR
Cooler Master HAF X case w/ 4x 200mm + 1x 140mm case fans + 2x 120mm fans for push-pull config on the H100 CPU radiator
Corsair AX1200 Gold PSU
OS = Windows 7 Pro x64 (w/ all patches & Service Packs up-to-date)
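For anyone wondering how two SSDs in RAID 0 reach ~1000MB/sec: the array splits data into fixed-size stripes that alternate between the drives, so a big sequential read keeps both drives busy at once and their throughput roughly adds up. A minimal sketch of that address mapping, assuming a 128 KiB stripe size (the real value is whatever was chosen when the array was created):

```python
# Minimal sketch of RAID 0 address mapping.
# ASSUMPTION: 128 KiB stripe size; the real size is chosen when the array is built.
STRIPE_SIZE = 128 * 1024  # bytes per stripe (assumed)


def locate(offset: int, n_drives: int, stripe_size: int = STRIPE_SIZE) -> tuple[int, int]:
    """Map a logical byte offset on the array to (drive index, offset on that drive)."""
    stripe_index, within = divmod(offset, stripe_size)
    drive = stripe_index % n_drives  # stripes rotate across the drives
    # Each drive stores every n_drives-th stripe, packed back to back.
    drive_offset = (stripe_index // n_drives) * stripe_size + within
    return drive, drive_offset


for off in (0, 128 * 1024, 256 * 1024 + 5):
    print(off, "->", locate(off, n_drives=2))
```

With two drives, consecutive 128 KiB chunks land on drive 0, drive 1, drive 0, and so on, so a large read is served by both drives in parallel, while a single small read touches only one of them.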

Benchmarks: (as of today, 2011/12/10)
Cinebench 11.5 = no OpenGL tested; multiCPU = 14.44
GeekBench2_x64Win = 28247

and of course -- pix or it didn't happen :)

[Attached screenshot: Benchmarks_DivaProd1.PNG]
 
NICE Cinebench score!

BTW, the Resolve beta for Windows is on Blackmagic's site! Looking forward to your numbers with it once you get your GPU, good man!

We need a ResolveBench...

Also, how loud is it and what is the power draw? You guys are tempting me...

Bruce Allen
www.boacinema.com
 
Bruce,

I haven't checked it with a decibel meter, but I can tell you it is VERY quiet. The large-diameter fans make all the difference in the world: they move lots of CFM without sounding like the thing will take off. Honestly, it's about the same dB level as the Epic in capture mode, and the Epic in standby is significantly louder than this computer.

Also, while it was stable at 5.0GHz under lighter loads, after a few hours of running PRIME 95 and AIDA Extreme at 100% CPU to burn it in, I decided to dial it back to 4.9GHz. That changed the Cinebench score to a still very respectable 14.33; I haven't rerun Geekbench on it yet. It has been perfectly stable at 4.9GHz running AIDA full-bore for the last 12 hours straight. Core temps max out at 82 C, avg 74 C. Case temp averages 22 C, which is only 1 or 2 C over my room's ambient temp.

I forgot to add that I am running a Corsair Gold AX1200 PSU, but I have not metered it to see the total current draw. I will once I add the GTX580, since right now it would be pointless with the puny card that's in there.

On a side note, with the current setup (and no Rocket) I am able to play back 4kHD R3Ds (no HDRx) slightly faster than realtime in RC-X Pro beta8 at 1/2 debayer. But I also noticed that my 6 main CPU cores only hit up to 55% utilization, and the hyperthreads aren't being used at all. Perhaps that's because the video card can't keep up at this point. More to come.

Is there such a benchmark either within Resolve or as a standalone?
 
That is undoubtedly an immense Cinebench score! Once you get your GTX 580 try running your R3Ds through Premiere Pro - I have a feeling you will get real-time playback at full-res (i.e. 4K). Matching your monitor's resolution, which I am guessing is 2560x1440/1600, should be a piece of cake!
 
I built a similar system. In Premiere Pro, there is no way in hell you will get realtime 4k playback. The RED SDK that apps use for R3D decode and demosaic is not very multithreaded, and is therefore crippled. The GPU does little to help, as it assists with other image processing *after* the SDK produces an RGB image for the app.
Even if the SDK were allowed to use all cores, it still wouldn't give you fast playback, as JPEG 2000 is hard to decode. The GPU could help with the demosaic, but that is not possible at the moment.

I will be working with S3D, so I built this fast machine to try to help. I am ok with half res for editing and color, so that helps a bit. There is no debayer when looking at half res.

-Les
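Les's point that half res needs no debayer can be illustrated with a generic 2x2 binning sketch: each RGGB quad of sensor values becomes one RGB pixel directly, with no interpolation between neighbors. This assumes an RGGB layout starting at the top-left, even dimensions, and plain Python lists; it illustrates the idea and is not how the RED SDK does it internally.

```python
# Half-res "debayer" by 2x2 binning: one RGB pixel per RGGB quad, no interpolation.
# ASSUMPTIONS: RGGB layout from the top-left corner, even width and height,
# raw sensor values as a list of rows.


def half_res_rgb(raw: list[list[float]]) -> list[list[tuple[float, float, float]]]:
    out = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[0]), 2):
            r = raw[y][x]                              # top-left: red
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2    # two greens per quad
            b = raw[y + 1][x + 1]                      # bottom-right: blue
            row.append((r, g, b))
        out.append(row)
    return out


mosaic = [[10, 20, 11, 21],
          [30, 40, 31, 41]]
print(half_res_rgb(mosaic))
```

The 4x2 mosaic above collapses to a single row of two RGB pixels, `(10, 25.0, 40)` and `(11, 26.0, 41)`: half the width, half the height, and every value comes straight from the quad it sits in.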

Les, care to share the system specs and your results?

Thanks Subhadip. It seems pretty impressive so far. I will share my findings once it is all completed.
 
Asus P9X79 Pro mobo, the 3.2GHz CPU OC'd to 4.3GHz (I like to keep things cool; it stays at 40 C). Threw away the H100 fans; they are not compatible with the Asus PWM fan control.
With 4-pin PWM fans the system is very quiet: 700 rpm fans at idle!
Sandra memory bandwidth: 40 GB/sec. This is where the system helps; in image processing, memory bandwidth is critical.
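For context on that 40 GB/sec figure, here is a worked calculation of the theoretical peak for quad-channel DDR3. It assumes DDR3-1866 in all four channels (the speed of Jon's kit; Les didn't state his) and the standard 64-bit (8-byte) DDR3 channel width.

```python
# Theoretical peak bandwidth of multi-channel DDR3.
# ASSUMPTIONS: DDR3-1866 (1866 MT/s) in all four X79 channels; 64-bit channels.


def peak_gb_s(mega_transfers_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_s * 1e6 * bus_bytes * channels / 1e9


peak = peak_gb_s(1866, channels=4)
measured = 40.0  # Les's Sandra result, GB/sec
print(f"peak ~{peak:.1f} GB/s; measured is ~{measured / peak:.0%} of peak")
```

That comes to roughly 59.7 GB/s, so a measured 40 GB/sec is about two-thirds of the theoretical peak; measured numbers typically land well below it.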

On benchmarking: I'm still waiting to see how to use REDline to saturate the cores and ACTUALLY USE THE MACHINE... frustrating. I'll probably end up transcoding to another wavelet codec using spare cycles on the render farm at work.
A hundred i7's with GTX580's should do the trick ;)

-Les
 
Is there a memory bandwidth benchmark for Mac?
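As a rough stand-in on any platform, Mac included, here is a crude single-threaded memory-copy probe in Python. It is not a substitute for Sandra or STREAM: it times one thread copying a large buffer, so expect it to read well below a proper multi-threaded benchmark. The 64 MB buffer and best-of-five timing are arbitrary choices.

```python
# Crude single-threaded memory-copy probe; portable (Mac, Windows, Linux).
# Not a replacement for Sandra or STREAM. Buffer size and best-of-N timing
# are arbitrary choices.
import time


def copy_bandwidth_gb_s(size_mb: int = 64, repeats: int = 5) -> float:
    n = size_mb * 1024 * 1024
    src = bytearray(b"\x5a") * n   # filled with data so its pages are backed by RAM
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src               # CPython copies the whole buffer in one pass
        best = min(best, time.perf_counter() - start)
    # Count bytes read plus bytes written, as STREAM's "Copy" kernel does.
    return 2 * n / best / 1e9


if __name__ == "__main__":
    print(f"~{copy_bandwidth_gb_s():.1f} GB/s (single thread)")
```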

Also, don't OCZs suffer from bottlenecks with incompressible data (which video definitely is, even RED footy), or was that just the Vertex 2s and earlier?
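On the OCZ question: SandForce controllers (used in both the Vertex 2 and the Vertex 3) compress data internally before writing it, so their headline speeds depend on how compressible the data is. A rough illustration using zlib as a stand-in for the controller's proprietary compression, with random bytes standing in for already-compressed footage (an assumption, not a measurement of R3D files):

```python
# Why compressing SSD controllers look slower on video: already-compressed data
# doesn't shrink. zlib stands in for the controller's proprietary compressor,
# and random bytes stand in for compressed footage (both are assumptions).
import os
import zlib


def compress_ratio(data: bytes) -> float:
    """Compressed size divided by original size (about 1.0 = incompressible)."""
    return len(zlib.compress(data, 1)) / len(data)


zeros = bytes(1_000_000)              # the kind of pattern some synthetic benchmarks write
random_like = os.urandom(1_000_000)   # behaves like wavelet-compressed video

print(f"zero-filled: {compress_ratio(zeros):.3f}")
print(f"random:      {compress_ratio(random_like):.3f}")
```

The zero-filled buffer shrinks to a tiny fraction of its size while the random one doesn't shrink at all, which is the gap between benchmark-friendly data and real video.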
 
Maybe Sir Jon of Thomasberg has the answer to this one. I posted on another thread and nobody answered.
Can anyone confirm the "huge performance gain" of SSD Caching on the latest Intel X79 systems?
The newer ASUS P9X79 boards allow for more caching than the default Windows limit.
I'm wondering if I can put my page file and cache on a 90GB Corsair GT and really use the 6Gb/s speeds.
 
Hi Carter,

I have yet to use the SSD caching option on the asus mobo. My config is using 2 SSDs on the Intel SATA3 ports. The SSD Caching requires a single SSD and single HDD on each of the Marvell chipset ports to function. It dedicates that SSD to caching the HDD. I didn't want to waste an SSD on cache exclusively.

Normally, I would put pagefile.sys on a separate drive from the OS, but since I have 2 SSDs in RAID 0 getting >1000MB/sec, I just manually locked the pagefile size, rather than letting Windows manage it, and left it on the OS drive. If/when I add another SSD, I will likely put the pagefile on it and use it as the SCRATCH volume when editing. But as it stands, nothing comes close to filling my 32 GB of RAM.

Overall though, the SSD Caching feature is not really useful, since most of what we do is uncompressed sequential reads/writes, and those aren't heavily cached anyway. If you want that kind of speed, use SSDs exclusively for that purpose, OR get an Areca hardware RAID controller and a bunch of HDDs to get the performance out of them.
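To illustrate why a cache in front of a hard drive does little for long sequential reads, here's a toy simulation with a plain LRU cache. The block counts are made up, and real schemes (Intel Smart Response, the Marvell-based caching on the Asus boards) use their own policies, so this only shows the trend:

```python
# Toy model of an SSD cache in front of a hard drive, using plain LRU.
# ASSUMPTION: real caching schemes use their own policies; LRU shows the trend.
from collections import OrderedDict


def hit_rate(accesses: list[int], cache_blocks: int) -> float:
    cache: OrderedDict[int, bool] = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / len(accesses)


CACHE = 1_000  # cache holds 1,000 blocks

stream = list(range(10_000)) * 2        # a clip 10x the cache size, read twice
working_set = list(range(500)) * 40     # small set of files re-read over and over

print(f"sequential stream: {hit_rate(stream, CACHE):.1%}")
print(f"small working set: {hit_rate(working_set, CACHE):.1%}")
```

Streaming a clip bigger than the cache gets no hits even on the second pass, because each block is evicted before it comes around again, while a small working set that fits in the cache is served almost entirely from the SSD.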

But to directly answer your question: no, I can neither confirm nor deny that Asus SSD caching works.

Side note: as Jeff Kilgroe pointed out in another thread, HDD manufacturers are now selling SATA3 6Gb/s HDDs; that's a joke. No single spinning drive can hit, much less sustain, those speeds, and the onboard caches are no larger. A waste of money over the SATA2 3Gb/s versions.
 
Little bit of an update on my workstation build:

RC-X Pro beta8 Win7 edition:
-Full Res:
--no 5k realtime playback (not even usable -- quite annoying, in fact)
--no 4kHD realtime playback (plays, but hiccups every 5 seconds)

-Half Res:
--no 5k realtime playback
--no 4kHD realtime playback (plays but hiccups every 20 seconds)

HOWEVER, in Premiere Pro CS5.5.1:
-Full Res:
--no 5k realtime playback (plays back but skips a lot of frames, plays approx 4-5 fps)
--no 4kHD realtime playback (plays back but skips some frames, plays approx 12-15fps)

-Half Res:
--5k realtime playback !!!!
--4kHD realtime playback !!!!

Also, render times for FULL RES playback in the timeline preview, for the clips I tested:
-5k clip, no HDRx, 24 fps 00:02:58:19 (4291 frames) = 4min 25sec
-4kHD clip, no HDRx, 24 fps 00:03:25:13 (4933 frames) = 2min 48 sec
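A quick check of those numbers: converting each clip's timecode to frames at 24 fps and dividing by the render time gives the effective render speed. This assumes non-drop-frame timecode, which is what 24 fps uses.

```python
# Timecode -> frame count, then render speed relative to 24 fps playback.


def tc_to_frames(tc: str, fps: int = 24) -> int:
    """Non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def render_fps(frames: int, minutes: int, seconds: int) -> float:
    return frames / (minutes * 60 + seconds)


for label, tc, (m, s) in (("5k", "00:02:58:19", (4, 25)),
                          ("4kHD", "00:03:25:13", (2, 48))):
    frames = tc_to_frames(tc)
    fps = render_fps(frames, m, s)
    print(f"{label}: {frames} frames, {fps:.1f} fps render, {fps / 24:.2f}x realtime")
```

This reproduces the frame counts above (4291 and 4933) and works out to roughly 16.2 fps (about 0.67x realtime) for the 5k clip and 29.4 fps (about 1.22x realtime) for the 4kHD clip, i.e. the 4kHD render ran faster than realtime.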

Notes:
-In RCX Pro Beta 8, even after I maxed out the performance settings, I could not get it to use more than 40% load on my CPU cores; most of the time it fluctuated between 32-40%. Also, the load was not equally distributed among all 12 threads, so many of the threads sat at ~10% utilization. Memory usage never came close to saturating either the system RAM or the video RAM on my GTX580. So it looks like something in the RCX Pro code is limiting it from taking advantage of all the horsepower.

-In PP CS5.5.1, it hit and maintained 90% CPU load +/- 3% throughout rendering, and was well-balanced across all processor threads. Again, at no point did I come even remotely close to saturating the System RAM or VRAM.

Also, monitoring the SATA throughput, my 2 RAID-0 OCZ Vertex 3 SSDs performed flawlessly and never hit saturation.

Overall, I am very pleased with this build. If there are any tests you'd like me to run, feel free to ask. Hopefully this will help anyone contemplating a new workstation build.

One downfall to this, after using this workstation, my decked-out iMac 27" seems pathetically slow in comparison.
 
Urgh. That is a major disappointment. Why the hell isn't RCX leveraging all the power it can? Why no GPU/CUDA support? Because of RED Rocket, I guess.
 
Whoa whoa... I mean, yes, that's a logical reason NOT to implement GPU/CUDA (or even proper multi-threading support), but I think the real reason is solely complexity. When it comes to using GPUs for general processing, both AMD/ATi and nVidia have their own way of doing things, which sucks because it's not a one-code-is-efficient-on-all-graphics-hardware situation, meaning RED would have to write different sets of code for ATi and nVidia (bleh.) And when it comes to multi-threading/multi-CPUs, it's actually way more difficult to write code that can split itself up properly/efficiently between the different cores/threads.

I think the easiest/quickest thing RED should do is render different clips on different cores/threads. That way, each core/thread handles its own clip(s); it'd be kind of a brute-force way of using multiple CPUs. Not efficient, but still effective. I know in Windows you can set the affinity of applications pretty easily, so theoretically, if you ran multiple instances of RCXp and set each one to a different core/thread, it should work pretty well. But of course that's a bit of a pain in the ass from the end-user perspective... Still, it'd be 4 or 8 times faster than setting up a single batch transcode on one core (it also assumes you have more than one clip that needs transcoding and they all use the same look settings.)
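The clip-per-core idea is essentially a scheduling problem: hand out clips so that the busiest worker finishes as early as possible. A minimal sketch using greedy longest-processing-time-first assignment; the clip names and frame counts are hypothetical, and in practice each worker would be its own RCXp or REDline process (pinned to a core if you like):

```python
# Greedy longest-processing-time-first: give the next-biggest clip to the
# least-loaded worker. Clip names and frame counts are hypothetical.
import heapq


def assign_clips(clips: dict[str, int], workers: int) -> list[list[str]]:
    """clips maps clip name -> frame count; returns one clip list per worker."""
    heap = [(0, w) for w in range(workers)]   # (frames queued, worker id)
    plan: list[list[str]] = [[] for _ in range(workers)]
    for name, frames in sorted(clips.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)          # least-loaded worker so far
        plan[w].append(name)
        heapq.heappush(heap, (load + frames, w))
    return plan


clips = {"A001": 4000, "A002": 3000, "A003": 2500, "A004": 2000, "A005": 500}
plan = assign_clips(clips, workers=2)
for w, names in enumerate(plan):
    print(f"worker {w}: {names} ({sum(clips[n] for n in names)} frames)")
```

Here both workers end up with 6000 frames each, a 2x speedup over one worker. With only a few clips, or one very long one, the longest clip caps the speedup, so "4 or 8 times faster" is the best case rather than a given.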
 
Writing multithreaded code is not that hard when you have discrete frames. You just fire off a separate thread for each of several frames. There is no interframe compression that I know of.
I do have a question, however: can a caller invoke the SDK multiple times for the same R3D in separate threads, to decode the frames faster? It may be that the SDK prevents that sort of activity.
Most of the work in decoding R3D is the JPEG 2000 decode, which is not very easy to do on a GPU (CUDA or OpenCL). But the demosaic is very doable on a GPU. It wouldn't speed things up much, though; it's not the hard part.
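Since each frame can be decoded on its own, frame-level parallelism comes down to handing frame indices to a pool of workers and keeping the results in order. A sketch with Python's thread pool; `decode_frame()` is a hypothetical stand-in for an SDK call, and whether the real SDK allows concurrent decodes of the same clip is exactly Les's open question:

```python
# Frame-level parallelism for an intra-frame codec.
# decode_frame() is a HYPOTHETICAL placeholder, not a RED SDK function.
import os
from concurrent.futures import ThreadPoolExecutor


def decode_frame(index: int) -> str:
    # Placeholder work. For threads to run in parallel, a real native decoder
    # must release Python's GIL while it works.
    return f"frame {index} decoded"


def decode_clip(frame_count: int, workers: int = os.cpu_count() or 4) -> list[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so frames stay in playback
        # order even if workers finish them out of order.
        return list(pool.map(decode_frame, range(frame_count)))


frames = decode_clip(48)
print(frames[0], "...", frames[-1])
```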

You are, however, free to decode multiple clips simultaneously yourself. Just use REDline commands. Not GUI, but doable.
I have tripled my speed this way. So someone with an old i7 920 can 'out-benchmark' a less savvy SB-E owner, as far as bulk transcoding goes!!

-Les Dittert
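For the REDline route, the main trick is keeping a fixed number of transcodes running from a plain list of clips. A minimal sketch; `build_command()` is a hypothetical placeholder that just runs a tiny Python command so the script is runnable, and it does not know REDline's real options (check REDline's own help for those). The clip names are made up too:

```python
# Keep up to `parallel` transcodes running at once from a list of clips.
# build_command() is a HYPOTHETICAL placeholder; it does not use REDline's
# real options.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(clip: str) -> list[str]:
    # Replace with your actual REDline invocation (input clip, RMD/look
    # settings, output format and path).
    return [sys.executable, "-c", f"print('transcoding {clip}')"]


def run_all(clips: list[str], parallel: int) -> list[int]:
    """Run one process per clip, at most `parallel` at a time; return exit codes."""
    def run(clip: str) -> int:
        return subprocess.run(build_command(clip), capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(run, clips))


codes = run_all(["A001.R3D", "A002.R3D", "A003.R3D"], parallel=3)
print(codes)
```

The threads here only wait on child processes, so Python's GIL doesn't matter; each transcode process does its own heavy lifting. Start `parallel` near your thread count and adjust while watching CPU load.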

That's what I wanted to know. Potentially, the more cores/threads, the faster it could be; if you had dual octo-core Xeons (16 cores/32 threads), you could render 32 clips simultaneously (aka a brute-force method for up to 32x faster overall render times, in theory). To me, that'd be far easier than building a renderer that sends each frame to a new core/thread (which, as you suggest, shouldn't be that difficult either). And yeah, I totally didn't think about it as discrete frames; that should be pretty simple to do as well.

Les, you should start a thread outlining how to pound out more clips simultaneously using REDline commands. I think I asked you in another thread how you were doing it; it'd be really helpful to a lot of people who can't afford, or are reluctant to buy, a RedRocket and are just getting their Scarlets. If I knew how to make a simple GUI for the process I would: a simple list of the clips and their corresponding RMD(s), and go. As I said, it'd be ~8x faster than using a single core if you have a quad-core CPU with Hyperthreading.

If the demosaic is easily doable on the GPU it would at least allow full-res 4k/5k playback of .r3ds in the NLE. Not really necessary, but still helpful.
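Part of why demosaic suits a GPU: in a simple bilinear demosaic each output pixel depends only on a few neighboring sensor values, so every pixel can be computed independently (one GPU thread per pixel). A sketch for the green channel of an RGGB mosaic only; it is the textbook bilinear method, not RED's own debayer algorithm:

```python
# Bilinear demosaic, green channel only, for an RGGB mosaic. Every output
# pixel is computed independently of the others, which is what makes the
# step GPU-friendly (on a GPU the two loops become the thread grid).


def green_channel(raw: list[list[float]]) -> list[list[float]]:
    h, w = len(raw), len(raw[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (y + x) % 2 == 1:
                # RGGB: green samples sit where y + x is odd; keep them as-is.
                out[y][x] = float(raw[y][x])
            else:
                # Red or blue site: average the in-bounds up/down/left/right
                # neighbors, all of which are green samples.
                neighbors = [raw[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                out[y][x] = sum(neighbors) / len(neighbors)
    return out


mosaic = [[0, 10, 0, 12],
          [20, 0, 22, 0],
          [0, 30, 0, 32],
          [40, 0, 42, 0]]
print(green_channel(mosaic))
```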

Be careful what you call "old"; I'm rocking a 920 at 4.2GHz and it still whips the llama's ass in most cases :) I was actually thinking about getting a used 970 (because it's a 1366 hexacore) to tide me over until octocores become available... I could just drop it in and, bam, instantly go from 8 to 12 threads for a mere ~$300... Alas, if it doesn't clock to 4.2GHz+, it might actually be slower overall.
 
Thanks Jon. There always seems to be some issue somewhere when stepping up to 5K full-res; it brings a sharp drop. On an overclocked Core i7 970, 4K material plays at about 15fps at full-res, so I was expecting something more from the 3930K. It's particularly impressive that 4KHD full-res renders faster than real-time. Either way, half-res suits me just fine: at 5K half-res, it maxes out the resolution of my monitor. So unless you have a 4K monitor, full-res playback is unnecessary. By the time 4K monitors are more affordable, I hope the RED SDK is optimized and, of course, CPU performance will have advanced further, making real-time at 4K possible. Of course, if only GPU debayering could be worked out... It's true that it would cannibalize Rocket sales, though.

As for your decked-out iMac 27" seeming pathetically slow in comparison: indeed! It probably doesn't cost much more, either.
 
Don't get me wrong, I love RED in many respects, but they are also a company that needs to make money and not cannibalize their own products. But by keeping all of the debayer stuff inside their own SDK (which is fully understandable from an IP standpoint), they are also 'crippling' the post community and making themselves the bottleneck in keeping up with technological advances, such as fully utilizing the power of current-generation GPUs. I'm 110% sure it's possible to get full debayer of R3Ds in 5K with some of the latest-gen nVidia cards. There are quite a few JPEG 2000 GPU projects out there, some of which leverage CUDA. We all know the Red Rocket card wasn't developed by RED, so the question is whether we will see a Red Rocket II or if they will be 'forced' to go in another direction. We need 5K realtime playback soon!
 
So far I have tried to merely present facts and benchmarks without adding much commentary. GPU-accelerated debayer would ultimately be great, but I was thinking more along the lines that it would seem reasonable for them to unlock/optimize RCX Pro to at least use the CPU cores and threads to their potential. And if they were able to bring MPE (Premiere's Mercury Playback Engine) into the mix, I am confident I would be able to render 5k in realtime (or close to it) with my current build.

RedRocket is still viable and useful for those with systems that cannot handle such a load without the card's assistance. But if a system can handle the load natively, on its own merit, the SDK shouldn't hamper/throttle its performance just to sell proprietary JPEG 2000 cards. Given how gracious RED has been with free firmware updates and free software, I highly doubt that this is RED's motive, as some have alluded. That said, I am not Rob, nor is coding my strength, but it seems reasonable that with some tweaking of the code, RCX Pro should be able to use the full CPU for software-based debayer.
 