Weapon = FPGA???

Terry VerHaar · Mar 6, 2015

Something I just read prompted a question in my mind. Could one of the major evolutions going into Weapon be that it is based on Field Programmable Gate Arrays (FPGAs) rather than ASICs? I'm not an engineer, so I don't even know if this is feasible or logical, but I am curious.

Jacek Zakowicz · Mar 6, 2015

FPGA is a development platform for ASIC. ASIC is the next step. What Weapon has is most likely a 3rd gen ASIC (after the MX ASIC and Dragon ASIC). RED ONE is FPGA based.

David Ellison · Mar 7, 2015

What kind of improvements would that offer, Jacek?

Antony Newman · Mar 7, 2015

Switching from a 28nm ASIC to a 14nm ASIC (eg from Altera) could reduce power requirements by ~70%.

It could also enable ~4 x the functionality to be packed into the same chip area.

Fewer chips = less interchip comms -> which could further reduce power requirements.

I am not actually speculating what 'nm' process RED will use in the Weapon - just giving an example of what could happen Brain Miniaturisation wise.

AJ
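As a rough sanity check on the ~4x figure above, here is a minimal sketch that assumes ideal scaling, where the area per transistor goes with the square of the node name. That is an idealisation: real "nm" labels are closer to marketing names and don't scale this cleanly. The function name is made up for illustration, and the power figure is not modelled at all.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealised density gain, assuming transistor area scales with (feature size)^2."""
    return (old_nm / new_nm) ** 2

# Halving the linear dimension quarters the area per transistor,
# so the same chip area could hold roughly four times the logic.
gain = ideal_density_gain(28, 14)
print(f"28nm -> 14nm: ~{gain:.0f}x the functionality in the same area (ideal case)")
```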

Terry VerHaar · Mar 7, 2015

Jacek Zakowicz said:
FPGA is a development platform for ASIC. ASIC is the next step. What Weapon has is most likely a 3rd gen ASIC (after the MX ASIC and Dragon ASIC). RED ONE is FPGA based.

Interesting. In the article, here:

http://www.redsharknews.com/busines...ize,-quadruples-revenues-due-to-odyssey-sales

They make it sound like an advanced technology. To quote, "These seemingly miraculous devices behave like hardware - which means they're very fast - but can be completely reprogrammed, even to the extent that they can be rebooted with new code and act like a completely different type of device. "

Gavin Greenwalt · Mar 7, 2015

The article is right, FPGAs are cool. But they're also expensive, power hogs and slow by comparison. You want to move as many predictable functions into ASIC as possible. It's the same with processors. Sure, a GPU *can* decode H264, but every video card manufacturer dedicates purpose built hardware to it because that uses way less power, generates less heat and costs less to produce than a general purpose processor.

Ryan Sims · Mar 7, 2015

As Jacek said, the RED ONE used an FPGA. That's the reason it took over a minute to boot up. ASICs are like software baked in hardware. You can't change the code once it's designed. But that also means there is no boot time to load the code. I know you're thinking that there is still a boot time on an Epic and Scarlet that use ASICs. Yes, but it's just to load the code from the firmware that allows the camera to get software updates, which really have proved useful in my book.
All the heavy lifting/real time processing of the code is in the ASICs.

Hence when you really have developed and refined the process, you need to burn new ASICs and build a new machine such as Weapon since you can't update the code in an ASIC.

On the other hand, FPGAs can be reprogrammed, but that comes at the cost of being slow to boot and slower to process the data, with more demands on power and more waste heat. Not something that lends itself to the realtime processing needed for the massive data in a RED. FPGAs got us started in RED ONEs as a proof of concept, and were very flexible for updating the camera as it evolved. I'll bet Weapon uses the latest ASIC magic pixies.
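To make the "can be reprogrammed" point concrete, here is a toy sketch of the basic idea behind an FPGA logic cell: a small lookup table whose truth table is loaded as configuration data. This is only an illustration. Real devices add routing, flip-flops and much more, and the class and function names here are made up.

```python
class Lut4:
    """Toy 4-input lookup table: a 16-bit config word holds the whole truth table."""

    def __init__(self, config: int):
        self.load(config)

    def load(self, config: int) -> None:
        # "Reprogramming" is just loading a different 16-bit truth table.
        self.config = config & 0xFFFF

    def eval(self, a: int, b: int, c: int, d: int) -> int:
        # The four input bits select one bit of the truth table.
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.config >> index) & 1


def truth_table(fn) -> int:
    """Build the 16-bit config word for any 4-input boolean function."""
    config = 0
    for index in range(16):
        bits = [(index >> i) & 1 for i in range(4)]
        if fn(*bits):
            config |= 1 << index
    return config


lut = Lut4(truth_table(lambda a, b, c, d: a & b & c & d))  # behaves as a 4-input AND
print(lut.eval(1, 1, 1, 1), lut.eval(1, 0, 1, 1))

lut.load(truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))    # same cell, now a parity gate
print(lut.eval(1, 0, 0, 0), lut.eval(1, 1, 0, 0))
```

An ASIC, by contrast, would have the AND gate (or the parity gate) wired in directly, with nothing to load and nothing to change afterwards.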

Terry VerHaar · Mar 7, 2015

Great information, guys. I am glad I asked.

Antony Newman · Mar 7, 2015

Boot times just come down to implementation of your startup software.

Blu-ray players that take 30 seconds to 'switch on' are an example of how NOT to do it.

iPad or 'wake from sleep' style would still require a retrospective hardware check - but are instant on.

In some camera makers' firmware ... they use 'wait & hope' during their startup (eg before accessing CF) which is insane.

AJ
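One alternative to 'wait & hope' is to poll a readiness check with a timeout, so startup proceeds as soon as the device is actually ready and fails loudly if it never is. The sketch below is generic and not taken from any camera firmware; `wait_until_ready` and `card_ready` are hypothetical names, and the 50 ms "card" is a stand-in.

```python
import time


def wait_until_ready(is_ready, timeout_s: float = 2.0, poll_s: float = 0.01) -> float:
    """Poll a readiness check; return elapsed seconds, or raise TimeoutError."""
    start = time.monotonic()
    while not is_ready():
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("device did not become ready")
        time.sleep(poll_s)
    return time.monotonic() - start


# Hypothetical stand-in for a card slot that becomes ready after about 50 ms.
ready_at = time.monotonic() + 0.05


def card_ready() -> bool:
    return time.monotonic() >= ready_at


elapsed = wait_until_ready(card_ready)
print(f"ready after {elapsed:.3f}s, with no fixed delay")
```

A fixed sleep has to be sized for the slowest device ever expected, so every boot pays that cost. Polling pays only for the time the device really takes.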

ericyoung · Mar 7, 2015

There's still a programmable element, otherwise camera firmware upgrades couldn't add significant new features.

Gavin Greenwalt · Mar 7, 2015

Sure but those are generalish purpose chips that allow that. The UI is I'm sure running on a rather normal ARM or MIPS CPU with Linux and a GPU while the non-programmable bits like encoding/decoding JP2k at 100 fps are handled in purpose built circuitry.

Adding something like the GIO scope takes 1 watt on a GPU. Something like decoding wavelets and debayering at 100 fps would take 500 watts on an ARM/GPU combo.

Elsie N · Mar 7, 2015

Terry VerHaar said:
Great information, guys. I am glad I asked.

+1!

Antony Newman · Mar 7, 2015

Knocking up a false colour display on an ARM v7 (like the ARM946ES) did not take much in the way of CPU power: I think 2.8k @ 40fps with the chip running at 500MHz would take <0.5 Watts.

Never looked into wavelet decompression ... but I suspect that these could be done on an iPhone6+ @24fps just using the CPU.
If I was a betting man ... I reckon Metal could probably keep the GPU busy with the debayer too.

Wild guess - I don't think that you need anywhere near 500W to do 100fps.

AJ

Gavin Greenwalt · Mar 8, 2015

It takes an octacore Intel Xeon running at several GHz to decode 24fps 6k REDCODE/JP2k, paired with an Nvidia GTX 970 to debayer. There are wavelets that are easier to decode (like Cineform) but REDCODE/JP2k is not one of them. I know my 6 core machine has trouble with real-time 6k 24p, and it's a hexacore Intel processor paired with an Nvidia Titan. ARM v7 is not faster than Intel server class chips, so there's no reason to think an iPhone 6 can outperform my workstation.

Stacey Spears · Mar 8, 2015

Years ago ATI removed MPEG2 decoder HW from one of their GPUs to cut costs and tried to use the GPU for decoding. The CPU outperformed the GPU that year for MPEG2 decoding. Since then, every GPU has had a dedicated HW MPEG-2, AVC and VC-1 decoder as Gavin mentioned. The GPU on the Xbox One also has a HW MJPEG decoder for the video to/from Kinect. Most GPUs don't include HW MJPEG decoding.

HEVC is another beast. All CPUs are taxed on a current gen game console just to decode UHD HEVC at 24p. No HW HEVC decoders on any game console... yet. CABAC is not GPU friendly and is always done on the CPU if you don't have a dedicated HW decoder. Xbox 360 decodes AVC in GPU/CPU since AVC HW decoders did not exist back then.

The Lumagen video processors are all FPGA based. Early DVDO VP50 and 50Pro's were FPGA based. The later DVDO Duo was ASIC based. It took 45 minutes to update the FW over serial on a 50/50Pro and 5 seconds on a Duo.

FPGAs are great to work out the kinks or shipping in really low volume like a consumer video processor. But they are expensive.

Antony Newman · Mar 8, 2015

To effectively determine what could be done on something like an A8X ... I'd need to analyse the existing decompression code (not sure if this is available)

3 cores running at 1.5GHz + an 8 core mobile GPU + hand optimised ASM might perform favourably when compared to a server running stock compiled code.

...When I see how slow Excel can run on a 10 core 64GB trash can ... I get the feeling that some people shouldn't be allowed to code ;-)

AJ

EDIT : Maybe I should follow this and say that the only real time decompressor that I optimised (well rewrote in ASM) was a SGI routine. Wavelet decompression may well be 'unfriendly'

Gavin Greenwalt · Mar 8, 2015

Find me a JPEG2000 decoder that runs on an ARM chip and can handle 19 megapixel frames at 24fps. I'll be waiting a while because they don't exist. An A8X is on par with an Intel Atom not a Broadwell core. And even my i5 based system which smokes an Atom on CPU tasks is brought to its knees with 6k.

As Stacey points out, even H265 struggles at 4k @ 24fps on modern CPUs, and the MPEG group avoided wavelets specifically due to their processor taxation. H265 is still mostly a straight up DCT codec.

However, if you think you can optimize RED's SDK to handle REDCODE at even 60fps in real-time on a single Broadwell core (which has more power than an A8X) then draft out an example on JPEG2000 and I bet RED will pay you handsomely (as well as many others in the industry).
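For a sense of scale, here is the raw arithmetic behind "19 megapixel frames at 24fps", plus the cycle budget per pixel on the 3 core, 1.5GHz configuration mentioned earlier in the thread. This is only arithmetic under those stated numbers. It says nothing about how many cycles a JPEG2000 or REDCODE decode actually needs, and the function names are made up.

```python
def pixel_rate(megapixels: float, fps: float) -> float:
    """Pixels per second that must be decoded to keep up in real time."""
    return megapixels * 1e6 * fps


def cycles_per_pixel(cores: int, clock_hz: float, rate: float) -> float:
    """Total CPU cycles available per pixel, assuming perfect use of every core."""
    return cores * clock_hz / rate


rate = pixel_rate(19, 24)                   # 19 MP frames at 24 fps
budget = cycles_per_pixel(3, 1.5e9, rate)   # 3 cores at 1.5 GHz
print(f"{rate / 1e6:.0f} Mpixel/s, ~{budget:.1f} cycles per pixel")
```

That works out to 456 Mpixel/s and a little under 10 cycles per pixel for entropy decoding, the inverse wavelet transform and everything else, before any GPU help.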

Antony Newman · Mar 8, 2015

I note that even for JPEG2000, some compiled libraries (J2K-2000) are an order of magnitude faster than the open source (OpenJPEG);
that pre-calculated data in lookup tables can often make light work of computationally intensive algorithms;
that the time taken to process an image is dependent on reframing a problem (algorithm) into the best match for the target device.

In the case of the 500MHz ARMv7 - there was a need for per pixel square roots, and per pixel RGB->YUV transformation, with 7bit colour accuracy to match the LCD driver.
Ultimately, the instruction cache was replaced with lookup tables and partial calculation results -> and the heavy lifting rewritten iteratively in thousands of lines of assembler to squeeze every drop of performance out of the thing.

If Red include the sending of an unencrypted R3D's format wirelessly (even 1/2 or 1/4 rate), and the SDK enables me to write the end-to-end display ... I would be tempted to see what an iOS device can do.

AJ
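The lookup table idea above can be sketched in a few lines: do the expensive per-value work once, up front, so the per-pixel work is a single table index. This is a generic illustration and not the assembler described in the post; the thresholds and colours are arbitrary choices, not any camera's false colour mapping.

```python
def build_false_colour_lut() -> list:
    """Precompute one RGB triple for each of the 256 possible 8-bit luma values."""
    lut = []
    for luma in range(256):
        if luma < 16:
            lut.append((128, 0, 255))        # crushed shadows: purple (arbitrary)
        elif luma > 235:
            lut.append((255, 0, 0))          # clipped highlights: red (arbitrary)
        else:
            lut.append((luma, luma, luma))   # everything else: plain grey
    return lut


LUT = build_false_colour_lut()


def false_colour(luma_pixels: bytes) -> list:
    """Per-pixel work is one table index; no per-pixel maths at display time."""
    return [LUT[p] for p in luma_pixels]


frame = bytes([0, 100, 250])   # tiny stand-in for a frame of luma samples
print(false_colour(frame))
```

The same trick applies to anything that depends only on a small input range, such as a square root or a colour transform over 7 or 8 bit values.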

Wayne Morellini · Mar 27, 2015

Generally what people say is right: to deal with existing heat and power issues it is preferable not to use FPGAs. However, it is likely there are some minor FPGA circuits in there among other things. RED used FPGAs, I think, in the original RED ONE, but the costs themselves weigh against it at the larger manufacturing volumes that pay for ASIC development.

However, there is lower power and higher speed FPGA technology out there. One technology I was looking at a decade ago, to make a gaming console with a new 3D technology, was rated at 5GHz and looked like it could stretch out to 10-20GHz (like a frypan, most likely). But that was sucked up by the military.

Now as to ASICs beating processing arrays. I did come up with a scheme and estimated that it could beat reasonably complex ASICs in many situations, be dynamically programmable, and replace analogue circuits too (complex as in not really simple glue logic). Such a system can be cut into existing circuits such as TV logic boards to fix existing parts, or be made pin compatible, but in reality you would just make new circuits that use these standardised parts. Tall order, but an ASIC can beat it if the ASIC is finely dynamically segmented, allowing all unused sections to be put on standby or turned off when not in use, as they are doing these days by the look of it. Still, the scheme has an advantage in one area, and you save a heap on setting up manufacture and probably development, with development moving mostly in house, using very cheap standardised parts in house to manufacture, and FPGAs becoming mostly irrelevant. It would be a bit of a brain twister of a scheme involving many of my technical logical advances.

Wayne Morellini · Mar 27, 2015

I forgot, Nvidia is now trying to roughly do a grain scheme on their GPUs, which is rather like sending rocks down an hour glass instead.
