Vision-Based System Design Part 2 - Optimising SWaP-C in Embedded Vision Systems


Aaron Behman, Director of Strategic Marketing, Embedded Vision, Xilinx, Inc.

Adam Taylor CEng FIET, Embedded Systems Consultant. 

A growing number of smart systems in the automotive, medical, industrial and scientific spaces are dependent on high-quality image capture and processing, often at high speed and in full colour. The preceding article of this series discussed selection criteria for image sensors. This article examines key challenges and decisions encountered when developing the image-processing system. 

Time-to-market pressure often determines which aspects of the system are developed in-house, representing value-added activity, and which are purchased as Commercial Off The Shelf (COTS) blocks or subcontracted for development. Focusing on value-added activities and leveraging IP modules at the hardware, software and FPGA levels are key to meeting time-to-market targets.

As far as the technical challenges are concerned, embedded vision systems are typically developed for applications where size, weight, power and cost - often called SWaP-C - are driving factors. One way to improve SWaP-C is through tighter system integration, particularly in the processing system.

Image-Processing Pipeline and Algorithms

Almost all embedded vision systems incorporate an image-processing pipeline that interfaces with the selected sensor and performs the operations required to produce an image suitable for either further processing or transmission over a network. 

Within this image processing pipeline, various algorithms are applied to the received images depending upon the application being implemented. There are a number of commonly used algorithms for processes such as sharpening the image, improving contrast, or detecting features, objects or movement. 
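As a rough illustration of one such algorithm, the sketch below sharpens a greyscale image with a 3x3 convolution. The kernel values, the edge-replication border handling and the function names are assumptions chosen for this example, not something prescribed by any of the frameworks discussed in this article. A production pipeline would normally use an optimised library or a hardware implementation.

```python
# Illustrative 3x3 sharpening filter for a greyscale image held as a list of rows.
# Kernel and border handling are assumptions made for this sketch.

SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def clamp(value, low=0, high=255):
    """Limit value to the range low..high (8-bit pixel range by default)."""
    return max(low, min(high, value))


def sharpen(image):
    """Return a sharpened copy of image (rows of integer pixels, 0..255).

    Pixels outside the image are replaced by the nearest edge pixel.
    """
    height = len(image)
    width = len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    # Clamp coordinates so the window never leaves the image.
                    sy = clamp(y + ky - 1, 0, height - 1)
                    sx = clamp(x + kx - 1, 0, width - 1)
                    acc += SHARPEN_KERNEL[ky][kx] * image[sy][sx]
            # Saturate the result back into the 8-bit range.
            row.append(clamp(acc))
        result.append(row)
    return result
```

Because the kernel coefficients sum to one, flat regions pass through unchanged while transitions between dark and bright regions are exaggerated. The same multiply-accumulate structure maps naturally onto parallel hardware.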

These algorithms should be developed within a framework that allows the shortest possible time to market and promotes re-use of proven IP, while reducing non-recurring and recurring engineering costs. A number of frameworks are worth considering. 

* OpenVX - Open, royalty-free standard from the Khronos Group for cross-platform acceleration of computer-vision and image-processing applications

* OpenCV - Open Source Computer Vision, a collection of C / C++ libraries aimed at real-time computer vision

* OpenCL - Open Computing Language, an open standard based upon C for developing parallel-processing applications that run on devices such as GPUs and FPGAs

* SDSoC - Xilinx design environment that allows developers to initially implement algorithms written in C / C++ in the ARM(r) processing system of a Zynq(r) or UltraScale+ MPSoC device, profile the code base to identify performance bottlenecks, and then use Xilinx High Level Synthesis (HLS) to translate those bottlenecks into hardware-enabled IP that runs in the programmable logic (PL) portion of the device.

Using these frameworks together with HLS in an FPGA or All Programmable SoC design flow allows efficient development of embedded vision applications, which can be quickly demonstrated with hardware in the loop.

Processing Choices

Once the image completes the processing pipeline, how the data is output from the system is also important. At the highest level there are three broad choices. The first is to output the image to a display using a standard like VGA, HDMI, SDI or DisplayPort. The second is to transmit the image (or information extracted from it) elsewhere, such as to the cloud, for further processing. The third is to store the images on non-volatile media to be accessed at a later date. 

For most of these high-level choices at the end of the imaging chain, it is important to consider the image format to be used. This presents the choice of encoding the image using an industry-standard compression algorithm such as H.264 (MPEG-4 Part 10 Advanced Video Coding) or H.265 (High Efficiency Video Coding). Implementations of these algorithms are often called codecs, and allow for more efficient utilisation of communication and network bandwidth or a reduction in the storage footprint, at the cost of some loss of fidelity. In applications where such a trade-off is not acceptable, the image can be transmitted or stored in its raw format or encoded in a lossless format. 
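To make the bandwidth argument concrete, the following back-of-envelope sketch estimates the raw data rate of a video stream and the effect of compression. The resolution, frame rate and compression ratio are illustrative assumptions only. Real codec performance depends heavily on content and encoder settings.

```python
# Back-of-envelope data-rate estimate for a video stream.
# All numeric inputs used in the example run are illustrative assumptions.


def raw_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed data rate of a video stream in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def compressed_bits_per_second(raw_rate, compression_ratio):
    """Data rate after a codec that achieves the given (assumed) ratio."""
    return raw_rate / compression_ratio


def storage_bytes(bits_per_second, seconds):
    """Storage needed to record a stream of the given rate and duration."""
    return bits_per_second * seconds / 8


if __name__ == "__main__":
    # Assumed example: 1920x1080 pixels, 24 bits per pixel, 60 frames per second.
    raw = raw_bits_per_second(1920, 1080, 24, 60)
    print(f"raw stream: {raw / 1e9:.2f} Gbit/s")
    # Hypothetical 100:1 compression ratio, chosen only for illustration.
    packed = compressed_bits_per_second(raw, 100)
    print(f"compressed stream: {packed / 1e6:.1f} Mbit/s")
    print(f"one minute compressed: {storage_bytes(packed, 60) / 1e6:.0f} MB")
```

Under these assumptions the raw stream needs close to 3 Gbit/s, more than a single Gigabit Ethernet link can carry, which is why the choice between a codec and raw or lossless formats matters for transmission and storage.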

Most codec implementations use a different colour space to that which is output by typical colour image sensors. The most commonly used colour spaces within embedded vision are: 

* Red, Green, Blue (RGB) - This contains the RGB information as output from the image sensor, and is commonly used as an output for simple interfaces like VGA 

* YUV - This contains luma (Y) and chrominance (U and V) components, and is used by most codecs and some display standards. Commonly used formats are YUV 4:4:4 and YUV 4:2:2. With 4:4:4, each of the three components is represented by eight bits for every pixel, making a 24-bit pixel. With 4:2:2, the U and V values are shared between pairs of horizontally adjacent pixels, allowing a more memory-efficient average of 16 bits per pixel.
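The relationship between the two colour spaces can be sketched as follows. The conversion uses the full-range BT.601 coefficients found in JFIF-style YCbCr, and the packing uses a Y0-U-Y1-V byte order with averaged chroma. Both are assumptions made for this example, since the coefficients, value ranges and byte order in a real system depend on the chosen codec or display standard.

```python
# Sketch: RGB to YUV (YCbCr) conversion and 4:2:2 packing.
# Coefficients (full-range BT.601) and byte order (Y0 U Y1 V) are assumptions.


def _to_byte(value):
    """Round to the nearest integer and saturate to 0..255."""
    return max(0, min(255, int(value + 0.5)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, Cb (U) and Cr (V)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _to_byte(y), _to_byte(cb), _to_byte(cr)


def pack_yuv422(rgb_row):
    """Pack a row of RGB pixels as 4:2:2, sharing U and V across each pair.

    Each pair of pixels becomes four bytes (Y0, U, Y1, V), which is an
    average of 16 bits per pixel instead of 24 bits for 4:4:4.
    """
    if len(rgb_row) % 2:
        raise ValueError("4:2:2 packing needs an even number of pixels")
    packed = bytearray()
    for i in range(0, len(rgb_row), 2):
        y0, cb0, cr0 = rgb_to_ycbcr(*rgb_row[i])
        y1, cb1, cr1 = rgb_to_ycbcr(*rgb_row[i + 1])
        # Average the chroma of the two pixels; luma stays per-pixel.
        packed += bytes([y0, (cb0 + cb1) // 2, y1, (cr0 + cr1) // 2])
    return bytes(packed)
```

A row of four RGB pixels occupies 12 bytes, while the packed 4:2:2 row occupies 8 bytes. The saving comes entirely from the chrominance components, because luma is retained at full resolution.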

One further decision that has a considerable impact on the image-processing chain and SWaP-C is the choice of where the majority of the image processing is to be implemented. This may be within the embedded vision system itself, which enables faster response times but also requires more processing and memory resources, leading to higher power demand. This will be the most common approach for embedded applications like ADAS or machine vision.

Alternatively, performing processing in the cloud requires the embedded vision system only to capture the image and transmit it using network-enabled technology. This approach can be suitable for applications such as medical imaging or scientific research, where processing can be very intensive and real-time results are not required.

To implement the processing chain, an embedded vision system needs at its heart a processing core capable not only of controlling the selected image sensor, but also of receiving the images, implementing the image-processing pipeline and transmitting the results over the desired network infrastructure or to the chosen display. These demanding requirements often lead to the selection of an FPGA or, increasingly, an All Programmable System on Chip (SoC). 

Xilinx Zynq All Programmable SoCs combine two high-performance ARM Cortex-A9 processors with FPGA fabric. The Processing System (PS) can be used to communicate with a host over Gigabit Ethernet, PCIe or other interfaces like CAN while also performing general system housekeeping. The Programmable Logic (PL) section exploits the parallel nature of FPGA fabric to receive and process the images extremely efficiently.

If the images must be transmitted over a network, on-chip Direct Memory Access (DMA) controllers can be used to efficiently move image data from the PL to DDR memory in the PS. Once within the PS DDR memory, the data can also be accessed by the DMA controllers of the selected transport medium. It is worth noting that the Cortex-A9 processors can be used to perform further processing on the image within the PS DDR, and that the Zynq architecture also allows processed images to be moved from the PS DDR back into the image pipeline in the PL, giving maximum flexibility to choose the most efficient processing strategy. Figure 2 illustrates the tight integration of processing, memory control and interface functions within the Zynq device.


Following the sensor-selection guidance given in the first part of the series, this article has described a number of technologies, frameworks and devices that can be used to help satisfy stringent size, weight, power and cost (SWaP-C) constraints on high-performance embedded vision systems for demanding applications.




