Optimizing Image Analysis for Speed

General Rules

Rule #1: Do not compute what you do not need.

  • Use an image resolution well fitted to the task. The higher the resolution, the slower the processing.
  • Use the inRoi input of image processing functions to compute only the pixels that are needed in further processing steps.
  • If several image processing operations are applied in sequence to a confined region, it may be faster to use CropImage first.
  • Do not overuse images of types other than UInt8 (8-bit).
  • Do not use multi-channel images when no color information is being processed.
  • If some computations can be done only once, move them before the main program loop, or even to a separate function.

Rule #2: Prefer simple solutions.

  • Do not use Template Matching if simpler techniques such as Blob Analysis or 1D Edge Detection would suffice.
  • Prefer pixel-precise image analysis techniques (Region Analysis) and Nearest Neighbour (instead of Bilinear) image interpolation.
  • Consider extracting higher level information early in the program – for example it is much faster to process Regions than Images.

Rule #3: Mind the influence of the user interface.

  • Note that displaying data in the user interface takes a significant amount of time, regardless of the UI library used.
  • Mind the Diagnostic Mode: turn it off whenever you need to test speed. Diagnostic Mode can be turned on or off with the EnableAvlDiagnosticOutputs function, and one can check whether it is enabled with the GetAvlDiagnosticOutputsEnabled function.
  • Before optimizing the program, make sure that you know what really needs optimizing. Measure execution time or use a profiler.

Common Optimization Tips

Apart from the above general rules, there are also some common optimization tips related to specific functions and techniques. Here is a checklist:

  • Template Matching: Prefer high pyramid levels, i.e. leave the inMaxPyramidLevel input set to atl::NIL, or set it to a high value such as 4 to 6.
  • Template Matching: Prefer setting inEdgePolarityMode to a value other than Ignore and inEdgeNoiseLevel to Low.
  • Template Matching: Set the inMinScore input as high as possible.
  • Template Matching: If you process high-resolution images, consider setting the inMinPyramidLevel to 1 or even 2.
  • Template Matching: When creating template matching models, try to limit the range of angles with the inMinAngle and inMaxAngle inputs.
  • Template Matching: Consider limiting inSearchRegion. It might be set manually, but sometimes it also helps to use Region Analysis techniques before Template Matching.
  • Do not use these functions in the main program loop: CreateEdgeModel1, CreateGrayModel, TrainOcr_MLP, TrainOcr_SVM.
  • If you always transform images in the same way, consider functions from the Image Spatial Transforms Maps category instead of the ones from Image Spatial Transforms.
  • Do not use image local transforms with arbitrarily shaped kernels: DilateImage_AnyKernel, ErodeImage_AnyKernel, SmoothImage_Mean_AnyKernel. Consider the alternatives without the "_AnyKernel" suffix.
  • SmoothImage_Median can be particularly slow. Use Gaussian or Mean smoothing instead, if possible.

Library-specific Optimizations

There are some optimization techniques that are available only in Aurora Vision Library and not in Aurora Vision Studio. These are:

In-Place Data Processing

See: In-Place Data Processing.

Re-use of Image Memory

Most image processing functions allocate memory for the output images internally. However, if the same object is provided in consecutive iterations and the dimensions of the images do not change, then the memory can be re-used without re-allocation. This matters for performance, because re-allocation takes time that is not only significant but also non-deterministic. Thus, it is highly advisable to define the image variable before the loop in which it is computed:

// Slow code
while (...)
{
	Image image2;
	ThresholdImage(image1, atl::NIL, 128.0f, atl::NIL, 0.0f, image2);
}
// Fast code
Image image2;
while (...)
{
	ThresholdImage(image1, atl::NIL, 128.0f, atl::NIL, 0.0f, image2);
}
// Fast code (also in the first iteration)
Image image2(752, 480, PlainType::UInt8, 1, atl::NIL);	// memory pre-allocation (dimensions must be known)
while (...)
{
	ThresholdImage(image1, atl::NIL, 128.0f, atl::NIL, 0.0f, image2);
}

Skipping Background Initialization

Almost all image processing functions of Aurora Vision Library have an optional inRoi parameter, which defines a region of interest. Outside this region the output pixels are initialized with zeros. When the ROIs are very small, this initialization can take a significant share of the total time. If the operation is an internal step and the consecutive operations do not read that memory, the initialization can be skipped by setting the IMAGE_DIRTY_BACKGROUND flag on the output image. For example, this is how dynamic thresholding is implemented internally in AVL, where the out-of-roi pixels of the blurred image are not meaningful:

Image blurred;
blurred.AddFlags(IMAGE_DIRTY_BACKGROUND);
SmoothImage_Mean(inImage, inRoi, inSourceRoi, atl::NIL, KernelShape::Box, radiusX, radiusY, blurred);
ThresholdImage_Relative(inImage, inRoi, blurred, inMinRelativeValue, inMaxRelativeValue, inFuzziness, outMonoImage);

Library Initialization

Before you call any AVL function, it is recommended to call the InitLibrary function first. This function is responsible for precomputing the library's global data. If it is not called explicitly, it will be invoked within the first call of any other AVL function, adding some extra time to it.

Configuring Parallel Computing

The functions of Aurora Vision Library internally use multiple threads to utilize the full power of multi-core processors. By default they use as many threads as there are physical processors. This is the best setting for the majority of applications, but in some cases a different number of threads may result in faster execution. If you need maximum performance, it is advisable to experiment with the ControlParallelComputing function using both a higher and a lower number of threads. In particular:

  • If the number of threads is higher than the number of physical processors, then it is possible to utilize the Hyper-Threading technology.
  • If the number of threads is lower than the number of physical processors (e.g. 3 threads on a quad-core machine), then the system has at least one core available for background threads (like image acquisition, GUI or computations performed by other processes), which may improve its responsiveness.

Configuring Image Memory Pools

Memory allocation is among the most significant factors affecting function performance. Most of the functions available in Aurora Vision Library re-use their memory buffers between consecutive iterations, which is highly beneficial for their performance. Some functions, however, still allocate temporary image buffers, because doing otherwise would make them less convenient to use. To overcome this limitation, there is the ControlImageMemoryPools function, which can turn on a custom memory allocator for temporary images.

There is also a way to pre-allocate image memory before the first iteration of the program starts. For this purpose, use the InspectImageMemoryPools function at the end of the program and, after the program is executed, copy its outPoolSizes value to the input of a ChargeImageMemoryPools function executed at the beginning. In some cases this will improve the performance of the first iteration of the program.
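The idea behind such pools can be sketched in plain C++ (a conceptual illustration only, not the AVL implementation): buffers are keyed by size and handed back to the pool instead of being freed, so subsequent acquisitions of the same size re-use memory rather than re-allocating it.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Conceptual buffer pool: Acquire() returns a pooled buffer of the requested
// size when one is available, and allocates a fresh one otherwise; Release()
// stores the buffer for later re-use instead of freeing it.
class BufferPool
{
public:
    std::vector<unsigned char> Acquire(std::size_t size)
    {
        auto it = pool_.find(size);
        if (it != pool_.end() && !it->second.empty())
        {
            std::vector<unsigned char> buf = std::move(it->second.back());
            it->second.pop_back();
            return buf;                               // re-used buffer, no allocation
        }
        return std::vector<unsigned char>(size);      // first use: allocate
    }

    void Release(std::vector<unsigned char> buf)
    {
        pool_[buf.size()].push_back(std::move(buf));  // keep for later re-use
    }

private:
    std::map<std::size_t, std::vector<std::vector<unsigned char>>> pool_;
};
```

Releasing a buffer and acquiring the same size again returns the very same memory block, which is the effect the AVL pools achieve for temporary images.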

Using GPGPU/OpenCL Computing

Some functions of Aurora Vision Library allow moving computations to an OpenCL-capable device, such as a graphics card, in order to speed up execution. After proper initialization, OpenCL processing is performed completely automatically by the supporting functions, without changing their use pattern. Refer to the "Hardware Acceleration" section of the function documentation to find out which functions support OpenCL processing and what their requirements are. Be aware that the resulting performance after switching to an OpenCL device may vary and may not always be a significant improvement over CPU processing. The actual performance of the functions must always be verified on the target system by proper measurements.

To use OpenCL processing in Aurora Vision Library the following is required:

  • a processing device installed in the target system that supports the OpenCL C language in version 1.1 or greater,
  • a proper and up-to-date device driver installed in the system,
  • a proper OpenCL runtime software provided by its vendor.

OpenCL processing is supported for example in the following functions: RgbToHsi, HsiToRgb, ImageCorrelationImage, DilateImage_AnyKernel.

To enable OpenCL processing, the AvsFilter_InitGPUProcessing function must be executed at the beginning of the program. Please refer to that function's documentation for further information.