
Deep Learning

Table of contents:

  1. Introduction
  2. Anomaly Detection
  3. Feature Detection
  4. Object Classification
  5. Instance Segmentation
  6. Point Location
  7. Troubleshooting

1. Introduction

Deep Learning is a breakthrough machine learning technique in computer vision. It learns from training images provided by the user and can automatically generate solutions for a wide range of image analysis applications. Its key advantage, however, is that it is able to solve many applications that have been too difficult for traditional, rule-based algorithms. Most notably, these include inspections of objects with high variability of shape or appearance, such as organic products, highly textured surfaces or natural outdoor scenes. Moreover, when using ready-made products such as our Deep Learning Add-on, the required programming effort is reduced almost to zero. On the other hand, deep learning shifts the focus to working with data: preparing high-quality image annotations and experimenting with training parameters now tend to take most of the application development time.

Typical applications are:

  • detection of surface and shape defects (e.g. cracks, deformations, discoloration),
  • detecting unusual or unexpected samples (e.g. missing, broken or low-quality parts),
  • identification of objects or images with respect to predefined classes (e.g. sorting machines),
  • location, segmentation and classification of multiple objects within an image (e.g. bin picking),
  • product quality analysis (including fruits, plants, wood and other organic products),
  • location and classification of key points, characteristic regions and small objects.

The use of deep learning functionality consists of two stages:

  1. Training – generating a model based on features learned from training samples,
  2. Inference – applying the model on new images in order to perform the actual machine vision task.

The difference to the traditional image analysis approach is presented in the diagrams below:

Traditional approach: The algorithm must be designed by a human specialist.


Machine learning approach: We only need to provide a training set of labeled images.


Overview of Deep Learning Tools

  1. Anomaly Detection – this technique is used to detect anomalous (unusual or unexpected) samples. It only needs a set of fault-free samples to learn the model of normal appearance. Optionally, several faulty samples can be added to better define the threshold of tolerable variations. This tool is especially useful in cases where it is difficult to specify all possible types of defects or where negative samples are simply not available. The outputs of this tool are: a classification result (normal or faulty), an abnormality score and a (rough) heatmap of anomalies in the image.

    An example of missing object detection using the DL_DetectAnomalies2 tool.
    Left: The original image with a missing element. Right: The classification result with a heatmap of anomalies.

  2. Feature Detection (segmentation) – this technique is used to precisely segment one or more classes of pixel-wise features within an image. The pixels belonging to each class must be marked by the user in the training step. The result of this technique is an array of probability maps, one for every class.

    An example of image segmentation using the DL_DetectFeatures tool.
    Left: The original image of the fundus. Right: The segmentation of blood vessels.

  3. Object Classification – this technique is used to identify an object in a selected region as one of the user-defined classes. First, it is necessary to provide a training set of labeled images. The results of this technique are: the name of the detected class and a classification confidence level.

    An example of object classification using the DL_ClassifyObject tool.

  4. Instance Segmentation – this technique is used to locate, segment and classify one or multiple objects within an image. The training requires the user to draw regions corresponding to objects in an image and assign them to classes. The result is a list of detected objects – with their bounding boxes, masks (segmented regions), class IDs, names and membership probabilities.

    An example of instance segmentation using the DL_SegmentInstances tool. Left: The original image. Right: The resulting list of detected objects.

  5. Point Location – this technique is used to precisely locate and classify key points, characteristic parts and small objects within an image. The training requires the user to mark points of appropriate classes on the training images. The result is a list of predicted point locations with corresponding class predictions and confidence scores.

    An example of point location using the DL_LocatePoints tool. Left: The original image. Right: The resulting list of detected points.

Basic Terminology

You do not need specialist scientific knowledge to develop your own deep learning solutions. However, it is highly recommended to understand the basic terminology and principles behind the process.

Deep neural networks

Adaptive Vision provides access to several standardized deep neural network architectures created, adjusted and tested to solve industrial machine vision tasks. Each of the networks is a set of trainable convolutional filters and neural connections which can model complex transformations of an image, with the goal of extracting relevant features and using them to solve a particular problem. However, these networks are useless without a sufficient amount of good quality data provided for the training process. This documentation presents necessary practical hints on creating an effective deep learning model.

Depth of a neural network

Due to various levels of task complexity and different expected execution times, the user can choose one of five available network depths. The Network Depth parameter is an abstract value defining the memory capacity of a neural network (i.e. the number of layers and filters) and its ability to solve more complex problems. The list below gives hints about selecting the proper depth for given task characteristics and conditions.

  1. Low depth (value 1-2)

    • A problem is simple to define.
    • A problem could be easily solved by a human inspector.
    • A short time of execution is required.
    • Background and lighting do not change across images.
    • Well-positioned objects and good quality of images.
  2. Standard depth (default, value 3)

    • Suitable for a majority of applications without any special conditions.
    • A modern CUDA-enabled GPU is available.
  3. High depth (value 4-5)

    • A large amount of training data is available.
    • A problem is hard or very complex to define and solve.
    • Complicated irregular patterns across images.
    • Long training and execution times are not a problem.
    • A large amount of GPU RAM (≥4GB) is available.
    • Varying background, lighting and/or positioning of objects.

Tip: Test your solution with a lower depth first, and then increase it if needed.

Note: A higher network depth will lead to a significant increase in memory and computational complexity of training and execution.

Training process

Model training is an iterative process of updating neural network weights based on the training data. One iteration involves some number of steps (determined automatically), each of which consists of the following operations (illustrated by the sketch after the list):

  1. selection of a small subset (batch) of training samples,
  2. calculation of an error measure for these samples,
  3. updating the weights to achieve lower error for these samples.
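
The following minimal sketch (plain Python with NumPy, not the actual Deep Learning Add-on code) shows these three operations for a toy linear model; the batch size, learning rate and error measure are illustrative assumptions.

    import numpy as np

    def training_step(weights, samples, labels, batch_size=8, learning_rate=0.01):
        # 1. Select a small random subset (batch) of the training samples.
        idx = np.random.choice(len(samples), size=batch_size, replace=False)
        x, y = samples[idx], labels[idx]
        # 2. Calculate an error measure for these samples (mean squared error of a toy linear model).
        predictions = x @ weights
        error = np.mean((predictions - y) ** 2)
        # 3. Update the weights to achieve a lower error for these samples (one gradient descent step).
        gradient = 2.0 * x.T @ (predictions - y) / batch_size
        weights = weights - learning_rate * gradient
        return weights, error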

At the end of each iteration, the current model is evaluated on a separate set of validation samples selected before the training process. The validation set is automatically chosen from the training samples and is used to simulate how the neural network would work with real images not used during training. Only the set of network weights corresponding to the best validation score at the end of training is saved as the final solution. Monitoring the training and validation scores (the blue and orange lines in the figures below) in consecutive iterations gives fundamental information about the progress:

  1. Both training and validation scores are improving – keep training, the model can still improve.
  2. Both training and validation scores have stopped improving – keep training for a few more iterations and stop if there is still no change.
  3. Training score is improving, but validation score has stopped improving or is getting worse – you can stop training, the model has probably started overfitting to your training data (memorizing exact samples rather than learning general rules about features). It may also be caused by too few diverse samples or by the problem being too simple for the selected network (try a lower Network Depth).

An example of correct training.

A graph characteristic for network overfitting.

The above graphs represent training progress in the Deep Learning Editor. The blue line indicates performance on the training samples, and the orange line represents performance on the validation samples. Please note the blue line is plotted more frequently than the orange line as validation performance is verified only at the end of each iteration.

Stopping Conditions

The user can stop the training manually by clicking the Stop button. Alternatively, it is also possible to set one or more stopping conditions:

  1. Iteration Count – training will stop after a fixed number of iterations.
  2. Iterations without Improvement – training will stop when the best validation score was not improved for a given number of iterations.
  3. Time – training will stop after a given number of minutes has passed.
  4. Validation Accuracy or Validation Error – training will stop when the validation score reaches a given value.
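
These conditions may be combined; the sketch below (plain Python, with hypothetical threshold values) illustrates how such a combination could be evaluated after each iteration.

    import time

    def should_stop(iteration, best_iteration, start_time, validation_accuracy,
                    max_iterations=200, max_without_improvement=20,
                    max_minutes=60, target_accuracy=0.99):
        if iteration >= max_iterations:                             # Iteration Count
            return True
        if iteration - best_iteration >= max_without_improvement:  # Iterations without Improvement
            return True
        if (time.time() - start_time) / 60.0 >= max_minutes:        # Time
            return True
        if validation_accuracy >= target_accuracy:                  # Validation Accuracy
            return True
        return False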

Preprocessing

To adjust performance to a particular task, the user can apply some additional transformations to the input images before training starts:

  1. Downsample – reduction of the image size to accelerate training and execution, at the expense of a lower level of detail possible to detect. Increasing this parameter by 1 results in downsampling by a factor of 2 in both image dimensions.
  2. Convert to Grayscale – while working with problems where color does not matter, you can choose to work with monochrome versions of images.
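
As a rough illustration (not the Add-on's actual preprocessing code), the sketch below shows the effect of both options on a NumPy image; naive subsampling and channel averaging are simplifying assumptions.

    import numpy as np

    def preprocess(image, downsample=0, convert_to_grayscale=False):
        factor = 2 ** downsample                  # each +1 halves both dimensions
        image = image[::factor, ::factor]         # naive subsampling, for illustration only
        if convert_to_grayscale and image.ndim == 3:
            image = image.mean(axis=2)            # average the color channels
        return image

    # Example: a 1024x768 image with Downsample = 2 becomes 256x192.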

Augmentation

When the number of training images may be too small to represent all possible variations of samples, it is recommended to use data augmentation, which adds artificially modified samples during training. This option also helps to avoid overfitting.

Available augmentations are:

  1. Luminance – change brightness of samples by a random percentage (between -ParameterValue and +ParameterValue) of pixel values (0-255).
  2. Noise – modify samples with uniform noise. Value of each channel and pixel is modified separately, by random percentage (between -ParameterValue and +ParameterValue) of pixel values (0-255).
  3. Gaussian Blur – blur samples with a kernel of a size randomly selected between 0 and the provided maximum kernel size.
  4. Rotation – rotate samples by a random angle between -ParameterValue and +ParameterValue. Measured in degrees.
  5. Flip Up-Down – reflect samples along the X axis.
  6. Flip Left-Right – reflect samples along the Y axis.
  7. Relative Translation – translate samples by a random shift, defined as a percentage (between -ParameterValue and +ParameterValue) of the tile (in Detect Features, Locate Points, Detect Anomalies 1 Local and Detect Anomalies 2) or the image size (in Classify Object, Segment Instances and Detect Anomalies 1 Global). Works independently in both X and Y dimensions.
  8. Scale – resize samples relatively to their original size by a random percentage between the provided minimum scale and maximum scale.
  9. Horizontal Shear – shear samples horizontally by an angle between -ParameterValue and +ParameterValue. Measured in degrees.
  10. Vertical Shear – analogous to Horizontal Shear.

Warning: the choice of augmentation options depends only on the task we want to solve. Sometimes they may be harmful to the quality of a solution. As a simple example, Rotation should not be enabled if rotations are not expected in the production environment. Enabling augmentations also increases the network training time (but does not affect execution time).
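
For illustration only, the sketch below applies the Luminance and Rotation augmentations with randomly drawn parameters; it assumes NumPy/SciPy and is not the Add-on's implementation.

    import random
    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, luminance=10, rotation=5):
        # Luminance: shift brightness by a random percentage of the 0-255 pixel range.
        shift = random.uniform(-luminance, luminance) / 100.0 * 255.0
        image = np.clip(image.astype(np.float32) + shift, 0, 255).astype(np.uint8)
        # Rotation: rotate by a random angle (in degrees) between -rotation and +rotation.
        angle = random.uniform(-rotation, rotation)
        return rotate(image, angle, reshape=False, mode='nearest')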

2. Anomaly Detection

Deep Learning Add-on provides two variants of the DetectAnomalies tool, representing two different approaches to anomaly detection in images. The DL_DetectAnomalies1 tool (reconstructive approach) uses deep neural networks to remove defects from the input image by reconstructing the affected regions. The DL_DetectAnomalies2 tool (classificational approach) uses deep neural networks to classify if a region contains a defect or not. The selection of a tool depends on the characteristics and requirements of the application. Some guidelines are presented in the table below.

Which problem am I facing, and which tool should I choose? (best fitting tool listed first)

  • Textiles and surfaces with regular or irregular patterns:
    1. DA1 Local: Denoising
    2. Detect Anomalies 2 or DA1 Local: Featurewise
    3. DA1 Local: Contextual

  • Container contents analysis (missing or additional elements, e.g. chocolate boxes or ready-to-take meals):
    1. DA1 Global (if separate objects are relatively large and with considerable spacing)
    2. Detect Anomalies 2
    3. DA1 Local: Featurewise

  • Large object analysis, highly heterogeneous content of a single image, similar images in the whole dataset:
    1. Detect Anomalies 2 or DA1 Local: Featurewise
    2. DA1 Global
    3. DA1 Local: Denoising

Global and Local network types

The DL_DetectAnomalies1 filter has an option to choose between the Local and Global types of processing. The default and easiest to use is the Global processing. It analyzes images holistically and is recommended for typical applications where objects are well positioned and have large defects, like in the image below. The Local processing is more computationally expensive and analyzes images in fragments of a size determined by the Feature Size parameter.

An example of cookie shape defect detection using the Global processing of the DL_DetectAnomalies1 filter.

Denoising, Contextual and Featurewise Approaches (Local network type only)

The DL_DetectAnomalies1 Local network provides three ways of defect detection:

  • Denoising
  • Contextual
  • Featurewise

Both the Denoising and Contextual approaches are based on reconstructing an image without defects and then comparing it with the original one. The former filters out all patterns smaller than Feature Size that were not present in the training set. The latter reconstructs image fragments using their neighborhood (context). The Denoising approach is the default one and is a few times faster than the Contextual approach. It is recommended to switch to the Contextual approach when the Denoising approach has problems with removing some specific defects. The third approach – Featurewise – is based on a different mechanism than the previous two: it compares image characteristics (features) of image fragments. This approach is recommended when defects are difficult to distinguish from their background, e.g. a missing piece of chocolate in dark brown packaging.

Feature Size

This parameter corresponds to the expected defect size and is the most significant one in terms of both the quality and speed of inspection. It is represented by a green square in the Image window of the Editor. The common denominator of all fragment-based approaches is that the Feature Size should be adjusted so that it contains common defects with some margin.

Specific effects and guidelines on different approaches are as follows:

  • Local Denoising: a large Feature Size will cause small defects to be ignored; however, the inference time will be shortened considerably. Heatmap precision will also be lowered.
  • Local Contextual, Local Featurewise, DL_DetectAnomalies2: works best in the range of 24-48. Consider using the Downsample parameter instead of increasing the Feature Size. A large Feature Size increases training and inference times as well as memory requirements.

Sampling Density

This parameter controls the spatial resolution of both training and inspection. The higher the density, the more precise the results, but the longer the computational time. It is recommended to use the Low density only for well positioned and simple objects. The High density is useful when working with complex textures and highly variable objects.

Model Usage

In the Detect Anomalies 1 variant, a model should be loaded with DL_DetectAnomalies1_Deploy prior to executing it with DL_DetectAnomalies1. Alternatively, the model can be loaded directly by the DL_DetectAnomalies1 filter, but it will then require time-consuming initialization in the first program iteration.

DL_DetectAnomalies1_Deploy
DL_DetectAnomalies1

In the Detect Anomalies 2 variant, a model should be loaded with DL_DetectAnomalies2_Deploy prior to executing it with DL_DetectAnomalies2. Alternatively, the model can be loaded directly by the DL_DetectAnomalies2 filter, but it will then require time-consuming initialization in the first program iteration.

DL_DetectAnomalies2_Deploy
DL_DetectAnomalies2

Running Adaptive Vision Deep Learning Service simultaneously with these filters is discouraged as it may result in degraded performance or errors.
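
The reason for the separate Deploy step can be illustrated with the hypothetical Python wrapper below (it does not use the real filter API): initialization is paid once before the main loop instead of in the first inspection.

    import time

    def load_model(model_directory):
        # Stand-in for the time-consuming initialization (model loading, GPU allocation).
        time.sleep(5.0)
        return lambda image: {"is_anomalous": False, "score": 0.0}

    class AnomalyDetector:
        def __init__(self, model_directory):
            self._model_directory = model_directory
            self._model = None

        def deploy(self):
            self._model = load_model(self._model_directory)

        def detect(self, image):
            if self._model is None:   # lazy loading: slow first iteration
                self.deploy()
            return self._model(image)

    detector = AnomalyDetector("my_model_dir")
    detector.deploy()                 # pay the initialization cost up front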

3. Feature Detection (segmentation)

This technique is used to detect pixel-wise regions corresponding to defects or – in a general sense – to any image features. A feature here may also be something like roads on a satellite image or an object part with a characteristic surface pattern. Sometimes it is also called pixel labeling as it assigns a class label to each pixel, but it does not separate instances of objects.

Training Data

Images loaded into the Editor of DL_DetectFeatures can be of different sizes and can have different ROIs defined. However, it is important to ensure that the scale and the characteristics of the features are consistent with those of the production environment.

The features can be marked using an intuitive interface in the Editor or can be imported as masks from a file.

Each and every feature should be marked on all training images, or the ROI should be limited to include only marked defects. Incompletely or inconsistently marked features are one of the main reasons for poor accuracy. REMEMBER: If you leave even a single piece of some feature unmarked, it will be used as a negative sample and this will highly confuse the training process!

The marking precision should be adjusted to the application requirements. The more precise the marking, the better the accuracy in the production environment. When marking with low precision, it is better to mark features with some excess margin.

An example of wood knots marked with low precision.

An example of tile cracks marked with high precision.

Multiple classes of features

It is possible to detect many classes of features separately using one model, for example roads and buildings as in the image below. Different features may overlap, but this is usually not recommended. It is also not recommended to define more than a few different classes in a single model. On the other hand, if there are two features that may be mutually confusing (e.g. roads and rivers), it is recommended to have separate classes for them and mark both, even if one of the classes is not really needed in the results. With the confusing feature clearly marked (and not just left as background), the neural network will focus better on avoiding misclassification.

An example of marking two different classes (red roads and yellow buildings) in one image.

Patch Size

Detect Features is an end-to-end segmentation tool which works best when analysing an image in a medium-sized square window. The size of this window is defined by the Patch Size parameter. It should be neither too small nor too big: typically much bigger than the size (width or diameter) of the feature itself, but much smaller than the entire image. In a typical scenario a value of 96 or 128 works quite well.

Performance Tip 1: a larger Patch Size increases the training time and requires more GPU memory and more training samples to operate effectively. When Patch Size exceeds 128 pixels and still looks too small, it is worth considering the Downsample option.

Performance Tip 2: if the execution time is not satisfactory, you can set the inOverlap filter input to False. It should speed up the inspection by 10-30% at the expense of less precise results.

Examples of Patch Size: too large or too small (red), maybe acceptable (yellow) and good (green). Remember that this is just an example and may vary in other cases.

Model Usage

A model should be loaded with the DL_DetectFeatures_Deploy filter before using the DL_DetectFeatures filter to perform segmentation of features. Alternatively, the model can be loaded directly by the DL_DetectFeatures filter, but this will result in a much longer first iteration.

DL_DetectFeatures_Deploy
DL_DetectFeatures

Running Adaptive Vision Deep Learning Service simultaneously with these filters is discouraged as it may result in degraded performance or errors.

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • To shorten the feature segmentation process, you can disable the inOverlap option. However, in most cases, it decreases segmentation quality.
  • Feature segmentation results are passed in the form of bitmaps to the outHeatmaps output as an array, and to outFeature1, outFeature2, outFeature3 and outFeature4 as separate images.
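
A common next step is to threshold a returned probability map to obtain a binary region; the sketch below (plain Python/NumPy with assumed threshold and minimum-area values) shows the idea.

    import numpy as np

    def heatmap_to_region(heatmap, threshold=0.5, min_area=20):
        # heatmap: single-class probability map with values in the range 0.0-1.0.
        mask = heatmap >= threshold        # pixels classified as the feature
        area = int(mask.sum())
        return mask, area >= min_area      # binary region and a simple presence decision

    # Synthetic example: a 12-pixel blob exceeds the threshold, so the feature is reported.
    heatmap = np.zeros((8, 8))
    heatmap[2:5, 2:6] = 0.9
    mask, found = heatmap_to_region(heatmap, threshold=0.5, min_area=5)
    print(found)   # True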

4. Object Classification

This technique is used to identify the class of an object within an image or within a specified region.

The Principle of Operation

During the training phase, the object classification tool learns a representation of the user-defined classes. The model uses the generalized knowledge gained from the samples provided for training and aims to obtain good separation between the classes.

Result of classification after training.

After the training process is completed, the user is presented with a confusion matrix. It indicates how well the model separates the user-defined classes. It simplifies identification of model accuracy, especially when a large number of samples have been used.

Confusion matrix presents correct (diagonal) and incorrect assignment of samples to the user defined classes.
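
The sketch below (plain Python/NumPy, with made-up class names and labels) shows how such a matrix is computed from true and predicted classes.

    import numpy as np

    def confusion_matrix(true_labels, predicted_labels, class_names):
        # Rows: true classes, columns: predicted classes; the diagonal holds correct assignments.
        n = len(class_names)
        matrix = np.zeros((n, n), dtype=int)
        for t, p in zip(true_labels, predicted_labels):
            matrix[class_names.index(t), class_names.index(p)] += 1
        return matrix

    classes = ["bolt", "nut", "washer"]
    true = ["bolt", "bolt", "nut", "washer", "nut"]
    pred = ["bolt", "nut",  "nut", "washer", "nut"]
    print(confusion_matrix(true, pred, classes))
    # [[1 1 0]
    #  [0 2 0]
    #  [0 0 1]]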

Training Parameters

In addition to the default training parameters (the list of parameters available for all Deep Learning algorithms), the DL_ClassifyObject tool provides a Detail Level parameter which enables control over the level of detail needed for a particular classification task. For the majority of cases the default value of 1 is appropriate, but if images of different classes are distinguishable only by small features (e.g. granular materials like flour and salt), increasing the value of this parameter may improve classification results.

Model Usage

A model should be loaded with the DL_ClassifyObject_Deploy filter before using the DL_ClassifyObject filter to perform classification. Alternatively, the model can be loaded directly by the DL_ClassifyObject filter, but this will result in a much longer first iteration.

DL_ClassifyObject_Deploy
DL_ClassifyObject

Running Adaptive Vision Deep Learning Service simultaneously with these filters is discouraged as it may result in degraded performance or errors.

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • Classification results are passed to outClassName and outClassIndex outputs.
  • The score value outScore indicates the confidence of classification.

5. Instance Segmentation

This technique is used to locate, segment and classify one or multiple objects within an image. The results of this technique are lists with elements describing detected objects – their bounding boxes, masks (segmented regions), class IDs, names and membership probabilities.

Note that, in contrast to the feature detection technique, instance segmentation detects individual objects and may be able to separate them even if they touch or overlap. On the other hand, instance segmentation is not an appropriate tool for detecting features like scratches or edges, which may have no object-like boundaries.

Original image.

Visualized instance segmentation results.

Training Data

The training phase requires the user to draw regions corresponding to objects on an image and assign them to classes.

Editor for marking objects.

Training Parameters

Instance segmentation training adapts to the data provided by the user and does not require any additional training parameters besides the default ones.

Model Usage

Use DL_SegmentInstances filter to perform instance segmentation.

Deep Learning: Segment Instances

Running Adaptive Vision Deep Learning Service simultaneously with these filters is discouraged as it may result in degraded performance or errors.

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • To set the minimum detection score, the inMinDetectionScore parameter can be used.
  • The maximum number of detected objects in a single image can be set with the inMaxObjectsCount parameter. By default it is equal to the maximum number of objects in the training data.
  • Results describing the detected objects (bounding boxes, masks, class IDs, names and scores) are passed to the corresponding filter outputs.
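
Conceptually, the score and count limits act as a filter over the detection list; the sketch below (plain Python, assuming each detection is a dict with at least a 'score' key) illustrates this and is not the filter's actual implementation.

    def filter_detections(detections, min_score=0.5, max_objects=None):
        # Keep only detections with a sufficient score (cf. inMinDetectionScore) ...
        kept = [d for d in detections if d["score"] >= min_score]
        # ... and at most max_objects of them, best first (cf. inMaxObjectsCount).
        kept.sort(key=lambda d: d["score"], reverse=True)
        return kept if max_objects is None else kept[:max_objects]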

6. Point Location

This technique is used to precisely locate and classify key points, characteristic parts and small objects within an image. The result of this technique is a list of predicted point locations with corresponding class predictions and confidence scores.

When to use point location instead of instance segmentation:

  • precise location of key points and distinctive regions with no strict boundaries,
  • location and classification of objects (possibly very small) when their segmentation masks and bounding boxes are not needed (e.g. in object counting).

When to use point location instead of feature detection:

  • coordinates of key points, centroids of characteristic regions, objects etc. are needed.

Original image.

Visualized point location results.

Training Data

The training phase requires the user to mark points of appropriate classes on the training images.

Editor for marking points.

Feature Size

In the case of the Point Location tool, the Feature Size parameter corresponds to the size of an object or characteristic part. If images contain objects of different scales, it is recommended to use a Feature Size slightly larger than the average object size, although it may require experimenting with different values to achieve the best possible results.

Performance tip: a larger feature size increases the training time and needs more memory and training samples to operate effectively. When feature size exceeds 64 pixels and still looks too small, it is worth considering the Downsample option.

Model Usage

A model should be loaded with the DL_LocatePoints_Deploy filter before using the DL_LocatePoints filter to perform point location and classification. Alternatively, the model can be loaded directly by the DL_LocatePoints filter, but this will result in a much longer first iteration.

DL_LocatePoints_Deploy
DL_LocatePoints

Running Adaptive Vision Deep Learning Service simultaneously with these filters is discouraged as it may result in degraded performance or errors.

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • To set the minimum detection score, the inMinDetectionScore parameter can be used.
  • The inMinDistanceRatio parameter can be used to set the minimum distance between two points for them to be considered separate detections. The distance is computed as MinDistanceRatio * FeatureSize. If the parameter is not enabled, the minimum distance is based on the training data.
  • To increase detection speed, at the cost of potentially slightly worse precision, inOverlap can be set to False.
  • Results describing the detected points (locations, class predictions and confidence scores) are passed to the corresponding filter outputs.
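
The effect of inMinDistanceRatio can be pictured as a simple distance-based suppression of near-duplicate points; the sketch below (plain Python, assuming points are (x, y, score) tuples) shows one possible interpretation, not the filter's actual algorithm.

    import math

    def suppress_close_points(points, min_distance_ratio=0.5, feature_size=32):
        # Two points closer than MinDistanceRatio * FeatureSize are treated as one detection;
        # the higher-scoring point wins.
        min_distance = min_distance_ratio * feature_size
        kept = []
        for x, y, score in sorted(points, key=lambda p: p[2], reverse=True):
            if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
                kept.append((x, y, score))
        return kept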

7. Troubleshooting

Below you will find a list of the most common problems.

1. Network overfitting

A situation in which a network loses its ability to generalize and focuses only on the training data.

Symptoms: during training, the validation graph stops at one level while the training graph continues to rise. Defects on training images are marked very precisely, but defects on new images are marked poorly.

A graph characteristic for network overfitting.

Causes:

  • The number of training samples is too small.
  • Training time is too long.

Possible solutions:

  • Provide more real samples of different objects.
  • Use more augmentations.
  • Reduce Network Depth.

2. Susceptibility to changes in lighting conditions

Symptoms: the network is not able to process images properly when even minor changes in lighting occur.

Causes:

  • Samples with variable lighting were not provided.

Solution:

  • Provide more samples with variable lighting.
  • Enable "Luminance" option for automatic lighting augmentation.

3. No progress in network training

Symptoms: even though the training time is optimal, there is no visible training progress.

Training progress with contradictory samples.

Causes:

  • The number of samples is too small or the samples are not variable enough.
  • Image contrast is too small.
  • The chosen network architecture is too small.
  • There is contradiction in defect masks.

Solution:

  • Modify lighting to expose defects.
  • Remove contradictions in defect masks.

Tip: Remember to mark all defects of a given type on the input images or remove images with unmarked defects. Marking only a part of defects of a given type may negatively influence the network learning process.

4. Training/sample evaluation is very slow

Symptoms: training or sample evaluation takes a lot of time.

Causes:

  • Resolution of the provided input images is too high.
  • Fragments that cannot possibly contain defects are also analyzed.

Solution:

  • Enable "Downsample" option to reduce the image resolution.
  • Limit ROI for sample evaluation.
  • Use a lower Network Depth.
