
Deep Learning

Table of contents:

  1. Overview
  2. Anomaly detection
  3. Feature detection (segmentation)
  4. Object classification
  5. Instance segmentation
  6. Point location
  7. Troubleshooting

1. Overview

Deep Learning is a milestone machine learning technique in computer vision. It automatically learns from the provided training images and can effectively generate solutions for a wide range of applications with minimal effort. The main advantages of this technique are simple configuration, short development time, versatility of possible applications, robustness to noise and high performance.

Common applications:

  • detection of surface and shape defects (e.g. cracks, deformations, discoloration),
  • detecting anomalous samples (e.g. missing, broken or low-quality parts),
  • identification of objects/images with respect to predefined classes (e.g. sorting machines),
  • localization, segmentation and classification of multiple objects within an image (e.g. bin picking),
  • quality analysis in variable environments,
  • localization and classification of key points, characteristic regions and small objects.

Using the deep learning functionality involves two stages:

  1. Training - generating a model based on features learned from training samples (using the Deep Learning Editor),
  2. Inference - applying the model to new images in order to perform certain machine vision tasks.

The difference from the classic image processing approach is presented in the diagrams below:

Classic approach: the algorithm is the missing element that needs to be designed by a human specialist.


Machine learning approach: only the manually labeled training images need to be provided.


Available Deep Learning tools

  1. Anomaly detection - This technique is used to detect anomalous samples. It only needs a set of fault-free samples to learn the model of normal appearance. Optionally, several faulty samples can be used to define the level of tolerable variations. This tool is useful especially in cases where defects are unknown, too difficult to define upfront or highly variable. The outputs of this tool are a classification result (normal or faulty), an abnormality score and a heatmap of defects in the image.

     An example of missing part detection using the DeepLearning_DetectAnomalies2 tool. Left: The original image with a missing element. Right: The classification result with a heatmap of defects overlay.


  2. Feature detection (segmentation) - This technique is used to precisely segment one or more classes of features within an image. The pixels belonging to each class must be marked by the user in the training step. The result of this technique is an array of probability maps, one for every class.

     An example of image segmentation using the DeepLearning_DetectFeatures tool. Left: The original image of the fundus. Right: The segmentation of blood vessels.


  3. Object classification - This technique is used to mark objects/images with one of the predefined classes. First, it is necessary to provide a training set of labeled images. The results of this technique are the name of a class and a classification confidence level for a given image.

     An example of object classification using the DeepLearning_ClassifyObject tool.


  4. Instance segmentation - This technique is used to locate, segment and classify single or multiple objects within an image. The training requires the user to draw regions corresponding to objects in an image and assign them to classes. The results of this technique are lists with elements describing the detected objects - their bounding boxes, masks (segmented regions), class IDs, names and membership probabilities.

     An example of instance segmentation using the DeepLearning_SegmentInstances tool. Left: The original image. Right: The resulting list of detected objects.

  5. Point location - This technique is used to precisely locate and classify key points, characteristic regions and small objects within an image. The training requires the user to mark points of appropriate classes on the training images. The result of this technique is a list of predicted point locations with corresponding class predictions and confidence scores.

     An example of point location using the DeepLearning_LocatePoints tool. Left: The original image. Right: The resulting list of detected points.


Basic terminology

Users do not need specialist scientific knowledge to design their own deep learning solutions. However, it may be very useful to understand the basic terminology and principles behind the process.

Deep neural networks

Adaptive Vision gives access to deep convolutional neural network architectures created, adjusted and tested to solve industrial-grade machine vision tasks. Each network is a set of trainable convolutional filters and neural connections which can model complex transformations of an image in order to extract relevant features and use them to solve a particular problem. However, these networks are useless without a proper amount of good-quality data provided for the training process (which adjusts the weights of the filters and connections). This documentation gives the necessary practical hints on preparing an effective deep learning model.

Due to the various levels of task complexity and the different expected execution times, users can choose one of five available network depths. The Network depth parameter is an abstract value defining the memory capacity of the network (i.e., the number of layers and filters) and its ability to solve more complex problems. The list below gives hints on selecting the proper depth for a task's characteristics and conditions.

  1. Low depth (value 1-2)

    • A problem is simple to define.
    • A problem could be easily solved by a human inspector.
    • A short time of execution is required.
    • Background and lighting do not change across images.
    • Objects are well positioned and images are of good quality.
  2. Standard depth (default, value 3)

    • Suitable for a majority of applications without any special conditions.
    • A modern CUDA-enabled GPU is available.
  3. High depth (value 4-5)

    • A large amount of training data is available.
    • A problem is hard or very complex to define and solve.
    • Complicated irregular patterns across images.
    • Long training and execution times are not a problem.
    • A large amount of GPU RAM (≥4GB) is available.
    • Varying background, lighting and/or positioning of objects.

Tip: test your solution with a lower depth first, and then increase it if needed.

Note: a higher network depth will lead to a significant increase in memory usage and in the computational complexity of training and execution.

Training

Model training is an iterative process of updating neural network weights based on the training data. One iteration involves a number of steps (determined automatically); each step consists of the following operations:

  1. selection of a small subset (batch) of training samples,
  2. calculation of network error for these samples,
  3. updating weights to achieve lower error for these samples.

At the end of each iteration, the current model is evaluated on a separate set of validation samples selected before the training process. The validation set is automatically chosen from the training samples. It is used to simulate how the neural network would work with real images not used during training. Only the set of network weights corresponding to the best validation score is saved as the final solution (see the sketch after the list below). Monitoring the training and validation scores (the blue and orange lines in the figures below) in consecutive iterations gives fundamental information about the progress:

  1. Both the training and validation scores are improving - keep training, the model can still improve.
  2. Both the training and validation scores have stopped improving - keep training for a few more iterations and stop if there is still no change.
  3. The training score is improving, but the validation score has stopped improving or is getting worse - you can stop training; the model has probably started overfitting your training data (memorizing exact samples rather than learning general rules about features). This may be caused by too few diverse samples or by a problem complexity that is too low for the selected network.
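
The training scheme described above can be illustrated with a minimal, self-contained Python sketch. It uses a toy linear model standing in for a neural network, and all data and hyperparameters are hypothetical; it only shows how batched weight updates and selection of the weights with the best validation score could fit together.

    import numpy as np

    # Toy 1-D regression problem standing in for "training images" (hypothetical data).
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 600)
    y = 3.0 * x + rng.normal(0.0, 0.1, 600)

    # The validation set is split off before training starts.
    x_train, y_train = x[:500], y[:500]
    x_val, y_val = x[500:], y[500:]

    w = 0.0                                   # a single trainable "weight"
    best_w, best_val_error = w, float("inf")
    batch_size, steps_per_iteration, lr = 32, 10, 0.1

    for iteration in range(20):
        for step in range(steps_per_iteration):
            idx = rng.choice(len(x_train), batch_size)        # 1. select a small batch
            xb, yb = x_train[idx], y_train[idx]
            grad = np.mean(2.0 * (w * xb - yb) * xb)          # 2. error gradient for the batch
            w -= lr * grad                                    # 3. update the weight
        val_error = np.mean((w * x_val - y_val) ** 2)         # validate at the end of the iteration
        if val_error < best_val_error:                        # keep only the best weights
            best_val_error, best_w = val_error, w

    print(f"best weight: {best_w:.3f}, validation error: {best_val_error:.5f}")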

An example of correct training.

A graph characteristic for network overfitting.

The graphs above represent the training progress in our Deep Learning Editor; the blue line indicates the performance on the training samples, and the orange line represents the performance on the validation samples. Please note that the blue line is plotted more frequently than the orange line, as validation performance is verified only at the end of each iteration.

Stopping Conditions

The user can stop the training manually by clicking the Stop button. Alternatively, it is also possible to set one or more stopping conditions (a simple check combining them is sketched after this list):

  1. Iteration Count ― training will stop after a fixed number of iterations.
  2. Iterations Without Improvement ― training will stop when the best validation score has not improved for a given number of iterations.
  3. Time ― training will stop after a given number of minutes has passed.
  4. Validation Accuracy or Validation Error ― training will stop when the validation score reaches a given value.
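
As a rough illustration, the following Python sketch combines the four conditions into a single check. It assumes that training stops as soon as any enabled condition is met; all parameter names are hypothetical and do not correspond to the actual editor settings.

    import time

    def should_stop(iteration, iterations_without_improvement, start_time, val_score,
                    max_iterations=None, max_no_improvement=None,
                    max_minutes=None, target_val_score=None):
        """Return True when any enabled stopping condition is met (an assumption of this sketch)."""
        if max_iterations is not None and iteration >= max_iterations:
            return True                                               # Iteration Count
        if max_no_improvement is not None and iterations_without_improvement >= max_no_improvement:
            return True                                               # Iterations Without Improvement
        if max_minutes is not None and (time.time() - start_time) / 60.0 >= max_minutes:
            return True                                               # Time
        if target_val_score is not None and val_score >= target_val_score:
            return True                                               # Validation Accuracy
        return False

    # Example: stop after 100 iterations, 10 stagnant iterations, 30 minutes or 98% accuracy.
    print(should_stop(iteration=42, iterations_without_improvement=10, start_time=time.time(),
                      val_score=0.95, max_iterations=100, max_no_improvement=10,
                      max_minutes=30, target_val_score=0.98))   # True (no improvement for 10 iterations)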

Preprocessing

To adjust the performance for a particular task, the user can apply additional transformations to the input images before training starts:

  1. Downsample ― reduction of the image size to accelerate training and execution times, at the expense of a lower level of detail possible to detect. Increasing this parameter by 1 will result in downsampling by a factor of 2 over both image dimensions (see the sketch after this list).
  2. Convert to grayscale ― when working with problems where color does not matter, you can choose to work with monochrome versions of the images.
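
The following simplified Python/NumPy sketch (not the actual implementation) illustrates the effect of these two options; in particular, each increase of Downsample by 1 halves both image dimensions.

    import numpy as np

    def preprocess(image, downsample=0, to_grayscale=False):
        """Illustrative preprocessing: 'downsample' halves both dimensions per level;
        grayscale conversion simply averages the color channels."""
        factor = 2 ** downsample
        image = image[::factor, ::factor]            # downsample by 2^n in both dimensions
        if to_grayscale and image.ndim == 3:
            image = image.mean(axis=2)               # collapse the color channels
        return image

    img = np.zeros((480, 640, 3))
    print(preprocess(img, downsample=2, to_grayscale=True).shape)    # (120, 160)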

Augmentation

When the number of training images is too small to represent all possible variations of samples, it is recommended to use data augmentation, which adds artificially modified samples during training. This option can also help avoid overfitting.

Available augmentations are listed below; a sketch of the random parameter sampling follows the list:

  1. Luminance ― change the brightness of samples by a random percentage (between -ParameterValue and +ParameterValue) of pixel values (0-255).
  2. Noise ― modify samples with uniform noise. The value of each channel and pixel is modified separately, by a random percentage (between -ParameterValue and +ParameterValue) of pixel values (0-255).
  3. Gaussian Blur ― blur samples with a kernel whose size is randomly selected between 0 and the provided maximum kernel size.
  4. Rotation ― rotate samples by a random angle between -ParameterValue and +ParameterValue. Measured in degrees.
  5. Flip Up-Down ― reflect samples along the X axis.
  6. Flip Left-Right ― reflect samples along the Y axis.
  7. Relative Translation ― translate samples by a random shift, defined as a percentage (between -ParameterValue and +ParameterValue) of the tile (in Detect Features, Locate Points, Detect Anomalies 1 Local and Detect Anomalies 2) or image size (in Classify Object, Segment Instances and Detect Anomalies 1 Global). Works independently in both the X and Y dimensions.
  8. Scale ― resize samples relative to their original size by a random percentage between the provided minimum and maximum scale.
  9. Horizontal Shear ― shear samples horizontally by an angle between -ParameterValue and +ParameterValue. Measured in degrees.
  10. Vertical Shear ― analogous to Horizontal Shear.
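
Below is a minimal Python sketch of how the random parameters for the augmentations listed above might be sampled; the value ranges are hypothetical and the image transformations themselves are not shown.

    import random

    def sample_augmentation_params(luminance=10, noise=5, max_blur_kernel=5, rotation=15,
                                   relative_translation=10, min_scale=90, max_scale=110,
                                   horizontal_shear=5, vertical_shear=5):
        """Draw one set of random augmentation parameters; 'ParameterValue' style options are
        sampled uniformly from [-value, +value], scale from [min_scale, max_scale]."""
        return {
            "luminance_percent": random.uniform(-luminance, luminance),
            "noise_percent": random.uniform(-noise, noise),     # applied per pixel and channel
            "blur_kernel_size": random.randint(0, max_blur_kernel),
            "rotation_deg": random.uniform(-rotation, rotation),
            "flip_up_down": random.choice([False, True]),
            "flip_left_right": random.choice([False, True]),
            "translation_percent": (random.uniform(-relative_translation, relative_translation),
                                    random.uniform(-relative_translation, relative_translation)),
            "scale_percent": random.uniform(min_scale, max_scale),
            "horizontal_shear_deg": random.uniform(-horizontal_shear, horizontal_shear),
            "vertical_shear_deg": random.uniform(-vertical_shear, vertical_shear),
        }

    print(sample_augmentation_params())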

Warning: the choice of augmentation options depends on the task to be solved, and in some cases they may be harmful to the quality of the solution. As a simple example, Rotation should not be enabled if rotations are not expected in the production environment. Enabling augmentations also increases the network training time (but does not affect execution time).


2. Anomaly detection

Deep Learning Add-on provides two variants of the DetectAnomalies tool, representing two different approaches to anomaly detection in images. The DeepLearning_DetectAnomalies1 tool (reconstructive approach) uses deep neural networks to remove defects from the input image by reconstructing the affected regions. The DeepLearning_DetectAnomalies2 tool (classificational approach) uses deep neural networks to classify whether a region contains a defect or not. The selection of a tool depends on the characteristics and requirements of the application. Some guidelines are presented in the diagram below.


The difference between the approaches is presented in the example below.

Left: The original image with a defect. Middle: An example of defect detection using DeepLearning_DetectAnomalies1. Right: An example of faster but less precise defect detection using DeepLearning_DetectAnomalies2.


Interactive histogram tool

The DetectAnomalies filters measure the deviation of samples from the normal image appearance learned during the training phase. If the deviation exceeds a given threshold, the image is marked as defective. The suggested threshold is calculated automatically after the training phase, but it can be adjusted by the user in the Deep Learning Editor using the interactive histogram tool described below.

After the training phase, scores are calculated for every training sample and presented in the form of a histogram; good samples are marked with green bars, bad samples with red bars. In the perfect case, the scores for good samples should all be lower than those for bad samples, and the threshold is calculated automatically to give the optimal accuracy of the model. However, the groups may sometimes overlap because of:

  1. incorrectly labeled samples,
  2. an inappropriate Feature size,
  3. an ambiguous definition of the expected defects,
  4. high variability of sample appearance or environmental conditions.

In order to achieve a more robust threshold, it is recommended to perform training with a large number of samples from both groups. If the number of samples is limited, the software allows the uncertainty area to be set manually with additional thresholds (the information about the confidence of the model can then be obtained from the hidden outIsConfident filter port for each sample).
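
The decision logic can be sketched as follows (simplified Python; the score and threshold values are hypothetical, and the returned confidence flag is only a stand-in for the outIsConfident port described above):

    def classify_anomaly_score(score, threshold, t1=None, t2=None):
        """A sample whose score exceeds the main threshold is reported as faulty; if an
        uncertainty area [t1, t2] is defined, scores falling inside it are flagged as
        not confident."""
        is_faulty = score > threshold
        is_confident = True
        if t1 is not None and t2 is not None:
            is_confident = not (t1 <= score <= t2)
        return is_faulty, is_confident

    print(classify_anomaly_score(0.42, threshold=0.40, t1=0.35, t2=0.45))   # (True, False)
    print(classify_anomaly_score(0.20, threshold=0.40, t1=0.35, t2=0.45))   # (False, True)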

The histogram tool where green bars represent correct samples and red bars represent defected samples. T marks the main threshold and T1, T2 define the area of uncertainty.

Left: a histogram with well-separated groups, indicating good accuracy of the model. Right: a histogram with overlapping groups, indicating poor accuracy of the model.

Global and Local network types

The DeepLearning_DetectAnomalies1 filter gives an option to choose between the Local and Global types of processing. The default and easiest to use is Global processing. It analyzes images holistically and is recommended for typical applications where objects are well positioned and have large defects, as in the image below. Local processing is more computationally expensive; it analyzes images in fragments whose size is determined by the Feature size parameter.

An example of cookie shape defect detection using the Global processing of the DeepLearning_DetectAnomalies1 filter.

Feature size

This parameter defines the expected defect size and is the most significant one in terms of both quality and speed of inspection. It is represented by a gray square in the Image window of the Editor. It needs to be adjusted to the specific application using the hints below:

  • The Feature size should be large enough to contain common defects with some margin.
  • A Feature size that is too small may lead to increased model complexity and longer processing time.
  • It is better to first try a larger Feature size (small defects should still be detected) or to use the Downsample option.
  • A Feature size that is too large may lead to less precise results and a lower resolution of the inspection.
  • The Feature size has been found to work best in the range of 24-48 for the DetectAnomalies tools.

Denoising, Contextual and Featurewise approaches (Local network type only)

The DeepLearning_DetectAnomalies1 Local network provides three ways of defect detection:

  • Denoising
  • Contextual
  • Featurewise

Both the Denoising and Contextual approaches are based on reconstructing an image without defects and then comparing it with the original one. The first one filters out all patterns smaller than the Feature Size that were not present in the training set. The latter reconstructs image fragments using their neighborhood (context). The Denoising approach is the default one and is a few times faster than the Contextual approach. It is recommended to use the Contextual approach when the Denoising approach has problems with removing some specific defects. The third approach - Featurewise - is based on a different mechanism than the previous two: it compares image characteristics (features) of image fragments and their contexts. This approach is recommended when defects are difficult to distinguish from their background, e.g. a missing piece of chocolate in dark brown packaging.
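
As a rough illustration of the reconstruct-and-compare idea behind the Denoising and Contextual approaches, the sketch below computes a per-pixel difference heatmap between an image and its defect-free reconstruction (simplified Python/NumPy, not the actual algorithm; the example data and threshold are hypothetical):

    import numpy as np

    def anomaly_heatmap(original, reconstruction, threshold):
        """The heatmap is the absolute per-pixel difference between the input image and its
        defect-free reconstruction; pixels whose deviation exceeds the threshold form the
        defect mask, and any such pixel marks the image as faulty."""
        heatmap = np.abs(original.astype(float) - reconstruction.astype(float))
        defect_mask = heatmap > threshold
        return heatmap, defect_mask, bool(defect_mask.any())

    original = np.array([[10, 10], [10, 200]], dtype=np.uint8)       # one bright defective pixel
    reconstruction = np.full((2, 2), 10, dtype=np.uint8)             # defect removed by the network
    heatmap, mask, is_faulty = anomaly_heatmap(original, reconstruction, threshold=50)
    print(is_faulty, mask.sum())   # True 1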

Sampling density

This parameter controls the spatial resolution of both training and inspection. The higher the density, the more precise the results, but the longer the computation time. It is recommended to use Low density only for well-positioned and simple objects. High density is useful when working with complex textures and highly variable objects.

Model usage

Use DeepLearning_DetectAnomalies1 or DeepLearning_DetectAnomalies2 filters.

DeepLearning_DetectAnomalies1
DeepLearning_DetectAnomalies2

3. Feature detection (segmentation)

This technique is used to precisely mark pixels belonging to one or more classes, called features, within an image. The features should be repeatable across the dataset and easy to define, because they need to be marked by the user first. A few common examples of features to detect using this filter are:

  • characteristic edges, lines or points,
  • characteristic patterns and small objects,
  • repeatable defects.

Preparing the data

Images used for training can be of different sizes and can have different ROIs defined. However, it is important to ensure that the scale and appearance of the features are consistent with the production environment.

Each and every feature should be marked on all training images, or the ROI should be limited to include only the marked defects. Inconsistently marked features are one of the main reasons for poor accuracy.

The marking precision should be adjusted to the application requirements. The more precise the marking, the better the accuracy in the production environment. When marking with low precision, it is better to mark features with some excess margin.

An example of knots marked with low precision.

An example of cracks marked with high precision.

Multiple classes of features

It is possible to detect multiple classes of features separately using the same model, for example roads and buildings as in the image below. Features may overlap, but this is usually not recommended. It is also not recommended to define more than a few different classes in a single model.

An example of marking two different classes (red roads and yellow buildings) in one image.

Patch Size

Detect Features is an end-to-end segmentation tool which works optimally by analyzing an image in a relatively small square window. The side length of this window is defined by the Patch Size parameter. The best results are achieved when the Patch Size contains a feature as well as its surroundings (context). It is recommended to follow the rule that an operator, given the same context, should be able to properly decide whether a given sample contains a defect or not. It is important to note that both too small and too large Patch Size values may lead to poor accuracy and long processing times. In a typical scenario a Patch Size of 96 or 128 should give the best results.

Performance tip: a larger Patch Size increases the training time and needs more GPU memory and training samples to operate effectively. When the Patch Size exceeds 128 pixels and still looks too small, it is worth considering the Downsample option.

Performance tip: if the execution time is not satisfactory, you can set the inOverlap filter input to False. It should speed up the inspection by 10-30% at the expense of less precise results.
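
The sketch below illustrates how an image might be tiled into square analysis windows of Patch Size; the half-patch stride used when overlap is enabled is an assumption of this sketch, not the documented behavior of inOverlap.

    def patch_grid(image_width, image_height, patch_size, overlap=True):
        """Return the top-left corners of square analysis windows covering the image;
        with overlap the stride is half the patch size, otherwise a full patch."""
        stride = patch_size // 2 if overlap else patch_size
        positions = []
        for top in range(0, max(image_height - patch_size, 0) + 1, stride):
            for left in range(0, max(image_width - patch_size, 0) + 1, stride):
                positions.append((left, top))
        return positions

    print(len(patch_grid(640, 480, patch_size=96, overlap=True)))    # 108 windows
    print(len(patch_grid(640, 480, patch_size=96, overlap=False)))   # 30 windows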

Examples of Patch Size: too large (red), optimal (green) and acceptable (orange). Remember that this is just a heuristic and can vary in some cases.

Model usage

Use DeepLearning_DetectFeatures filter to perform segmentation of features.

DeepLearning_DetectFeatures

Parameters:

Notices:

  • Enabling the inOverlap option increases segmentation quality, but also prolongs the segmentation process.
  • Since Adaptive Vision Studio 4.12, programs which contain only filters related to Feature Detection, Point Location or Object Classification no longer require a running Deep Learning Service. This is the result of a complete re-engineering of the underlying implementation - a general-purpose deep learning framework has been replaced with our own code specifically optimized for our purposes. However, if the program contains Deep Learning filters related to Anomaly Detection or Instance Segmentation, the Service must be up and running.


4. Object classification

This technique is used to identify the class of an object within an image.

Principle of operation

During training, object classification learns a representation of the user-defined classes. The model uses the generalized knowledge gained from the samples provided for training and aims to obtain good separation between the classes.

Result of classification after training.

After the training is completed, the user is presented with a confusion matrix. It indicates how well the model separated the user-defined classes and simplifies the assessment of model accuracy when a large number of samples is present.

The confusion matrix presents the assignment of samples to the user-defined classes.
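
For illustration, a confusion matrix can be computed from true and predicted labels as in the short Python sketch below; the class names and labels are hypothetical.

    from collections import Counter

    def confusion_matrix(true_labels, predicted_labels, classes):
        """Rows correspond to the true (user-defined) classes, columns to the predicted ones;
        each cell counts how many samples fall into that combination."""
        counts = Counter(zip(true_labels, predicted_labels))
        return [[counts[(t, p)] for p in classes] for t in classes]

    classes = ["bolt", "nut", "washer"]
    true_labels = ["bolt", "bolt", "nut", "washer", "nut"]
    predicted = ["bolt", "nut", "nut", "washer", "nut"]
    for cls, row in zip(classes, confusion_matrix(true_labels, predicted, classes)):
        print(cls, row)     # a perfect model would produce a purely diagonal matrix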

Training parameters

In addition to the default training parameters (the list of parameters available for all Deep Learning algorithms), the Classify Object tool provides the Detail Level parameter, which controls the level of detail needed for a particular classification task. For the majority of cases the default value of 1 is appropriate, but if images of different classes are distinguishable only by small features (e.g. granular materials like flour and salt), increasing the value of this parameter may improve classification results.

Model usage

Use DeepLearning_ClassifyObject filter to perform classification.

DeepLearning_ClassifyObject

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • Classification results are passed to the outClassName and outClassIndex outputs.
  • The outScore value indicates the confidence of the classification.
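
A minimal sketch of how these outputs can be interpreted is shown below; the per-class confidences and class names are hypothetical, and the actual score computation inside the tool is not specified here.

    def top_classification(class_names, confidences):
        """Pick the class with the highest confidence: its name and index play the role of
        outClassName / outClassIndex, and its confidence the role of outScore."""
        index = max(range(len(confidences)), key=confidences.__getitem__)
        return class_names[index], index, confidences[index]

    print(top_classification(["flour", "salt", "sugar"], [0.08, 0.81, 0.11]))   # ('salt', 1, 0.81)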

Notices:

  • Since Adaptive Vision Studio 4.12, programs which contain only filters related to Feature Detection, Point Location or Object Classification no longer require a running Deep Learning Service. This is the result of a complete re-engineering of the underlying implementation - a general-purpose deep learning framework has been replaced with our own code specifically optimized for our purposes. However, if the program contains Deep Learning filters related to Anomaly Detection or Instance Segmentation, the Service must be up and running.


5. Instance segmentation

This technique is used to locate, segment and classify single or multiple objects within an image. The results of this technique are lists with elements describing the detected objects - their bounding boxes, masks (segmented regions), class IDs, names and membership probabilities.

Note that in contrast to the feature detection technique, instance segmentation detects individual objects and may be able to separate them even if they touch or overlap. On the other hand, instance segmentation is not an appropriate tool for detecting and segmenting features such as scratches or cracks.

Original image.

Visualized instance segmentation results.

Preparing the data

The training requires the user to draw regions corresponding to objects in an image and assign them to classes.

Editor for marking objects.

Training parameters

Instance segmentation training adapts to the data provided by a user and does not require any additional training parameters besides the default ones.

Model usage

Use DeepLearning_SegmentInstances filter to perform instance segmentation.

Deep Learning: Segment Instances

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • To set the minimum detection score, the inMinDetectionScore parameter can be used.
  • The maximum number of detected objects in a single image can be set with the inMaxObjectsCount parameter. By default, it is equal to the maximum number of objects in the training data (see the sketch after this list).
  • Results describing the detected objects are passed to the following outputs:
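
As a rough illustration of how these two parameters could act on raw detection results, consider the following Python sketch; the detection records and threshold values are hypothetical.

    def select_detections(detections, min_detection_score, max_objects_count):
        """Discard detections below the minimum score and keep at most 'max_objects_count'
        highest-scoring objects (a simplified stand-in for inMinDetectionScore and
        inMaxObjectsCount)."""
        kept = [d for d in detections if d["score"] >= min_detection_score]
        kept.sort(key=lambda d: d["score"], reverse=True)
        return kept[:max_objects_count]

    detections = [{"class": "screw", "score": 0.92},
                  {"class": "screw", "score": 0.55},
                  {"class": "nut", "score": 0.31}]
    print(select_detections(detections, min_detection_score=0.5, max_objects_count=10))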

6. Point location

This technique is used to precisely locate and classify key points, characteristic regions and small objects within an image. The result of this technique is a list of predicted point locations with corresponding class predictions and confidence scores.

When to use point location instead of instance segmentation:

  • precise location of key points and distinctive regions with no strict boundaries,
  • location and classification of objects (possibly very small) when their segmentation masks and bounding boxes are not needed (e.g. in object counting).

When to use point location instead of feature detection:

  • coordinates of key points, centroids of characteristic regions, objects etc. are needed.

Original image.

Visualized point location results.

Preparing the data

The training requires a user to mark points of appropriate classes on the training images.

Editor for marking points.

Feature size

In the case of the point location tool, the feature size parameter corresponds to the size of an object or characteristic region. If images contain objects of different scales, it is recommended to use a feature size slightly larger than the average object size, although achieving optimal results may require experimenting with different values.

Performance tip: a larger feature size increases the training time and needs more memory and training samples to operate effectively. When the feature size exceeds 64 pixels and still looks too small, it is worth considering the Downsample option.

Model usage

Use DeepLearning_LocatePoints filter to perform point location and classification.

Deep Learning: Locate Points

Parameters:

  • To limit the area of image analysis, you can use the inRoi input.
  • To set the minimum detection score, the inMinDetectionScore parameter can be used.
  • The inMinDistanceRatio parameter can be used to set the minimum distance between two points to be considered distinct. The distance is computed as MinDistanceRatio * FeatureSize (see the sketch after this list). If the parameter is not enabled, the minimum distance is based on the training data.
  • To increase detection speed, at the cost of potentially slightly worse precision, inOverlap can be set to False.
  • Results describing the detected points are passed to the following outputs:
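
The distance rule can be illustrated with the following Python sketch, which keeps the higher-scoring point whenever two points are closer than MinDistanceRatio * FeatureSize; the exact suppression strategy used by the tool may differ, and the example points are hypothetical.

    import math

    def filter_close_points(points, scores, min_distance_ratio, feature_size):
        """Treat points closer than MinDistanceRatio * FeatureSize as one detection and keep
        only the highest-scoring of them."""
        min_distance = min_distance_ratio * feature_size
        order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
        kept = []
        for i in order:
            if all(math.dist(points[i], points[j]) >= min_distance for j in kept):
                kept.append(i)
        return [points[i] for i in kept]

    points = [(10, 10), (12, 11), (80, 40)]
    scores = [0.9, 0.7, 0.8]
    print(filter_close_points(points, scores, min_distance_ratio=0.5, feature_size=16))
    # [(10, 10), (80, 40)] - the second point is merged into the first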

Notices:

  • Since Adaptive Vision Studio 4.12, programs which contain only filters related to Feature Detection, Point Location or Object Classification no longer require a running Deep Learning Service. This is the result of a complete re-engineering of the underlying implementation - a general-purpose deep learning framework has been replaced with our own code specifically optimized for our purposes. However, if the program contains Deep Learning filters related to Anomaly Detection or Instance Segmentation, the Service must be up and running.


7. Troubleshooting

Below you will find a list of the most common problems.


1. Network overfitting

A situation in which the network loses its ability to generalize and focuses only on the training data.

Symptoms: during training, the validation graph stops at one level while the training graph continues to rise. Defects on training images are marked very precisely, but defects on new images are marked poorly.

A graph characteristic for network overfitting.

Causes:

  • The number of training samples is too small.
  • Training time is too long.

Solution:

  • Provide more real samples.
  • Add more samples with possible object transformations.

2. Susceptibility to changes in lighting conditions

Symptoms: the network is not able to process images properly when even minor changes in lighting occur.

Causes:

  • Samples with variable lighting were not provided.

Solution:

  • Provide more samples with variable lighting.
  • Enable "Luminance" option for automatic lighting augmentation.

3. No progress in network training

Symptoms ― even though the training time is optimal, there is no visible training progress.

Training progress with contradictory samples.

Causes:

  • The number of samples is too small or the samples are not variable enough.
  • Image contrast is too low.
  • The chosen network architecture is too small.
  • There are contradictions in the defect masks.

Solution:

  • Modify lighting to expose defects.
  • Remove contradictions in defect masks.

Tip: Remember to mark all defects of a given type on the input images or remove images with unmarked defects. Marking only some of the defects of a given type may negatively influence the learning process.


4. Training/sample evaluation is very slow

Symptoms ― training or sample evaluation takes a lot of time.

Causes:

  • The resolution of the provided input images is too high.
  • Fragments that cannot possibly contain defects are also analyzed.

Solution:

  • Enable "Downsample" option to reduce the image resolution.
  • Limit ROI for sample evaluation.
