LoadOrganicTrainingData

Loads images and regions, extracts knowledge from them, and initializes an OrganicModel, which can then be trained and used to classify new images.

Name	Type	Description
inTrainingImageDirectories	Directory Array	One directory with images and regions per class.
inTrainingImageExtensions	String	Extension of image files to use.
outInitializedModel	OrganicModel	Pre-trained model, initialized with data.

Description

Data used to initialize OrganicModel has to be organized as follows:

Images representing each object class have to reside in separate directories; that is, if there are K object classes, there should be K directories.
All images have to be saved in the same format, so all files should share a common extension.
For each image in a directory, there should be an .avdata file of the same name in that directory. The .avdata file should contain a region that marks the ROI of the object being classified in the corresponding image. The method used to extract this region should be the same for each class.
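
The layout rules above can be checked before the filter is run. The following is a minimal sketch in Python and is not part of the product; the helper name find_missing_regions is hypothetical, and it only checks that a same-named .avdata file exists, not that it contains a valid region.

```python
from pathlib import Path


def find_missing_regions(class_dirs, image_extension):
    """Return image paths that have no same-named .avdata file.

    Hypothetical helper for illustration only.
    class_dirs: one directory per object class.
    image_extension: shared image extension, e.g. "png" or ".png".
    """
    ext = "." + image_extension.lower().lstrip(".")
    missing = []
    for class_dir in class_dirs:
        for image_path in sorted(Path(class_dir).iterdir()):
            # Skip subdirectories and files with other extensions.
            if not image_path.is_file() or image_path.suffix.lower() != ext:
                continue
            # The region file shares the image's base name, with .avdata extension.
            if not image_path.with_suffix(".avdata").is_file():
                missing.append(image_path)
    return missing
```

An empty result means every image with the given extension has a matching region file next to it.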

This filter loads data (both images and regions) from the directories specified in inTrainingImageDirectories and initializes a model with a class count equal to the number of directories provided. Images from the first directory are assigned to class number 0, images from the second directory to class number 1, and so on. This assignment is fixed inside outInitializedModel and, after training with the TrainOrganicModel filter, it is used to assign classes to new images in RecognizeOrganicObject. The image file extension has to be specified in inTrainingImageExtensions. The supported extensions are the same as in the LoadImage filter.
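
Because class numbers follow the order of the directories, it can help to keep the same ordered list on hand when interpreting classification results. The sketch below, in Python, only illustrates that numbering; assign_class_indices is a hypothetical helper, and the ValueError merely mirrors the two-class minimum listed under Errors rather than reproducing the filter's own exception.

```python
def assign_class_indices(class_dirs):
    """Map class numbers to training directories by position (illustrative)."""
    if len(class_dirs) < 2:
        # Mirrors the documented requirement of at least two classes.
        raise ValueError("at least two class directories are required")
    # The first directory becomes class 0, the second class 1, and so on.
    return dict(enumerate(class_dirs))
```

Reordering the directory list changes which number each class receives, so the same order should be used whenever the results are interpreted.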

Remarks

Organic object classification is divided into several steps: preparing training data, loading it with LoadOrganicTrainingData, processing and training with the TrainOrganicModel filter, and finally classification with RecognizeOrganicObject. Details of those steps are described in this section. It is advised to train the model during the development phase, then save the trained OrganicModel and import it into the production system, but it is also feasible to re-train such a model within a deployed system. In terms of execution time, LoadOrganicTrainingData is the most time-consuming step, so it is advised to load the data once and then save the initialized model. TrainOrganicModel is quite fast (although this depends on the parameters used and the size of the input data), which makes it feasible to adjust the model directly at runtime.

The classification of organic objects is based on numerous transformations of the given sample images. The training images should show the best representatives of each class, so that the model does not adapt to bad examples; that is, the set of images associated with a given class should not contain partially visible objects, multiple objects, or objects of a different class.

Because of the variety of organic objects being classified, there is no universal method for segmenting them from the image. An extraction method has to be devised for each new type of object. To accommodate this difficulty, the model is trained with user-provided images and regions. Each region should cover the whole object in the image. The best approach is to prepare a macro which extracts the object ROI from an image. This macro should be saved during the training phase and then used again when classification is performed.

Regions extracted from training images should be saved in the directory in which the images are located. This can be done with the SaveObject filter. The resulting region file name has to be the same as the name of the image file from which the region was extracted. The names should differ only in extension: for the region, it has to be .avdata.

Each directory, filled with images and regions, should represent one class of objects. This is crucial: mixing images between training directories will result in a faulty model.

LoadOrganicTrainingData scans the provided directories and loads images with the given extension together with the corresponding region files. Then it applies some transformations to the images, building a coarse model of the object classes. This process is time-consuming, but it is parameterless. It is advised to build a good set of training images (a few hundred images per class should be sufficient) and execute LoadOrganicTrainingData only once. A saved OrganicModel takes up much less space than the whole set of images with accompanying regions. The initialized model can later be loaded and trained.

The training process is performed with the TrainOrganicModel filter. Internally it is an iterative process which transforms the coarse model of the provided object classes into a more specific, better model. The data can be preprocessed first. It is recommended to use raw data only for easy classification tasks: it makes training slightly faster, but raw data can be full of noise, which prevents the classifier from obtaining a good fit. A popular and fast preprocessing method is Normalization; using it often results in a good fit. The PCA method transforms the coarse model using Principal Component Analysis, effectively removing parts of the model which do not give much insight into the data. This, however, reduces the internal model size, which has to be reflected by a change to the TrainOrganicModel.inModelCapacity parameter.
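
To give an intuition for what a normalization step typically does, the Python sketch below rescales each feature column to zero mean and unit standard deviation. This is an assumption made for illustration: the text does not state which formula the Normalization option uses, and normalize_columns is a hypothetical helper, not the filter's implementation.

```python
from statistics import mean, pstdev


def normalize_columns(rows):
    """Scale each feature column to zero mean and unit standard deviation.

    Assumed z-score normalization, shown for illustration only.
    rows: list of equal-length feature vectors (lists of floats).
    """
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    # A constant column has zero deviation; divide by 1.0 so it maps to zeros.
    deviations = [pstdev(column) or 1.0 for column in columns]
    return [
        [(value - m) / d for value, m, d in zip(row, means, deviations)]
        for row in rows
    ]
```

After such rescaling, features measured on very different scales contribute comparably, which is one common reason a normalized input gives a better fit than raw data.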

After training, the model should be assessed. For the assessment, another set of images and regions, called the "validation set", should be prepared. The validation set should consist of images that are not included in the previously used training set; this is crucial for avoiding so-called overfitting. Model assessment consists of classifying the validation set with the trained model and comparing the resulting assignments to the real classes of the objects provided. To classify an object with a trained model, the RecognizeOrganicObject filter can be used. To calculate performance metrics of the classification, MeasureClassificationQuality_Multiclass or MeasureClassificationQuality_Binary can be used. These filters calculate several scores and a so-called confusion matrix, which shows clearly how well the classifier is doing its job. After the assessment, the trained model can be used in the production system or, if the chosen metric is not good enough, it can be trained again with different parameters.
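
The idea of a confusion matrix can be illustrated with a short Python sketch. It is not the implementation of the MeasureClassificationQuality filters; the function names are hypothetical, and class numbers are assumed to be integers starting at 0, as assigned by LoadOrganicTrainingData.

```python
def confusion_matrix(true_classes, predicted_classes, class_count):
    """Count validation samples per (actual class, predicted class) pair."""
    matrix = [[0] * class_count for _ in range(class_count)]
    for actual, predicted in zip(true_classes, predicted_classes):
        # Rows are actual classes, columns are predicted classes.
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Large off-diagonal counts show which classes the model confuses with each other, which can guide the choice of parameters for the next training run.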

Errors

This filter can throw an exception to report an error. Read how to deal with errors in Error Handling.

List of possible exceptions:

Error type	Description
DomainError	Cannot perform classification on less than two classes. Add more directories with images of objects of different classes.

Complexity Level

This filter is available on Advanced Complexity Level.
