3.4.4 Regression

The machine learning models in OpenCV and LibSVM also support a regression mode: they can be used to predict a numeric value (i.e. not a class index) from an input predictor. The workflow is the same as for classification: the regression model is first trained, then used to predict output values. The corresponding applications are TrainRegression and PredictRegression.

Input datasets

The input dataset for training must have the following structure: n components for the input predictors, plus one component for the corresponding output value.

The TrainRegression application supports 2 input formats:

If you have separate images for predictors and output values, you can use the ConcatenateImages application to stack them into a single training image.

otbcli_ConcatenateImages  -il  features.tif  output_value.tif  \
                          -out training_set.tif
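To make the resulting layout concrete, here is a minimal Python sketch of what band concatenation amounts to for a training set. It is not how ConcatenateImages is implemented: images are modelled as plain nested lists indexed as [row][column][band], and the function name concatenate_bands is hypothetical. It only illustrates that each pixel ends up with its n predictor components and the output value stacked after them, in the order the images were given.

```python
# Illustrative only: images are nested lists indexed as [row][col][band].
# concatenate_bands is a hypothetical helper, not part of OTB.
def concatenate_bands(features, output_value):
    """Append the bands of output_value after the bands of features, pixel by pixel."""
    if len(features) != len(output_value):
        raise ValueError("images must have the same number of rows")
    stacked = []
    for feat_row, out_row in zip(features, output_value):
        if len(feat_row) != len(out_row):
            raise ValueError("images must have the same number of columns")
        stacked.append([list(f) + list(o) for f, o in zip(feat_row, out_row)])
    return stacked


# A 1 x 2 image with n = 3 predictors per pixel, and a single-band target image.
features = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
output_value = [[[10.0], [20.0]]]

training_set = concatenate_bands(features, output_value)
print(training_set)
```

Each pixel of training_set now carries n + 1 components, with the value to predict in the last position.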

Statistics estimation

As in classification, a statistics estimation step can be performed before training. It normalizes the dynamic range of the input predictors to a standard one: zero mean and unit standard deviation. The main difference from the classification case is that, with regression, the dynamic range of the output values can also be normalized.
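The normalization described above is the usual centering and scaling, z = (x - mean) / stddev. The Python sketch below shows it together with its inverse, which is what maps a value predicted in the normalized space back to the original range when the output values were normalized too. The function names are hypothetical, and where exactly OTB applies the inverse transform is an assumption not covered by this section.

```python
def normalize(value, mean, stddev):
    """Center and scale a value to zero mean and unit standard deviation."""
    return (value - mean) / stddev


def denormalize(value, mean, stddev):
    """Inverse transform: map a normalized value back to the original range."""
    return value * stddev + mean


# Mean and standard deviation of one component (values chosen for illustration).
mean, stddev = 198.796, 22.6234

z = normalize(250.0, mean, stddev)     # value expressed in standard deviations
restored = denormalize(z, mean, stddev)
print(z, restored)
```

Applying both functions with the same statistics is a round trip, which is why training and prediction have to share the same statistics.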

The statistics file format is identical to the output file of the ComputeImagesStatistics application, for instance:

<?xml version="1.0" ?>  
<FeatureStatistics>  
    <Statistic name="mean">  
        <StatisticVector value="198.796" />  
        <StatisticVector value="283.117" />  
        <StatisticVector value="169.878" />  
        <StatisticVector value="376.514" />  
    </Statistic>  
    <Statistic name="stddev">  
        <StatisticVector value="22.6234" />  
        <StatisticVector value="41.4086" />  
        <StatisticVector value="40.6766" />  
        <StatisticVector value="110.956" />  
    </Statistic>  
</FeatureStatistics>
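For readers who want to inspect or reuse such a file outside OTB, the following Python sketch reads the structure shown above with the standard library and normalizes one sample with it. The function name read_statistics and the sample values are hypothetical; the sketch also assumes that the i-th StatisticVector of "mean" and of "stddev" refer to the same component, as the example suggests. Whether the four components here are four predictors or three predictors plus the output value is not stated by the file itself.

```python
import xml.etree.ElementTree as ET

# Same content as the example statistics file shown above.
STATS_XML = """<?xml version="1.0" ?>
<FeatureStatistics>
    <Statistic name="mean">
        <StatisticVector value="198.796" />
        <StatisticVector value="283.117" />
        <StatisticVector value="169.878" />
        <StatisticVector value="376.514" />
    </Statistic>
    <Statistic name="stddev">
        <StatisticVector value="22.6234" />
        <StatisticVector value="41.4086" />
        <StatisticVector value="40.6766" />
        <StatisticVector value="110.956" />
    </Statistic>
</FeatureStatistics>"""


def read_statistics(xml_text):
    """Return a dict mapping each statistic name to its list of per-component values."""
    root = ET.fromstring(xml_text)
    stats = {}
    for statistic in root.findall("Statistic"):
        values = [float(v.get("value")) for v in statistic.findall("StatisticVector")]
        stats[statistic.get("name")] = values
    return stats


stats = read_statistics(STATS_XML)

# Hypothetical 4-component sample, normalized component by component.
sample = [210.0, 300.0, 150.0, 400.0]
normalized = [
    (x - m) / s for x, m, s in zip(sample, stats["mean"], stats["stddev"])
]
print(normalized)
```

To read an actual file from disk, ET.parse("stats.xml").getroot() can replace the ET.fromstring call.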

In the TrainRegression application, normalization of input predictors and output values is optional. There are 3 options:

If you use an image list as training set, you can run the ComputeImagesStatistics application. It produces a statistics file suitable for both input and output normalization (third option).

otbcli_ComputeImagesStatistics  -il   training_set.tif  \
                                -out  stats.xml

Training

Initially, the machine learning models in OTB were only used for classification. However, the external libraries they come from (OpenCV and LibSVM) already implemented a regression mode, so the integration of these models in OTB has been improved to expose it. As a consequence, the machine learning models have nearly the same set of parameters in classification and regression modes.

The regression mode is currently supported for:

The behaviour of the TrainRegression application is very similar to that of TrainImagesClassifier. From the input dataset, a portion of the samples is used for training, whereas the remainder is used for validation. The user may also choose the model to train and set its parameters. Once the training is done, the model is stored in an output file.

otbcli_TrainRegression  -io.il                training_set.tif  \
                        -io.imstat            stats.xml  \
                        -io.out               model.txt  \
                        -sample.vtr           0.5  \
                        -classifier           knn  \
                        -classifier.knn.k     5  \
                        -classifier.knn.rule  median
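As a rough illustration of what the parameters in this command express, the Python sketch below splits a set of samples into training and validation parts and predicts a value as the median of the outputs of the k nearest training samples. It is a sketch under stated assumptions, not the OpenCV or OTB implementation: the function names are hypothetical, the 0.5 ratio is assumed to mean that half of the samples are held out for validation, and distances are plain Euclidean distances on the predictors.

```python
import random
from statistics import median


def split_samples(samples, validation_ratio, seed=0):
    """Shuffle the samples and hold out a share of them for validation (assumed meaning of the ratio)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


def knn_predict(training, predictors, k=5):
    """Predict the median output value of the k training samples closest to predictors."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(training, key=lambda s: squared_distance(s[0], predictors))[:k]
    return median(target for _, target in nearest)


# Synthetic samples (predictors, output value) following y = 2 * x.
samples = [([float(x)], 2.0 * x) for x in range(20)]

training, validation = split_samples(samples, validation_ratio=0.5)
prediction = knn_predict(training, [10.0], k=5)
print(len(training), len(validation), prediction)
```

Using the median rather than the mean of the neighbours' outputs makes the prediction less sensitive to a single outlying neighbour, which is one plausible reason to pick that rule.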

Prediction

Once the model is trained, it can be used in the PredictRegression application to perform the prediction on an entire image containing input predictors (i.e. an image with only n feature components). If the model was trained with normalization, the same statistics file must be used for prediction. The behaviour of PredictRegression with respect to the statistics file is identical to that of TrainRegression:

The model to use is read from a file (the one produced during training).

otbcli_PredictRegression  -in     features_bis.tif  \
                          -model  model.txt  \
                          -imstat stats.xml  \
                          -out    prediction.tif