SampleAugmentation - Sample Augmentation

Generates synthetic samples from a sample data file.

Detailed description

The application takes a sample data file as generated by the SampleExtraction application and generates synthetic samples to increase the number of available samples.

Parameters

This section describes the parameters available for this application in detail. Table [1] summarizes these parameters and the parameter keys to be used on the command line and in programming languages. The application key is SampleAugmentation.

[1]Table: Parameters table for Sample Augmentation.
Parameter Key                Parameter Name                                               Parameter Type
in                           Input samples                                                Input File name
out                          Output samples                                               Output File name
field                        Field Name                                                   List
layer                        Layer Index                                                  Int
label                        Label of the class to be augmented                           Int
samples                      Number of generated samples                                  Int
exclude                      Field names for excluded features                            List
strategy                     Augmentation strategy                                        Choices
strategy replicate           Replicate input samples                                      Choice
strategy jitter              Jitter input samples                                         Choice
strategy smote               Smote input samples                                          Choice
strategy.jitter.stdfactor    Factor for dividing the standard deviation of each feature   Float
strategy.smote.neighbors     Number of nearest neighbors                                  Int
seed                         Set user defined seed                                        Int
inxml                        Load otb application from xml file                           XML input parameters file
outxml                       Save otb application to xml file                             XML output parameters file

Input samples: Vector data file containing samples (OGR format).

Output samples: Output vector data file storing the new samples (OGR format).

Field Name: Name of the field carrying the class name in the input vectors.

Layer Index: Layer index to read in the input vector file.

Label of the class to be augmented: Label of the class in the input file for which new samples will be generated.

Number of generated samples: Number of synthetic samples that will be generated.

Field names for excluded features: List of field names in the input vector data that will not be generated in the output file.
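To make the roles of the field, label and exclude parameters concrete, here is a minimal Python sketch of how the samples of one class could be turned into feature vectors before augmentation. It assumes the samples are already loaded as Python dicts mapping field names to values, and that every field not listed in exclude is a numeric feature; the function extract_class_samples is hypothetical and does not reflect how OTB actually reads the OGR file.

```python
def extract_class_samples(records, field, label, exclude):
    """Hypothetical helper: return the feature vectors (lists of floats) of
    the records whose `field` equals `label`, skipping every field named in
    `exclude`. Assumes all remaining fields hold numeric values."""
    excluded = set(exclude)
    samples = []
    for rec in records:
        if rec[field] != label:
            continue  # only the class to be augmented is used
        samples.append([float(v) for k, v in rec.items() if k not in excluded])
    return samples


# Illustrative records mirroring the command-line example's excluded fields;
# b1 and b2 stand in for hypothetical feature fields.
records = [
    {"OGC_FID": 1, "name": "a", "class": 3, "originfid": 10, "b1": 0.2, "b2": 0.5},
    {"OGC_FID": 2, "name": "b", "class": 1, "originfid": 11, "b1": 0.9, "b2": 0.1},
    {"OGC_FID": 3, "name": "c", "class": 3, "originfid": 12, "b1": 0.4, "b2": 0.7},
]
excluded_fields = ["OGC_FID", "name", "class", "originfid"]
print(extract_class_samples(records, "class", 3, excluded_fields))
# [[0.2, 0.5], [0.4, 0.7]]
```

Note that the class field itself is listed in exclude, as in the command-line example, so that it is not treated as a feature.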

Augmentation strategy: Available choices are:

  • Replicate input samples: The new samples are generated by replicating input samples, which are randomly selected with replacement.
  • Jitter input samples: The new samples are generated by adding Gaussian noise to input samples, which are randomly selected with replacement.
      • Factor for dividing the standard deviation of each feature: For each feature, the noise added to the input samples has the standard deviation of that input feature divided by the value of this parameter.
  • Smote input samples: The new samples are generated by applying the SMOTE algorithm (http://dx.doi.org/10.1613/jair.953) to input samples, which are randomly selected with replacement.
      • Number of nearest neighbors: Number of nearest neighbors to be used in the SMOTE algorithm.
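The replicate strategy amounts to sampling with replacement. The following minimal Python sketch illustrates the idea on feature vectors held in plain lists; the function name replicate is hypothetical, and the seed argument only mimics the role of the seed parameter, it is not OTB's random generator.

```python
import random


def replicate(samples, n, seed=None):
    """Hypothetical sketch: return n copies of input samples drawn
    uniformly at random, with replacement."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return [list(rng.choice(samples)) for _ in range(n)]


original = [[0.2, 0.5], [0.4, 0.7]]
augmented = replicate(original, 5, seed=42)
print(len(augmented))  # 5
```

Every generated sample is an exact copy of an input sample, so this strategy increases the sample count without adding any new feature values.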
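The jitter strategy can be sketched in the same way. In this minimal Python illustration, each new sample is a randomly chosen input sample plus independent zero-mean Gaussian noise whose standard deviation, per feature, is the feature's standard deviation divided by stdfactor (the role of strategy.jitter.stdfactor). Using the population standard deviation is an assumption of this sketch, and the function name jitter is hypothetical.

```python
import random
import statistics


def jitter(samples, n, stdfactor, seed=None):
    """Hypothetical sketch: return n new samples, each a randomly chosen
    input sample (drawn with replacement) plus zero-mean Gaussian noise
    whose standard deviation, for each feature, is that feature's standard
    deviation divided by stdfactor."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Per-feature standard deviation over the input samples
    # (population standard deviation; an assumption of this sketch).
    stds = [statistics.pstdev([s[j] for s in samples]) for j in range(n_features)]
    new_samples = []
    for _ in range(n):
        base = rng.choice(samples)
        new_samples.append(
            [x + rng.gauss(0.0, sd / stdfactor) for x, sd in zip(base, stds)]
        )
    return new_samples


original = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
augmented = jitter(original, 5, stdfactor=10.0, seed=0)
# The first feature is constant, so its standard deviation (and thus its
# noise) is zero; only the second feature is perturbed.
print([s[0] for s in augmented])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

A larger stdfactor yields smaller perturbations, so the synthetic samples stay closer to the originals.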
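The SMOTE strategy can also be illustrated with a short sketch. The following Python code follows the usual description of SMOTE from the cited paper: pick a sample at random, find its k nearest neighbors among the other samples of the class by Euclidean distance, pick one of them at random, and create a new point at a random position on the segment between the two. It is an assumption-based illustration (pure Python, brute-force neighbor search, hypothetical function name smote), not OTB's implementation; the neighbors argument plays the role of strategy.smote.neighbors. It needs Python 3.8 or later for math.dist.

```python
import math
import random


def smote(samples, n, neighbors, seed=None):
    """Hypothetical sketch: return n synthetic samples built by SMOTE-style
    interpolation. Needs at least two input samples."""
    rng = random.Random(seed)
    new_samples = []
    for _ in range(n):
        # Pick a base sample at random (with replacement across iterations).
        i = rng.randrange(len(samples))
        x = samples[i]
        # Brute-force k nearest neighbors of x among the other samples,
        # using Euclidean distance.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(x, s))
        nn = rng.choice(others[:neighbors])
        # New point at a random position on the segment from x to nn.
        gap = rng.random()
        new_samples.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return new_samples


original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
augmented = smote(original, 10, neighbors=2, seed=3)
# Every synthetic point lies on a segment between two input samples,
# here on the diagonal between (0, 0) and (3, 3).
```

Unlike replication, each synthetic sample is a new point, but it always stays within the local neighborhood of existing samples of the class.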

Set user defined seed: Set a specific seed, as an integer value.

Load otb application from xml file: Load the application parameters from an XML file.

Save otb application to xml file: Save the application parameters to an XML file.

Example

To run this example from the command line, use the following:

otbcli_SampleAugmentation -in samples.sqlite -field class -label 3 -samples 100 -out augmented_samples.sqlite -exclude OGC_FID name class originfid -strategy smote -strategy.smote.neighbors 5

To run this example from Python, use the following code snippet:

#!/usr/bin/python

# Import the otb applications package
import otbApplication

# The following line creates an instance of the SampleAugmentation application
SampleAugmentation = otbApplication.Registry.CreateApplication("SampleAugmentation")

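# The following lines set all the application parameters:
SampleAugmentation.SetParameterString("in", "samples.sqlite")
SampleAugmentation.SetParameterStringList("field", ["class"])
SampleAugmentation.SetParameterInt("label", 3)
SampleAugmentation.SetParameterInt("samples", 100)
SampleAugmentation.SetParameterString("out", "augmented_samples.sqlite")
SampleAugmentation.SetParameterStringList("exclude", ["OGC_FID", "name", "class", "originfid"])
SampleAugmentation.SetParameterString("strategy", "smote")
SampleAugmentation.SetParameterInt("strategy.smote.neighbors", 5)

# The following line executes the application
SampleAugmentation.ExecuteAndWriteOutput()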

Limitations

None

Authors

This application has been written by OTB-Team.