SampleAugmentation

Generates synthetic samples from a sample data file.

Description

The application takes a sample data file as generated by the SampleExtraction application and generates synthetic samples to increase the number of available samples.

Parameters

Input samples -in filename [dtype] Mandatory
Vector data file containing samples (OGR format).

Output samples -out filename [dtype] Mandatory
Output vector data file storing new samples (OGR format).

Field Name -field string
Name of the field carrying the class name in the input vectors.

Layer Index -layer int Default value: 0
Layer index to read in the input vector file.

Label of the class to be augmented -label int Default value: 1
Label of the class of the input file for which new samples will be generated.

Number of generated samples -samples int Default value: 100
Number of synthetic samples that will be generated.

Field names for excluded features -exclude string1 string2...
List of field names in the input vector data that will not be generated in the output file.

Augmentation strategy -strategy [replicate|jitter|smote] Default value: replicate

  • Replicate input samples
    The new samples are generated by replicating input samples which are randomly selected with replacement.
  • Jitter input samples
    The new samples are generated by adding Gaussian noise to input samples which are randomly selected with replacement.
  • Smote input samples
    The new samples are generated by using the SMOTE algorithm (http://dx.doi.org/10.1613/jair.953) on input samples which are randomly selected with replacement.
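All three strategies start by drawing input samples at random with replacement. The following is a minimal sketch of that selection step, and therefore of the replicate strategy, in plain Python. The function name `replicate` and the list-of-lists sample layout are assumptions made for illustration; they are not part of the application's API.

```python
import random


def replicate(samples, n_new, seed=None):
    """Return n_new copies of rows drawn uniformly, with replacement.

    `samples` is a list of feature vectors (hypothetical layout).
    The same input row may be selected more than once.
    """
    rng = random.Random(seed)
    return [list(rng.choice(samples)) for _ in range(n_new)]


# Three input samples with two features each (made-up values).
samples = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.9]]
new_samples = replicate(samples, n_new=5, seed=0)
```

Because the draw is with replacement, the number of generated samples can exceed the number of input samples.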

Jitter input samples options

Factor for dividing the standard deviation of each feature -strategy.jitter.stdfactor float Default value: 10
The Gaussian noise added to each feature has a standard deviation equal to the standard deviation of that feature in the input samples, divided by the value of this parameter.
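The effect of the standard deviation factor can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the function name `jitter` is hypothetical, the noise is assumed to be zero-mean, and the population standard deviation is used because the source does not say which estimator applies.

```python
import random
import statistics


def jitter(samples, n_new, std_factor=10.0, seed=None):
    """Generate n_new samples by adding Gaussian noise to random input rows.

    The noise for each feature has standard deviation
    (feature standard deviation) / std_factor.
    """
    rng = random.Random(seed)
    # Per-feature standard deviation over all input samples
    # (population estimator: an assumption of this sketch).
    columns = list(zip(*samples))
    sigmas = [statistics.pstdev(col) / std_factor for col in columns]
    generated = []
    for _ in range(n_new):
        base = rng.choice(samples)  # selection with replacement
        generated.append([x + rng.gauss(0.0, s) for x, s in zip(base, sigmas)])
    return generated


# The first feature is constant, so it receives no noise at all.
samples = [[1.0, 2.0], [1.0, 4.0], [1.0, 9.0]]
jittered = jitter(samples, n_new=4, std_factor=10.0, seed=1)
```

A larger factor yields smaller perturbations, so the generated samples stay closer to the inputs they were drawn from.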

Smote input samples options

Number of nearest neighbors -strategy.smote.neighbors int Default value: 5
Number of nearest neighbors to be used in the SMOTE algorithm.
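To show the role of the neighbor count, here is a minimal sketch of the SMOTE interpolation step as described in the paper cited above: a new sample is placed at a random position on the segment between an input sample and one of its k nearest neighbors. The function name `smote`, the Euclidean distance, and the brute-force neighbor search are assumptions of this sketch, and the application's own implementation may differ.

```python
import math
import random


def smote(samples, n_new, k=5, seed=None):
    """Generate n_new samples by interpolating toward a near neighbor.

    Requires at least two input samples. If fewer than k other
    samples exist, all of them are treated as neighbors.
    """
    rng = random.Random(seed)
    generated = []
    for _ in range(n_new):
        base = rng.choice(samples)  # selection with replacement
        # Brute-force k nearest neighbors of the selected sample.
        others = [s for s in samples if s is not base]
        neighbors = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        generated.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return generated


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
synthetic = smote(samples, n_new=6, k=2, seed=42)
```

With a small neighbor count the synthetic samples stay close to local clusters, while a larger count lets them spread across more of the class.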


Random seed -seed int
Sets the random seed to the given integer value.

Examples

From the command-line:

otbcli_SampleAugmentation -in samples.sqlite -field class -label 3 -samples 100 -out augmented_samples.sqlite -exclude OGC_FID name class originfid -strategy smote -strategy.smote.neighbors 5

From Python:

import otbApplication

app = otbApplication.Registry.CreateApplication("SampleAugmentation")

app.SetParameterString("in", "samples.sqlite")
app.SetParameterString("field", "class")
app.SetParameterInt("label", 3)
app.SetParameterInt("samples", 100)
app.SetParameterString("out", "augmented_samples.sqlite")
app.SetParameterStringList("exclude", ["OGC_FID", "name", "class", "originfid"])
app.SetParameterString("strategy", "smote")
app.SetParameterInt("strategy.smote.neighbors", 5)

app.ExecuteAndWriteOutput()