SampleAugmentation - Sample Augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generates synthetic samples from a sample data file.

Detailed description
--------------------

The application takes a sample data file, as generated by the SampleExtraction application, and generates synthetic samples to increase the number of available samples.

Parameters
----------

This section describes the parameters available for this application in detail. Table [#]_ presents a summary of these parameters and the parameter keys to be used on the command line and in programming languages. The application key is *SampleAugmentation*.

.. [#] Table: Parameters table for Sample Augmentation.

+-------------------------+----------------------------------------------------------+--------------------------+
|Parameter Key            |Parameter Name                                            |Parameter Type            |
+=========================+==========================================================+==========================+
|in                       |Input samples                                             |Input File name           |
+-------------------------+----------------------------------------------------------+--------------------------+
|out                      |Output samples                                            |Output File name          |
+-------------------------+----------------------------------------------------------+--------------------------+
|field                    |Field Name                                                |List                      |
+-------------------------+----------------------------------------------------------+--------------------------+
|layer                    |Layer Index                                               |Int                       |
+-------------------------+----------------------------------------------------------+--------------------------+
|label                    |Label of the class to be augmented                        |Int                       |
+-------------------------+----------------------------------------------------------+--------------------------+
|samples                  |Number of generated samples                               |Int                       |
+-------------------------+----------------------------------------------------------+--------------------------+
|exclude                  |Field names for excluded features.                        |List                      |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy                 |Augmentation strategy                                     |Choices                   |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy replicate       |Replicate input samples                                   |*Choice*                  |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy jitter          |Jitter input samples                                      |*Choice*                  |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy smote           |Smote input samples                                       |*Choice*                  |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy.jitter.stdfactor|Factor for dividing the standard deviation of each feature|Float                     |
+-------------------------+----------------------------------------------------------+--------------------------+
|strategy.smote.neighbors |Number of nearest neighbors.                              |Int                       |
+-------------------------+----------------------------------------------------------+--------------------------+
|seed                     |set user defined seed                                     |Int                       |
+-------------------------+----------------------------------------------------------+--------------------------+
|inxml                    |Load otb application from xml file                        |XML input parameters file |
+-------------------------+----------------------------------------------------------+--------------------------+
|outxml                   |Save otb application to xml file                          |XML output parameters file|
+-------------------------+----------------------------------------------------------+--------------------------+
**Input samples**: Vector data file containing samples (OGR format).

**Output samples**: Output vector data file storing the new samples (OGR format).

**Field Name**: Name of the field carrying the class name in the input vectors.

**Layer Index**: Index of the layer to read in the input vector file.

**Label of the class to be augmented**: Label of the class in the input file for which new samples will be generated.

**Number of generated samples**: Number of synthetic samples that will be generated.

**Field names for excluded features.**: List of field names in the input vector data that will not be generated in the output file.

**Augmentation strategy**: Available choices are:

- **Replicate input samples**: The new samples are generated by replicating input samples which are randomly selected with replacement.

- **Jitter input samples**: The new samples are generated by adding Gaussian noise to input samples which are randomly selected with replacement.

  * **Factor for dividing the standard deviation of each feature**: The noise added to the input samples will have the standard deviation of the input features divided by the value of this parameter.

- **Smote input samples**: The new samples are generated by using the SMOTE algorithm (http://dx.doi.org/10.1613/jair.953) on input samples which are randomly selected with replacement.

  * **Number of nearest neighbors.**: Number of nearest neighbors to be used in the SMOTE algorithm.

**set user defined seed**: Set a specific integer seed for the random number generator.

**Load otb application from xml file**: Load the application parameters from an XML file.

**Save otb application to xml file**: Save the application parameters to an XML file.
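The three strategies above can be illustrated with a small pure-Python sketch working on plain lists of feature values. This is not the OTB implementation: the functions ``replicate``, ``jitter`` and ``smote`` are hypothetical helpers, the jitter noise assumes the population standard deviation of each feature, and the SMOTE neighbor search is assumed to be a brute-force Euclidean search among the other samples of the class being augmented.

::

    import math
    import random
    import statistics


    def replicate(samples, n, rng):
        """Copy n input samples drawn uniformly at random, with replacement."""
        return [list(rng.choice(samples)) for _ in range(n)]


    def jitter(samples, n, stdfactor, rng):
        """Add Gaussian noise to n input samples drawn with replacement."""
        # Assumption: population standard deviation of each feature (column)
        stds = [statistics.pstdev(column) for column in zip(*samples)]
        out = []
        for _ in range(n):
            base = rng.choice(samples)
            # Noise std for each feature is that feature's std divided by stdfactor
            out.append([x + rng.gauss(0.0, sd / stdfactor)
                        for x, sd in zip(base, stds)])
        return out


    def smote(samples, n, neighbors, rng):
        """SMOTE-style interpolation between a random sample and a near neighbor."""
        if len(samples) < 2:
            raise ValueError("SMOTE needs at least two input samples")
        out = []
        for _ in range(n):
            i = rng.randrange(len(samples))  # random sample, with replacement
            base = samples[i]
            others = samples[:i] + samples[i + 1:]
            # Assumption: brute-force k nearest neighbors, Euclidean distance
            nearest = sorted(others, key=lambda o: math.dist(base, o))[:neighbors]
            neighbor = rng.choice(nearest)
            gap = rng.random()  # interpolation factor in [0, 1)
            out.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
        return out


    if __name__ == "__main__":
        # Hypothetical 2-feature samples of the class to augment
        data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
        rng = random.Random(42)  # fixed seed, in the spirit of the "seed" parameter
        print(replicate(data, 2, rng))
        print(jitter(data, 2, 10.0, rng))
        print(smote(data, 2, 2, rng))

In this sketch, replication only reuses existing points, a larger ``stdfactor`` gives smaller jitter noise, and SMOTE places each new sample on the segment joining a randomly selected sample to one of its nearest neighbors, so new samples stay within the region spanned by those points.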
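Example
-------

To run this example from the command line, use the following:

::

    otbcli_SampleAugmentation -in samples.sqlite -field class -label 3 -samples 100 -out augmented_samples.sqlite -exclude OGC_FID name class originfid -strategy smote -strategy.smote.neighbors 5

To run this example from Python, use the following code snippet:

::

    #!/usr/bin/python

    # Import the otb applications package
    import otbApplication

    # The following line creates an instance of the SampleAugmentation application
    SampleAugmentation = otbApplication.Registry.CreateApplication("SampleAugmentation")

    # The following lines set all the application parameters:
    SampleAugmentation.SetParameterString("in", "samples.sqlite")

    # Update the parameters so that the field lists are filled from the input file
    SampleAugmentation.UpdateParameters()

    SampleAugmentation.SetParameterStringList("field", ["class"])
    SampleAugmentation.SetParameterInt("label", 3)
    SampleAugmentation.SetParameterInt("samples", 100)
    SampleAugmentation.SetParameterString("out", "augmented_samples.sqlite")
    SampleAugmentation.SetParameterStringList("exclude", ["OGC_FID", "name", "class", "originfid"])
    SampleAugmentation.SetParameterString("strategy", "smote")
    SampleAugmentation.SetParameterInt("strategy.smote.neighbors", 5)

    # The following line executes the application
    SampleAugmentation.ExecuteAndWriteOutput()

Limitations
~~~~~~~~~~~

None

Authors
~~~~~~~

This application has been written by OTB-Team.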