.. _SampleAugmentation: SampleAugmentation ================== Generates synthetic samples from a sample data file. Description ----------- The application takes a sample data file as generated by the :ref:`SampleExtraction` application and generates synthetic samples to increase the number of available samples. Parameters ---------- .. contents:: :local: .. |br| raw:: html
.. |em| raw:: html   **Input samples** :code:`-in vectorfile` *Mandatory* |br| Vector data file containing samples (OGR format) **Output samples** :code:`-out filename [dtype]` *Mandatory* |br| Output vector data file storing new samples(OGR format). **Field Name** :code:`-field string` |br| Name of the field carrying the class name in the input vectors. **Layer Index** :code:`-layer int` *Default value: 0* |br| Layer index to read in the input vector file. **Label of the class to be augmented** :code:`-label int` *Default value: 1* |br| Label of the class of the input file for which new samples will be generated. **Number of generated samples** :code:`-samples int` *Default value: 100* |br| Number of synthetic samples that will be generated. **Field names for excluded features** :code:`-exclude string1 string2...` |br| List of field names in the input vector data that will not be generated in the output file. **Augmentation strategy** :code:`-strategy [replicate|jitter|smote]` *Default value: replicate* |br| * **Replicate input samples** |br| The new samples are generated by replicating input samples which are randomly selected with replacement. * **Jitter input samples** |br| The new samples are generated by adding gaussian noise to input samples which are randomly selected with replacement. * **Smote input samples** |br| The new samples are generated by using the SMOTE algorithm (http://dx.doi.org/10.1613/jair.953) on input samples which are randomly selected with replacement. Jitter input samples options ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Factor for dividing the standard deviation of each feature** :code:`-strategy.jitter.stdfactor float` *Default value: 10* |br| The noise added to the input samples will have the standard deviation of the input features divided by the value of this parameter. Smote input samples options ^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Number of nearest neighbors** :code:`-strategy.smote.neighbors int` *Default value: 5* |br| Number of nearest neighbors to be used in the SMOTE algorithm ------------ **Random seed** :code:`-seed int` |br| Set a specific random seed with integer value. Examples -------- From the command-line: .. code-block:: bash otbcli_SampleAugmentation -in samples.sqlite -field class -label 3 -samples 100 -out augmented_samples.sqlite -exclude OGC_FID name class originfid -strategy smote -strategy.smote.neighbors 5 From Python: .. code-block:: python import otbApplication app = otbApplication.Registry.CreateApplication("SampleAugmentation") app.SetParameterString("in", "samples.sqlite") app.SetParameterString("field", "class") app.SetParameterInt("label", 3) app.SetParameterInt("samples", 100) app.SetParameterString("out", "augmented_samples.sqlite") app.SetParameterStringList("exclude", ['OGC_FID', 'name', 'class', 'originfid']) app.SetParameterString("strategy","smote") app.SetParameterInt("strategy.smote.neighbors", 5) app.ExecuteAndWriteOutput()