VOCABULARY

Augmentation

Data augmentation refers to techniques that artificially increase the size and variety of a dataset by applying transformations to the original data. A larger, more varied training set can help a model generalize better.

How Data Augmentation Works

Let's look at the types of data that can be augmented, and at the purpose and implementation of data augmentation techniques.

1. Types of Data:

Image Data:
  • Rotations, zooming, flips (horizontal & vertical), color variations, cropping, and more.
  • More advanced techniques include cutout (masking out a random patch of the image) and mixup (blending pairs of images and their labels).
Text Data:
  • Back translation (translating a sentence from the original language to another language and then back to the original language), synonym replacement, and sentence shuffling.
Audio Data:
  • Changing pitch, speed, or adding noise.
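
The transformations listed above can be sketched in a few lines of NumPy. This is a minimal illustration and not a library implementation: the function names, the `synonyms` dictionary, and the choice to pass the mixup weight `lam` in directly (in practice it is usually sampled at random) are all assumptions made for the example.

```python
import random

import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def horizontal_flip(image):
    """Mirror an (H, W) or (H, W, C) image left to right."""
    return image[:, ::-1]


def cutout(image, size):
    """Return a copy with one random size x size patch set to zero."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out


def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    image = lam * image_a + (1 - lam) * image_b
    label = lam * label_a + (1 - lam) * label_b
    return image, label


def synonym_replacement(sentence, synonyms):
    """Swap each word that has an entry in `synonyms` for a random synonym."""
    words = sentence.split()
    return " ".join(
        random.choice(synonyms[w]) if w in synonyms else w for w in words
    )


def add_noise(waveform, noise_std):
    """Add zero-mean Gaussian noise to a 1-D audio signal."""
    return waveform + rng.normal(0.0, noise_std, size=waveform.shape)
```

Each function returns a new sample and leaves its input unchanged, so the original data stays available for further augmentation.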

2. Purpose:

  1. Enhance Generalization: By exposing the model to various modifications of the original data, the model becomes more robust and can generalize better to new, unseen data.
  2. Balance Datasets: Augmentation can be used to balance classes in datasets by artificially increasing the number of examples in underrepresented classes.
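
As a rough sketch of the balancing idea, the helper below adds augmented copies of minority-class samples until every class matches the largest one. The names `balance_with_augmentation` and `jitter` are hypothetical, and `jitter` stands in for whatever augmentation suits the data.

```python
import random
from collections import Counter


def balance_with_augmentation(samples, labels, augment, seed=0):
    """Add augmented copies of minority-class samples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # Original samples of this class serve as sources for new copies.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(augment(rng.choice(pool), rng))
            out_labels.append(cls)
    return out_samples, out_labels


def jitter(x, rng):
    """Hypothetical augmentation: perturb a numeric feature slightly."""
    return x + rng.uniform(-0.1, 0.1)
```

For example, `balance_with_augmentation([1.0, 2.0, 3.0, 10.0], ["a", "a", "a", "b"], jitter)` returns six samples, three per class. Balancing like this is normally applied to the training split only, so that evaluation data keeps its original class distribution.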

3. Implementation:

  1. Most machine learning libraries have built-in functions or modules for data augmentation, for example tf.image in TensorFlow and torchvision.transforms in the PyTorch ecosystem.
  2. Data augmentation is typically applied during training. When a training batch is requested, raw data samples are fetched and augmented on the fly before being fed into the model, so the model can see a different variant of each sample in every epoch.
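
The on-the-fly pattern can be illustrated without any framework. The generator below is a simplified sketch, not the TensorFlow or PyTorch data-loading API: `augmented_batches` and `random_flip` are hypothetical names, and the images are assumed to fit in memory as a single NumPy array.

```python
import numpy as np


def random_flip(image, rng):
    """Flip an (H, W) image left to right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def augmented_batches(images, labels, batch_size, augment, seed=0):
    """Yield shuffled batches for one epoch, augmenting each sample
    at the moment it is fetched."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # The stored images are never modified; only the batch copy is augmented.
        batch = np.stack([augment(images[i], rng) for i in idx])
        yield batch, labels[idx]
```

Because augmentation happens inside the loop, no augmented copies are stored on disk, and passing a different `seed` for each epoch produces different variants of the same underlying samples.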