Metadata-Version: 2.1
Name: waveaugment
Version: 0.2.4
Summary: WaveAugment performs data augmentation on audio data. 
Home-page: https://github.com/facebookresearch/WavAugment
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown
License-File: LICENSE

# WaveAugment

WavAugment performs data augmentation on audio data. 
The audio data is represented as [pytorch](https://pytorch.org/) tensors. 

It is particularly useful for speech data. 
Among others, it implements the augmentations that we found to be most useful for self-supervised learning 
(_Data Augmenting Contrastive Learning of Speech Representations in the Time Domain_, E. Kharitonov, M. Rivière, G. Synnaeve, L. Wolf, P.-E. Mazaré, M. Douze, E. Dupoux. [[arxiv]](https://arxiv.org/abs/2007.00991)):

* Pitch randomization,
* Reverberation,
* Additive noise,
* Time dropout (temporal masking),
* Band reject,
* Clipping

Internally, WavAugment uses [libsox](http://sox.sourceforge.net/libsox.html) and allows interleaving of libsox- and pytorch-based effects. 

### Requirements
 * Linux or MacOS
 * [pytorch](pytorch.org) >= 1.7.1
 * [torchaudio](pytorch.org/audio) >= 0.7

### Installation
To install WavAugment, run the following command:
```bash
pip install waveaugment
```

## Usage

First of all, we provide thouroughly documented [examples](./examples/python), where we demonstrate how a data-augmented dataset interface works. We also provide a Jupyter-based [tutorial](./examples/python/WavAugment_walkthrough.ipynb) [(open in colab)](https://colab.research.google.com/github/facebookresearch/WavAugment/blob/master/examples/python/WavAugment_walkthrough.ipynb) that illlustrates how one can apply various useful effects to a piece of speech (recorded over the mic or pre-recorded).

### The `EffectChain`

The central object is the chain of effects, `EffectChain`, that are applied on a `torch.Tensor` to produce another `torch.Tensor`. 
This chain can have multiple effects composed:
```python
import augment
effect_chain = augment.EffectChain().pitch(100).rate(16_000)
```
Parameters of the effect coincide with those of libsox (http://sox.sourceforge.net/libsox.html); however, you can also randomize the parameters by providing a python `Callable` and mix them with standard parameters:
```python
import numpy as np
random_pitch_shift = lambda: np.random.randint(-100, +100)
# the pitch will be changed by a shift somewhere between (-100, +100)
effect_chain = augment.EffectChain().pitch("-q", random_pitch_shift).rate(16_000)
```
Here, the flag`-q` makes `pitch` run faster at some expense of the quality.
If some parameters are provided by a Callable, this Callable will be invoked every time `EffectChain` is applied (eg. to generate random parameters).

### Applying the chain

To apply a chain of effects on a torch.Tensor, we code the following:
```python
output_tensor = augment.EffectChain().pitch(100).rate(16_000).apply(input_tensor, \
    src_info=src_info, target_info=target_info)
```
WavAugment expects `input_tensor` to have a shape of (channels, length). As `input_tensor` does not contain important meta-information, such as sampling rate, we need to provide it manually.
This is done by passing two dictionaries, `src_info` (meta-information about the input format) and `target_info` (our expectated format for the output).

At minimum, we need to set the sampling rate for the input tensor: `{'rate': 16_000}`. 

### Example usage

Below is a small gist of a potential usage:

```python
import augment
import numpy as np

x, sr = torchaudio.load(test_wav)

# input signal properties
src_info = {'rate': sr}

# output signal properties
target_info = {'channels': 1, 
               'length': 0, # not known beforehand
               'rate': 16_000}
# write down the chain of effects with their string parameters and call .apply()
# effects are specified as a chain of method calls with parameters that can be 
# strings, numbers, or callables. The latter case is used for generating randomized
# transformations
random_pitch = lambda: np.random.randint(-400, -200)
y = augment.EffectChain().pitch(random_pitch).rate(16_000).apply(x, \
    src_info=src_info, target_info=target_info)
```

## Important notes
It often happens that a command-line invocation of sox would change effect chain. To get a better idea of what sox executes internally, you can launch it with a -V flag, eg by running:
 ```bash
sox -V tests/test.wav out.wav reverb 0 50 100
```
we will see something like:
```
sox INFO sox: effects chain: input        16000Hz  1 channels
sox INFO sox: effects chain: reverb       16000Hz  2 channels
sox INFO sox: effects chain: channels     16000Hz  1 channels
sox INFO sox: effects chain: dither       16000Hz  1 channels
sox INFO sox: effects chain: output       16000Hz  1 channels
```
This output tells us that the `reverb` effect changes the number of channels, which are squashed into 1 channel by the `channel` effect. Sox also added `dither` effect to hide processing artifacts.

WavAugment remains explicit and doesn't add effects under the hood. 
If you want to emulate a Sox command that decomposes into several effects, we advise to consult `sox -V` and apply the effects manually. 
Try it out on some files before running a heavy machine-learning job. 

## Citation
If you find WavAugment useful in your research, please consider citing:
```
@article{wavaugment2020,
  title={Data Augmenting Contrastive Learning of Speech Representations in the Time Domain},
  author={Kharitonov, Eugene and Rivi{\`e}re, Morgane and Synnaeve, Gabriel and Wolf, Lior and Mazar{\'e}, Pierre-Emmanuel and Douze, Matthijs and Dupoux, Emmanuel},
  journal={arXiv preprint arXiv:2007.00991},
  year={2020}
}
```

## Contributing
See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.

## License
WavAugment is MIT licensed, as found in the LICENSE file.



