Metadata-Version: 2.1
Name: TensorFlowASR
Version: 0.2.8
Summary: Almost State-of-the-art Automatic Speech Recognition using Tensorflow 2
Home-page: https://github.com/TensorSpeech/TensorFlowASR
Author: Huy Le Nguyen
Author-email: nlhuy.cs.16@gmail.com
License: UNKNOWN
Description: <h1 align="center">
        <p>TensorFlowASR :zap:</p>
        <p align="center">
        <a href="https://github.com/TensorSpeech/TensorFlowASR/blob/main/LICENSE">
          <img alt="GitHub" src="https://img.shields.io/github/license/TensorSpeech/TensorFlowASR?style=for-the-badge&logo=apache">
        </a>
        <img alt="python" src="https://img.shields.io/badge/python-%3E%3D3.6-blue?style=for-the-badge&logo=python">
        <img alt="tensorflow" src="https://img.shields.io/badge/tensorflow-%3E%3D2.3.0-orange?style=for-the-badge&logo=tensorflow">
        <img alt="ubuntu" src="https://img.shields.io/badge/ubuntu-%3E%3D18.04-blueviolet?style=for-the-badge&logo=ubuntu">
        </p>
        </h1>
        <h2 align="center">
        <p>Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2</p>
        </h2>
        
        <p align="center">
        TensorFlowASR implements several automatic speech recognition architectures, such as DeepSpeech2 and Conformer. These models can be converted to TFLite to reduce memory and computation for deployment :smile:
        </p>
        
        ## What's New?
        
        - (10/18/2020) Supported Streaming Transducer [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)
        - (10/15/2020) Added gradient accumulation and refactored to TensorFlowASR
        - (10/10/2020) Updated documentation and uploaded the package to PyPI
        - (10/6/2020) Changed `nlpaug` version to `>=1.0.1`
        - (9/18/2020) Supported `word-pieces` (aka `subwords`) using `tensorflow-datasets`
        - Support `transducer` tflite greedy decoding (conversion and invocation)
        - Distributed training using `tf.distribute.MirroredStrategy`
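
        Gradient accumulation, mentioned above, sums the gradients of several micro-batches and applies them as one update, simulating a larger batch size. The following is a conceptual pure-Python sketch of the idea only, not TensorFlowASR's actual implementation (which operates on TensorFlow tensors inside a training step):

        ```python
        # Conceptual sketch of gradient accumulation: sum gradients over
        # several micro-batches, then apply a single averaged update.
        def accumulate_and_apply(param, micro_batch_grads, learning_rate=0.1):
            """Average gradients from micro-batches and apply one update."""
            accumulated = 0.0
            for grad in micro_batch_grads:
                accumulated += grad  # accumulate instead of updating immediately
            avg_grad = accumulated / len(micro_batch_grads)
            return param - learning_rate * avg_grad

        # Four micro-batches behave like one batch of four times the size
        new_param = accumulate_and_apply(1.0, [0.2, 0.4, 0.1, 0.3])
        ```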
        
        ## :yum: Supported Models
        
        - **CTCModel** (End2end models using CTC Loss for training)
        - **Transducer Models** (End2end models using RNNT Loss for training)
        - **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100))
          See [examples/conformer](./examples/conformer)
        - **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621))
          See [examples/streaming_transducer](./examples/streaming_transducer)
        
        ## Setup Environment and Datasets
        
        Install TensorFlow: `pip3 install -U tensorflow` or `pip3 install tf-nightly` (required for using TFLite)
        
        Install packages (choose _one_ of these options):
        
        - Run `pip3 install -U TensorFlowASR`
        - Clone the repo and run `python3 setup.py install` in the repo's directory
        
        For **setting up datasets**, see [datasets](./tensorflow_asr/datasets/README.md)
        
        - For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`
        
        - For _training_ **Transducer Models**, export `CUDA_HOME` and run `./scripts/install_rnnt_loss.sh`
        
        - The method `tensorflow_asr.utils.setup_environment()` enables **mixed_precision** if available.
        
        - To enable XLA, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script`
        
        Clean up: `python3 setup.py clean --all` (this removes the `build/` directory contents)
        
        ## TFLite Conversion
        
        After conversion, the TFLite model acts as a function that maps an **audio signal** directly to **unicode code points**, which can then be converted to a string.
        
        1. Install `tf-nightly` using `pip install tf-nightly`
        2. Build a model with the same architecture as the trained model _(if the model has a `tflite` argument, set it to `True`)_, then load the trained weights into the built model
        3. Attach `TFSpeechFeaturizer` and `TextFeaturizer` to the model using the `add_featurizers` function
        4. Convert model's function to tflite as follows:
        
        ```python
        import tensorflow as tf

        # Get the model's serving function (greedy=True or False)
        func = model.make_tflite_function(greedy=True)
        concrete_func = func.get_concrete_function()
        converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
        converter.experimental_new_converter = True
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                               tf.lite.OpsSet.SELECT_TF_OPS]
        tflite_model = converter.convert()
        ```
        
        5. Save the converted tflite model as follows:
        
        ```python
        import os

        # Create the output directory if it does not exist yet
        if not os.path.exists(os.path.dirname(tflite_path)):
            os.makedirs(os.path.dirname(tflite_path))
        with open(tflite_path, "wb") as tflite_out:
            tflite_out.write(tflite_model)
        ```
        
        6. Then the `.tflite` model is ready to be deployed
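
        Since the model's output is a sequence of unicode code points, turning it into a transcript is plain Python. A minimal sketch (the code points below are illustrative values, not real model output):

        ```python
        # Convert a sequence of unicode code points, as produced by the
        # tflite model, into a readable string. Example values are hypothetical.
        def code_points_to_string(code_points):
            return "".join(chr(int(c)) for c in code_points)

        transcript = code_points_to_string([104, 101, 108, 108, 111])  # "hello"
        ```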
        
        ## Feature Extraction
        
        See [features_extraction](./tensorflow_asr/featurizers/README.md)
        
        ## Augmentations
        
        See [augmentations](./tensorflow_asr/augmentations/README.md)
        
        ## Training & Testing
        
        **Example YAML Config Structure**
        
        ```yaml
        speech_config: ...
        model_config: ...
        decoder_config: ...
        learning_config:
          augmentations: ...
          dataset_config:
            train_paths: ...
            eval_paths: ...
            test_paths: ...
            tfrecords_dir: ...
          optimizer_config: ...
          running_config:
            batch_size: 8
            num_epochs: 20
            outdir: ...
            log_interval_steps: 500
        ```
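
        Once parsed (e.g. with a YAML library), the config above becomes nested dictionaries. A pure-Python sketch of reading the training settings, with the structure assumed from the example above (placeholder values where the example uses `...`):

        ```python
        # Hypothetical parsed form of the YAML config above, as nested dicts.
        config = {
            "speech_config": {},
            "model_config": {},
            "decoder_config": {},
            "learning_config": {
                "augmentations": {},
                "dataset_config": {
                    "train_paths": [], "eval_paths": [], "test_paths": [],
                    "tfrecords_dir": None,
                },
                "optimizer_config": {},
                "running_config": {
                    "batch_size": 8,
                    "num_epochs": 20,
                    "outdir": None,
                    "log_interval_steps": 500,
                },
            },
        }

        # Read the training settings from the nested structure
        running = config["learning_config"]["running_config"]
        batch_size = running["batch_size"]  # 8
        num_epochs = running["num_epochs"]  # 20
        ```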
        
        See [examples](./examples/) for some predefined ASR models and results
        
        ## Corpus Sources and Pretrained Models
        
        For pretrained models, go to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)
        
        ### English
        
        |   **Name**   |                             **Source**                             | **Hours** |
        | :----------: | :----------------------------------------------------------------: | :-------: |
        | LibriSpeech  |              [LibriSpeech](http://www.openslr.org/12)              |   970h    |
        | Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) |   1932h   |
        
        ### Vietnamese
        
        |                **Name**                |                                       **Source**                                       | **Hours** |
        | :------------------------------------: | :------------------------------------------------------------------------------------: | :-------: |
        |                 Vivos                  |          [https://ailab.hcmus.edu.vn/vivos](https://ailab.hcmus.edu.vn/vivos)          |    15h    |
        |          InfoRe Technology 1           |  [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/25hours.zip)   |    25h    |
        | InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/audiobooks.zip) |   415h    |
        
        ### German
        
        |   **Name**   |                             **Source**                              | **Hours** |
        | :----------: | :-----------------------------------------------------------------: | :-------: |
        | Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org)  |   750h    |
        
        ## References & Credits
        
        1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)
        2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)
        3. [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711)
        4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.6
Description-Content-Type: text/markdown
