Metadata-Version: 2.1
Name: seavoice-sdk
Version: 0.2.8
Summary: SeaVoice SDK: Client for Seasalt speech recognition and speech synthesis.
Home-page: UNKNOWN
Author: Seasalt.ai
Author-email: info@seasalt.ai
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# SeaVoice SDK

Please contact info@seasalt.ai if you have any questions.

## Speech-to-Text Example:

### Prerequisites
You will need a SeaVoice speech service account to run this example. Please contact info@seasalt.ai and apply for it.

### Install and import
To install SeaVoice SDK:

```pip install seavoice-sdk```

To import SeaVoice SDK:

```import seavoice_sdk.speech as speechsdk```


### Recognition
In the example below, we show how to recognize speech from an audio file. You can also apply recognition to an audio stream.

#### Speech Configuration
Use the following code to create `SpeechConfig` (contact info@seasalt.ai for the speech service account):
```
    speech_config = speechsdk.SpeechConfig(
        account_id=SEASALT_ACCOUNT,
        password=PASSWORD,
        speech_recognition_language="zh-TW",
        speech_recognition_punctuation=True
    )
```

#### Audio Configuration
Use the following code to create `AudioConfig`.
```
    audio_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1)
    audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=audio_format)
    audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
```
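The stream format above declares 16 kHz, 16-bit, mono PCM audio. Before pushing a file's bytes into the stream, it can help to confirm the file actually matches that format; a small sketch using only the standard-library `wave` module (this helper is not part of the SDK):

```
import wave

def matches_stream_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file at `path` matches the declared stream format."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width_bytes
                and wav.getnchannels() == channels)
```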

#### Recognizer initialization
SpeechRecognizer can be initialized as follows:
```
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config
    )
```

#### Callbacks connection
SpeechRecognizer has 5 kinds of callbacks:
- Recognizing - called when recognition is in progress.
- Recognized - called when a single utterance is recognized.
- Canceled - called when continuous recognition is interrupted.
- Session_started - called when a recognition session is started.
- Session_stopped - called when a recognition session is stopped.

To connect the callbacks:
```
    speech_recognizer.recognizing.connect(
        lambda evt: print(f"Recognizing: {evt.result.text}"))
    speech_recognizer.recognized.connect(
        lambda evt: print(f'Recognized: {evt.result.text}'))
    speech_recognizer.canceled.connect(
        lambda evt: print(f'Canceled: {evt}'))
    speech_recognizer.session_started.connect(
        lambda evt: print(f'Session_started: {evt}'))
    speech_recognizer.session_stopped.connect(
        lambda evt: print(f'Session_stopped: {evt}'))
```

#### Recognizing speech
SpeechRecognizer is now ready to run. It supports two modes of speech recognition:
- Single-shot recognition - Performs recognition once, for example on a single audio file, and stops after a single utterance is recognized.
- Continuous recognition (async) - Asynchronously starts continuous recognition on an audio stream; results are delivered through the callbacks. To stop continuous recognition, call `stop_continuous_recognition_async()`.

```
    speech_recognizer.start_continuous_recognition_async()
    # The commented-out line below is for single-shot recognition.
    # speech_recognizer.recognize_once()
```

#### Putting everything together
Now, put everything together and run the example:


```
import seavoice_sdk.speech as speechsdk
import time

SEASALT_ACCOUNT = "speechdemo"
PASSWORD = "12345678"
speech_config = speechsdk.SpeechConfig(
    account_id=SEASALT_ACCOUNT,
    password=PASSWORD,
    speech_recognition_language="zh-TW",
    speech_recognition_punctuation=True
)
audio_format = speechsdk.audio.AudioStreamFormat(
    samples_per_second=16000, bits_per_sample=16, channels=1)
audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=audio_format)
audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)

speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=audio_config
)

def recognized_handler(evt):
    print(f'Recognized: {evt.result.text}')
    print(f'segment_start: {evt.segment_start}')
    print(f'segment_length: {evt.segment_length}')
    print(f'total_length: {evt.total_length}')
    print(f'word_alignment: {evt.result.word_alignment}')

done = False
def stopped_handler(evt):
    global done
    print(f'Session_stopped: {evt}')
    done = True

speech_recognizer.recognizing.connect(
    lambda evt: print(f"Recognizing: {evt.result.text}"))
speech_recognizer.recognized.connect(recognized_handler)
speech_recognizer.canceled.connect(
    lambda evt: print(f'Canceled: {evt}'))
speech_recognizer.session_started.connect(
    lambda evt: print(f'Session_started: {evt}'))
speech_recognizer.session_stopped.connect(stopped_handler)

speech_recognizer.start_continuous_recognition_async()
with open("test.wav", "rb") as audio_bytes:
    audio_stream.write(audio_bytes.read())
    audio_stream.write("EOS".encode('utf-8'))
while not done:
    time.sleep(3)
speech_recognizer.stop_continuous_recognition()
print("Finished recognizing")
```
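The example above writes the whole file in a single call. For long recordings you may prefer to push audio in smaller chunks; a minimal, SDK-independent helper (the 4096-byte chunk size is an arbitrary choice):

```
def iter_chunks(path, chunk_size=4096):
    """Yield successive byte chunks from an audio file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Used with the stream from the example: `for chunk in iter_chunks("test.wav"): audio_stream.write(chunk)`, followed by the same `"EOS"` write.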

## Text-to-Speech Example:

### Prerequisites
You will need a SeaVoice speech service account to run this example. Please contact info@seasalt.ai and apply for it.

### Install and import
To install SeaVoice SDK:

```pip install seavoice-sdk```

To import SeaVoice SDK:

```import seavoice_sdk.speech as speechsdk```


### Synthesis
In the example below, we show how to synthesize text to generate an audio file. You can also receive synthesis results from an audio stream.

#### Speech Configuration
Use the following code to create `SpeechConfig` (contact info@seasalt.ai for the speech service account):
```
    speech_config = speechsdk.SpeechConfig(
        account_id=SEASALT_ACCOUNT,
        password=PASSWORD,
        speech_synthesis_language="en-US",
        speech_synthesis_voice_name="TomHanks",
        speech_synthesis_output_format_id="riff-22khz-16bit-mono-pcm",
        speech_synthesis_output_pitch=0.0,
        speech_synthesis_output_speed=1.0
    )
```

`speech_synthesis_language` can be `zh-TW`, `en-US`, or `en-GB`.
For `zh-TW`, `speech_synthesis_voice_name` can be `Tongtong` or `Vivian`.
For `en-US`, `speech_synthesis_voice_name` can be `TomHanks`, `ReeseWitherspoon`, or `AnneHathaway`.
For `en-GB`, `speech_synthesis_voice_name` can be `DavidAttenborough`.

`speech_synthesis_output_format_id` can be `riff-22khz-16bit-mono-pcm`, `riff-16khz-16bit-mono-pcm`, or `riff-8khz-16bit-mono-pcm`.

`speech_synthesis_output_pitch` accepts values from `-12.0` to `12.0`; `0.0` is the default (normal) pitch.
`speech_synthesis_output_speed` accepts values from `0.5` to `2.0`; `1.0` is the default (normal) speed.
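The voice tables and numeric ranges above can be checked client-side before creating `SpeechConfig`. A hypothetical validation helper (not part of the SDK) encoding exactly those documented choices:

```
# Documented language/voice combinations from this README.
VOICES = {
    "zh-TW": {"Tongtong", "Vivian"},
    "en-US": {"TomHanks", "ReeseWitherspoon", "AnneHathaway"},
    "en-GB": {"DavidAttenborough"},
}

def validate_synthesis_options(language, voice, pitch=0.0, speed=1.0):
    """Raise ValueError if the options fall outside the documented choices."""
    if language not in VOICES:
        raise ValueError(f"unsupported language: {language}")
    if voice not in VOICES[language]:
        raise ValueError(f"{voice} is not available for {language}")
    if not -12.0 <= pitch <= 12.0:
        raise ValueError("pitch must be between -12.0 and 12.0")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
```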

#### Audio Configuration
Use the following code to create `AudioOutputConfig`.
```
    import seavoice_sdk.audio as audio
    # Code commented out is an example for receiving synthesis results from an audio stream.
    # audio_stream = audio.AudioOutputStream()
    # audio_config = audio.AudioOutputConfig(stream=audio_stream)
    audio_config = audio.AudioOutputConfig(filename="output.wav")
```

#### Synthesizer initialization
SpeechSynthesizer can be initialized as follows:
```
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        audio_config=audio_config
    )
```

#### Callbacks connection
SpeechSynthesizer has 4 kinds of callbacks:
- Synthesis_started - called when synthesis starts.
- Synthesizing - called each time a part of the synthesis result is produced.
- Synthesis_completed - called when all text has been synthesized.
- Synthesis_canceled - called when synthesis is interrupted.

To connect the callbacks:
```
    speech_synthesizer.synthesis_started.connect(
        lambda: print("synthesis started"))
    speech_synthesizer.synthesizing.connect(
        lambda audio_data: print("synthesizing"))
    speech_synthesizer.synthesis_completed.connect(
        lambda audio_data: print("synthesis completed"))
    speech_synthesizer.synthesis_canceled.connect(
        lambda: print("synthesis canceled"))
```

#### Synthesizing text
SpeechSynthesizer is now ready to run. There are two ways to run it:
- Synchronous - Performs synthesis and blocks until the full result is available.
- Asynchronous - Starts synthesis and returns a `speechsdk.ResultFuture`; call its `get()` method to wait for and retrieve the synthesis result.
```
    # The commented-out line below is for synchronous synthesis.
    # result = speech_synthesizer.speak_text("Input your text to synthesize here.")
    result = speech_synthesizer.speak_text_async("Input your text to synthesize here.").get()
    # Code commented out is an example for reading synthesis result from an audio stream.
    # audio_data = audio_stream.read()
```

#### Checking the result
Both the synchronous and asynchronous methods return a `speechsdk.SpeechSynthesisResult` object, which indicates whether synthesis completed successfully:
```
    if result.reason == speechsdk.ResultReason.ResultReason_SynthesizingAudioCompleted:
        print("finished speech synthesizing")
```

#### Putting everything together
Now, put everything together and run the example:
```
from seavoice_sdk import speech as speechsdk
from seavoice_sdk import audio as audio

if __name__ == "__main__":
    SEASALT_ACCOUNT = "xxxxxxxxx"
    PASSWORD = "xxxxxxxx"
    speech_config = speechsdk.SpeechConfig(
        account_id=SEASALT_ACCOUNT,
        password=PASSWORD,
        speech_synthesis_language="en-US",
        speech_synthesis_voice_name="TomHanks",
        speech_synthesis_output_format_id="riff-22khz-16bit-mono-pcm",
        speech_synthesis_output_pitch=0.0,
        speech_synthesis_output_speed=1.0
    )
    audio_config = audio.AudioOutputConfig(filename="output.wav")
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        audio_config=audio_config
    )
    speech_synthesizer.synthesis_started.connect(
        lambda: print("synthesis started"))
    speech_synthesizer.synthesizing.connect(
        lambda audio_data: print("synthesizing"))
    speech_synthesizer.synthesis_completed.connect(
        lambda audio_data: print("synthesis completed"))
    speech_synthesizer.synthesis_canceled.connect(
        lambda: print("synthesis canceled"))

    # result = speech_synthesizer.speak_text("Seasalt.ai is a service company focusing on multi-modal AI solutions.")
    result = speech_synthesizer.speak_text_async("Seasalt.ai is a service company focusing on multi-modal AI solutions.").get()

    if result.reason == speechsdk.ResultReason.ResultReason_SynthesizingAudioCompleted:
        print("finished speech synthesizing")

```
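With the `riff-22khz-16bit-mono-pcm` format used above, the generated `output.wav` should be 22.05 kHz, 16-bit, mono. A quick sanity check using only the standard-library `wave` module (this helper is not part of the SDK):

```
import wave

def describe_wav(path):
    """Summarize a WAV file's format and duration."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration_seconds": wav.getnframes() / rate,
        }
```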

## Change Log
[0.2.3] - 2022-9-23
#### Improvements
 - Added reconnection mechanism

[0.2.2] - 2021-8-16
#### Bugfixes
 - Fixed an issue where some callbacks were never called

[0.2.1] - 2021-7-25
#### Changed
 - Renamed the SDK to SeaVoice

[0.1.14] - 2021-4-9
#### Improvements
 - Added output of post-processing result

[0.1.13] - 2021-4-1
#### Improvements
 - Added output of segment and word alignment information

[0.1.12] - 2020-12-10
#### Bugfixes
 - Remove unused variable
#### Improvements
 - Added websocket packages in requirements.txt file


