psi/Sources/Speech/Microsoft.Psi.Speech
danbohus 14fe1069a3
Various updates to PsiStudio, Audio and Sigma (#310)
* ### __Changes to Sigma__
* Fixed issues with moving to the previous step.
* Added support for using specific synthesis voices via the Sigma client configuration.
* Updated the Sigma app to use caching for speech synthesis.

### __Changes to Components__
* Added `SpeechSynthesisCache` component to `Microsoft.Psi.Speech` to implement a cache for generated speech synthesis content.
* The Azure-based `SpeechSynthesizer` component from `Microsoft.Psi.CognitiveServices.Speech` can now cache generated utterances for future use: identical later requests are served from the cache instead of going to the cloud, which speeds up synthesis. The developer can also now select (via configuration) whether the audio buffers are streamed in real time or as they arrive, and how far ahead of real time the generated audio buffers are streamed. These options give better control over playback and help avoid drops.
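
The caching behavior described above can be illustrated with a minimal, language-neutral sketch. This is not the `SpeechSynthesisCache` implementation (which is C#); the class name `SynthesisCache`, the `(voice, text)` cache key, and the `fake_cloud_synthesizer` stand-in are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class SynthesisCache:
    """Hypothetical in-memory cache keyed by (voice, text)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # the expensive (e.g. cloud) call
        self._entries: Dict[Tuple[str, str], bytes] = {}
        self.misses = 0  # number of times the synthesizer was actually invoked

    def get_audio(self, voice: str, text: str) -> bytes:
        key = (voice, text)
        if key not in self._entries:
            # Cache miss: pay for synthesis once and remember the result.
            self.misses += 1
            self._entries[key] = self._synthesize(voice, text)
        return self._entries[key]


def fake_cloud_synthesizer(voice: str, text: str) -> bytes:
    """Stand-in for a real synthesis request; returns placeholder bytes."""
    return f"{voice}:{text}".encode("utf-8")
```

With this sketch, calling `get_audio("voice-a", "hello")` twice invokes the synthesizer only once; a request with a different voice or text is a separate cache entry.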

### __Changes to Audio__
* Added a `Streamline` operator for audio streams that normalizes gaps and overlaps in the audio buffer stream using one of several methods (Concatenate, Pleat, Unpleat).
* Added a batch processing task for exporting audio streams to wav files.
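
To make "gaps and overlaps" concrete, here is a small sketch that detects them between consecutive buffers and then re-times the buffers back to back. It is only one plausible reading of a concatenation-style method, not the `Streamline` operator itself; the `AudioBuffer` type, the function names, and the millisecond timing model are assumptions, and Pleat and Unpleat are not sketched because the commit message does not describe them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioBuffer:
    """Hypothetical buffer described only by its timing, in milliseconds."""
    start_ms: float
    duration_ms: float

    @property
    def end_ms(self) -> float:
        return self.start_ms + self.duration_ms


def find_discontinuities(buffers: List[AudioBuffer]) -> List[Tuple[int, str, float]]:
    """Return (index, kind, size_ms) for each gap or overlap before buffers[index]."""
    issues = []
    for i in range(1, len(buffers)):
        delta = buffers[i].start_ms - buffers[i - 1].end_ms
        if delta > 0:
            issues.append((i, "gap", delta))
        elif delta < 0:
            issues.append((i, "overlap", -delta))
    return issues


def concatenate(buffers: List[AudioBuffer]) -> List[AudioBuffer]:
    """Re-time buffers back to back, starting at the first buffer's start time."""
    result = []
    cursor = buffers[0].start_ms if buffers else 0.0
    for b in buffers:
        result.append(AudioBuffer(cursor, b.duration_ms))
        cursor += b.duration_ms
    return result
```

Re-timing in this way removes every gap and overlap but shifts later buffers away from their original times, which is the kind of trade-off a choice between several methods would have to address.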

### __Changes to PsiStudio and Visualization__
* Improved robustness of audio playback and addressed a number of issues that created drift between audio and visual playback.
* The menu for running batch processing tasks is now split into submenus by batch task namespace.
* Added ability to export audio streams (or selections thereof) to a wav file.

### __Changes to Microsoft.Psi.Data__
* Added a `SessionImporter.OpenStream` overload that allows for providing a `[PartitionName]:StreamName` stream specification. This simplifies the configuration of batch processing tasks where input streams need to be specified, eliminating the need for specifying the partition separately. The existing batch tasks were adjusted to leverage this feature.
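
As an illustration of how a combined stream specification might be split, the sketch below parses a `PartitionName:StreamName` string. It is not the `SessionImporter.OpenStream` logic: it assumes the square brackets in the commit message are placeholder notation, that the first colon is the separator, and that a specification without a colon is a bare stream name. The function name `parse_stream_spec` is hypothetical.

```python
from typing import Optional, Tuple


def parse_stream_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'Partition:Stream' into (partition, stream).

    Assumption: a spec with no colon names a stream only, so the
    partition is returned as None and must be resolved by the caller.
    """
    partition, separator, stream = spec.partition(":")
    if not separator:
        return None, spec
    if not partition or not stream:
        raise ValueError(f"Malformed stream specification: {spec!r}")
    return partition, stream
```

For example, `parse_stream_spec("Audio:Microphone")` yields `("Audio", "Microphone")`, so a batch task needs only one configuration string per input stream.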

### __Changes to Runtime__
* Introduced a `Merge` interval operator that computes a non-overlapping set of intervals covering a given set of (potentially overlapping) intervals.
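
The idea behind merging intervals can be sketched with the standard sort-and-sweep approach. This is an illustration of the concept, not the \psi `Merge` operator; the function name `merge_intervals`, the `(start, end)` tuple representation, and the decision to merge intervals that merely touch are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), with start <= end


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return sorted, non-overlapping intervals covering all inputs."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Starts after the last merged interval ends: begin a new one.
            merged.append((start, end))
    return merged
```

Sorting by start time means each interval only has to be compared with the most recently merged one, so the sweep is linear after an O(n log n) sort.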

* Update per CR
2024-04-03 14:59:46 -07:00

| File | Last commit | Date |
| --- | --- | --- |
| GrammarInfo.cs | Merged PR 43330: Release 0.12.53.2 | 2020-06-17 00:33:21 +00:00 |
| ISpeechRecognitionAlternate.cs | Releasing Mule (0.2.123.1) to Master | 2018-03-17 20:29:49 +00:00 |
| ISpeechRecognitionResult.cs | Releasing Mule (0.2.123.1) to Master | 2018-03-17 20:29:49 +00:00 |
| ISpeechRecognizer.cs | Merged PR 77106: March 2024 Release | 2024-03-14 21:26:24 +00:00 |
| ISpeechSynthesizer.cs | Merged PR 77106: March 2024 Release | 2024-03-14 21:26:24 +00:00 |
| IStreamingSpeechRecognitionResult.cs | Merged PR 29988: Release 0.8.32.1 | 2019-07-03 22:56:38 +00:00 |
| IVoiceActivityDetector.cs | Merged PR 77106: March 2024 Release | 2024-03-14 21:26:24 +00:00 |
| Microsoft.Psi.Speech.csproj | Merged PR 77106: March 2024 Release | 2024-03-14 21:26:24 +00:00 |
| SpeechRecognitionAlternate.cs | Releasing Mule (0.2.123.1) to Master | 2018-03-17 20:29:49 +00:00 |
| SpeechRecognitionResult.cs | Merged PR 29988: Release 0.8.32.1 | 2019-07-03 22:56:38 +00:00 |
| SpeechSynthesisCache.cs | Various updates to PsiStudio, Audio and Sigma (#310) | 2024-04-03 14:59:46 -07:00 |
| StreamingSpeechRecognitionResult.cs | Merged PR 29988: Release 0.8.32.1 | 2019-07-03 22:56:38 +00:00 |
| build.sh | Merged PR 30335: Release 0.9.6.1 | 2019-07-18 21:43:46 +00:00 |
| stylecop.json | Releasing Mule (0.2.123.1) to Master | 2018-03-17 20:29:49 +00:00 |