UniSpeech
The UniSpeech family:
- UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR
- UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training
Update
- [Model Release] October 13, 2021: UniSpeech-SAT models are released.
- [HuggingFace Integration] October 11, 2021: UniSpeech models are available on HuggingFace.
- [Model Release] June, 2021: UniSpeech v1 models are released.
Pre-trained models
We strongly suggest using the UniSpeech-SAT models for speaker-related tasks, since they show strong performance on various speaker-related benchmarks.
Model | Pre-training Data | Download |
---|---|---|
UniSpeech Base | 1500 hrs CommonVoice | download |
UniSpeech Large | 1500 hrs CommonVoice | download |
UniSpeech-SAT Base | 960 hrs LibriSpeech | download |
UniSpeech-SAT Base+ | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
UniSpeech-SAT Large | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
Universal Representation Evaluation on SUPERB
Downstream Task Performance
We also evaluate our models on typical speaker-related benchmarks.
Speaker Verification
Equal error rate (EER, %) on the VoxCeleb1 trial lists:
Model | Fixed pre-train | Vox1-O | Vox1-E | Vox1-H |
---|---|---|---|---|
ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
HuBERT large | Yes | 0.888 | 0.912 | 1.853 |
Wav2Vec2.0 (XLSR) | Yes | 0.915 | 0.945 | 1.895 |
UniSpeech-SAT large | Yes | 0.771 | 0.781 | 1.669 |
HuBERT large | No | 0.585 | 0.654 | 1.342 |
Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
Regarding reproduction, please contact Zhengyang
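The Vox1-O/E/H columns report the equal error rate (EER, %) of the verification system. As an illustration of the metric only (not the evaluation code used for these results), here is a minimal pure-Python sketch that sweeps every trial score as a candidate decision threshold:

```python
def eer_percent(target_scores, nontarget_scores):
    """Equal error rate (%): operating point where the false-reject rate
    on target (same-speaker) trials equals the false-accept rate on
    nontarget (different-speaker) trials."""
    best = None
    for thr in sorted(target_scores + nontarget_scores):
        frr = sum(s <= thr for s in target_scores) / len(target_scores)
        far = sum(s > thr for s in nontarget_scores) / len(nontarget_scores)
        if best is None or abs(frr - far) < abs(best[0] - best[1]):
            best = (frr, far)
    return (best[0] + best[1]) / 2 * 100

# Perfectly separated scores give 0% EER.
print(eer_percent([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # 0.0
```

Real toolkits interpolate the ROC curve between thresholds; this O(n²) sweep is only meant to make the numbers in the table concrete.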
Speech Separation
Evaluation on LibriCSS
Word error rate (WER, %); 0S and 0L are the non-overlapping conditions with short and long inter-utterance silence, and OVxx denotes xx% overlap:
Model | 0S | 0L | OV10 | OV20 | OV30 | OV40 |
---|---|---|---|---|---|---|
Conformer (SOTA) | 4.5 | 4.4 | 6.2 | 8.5 | 11 | 12.6 |
UniSpeech-SAT base | 4.4 | 4.4 | 5.4 | 7.2 | 9.2 | 10.5 |
UniSpeech-SAT large | 4.3 | 4.2 | 5.0 | 6.3 | 8.2 | 8.8 |
The paper will appear soon.
Regarding reproduction, please contact Sanyuan
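The LibriCSS numbers above are word error rates (WER, %): the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal sketch of the metric (illustrative only, not the scoring pipeline used for these results):

```python
def wer_percent(reference, hypothesis):
    """Word error rate (%): Levenshtein distance between word sequences,
    normalized by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100 * d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of four reference words -> 25% WER.
print(wer_percent("the cat sat down", "the cat sat"))  # 25.0
```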
Speech Diarization
Evaluation on CALLHOME
Diarization error rate (DER, %) broken down by number of speakers:
Model | spk_2 | spk_3 | spk_4 | spk_5 | spk_6 | spk_all |
---|---|---|---|---|---|---|
EEND (SOTA) | 7.96 | 11.93 | 16.38 | 21.21 | 23.1 | 12.49 |
UniSpeech-SAT large | 5.93 | 10.66 | 12.9 | 16.48 | 23.25 | 10.92 |
The paper will appear soon.
Regarding reproduction, please contact Zhengyang
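The CALLHOME numbers above are diarization error rates (DER, %). A toy frame-level sketch of the metric, searching over one-to-one mappings between hypothesis and reference speaker labels (illustrative only; standard scoring tools additionally handle collars and overlapping speech):

```python
from itertools import permutations

def der_percent(ref, hyp):
    """Toy frame-level diarization error rate (%).
    ref/hyp: equal-length per-frame speaker labels, None = silence.
    Missed speech, false alarms, and speaker confusions are counted under
    the best one-to-one hyp->ref speaker mapping and divided by the number
    of reference speech frames."""
    ref_spk = sorted({s for s in ref if s is not None})
    hyp_spk = sorted({s for s in hyp if s is not None})
    # Pad with dummy labels so every hypothesis speaker can be assigned.
    candidates = ref_spk + [object() for _ in range(max(0, len(hyp_spk) - len(ref_spk)))]
    best = len(ref)
    for perm in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        errors = sum(
            (h is None and r is not None)                             # missed speech
            or (h is not None and r is None)                          # false alarm
            or (h is not None and r is not None and mapping[h] != r)  # confusion
            for r, h in zip(ref, hyp)
        )
        best = min(best, errors)
    return 100 * best / sum(r is not None for r in ref)

# One confused frame out of four speech frames -> 25% DER.
print(der_percent(['a', 'a', 'b', 'b', None], ['x', 'x', 'x', 'y', None]))  # 25.0
```

The brute-force permutation search is exponential in the speaker count; real scorers solve the mapping as an assignment problem, but the error taxonomy is the same.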
License
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.
Microsoft Open Source Code of Conduct
Reference
If you find our work useful in your research, please cite the following papers:
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Contact Information
For help or issues using UniSpeech models, please submit a GitHub issue.
For other communications related to UniSpeech, please contact Yu Wu (yuwu1@microsoft.com).