UniSpeech - Large-Scale Self-Supervised Learning for Speech

UniSpeech

The family of UniSpeech:

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Update

Pre-trained models

We strongly recommend using the UniSpeech-SAT models for speaker-related tasks, as they show strong performance on a variety of speaker-related benchmarks.

| Model | Dataset | Download |
| --- | --- | --- |
| UniSpeech Base | 1500 hrs CommonVoice | download |
| UniSpeech Large | 1500 hrs CommonVoice | download |
| UniSpeech-SAT Base | 960 hrs LibriSpeech | download |
| UniSpeech-SAT Base+ | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
| UniSpeech-SAT Large | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
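
The released checkpoints can be used directly as speech feature extractors. As a minimal sketch, assuming the Hugging Face Transformers port of UniSpeech-SAT (the UniSpeechSatModel class and the microsoft/unispeech-sat-base checkpoint name refer to that port and are assumptions, not files shipped in this repository), frame-level representations can be extracted like this:

```python
# Minimal sketch: extract frame-level features with the Hugging Face port of
# UniSpeech-SAT. The checkpoint name "microsoft/unispeech-sat-base" is an
# assumption; substitute the checkpoint you actually downloaded.
import torch
from transformers import Wav2Vec2FeatureExtractor, UniSpeechSatModel

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/unispeech-sat-base")
model = UniSpeechSatModel.from_pretrained("microsoft/unispeech-sat-base")
model.eval()

# 16 kHz mono audio; a one-second dummy waveform stands in for real speech.
waveform = torch.randn(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Frame-level representations of shape (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```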

Universal Representation Evaluation on SUPERB

(Figure: universal speech representation evaluation results on the SUPERB benchmark)

Downstream Task Performance

We also evaluate our models on typical speaker-related benchmarks.

Speaker Verification

Results are equal error rate (EER, %) on the VoxCeleb1 trial lists; lower is better.

| Model | Fixed pre-trained model | Vox1-O | Vox1-E | Vox1-H |
| --- | --- | --- | --- | --- |
| ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
| HuBERT Large | Yes | 0.888 | 0.912 | 1.853 |
| Wav2Vec2.0 (XLSR) | Yes | 0.915 | 0.945 | 1.895 |
| UniSpeech-SAT Large | Yes | 0.771 | 0.781 | 1.669 |
| HuBERT Large | No | 0.585 | 0.654 | 1.342 |
| Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
| UniSpeech-SAT Large | No | 0.564 | 0.561 | 1.23 |

A paper describing the speaker verification experiments is available.

For reproduction details, please contact Zhengyang.
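
In a verification trial, the two utterances are mapped to fixed-dimensional speaker embeddings and scored with cosine similarity against a threshold tuned on a development set. The sketch below illustrates that scoring loop; the UniSpeechSatForXVector class, the microsoft/unispeech-sat-base-plus-sv checkpoint name, and the threshold value are assumptions for illustration (they refer to the Hugging Face port), not part of this repository.

```python
# Hedged sketch of speaker verification scoring with cosine similarity.
# UniSpeechSatForXVector / "microsoft/unispeech-sat-base-plus-sv" refer to the
# Hugging Face port and are assumptions; the threshold is illustrative only.
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2FeatureExtractor, UniSpeechSatForXVector

extractor = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/unispeech-sat-base-plus-sv")
model = UniSpeechSatForXVector.from_pretrained("microsoft/unispeech-sat-base-plus-sv")
model.eval()

def embed(waveform):
    """Return an L2-normalised speaker embedding for a 16 kHz mono waveform."""
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        emb = model(**inputs).embeddings
    return F.normalize(emb, dim=-1)

# Dummy 3-second utterances stand in for enrollment and test audio.
enroll = torch.randn(48000).numpy()
test = torch.randn(48000).numpy()

score = F.cosine_similarity(embed(enroll), embed(test)).item()
THRESHOLD = 0.86  # illustrative value; tune on a development set
print("same speaker" if score >= THRESHOLD else "different speaker", round(score, 3))
```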

Speech Separation

Evaluation on LibriCSS

Word error rate (WER, %); 0S and 0L are non-overlapping conditions with short and long inter-utterance silence, and OV10-OV40 denote 10%-40% overlap. Lower is better.

| Model | 0S | 0L | OV10 | OV20 | OV30 | OV40 |
| --- | --- | --- | --- | --- | --- | --- |
| Conformer (SOTA) | 4.5 | 4.4 | 6.2 | 8.5 | 11 | 12.6 |
| UniSpeech-SAT Base | 4.4 | 4.4 | 5.4 | 7.2 | 9.2 | 10.5 |
| UniSpeech-SAT Large | 4.3 | 4.2 | 5.0 | 6.3 | 8.2 | 8.8 |

The paper will appear soon.

For reproduction details, please contact Sanyuan.
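
As a small illustration of how the WER metric reported above is computed (independent of the separation pipeline itself), here is a sketch using the third-party jiwer package with placeholder transcripts:

```python
# Illustrative WER computation with the third-party jiwer package.
# The transcripts are placeholders, not LibriCSS references or ASR output.
from jiwer import wer

references = [
    "the quick brown fox jumps over the lazy dog",
    "separating overlapped speakers makes recognition easier",
]
hypotheses = [
    "the quick brown fox jumped over the lazy dog",
    "separating overlapped speakers makes recognition much easier",
]

error_rate = wer(references, hypotheses)  # accepts lists of sentences
print(f"WER: {100 * error_rate:.2f}%")
```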

Speech Diarization

Evaluation on CALLHOME

Diarization error rate (DER, %), broken down by the number of speakers per recording; lower is better.

| Model | spk_2 | spk_3 | spk_4 | spk_5 | spk_6 | spk_all |
| --- | --- | --- | --- | --- | --- | --- |
| EEND (SOTA) | 7.96 | 11.93 | 16.38 | 21.21 | 23.1 | 12.49 |
| UniSpeech-SAT Large | 5.93 | 10.66 | 12.9 | 16.48 | 23.25 | 10.92 |

The paper will appear soon.

For reproduction details, please contact Zhengyang.
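
As a small illustration of how the DER metric reported above is scored, here is a sketch using the third-party pyannote.metrics package; the reference and hypothesis segments are toy placeholders, not CALLHOME annotations.

```python
# Illustrative DER computation with the third-party pyannote.metrics package.
# The reference and hypothesis below are toy segments, not CALLHOME data.
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

reference = Annotation()
reference[Segment(0.0, 5.0)] = "speaker_A"
reference[Segment(5.0, 9.0)] = "speaker_B"

hypothesis = Annotation()
hypothesis[Segment(0.0, 4.6)] = "speaker_1"
hypothesis[Segment(4.6, 9.0)] = "speaker_2"

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)  # optimal speaker mapping is handled internally
print(f"DER: {100 * der:.2f}%")
```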

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Reference

If you find our work useful in your research, please cite the following papers:

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu (yuwu1@microsoft.com).