UniSpeech
The UniSpeech family:
- UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR
- UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training
Update
- [Model Release] October 13, 2021: UniSpeech-SAT models are released.
- [HuggingFace Integration] October 11, 2021: UniSpeech models are available on HuggingFace.
- [Model Release] June, 2021: UniSpeech v1 models are released.
Pre-trained models
We strongly suggest using the UniSpeech-SAT models for speaker-related tasks, since they show strong performance on various speaker-related benchmarks.
Model | Pre-training Data | Download |
---|---|---|
UniSpeech Base | 1500 hrs CommonVoice | download |
UniSpeech Large | 1500 hrs CommonVoice | download |
UniSpeech-SAT Base | 960 hrs LibriSpeech | download |
UniSpeech-SAT Base+ | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
UniSpeech-SAT Large | 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli | download |
Universal Representation Evaluation on SUPERB
Downstream Task Performance
We also evaluate our models on typical speaker-related benchmarks.
Speaker Verification
Equal error rate (EER, %) on the VoxCeleb1 trial lists:
Model | Fixed pre-train | Vox1-O | Vox1-E | Vox1-H |
---|---|---|---|---|
ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
HuBERT large | Yes | 0.888 | 0.912 | 1.853 |
Wav2Vec2.0 (XLSR) | Yes | 0.915 | 0.945 | 1.895 |
UniSpeech-SAT large | Yes | 0.771 | 0.781 | 1.669 |
HuBERT large | No | 0.585 | 0.654 | 1.342 |
Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
Regarding reproduction, please contact Zhengyang
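The Vox1-O/E/H columns report the equal error rate (EER, %) of the verification system. As an illustration of the metric only (not the evaluation code used for these results), here is a minimal pure-Python sketch that sweeps every trial score as a candidate decision threshold:

```python
def eer_percent(target_scores, nontarget_scores):
    """Equal error rate (%): operating point where the false-reject rate
    on target (same-speaker) trials equals the false-accept rate on
    nontarget (different-speaker) trials."""
    best = None
    for thr in sorted(target_scores + nontarget_scores):
        frr = sum(s <= thr for s in target_scores) / len(target_scores)
        far = sum(s > thr for s in nontarget_scores) / len(nontarget_scores)
        if best is None or abs(frr - far) < abs(best[0] - best[1]):
            best = (frr, far)
    return (best[0] + best[1]) / 2 * 100

# Perfectly separated scores give 0% EER.
print(eer_percent([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # 0.0
```

Real toolkits interpolate the ROC curve between thresholds; this O(n²) sweep is only meant to make the numbers in the table concrete.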
Speech Separation
Evaluation on LibriCSS
Word error rate (WER, %); 0S and 0L are the non-overlapping conditions with short and long inter-utterance silence, and OVxx denotes xx% overlap:
Model | 0S | 0L | OV10 | OV20 | OV30 | OV40 |
---|---|---|---|---|---|---|
Conformer (SOTA) | 4.5 | 4.4 | 6.2 | 8.5 | 11 | 12.6 |
UniSpeech-SAT base | 4.4 | 4.4 | 5.4 | 7.2 | 9.2 | 10.5 |
UniSpeech-SAT large | 4.3 | 4.2 | 5.0 | 6.3 | 8.2 | 8.8 |
The paper will appear soon.
Regarding reproduction, please contact Sanyuan
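The LibriCSS numbers above are word error rates (WER, %): the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal sketch of the metric (illustrative only, not the scoring pipeline used for these results):

```python
def wer_percent(reference, hypothesis):
    """Word error rate (%): Levenshtein distance between word sequences,
    normalized by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100 * d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of four reference words -> 25% WER.
print(wer_percent("the cat sat down", "the cat sat"))  # 25.0
```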
Speech Diarization
Evaluation on CALLHOME
Diarization error rate (DER, %) broken down by number of speakers:
Model | spk_2 | spk_3 | spk_4 | spk_5 | spk_6 | spk_all |
---|---|---|---|---|---|---|
EEND (SOTA) | 7.96 | 11.93 | 16.38 | 21.21 | 23.1 | 12.49 |
UniSpeech-SAT large | 5.93 | 10.66 | 12.9 | 16.48 | 23.25 | 10.92 |
The paper will appear soon.
Regarding reproduction, please contact Zhengyang
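The CALLHOME numbers above are diarization error rates (DER, %). A toy frame-level sketch of the metric, searching over one-to-one mappings between hypothesis and reference speaker labels (illustrative only; standard scoring tools additionally handle collars and overlapping speech):

```python
from itertools import permutations

def der_percent(ref, hyp):
    """Toy frame-level diarization error rate (%).
    ref/hyp: equal-length per-frame speaker labels, None = silence.
    Missed speech, false alarms, and speaker confusions are counted under
    the best one-to-one hyp->ref speaker mapping and divided by the number
    of reference speech frames."""
    ref_spk = sorted({s for s in ref if s is not None})
    hyp_spk = sorted({s for s in hyp if s is not None})
    # Pad with dummy labels so every hypothesis speaker can be assigned.
    candidates = ref_spk + [object() for _ in range(max(0, len(hyp_spk) - len(ref_spk)))]
    best = len(ref)
    for perm in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        errors = sum(
            (h is None and r is not None)                             # missed speech
            or (h is not None and r is None)                          # false alarm
            or (h is not None and r is not None and mapping[h] != r)  # confusion
            for r, h in zip(ref, hyp)
        )
        best = min(best, errors)
    return 100 * best / sum(r is not None for r in ref)

# One confused frame out of four speech frames -> 25% DER.
print(der_percent(['a', 'a', 'b', 'b', None], ['x', 'x', 'x', 'y', None]))  # 25.0
```

The brute-force permutation search is exponential in the speaker count; real scorers solve the mapping as an assignment problem, but the error taxonomy is the same.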
License
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.
Microsoft Open Source Code of Conduct
Reference
If you find our work useful in your research, please cite the following papers:
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Contact Information
For help or issues using UniSpeech models, please submit a GitHub issue.
For other communications related to UniSpeech, please contact Yu Wu (yuwu1@microsoft.com).