UniSpeech-SAT

This is the official implementation of the paper "UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training" (ICASSP 2022 submission). The implementation is mainly based on the fairseq codebase.

Requirements and Installation

  • PyTorch >= 1.6.0
  • Python >= 3.7
cd UniSpeech/UniSpeech-SAT
pip install --editable ./ --user

Pre-trained models

Model               Pre-training Dataset                                         Download
UniSpeech-SAT Base 960 hrs LibriSpeech download
UniSpeech-SAT Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli download
UniSpeech-SAT Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli download

Load pretrained model

Example usage:

import torch
import fairseq

# Path to a downloaded UniSpeech-SAT checkpoint
cp_path = '/path/to/wav2vec.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
model = model[0]
# Drop the pre-training heads; keep only the encoder for feature extraction
model.remove_pretraining_modules()
model.eval()

# Dummy 16 kHz waveform: batch of 1, 10000 samples
wav_input_16khz = torch.randn(1, 10000)
f = model.feature_extractor(wav_input_16khz)
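The feature extractor downsamples the 16 kHz waveform before the Transformer encoder, so the number of output frames is much smaller than the number of input samples. The following sketch estimates the frame count, assuming the standard wav2vec 2.0 / HuBERT convolutional configuration (seven layers with the kernels and strides below, ~320x total downsampling, roughly 50 frames per second); verify these values against your checkpoint's config before relying on them.

```python
# Sketch: frames produced by the convolutional feature extractor for a
# waveform of num_samples samples, assuming the default wav2vec 2.0 /
# HuBERT conv layers (an assumption; check your checkpoint's cfg).
def num_feature_frames(num_samples: int) -> int:
    # (kernel, stride) for each conv layer in the default feature extractor
    conv_layers = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
    length = num_samples
    for kernel, stride in conv_layers:
        # Standard output-length formula for a strided conv without padding
        length = (length - kernel) // stride + 1
    return length

print(num_feature_frames(10000))  # frames for the 10000-sample example above: 31
print(num_feature_frames(16000))  # one second of 16 kHz audio: 49
```

This is why `f` above has far fewer time steps than `wav_input_16khz` has samples.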

Results on SUPERB

[Figure: SUPERB_Results.png — results of UniSpeech-SAT on the SUPERB benchmark]

Citation

If you find our work useful, please cite our paper.