UniSpeech-SAT

This is the official implementation of the paper "UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training" (ICASSP 2022 submission). The implementation is mainly based on the fairseq codebase.

Requirements and Installation

  • PyTorch >= 1.6.0
  • Python >= 3.7
cd UniSpeech/UniSpeech-SAT
pip install --editable ./ --user

Pre-trained models

Model               Pre-training Dataset                                         Download
UniSpeech-SAT Base 960 hrs LibriSpeech download
UniSpeech-SAT Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli download
UniSpeech-SAT Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli download

Load pretrained model

Example usage:

import torch
import fairseq

# Path to a downloaded UniSpeech-SAT checkpoint
cp_path = '/path/to/wav2vec.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
model = model[0]
# Drop the pre-training heads; keep only the encoder for feature extraction
model.remove_pretraining_modules()
model.eval()

# Dummy 16 kHz waveform: batch of 1, 10000 samples
wav_input_16khz = torch.randn(1, 10000)
f = model.feature_extractor(wav_input_16khz)
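The feature extractor downsamples the 16 kHz waveform before the Transformer encoder, so the number of output frames is much smaller than the number of input samples. The following sketch estimates the frame count, assuming the standard wav2vec 2.0 / HuBERT convolutional configuration (seven layers with the kernels and strides below, ~320x total downsampling, roughly 50 frames per second); verify these values against your checkpoint's config before relying on them.

```python
# Sketch: frames produced by the convolutional feature extractor for a
# waveform of num_samples samples, assuming the default wav2vec 2.0 /
# HuBERT conv layers (an assumption; check your checkpoint's cfg).
def num_feature_frames(num_samples: int) -> int:
    # (kernel, stride) for each conv layer in the default feature extractor
    conv_layers = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
    length = num_samples
    for kernel, stride in conv_layers:
        # Standard output-length formula for a strided conv without padding
        length = (length - kernel) // stride + 1
    return length

print(num_feature_frames(10000))  # frames for the 10000-sample example above: 31
print(num_feature_frames(16000))  # one second of 16 kHz audio: 49
```

This is why `f` above has far fewer time steps than `wav_input_16khz` has samples.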

Results on SUPERB

[Figure: SUPERB_Results.png — results of UniSpeech-SAT on the SUPERB benchmark]

Citation

If you find our work useful, please cite our paper.