The fifth argument is which phonemizer to use. Supported values include [espeak](http://espeak.sourceforge.net/).
Pre-trained fasttext LID models can be downloaded [here](https://fasttext.cc/docs/en/language-identification.html).
### Prepare TIMIT data
TIMIT transcripts include silence. Therefore, VAD is not used for audio preprocessing, and we do not wrap transcripts with silences or insert random silence between words.
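To make the skipped step concrete, here is a minimal sketch of the silence handling applied to non-TIMIT text; the `<SIL>` token, function name, and probability are illustrative assumptions, not the repository's actual implementation. For TIMIT, the sequence is left untouched:

```python
import random

SIL = "<SIL>"  # hypothetical silence token; the real symbol depends on the phonemizer setup

def add_silence(phones, wrap=True, sil_prob=0.25, seed=0):
    """Optionally wrap a phone sequence with silence and insert random
    silence inside it, as done when preparing non-TIMIT text.
    For TIMIT, call with wrap=False and sil_prob=0.0, since its
    transcripts already contain silence."""
    rng = random.Random(seed)
    out = [SIL] if wrap else []
    for i, p in enumerate(phones):
        out.append(p)
        if i < len(phones) - 1 and rng.random() < sil_prob:
            out.append(SIL)
    if wrap:
        out.append(SIL)
    return out

# For TIMIT the sequence passes through unchanged:
print(add_silence(["sh", "iy"], wrap=False, sil_prob=0.0))  # ['sh', 'iy']
```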
To prepare TIMIT data for both the matched and unmatched setups:
Note that we assume the TIMIT distribution with capitalized directories and filenames is used (e.g., `TRAIN/DR1/FCJF0/SA1.PHN`).
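As a sketch of what that layout assumption implies, the following collects phone-transcript files under the capitalized directory structure; the function name and glob pattern are illustrative, not taken from the preparation scripts:

```python
from pathlib import Path

def collect_phn_files(timit_root):
    """Collect TIMIT phone-transcript (.PHN) files, assuming the
    capitalized layout TRAIN|TEST / DR* (dialect) / speaker / utterance,
    e.g. TRAIN/DR1/FCJF0/SA1.PHN. A lower-cased distribution would
    need the pattern (and extension) adjusted accordingly."""
    root = Path(timit_root)
    return sorted(root.glob("*/DR*/*/*.PHN"))
```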
## Generative adversarial training (GAN)
We then use a GAN model to build a first unsupervised ASR model. The speech-feature and text preparation above is necessary because it enables the generator to match speech to text in an unsupervised way.
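The adversarial setup can be sketched as follows. This is a toy NumPy illustration of the objective only, with made-up linear models and dimensions; it is not the actual wav2vec-U generator/discriminator architecture. The generator maps speech features to per-step phone distributions, and the discriminator learns to tell those apart from one-hot phone sequences derived from unpaired text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not the actual configuration)
n_feat, n_phones, seq_len = 16, 8, 10

W_gen = rng.normal(size=(n_feat, n_phones)) * 0.1  # generator: features -> phone logits
w_disc = rng.normal(size=n_phones) * 0.1           # discriminator: phone dist -> realness score

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def generate(features):
    """Map speech features to per-step phone distributions."""
    return softmax(features @ W_gen)

def discriminate(phone_dists):
    """Score a sequence of phone distributions: higher = judged 'real text'."""
    return float((phone_dists @ w_disc).mean())

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

speech = rng.normal(size=(seq_len, n_feat))
# One-hot phone sequence from unpaired text (randomly drawn here for illustration)
real_text = np.eye(n_phones)[rng.integers(0, n_phones, size=seq_len)]

fake = generate(speech)

# Discriminator loss: distinguish real text from generator output
disc_loss = -np.log(logistic(discriminate(real_text))) - np.log(1 - logistic(discriminate(fake)))
# Generator loss: fool the discriminator into scoring its output as real
gen_loss = -np.log(logistic(discriminate(fake)))
```

Training alternates gradient steps on these two losses; because no paired (speech, transcript) data is used, the generator can only succeed by producing phone sequences whose statistics match the unpaired text.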