Flow-SLM:
Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou1, Jiawei Zhou2, Karen Livescu1
1Toyota Technological Institute at Chicago, 2Stony Brook University
We use Flow-SLM-1B-extend to generate the demos. Prompts are randomly selected from the LibriSpeech test-clean and test-other subsets. The first 3 seconds of each ground-truth utterance are used as the audio prompt to generate 10 seconds of speech. The comparison with AudioLM uses the same prompts and generation settings.
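The prompt setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `crop_prompt` and the constants are hypothetical, and the waveform is assumed to be a mono 1-D array at LibriSpeech's 16 kHz sample rate.

```python
import numpy as np

# Assumed values: LibriSpeech audio is sampled at 16 kHz; the page
# states a 3-second prompt. Names here are illustrative only.
SAMPLE_RATE = 16_000
PROMPT_SECONDS = 3.0


def crop_prompt(waveform, sample_rate=SAMPLE_RATE, prompt_seconds=PROMPT_SECONDS):
    """Return the first `prompt_seconds` of a mono 1-D waveform.

    If the utterance is shorter than the requested prompt length,
    the whole utterance is returned.
    """
    num_samples = int(round(prompt_seconds * sample_rate))
    return waveform[:num_samples]


# Stand-in for a ground-truth utterance: 5 seconds of silence.
utterance = np.zeros(5 * SAMPLE_RATE, dtype=np.float32)
prompt = crop_prompt(utterance)
```

The cropped `prompt` would then be passed to whichever model is being evaluated; the generation call itself is omitted here because its interface is not described on this page.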