Flow-SLM:
Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou (1), Jiawei Zhou (2), Karen Livescu (1)
(1) Toyota Technological Institute at Chicago, (2) Stony Brook University
arXiv
We use Flow-SLM-1B-extend to generate the demos. Prompts are randomly selected from the LibriSpeech test-clean and test-other subsets. The first 3 seconds of each ground-truth utterance are used as the audio prompt to generate 10 seconds of speech.
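The prompt preparation described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the helper names `pick_prompts` and `crop_prompt`, the fixed seed, and the silent stand-in waveform are all assumptions, and the call to the model itself is deliberately left out because its interface is not described here. The 16 kHz rate is the sampling rate of LibriSpeech audio.

```python
import random

SAMPLE_RATE = 16_000    # LibriSpeech audio is sampled at 16 kHz
PROMPT_SECONDS = 3      # length of the ground-truth audio prompt
GENERATED_SECONDS = 10  # length of the speech to generate


def pick_prompts(utterance_ids, k, seed=0):
    """Randomly select k utterance ids (hypothetical helper; seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(list(utterance_ids), k)


def crop_prompt(waveform, sample_rate=SAMPLE_RATE, seconds=PROMPT_SECONDS):
    """Return the first `seconds` of a mono waveform given as a sequence of samples."""
    return waveform[: int(sample_rate * seconds)]


# Stand-in for a decoded 5-second utterance (silence), used only for illustration.
waveform = [0.0] * (SAMPLE_RATE * 5)
prompt = crop_prompt(waveform)

# Number of samples the generated speech would span at the same rate.
target_samples = SAMPLE_RATE * GENERATED_SECONDS
```

In practice the waveform would come from decoding a LibriSpeech file, and the cropped prompt would be passed to the model to produce the generated speech.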