Flow-SLM:
Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou¹, Jiawei Zhou², Karen Livescu¹
¹Toyota Technological Institute at Chicago, ²Stony Brook University
arxiv
We use Flow-SLM-1B-extend to generate the demos. Prompts are randomly selected from the LibriSpeech test-clean and test-other subsets. The first 3 seconds of each ground-truth utterance are used as the audio prompt to generate 10 seconds of speech.
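
As an illustration of the prompt setup described above, the following minimal sketch crops the first 3 seconds of an utterance. It assumes a mono waveform stored as a flat sequence of samples at 16 kHz (the LibriSpeech sampling rate). The helper name `extract_prompt` and its parameters are hypothetical and are not taken from the Flow-SLM code.

```python
SAMPLE_RATE = 16_000    # LibriSpeech audio is sampled at 16 kHz
PROMPT_SECONDS = 3.0    # prompt length stated on this page


def extract_prompt(samples, sample_rate=SAMPLE_RATE, seconds=PROMPT_SECONDS):
    """Return the leading `seconds` of a mono waveform.

    Hypothetical helper for illustration only. `samples` can be any
    sliceable sequence (list, tuple, NumPy array). Utterances shorter
    than the requested duration are returned unchanged.
    """
    num_samples = int(round(seconds * sample_rate))
    return samples[:num_samples]


# Stand-in for a 5-second ground-truth utterance (silence).
utterance = [0.0] * (5 * SAMPLE_RATE)
prompt = extract_prompt(utterance)
```

The cropped `prompt` would then be passed to the model as the audio prompt. How the model consumes it is not specified here.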

Prompted generation

Ground Truth | Resynthesis | Prompts | Continuations

Unprompted generation