Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou¹, Jiawei Zhou², Karen Livescu¹
¹Toyota Technological Institute at Chicago, ²Stony Brook University

We use Flow-SLM-1B-extend to generate the demos. Prompts are randomly selected from the LibriSpeech test-clean and test-other subsets. The first 3 seconds of each ground-truth utterance are used as the audio prompt, and the model generates 10 seconds of speech.
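The prompt construction can be sketched as below. This is a minimal illustration, not the authors' pipeline. It assumes a mono waveform at LibriSpeech's native 16 kHz sampling rate and reads "10 seconds of speech" as the length of the generated continuation. `make_prompt` and the constant names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: LibriSpeech audio is mono, sampled at 16 kHz.
SAMPLE_RATE = 16_000
PROMPT_SECONDS = 3   # first 3 s of ground truth serve as the audio prompt
GEN_SECONDS = 10     # assumed: target duration of the generated continuation


def make_prompt(waveform: Sequence[float]) -> Tuple[List[float], int]:
    """Hypothetical helper: return the prompt samples and how many samples to generate."""
    n_prompt = PROMPT_SECONDS * SAMPLE_RATE  # 3 s * 16,000 Hz = 48,000 samples
    if len(waveform) < n_prompt:
        raise ValueError("utterance is shorter than the 3-second prompt")
    prompt = list(waveform[:n_prompt])        # keep only the first 3 seconds
    n_generate = GEN_SECONDS * SAMPLE_RATE    # 10 s * 16,000 Hz = 160,000 samples
    return prompt, n_generate
```

For a 5-second utterance (80,000 samples), `make_prompt` returns a 48,000-sample prompt and a generation budget of 160,000 samples.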

Prompted generation

Ground Truth	Resynthesis	Prompts	Continuations

Unprompted generation
