Flow-SLM:
Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling

Ju-Chieh Chou1, Jiawei Zhou2, Karen Livescu1
1Toyota Technological Institute at Chicago, 2Stony Brook University
arXiv | code
We use Flow-SLM-1B-extend to generate the demos. Prompts are randomly selected from the LibriSpeech test-clean and test-other subsets. The first 3 seconds of each ground-truth utterance are used as the audio prompt, from which the model generates 10 seconds of speech. The comparison with AudioLM uses the same prompts and generation settings.
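The prompting setup above can be sketched as follows (a minimal illustration; `split_prompt` is a hypothetical helper, not part of the Flow-SLM codebase, and the model call itself is omitted):

```python
import numpy as np

SR = 16_000     # LibriSpeech sample rate (Hz)
PROMPT_SEC = 3  # seconds of ground truth used as the audio prompt
GEN_SEC = 10    # seconds of speech to generate

def split_prompt(waveform: np.ndarray, sr: int = SR):
    """Slice off the 3-second prompt and compute how many samples to generate."""
    prompt = waveform[: PROMPT_SEC * sr]
    n_generate = GEN_SEC * sr
    return prompt, n_generate

# Dummy 15-second utterance standing in for a LibriSpeech file.
utterance = np.zeros(15 * SR, dtype=np.float32)
prompt, n_gen = split_prompt(utterance)
print(prompt.shape[0], n_gen)  # 48000 160000
```

The continuation is then conditioned on the prompt samples; only the slicing arithmetic is shown here.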

Prosodic prompts

Prompt | Continuation
Happy
Sorry
Whisper

Prompted generation

Ground Truth | Resynthesis | Prompts | Continuations

Comparison with AudioLM

Prompt | Original | AudioLM | Flow-SLM

Unprompted generation