SODA is a suite of open discrete audio foundation models.
It unifies audio and text tasks (Continuation, ASR, TTS) into a single next-token prediction framework: the tasks differ only in how the model is prompted. This demo runs on soda-4b-base. We release checkpoints for all sizes (135M to 4B); find them in our Hugging Face collection.
Note: SODA was trained exclusively on English speech data and does not currently support other languages.
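To make the "one model, many prompts" idea concrete, here is a minimal sketch. Everything named in it is an assumption for illustration: the checkpoint id, the marker-token ids, and the stand-in token lists are hypothetical, not SODA's released interface; the only claim taken from above is that each task is plain next-token prediction over a task-specific prompt.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint id; the real ids live in the Hugging Face collection.
model = AutoModelForCausalLM.from_pretrained("soda-4b-base")

ASR_MARKER, TTS_MARKER = 50_001, 50_002  # hypothetical special-token ids
audio_tokens = [101, 102, 103]           # stand-in for discrete audio codec tokens
text_tokens = [7, 8, 9]                  # stand-in for tokenized text

def complete(prompt_ids: list[int], max_new_tokens: int) -> list[int]:
    """Plain next-token prediction; only the prompt is task-specific."""
    ids = torch.tensor([prompt_ids])
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=True)
    return out[0, len(prompt_ids):].tolist()

# Continuation: prompt with audio tokens, decode more audio tokens.
continuation = complete(audio_tokens, max_new_tokens=500)

# ASR: audio tokens plus a transcription marker, decode text tokens.
transcript = complete(audio_tokens + [ASR_MARKER], max_new_tokens=128)

# TTS: text tokens plus a speech marker, decode audio tokens.
speech = complete(text_tokens + [TTS_MARKER], max_new_tokens=1000)
```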
Continue speech from an audio prompt!
The demo takes an input audio clip, with an option to automatically trim silence from the beginning and end of the audio (for more stability), and exposes generation parameters, where 100 tokens ≈ 1 second of audio; the generated continuation is returned as the output.
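Since generation length is specified in tokens, the 100 tokens ≈ 1 second rate gives a direct duration-to-budget conversion. This small helper is our own illustration, not part of the demo:

```python
TOKENS_PER_SECOND = 100  # the demo's rule of thumb: 100 tokens ≈ 1 second

def seconds_to_tokens(seconds: float) -> int:
    """Convert a target audio duration into a token budget for generation."""
    return round(seconds * TOKENS_PER_SECOND)

print(seconds_to_tokens(7.5))  # 750 tokens for roughly 7.5 s of audio
```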
We thank Marin and OpenAthena for enabling this project with their open-development LLM and training infrastructure.