User guide

In this user guide, you’ll find advice on how to get the most out of Stable Audio, information about the AI models and training data behind the product, and a guide to how you can use the tracks you create.


Prompts are the text you use to tell the AI how you want your music to sound.

Learn how to tweak your prompts to get the most out of the product.

Interface guide

An overview of how input audio and the additional input controls work.

Find out how these can enhance your audio generation workflow.


We use state-of-the-art audio diffusion AI models to generate music.

Check out some technical details about how our models work.


With Stable Audio, you describe the audio you want with a text prompt, and the system generates it for you. This guide shares some tips on how to prompt.

These tips are just what works for us. We encourage you to experiment and find what works for you!

Add detail

If you have something specific in mind, include it. Genres, descriptive phrases, instruments and moods work particularly well.

For example, a detailed prompt might look something like this:

Cinematic, Soundtrack, Wild West, High Noon Shoot Out, Percussion, Whistles, Horses, Action Scene, SFX, Shaker, Guitar, Bass, Timpani, Strings, Tense, Climactic, Atmospheric, Moody
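If you put prompts together in a script, the advice above can be sketched as a small helper. This is only an illustration in Python, not part of Stable Audio: the function name build_prompt and its parameters are hypothetical, and all it does is join your descriptors into the comma-separated style used in the example.

```python
def build_prompt(genres, descriptions=(), instruments=(), moods=()):
    """Join descriptors into one comma-separated prompt string.

    Hypothetical helper for illustration; the model only sees the final text.
    """
    parts = [*genres, *descriptions, *instruments, *moods]
    # Drop empty entries and surrounding whitespace, keeping the original order.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


prompt = build_prompt(
    genres=["Cinematic", "Soundtrack"],
    descriptions=["Wild West", "High Noon Shoot Out"],
    instruments=["Percussion", "Whistles", "Guitar"],
    moods=["Tense", "Climactic"],
)
print(prompt)
```

Keeping genres, descriptive phrases, instruments and moods in separate lists makes it easy to swap one group out and compare the results.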

Set the mood

When including detail on the mood you want, try using a combination of musical and emotional terms.

Musical terms might be groovy or rhythmic. Emotional terms might be sad or beautiful.

Choose instruments

We’ve found that adding adjectives to instrument names is helpful.

For example, Reverberated Guitar, Powerful Choir, or Swelling Strings.

Set the BPM

Setting the beats per minute is a great way to ensure your output is at the tempo you want, and can help keep it in time. The key here is to stick to BPM settings that are appropriate to the genre you’re generating.

For example, if you were generating a Drum and Bass track, you might want to add 170 BPM to your prompt.
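As a rough sketch of the same idea in Python, the snippet below appends a tempo tag to a prompt based on its genre. The lookup table and the function name with_bpm are hypothetical. The tempos are the ones used in this guide's example prompts, so treat them as starting points rather than rules.

```python
# Tempos taken from the example prompts in this guide; they are
# illustrative starting points, not values enforced by Stable Audio.
EXAMPLE_BPM = {
    "Drum and Bass": 170,
    "Synthpop": 100,
    "Post-Rock": 125,
    "Ambient Techno": 122,
    "Disco": 115,
}


def with_bpm(prompt, genre):
    """Append an 'N BPM' tag for a known genre; otherwise return the prompt unchanged."""
    bpm = EXAMPLE_BPM.get(genre)
    if bpm is None:
        return prompt
    return f"{prompt}, {bpm} BPM"


print(with_bpm("Drum and Bass, Driving Drum Machine, Synth Bass", "Drum and Bass"))
```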

Full instrumentals

Use Stable Audio to generate full musical audio encompassing a range of instruments.

Include as much detail as you can!

Trance, Ibiza, Beach, Sun, 4 AM, Progressive, Synthesizer, 909, Dramatic Chords, Choir, Euphoric, Nostalgic, Dynamic, Flowing

Synthpop, Big Reverbed Synthesizer Pad Chords, Driving Gated Drum Machine, Atmospheric, Moody, Nostalgic, Cool, Club, Stripped-back, Pop Instrumental, 100 BPM

Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM

Ambient Techno, meditation, Scandinavian Forest, 808 drum machine, 808 kick, claps, shaker, synthesizer, synth bass, Synth Drones, beautiful, peaceful, Ethereal, Natural, 122 BPM, Instrumental

Warm soft hug, comfort, low synths, twinkle, wind and leaves, ambient, peace, relaxed, water

Lofi hip hop beat, chillhop

Drum solo

Disco, Driving Drum Machine, Synthesizer, Bass, Piano, Guitars, Instrumental, Clubby, Euphoric, Chicago, New York, 115 BPM

Ambient house, new age, meditation, advertisement, 808 drum machine, 808 kick, claps, shaker, synthesizer, synth bass, soaring lead heavily reverbed, modern, sleek, beautiful, inspiring, futuristic

Calm meditation music to play in a spa lobby

3/4, in 3, 3 beat, guitar, drums, bright, happy, claps

Individual stems

You can also use Stable Audio to generate individual stems featuring a single instrument or group of instruments.

Just specify what you want in your prompt.

Electric guitar top line solo instrumental, no drums, Classic Rock, 105 BPM, Grade: Featured, Instruments: Guitar

Samba percussion

Drum solo

Sound effects

Stable Audio can also be used to generate sound effects.

Cars, fireworks, footsteps... describe what you want and see how the model does.



Car passing by

Fireworks, 44.1k high fidelity

Interface guide

A screenshot of our input audio interface

Input audio

Select one of your generated tracks to be used as input audio. This will populate the input audio area with your selected track.


Input strength

This slider controls how much of your selected audio is imposed on the final result.

Higher percentages mean that less diffusion will be done on your audio, whereas lower percentages allow for more diffusion and thus more variation.
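One way to picture this relationship is the Python sketch below. It assumes a simple linear mapping between input strength and the share of diffusion applied, which is a simplification for illustration and may not match how Stable Audio maps the slider internally. The function name diffusion_share is hypothetical.

```python
def diffusion_share(input_strength_percent):
    """Return the assumed fraction of diffusion applied to the input audio.

    Assumption for illustration only: a linear mapping where 100% input
    strength leaves the audio untouched and 0% regenerates it fully.
    """
    if not 0 <= input_strength_percent <= 100:
        raise ValueError("input strength must be between 0 and 100")
    return 1 - input_strength_percent / 100


for strength in (20, 50, 80):
    share = diffusion_share(strength)
    print(f"{strength}% input strength -> about {share:.0%} diffusion")
```

Under this assumption, a low setting gives the model plenty of room to vary the track, while a high setting keeps the result close to what you selected.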

A screenshot of our steps interface


Steps

This sets the number of generation steps used to create your audio track. A higher step count means more processing, which can slightly increase the quality of your audio. We have found 50 to be the sweet spot.

A screenshot of our results interface

Number of results

This controls the number of tracks returned each time you generate. It is only available on the pro plan, with a maximum of 5 results at a time.

Note: entering 4 here will cost you 4 tracks when generating.
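The arithmetic is simple, but a short Python sketch can help if you are budgeting tracks. The function names and the idea of a "tracks left" counter are hypothetical. The only facts taken from this guide are the maximum of 5 results and the cost of one track per result.

```python
MAX_RESULTS = 5  # maximum number of results per generation, per this guide


def tracks_charged(number_of_results):
    """Return how many tracks one generation uses (one per result)."""
    if not 1 <= number_of_results <= MAX_RESULTS:
        raise ValueError(f"number of results must be between 1 and {MAX_RESULTS}")
    return number_of_results


def remaining_after(tracks_left, number_of_results):
    """Return the tracks left after one generation, or raise if there are too few."""
    cost = tracks_charged(number_of_results)
    if cost > tracks_left:
        raise ValueError("not enough tracks left for this many results")
    return tracks_left - cost


print(remaining_after(tracks_left=20, number_of_results=4))
```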

A screenshot of our seed interface


Seed

By default, this input is set to ‘random’. Entering your own value (e.g. 222222) fixes the specific arrangement of noise used to generate your audio. When using a model that supports deterministic results, providing the same seed alongside the same other settings will result in that same audio being generated again.

Note: when generating multiple results, remember to set the seed to ‘random’.
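To see why a fixed seed repeats a result, here is a minimal Python sketch. It uses Python's standard random module as a stand-in for the noise source. It is not Stable Audio's actual noise generation, and the function name initial_noise is hypothetical.

```python
import random


def initial_noise(seed, length=4):
    """Return a short list of pseudo-random noise values for a given seed."""
    rng = random.Random(seed)  # private generator, so calls do not affect each other
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


same_a = initial_noise(222222)
same_b = initial_noise(222222)
other = initial_noise(222223)

print(same_a == same_b)  # True: identical seed, identical noise
print(same_a == other)   # False: a different seed gives different noise
```

The same starting noise with the same settings leads to the same output, which is also why a fixed seed is a poor fit when you want several different results.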

A screenshot of our prompt strength interface

Prompt strength

This controls how closely the model tries to match the audio to your text prompt.
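In many diffusion systems, a setting like this is implemented with a technique called classifier-free guidance. Whether Stable Audio's prompt strength works exactly this way is an assumption on our part here, so read the Python sketch below as a general illustration of the technique. The function name and the toy numbers are hypothetical.

```python
def guided_prediction(unconditional, conditional, prompt_strength):
    """Blend two model predictions using the classifier-free guidance formula.

    unconditional: prediction made without the text prompt
    conditional:   prediction made with the text prompt
    """
    return [
        u + prompt_strength * (c - u)
        for u, c in zip(unconditional, conditional)
    ]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(guided_prediction(uncond, cond, 1.0))  # [1.0, 3.0]
print(guided_prediction(uncond, cond, 2.0))  # [2.0, 5.0]
```

A strength of 1 returns the prompted prediction as is, while larger values push the result further in the direction the prompt suggests.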


The AI model behind Stable Audio is a latent diffusion model for audio generation.

You can read more about the model architecture in our research blog post.


As a free user, you can use audio generated via Stable Audio as a sample in your own music production (i.e. a music track).

As a paid user, you can use it in your commercial media projects: videos, games, podcasts and more. This includes internal projects and external client projects.

You can’t train AI models on the generated audio. For more detail, see our Terms of Service.