The TextEncodeAceStepAudio1.5 node prepares text and audio-related metadata for use with the AceStepAudio 1.5 model. It takes descriptive tags, lyrics, and musical parameters, then uses a CLIP model to convert them into a conditioning format suitable for audio generation.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| `clip` | The CLIP model used to tokenize and encode the input text. | CLIP | Yes | N/A |
| `tags` | Descriptive tags for the audio, such as genre, mood, or instruments. Supports multiline input and dynamic prompts. | STRING | Yes | N/A |
| `lyrics` | The lyrics for the audio track. Supports multiline input and dynamic prompts. | STRING | Yes | N/A |
| `seed` | A random seed value for reproducible generation. Has a control_after_generate widget. Default: 0. | INT | No | 0 to 18446744073709551615 |
| `bpm` | The beats per minute (BPM) for the generated audio. Default: 120. | INT | No | 10 to 300 |
| `duration` | The desired duration of the audio in seconds. Default: 120.0. | FLOAT | No | 0.0 to 2000.0 |
| `timesignature` | The musical time signature. | COMBO | No | "2", "3", "4", "6" |
| `language` | The language of the input text. Default: "en". | COMBO | No | "ar", "az", "bg", "bn", "ca", "cs", "da", "de", "el", "en", "es", "fa", "fi", "fr", "he", "hi", "hr", "ht", "hu", "id", "is", "it", "ja", "ko", "la", "lt", "ms", "ne", "nl", "no", "pa", "pl", "pt", "ro", "ru", "sa", "sk", "sr", "sv", "sw", "ta", "te", "th", "tl", "tr", "uk", "ur", "vi", "yue", "zh", "unknown" |
| `keyscale` | The musical key and scale (major or minor). | COMBO | No | "C major", "C minor", "C# major", "C# minor", "Db major", "Db minor", "D major", "D minor", "D# major", "D# minor", "Eb major", "Eb minor", "E major", "E minor", "F major", "F minor", "F# major", "F# minor", "Gb major", "Gb minor", "G major", "G minor", "G# major", "G# minor", "Ab major", "Ab minor", "A major", "A minor", "A# major", "A# minor", "Bb major", "Bb minor", "B major", "B minor" |
| `generate_audio_codes` | Enable the LLM that generates audio codes. This can be slow but will increase the quality of the generated audio. Turn this off if you are giving the model an audio reference. Default: True. | BOOLEAN | No | N/A |
| `cfg_scale` | The classifier-free guidance scale. Higher values make the output more closely follow the prompt. Default: 2.0. | FLOAT | No | 0.0 to 100.0 |
| `temperature` | A sampling temperature. Lower values make the output more deterministic. Default: 0.85. | FLOAT | No | 0.0 to 2.0 |
| `top_p` | The nucleus sampling probability (top-p). Default: 0.9. | FLOAT | No | 0.0 to 2000.0 |
| `top_k` | The number of highest probability tokens to consider (top-k). Default: 0. | INT | No | 0 to 100 |
| `min_p` | The minimum probability threshold for token sampling (min-p). Default: 0.000. | FLOAT | No | 0.0 to 1.0 |
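
The sampling inputs (`temperature`, `top_k`, `top_p`, `min_p`) are described only briefly above. As a rough illustration of how filters with these names conventionally act on a token distribution, here is a minimal pure-Python sketch. It is not the node's implementation: the function name `filter_probs`, the order in which the filters run, treating `top_k = 0` as "disabled", treating a temperature of 0 as greedy selection, and scaling `min_p` by the top token's probability are all assumptions borrowed from common LLM sampling conventions.

```python
import math


def filter_probs(logits, temperature=0.85, top_k=0, top_p=0.9, min_p=0.0):
    """Hypothetical helper: return renormalized probabilities after filtering.

    Assumed conventions (not confirmed by the node's source):
    top_k == 0 disables top-k, top_p >= 1.0 disables top-p,
    min_p is relative to the most likely token's probability.
    """
    n = len(logits)
    if temperature <= 0:
        # Assumption: zero temperature means greedy, all mass on the argmax.
        best = max(range(n), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted from most to least likely.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    keep = set(order)

    if top_k > 0:
        # Keep only the k most likely tokens.
        keep &= set(order[:top_k])

    if top_p < 1.0:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        cumulative, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus

    if min_p > 0.0:
        # Drop tokens far less likely than the best one.
        threshold = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= threshold}

    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(n)]


# Example: with top_p=0.9 the least likely of four tokens is filtered out.
example = filter_probs([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_p=0.9)
```

Under these assumptions, lowering `temperature` sharpens the distribution before any filter runs, which is consistent with the table's note that lower values make the output more deterministic.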

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| `CONDITIONING` | The conditioning data, which contains the encoded text and audio parameters for the AceStepAudio 1.5 model. | CONDITIONING |
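
To show how the inputs and the single output fit together, the sketch below builds one node entry in ComfyUI's API-format workflow (a dictionary with `class_type` and `inputs`, where a link to another node is written as `[node_id, output_index]`). The helper `build_node`, the node IDs `"1"` and `"2"`, and the assumption that `class_type` matches the node name used on this page are illustrative, not taken from the node's source. Defaults and ranges are copied from the Inputs table; `timesignature` and `keyscale` are passed explicitly because no default is documented for them.

```python
# (min, max) ranges copied from the Inputs table, inclusive on both ends.
RANGES = {
    "seed": (0, 18446744073709551615),
    "bpm": (10, 300),
    "duration": (0.0, 2000.0),
    "cfg_scale": (0.0, 100.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 2000.0),
    "top_k": (0, 100),
    "min_p": (0.0, 1.0),
}

# Defaults copied from the Inputs table.
DEFAULTS = {
    "seed": 0,
    "bpm": 120,
    "duration": 120.0,
    "language": "en",
    "generate_audio_codes": True,
    "cfg_scale": 2.0,
    "temperature": 0.85,
    "top_p": 0.9,
    "top_k": 0,
    "min_p": 0.0,
}

TIME_SIGNATURES = ("2", "3", "4", "6")


def build_node(clip_link, tags, lyrics, timesignature, keyscale, **overrides):
    """Hypothetical helper: build an API-format entry for this node."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    if timesignature not in TIME_SIGNATURES:
        raise ValueError(f"timesignature must be one of {TIME_SIGNATURES}")

    inputs = dict(DEFAULTS)
    inputs.update(overrides)

    # Reject numeric values outside the documented ranges.
    for name, (low, high) in RANGES.items():
        if not low <= inputs[name] <= high:
            raise ValueError(f"{name}={inputs[name]} is outside {low} to {high}")

    inputs.update(
        clip=clip_link,
        tags=tags,
        lyrics=lyrics,
        timesignature=timesignature,
        keyscale=keyscale,
    )
    # Assumption: class_type equals the node name shown at the top of this page.
    return {"class_type": "TextEncodeAceStepAudio1.5", "inputs": inputs}


# Node "1" stands in for whichever node supplies the CLIP model (hypothetical ID).
workflow = {
    "2": build_node(["1", 0], "lo-fi, piano, mellow", "[verse]\nhello", "4", "C major", bpm=90),
}
# A downstream node would reference the CONDITIONING output as ["2", 0].
```

Validating against the table before submitting a workflow is a design choice of this sketch; it surfaces out-of-range values locally rather than relying on the server to reject them.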
This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): cf948180c3576cd484593f03e849c04857cfb57a198071123ed44ec7b5067521