# WanDancerVideo - ComfyUI Built-in Node Documentation

> Complete documentation for the WanDancerVideo node in ComfyUI. Learn its inputs, outputs, parameters and usage.

The WanDancerVideo node prepares conditioning data and an empty latent tensor for video generation with the WanDancer model. It combines positive and negative conditioning with optional inputs like a starting image, mask, CLIP vision embeddings, and audio features to control the generated video.

## Inputs

| Parameter                | Description                                                                                                                  | Data Type              | Required | Range                            |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------- | ---------------------- | -------- | -------------------------------- |
| `positive`               | The positive conditioning to guide video generation.                                                                         | CONDITIONING           | Yes      |                                  |
| `negative`               | The negative conditioning to guide video generation.                                                                         | CONDITIONING           | Yes      |                                  |
| `vae`                    | The VAE used to encode the start image into the latent space.                                                                | VAE                    | Yes      |                                  |
| `width`                  | The width of the generated video in pixels (default: 480).                                                                   | INT                    | Yes      | 16 to MAX\_RESOLUTION (step: 16) |
| `height`                 | The height of the generated video in pixels (default: 832).                                                                  | INT                    | Yes      | 16 to MAX\_RESOLUTION (step: 16) |
| `length`                 | The number of frames in the generated video. Should stay 149 for WanDancer (default: 149).                                   | INT                    | Yes      | 1 to MAX\_RESOLUTION (step: 4)   |
| `clip_vision_output`     | The CLIP vision embeddings for the first frame.                                                                              | CLIP\_VISION\_OUTPUT   | No       |                                  |
| `clip_vision_output_ref` | The CLIP vision embeddings for the reference image.                                                                          | CLIP\_VISION\_OUTPUT   | No       |                                  |
| `start_image`            | The initial image(s) to be encoded. Can be any number of frames, up to the specified `length`.                               | IMAGE                  | No       |                                  |
| `mask`                   | Conditioning mask for the start image(s). White areas are kept; black areas are regenerated. Used for localized generation.  | MASK                   | No       |                                  |
| `audio_encoder_output`   | The output from an audio encoder, providing audio features, fps, and inject scale for audio-conditional generation.          | AUDIO\_ENCODER\_OUTPUT | No       |                                  |

**Note on Parameter Constraints:**

* The `start_image` and `mask` inputs are optional and can be used together. When `start_image` is provided, it is encoded with the `vae` and concatenated with the latent. If `mask` is also provided, it controls which parts of the start image are kept (white) and which are regenerated (black). Without a `mask`, the entire start image is used as conditioning guidance.
* The `clip_vision_output` and `clip_vision_output_ref` inputs are optional and can be used together to provide visual context for the first frame and a reference image.
* The `audio_encoder_output` input is optional and provides audio features for audio-conditional generation.
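
The ranges in the inputs table imply that `width` and `height` take multiples of 16 and that `length` takes values of the form `1 + 4k` (the default 149 is `1 + 4 * 37`). The sketch below shows one way to snap arbitrary values onto that grid before passing them to the node. The `snap` and `snap_inputs` helpers are hypothetical and not part of ComfyUI, the value `MAX_RESOLUTION = 16384` is an assumption based on ComfyUI's `nodes.py`, and rounding down is an illustrative choice rather than documented node behavior.

```python
# Assumption: ComfyUI defines MAX_RESOLUTION = 16384 in nodes.py.
MAX_RESOLUTION = 16384


def snap(value: int, minimum: int, step: int, maximum: int = MAX_RESOLUTION) -> int:
    """Clamp value to [minimum, maximum], then round down to minimum + k * step."""
    value = max(minimum, min(value, maximum))
    return minimum + ((value - minimum) // step) * step


def snap_inputs(width: int, height: int, length: int) -> dict:
    """Hypothetical helper: return values that fit the ranges in the inputs table."""
    return {
        "width": snap(width, 16, 16),    # 16 to MAX_RESOLUTION, step 16
        "height": snap(height, 16, 16),  # 16 to MAX_RESOLUTION, step 16
        "length": snap(length, 1, 4),    # 1 to MAX_RESOLUTION, step 4
    }


if __name__ == "__main__":
    print(snap_inputs(500, 832, 150))
```

Values that already sit on the grid, such as the defaults 480, 832, and 149, pass through unchanged.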

## Outputs

| Output Name | Description                                                                                      | Data Type    |
| ----------- | ------------------------------------------------------------------------------------------------ | ------------ |
| `positive`  | The positive conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| `negative`  | The negative conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| `latent`    | An empty latent tensor with dimensions matching the specified video length, height, and width.   | LATENT       |
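
To make the relationship between the inputs and the `latent` output concrete, here is a rough sketch of how the latent shape could be derived. It assumes WanDancer uses the same layout as other Wan video nodes in ComfyUI: 16 latent channels, 8x spatial compression, and 4x temporal compression with the first frame kept separately. None of these factors are confirmed on this page, and `wan_latent_shape` and its `batch_size` argument are hypothetical.

```python
def wan_latent_shape(
    width: int = 480,
    height: int = 832,
    length: int = 149,
    batch_size: int = 1,  # assumption: this page lists no batch input
    channels: int = 16,   # assumption: Wan-style VAE latent channels
    spatial: int = 8,     # assumption: 8x spatial compression
    temporal: int = 4,    # assumption: 4x temporal compression
) -> tuple:
    """Return (batch, channels, latent_frames, latent_height, latent_width)."""
    # The first frame maps to one latent frame; each further group of
    # `temporal` frames adds one more.
    latent_frames = (length - 1) // temporal + 1
    return (batch_size, channels, latent_frames, height // spatial, width // spatial)


if __name__ == "__main__":
    print(wan_latent_shape())
```

Under these assumptions, the `length` step of 4 means that each step adds exactly one latent frame.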
