NVIDIA NeMo Curator

Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

Use Cases

  • Prepare semantic vectors for search, clustering, and near-duplicate detection.
  • Score optional text prompts against clip content.
  • Enable downstream filtering or retrieval tasks that need clip-level vectors.
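Once each clip carries a vector, search is a nearest-neighbor lookup. The sketch below assumes the clip embeddings have already been collected into a NumPy matrix with one row per clip, and ranks them against a query vector by cosine similarity. The function name `top_k_similar` and the toy 3-dimensional data are hypothetical; real embeddings are much larger, and large collections usually call for a dedicated vector index.

```python
import numpy as np


def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k clips most similar to the query."""
    # L2-normalize so that a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q
    # Sort by descending similarity and keep the first k rows
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical data: four clip embeddings of dimension 3
clip_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(top_k_similar(np.array([1.0, 0.0, 0.0]), clip_vectors, k=2))
```

Normalizing once up front keeps the scoring step a single matrix-vector product.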

Before You Start

  • Create clips upstream. Refer to Clipping.
  • Extract frames for embeddings at the required sampling rate. Refer to Frame Extraction.
  • Make model weights available on each node; the stages download the weights if they are missing.

Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

Pipeline Stage

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

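pipe = Pipeline(name="video_embeddings_example")

# Clips must already exist: add reader and clipping stages before these (refer to Clipping).
# Extract frame sequences for the embedding stages.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
# Convert extracted frames into model-ready tensors.
pipe.add_stage(
    CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True)
)
# Compute clip-level embeddings on the GPU.
pipe.add_stage(
    CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True)
)
pipe.run()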

Script Flags

# Cosmos-Embed1 (224p)
python tutorials/video/getting-started/video_split_clip_example.py \
  ... \
  --generate-embeddings \
  --embedding-algorithm cosmos-embed1-224p \
  --embedding-gpu-memory-gb 20.0

Embedding Options

Cosmos-Embed1

  1. Add CosmosEmbed1FrameCreationStage to transform extracted frames into model-ready tensors.

    from nemo_curator.stages.video.embedding.cosmos_embed1 import (
        CosmosEmbed1FrameCreationStage,
        CosmosEmbed1EmbeddingStage,
    )
    
    frames = CosmosEmbed1FrameCreationStage(
        model_dir="/models",
        variant="224p",  # or 336p, 448p
        target_fps=2.0,
        verbose=True,
    )
  2. Add CosmosEmbed1EmbeddingStage to generate clip.cosmos_embed1_embedding and, when texts_to_verify is set, clip.cosmos_embed1_text_match.

    embed = CosmosEmbed1EmbeddingStage(
        model_dir="/models",
        variant="224p",
        gpu_memory_gb=20.0,
        verbose=True,
    )
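Conceptually, turning extracted frames into model-ready tensors combines temporal sampling with a change of tensor layout. The sketch below is hypothetical and is not the stage's actual implementation. It assumes decoded RGB frames arrive as a uint8 NumPy array shaped (N, H, W, C) and that the model expects a fixed number of frames as channel-first float32 values in [0, 1]. The function name `format_frames` and the frame count of 8 are assumptions, and resizing to the variant's resolution is omitted.

```python
import numpy as np


def format_frames(frames: np.ndarray, num_frames: int) -> np.ndarray:
    """Pick num_frames evenly spaced frames; convert (N, H, W, C) uint8 to (T, C, H, W) float32 in [0, 1]."""
    if frames.ndim != 4:
        raise ValueError("expected frames shaped (N, H, W, C)")
    n = frames.shape[0]
    if n < num_frames:
        # Mirrors the "not enough frames" failure mode: the clip is too short at this sampling rate
        raise ValueError(f"need {num_frames} frames, got {n}")
    # Evenly spaced indices across the clip, including the first and last frame
    idx = np.linspace(0, n - 1, num_frames).round().astype(int)
    selected = frames[idx].astype(np.float32) / 255.0
    # Reorder to channel-first, the layout most PyTorch vision models expect
    return selected.transpose(0, 3, 1, 2)


# Hypothetical clip: 20 RGB frames at 224x224
clip = np.random.randint(0, 256, size=(20, 224, 224, 3), dtype=np.uint8)
tensor = format_frames(clip, num_frames=8)
print(tensor.shape)
```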

Parameters

CosmosEmbed1FrameCreationStage

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_dir | str | "models/cosmos_embed1" | Directory for model utilities and configs used to format input frames. |
| variant | str: "224p", "336p", or "448p" | "336p" | Resolution preset that controls the model’s expected input size. |
| target_fps | float | 2.0 | Source sampling rate used to select frames; the stage may re-extract at a higher FPS if too few frames are available. |
| num_cpus | int | 3 | CPU cores used when on-the-fly re-extraction is required. |
| verbose | bool | False | Log per-clip decisions and re-extraction messages. |
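How target_fps, clip length, and re-extraction interact comes down to simple arithmetic: a clip of length `d` seconds sampled at `f` FPS yields about `floor(d * f)` frames. The sketch below is hypothetical. The function name and the `required_frames` value are assumptions, so check the model's actual frame requirement, and the stage's own decision logic may differ.

```python
import math


def plan_frame_sampling(clip_seconds: float, target_fps: float, required_frames: int) -> dict:
    """Estimate the frames available at target_fps and the minimum FPS that reaches required_frames."""
    if clip_seconds <= 0 or target_fps <= 0:
        raise ValueError("clip_seconds and target_fps must be positive")
    available = math.floor(clip_seconds * target_fps)
    return {
        "available_frames": available,
        # Too few frames at this rate means re-extraction at a higher FPS (or a longer clip)
        "needs_reextraction": available < required_frames,
        # Lowest sampling rate that yields required_frames over this clip
        "min_fps": required_frames / clip_seconds,
    }


# A 3-second clip at 2 FPS yields 6 frames; with an assumed requirement of 8 frames,
# re-extraction is needed at roughly 2.67 FPS or higher.
print(plan_frame_sampling(clip_seconds=3.0, target_fps=2.0, required_frames=8))
```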

CosmosEmbed1EmbeddingStage

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_dir | str | "models/cosmos_embed1" | Directory for model weights; downloaded on each node if missing. |
| variant | str: "224p", "336p", or "448p" | "336p" | Resolution preset used by the model weights. |
| gpu_memory_gb | int | 20 | Approximate GPU memory (GB) reserved per worker. |
| texts_to_verify | list[str] \| None | None | Optional text prompts to score against the clip embedding. |
| verbose | bool | False | Log setup and per-clip outcomes. |
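Scoring text prompts against a clip usually works like CLIP-style matching: if the prompts are embedded into the same space as the clip, compare them by cosine similarity and turn the scores into probabilities with a softmax. The sketch below illustrates that general technique and is not the stage's documented behavior. The function name, the toy vectors, and the temperature of 0.01 are all hypothetical.

```python
import numpy as np


def score_texts(
    clip_emb: np.ndarray, text_embs: np.ndarray, texts: list[str], temperature: float = 0.01
) -> tuple[str, float]:
    """Return the best-matching prompt and its softmax probability."""
    clip_n = clip_emb / np.linalg.norm(clip_emb)
    text_n = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_n @ clip_n  # cosine similarity per prompt
    logits = sims / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    exps = np.exp(logits)
    probs = exps / exps.sum()
    best = int(np.argmax(probs))
    return texts[best], float(probs[best])


# Hypothetical prompts and 3-dimensional embeddings
texts = ["a car driving at night", "a dog running on grass"]
clip_emb = np.array([0.1, 0.9, 0.2])
text_embs = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]])
print(score_texts(clip_emb, text_embs, texts))
```

A low temperature sharpens the distribution, so the top prompt's probability approaches 1 when one prompt clearly dominates.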

Outputs

  • clip.cosmos_embed1_frames → temporary tensors used by the embedding stage
  • clip.cosmos_embed1_embedding → final clip-level vector (NumPy array)
  • Optional: clip.cosmos_embed1_text_match
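For near-duplicate removal, one simple approach is a greedy pass over pairwise cosine similarities: keep a clip only if it is not too similar to any clip already kept. The sketch below assumes the clip.cosmos_embed1_embedding arrays have been stacked into one matrix. The function name and the 0.95 threshold are hypothetical, and the full similarity matrix is quadratic in memory, so large collections need clustering or approximate nearest-neighbor search instead.

```python
import numpy as np


def dedup_keep_indices(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Return row indices to keep after greedily dropping near-duplicate clips."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T  # pairwise cosine similarity
    keep: list[int] = []
    for i in range(len(normed)):
        # Keep clip i only if it stays below the threshold against every clip kept so far
        if all(sims[i, j] < threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical embeddings: rows 0 and 1 are near-duplicates
vecs = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
print(dedup_keep_indices(vecs))
```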

Troubleshooting

  • Not enough frames for embeddings: Increase target_fps during frame extraction or adjust clip length so that the model receives the required number of frames.
  • Out of memory during embedding: Lower gpu_memory_gb, reduce the batch size if the stage exposes one, or use a smaller resolution variant.
  • Weights not found on node: Confirm that model_dir points to the correct location and that the node has network access; the stages download missing weights.

Next Steps