Embeddings
Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.
Use Cases
- Prepare semantic vectors for search, clustering, and near-duplicate detection.
- Score optional text prompts against clip content.
- Enable downstream filtering or retrieval tasks that need clip-level vectors.
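As a rough illustration of the near-duplicate use case, the sketch below flags pairs of clips whose vectors are almost parallel. It assumes the embeddings have already been collected into a 2-D NumPy array; the function name `find_near_duplicates`, the toy vectors, and the 0.95 threshold are illustrative choices and not part of NeMo Curator.

```python
import numpy as np


def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose cosine similarity is >= threshold."""
    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    # Visit each unordered pair once by walking the strict upper triangle.
    i_idx, j_idx = np.triu_indices(len(embeddings), k=1)
    mask = sims[i_idx, j_idx] >= threshold
    return [(int(i), int(j)) for i, j in zip(i_idx[mask], j_idx[mask])]


# Toy 4-dimensional vectors standing in for real clip embeddings (hypothetical data).
clips = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.99, 0.05, 0.0, 0.0],  # nearly parallel to clip 0
    [0.0, 1.0, 0.0, 0.0],    # orthogonal to clip 0
])
pairs = find_near_duplicates(clips, threshold=0.95)
```

The all-pairs comparison is quadratic in the number of clips, so it only suits small collections; larger datasets would normally use the dedicated duplicate removal workflow.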
Before You Start
- Create clips upstream. Refer to Clipping.
- Provide frames for embeddings or sample at the required rate. Refer to Frame Extraction.
- Make sure model weights are accessible on each node (the stages download weights if missing).
Quickstart
Use the pipeline stages or the example script flags to generate clip-level embeddings.
Pipeline Stage
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)

pipe = Pipeline(name="video_embeddings_example")

# Extract frames from each clip for use by the embedding model.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)

# Turn the extracted frames into model-ready tensors.
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))

# Generate the clip-level embeddings.
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()
Script Flags
# Cosmos-Embed1 (224p)
python tutorials/video/getting-started/video_split_clip_example.py \
... \
--generate-embeddings \
--embedding-algorithm cosmos-embed1-224p \
--embedding-gpu-memory-gb 20.0
Embedding Options
Cosmos-Embed1
- Add CosmosEmbed1FrameCreationStage to transform extracted frames into model-ready tensors.

  from nemo_curator.stages.video.embedding.cosmos_embed1 import (
      CosmosEmbed1FrameCreationStage,
      CosmosEmbed1EmbeddingStage,
  )

  frames = CosmosEmbed1FrameCreationStage(
      model_dir="/models",
      variant="224p",  # or 336p, 448p
      target_fps=2.0,
      verbose=True,
  )

- Add CosmosEmbed1EmbeddingStage to generate clip.cosmos_embed1_embedding and optional clip.cosmos_embed1_text_match.

  embed = CosmosEmbed1EmbeddingStage(
      model_dir="/models",
      variant="224p",
      gpu_memory_gb=20.0,
      verbose=True,
  )
Parameters
CosmosEmbed1FrameCreationStage
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_dir | str | "models/cosmos_embed1" | Directory for model utilities and configs used to format input frames. |
| variant | "224p", "336p", or "448p" | "336p" | Resolution preset that controls the model’s expected input size. |
| target_fps | float | 2.0 | Source sampling rate used to select frames; may re-extract at higher FPS if needed. |
| num_cpus | int | 3 | CPU cores used when on-the-fly re-extraction is required. |
| verbose | bool | False | Log per-clip decisions and re-extraction messages. |
CosmosEmbed1EmbeddingStage
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_dir | str | "models/cosmos_embed1" | Directory for model weights; downloaded on each node if missing. |
| variant | "224p", "336p", or "448p" | "336p" | Resolution preset used by the model weights. |
| gpu_memory_gb | int | 20 | Approximate GPU memory reservation per worker. |
| texts_to_verify | list[str] or None | None | Optional text prompts to score against the clip embedding. |
| verbose | bool | False | Log setup and per-clip outcomes. |
Outputs
- clip.cosmos_embed1_frames → temporary tensors used by the embedding stage
- clip.cosmos_embed1_embedding → final clip-level vector (NumPy array)
- Optional: clip.cosmos_embed1_text_match
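To give a feel for what scoring text prompts against a clip vector involves, here is a minimal NumPy sketch. It is not the stage's implementation, and this page does not describe the exact structure stored in clip.cosmos_embed1_text_match. The function `score_prompts`, the prompts, and the 3-dimensional vectors are hypothetical stand-ins for embeddings that the model would produce.

```python
import numpy as np


def score_prompts(clip_embedding: np.ndarray, text_embeddings: np.ndarray) -> np.ndarray:
    """Return the cosine similarity between one clip vector and each text vector."""
    clip_unit = clip_embedding / np.linalg.norm(clip_embedding)
    text_unit = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    return text_unit @ clip_unit


# Hypothetical prompts and placeholder vectors; real ones come from the model.
prompts = ["a dog running", "a city at night"]
clip_vec = np.array([0.6, 0.8, 0.0])
text_vecs = np.array([
    [0.6, 0.8, 0.0],  # same direction as the clip vector
    [0.0, 0.0, 1.0],  # orthogonal to the clip vector
])

scores = score_prompts(clip_vec, text_vecs)
best_prompt = prompts[int(np.argmax(scores))]
```

Picking the highest-scoring prompt is one simple way to label or filter clips; a threshold on the score is another option when no prompt may apply.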
Troubleshooting
- Not enough frames for embeddings: Increase target_fps during frame extraction or adjust clip length so that the model receives the required number of frames.
- Out of memory during embedding: Lower gpu_memory_gb, reduce batch size if exposed, or use a smaller resolution variant.
- Weights not found on node: Confirm model_dir and network access. The stages download weights if missing.
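For the "not enough frames" case above, a quick back-of-the-envelope check can show whether a clip is simply too short for the chosen sampling rate. The helpers below are hypothetical and only approximate the frame count as duration times rate; the stage's actual sampling logic may differ. The value of `REQUIRED_FRAMES` is a placeholder, so check the requirement of the model variant in use.

```python
import math

REQUIRED_FRAMES = 8  # placeholder assumption, not taken from this page


def frames_available(clip_duration_s: float, target_fps: float) -> int:
    """Approximate number of frames sampled from a clip at target_fps."""
    return math.floor(clip_duration_s * target_fps)


def min_fps_for(clip_duration_s: float, required_frames: int) -> float:
    """Smallest sampling rate that would yield required_frames from the clip."""
    return required_frames / clip_duration_s


# A 3-second clip sampled at 2 FPS yields about 6 frames.
short_clip_frames = frames_available(clip_duration_s=3.0, target_fps=2.0)
too_few = short_clip_frames < REQUIRED_FRAMES

# Rate needed to reach REQUIRED_FRAMES without lengthening the clip.
needed_fps = min_fps_for(clip_duration_s=3.0, required_frames=REQUIRED_FRAMES)
```

If the needed rate exceeds the source video's frame rate, raising target_fps cannot help, and lengthening the clips is the remaining option.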
Next Steps
- Use embeddings for duplicate removal. Refer to Duplicate Removal.
- Generate captions and previews for review workflows. Refer to Captions & Preview.