Frame Extraction
Extract frames from clips or full videos at target rates and resolutions. Use frames for embeddings (such as Cosmos‑Embed1), aesthetic filtering, previews, and custom analysis.
Use Cases
- Prepare inputs for embedding models that expect frame sequences.
- Run aesthetic filtering that operates on sampled frames.
- Generate lightweight previews or QA snapshots.
- Provide frames for scene-change detection before clipping (TransNetV2).
Before You Start
If you only need saved media files (clips), frame extraction is optional. Embeddings and aesthetic filtering require frames.
Quickstart
Use the pipeline stages or the example script flags to extract frames for embeddings, filtering, and analysis.
Pipeline Stage
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1FrameCreationStage,
    CosmosEmbed1EmbeddingStage,
)
pipe = Pipeline(name="clip_frames_embeddings")
pipe.add_stage(FixedStrideExtractorStage(clip_len_s=10.0, clip_stride_s=10.0))
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))
pipe.run()
Script Flags
# Clip frames implicitly when generating embeddings or aesthetics
python tutorials/video/getting-started/video_split_clip_example.py \
... \
--generate-embeddings \
--clip-extraction-target-res -1
# Full-video frames for TransNetV2 scene change
python tutorials/video/getting-started/video_split_clip_example.py \
... \
--splitting-algorithm transnetv2 \
--transnetv2-frame-decoder-mode pynvc
Options in NeMo Curator
NeMo Curator provides two complementary stages:
- ClipFrameExtractionStage: Extracts frames from already‑split clips. Supports several target FPS values and computes an LCM rate to reduce decode work.
- VideoFrameExtractionStage: Extracts frames from full videos (for example, before scene‑change detection). Supports PyNvCodec (NVDEC) or ffmpeg CPU/GPU decode.
Extract Frames
From Clips
from nemo_curator.stages.video.clipping.clip_frame_extraction import (
    ClipFrameExtractionStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose
extract_frames = ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    extract_purposes=(FramePurpose.EMBEDDINGS,),  # sets default FPS if target_fps not provided
    target_res=(-1, -1),  # keep original resolution
    # target_fps=[1, 2],  # optional: override with explicit FPS values
    verbose=True,
)
From Full Videos (Scene Change)
from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrameExtractionStage
frame_extractor = VideoFrameExtractionStage(
    decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
    output_hw=(27, 48),  # (height, width) for frame extraction
    pyncv_batch_size=64,  # batch size for PyNvCodec
    verbose=True,
)
Parameters
| Parameter | Description |
|---|---|
| extraction_policies | Frame selection strategy. Use sequence for uniform sampling. middle selects a single middle frame. |
| target_fps | For clips: sampling rate in frames per second. If you provide several integer values, the stage uses LCM sampling. |
| extract_purposes | Shortcut that sets default FPS for specific purposes (such as embeddings). You can still pass target_fps to override. |
| target_res | Output frame resolution (height, width). Use (-1, -1) to keep original. |
| num_cpus | Number of CPU cores for frame extraction. Default: 3. |
| decoder_mode | For full‑video extraction: pynvc (NVDEC), ffmpeg_gpu, or ffmpeg_cpu. |
| output_hw | For full‑video extraction: (height, width) tuple for frame dimensions. Default: (27, 48). |
| pyncv_batch_size | For full‑video extraction: batch size for PyNvCodec processing. Default: 64. |
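To make the difference between the sequence and middle policies concrete, the following is a minimal sketch in plain Python. It is an illustration only, not NeMo Curator's implementation: the function name select_frame_indices, the stride parameter, and the choice of num_frames // 2 as the "middle" index are all assumptions made for this example.

```python
def select_frame_indices(num_frames: int, policy: str, stride: int = 1) -> list[int]:
    """Return the indices of decoded frames kept by a given policy.

    Hypothetical helper for illustration; the library may pick frames differently.
    """
    if num_frames <= 0:
        return []
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if policy == "sequence":
        # Uniform sampling: keep every `stride`-th frame, starting at frame 0.
        return list(range(0, num_frames, stride))
    if policy == "middle":
        # A single frame taken from the middle of the clip (assumed index).
        return [num_frames // 2]
    raise ValueError(f"unknown policy: {policy!r}")


# A clip decoded to 10 frames:
sequence_indices = select_frame_indices(10, "sequence", stride=5)  # [0, 5]
middle_indices = select_frame_indices(10, "middle")                # [5]
```

The sequence policy suits models that consume a series of frames, while middle is a cheap option when one representative frame per clip is enough.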
LCM Sampling for Several FPS Values
If you provide several integer target_fps values (such as 1 and 2), the clip stage decodes once at the LCM rate and then samples every k‑th frame to produce each target rate. This reduces decode cost.
ClipFrameExtractionStage(
    extraction_policies=(FrameExtractionPolicy.sequence,),
    target_fps=[1, 2],  # LCM = 2; decode once at 2 FPS, then subsample
)
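The arithmetic behind LCM sampling can be sketched in a few lines of standard-library Python. This is a hedged illustration of the idea described above, not the stage's actual code; plan_lcm_sampling is a hypothetical helper name, and math.lcm requires Python 3.9 or later.

```python
from math import lcm


def plan_lcm_sampling(target_fps: list[int]) -> tuple[int, dict[int, int]]:
    """Compute one decode rate and the per-target subsampling stride k.

    Hypothetical helper: decode once at the LCM of all targets, then keep
    every k-th frame, where k = decode_fps // fps for each target rate.
    """
    if not target_fps or any(fps < 1 for fps in target_fps):
        raise ValueError("target_fps must contain positive integers")
    decode_fps = lcm(*target_fps)
    strides = {fps: decode_fps // fps for fps in target_fps}
    return decode_fps, strides


decode_fps, strides = plan_lcm_sampling([1, 2])
# decode_fps == 2, strides == {1: 2, 2: 1}

# Subsample a 10-second clip decoded once at the LCM rate.
decoded = list(range(decode_fps * 10))          # 20 frame indices
frames_at_1fps = decoded[:: strides[1]]         # every 2nd frame, 10 frames
frames_at_2fps = decoded[:: strides[2]]         # every frame, 20 frames
```

Rates that do not divide each other raise the decode rate: for targets 2 and 3 the LCM is 6, so the clip is decoded at 6 FPS and subsampled with strides 3 and 2.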
Hardware and Performance
- Prefer pynvc (NVDEC) or ffmpeg_gpu for high throughput when GPU hardware is available; otherwise use ffmpeg_cpu.
- Use batching where applicable and track worker resource use.
- Keep resolution modest if memory limits apply; set target_res when needed.
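To judge whether a resolution is "modest" enough, a rough estimate of the raw frame buffer size can help. The sketch below assumes uncompressed 8-bit RGB frames (3 channels, 1 byte per value); that layout and the helper name frame_batch_bytes are assumptions for this example, and real memory use also includes decoder and framework overhead.

```python
def frame_batch_bytes(
    num_frames: int,
    height: int,
    width: int,
    channels: int = 3,          # assumed RGB
    bytes_per_value: int = 1,   # assumed uint8
) -> int:
    """Estimate the raw size in bytes of a batch of decoded frames."""
    return num_frames * height * width * channels * bytes_per_value


# One batch of 64 frames at the default output_hw of (27, 48):
small_batch = frame_batch_bytes(64, 27, 48)        # 248_832 bytes, about 0.25 MB

# A 10-second clip sampled at 2 FPS and kept at 1080p:
full_res_clip = frame_batch_bytes(20, 1080, 1920)  # 124_416_000 bytes, about 124 MB
```

The gap between the two figures shows why downscaling with target_res matters once many clips are held in memory at the same time.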
Downstream Dependencies
- Embeddings: Cosmos‑Embed1 expects frames at specific rates. Refer to Embeddings.
- Aesthetic Filtering: Requires frames extracted earlier. Refer to Filtering.
- Clipping with TransNetV2: Uses full‑video frame extraction before scene‑change detection. Refer to Clipping.
Troubleshooting
- “Frame extraction failed”: Check decoder mode and availability; confirm ffmpeg and drivers for GPU modes.
- Not enough frames for embeddings: Increase target_fps or adjust clip length; certain embedding stages can re‑extract at a higher rate when needed.
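When debugging a "not enough frames" problem, it can help to estimate how many frames a clip yields at a given rate. The following sketch is approximate and hypothetical: frames_per_clip and min_fps_for are illustrative helpers, the exact count depends on the decoder, and the required frame count of 8 is an assumed placeholder rather than a documented requirement of any embedding model.

```python
import math


def frames_per_clip(clip_len_s: float, target_fps: float) -> int:
    """Approximate number of frames sampled from a clip at target_fps."""
    return math.floor(clip_len_s * target_fps)


def min_fps_for(required_frames: int, clip_len_s: float) -> float:
    """Lowest sampling rate that yields at least required_frames frames."""
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    return required_frames / clip_len_s


REQUIRED_FRAMES = 8  # assumed placeholder; check your model's actual input length

long_clip = frames_per_clip(10.0, 2.0)   # 20 frames, enough for 8
short_clip = frames_per_clip(3.0, 2.0)   # 6 frames, fewer than 8
needed_fps = min_fps_for(REQUIRED_FRAMES, 3.0)  # about 2.67 FPS for a 3-second clip
```

If the estimate falls short, either raise target_fps to at least the computed minimum or use longer clips.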