NeMo ASR Models

Use NeMo Framework’s automatic speech recognition (ASR) models to transcribe audio in your curation pipelines. This guide covers model selection, basic usage, model caching, and resource configuration.

Model Selection

NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.

Example Model Usage

# Example model identifier (this one has been verified in tests)
example_model = "nvidia/parakeet-tdt-0.6b-v2"

# For production use, select appropriate models from:
# https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html

Basic Usage

Simple ASR Inference

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Create ASR inference stage with a model from NeMo Framework
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",  # Select from NeMo Framework docs
    filepath_key="audio_filepath",
    pred_text_key="pred_text"
)

# Configure for GPU processing
asr_stage = asr_stage.with_(
    resources=Resources(gpus=1.0),
    batch_size=16
)
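The `filepath_key` and `pred_text_key` arguments name fields on each input record. As a rough illustration, assuming records are plain dictionaries stored one per line in a JSONL manifest, an input manifest for the stage above could be prepared as sketched below. The file paths and the `write_manifest` and `read_manifest` helpers are hypothetical and not part of NeMo Curator.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(audio_paths, manifest_path, filepath_key="audio_filepath"):
    """Write one JSON record per line, each holding a single audio path."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for path in audio_paths:
            f.write(json.dumps({filepath_key: str(path)}) + "\n")


def read_manifest(manifest_path):
    """Read the manifest back into a list of dictionaries."""
    with open(manifest_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical audio files; replace these with real paths.
audio_paths = ["/data/audio/sample_001.wav", "/data/audio/sample_002.wav"]

manifest_path = Path(tempfile.mkdtemp()) / "manifest.jsonl"
write_manifest(audio_paths, manifest_path)
records = read_manifest(manifest_path)
```

After inference, each record would be expected to carry the transcription under the field named by `pred_text_key` (here `pred_text`) in addition to the audio path.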

Custom Configuration

# Example with custom field names
custom_asr = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",
    filepath_key="custom_audio_path",
    pred_text_key="transcription"
).with_(
    batch_size=32,
    resources=Resources(cpus=4.0, gpus=1.0)
)
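When field names are customized, a mismatch between the configured `filepath_key` and the keys actually present in the data is an easy mistake to make. The following is a minimal pre-flight check, assuming records are plain dictionaries. The `find_missing_key` helper and the sample records are hypothetical and only illustrate the idea.

```python
def find_missing_key(records, filepath_key):
    """Return the indices of records with no usable value under filepath_key."""
    return [i for i, record in enumerate(records) if not record.get(filepath_key)]


# Hypothetical records: only the first one matches the configured key.
records = [
    {"custom_audio_path": "/data/audio/a.wav"},
    {"audio_filepath": "/data/audio/b.wav"},  # default name, not the custom one
    {"custom_audio_path": ""},                # key present but empty
]

missing = find_missing_key(records, "custom_audio_path")
if missing:
    print(f"Records missing 'custom_audio_path': {missing}")
```

Running a check like this before the ASR stage surfaces naming problems before any GPU time is spent.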

Model Caching

Models are automatically downloaded and cached when first loaded:

# Models are cached automatically on first use
asr_stage = InferenceAsrNemoStage(model_name="your_chosen_model_name")

# The setup() method handles model downloading and caching
asr_stage.setup()
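The load-once behavior can be pictured with a generic memoized loader. This is only an in-memory analogy under the assumption that repeated requests for the same model name reuse earlier work. It does not reproduce how NeMo Framework or NeMo Curator store models, and `load_model` is a hypothetical stand-in.

```python
from functools import lru_cache

# Records each time the expensive step actually runs.
load_calls = []


@lru_cache(maxsize=None)
def load_model(model_name):
    """Stand-in for an expensive download-and-load step (hypothetical)."""
    load_calls.append(model_name)
    return {"name": model_name}


first = load_model("your_chosen_model_name")   # performs the "load"
second = load_model("your_chosen_model_name")  # served from the cache
```

Here the second call returns the same object without running the loader again, which is the effect caching has on repeated setup.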

Resource Configuration

Configure GPU and CPU resources based on your hardware:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Single-GPU configuration: request GPU memory (in GB) rather than a GPU count
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=4.0,
        gpu_memory_gb=8.0  # Adjust based on your model's requirements
    ),
    batch_size=16
)

# Multi-GPU configuration
multi_gpu_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=8.0,
        gpus=2.0  # Use 2 GPUs
    ),
    batch_size=32
)
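To get a feel for how `batch_size` interacts with dataset size, the sketch below groups a list of inputs into fixed-size batches. It assumes `batch_size` simply caps how many inputs are handed to the model per call. The helpers and file names are hypothetical, and the stage's real batching logic may differ.

```python
import math


def count_batches(num_items, batch_size):
    """Number of batches needed to cover num_items inputs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_items / batch_size)


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical dataset of 100 clips.
files = [f"clip_{i:04d}.wav" for i in range(100)]

batches_16 = list(iter_batches(files, 16))
batches_32 = list(iter_batches(files, 32))
```

Larger batches mean fewer model calls but more GPU memory per call, so the batch size and the requested resources are usually tuned together.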