Text Curation Tutorials
Hands-on tutorials for text curation workflows are available in the tutorials/text directory of the NeMo Curator GitHub repository.
Key Concepts for Tutorial Success
Before diving into the tutorials, familiarize yourself with these essential NeMo Curator concepts:
Pipeline Architecture
Core processing stages and pipeline concepts for text curation workflows. Tags: data-structures, distributed
Quality Assessment
Scoring and filtering techniques used in tutorials. Tags: heuristics, classifiers
Data Loading
Loading data from various sources. Tags: common-crawl, custom-data
Distributed Classification
GPU-accelerated classification concepts. Tags: gpu, scalable