
Collections including paper arxiv:2508.14444

Nemotron-Pre-Training-Datasets

Large-scale pre-training datasets used in the Nemotron family of models.

nvidia/Nemotron-Pretraining-Dataset-sample

Viewer • Updated Dec 22, 2025 • 27.7k • 611 • 40
nvidia/Nemotron-CC-Code-v1

Viewer • Updated Dec 22, 2025 • 216M • 1.23k • 17
nvidia/Nemotron-CC-v2.1

Viewer • Updated Dec 22, 2025 • 3.8B • 11k • 108
nvidia/Nemotron-Pretraining-Code-v2

Viewer • Updated Dec 22, 2025 • 836M • 2.08k • 106

Datasets Pretraining - Nemotron V3

Mamba/Transformers Combo Hybrid

nvidia/Nemotron-Pretraining-Dataset-sample

Viewer • Updated Dec 22, 2025 • 27.7k • 611 • 40
nvidia/Nemotron-CC-Code-v1

Viewer • Updated Dec 22, 2025 • 216M • 1.23k • 17
nvidia/Nemotron-CC-v2.1

Viewer • Updated Dec 22, 2025 • 3.8B • 11k • 108
nvidia/Nemotron-Pretraining-Code-v2

Viewer • Updated Dec 22, 2025 • 836M • 2.08k • 106

Nemotron v3 Pre-Training

Large-scale pre-training datasets used in the Nemotron family of models.

nvidia/Nemotron-Pretraining-Dataset-sample

Viewer • Updated Dec 22, 2025 • 27.7k • 611 • 40
nvidia/Nemotron-CC-Code-v1

Viewer • Updated Dec 22, 2025 • 216M • 1.23k • 17
nvidia/Nemotron-CC-v2.1

Viewer • Updated Dec 22, 2025 • 3.8B • 11k • 108
nvidia/Nemotron-Pretraining-Code-v2

Viewer • Updated Dec 22, 2025 • 836M • 2.08k • 106

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2, 2025 • 108
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 152
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 8

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14, 2025 • 28
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14, 2025 • 21
Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering

Paper • 2508.08974 • Published Aug 12, 2025
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20, 2025 • 43

NVIDIA Nemotron Pre-Training - Foundation Model Data

NVIDIA Nemotron pre-training datasets for large language model training and foundation model development

nvidia/Nemotron-CC-v2.1

Viewer • Updated Dec 22, 2025 • 3.8B • 11k • 108
nvidia/Nemotron-CC-v2

Viewer • Updated Dec 23, 2025 • 8.79B • 24.7k • 105
nvidia/Nemotron-Pretraining-Dataset-sample

Viewer • Updated Dec 22, 2025 • 27.7k • 611 • 40
nvidia/Nemotron-CC-Code-v1

Viewer • Updated Dec 22, 2025 • 216M • 1.23k • 17

Tiny LLMs and Datasets

A Benchmark for Learning to Translate a New Language from One Grammar Book

Paper • 2309.16575 • Published Sep 28, 2023 • 1
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Paper • 2409.19151 • Published Sep 27, 2024
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 39
TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 206
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20, 2025 • 43
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7, 2025 • 67
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 273

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.48k • 551
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 14.6k • 444
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2, 2024 • 29
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 44
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 54
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29, 2024 • 12
