FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
Inference Optimization
community
AI & ML interests
None defined yet.
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
  Text Generation • 32B • Updated • 6
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
  18B • Updated • 152
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16
  6B • Updated • 40
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic
  32B • Updated • 214
models
48
- inference-optimization/granite-4.0-h-small-quantized.w8a8
- inference-optimization/granite-4.0-h-small-NVFP4
- inference-optimization/granite-4.0-h-small-quantized.w4a16
- inference-optimization/granite-4.0-h-small-FP8-dynamic
- inference-optimization/granite-4.0-h-small-FP8-block
- inference-optimization/granite-4.0-h-tiny-quantized.w8a8
- inference-optimization/granite-4.0-h-tiny-NVFP4
- inference-optimization/granite-4.0-h-tiny-quantized.w4a16
- inference-optimization/granite-4.0-h-tiny-FP8-dynamic
- inference-optimization/granite-4.0-h-tiny-FP8-block
datasets
0
None public yet