How to use Austin207/Map-NEO with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Austin207/Map-NEO")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Austin207/Map-NEO", dtype="auto")
How to use Austin207/Map-NEO with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Austin207/Map-NEO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Austin207/Map-NEO",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
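The same completion request can be issued from Python with only the standard library; a minimal sketch (the payload mirrors the curl body above; the actual HTTP call is left commented out since it requires the vLLM server to be running on localhost:8000):

```python
import json

# OpenAI-compatible completions endpoint exposed by `vllm serve`.
URL = "http://localhost:8000/v1/completions"

def build_payload(prompt, max_tokens=512, temperature=0.5):
    # Same fields as the curl example above.
    return {
        "model": "Austin207/Map-NEO",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

body = json.dumps(build_payload("Once upon a time,")).encode("utf-8")

# To actually send it (server must be running):
#   import urllib.request
#   req = urllib.request.Request(
#       URL, data=body, headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

The same payload shape works for any OpenAI-compatible server; only the URL changes.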
How to use Austin207/Map-NEO with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Austin207/Map-NEO" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Austin207/Map-NEO",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Austin207/Map-NEO" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Austin207/Map-NEO",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use Austin207/Map-NEO with Docker Model Runner:
docker model run hf.co/Austin207/Map-NEO
MAP-NEO Mini is a 253M parameter autoregressive language model built from scratch with modern architectural improvements. It demonstrates that high-quality language models can be trained efficiently on modest hardware while achieving competitive performance through careful data curation and architectural choices.
Training data: tiiuae/falcon-refinedweb (curated subset)

import torch
from transformers import AutoTokenizer
from model_neo import NeoMini, NeoMiniConfig
# Load model
config = NeoMiniConfig()
model = NeoMini(config)
checkpoint = torch.load("extended_context_model.pt", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Generate text
prompt = "The future of AI is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
output = model.generate(input_ids, max_length=100, temperature=0.8)
print(tokenizer.decode(output))
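The `temperature=0.8` argument above rescales the model's logits before sampling; a minimal pure-Python sketch with hypothetical logits shows the effect (lower temperature sharpens the distribution, higher temperature flattens it):

```python
import math

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_probs(logits, temperature):
    # Temperature scaling: divide logits by T, then softmax.
    return softmax([l / temperature for l in logits])

logits = [2.0, 1.0, 0.0]                # hypothetical next-token logits
cool = temperature_probs(logits, 0.5)   # sharper: top token more likely
warm = temperature_probs(logits, 1.5)   # flatter: closer to uniform
```

At temperature 1.0 this reduces to plain softmax, so `temperature=0.8` in the snippet above mildly sharpens the sampling distribution.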
# Or use the interactive chat script from the model repository:
python interactive_chat.py
Intended Uses:
Out-of-Scope Uses:
Antony Austin - Model development and training; model card created 30/08/2025
@misc{mapneo_mini_2025,
title={MAP-NEO Mini: An Efficient 253M Parameter Language Model},
  author={Antony Austin},
year={2025},
howpublished={\url{https://huggingface.co/Austin207/Map-NEO}},
note={Trained on NVIDIA RTX 5070 Laptop GPU with RefinedWeb data}
}
Last Updated: August 30, 2025
Model Version: 1.0.0
Status: Base model (pre-conversational fine-tuning)