	---
	library_name: peft
	pipeline_tag: text-generation
	license: bigcode-openrail-m
	language:
	- code
	base_model:
	- bigcode/starcoder2-15b-instruct-v0.1
	tags:
	- securecode
	- security
	- owasp
	- code-generation
	- secure-coding
	- lora
	- qlora
	- vulnerability-detection
	- cybersecurity
	datasets:
	- scthornton/securecode
	model-index:
	- name: starcoder2-15b-securecode
	  results: []
	---

	# StarCoder2 15B SecureCode

	[![Parameters](https://img.shields.io/badge/parameters-15B-blue.svg)](#model-details) [![Dataset](https://img.shields.io/badge/dataset-2,185_examples-green.svg)](https://huggingface.co/datasets/scthornton/securecode) [![OWASP](https://img.shields.io/badge/OWASP-Top_10_2021_+_LLM_Top_10-red.svg)](#dataset-breakdown) [![Method](https://img.shields.io/badge/method-QLoRA-purple.svg)](#training-details) [![License](https://img.shields.io/badge/license-BigCode_OpenRAIL--M-orange.svg)](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)

	The open-source flagship of the SecureCode collection: a security-aware code generation model fine-tuned on 2,185 real-world vulnerability examples covering the OWASP Top 10 2021 and the OWASP LLM Top 10 2025.

	[Dataset](https://huggingface.co/datasets/scthornton/securecode) \| [Paper](https://huggingface.co/papers/2512.18542) \| [Model Collection](https://huggingface.co/collections/scthornton/securecode) \| [perfecXion.ai](https://perfecxion.ai) \| [Blog Post](https://huggingface.co/blog/scthornton/securecode-models)

	---

	## What This Model Does

	StarCoder2 15B SecureCode is fine-tuned to recognize vulnerability patterns and produce secure implementations. Every training example includes:
	
	- Real-world incident grounding: tied to documented CVEs and breach reports
	- Vulnerable and secure implementations: shown side by side for comparison
	- Attack demonstrations: concrete exploit code
	- Defense-in-depth guidance: SIEM rules, logging, monitoring, and infrastructure hardening
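To make the vulnerable/secure/attack pattern concrete, here is a minimal illustrative pair using Python's standard-library `sqlite3` module. It is not taken from the SecureCode dataset: the table, the function names (`find_user_vulnerable`, `find_user_secure`), and the payload are hypothetical, chosen only to show the shape of a side-by-side example for SQL injection (OWASP A03:2021 Injection).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is concatenated into the SQL string,
    # so a crafted value can change the query's logic.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_secure(conn: sqlite3.Connection, name: str) -> list:
    # SECURE: a parameterized query sends the value separately from the
    # SQL text, so it is always treated as data, never as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Attack demonstration: a classic tautology payload.
payload = "nobody' OR '1'='1"

conn = make_db()
print(find_user_vulnerable(conn, payload))  # returns every row in the table
print(find_user_secure(conn, payload))      # returns no rows
```

The secure version needs no escaping logic of its own: the `?` placeholder hands value binding to the database driver.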

	---

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | [bigcode/starcoder2-15b-instruct-v0.1](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1) |
	| Parameters | 15B |
	| Architecture | StarCoder2 (decoder-only transformer, `Starcoder2ForCausalLM`) |
	| Method | QLoRA (4-bit quantization + LoRA) |
	| LoRA Rank | 16 |
	| LoRA Alpha | 32 |
	| Training Data | [scthornton/securecode](https://huggingface.co/datasets/scthornton/securecode) (2,185 examples) |
	| Training Time | ~1h 40min |
	| Hardware | 2x NVIDIA A100 40GB (GCP) |
	| Framework | PEFT 0.18.1, Transformers 5.1.0, PyTorch 2.7.1 |
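The LoRA rank and alpha above determine how the adapter modifies each targeted weight matrix: the frozen weight W is augmented with a low-rank update B·A scaled by alpha / r, which here is 32 / 16 = 2 (PEFT's default scaling). The pure-Python sketch below illustrates that forward pass and the per-matrix parameter count. The 6144 × 6144 layer shape is an assumption chosen for illustration; this card does not state which modules were targeted or their exact shapes.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer:
#   h = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) are trained.


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in matrix."""
    return r * (d_in + d_out)


# Values from the Model Details table.
RANK, ALPHA = 16, 32
print("scaling:", ALPHA / RANK)  # 2.0

# Hypothetical 6144 x 6144 projection (illustrative shape only).
d = 6144
full = d * d
added = lora_params(d, d, RANK)
print(f"full matrix: {full:,} params; LoRA adds {added:,} ({added / full:.2%})")

# Toy check: with B initialized to zeros (the usual LoRA init),
# the adapted layer reproduces the frozen layer exactly.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]   # r = 1, d_in = 2
B = [[0.0], [0.0]]  # d_out = 2, r = 1
print(lora_forward(W, A, B, [1.0, 1.0], alpha=2, r=1))  # [3.0, 7.0]
```

Because only A and B are trained, each adapted 6144 × 6144 matrix in this hypothetical gains about 0.5% extra parameters, which is what keeps the adapter small relative to the 15B base model.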

	---

	## Quick Start

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from peft import PeftModel

	# Load the base model in 4-bit (requires the bitsandbytes package)
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_compute_dtype=torch.bfloat16,
	)
	base_model = AutoModelForCausalLM.from_pretrained(
	    "bigcode/starcoder2-15b-instruct-v0.1",
	    device_map="auto",
	    quantization_config=bnb_config,
	)

	# Attach the SecureCode LoRA adapter
	model = PeftModel.from_pretrained(base_model, "scthornton/starcoder2-15b-securecode")
	tokenizer = AutoTokenizer.from_pretrained("scthornton/starcoder2-15b-securecode")

	# Generate secure code
	prompt = "Write a secure JWT authentication handler in Python with proper token validation"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    do_sample=True,  # temperature is ignored unless sampling is enabled
	    temperature=0.7,
	)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
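Loading the base model in 4-bit shrinks its weight footprint roughly fourfold compared with bf16. The back-of-the-envelope arithmetic below counts weight storage only, using the nominal 15B parameter count from the Model Details table; it ignores activations, the KV cache, framework overhead, and the quantization constants bitsandbytes stores alongside the weights. Treat the results as rough lower bounds, not measurements.

```python
# Rough weight-only memory estimate for a 15B-parameter model.
# Ignores activations, KV cache, framework overhead, and quantization
# metadata, so actual GPU memory use will be higher.

PARAMS = 15e9  # nominal parameter count from the Model Details table

BITS_PER_PARAM = {
    "fp32": 32,
    "bf16/fp16": 16,
    "int8": 8,
    "4-bit (QLoRA-style)": 4,
}


def weight_gb(params: float, bits: int) -> float:
    """Storage for the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for fmt, bits in BITS_PER_PARAM.items():
    print(f"{fmt:>20}: ~{weight_gb(PARAMS, bits):.1f} GB")
```

Roughly 7.5 GB of 4-bit weights versus roughly 30 GB in bf16 is the main reason the Quick Start loads the base model in 4-bit.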

	---

	## Training Details

	| Hyperparameter | Value |
	|----------------|-------|
	| Learning Rate | 2e-4 |
	| Batch Size | 1 |
	| Gradient Accumulation | 16 |
	| Epochs | 3 |
	| Scheduler | Cosine |
	| Warmup Steps | 100 |
	| Optimizer | paged_adamw_8bit |
	| Max Sequence Length | 2048 |
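Together, these hyperparameters fix the number of optimizer steps and the shape of the learning-rate curve. The sketch below computes both under stated assumptions: each example is one training sequence (no packing), a partial final accumulation step still counts as a step, and the scheduler behaves like Hugging Face Transformers' `get_cosine_schedule_with_warmup` (linear warmup to the peak rate, then a half-cosine decay to zero). Whether the two A100s ran data-parallel, which would double the effective batch, is not stated in this card, so the device count is a parameter.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, num_devices, epochs):
    """Optimizer updates, counting a partial final step in each epoch."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_examples / effective_batch) * epochs


def cosine_with_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup, then half-cosine decay to 0 (the same curve as
    transformers.get_cosine_schedule_with_warmup with num_cycles=0.5)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the Training Details table.
N, BATCH, ACCUM, EPOCHS = 2185, 1, 16, 3
PEAK_LR, WARMUP = 2e-4, 100

for devices in (1, 2):
    total = optimizer_steps(N, BATCH, ACCUM, devices, EPOCHS)
    print(f"{devices} device(s): effective batch {BATCH * ACCUM * devices}, "
          f"{total} steps, warmup = {WARMUP / total:.0%} of training")

# Sample the schedule at a few points for the single-device case.
total = optimizer_steps(N, BATCH, ACCUM, 1, EPOCHS)
for step in (0, 50, 100, (100 + total) // 2, total):
    print(step, f"{cosine_with_warmup_lr(step, PEAK_LR, WARMUP, total):.2e}")
```

On a single device the 100 warmup steps cover about a quarter of training; under data parallelism across both GPUs they would cover nearly half, which is worth keeping in mind when reproducing the run.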

	### Dataset Breakdown

	| Component | Examples | Coverage |
	|-----------|----------|----------|
	| Web Security (OWASP Top 10:2021) | 1,378 | 12 languages, 9 frameworks |
	| AI/ML Security (OWASP LLM Top 10:2025) | 750 | Prompt injection, RAG poisoning, model theft |
	| Framework-Specific Additions | 219 | Django, Flask, Express, Spring Boot, etc. |
	| Total | 2,185 | Complete OWASP coverage |

	---

	## SecureCode Model Collection

	| Model | Parameters | Base | Training Time | Link |
	|-------|------------|------|---------------|------|
	| Llama 3.2 3B | 3B | Meta Llama 3.2 | 1h 5min | [scthornton/llama-3.2-3b-securecode](https://huggingface.co/scthornton/llama-3.2-3b-securecode) |
	| Qwen Coder 7B | 7B | Qwen 2.5 Coder | 1h 24min | [scthornton/qwen-coder-7b-securecode](https://huggingface.co/scthornton/qwen-coder-7b-securecode) |
	| CodeGemma 7B | 7B | Google CodeGemma | 1h 27min | [scthornton/codegemma-7b-securecode](https://huggingface.co/scthornton/codegemma-7b-securecode) |
	| DeepSeek Coder 6.7B | 6.7B | DeepSeek Coder | 1h 15min | [scthornton/deepseek-coder-6.7b-securecode](https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode) |
	| CodeLlama 13B | 13B | Meta CodeLlama | 1h 32min | [scthornton/codellama-13b-securecode](https://huggingface.co/scthornton/codellama-13b-securecode) |
	| Qwen Coder 14B | 14B | Qwen 2.5 Coder | 1h 19min | [scthornton/qwen2.5-coder-14b-securecode](https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode) |
	| StarCoder2 15B | 15B | BigCode StarCoder2 | 1h 40min | This model |
	| Granite 20B | 20B | IBM Granite Code | 1h 19min | [scthornton/granite-20b-code-securecode](https://huggingface.co/scthornton/granite-20b-code-securecode) |

	---

	## Citation

	```bibtex
	@misc{thornton2025securecode,
	  title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},
	  author={Thornton, Scott},
	  year={2025},
	  publisher={perfecXion.ai},
	  url={https://perfecxion.ai/articles/securecode-v2-dataset-paper.html},
	  note={Model: https://huggingface.co/scthornton/starcoder2-15b-securecode}
	}
	```

	---

	## Links

	- Dataset: [scthornton/securecode](https://huggingface.co/datasets/scthornton/securecode) (2,185 examples)
	- Paper: [SecureCode v2.0](https://huggingface.co/papers/2512.18542)
	- Model Collection: [SecureCode Models](https://huggingface.co/collections/scthornton/securecode) (8 models)
	- Blog Post: [Training Security-Aware Code Models](https://huggingface.co/blog/scthornton/securecode-models)
	- Publisher: [perfecXion.ai](https://perfecxion.ai)

	---

	## License

	BigCode OpenRAIL-M