Customer Support LLM (LoRA adapter)

LoRA adapter for meta-llama/Llama-3.2-3B-Instruct, fine-tuned on the Bitext customer support dataset using MLX-LM on Apple Silicon.

Source code, training scripts, evaluation pipeline: https://github.com/batuhne/customer-support-llm

Intended use

Drop-in adapter for English customer support chat over the 27 Bitext intents (cancel_order, track_order, check_invoice, recover_password, etc.). Outputs use {{Placeholder}} tokens that a downstream system is expected to substitute with real values.
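
As a rough illustration of the substitution step a downstream system would perform, the sketch below fills {{Placeholder}} tokens from a dictionary and leaves unknown ones untouched. The function name fill_placeholders, the placeholder names, and the sample reply are hypothetical and not taken from the repo.

```python
import re

# Matches tokens such as {{Order Number}}; the inner text is captured.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace known {{Placeholder}} tokens; leave unknown ones as they are."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original token when no value is available.
        return values.get(name, match.group(0))

    return PLACEHOLDER_RE.sub(_sub, text)


# Hypothetical model output and lookup values.
reply = "Your order {{Order Number}} can be cancelled at {{Website URL}}."
filled = fill_placeholders(reply, {"Order Number": "12345"})
print(filled)
```

Leaving unresolved tokens in place, rather than deleting them, makes missing values easy to spot before a reply is sent.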

Not for production use as is. Trained on a single public dataset, no RLHF, no safety tuning beyond the base model.

Results

Held-out test set, 297 stratified examples across all 27 intents.

Metric	Value
ROUGE-L	0.410
SacreBLEU	28.67
BERTScore F1	0.915
Placeholder score	0.643
Placeholder F1 (overall)	0.681
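
One plausible way to compute a per-example placeholder F1 is to compare the set of {{Placeholder}} names in the model output with the set in the reference reply. The sketch below assumes that set-based definition; the evaluation pipeline in the source repo may define the metric (and the separate placeholder score) differently, and the function names here are hypothetical.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def placeholder_set(text: str) -> set[str]:
    """Collect the distinct placeholder names that appear in a text."""
    return set(PLACEHOLDER_RE.findall(text))


def placeholder_f1(prediction: str, reference: str) -> float:
    """Set-based F1 between predicted and reference placeholder names."""
    pred, ref = placeholder_set(prediction), placeholder_set(reference)
    if not pred and not ref:
        # Assumption: nothing expected and nothing produced counts as perfect.
        return 1.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two placeholders matches.
reference = "Open {{Website URL}} and enter {{Order Number}}."
prediction = "Enter {{Order Number}} or call {{Customer Support Phone Number}}."
score = placeholder_f1(prediction, reference)
print(score)
```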

Comparison against earlier runs:

Run	ROUGE-L	SacreBLEU	BERTScore F1	Placeholder score	n
2026-03-30 baseline	0.333	20.32	0.900	0.300	300
2026-05-12 postprocess fixes only	0.336	20.40	0.903	0.624	298
2026-05-12 retrain + all fixes	0.410	28.67	0.915	0.643	297

The jump in placeholder score (0.300 to 0.624) came from rebuilding INTENT_PLACEHOLDERS from the training data and adding data-driven footer injection. Retraining on top of those fixes raised ROUGE-L from 0.336 to 0.410 and SacreBLEU from 20.40 to 28.67.

Training config

Base: meta-llama/Llama-3.2-3B-Instruct
Method: LoRA via mlx_lm lora
Rank 16, alpha 32, dropout 0.05, scale 4.0
Target projections: q, k, v, o, gate, up, down (all 7)
num_layers: 16
Batch size 1, grad accumulation 8 (effective batch 8)
Learning rate 2e-4, 5000 iterations
Max sequence length 1024
--mask-prompt, --grad-checkpoint
Final train loss ~0.6, final val loss 0.630
~3.5 hours on M3 16 GB, peak memory 7.9 GB
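
For a sense of scale, the settings above work out to roughly 13.9M trainable parameters. The sketch below derives that figure under three assumptions not stated in this card: the published Llama-3.2-3B dimensions (hidden size 3072, MLP size 8192, 8 key/value heads with head dimension 128), that num_layers: 16 means adapters on 16 transformer blocks, and that each adapted projection adds rank × (in + out) weights.

```python
# Assumed base-model dimensions (from the public Llama-3.2-3B config, not this card).
HIDDEN = 3072
MLP = 8192
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128

RANK = 16
ADAPTED_LAYERS = 16  # assumption: num_layers counts adapted transformer blocks

# (input dim, output dim) for each of the 7 target projections.
PROJECTIONS = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, MLP),
    "up": (HIDDEN, MLP),
    "down": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA pair holds a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in PROJECTIONS.values())
total = per_layer * ADAPTED_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```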

Usage (MLX)

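from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# mlx_lm resolves adapter_path as a local directory, so fetch the adapter
# files from the Hub first; snapshot_download returns the local cache path.
adapter_dir = snapshot_download("batuhne/customer-support-llm")

model, tokenizer = load(
    "meta-llama/Llama-3.2-3B-Instruct",
    adapter_path=adapter_dir,
)

# Build the prompt with the base model's chat template.
messages = [
    {"role": "system", "content": "You are a customer support agent."},
    {"role": "user", "content": "I need to cancel order 12345."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

# temp=0.0 selects greedy decoding.
text = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.0),
)
print(text)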

Greedy decoding (temp=0.0) is recommended for placeholder fidelity.

For best results, pair this with the postprocessing pipeline in the source repo, which strips out-of-intent placeholders and injects missing core placeholders.
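
The two postprocessing steps named above (strip out-of-intent placeholders, inject missing core ones) could look roughly like the sketch below. It is not the repo's implementation: the contents of INTENT_PLACEHOLDERS, the "allowed"/"core" split, the footer wording, and the function name postprocess are all hypothetical.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")

# Hypothetical mapping; the real INTENT_PLACEHOLDERS is built from training data.
INTENT_PLACEHOLDERS = {
    "cancel_order": {
        "allowed": {"Order Number", "Customer Support Phone Number"},
        "core": {"Order Number"},
    },
}


def postprocess(text: str, intent: str) -> str:
    spec = INTENT_PLACEHOLDERS[intent]

    # 1. Strip placeholders that are not allowed for this intent.
    def _keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1) in spec["allowed"] else ""

    cleaned = PLACEHOLDER_RE.sub(_keep_or_drop, text)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned).strip()

    # 2. Inject core placeholders that are still missing, as a footer line.
    present = set(PLACEHOLDER_RE.findall(cleaned))
    missing = sorted(spec["core"] - present)
    if missing:
        footer = "Reference: " + ", ".join("{{" + name + "}}" for name in missing)
        cleaned = cleaned + "\n" + footer
    return cleaned


# Hypothetical raw output with an out-of-intent placeholder and no core one.
raw = "I can cancel that order for you. {{Account Type}}"
result = postprocess(raw, "cancel_order")
print(result)
```

Appending missing placeholders as a footer keeps the generated sentence intact, at the cost of a less natural reply than weaving the value into the text.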

Limitations

English only
Trained on Bitext templates; phrasing skews formal/synthetic
Placeholder coverage is uneven across intents. Seven reach F1 >= 0.95 (cancel_order, change_order, contact_human_agent, check_cancellation_fee, check_payment_methods, review, change_shipping_address), while seven are still at F1 0.0 (payment_issue, place_order, complaint, registration_problems, set_up_shipping_address, check_refund_policy, newsletter_subscription)
Apple Silicon (MLX) only as published; convert with mlx_lm.convert or merge into the base model for other backends

License

Llama 3.2 community license (inherited from base model).
