Customer Support LLM (LoRA adapter)

LoRA adapter for meta-llama/Llama-3.2-3B-Instruct, fine-tuned on the Bitext customer support dataset using MLX-LM on Apple Silicon.

Source code, training scripts, evaluation pipeline: https://github.com/batuhne/customer-support-llm

Intended use

Drop-in adapter for English customer support chat over the 27 Bitext intents (cancel_order, track_order, check_invoice, recover_password, etc.). Outputs use {{Placeholder}} tokens that a downstream system is expected to substitute with real values.
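
A minimal sketch of the downstream substitution step, assuming placeholders follow the {{Name}} pattern described above. The function name fill_placeholders, the example placeholder names, and the choice to leave unknown placeholders untouched are illustrative assumptions, not code from the source repo.

```python
import re

# Matches {{Order Number}}-style tokens; inner whitespace is tolerated.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace each {{Name}} with values[Name]; leave unknown names as-is."""

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        # Keeping unresolved tokens visible makes gaps easy to detect later.
        return values.get(name, match.group(0))

    return PLACEHOLDER_RE.sub(_sub, text)


# Hypothetical model output and lookup values.
reply = "Your order {{Order Number}} was cancelled. Contact us at {{Customer Support Phone Number}}."
filled = fill_placeholders(reply, {"Order Number": "12345"})
print(filled)
```

Leaving unresolved tokens in place, rather than deleting them, lets a later check flag responses that still contain {{...}} before they reach a customer.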

Not for production use as is: the adapter was trained on a single public dataset, with no RLHF and no safety tuning beyond what the base model already has.

Results

Held-out test set, 297 stratified examples across all 27 intents.

Metric                    Value
ROUGE-L                   0.410
SacreBLEU                 28.67
BERTScore F1              0.915
Placeholder score         0.643
Placeholder F1 (overall)  0.681
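
This card does not spell out how the placeholder metrics are computed. As one plausible reading, the sketch below computes a set-based F1 over placeholder names in a prediction and its reference. The function names and the convention of scoring 1.0 when neither side contains placeholders are assumptions; the evaluation pipeline in the source repo may define the metric differently.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def placeholder_set(text: str) -> set[str]:
    """Collect the distinct placeholder names that appear in a response."""
    return set(PLACEHOLDER_RE.findall(text))


def placeholder_f1(prediction: str, reference: str) -> float:
    """Set-based F1 between predicted and reference placeholder names."""
    pred, ref = placeholder_set(prediction), placeholder_set(reference)
    if not pred and not ref:
        return 1.0  # assumption: nothing expected and nothing produced is a match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and prediction for one test example.
reference = "Open {{Order Number}} under {{Online Order Interaction}}."
prediction = "Go to {{Online Order Interaction}} and pick {{Order Number}} or call {{Phone}}."
score = placeholder_f1(prediction, reference)
print(round(score, 3))
```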

Comparison against earlier runs:

Run                                ROUGE-L  SacreBLEU  BERTScore F1  Placeholder score  n
2026-03-30 baseline                0.333    20.32      0.900         0.300              300
2026-05-12 postprocess fixes only  0.336    20.40      0.903         0.624              298
2026-05-12 retrain + all fixes     0.410    28.67      0.915         0.643              297

The big placeholder jump (0.300 to 0.624) came from rebuilding INTENT_PLACEHOLDERS from the training data and adding data-driven footer injection; the retrain on top of that raised ROUGE-L (0.336 to 0.410) and SacreBLEU (20.40 to 28.67).
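
One way to derive a per-intent placeholder map from training data is sketched below. Only the name INTENT_PLACEHOLDERS comes from this card; the record fields (intent, response), the min_share threshold, and the function name are hypothetical, and the actual rebuild in the source repo may work differently.

```python
import re
from collections import Counter, defaultdict

PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def build_intent_placeholders(records, min_share=0.5):
    """Map each intent to placeholders seen in at least min_share of its responses."""
    counts = defaultdict(Counter)   # intent -> placeholder -> number of responses
    totals = Counter()              # intent -> number of responses
    for rec in records:
        totals[rec["intent"]] += 1
        # Count each placeholder once per response, not once per occurrence.
        for name in set(PLACEHOLDER_RE.findall(rec["response"])):
            counts[rec["intent"]][name] += 1
    return {
        intent: sorted(n for n, c in counts[intent].items() if c / total >= min_share)
        for intent, total in totals.items()
    }


# Toy records standing in for the training split.
records = [
    {"intent": "cancel_order", "response": "Cancel {{Order Number}} in {{Online Order Interaction}}."},
    {"intent": "cancel_order", "response": "We cancelled {{Order Number}}."},
    {"intent": "track_order", "response": "Track it under {{Order Status}}."},
]
INTENT_PLACEHOLDERS = build_intent_placeholders(records)
print(INTENT_PLACEHOLDERS)
```

Raising min_share keeps only placeholders that appear in most responses for an intent, which is one way to separate core placeholders from incidental ones.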

Training config

  • Base: meta-llama/Llama-3.2-3B-Instruct
  • Method: LoRA via mlx_lm lora
  • Rank 16, alpha 32, dropout 0.05, scale 4.0
  • Target projections: q, k, v, o, gate, up, down (all 7)
  • num_layers: 16
  • Batch size 1, grad accumulation 8 (effective batch 8)
  • Learning rate 2e-4, 5000 iterations
  • Max sequence length 1024
  • --mask-prompt, --grad-checkpoint
  • Final train loss ~0.6, final val loss 0.630
  • ~3.5 hours on M3 16 GB, peak memory 7.9 GB
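
As a sanity check on adapter size, the trainable parameter count implied by these settings can be worked out directly. The sketch assumes the layer dimensions from the base model's published config (hidden size 3072, MLP size 8192, 8 key/value heads of dimension 128); those numbers are not stated in this card. It also assumes plain LoRA with no extra bias terms.

```python
# Assumed Llama-3.2-3B dimensions (from the base model's config, not this card).
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128          # 8 key/value heads x head_dim 128

RANK = 16
NUM_LAYERS = 16           # layers that receive adapters
BATCH_SIZE, GRAD_ACCUM = 1, 8

# (input_dim, output_dim) for each of the 7 target projections.
PROJECTIONS = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, INTERMEDIATE),
    "up": (HIDDEN, INTERMEDIATE),
    "down": (INTERMEDIATE, HIDDEN),
}

# A rank-r adapter on a d_in x d_out projection adds r * (d_in + d_out) weights.
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
effective_batch = BATCH_SIZE * GRAD_ACCUM

print(f"{per_layer:,} per layer, {trainable:,} total, effective batch {effective_batch}")
```

Under these assumptions the adapter has roughly 13.9 million trainable weights, which is consistent with training fitting comfortably on a 16 GB machine.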

Usage (MLX)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model and apply the LoRA adapter on top of it.
model, tokenizer = load(
    "meta-llama/Llama-3.2-3B-Instruct",
    adapter_path="batuhne/customer-support-llm",
)

# Format the conversation with the model's chat template.
messages = [
    {"role": "system", "content": "You are a customer support agent."},
    {"role": "user", "content": "I need to cancel order 12345."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

# temp=0.0 selects greedy decoding.
text = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.0),
)
print(text)

Greedy decoding (temp=0.0) is recommended for placeholder fidelity.

For best results, pair this with the postprocessing pipeline in the source repo, which strips out-of-intent placeholders and injects missing core placeholders.
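
The repo's pipeline is not reproduced here; the sketch below only illustrates the two steps just named. The ALLOWED and CORE tables, the postprocess function, and the idea of injecting missing placeholders through an appended footer sentence are all assumptions made for illustration.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")

# Hypothetical per-intent tables; real values would come from the training data.
ALLOWED = {"cancel_order": {"Order Number", "Online Order Interaction"}}
CORE = {"cancel_order": ["Order Number"]}


def postprocess(text: str, intent: str) -> str:
    """Drop placeholders not allowed for the intent, then append missing core ones."""
    allowed = ALLOWED.get(intent, set())

    def _strip(match: re.Match) -> str:
        # Keep allowed tokens verbatim; remove the rest entirely.
        return match.group(0) if match.group(1) in allowed else ""

    cleaned = PLACEHOLDER_RE.sub(_strip, text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # tidy gaps left by removals

    present = set(PLACEHOLDER_RE.findall(cleaned))
    missing = [name for name in CORE.get(intent, []) if name not in present]
    if missing:
        footer = ", ".join("{{" + name + "}}" for name in missing)
        cleaned = f"{cleaned} Reference: {footer}."
    return cleaned


raw = "Please open {{Online Order Interaction}} or email {{Company Email}} to cancel."
print(postprocess(raw, "cancel_order"))
```

Note that simply deleting a token can leave an awkward sentence behind ("or email to cancel" in this toy case), so a real pipeline would likely need more careful handling of the surrounding text.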

Limitations

  • English only
  • Trained on Bitext templates; phrasing skews formal/synthetic
  • Placeholder coverage is uneven across intents: cancel_order, change_order, contact_human_agent, check_cancellation_fee, check_payment_methods, review, change_shipping_address reach F1 >= 0.95, but payment_issue, place_order, complaint, registration_problems, set_up_shipping_address, check_refund_policy, newsletter_subscription are still at F1 0.0
  • Apple Silicon (MLX) only as published; convert with mlx_lm.convert or merge into the base model for other backends
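
For readers unfamiliar with what merging an adapter involves, the arithmetic is small enough to show on toy matrices. The sketch uses the textbook convention y = W x + scale * B (A x), with W of shape (out, in), A of shape (rank, in), and B of shape (out, rank). This layout is an assumption for illustration; the way MLX stores the adapter matrices may be transposed relative to it.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale):
    """Fold a LoRA update into a dense weight: W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy shapes: W is (out=2, in=2), A is (rank=1, in=2), B is (out=2, rank=1).
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0]]
B = [[0.5], [0.25]]
SCALE = 4.0  # the scale listed in the training config

merged = merge_lora(W, A, B, SCALE)
x = [[1.0], [1.0]]  # a single input column vector

# Output of the unmerged path: base projection plus the scaled adapter branch.
with_adapter = [
    [base[0] + SCALE * upd[0]]
    for base, upd in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
print(merged, matmul(merged, x), with_adapter)
```

The merged matrix gives the same output as the base-plus-adapter path, which is why a merged checkpoint can be served by backends that know nothing about LoRA.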

License

Llama 3.2 community license (inherited from base model).
