PRIMERA-BillSum-arXiv-PubMed (3-Stage Chain LoRA, bf16)

A LoRA adapter for allenai/PRIMERA trained via 3-stage sequential chain fine-tuning: BillSum → arXiv → PubMed.

Model Details

  • Base model: allenai/PRIMERA
  • Method: LoRA (Low-Rank Adaptation), bf16 precision (no quantization)
  • Chain order: BillSum → arXiv → PubMed
  • Language: English

Note: Earlier documentation described this adapter as "QLoRA". 4-bit quantization produced NaN gradients with the in-place attention operations in LED/Longformer, so quantization was disabled. Training used standard LoRA in bf16.

Training Stages

Stage  Dataset
1      BillSum
2      arXiv
3      PubMed
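
The chain order above can be sketched as a loop in which each stage starts from the adapter produced by the previous one. This is a minimal illustration, not the author's training script: `run_chain` and the `train_stage` callback are hypothetical names, and the assumption that each stage resumes from the previous stage's adapter weights (rather than merging or re-initializing) is not confirmed by this card.

```python
# Stage order as listed in the table above.
CHAIN = ["BillSum", "arXiv", "PubMed"]


def run_chain(stages, train_stage, initial_adapter=None):
    """Run stages in order, feeding each stage the adapter from the previous one.

    `train_stage(dataset_name, previous_adapter)` is a hypothetical callback
    that fine-tunes on one dataset and returns the resulting adapter
    (for example, a checkpoint path).
    """
    adapter = initial_adapter
    history = []
    for dataset_name in stages:
        adapter = train_stage(dataset_name, adapter)
        history.append(adapter)
    return history
```

With a real training function plugged in as `train_stage`, the last element of the returned list would correspond to the final BillSum → arXiv → PubMed adapter.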

Hyperparameters

  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Precision: bf16 (no quantization)
  • Target modules (LED/Longformer):
    • Encoder: query, key, value, query_global, key_global, value_global, output
    • Decoder: q_proj, k_proj, v_proj, out_proj
    • Feed-forward: fc1, fc2
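
To make the numbers above concrete, the following sketch collects the listed hyperparameters into keyword arguments of the kind `peft.LoraConfig` accepts and works out two derived quantities: the standard LoRA scaling factor `alpha / r`, and the number of trainable parameters LoRA adds to a single linear layer. The hidden size of 1024 is an assumption used only for the arithmetic; check `model.config.d_model` on the loaded base model. The helper names are illustrative.

```python
# Hyperparameters copied from the list above.
LORA_R = 16
LORA_ALPHA = 32
LORA_DROPOUT = 0.05
TARGET_MODULES = [
    # Encoder (LED/Longformer attention)
    "query", "key", "value",
    "query_global", "key_global", "value_global", "output",
    # Decoder attention
    "q_proj", "k_proj", "v_proj", "out_proj",
    # Feed-forward
    "fc1", "fc2",
]

# Keyword arguments in the shape peft.LoraConfig accepts.
lora_kwargs = dict(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=TARGET_MODULES,
)


def lora_scaling(r, alpha):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_params_per_linear(d_in, d_out, r):
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


ASSUMED_HIDDEN_SIZE = 1024  # assumption, not stated in this card

scaling = lora_scaling(LORA_R, LORA_ALPHA)
per_layer = lora_params_per_linear(ASSUMED_HIDDEN_SIZE, ASSUMED_HIDDEN_SIZE, LORA_R)
```

Under these assumptions the update is scaled by 2.0, and each adapted square projection gains 32,768 trainable parameters.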

Usage

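from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

# Load the tokenizer from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("xNoper/primera-billsum-arxiv-pubmed")
# Load the base model in bf16, matching the training precision.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/PRIMERA", torch_dtype=torch.bfloat16
)
# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(base, "xNoper/primera-billsum-arxiv-pubmed")
model.eval()  # inference mode (disables LoRA dropout)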
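
Once the adapter is loaded, a summary can be produced with the usual `generate` call. The helper below is a minimal sketch, not the author's inference code: `summarize` is a hypothetical name, and the 4096-token input limit, beam count, and output length are illustrative assumptions rather than settings documented for this adapter. It also assumes the model and the tokenized inputs are on the same device.

```python
def summarize(model, tokenizer, text, max_input_tokens=4096, max_new_tokens=256):
    """Tokenize `text`, generate a summary, and decode it to a string.

    All generation settings here are illustrative defaults (assumptions).
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
    )
    # generate returns a batch; decode the first (only) sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects from the snippet above, this would be called as `summarize(model, tokenizer, document_text)`, where `document_text` is the string to summarize.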

Citation

If you use this model, please also cite the underlying base model:

@inproceedings{xiao-etal-2022-primera,
    title = "{PRIMERA}: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization",
    author = "Xiao, Wen and Beltagy, Iz and Carenini, Giuseppe and Cohan, Arman",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022",
}