PRIMERA-BillSum-arXiv-PubMed (3-Stage Chain LoRA, bf16)
A LoRA adapter for allenai/PRIMERA trained via 3-stage sequential chain fine-tuning: BillSum → arXiv → PubMed.
Model Details
- Base model: allenai/PRIMERA
- Method: LoRA (Low-Rank Adaptation), bf16 precision (no quantization)
- Chain order: BillSum → arXiv → PubMed
- Language: English
Note: Earlier documentation described this adapter as "QLoRA". 4-bit quantization produced NaN gradients with the in-place attention operations in LED/Longformer, so quantization was disabled. All training used standard LoRA in bf16.
Training Stages
| Stage | Dataset |
|---|---|
| 1 | BillSum |
| 2 | arXiv |
| 3 | PubMed |
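The chain can be read as one adapter carried through three training runs, each starting from the weights the previous stage produced. The sketch below shows that control flow only. It assumes the same adapter is resumed at every stage (for example with `PeftModel.from_pretrained(..., is_trainable=True)`), which this card does not confirm. `run_chain`, the `train_stage` callback, and the output directory names are hypothetical.

```python
from typing import Callable, List, Optional

# Stage order taken from the table above.
CHAIN = ["BillSum", "arXiv", "PubMed"]


def run_chain(
    stages: List[str],
    train_stage: Callable[[str, Optional[str], str], None],
    out_root: str = "adapters",
) -> List[str]:
    """Train one stage at a time, handing each stage the previous checkpoint.

    train_stage(dataset_name, init_adapter_dir, output_dir) is a hypothetical
    callback: it should load the base model, attach a fresh LoRA adapter when
    init_adapter_dir is None (or resume the adapter stored there), train on
    dataset_name, and save the adapter to output_dir.
    """
    checkpoints: List[str] = []
    previous: Optional[str] = None
    for index, dataset_name in enumerate(stages, start=1):
        output_dir = f"{out_root}/stage{index}-{dataset_name.lower()}"
        train_stage(dataset_name, previous, output_dir)
        checkpoints.append(output_dir)
        previous = output_dir  # the next stage starts from this adapter
    return checkpoints
```

Passing the training step in as a callback keeps the ordering logic separate from the model code, so the chain order can be changed without touching the trainer.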
Hyperparameters
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Precision: bf16 (no quantization)
- Target modules (LED/Longformer):
  - Encoder: query, key, value, query_global, key_global, value_global, output
  - Decoder: q_proj, k_proj, v_proj, out_proj
  - Feed-forward: fc1, fc2
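For readers who want to reproduce a comparable setup, the values above map onto a PEFT `LoraConfig` roughly as follows. This is a minimal sketch, not the author's training script: `task_type` is an assumption (it is not stated on this card), and any setting not listed above is left at the PEFT default.

```python
# Values copied from the hyperparameter list above.
TARGET_MODULES = [
    # Encoder self-attention (local and global projections) and its output layer
    "query", "key", "value", "query_global", "key_global", "value_global", "output",
    # Decoder attention projections
    "q_proj", "k_proj", "v_proj", "out_proj",
    # Feed-forward layers
    "fc1", "fc2",
]

LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": TARGET_MODULES,
}

# With standard LoRA the low-rank update is scaled by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft.LoraConfig; peft is imported lazily so the constants load without it."""
    from peft import LoraConfig, TaskType

    # Assumption: SEQ_2_SEQ_LM is chosen here because PRIMERA is an
    # encoder-decoder summarizer; the card does not state the task type.
    return LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, **LORA_KWARGS)
```

With rank 16 and alpha 32, the update is scaled by a factor of 2.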
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

tokenizer = AutoTokenizer.from_pretrained("xNoper/primera-billsum-arxiv-pubmed")
base = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/PRIMERA", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "xNoper/primera-billsum-arxiv-pubmed")
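Once the adapter is loaded, generating a summary might look like the sketch below. LED-style models accept a `global_attention_mask` in addition to the usual attention mask; here global attention is placed on the first token and on any `<doc-sep>` token, following the convention PRIMERA uses for multi-document input. The helper names, the 4096-token input limit, and the 256-token output budget are assumptions, not settings taken from this card.

```python
from typing import List, Optional


def global_attention_rows(
    input_ids: List[List[int]], docsep_token_id: Optional[int] = None
) -> List[List[int]]:
    """Return a 0/1 mask: 1 on the first token and on every document separator."""
    return [
        [1 if (pos == 0 or tok == docsep_token_id) else 0 for pos, tok in enumerate(row)]
        for row in input_ids
    ]


def summarize(model, tokenizer, text: str,
              max_input_tokens: int = 4096, max_new_tokens: int = 256) -> str:
    """Summarize one input string with the loaded adapter (hypothetical helper)."""
    import torch  # imported lazily so the mask helper above works without torch

    device = next(model.parameters()).device
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    ).to(device)

    # Assumption: the tokenizer defines "<doc-sep>". If it does not, this call
    # returns the unknown-token id, and None should be passed instead.
    docsep_id = tokenizer.convert_tokens_to_ids("<doc-sep>")
    global_mask = torch.tensor(
        global_attention_rows(inputs["input_ids"].tolist(), docsep_id), device=device
    )

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_mask,
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the `model` and `tokenizer` from the snippet above, a call would be `summary = summarize(model, tokenizer, article_text)`, where `article_text` is the document to summarize.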
Citation
If you use this model, please also cite the underlying base model:
@inproceedings{xiao-etal-2022-primera,
  title = "{PRIMERA}: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization",
  author = "Xiao, Wen and Beltagy, Iz and Carenini, Giuseppe and Cohan, Arman",
  booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
  year = "2022",
}