PaperAudit Qwen3 8B (SFT + RL)

Model Overview

PaperAudit_Qwen3_8B_sft_rl is a medium-scale model trained specifically for academic paper error detection and automated review. It is based on Qwen3 8B and optimized through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (RLHF).

Model Information

  • Base Model: Qwen3 8B
  • Model Parameters: ~8 billion parameters
  • Training Method: Supervised Fine-Tuning (SFT) + Reinforcement Learning from Human Feedback (RLHF)
  • Model Architecture: Qwen3ForCausalLM
  • Context Length: 40,960 tokens
  • Data Type: bfloat16

Model Features

  • Balanced Performance: The 8B parameter scale strikes a good balance between capability and inference cost
  • Specialized Optimization: Trained specifically for academic paper error detection and review tasks
  • Reinforcement Learning: Aligned with human preferences via RLHF to improve review quality and error-detection accuracy
  • Long Context Support: A 40,960-token context window makes it suitable for processing complete academic papers

Training Data

This model was trained on the PaperAudit_Dataset, which includes:

  • Academic papers downloaded from OpenReview
  • Structured paper content (processed via LlamaParse and LLM)
  • Synthetic error data for training error detection models
  • Human review feedback data

For more details about the dataset, please visit: https://huggingface.co/datasets/mayiwen/PaperAudit_Dataset
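The dataset card above does not specify how the synthetic errors were produced, but such data is typically generated by perturbing a clean passage and recording the edit as a label. A minimal, hypothetical sketch of that idea (the `inject_numeric_error` helper and label schema are illustrative only, not the dataset's actual pipeline):

```python
import random
import re

def inject_numeric_error(text: str, rng: random.Random) -> tuple[str, dict]:
    """Corrupt one number in `text` and return (corrupted_text, error_label).

    Illustrative only: real pipelines cover many error types (citations,
    logic, methodology), not just numeric perturbations.
    """
    numbers = list(re.finditer(r"\d+", text))
    if not numbers:
        return text, {"has_error": False}
    m = rng.choice(numbers)
    original = m.group()
    corrupted_value = str(int(original) + rng.choice([1, 2, 5, 10]))
    corrupted = text[:m.start()] + corrupted_value + text[m.end():]
    label = {
        "has_error": True,
        "error_type": "numeric_inconsistency",
        "span": (m.start(), m.start() + len(corrupted_value)),
        "original": original,
        "corrected": original,  # the correct fix restores the original value
    }
    return corrupted, label

rng = random.Random(0)
text = "The model was trained for 3 epochs on 120000 examples."
corrupted, label = inject_numeric_error(text, rng)
```

Pairing each corrupted passage with its label gives the detector both a target span and a ground-truth correction to learn from.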

Usage

Install Dependencies

pip install transformers torch accelerate

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "mayiwen/PaperAudit_Qwen3_8B_sft_rl"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Inference Example

# Prepare input (paper error detection task)
prompt = """Please detect errors in the following academic paper paragraph:

[Paper content...]

Please identify errors and provide correction suggestions."""

# Encode input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens (the output sequence includes the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
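The card does not define an output schema, but downstream tooling usually parses the free-text review into structured findings. A hedged sketch assuming a hypothetical "Error: … / Suggestion: …" line format (adapt the patterns to whatever format your prompt actually elicits from the model):

```python
def parse_review(review: str) -> list[dict]:
    """Parse 'Error:'/'Suggestion:' line pairs into structured findings.

    The line format here is an assumption; the model's actual output
    style depends on the prompt and is not specified by the model card.
    """
    findings = []
    current = None
    for line in review.splitlines():
        line = line.strip()
        if line.lower().startswith("error:"):
            current = {"error": line[len("error:"):].strip(), "suggestion": ""}
            findings.append(current)
        elif line.lower().startswith("suggestion:") and current is not None:
            current["suggestion"] = line[len("suggestion:"):].strip()
    return findings

sample = """Error: Equation (3) drops the regularization term.
Suggestion: Re-derive the loss with the L2 term included.
Error: Table 2 reports 95.2% but the text says 95.8%.
Suggestion: Reconcile the reported accuracy."""
findings = parse_review(sample)
```

Structured findings like these are easier to aggregate across chunks of a long paper than raw review text.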

Application Scenarios

  • Academic paper error detection
  • Automated paper review
  • Academic writing quality assessment
  • Paper content analysis and feedback generation
  • Academic review assistant tools

Model Architecture Details

  • Hidden Size: 4096
  • Intermediate Size: 12288
  • Number of Attention Heads: 32
  • Number of Key-Value Heads: 8 (Grouped Query Attention)
  • Number of Hidden Layers: 36
  • Vocabulary Size: 151,936
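These hyperparameters are enough to sanity-check the ~8B parameter count and the memory note below. A rough back-of-the-envelope estimate (assumes untied input/output embeddings and ignores biases and norm weights):

```python
# Architecture figures from the model card
hidden, inter, layers = 4096, 12288, 36
heads, kv_heads, vocab = 32, 8, 151_936
head_dim = hidden // heads  # 128

# Attention: Q and O projections are hidden x hidden; K and V shrink to
# kv_heads * head_dim columns under Grouped Query Attention
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)

# Gated MLP: gate, up, and down projections
mlp = 3 * hidden * inter

# Untied embedding matrix plus LM head (assumption)
embed = 2 * vocab * hidden

total = layers * (attn + mlp) + embed
bf16_gib = total * 2 / 1024**3  # bfloat16 = 2 bytes per parameter
```

The estimate lands near 8.2B parameters, i.e. roughly 15–16 GiB of weights in bfloat16, which is consistent with the GPU memory guidance in the Notes section.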

Performance Advantages

Compared to the 3B variant, the 8B model performs better on complex paper-analysis tasks. It can:

  • More accurately identify subtle academic errors
  • Provide more detailed and professional review comments
  • Better understand academic writing norms and standards

Notes

  • This model is specifically optimized for academic paper review tasks and may require further fine-tuning for other domains
  • It is recommended to use bfloat16 precision to save memory and improve inference speed
  • For long document processing, appropriate context window management strategies are recommended
  • Inference in bfloat16 requires roughly 16GB of GPU memory for the weights alone (8B parameters × 2 bytes), plus overhead for activations and the KV cache
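One simple context-window management strategy for papers that exceed the 40K-token window is to review the text in overlapping chunks and merge the findings. A minimal sketch using whitespace word counts as a crude token proxy (a real pipeline would count tokens with the model's tokenizer):

```python
def chunk_paragraphs(paragraphs: list[str], max_words: int, overlap: int = 1) -> list[str]:
    """Greedily pack paragraphs into chunks of at most `max_words` words,
    repeating the last `overlap` paragraphs so context spans chunk borders."""
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]  # carry overlap forward
            count = sum(len(p.split()) for p in current)
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

paras = [f"Paragraph {i} " + "word " * 50 for i in range(6)]
chunks = chunk_paragraphs(paras, max_words=120)
```

Each chunk can then be sent through the inference example above, with the per-chunk findings merged afterwards; the one-paragraph overlap reduces the chance of missing errors that straddle a chunk boundary.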

Related Resources

  • Training Dataset: PaperAudit_Dataset
  • PaperAudit Project: For more details, please refer to the PaperAudit project documentation

License

Please refer to the license terms of the base model Qwen3.
