Multiple answers for a context

Hey
this is a very common with extractive QA.

RoBERTa (fine-tuned for extractive QA like SQuAD) is fundamentally trained to return one contiguous span per (question, context). So if your real intent is “given this context, return all matching spans (endorsement/code pairs)”, you’re trying to force a single-span head to do a multi-span extraction job — it will usually look “confused” because the supervision is contradictory (same input → different single-span labels).

There are basically 3 options:

  1. If you only need one answer at a time → make the question unambiguous

    • Example questions:
      • “What is the code for Endorsement 1?” → answer “code1”
      • “What is the code for Endorsement 2?” → answer “code2”
    • Same context is fine; the question must disambiguate.
  2. If you truly need multiple answers from one question → treat it as extraction / tagging, not classic QA

    • Best fit: token classification / sequence labeling (BIO tags) to mark all spans in the context, then collect the tagged spans.
    • This is the standard approach for “multi-span QA” in practice.
  3. If you want the model to output a list/JSON of all pairs → use a generative (seq2seq) model

    • e.g., T5/BART style: prompt like “Extract all endorsement-code pairs as JSON.”
    • This avoids the “single span” limitation entirely.

Small note:
“multiple answers” can also mean “multiple acceptable ground-truth strings for the same answer” (synonyms / aliases). That is supported by the SQuAD-style answers = { "text": [...], "answer_start": [...] } format — but the model still predicts one span; the multiple answers are mainly for evaluation and robustness, not for returning all spans.

Links:

If you share what your question looks like (and whether you need “all pairs” vs “one specific pair”), I can suggest what 1 of the 3 approaches above is best,
hope this helps, Liam