frreiss committed on
Commit e76b4fb · 1 Parent(s): 8f2dc00
Files changed (32)
  1. .gitattributes +3 -0
  2. .gitignore +3 -0
  3. answerability/README.md +186 -0
  4. answerability/gpt-oss-20b/lora/adapter_config.json +45 -0
  5. answerability/gpt-oss-20b/lora/adapter_model.safetensors +3 -0
  6. answerability/gpt-oss-20b/lora/io.yaml +27 -0
  7. citations/README.md +216 -0
  8. citations/gpt-oss-20b/lora/adapter_config.json +36 -0
  9. citations/gpt-oss-20b/lora/adapter_model.safetensors +3 -0
  10. citations/gpt-oss-20b/lora/chat_template.jinja +331 -0
  11. citations/gpt-oss-20b/lora/io.yaml +97 -0
  12. citations/gpt-oss-20b/lora/special_tokens_map.json +23 -0
  13. citations/gpt-oss-20b/lora/tokenizer.json +3 -0
  14. citations/gpt-oss-20b/lora/tokenizer_config.json +185 -0
  15. hallucination_detection/README.md +117 -0
  16. hallucination_detection/gpt-oss-20b/lora/README.md +202 -0
  17. hallucination_detection/gpt-oss-20b/lora/adapter_config.json +36 -0
  18. hallucination_detection/gpt-oss-20b/lora/adapter_model.safetensors +3 -0
  19. hallucination_detection/gpt-oss-20b/lora/chat_template.jinja +331 -0
  20. hallucination_detection/gpt-oss-20b/lora/io.yaml +81 -0
  21. hallucination_detection/gpt-oss-20b/lora/special_tokens_map.json +23 -0
  22. hallucination_detection/gpt-oss-20b/lora/tokenizer.json +3 -0
  23. hallucination_detection/gpt-oss-20b/lora/tokenizer_config.json +185 -0
  24. query_rewrite/README.md +0 -0
  25. query_rewrite/gpt-oss-20b/lora/adapter_config.json +45 -0
  26. query_rewrite/gpt-oss-20b/lora/adapter_model.safetensors +3 -0
  27. query_rewrite/gpt-oss-20b/lora/chat_template.jinja +397 -0
  28. query_rewrite/gpt-oss-20b/lora/io.yaml +22 -0
  29. query_rewrite/gpt-oss-20b/lora/special_tokens_map.json +17 -0
  30. query_rewrite/gpt-oss-20b/lora/tokenizer.json +3 -0
  31. query_rewrite/gpt-oss-20b/lora/tokenizer_config.json +184 -0
  32. run_vllm.sh +45 -0
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ citations/gpt-oss-20b/lora/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ hallucination_detection/gpt-oss-20b/lora/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ query_rewrite/gpt-oss-20b/lora/tokenizer.json filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ **/.DS_Store
+ **/*.swp
+
answerability/README.md ADDED
@@ -0,0 +1,186 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: peft
+ ---
+
+ # Intrinsics for Answerability Classification
+
+ ## Model Summary
+ This is a RAG-specific family of intrinsics fine-tuned for the binary
+ answerability classification task. The model takes as input a multi-turn
+ conversation and a set of documents, and classifies whether the user's final
+ query is answerable or unanswerable based on the information available in the
+ documents.
+
+ We provide two intrinsic variants, implemented as LoRA and aLoRA adapters,
+ trained over Granite-3.3-2b-instruct, Granite-3.3-8b-instruct, and GPT-OSS-20b.
+
+ - **Developer:** IBM Research
+ - **Model type:** LoRA and aLoRA adapters for
+ [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
+ [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct),
+ and [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Intended use
+ This is a family of intrinsics that enables answerability classification for
+ the final user query in a multi-turn conversation, with respect to a set of
+ provided documents. The model is trained to determine whether the last user
+ query is answerable or unanswerable based solely on the information present in
+ the documents. This makes it suitable for applications involving RAG and
+ document-grounded chatbots, where knowing whether sufficient information exists
+ to answer a query is crucial. The classification output from the answerability
+ model can be used in several downstream applications, including but not limited
+ to:
+ - Filtering out unanswerable questions before sending them to generation in a
+ RAG setting. By classifying a query as unanswerable upfront, the system can
+ prevent hallucinated or misleading responses.
+ - Re-querying the retriever to get more relevant documents. If a query is
+ initially deemed unanswerable, the retriever can be re-invoked with alternate
+ formulations to fetch more relevant documents.
+
+ **Model input**: The input to the answerability intrinsic is an
+ OpenAI-compatible chat completion request, containing a list of conversation
+ turns that alternate between the `user` and `assistant` roles and end with
+ a `user` turn, as well as a list of documents.
+
+ **Model output**: The output of the answerability intrinsic is the result of the
+ original chat completion request, formatted as a JSON object containing the
+ answerability likelihood score.
+
+ Please see the code snippets in the Quickstart Example section below for
+ examples that illustrate the intrinsic's input/output.
+
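As a rough illustration of the input/output shapes described above, the sketch below builds a hypothetical request and parses a hypothetical response body. Field names such as `documents` and `answerability_likelihood` follow the conventions in this repository's `io.yaml`, but the exact wire format depends on how the intrinsic is served.

```python
import json

# Illustrative OpenAI-compatible chat completion request for the
# answerability intrinsic (hypothetical document packaging; the exact
# format depends on the serving stack).
request = {
    "model": "answerability",
    "messages": [
        {"role": "assistant", "content": "Hello there, how can I help you?"},
        {"role": "user", "content": "What is the square root of 4?"},
    ],
    "documents": [{"doc_id": "1", "text": "The square root of 4 is 2."}],
}

# Illustrative response body: a JSON object holding the likelihood score.
response_body = '{"answerability_likelihood": 0.97}'
score = json.loads(response_body)["answerability_likelihood"]
print(score)  # -> 0.97
```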
+ ## Quickstart Example
+
+ The recommended way to call this intrinsic is through the [Mellea](https://mellea.ai) framework.
+ Here is some example code for calling this intrinsic from Mellea:
+ ```python
+ from mellea.backends.huggingface import LocalHFBackend
+ from mellea.stdlib.base import ChatContext, Document
+ from mellea.stdlib.chat import Message
+ from mellea.stdlib.intrinsics import rag
+
+
+ backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-2b-instruct")
+ context = ChatContext().add(Message("assistant", "Hello there, how can I help you?"))
+ next_user_turn = "What is the square root of 4?"
+ documents_answerable = [Document("The square root of 4 is 2.")]
+ documents_unanswerable = [Document("The square root of 8 is not 2.")]
+
+ result = rag.check_answerability(next_user_turn, documents_answerable, context, backend)
+ print(f"Result of answerability check when answer is in documents: {result}")
+
+ result = rag.check_answerability(
+     next_user_turn, documents_unanswerable, context, backend
+ )
+ print(f"Result of answerability check when answer is not in documents: {result}")
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ The training data uses the publicly available Government corpus from
+ [MT-RAG](https://arxiv.org/pdf/2501.03468) as the source of documents. Based on
+ this corpus, we constructed a dataset consisting of a mix of human-created and
+ synthetically generated multi-turn conversations. It includes two types of
+ examples: (1) Answerable queries, where the final user question can be answered
+ based on the provided documents. These examples teach the adapter to recognize
+ when sufficient information is present to support an answer. (2) Unanswerable
+ queries, where the documents lack the necessary information to answer the final
+ user query. We used Mixtral as an automatic judge to validate the answerability
+ labels and filter out noisy samples.
+
+ #### Training Hyperparameters
+
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
+ 32, learning rate = 5e-6, number of epochs = 25, with early stopping based on a
+ validation set and a 90/10 split between training and validation.
+
+ ## Evaluation
+
+ ### Answerability Classification
+
+ We evaluated the model on binary answerability classification using the MT-RAG
+ Benchmark. In this setting, the model is given the full multi-turn conversation
+ history along with the supporting documents. This benchmark evaluates the
+ model's ability to assess answerability when the final user query can also
+ depend on prior turns for context. The following table presents results
+ comparing baselines and frontier models with task-specific answerability
+ intrinsics on the answerability classification task on MT-RAG data. The LoRAs
+ consistently outperform frontier models, converging near ~90% accuracy
+ regardless of base model size. Even small models like Granite 3.3-2B, once
+ fine-tuned, match or surpass much larger models, including GPT-4o. The
+ difference between LoRA and aLoRA is minimal, indicating both are effective
+ fine-tuning strategies.
+
+ | | Models | Unanswerable F1 | Answerable F1 | Classification Accuracy | Weighted F1 |
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ | Baselines | BigBird (pre-trained embeddings) w/ MLP | 73.4 | 65.2 | 69.8 | 69.6 |
+ | | llama2-7b as classifier (Full SFT) | 88.2 | 85.9 | 87.1 | 87.1 |
+ | Frontier Models out-of-the-box | Granite 3.3-2b-instruct | 48.7 | 70.4 | 62.4 | 58.7 |
+ | | Granite 3.3-8b-instruct | 62.8 | 65.2 | 64.5 | 63.9 |
+ | | GPT-OSS-20b | 77.3 | 58.3 | 70.7 | 68.5 |
+ | | GPT-OSS-120b | 70.2 | 68.9 | 69.8 | 69.6 |
+ | | GPT4o-mini | 82.7 | 78.1 | 80.8 | 80.6 |
+ | | GPT4o | 85.7 | 77.5 | 82.5 | 81.9 |
+ | Trained LoRAs/aLoRAs | Granite 3.3-2b LoRA | 91.2 | 89.6 | 90.4 | 90.5 |
+ | | Granite 3.3-8b LoRA | 91.1 | 90.3 | 90.6 | 90.7 |
+ | | GPT-OSS-20b LoRA | 91.6 | 89.8 | 90.8 | 90.8 |
+ | | Granite 3.3-2b aLoRA | 89.8 | 88.6 | 89.1 | 89.2 |
+ | | Granite 3.3-8b aLoRA | 90.1 | 89.6 | 89.5 | 89.9 |
+ | | GPT-OSS-20b aLoRA | 90.4 | 88.6 | 89.6 | 89.6 |
+
+
+ ### Comparing the Answerability Intrinsics vs. Vanilla Granite Models for Answer Quality
+
+ We compare the performance of Granite 3.3-2b Instruct and Granite 3.3-8b
+ Instruct against the answerability intrinsics implemented as LoRA adapters on a
+ subset of the MT-RAG Benchmark. In this setup, each query is paired with only 5
+ retrieved passages as context.
+
+ - Answerability Classification Performance: The answerability intrinsics
+ outperform the vanilla models in overall F1 on both answerables and
+ unanswerables. The answerability intrinsics achieve higher recall on
+ unanswerable queries, making them better at identifying questions that should
+ not be answered. However, this comes at the cost of lower recall on answerable
+ queries.
+
+ - Joint Answerability-Faithfulness Score, computed as:
+   - 1, if model prediction = IDK/unanswerable ∩ ground truth = unanswerable
+   - RAGAS Faithfulness, if model prediction = non-IDK/answerable ∩ ground truth = answerable
+   - 0, otherwise
+
+   This score rewards the model for correctly abstaining on unanswerable queries
+   (full credit) and for providing faithful answers on answerable queries
+   (partial credit based on RAGAS Faithfulness). No credit is given for incorrect
+   or unfaithful predictions.
+
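The scoring rule above can be written out directly; this is a minimal sketch, assuming the RAGAS faithfulness value in [0, 1] is computed externally for the generated answer:

```python
def joint_score(prediction_is_idk: bool, truth_answerable: bool,
                ragas_faithfulness: float) -> float:
    """Joint answerability-faithfulness score for a single query."""
    if prediction_is_idk and not truth_answerable:
        return 1.0  # full credit: correctly abstained on an unanswerable query
    if not prediction_is_idk and truth_answerable:
        return ragas_faithfulness  # partial credit: answer scored by faithfulness
    return 0.0  # no credit: wrong abstention decision


print(joint_score(True, False, 0.8))   # correct abstention -> 1.0
print(joint_score(False, True, 0.8))   # answered an answerable query -> 0.8
print(joint_score(True, True, 0.8))    # abstained on an answerable query -> 0.0
```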
+ The answerability intrinsics for granite-2b and granite-8b achieve lifts of 8%
+ and 13% on this metric, respectively. This rewards the model for correctly
+ abstaining on unanswerable queries and for being faithful when it chooses to
+ answer.
+
+
+ | | F1 Score Unanswerable | F1 Score Answerable | Recall Unanswerable | Recall Answerable | Joint Answerability-Faithfulness Score |
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ | Granite 3.3-2b Instruct | 13 | 77 | 7 | 99 | 48 |
+ | Granite 3.3-2b LoRA | 48 | 78 | 37 | 89 | 56 |
+ | Granite 3.3-8b Instruct | 17 | 77 | 10 | 99 | 49 |
+ | Granite 3.3-8b LoRA | 65 | 81 | 60 | 86 | 62 |
+
+ ## Model Card Authors
+
+ [Vraj Shah](mailto:vraj@ibm.com)
+
+ ### Framework versions
+
+ - PEFT 0.14.0
answerability/gpt-oss-20b/lora/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "openai/gpt-oss-20b",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj",
+     "k_proj"
+   ],
+   "target_parameters": [
+     "7.mlp.experts.gate_up_proj",
+     "7.mlp.experts.down_proj",
+     "15.mlp.experts.gate_up_proj",
+     "15.mlp.experts.down_proj",
+     "23.mlp.experts.gate_up_proj",
+     "23.mlp.experts.down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
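A few properties of this config can be sanity-checked without loading the base model. The sketch below works on an inline copy of the key fields: the effective LoRA scaling is `lora_alpha / r`, and `target_parameters` pins adapter weights on the MoE expert projections of layers 7, 15, and 23.

```python
# Inline copy of the relevant fields from adapter_config.json above.
config = {
    "r": 32,
    "lora_alpha": 32,
    "target_modules": ["v_proj", "q_proj", "k_proj"],
    "target_parameters": [
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
}

# Effective LoRA scaling factor alpha / r.
scaling = config["lora_alpha"] / config["r"]

# Transformer layers whose MoE expert weights carry adapters,
# parsed from the leading layer index in each target_parameters entry.
expert_layers = sorted({int(p.split(".")[0]) for p in config["target_parameters"]})
print(scaling, expert_layers)  # -> 1.0 [7, 15, 23]
```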
answerability/gpt-oss-20b/lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ee17447d06e0327bed4cd6811da1fa2607520285bb35c712bccb8d6d7f9e772
+ size 219238968
answerability/gpt-oss-20b/lora/io.yaml ADDED
@@ -0,0 +1,27 @@
+ # Model name string, or null to use whatever is provided in the chat completion request.
+ model: ~
+ # JSON schema of the model's output
+ response_format: |
+   {
+     "type": "string",
+     "enum": ["answerable", "unanswerable"]
+   }
+ transformations:
+   # Convert categorical answer to continuous value by decoding logprobs
+   - type: likelihood
+     categories_to_values:
+       "answerable": 1.0
+       "unanswerable": 0.0
+     input_path: []
+   # Convert scalar value to a record for consistency with other intrinsics
+   - type: nest
+     input_path: []
+     field_name: "answerability_likelihood"
+ instruction: ~
+ parameters:
+   # "unanswerable" can be 6 tokens at high temperatures
+   max_completion_tokens: 6
+ # No sentence boundary detection
+ sentence_boundaries: ~
+ # RAG documents go in first message
+ docs_as_message: string
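The `likelihood` transformation converts the model's categorical answer plus its log-probabilities into a continuous score. A minimal sketch of the idea, assuming per-category logprobs are available and renormalizing over the two categories (the actual decoding logic may differ):

```python
import math

def likelihood(logprobs: dict, categories_to_values: dict) -> float:
    """Expected value of the categories under renormalized probabilities.

    `logprobs` maps each category string to the log-probability the model
    assigned to it; `categories_to_values` maps categories to scores.
    """
    probs = {c: math.exp(lp) for c, lp in logprobs.items()}
    total = sum(probs.values())
    return sum(categories_to_values[c] * p / total for c, p in probs.items())

score = likelihood(
    {"answerable": math.log(0.9), "unanswerable": math.log(0.1)},
    {"answerable": 1.0, "unanswerable": 0.0},
)
print(round(score, 2))  # -> 0.9
```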
citations/README.md ADDED
@@ -0,0 +1,216 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: peft
+ ---
+
+ # Intrinsics for Citation Generation
+
+ ## Model Summary
+
+ This is a RAG-specific family of intrinsics fine-tuned for the citation generation task. Given a multi-turn conversation between a user and an AI assistant ending with an assistant response, and a set of documents/passages on which the last assistant response is supposed to be based, each intrinsic in the family generates citations for the last assistant response from the provided documents/passages. The intrinsics have the following features:
+ 1. **Fine-grained citations:** The intrinsic generates citations for each sentence in the assistant response (when available). Moreover, each citation consists of a set of sentences from the documents/passages that support the corresponding sentence in the assistant response.
+ 2. **Post-hoc citation generation:** Since the intrinsic takes the assistant response as input, it can generate citations for responses generated by any LLM. Pick your favorite LLM and use the intrinsic to generate post-hoc citations!
+
+ We provide two intrinsics implemented as LoRA adapters trained over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.
+
+ <br/>
+
+ - **Developer:** IBM Research
+ - **Model type:** LoRA adapter for [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct) and [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Intended use
+ This is a family of citation generation intrinsics that provide the ability to generate citations for the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages. They can be used to generate post-hoc citations for assistant responses generated by any LLM in a RAG setting.
+
+ > [!TIP]
+ > Note: While you can invoke a citation generation intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation/validation tasks that would otherwise be required (incl. splitting the input documents and assistant response into sentences before calling the intrinsic, as well as validating the intrinsic's output and transforming the returned sentence IDs into spans over the documents and the response). We next describe the input/output of the citation generation intrinsics when invoked through granite-common.
+
+ **Intrinsic input**: The input to the citation generation intrinsic is an OpenAI-compatible chat completion request, containing a list of conversation turns ending with the assistant response for which the citations should be generated, as well as the list of documents from which the citations should be drawn. Please see the code snippets in the Quickstart Example section below for examples of how to specify the chat completion request as a JSON object.
+
+ **Intrinsic output**: The output of the citation generation intrinsic is formatted as the result of the original chat completion request, containing the citations for the last assistant response. The citations are provided in the form of a JSON array, whose items include the text and begin/end offsets of a response span, together with the text, document ID, and begin/end offsets of a document span that serves as a citation for the response span. When multiple document spans serve as citations for a single response span, they are represented as separate objects in the JSON array.
+
+ **Going from input to output**: When calling the intrinsic through granite-common, follow the steps below to transform the intrinsic input into the corresponding output (these steps are also exemplified in the code snippets included in the Quickstart Example section below):
+ 1. Pass the input chat completion request to the corresponding input processor (also referred to as IntrinsicsRewriter) provided by granite-common. The input processor converts the request to the format expected by the underlying citation generation model. This includes, among others, splitting the last assistant response and the documents into sentences, prepending them with sentence IDs, and introducing an appropriate task-specific instruction.
+ 2. Pass the input processor's result to the underlying citation generation model for inference. The model generates citations using a compact representation consisting of sentence IDs in the last assistant response and documents.
+ 3. Pass the model output to the appropriate output processor (also referred to as IntrinsicsResultProcessor) provided by granite-common. The output processor converts the low-level raw model output to the final output by, among others, mapping the sentence IDs back to response and document spans. The result is an application-friendly format ready for consumption by downstream applications.
+
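The sentence-ID bookkeeping in the output-processing step can be illustrated with a small self-contained sketch (a hypothetical helper, not the granite-common API): given a document split into sentences in order, each sentence ID resolves to a character span that a citation can point at.

```python
def sentence_spans(text: str, sentences: list[str]) -> dict[int, tuple[int, int]]:
    """Map sentence IDs to (begin, end) character offsets in `text`.

    Assumes `sentences` occur in `text` in order, as produced by a
    sentence splitter.
    """
    spans, cursor = {}, 0
    for i, sent in enumerate(sentences):
        begin = text.index(sent, cursor)
        spans[i] = (begin, begin + len(sent))
        cursor = begin + len(sent)
    return spans

doc = "The square root of 4 is 2. The square root of 9 is 3."
spans = sentence_spans(doc, ["The square root of 4 is 2.", "The square root of 9 is 3."])
# A citation referencing document sentence ID 1 resolves to this span:
begin, end = spans[1]
print(doc[begin:end])  # -> The square root of 9 is 3.
```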
+ ## Quickstart Example
+
+ The recommended way to call this intrinsic is through the [Mellea](https://mellea.ai) framework.
+ Here is some example code for calling this intrinsic from Mellea:
+ ```python
+ import json
+
+ from mellea.backends.huggingface import LocalHFBackend
+ from mellea.stdlib.base import ChatContext, Document
+ from mellea.stdlib.chat import Message
+ from mellea.stdlib.intrinsics import rag
+
+
+ backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-2b-instruct")
+ context = ChatContext().add(
+     Message(
+         "user",
+         "How does Murdoch's expansion in Australia compare to his expansion "
+         "in New Zealand?",
+     )
+ )
+ assistant_response = (
+     "Murdoch expanded in Australia and New Zealand by acquiring and expanding local "
+     "newspapers. I do not have information about his expansion in New Zealand after "
+     "purchasing The Dominion."
+ )
+ documents = [
+     Document(
+         doc_id="1",
+         text="Keith Rupert Murdoch was born on 11 March 1931 in Melbourne, Australia, "
+         "the son of Sir Keith Murdoch (1885-1952) and Dame Elisabeth Murdoch (nee "
+         "Greene; 1909-2012). He is of English, Irish, and Scottish ancestry. Murdoch's "
+         "parents were also born in Melbourne. Keith Murdoch was a war correspondent "
+         "and later a regional newspaper magnate owning two newspapers in Adelaide, "
+         "South Australia, and a radio station in a faraway mining town. Following his "
+         "father's death, when he was 21, Murdoch returned from Oxford to take charge "
+         "of the family business News Limited, which had been established in 1923. "
+         "Rupert Murdoch turned its Adelaide newspaper, The News, its main asset, into "
+         "a major success. He began to direct his attention to acquisition and "
+         "expansion, buying the troubled Sunday Times in Perth, Western Australia "
+         "(1956) and over the next few years acquiring suburban and provincial "
+         "newspapers in New South Wales, Queensland, Victoria and the Northern "
+         "Territory, including the Sydney afternoon tabloid, The Daily Mirror (1960). "
+         'The Economist describes Murdoch as "inventing the modern tabloid", as he '
+         "developed a pattern for his newspapers, increasing sports and scandal "
+         "coverage and adopting eye-catching headlines. Murdoch's first foray outside "
+         "Australia involved the purchase of a controlling interest in the New Zealand "
+         "daily The Dominion. In January 1964, while touring New Zealand with friends "
+         "in a rented Morris Minor after sailing across the Tasman, Murdoch read of a "
+         "takeover bid for the Wellington paper by the British-based Canadian newspaper "
+         "magnate, Lord Thomson of Fleet. On the spur of the moment, he launched a "
+         "counter-bid. A four-way battle for control ensued in which the 32-year-old "
+         "Murdoch was ultimately successful. Later in 1964, Murdoch launched The "
+         "Australian, Australia's first national daily newspaper, which was based "
+         "first in Canberra and later in Sydney. In 1972, Murdoch acquired the Sydney "
+         "morning tabloid The Daily Telegraph from Australian media mogul Sir Frank "
+         "Packer, who later regretted selling it to him. In 1984, Murdoch was appointed "
+         "Companion of the Order of Australia (AC) for services to publishing. In 1999, "
+         "Murdoch significantly expanded his music holdings in Australia by acquiring "
+         "the controlling share in a leading Australian independent label, Michael "
+         "Gudinski's Mushroom Records; he merged that with Festival Records, and the "
+         "result was Festival Mushroom Records (FMR). Both Festival and FMR were "
+         "managed by Murdoch's son James Murdoch for several years.",
+     ),
+     Document(
+         doc_id="2",
+         text="This document has nothing to do with Rupert Murdoch. This document is "
+         "two sentences long.",
+     ),
+ ]
+
+ result = rag.find_citations(assistant_response, documents, context, backend)
+ print(f"Result of citations intrinsic:\n{json.dumps(result, indent=2)}")
+ ```
+
+ ## Training Details
+
+ The citation generation intrinsics were trained on synthetically generated citation datasets. The process of generating the training data consisted of two main steps:
+ - **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpora. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).
+ - **Citation generation:** For each turn of the multi-turn RAG conversations from the previous step, we used a multi-step synthetic citation generation pipeline to generate citations for the assistant response.
+
+ The resulting data instances were used to train the citation generation intrinsics.
+
+ ### Training Data
+
+ The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:
+ - [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
+ - [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
+ - [QuAC](https://huggingface.co/datasets/allenai/quac)
+
+
+ ## Evaluation
+
+ We evaluate the citation generation intrinsics on two citation benchmarks:
+ - [ALCE](https://aclanthology.org/2023.emnlp-main.398/): Evaluates the ability of models to produce document/passage-level citations (i.e., identify the documents/passages that support a statement in the response).
+ - [LongBench-Cite](https://arxiv.org/abs/2409.02897): Evaluates the ability of models to produce fine-grained span-level citations (i.e., identify the spans within the input documents/passages that support a statement in the response) with a focus on long contexts.
+
+ Since the intrinsics correspond to a post-hoc citation generation approach, their performance on the two benchmarks depends on the assistant responses for which they are asked to generate citations. To facilitate an apples-to-apples comparison, for each experiment we keep the assistant responses the same and change the model that is used to generate the citations. In particular, we prompt an LLM to create an assistant response together with citations and evaluate the generated citations on the corresponding benchmark. Then, we compute and evaluate the citations generated for the same LLM response by each of the citation generation intrinsics. We provide results for the two intrinsics, implemented as LoRA adapters over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.
+
+ ### Evaluation on ALCE
+
+ For the ALCE evaluation, we prompt Llama-3.1-70B-Instruct and Mixtral-8x22B-Instruct to generate both the assistant response and corresponding passage-level citations. We first calculate the performance of the citations generated by these models on ALCE. Subsequently, we feed the responses of these models (leaving out the citations) to the citation generation intrinsics and evaluate their generated citations. The results are shown in the table below:
+
+ | Model used to generate response | Model used to generate citations | Recall | Precision | F1 |
+ |:---:|:---:|:---:|:---:|:---:|
+ | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | 61.4 | 58.1 | 59.7 |
+ | Llama-3.1-70B-Instruct | Granite-3.3-2B LoRA citations | 51.5 | 64.2 | 57.2 |
+ | Llama-3.1-70B-Instruct | Granite-3.3-8B LoRA citations | 55.4 | 64.2 | 59.5 |
+ | Mixtral-8x22B-Instruct | Mixtral-8x22B-Instruct | 62.2 | 62.5 | 62.3 |
+ | Mixtral-8x22B-Instruct | Granite-3.3-2B LoRA citations | 51.4 | 67.3 | 58.3 |
+ | Mixtral-8x22B-Instruct | Granite-3.3-8B LoRA citations | 55.8 | 68.5 | 61.5 |
+
+ We observe that the LoRA adapter over Granite-3.3-8b-instruct performs on par with much bigger models when those are prompted to create passage-level citations (with the LoRA adapter over Granite-3.3-2b-instruct being slightly worse). It is interesting to note that while the adapter's F1 performance is similar to the baselines, it exhibits a different precision-recall trade-off, trading lower recall for higher precision.
+
+ Notes:
+ - All results are reported on the ELI5 dataset using the ORACLE (5-psg) setting.
+ - To prompt Llama and Mixtral, we employ a setting similar to the one proposed in the ALCE paper; in particular, we use a two-shot prompt comprising two of the ICL examples from ALCE, as well as a slightly modified version of the instruction from the paper.
+ - Sentence splitting of context/response is performed using NLTK.
+ - Finally, since ALCE expects passage-level citations, we elevate the finer-grained citations produced by the LoRA adapter to the passage level before running the ALCE evaluation.
+
+
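The passage-level elevation mentioned in the last note above amounts to collapsing sentence-level citations onto their source passages. A minimal sketch with hypothetical record shapes (not the actual evaluation code):

```python
def elevate_to_passage_level(citations: list[dict]) -> dict[int, set[str]]:
    """Collapse sentence-level citations to passage-level citations.

    Each citation is assumed to look like
    {"response_sentence": 0, "doc_id": "1", "doc_sentence": 3}; the result
    maps each response sentence to the set of cited passage IDs.
    """
    passages: dict[int, set[str]] = {}
    for c in citations:
        passages.setdefault(c["response_sentence"], set()).add(c["doc_id"])
    return passages

fine_grained = [
    {"response_sentence": 0, "doc_id": "1", "doc_sentence": 3},
    {"response_sentence": 0, "doc_id": "1", "doc_sentence": 5},
    {"response_sentence": 1, "doc_id": "2", "doc_sentence": 0},
]
print(elevate_to_passage_level(fine_grained))  # -> {0: {'1'}, 1: {'2'}}
```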
159
+ ### Evaluation on LongBench-Cite
160
+
161
+ For the LonBench-Cite evaluation, we prompt Llama-3.1-70B-Instruct to generate both the assistant response and corresponding citations. Then we evaluate the citations generated by Llama as well as the post-hoc citations generated by the citation generation intrinsics when invoked on the Llama responses. The results are shown in the table below:
162
+
163
+ <table>
164
+ <tr>
165
+ <th>Model used to generate response</th>
166
+ <th>Model used to generate citations</th>
167
+ <th colspan="3">Longbench-Chat (en)</th>
168
+ <th colspan="3">MultifieldQA (en)</th>
169
+ <th colspan="3">HotpotQA</th>
170
+ <th colspan="3">GovReport</th>
171
+ </tr>
172
+ <tr>
173
+ <th></th>
174
+ <th></th>
175
+ <th>R</th><th>P</th><th>F1</th>
176
+ <th>R</th><th>P</th><th>F1</th>
177
+ <th>R</th><th>P</th><th>F1</th>
178
+ <th>R</th><th>P</th><th>F1</th>
179
+ </tr>
180
+ <tr>
181
+ <td>Llama-3.1-70B-Instruct</td>
182
+ <td>Llama-3.1-70B-Instruct</td>
183
+ <td>27.0</td><td>34.4</td><td>26.1</td>
184
+ <td>46.1</td><td>63.3</td><td>49.7</td>
185
+ <td>34.0</td><td>39.4</td><td>30.2</td>
186
+ <td>55.0</td><td>77.5</td><td>62.0</td>
187
+ </tr>
188
+ <tr>
189
+ <td>Llama-3.1-70B-Instruct</td>
190
+ <td>Granite-3.3-2B LoRA citations</td>
191
+ <td>38.7</td><td>47.4</td><td>39.3</td>
192
+ <td>66.4</td><td>81.8</td><td>70.4</td>
193
+ <td>60.7</td><td>68.5</td><td>59.7</td>
194
+ <td>60.1</td><td>72.4</td><td>64.7</td>
195
+ </tr>
196
+ <tr>
197
+ <td>Llama-3.1-70B-Instruct</td>
198
+ <td>Granite-3.3-8B LoRA citations</td>
199
+ <td>54.5</td><td>59.9</td><td>55.6</td>
200
+ <td>73.0</td><td>82.9</td><td>75.7</td>
201
+ <td>68.5</td><td>73.8</td><td>66.4</td>
202
+ <td>73.5</td><td>84.6</td><td>78.2</td>
203
+ </tr>
204
+ </table>
205
+
206
+ We observe that both variants of the LoRA adapter (even the one trained over Granite-3.3-2b-instruct) perform significantly better across the board than Llama-3.1-70B-Instruct when the latter is prompted to create span-level citations. This demonstrates the value of the adapter for creating post-hoc citations even for assistant responses generated by much larger LLMs.
207
+
208
+ Notes:
209
+ - The evaluation results are reported on the English subset of LongBench-Cite (i.e., restricted to instances whose `language` field equals `en`).
210
+ - To prompt Llama to generate a response with citations, we use the one-shot prompt described in the paper.
211
+ - For the LoRA adapter, sentence splitting of the context is performed using NLTK. For the response, we reuse the sentence splitting in Llama's output (since the LongBench-Cite prompt instructs the model to output a response split into sentences/statements).
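Sentence splitting with NLTK, as used for the context, can be done along these lines. This is a sketch, not the actual evaluation code; `PunktSentenceTokenizer` is used directly because its `span_tokenize` method yields the character offsets that span-level citations need:

```python
from nltk.tokenize import PunktSentenceTokenizer

text = "The model cites spans. Each span maps to a sentence."

# span_tokenize yields (begin, end) character offsets into the original text,
# which is what span-level citations require; sent_tokenize would return only
# the sentence strings.
tokenizer = PunktSentenceTokenizer()
spans = list(tokenizer.span_tokenize(text))
sentences = [text[b:e] for b, e in spans]
```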
212
+
213
+ ## Model Card Authors
214
+
215
+ [Yannis Katsis](mailto:yannis.katsis@ibm.com)<br/>
216
+ [Chulaka Gunasekara](mailto:chulaka.gunasekara@ibm.com)
citations/gpt-oss-20b/lora/adapter_config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "openai/gpt-oss-20b",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 16,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "o_proj",
28
+ "v_proj",
29
+ "q_proj",
30
+ "k_proj"
31
+ ],
32
+ "task_type": "CAUSAL_LM",
33
+ "trainable_token_indices": null,
34
+ "use_dora": false,
35
+ "use_rslora": false
36
+ }
citations/gpt-oss-20b/lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d27a9b6f66f37e21e36c0248eefd1d4284f92c6e7cccf5dd544b14e32fbd71f0
3
+ size 31876192
citations/gpt-oss-20b/lora/chat_template.jinja ADDED
@@ -0,0 +1,331 @@
1
+ {#-
2
+ In addition to the normal inputs of `messages` and `tools`, this template also accepts the
3
+ following kwargs:
4
+ - "builtin_tools": A list, can contain "browser" and/or "python".
5
+ - "model_identity": A string that optionally describes the model identity.
6
+ - "reasoning_effort": A string that describes the reasoning effort, defaults to "medium".
7
+ #}
8
+
9
+ {#- Tool Definition Rendering ============================================== #}
10
+ {%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%}
11
+ {%- if param_spec.type == "array" -%}
12
+ {%- if param_spec['items'] -%}
13
+ {%- if param_spec['items']['type'] == "string" -%}
14
+ {{- "string[]" }}
15
+ {%- elif param_spec['items']['type'] == "number" -%}
16
+ {{- "number[]" }}
17
+ {%- elif param_spec['items']['type'] == "integer" -%}
18
+ {{- "number[]" }}
19
+ {%- elif param_spec['items']['type'] == "boolean" -%}
20
+ {{- "boolean[]" }}
21
+ {%- else -%}
22
+ {%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%}
23
+ {%- if inner_type == "object | object" or inner_type|length > 50 -%}
24
+ {{- "any[]" }}
25
+ {%- else -%}
26
+ {{- inner_type + "[]" }}
27
+ {%- endif -%}
28
+ {%- endif -%}
29
+ {%- if param_spec.nullable -%}
30
+ {{- " | null" }}
31
+ {%- endif -%}
32
+ {%- else -%}
33
+ {{- "any[]" }}
34
+ {%- if param_spec.nullable -%}
35
+ {{- " | null" }}
36
+ {%- endif -%}
37
+ {%- endif -%}
38
+ {%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%}
39
+ {#- Handle array of types like ["object", "object"] from Union[dict, list] #}
40
+ {%- if param_spec.type | length > 1 -%}
41
+ {{- param_spec.type | join(" | ") }}
42
+ {%- else -%}
43
+ {{- param_spec.type[0] }}
44
+ {%- endif -%}
45
+ {%- elif param_spec.oneOf -%}
46
+ {#- Handle oneOf schemas - check for complex unions and fallback to any #}
47
+ {%- set has_object_variants = false -%}
48
+ {%- for variant in param_spec.oneOf -%}
49
+ {%- if variant.type == "object" -%}
50
+ {%- set has_object_variants = true -%}
51
+ {%- endif -%}
52
+ {%- endfor -%}
53
+ {%- if has_object_variants and param_spec.oneOf|length > 1 -%}
54
+ {{- "any" }}
55
+ {%- else -%}
56
+ {%- for variant in param_spec.oneOf -%}
57
+ {{- render_typescript_type(variant, required_params) -}}
58
+ {%- if variant.description %}
59
+ {{- "// " + variant.description }}
60
+ {%- endif -%}
61
+ {%- if variant.default is defined %}
62
+ {{ "// default: " + variant.default|tojson }}
63
+ {%- endif -%}
64
+ {%- if not loop.last %}
65
+ {{- " | " }}
66
+ {% endif -%}
67
+ {%- endfor -%}
68
+ {%- endif -%}
69
+ {%- elif param_spec.type == "string" -%}
70
+ {%- if param_spec.enum -%}
71
+ {{- '"' + param_spec.enum|join('" | "') + '"' -}}
72
+ {%- else -%}
73
+ {{- "string" }}
74
+ {%- if param_spec.nullable %}
75
+ {{- " | null" }}
76
+ {%- endif -%}
77
+ {%- endif -%}
78
+ {%- elif param_spec.type == "number" -%}
79
+ {{- "number" }}
80
+ {%- elif param_spec.type == "integer" -%}
81
+ {{- "number" }}
82
+ {%- elif param_spec.type == "boolean" -%}
83
+ {{- "boolean" }}
84
+
85
+ {%- elif param_spec.type == "object" -%}
86
+ {%- if param_spec.properties -%}
87
+ {{- "{\n" }}
88
+ {%- for prop_name, prop_spec in param_spec.properties.items() -%}
89
+ {{- prop_name -}}
90
+ {%- if prop_name not in (param_spec.required or []) -%}
91
+ {{- "?" }}
92
+ {%- endif -%}
93
+ {{- ": " }}
94
+ {{ render_typescript_type(prop_spec, param_spec.required or []) }}
95
+ {%- if not loop.last -%}
96
+ {{-", " }}
97
+ {%- endif -%}
98
+ {%- endfor -%}
99
+ {{- "}" }}
100
+ {%- else -%}
101
+ {{- "object" }}
102
+ {%- endif -%}
103
+ {%- else -%}
104
+ {{- "any" }}
105
+ {%- endif -%}
106
+ {%- endmacro -%}
107
+
108
+ {%- macro render_tool_namespace(namespace_name, tools) -%}
109
+ {{- "## " + namespace_name + "\n\n" }}
110
+ {{- "namespace " + namespace_name + " {\n\n" }}
111
+ {%- for tool in tools %}
112
+ {%- set tool = tool.function %}
113
+ {{- "// " + tool.description + "\n" }}
114
+ {{- "type "+ tool.name + " = " }}
115
+ {%- if tool.parameters and tool.parameters.properties %}
116
+ {{- "(_: {\n" }}
117
+ {%- for param_name, param_spec in tool.parameters.properties.items() %}
118
+ {%- if param_spec.description %}
119
+ {{- "// " + param_spec.description + "\n" }}
120
+ {%- endif %}
121
+ {{- param_name }}
122
+ {%- if param_name not in (tool.parameters.required or []) -%}
123
+ {{- "?" }}
124
+ {%- endif -%}
125
+ {{- ": " }}
126
+ {{- render_typescript_type(param_spec, tool.parameters.required or []) }}
127
+ {%- if param_spec.default is defined -%}
128
+ {%- if param_spec.enum %}
129
+ {{- ", // default: " + param_spec.default }}
130
+ {%- elif param_spec.oneOf %}
131
+ {{- "// default: " + param_spec.default }}
132
+ {%- else %}
133
+ {{- ", // default: " + param_spec.default|tojson }}
134
+ {%- endif -%}
135
+ {%- endif -%}
136
+ {%- if not loop.last %}
137
+ {{- ",\n" }}
138
+ {%- else %}
139
+ {{- ",\n" }}
140
+ {%- endif -%}
141
+ {%- endfor %}
142
+ {{- "}) => any;\n\n" }}
143
+ {%- else -%}
144
+ {{- "() => any;\n\n" }}
145
+ {%- endif -%}
146
+ {%- endfor %}
147
+ {{- "} // namespace " + namespace_name }}
148
+ {%- endmacro -%}
149
+
150
+ {%- macro render_builtin_tools(browser_tool, python_tool) -%}
151
+ {%- if browser_tool %}
152
+ {{- "## browser\n\n" }}
153
+ {{- "// Tool for browsing.\n" }}
154
+ {{- "// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.\n" }}
155
+ {{- "// Cite information from the tool using the following format:\n" }}
156
+ {{- "// `【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`.\n" }}
157
+ {{- "// Do not quote more than 10 words directly from the tool output.\n" }}
158
+ {{- "// sources=web (default: web)\n" }}
159
+ {{- "namespace browser {\n\n" }}
160
+ {{- "// Searches for information related to `query` and displays `topn` results.\n" }}
161
+ {{- "type search = (_: {\n" }}
162
+ {{- "query: string,\n" }}
163
+ {{- "topn?: number, // default: 10\n" }}
164
+ {{- "source?: string,\n" }}
165
+ {{- "}) => any;\n\n" }}
166
+ {{- "// Opens the link `id` from the page indicated by `cursor` starting at line number `loc`, showing `num_lines` lines.\n" }}
167
+ {{- "// Valid link ids are displayed with the formatting: `【{id}†.*】`.\n" }}
168
+ {{- "// If `cursor` is not provided, the most recent page is implied.\n" }}
169
+ {{- "// If `id` is a string, it is treated as a fully qualified URL associated with `source`.\n" }}
170
+ {{- "// If `loc` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.\n" }}
171
+ {{- "// Use this function without `id` to scroll to a new location of an opened page.\n" }}
172
+ {{- "type open = (_: {\n" }}
173
+ {{- "id?: number | string, // default: -1\n" }}
174
+ {{- "cursor?: number, // default: -1\n" }}
175
+ {{- "loc?: number, // default: -1\n" }}
176
+ {{- "num_lines?: number, // default: -1\n" }}
177
+ {{- "view_source?: boolean, // default: false\n" }}
178
+ {{- "source?: string,\n" }}
179
+ {{- "}) => any;\n\n" }}
180
+ {{- "// Finds exact matches of `pattern` in the current page, or the page given by `cursor`.\n" }}
181
+ {{- "type find = (_: {\n" }}
182
+ {{- "pattern: string,\n" }}
183
+ {{- "cursor?: number, // default: -1\n" }}
184
+ {{- "}) => any;\n\n" }}
185
+ {{- "} // namespace browser\n\n" }}
186
+ {%- endif -%}
187
+
188
+ {%- if python_tool %}
189
+ {{- "## python\n\n" }}
190
+ {{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).\n\n" }}
191
+ {{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.\n\n" }}
192
+ {%- endif -%}
193
+ {%- endmacro -%}
194
+
195
+ {#- System Message Construction ============================================ #}
196
+ {%- macro build_system_message() -%}
197
+ {%- if model_identity is not defined %}
198
+ {%- set model_identity = "You are ChatGPT, a large language model trained by OpenAI." %}
199
+ {%- endif %}
200
+ {{- model_identity + "\n" }}
201
+ {{- "Knowledge cutoff: 2024-06\n" }}
202
+ {{- "Current date: " + strftime_now("%Y-%m-%d") + "\n\n" }}
203
+ {%- if reasoning_effort is not defined %}
204
+ {%- set reasoning_effort = "medium" %}
205
+ {%- endif %}
206
+ {{- "Reasoning: " + reasoning_effort + "\n\n" }}
207
+ {%- if builtin_tools %}
208
+ {{- "# Tools\n\n" }}
209
+ {%- set available_builtin_tools = namespace(browser=false, python=false) %}
210
+ {%- for tool in builtin_tools %}
211
+ {%- if tool == "browser" %}
212
+ {%- set available_builtin_tools.browser = true %}
213
+ {%- elif tool == "python" %}
214
+ {%- set available_builtin_tools.python = true %}
215
+ {%- endif %}
216
+ {%- endfor %}
217
+ {{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }}
218
+ {%- endif -%}
219
+ {{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }}
220
+ {%- if tools -%}
221
+ {{- "\nCalls to these tools must go to the commentary channel: 'functions'." }}
222
+ {%- endif -%}
223
+ {%- endmacro -%}
224
+
225
+ {#- Main Template Logic ================================================= #}
226
+ {#- Set defaults #}
227
+
228
+ {#- Render system message #}
229
+ {{- "<|start|>system<|message|>" }}
230
+ {{- build_system_message() }}
231
+ {{- "<|end|>" }}
232
+
233
+ {#- Extract developer message #}
234
+ {%- if messages[0].role == "developer" or messages[0].role == "system" %}
235
+ {%- set developer_message = messages[0].content %}
236
+ {%- set loop_messages = messages[1:] %}
237
+ {%- else %}
238
+ {%- set developer_message = "" %}
239
+ {%- set loop_messages = messages %}
240
+ {%- endif %}
241
+
242
+ {#- Render developer message #}
243
+ {%- if developer_message or tools %}
244
+ {{- "<|start|>developer<|message|>" }}
245
+ {%- if developer_message %}
246
+ {{- "# Instructions\n\n" }}
247
+ {{- developer_message }}
248
+ {{- "\n\n" }}
249
+ {%- endif %}
250
+ {%- if tools -%}
251
+ {{- "# Tools\n\n" }}
252
+ {{- render_tool_namespace("functions", tools) }}
253
+ {%- endif -%}
254
+ {{- "<|end|>" }}
255
+ {%- endif %}
256
+
257
+ {#- Render messages #}
258
+ {%- set last_tool_call = namespace(name=none) %}
259
+ {%- for message in loop_messages -%}
260
+ {#- At this point only assistant/user/tool messages should remain #}
261
+ {%- if message.role == 'assistant' -%}
262
+ {#- Checks to ensure the messages are being passed in the format we expect #}
263
+ {%- if "content" in message %}
264
+ {%- if "<|channel|>analysis<|message|>" in message.content or "<|channel|>final<|message|>" in message.content %}
265
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the content field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
266
+ {%- endif %}
267
+ {%- endif %}
268
+ {%- if "thinking" in message %}
269
+ {%- if "<|channel|>analysis<|message|>" in message.thinking or "<|channel|>final<|message|>" in message.thinking %}
270
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
271
+ {%- endif %}
272
+ {%- endif %}
273
+ {%- if "tool_calls" in message %}
274
+ {#- We need very careful handling here - we want to drop the tool call analysis message if the model #}
275
+ {#- has output a later <|final|> message, but otherwise we want to retain it. This is the only case #}
276
+ {#- when we render CoT/analysis messages in inference. #}
277
+ {%- set future_final_message = namespace(found=false) %}
278
+ {%- for future_message in loop_messages[loop.index:] %}
279
+ {%- if future_message.role == 'assistant' and "tool_calls" not in future_message %}
280
+ {%- set future_final_message.found = true %}
281
+ {%- endif %}
282
+ {%- endfor %}
283
+ {#- We assume max 1 tool call per message, and so we infer the tool call name #}
284
+ {#- in "tool" messages from the most recent assistant tool call name #}
285
+ {%- set tool_call = message.tool_calls[0] %}
286
+ {%- if tool_call.function %}
287
+ {%- set tool_call = tool_call.function %}
288
+ {%- endif %}
289
+ {%- if message.content and message.thinking %}
290
+ {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
291
+ {%- elif message.content and not future_final_message.found %}
292
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
293
+ {%- elif message.thinking and not future_final_message.found %}
294
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
295
+ {%- endif %}
296
+ {{- "<|start|>assistant to=" }}
297
+ {{- "functions." + tool_call.name + "<|channel|>commentary " }}
298
+ {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
299
+ {{- tool_call.arguments|tojson }}
300
+ {{- "<|call|>" }}
301
+ {%- set last_tool_call.name = tool_call.name %}
302
+ {%- elif loop.last and not add_generation_prompt %}
303
+ {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
304
+ {#- This is a situation that should only occur in training, never in inference. #}
305
+ {%- if "thinking" in message %}
306
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
307
+ {%- endif %}
308
+ {#- <|return|> indicates the end of generation, but <|end|> does not #}
309
+ {#- <|return|> should never be an input to the model, but we include it as the final token #}
310
+ {#- when training, so the model learns to emit it. #}
311
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
312
+ {%- else %}
313
+ {#- CoT is dropped during all previous turns, so we never render it for inference #}
314
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
315
+ {%- set last_tool_call.name = none %}
316
+ {%- endif %}
317
+ {%- elif message.role == 'tool' -%}
318
+ {%- if last_tool_call.name is none %}
319
+ {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
320
+ {%- endif %}
321
+ {{- "<|start|>functions." + last_tool_call.name }}
322
+ {{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }}
323
+ {%- elif message.role == 'user' -%}
324
+ {{- "<|start|>user<|message|>" + message.content + "<|end|>" }}
325
+ {%- endif -%}
326
+ {%- endfor -%}
327
+
328
+ {#- Generation prompt #}
329
+ {%- if add_generation_prompt -%}
330
+ <|start|>assistant
331
+ {%- endif -%}
citations/gpt-oss-20b/lora/io.yaml ADDED
@@ -0,0 +1,97 @@
1
+ # Model name string, or null to use whatever is provided in the chat completion request
2
+ model: ~
3
+ # JSON schema of the model's output
4
+ response_format: |
5
+ {
6
+ "$defs": {
7
+ "_MODEL_OUTPUT_ENTRY": {
8
+ "properties": {
9
+ "r": {
10
+ "minimum": 0,
11
+ "title": "R",
12
+ "type": "integer"
13
+ },
14
+ "c": {
15
+ "items": {
16
+ "minimum": 0,
17
+ "type": "integer"
18
+ },
19
+ "title": "C",
20
+ "type": "array"
21
+ }
22
+ },
23
+ "required": [
24
+ "r",
25
+ "c"
26
+ ],
27
+ "title": "_MODEL_OUTPUT_ENTRY",
28
+ "type": "object"
29
+ }
30
+ },
31
+ "items": {
32
+ "$ref": "#/$defs/_MODEL_OUTPUT_ENTRY"
33
+ },
34
+ "title": "_MODEL_OUTPUT",
35
+ "type": "array"
36
+ }
37
+ transformations:
38
+ # Explode the list of document sentences in each citation
39
+ - type: explode
40
+ input_path: [] # Zero-length path means match root element
41
+ target_field: "c"
42
+ # Model may repeat itself; drop the resulting duplicates.
43
+ - type: drop_duplicates
44
+ input_path: [] # Zero-length path means match root element
45
+ target_fields: ["r", "c"]
46
+ # Replace sentence number with sentence location and contents.
47
+ # Do this first for sentences from the last turn, then for sentences from documents.
48
+ - type: decode_sentences
49
+ source: "last_message"
50
+ input_path: [~, "r"] # Null in path means wildcard
51
+ # New fields to add for each sentence
52
+ output_names:
53
+ begin: "response_begin"
54
+ end: "response_end"
55
+ text: "response_text"
56
+ - type: decode_sentences
57
+ source: "documents"
58
+ input_path: [~, "c"] # Null in path means wildcard
59
+ # New fields to add for each sentence
60
+ output_names:
61
+ document_id: "citation_doc_id"
62
+ begin: "citation_begin"
63
+ end: "citation_end"
64
+ text: "citation_text"
65
+ # Remove fields that we no longer need
66
+ - type: project
67
+ input_path: []
68
+ retained_fields:
69
+ - "response_begin"
70
+ - "response_end"
71
+ - "response_text"
72
+ - "citation_doc_id"
73
+ - "citation_begin"
74
+ - "citation_end"
75
+ - "citation_text"
76
+ # Merge adjacent document spans
77
+ - type: merge_spans
78
+ input_path: []
79
+ group_fields: ["response_begin", "response_end", "response_text", "citation_doc_id"]
80
+ begin_field: "citation_begin"
81
+ end_field: "citation_end"
82
+ text_field: "citation_text"
83
+
84
+ instruction: >
85
+ Split the last assistant response into individual sentences.
86
+ For each sentence in the response, identify the statement IDs from the below
87
+ documents that it references. Ensure that your output includes all response
88
+ sentence IDs, and for each response sentence ID, provide the list of corresponding
89
+ referring document sentence IDs. The output must be a json structure.
90
+ parameters:
91
+ max_completion_tokens: 4096
92
+ sentence_boundaries:
93
+ # Mapping from string location to sentence delimiter prefix
94
+ last_message: "r" # <r0>, <r1>, etc.
95
+ documents: "c"
96
+ # gpt-oss base models have no "documents" argument
97
+ docs_as_message: json
citations/gpt-oss-20b/lora/special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|startoftext|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|return|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
citations/gpt-oss-20b/lora/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0614fe83cadab421296e664e1f48f4261fa8fef6e03e63bb75c20f38e37d07d3
3
+ size 27868174
citations/gpt-oss-20b/lora/tokenizer_config.json ADDED
@@ -0,0 +1,185 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "199998": {
4
+ "content": "<|startoftext|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "199999": {
12
+ "content": "<|endoftext|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "200000": {
20
+ "content": "<|reserved_200000|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "200001": {
28
+ "content": "<|reserved_200001|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "200002": {
36
+ "content": "<|return|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "200003": {
44
+ "content": "<|constrain|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "200004": {
52
+ "content": "<|reserved_200004|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "200005": {
60
+ "content": "<|channel|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "200006": {
68
+ "content": "<|start|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "200007": {
76
+ "content": "<|end|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "200008": {
84
+ "content": "<|message|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "200009": {
92
+ "content": "<|reserved_200009|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "200010": {
100
+ "content": "<|reserved_200010|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "200011": {
108
+ "content": "<|reserved_200011|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "200012": {
116
+ "content": "<|call|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "200013": {
124
+ "content": "<|reserved_200013|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "200014": {
132
+ "content": "<|reserved_200014|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "200015": {
140
+ "content": "<|reserved_200015|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "200016": {
148
+ "content": "<|reserved_200016|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "200017": {
156
+ "content": "<|reserved_200017|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "200018": {
164
+ "content": "<|endofprompt|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ }
171
+ },
172
+ "bos_token": "<|startoftext|>",
173
+ "clean_up_tokenization_spaces": false,
174
+ "eos_token": "<|return|>",
175
+ "extra_special_tokens": {},
176
+ "model_input_names": [
177
+ "input_ids",
178
+ "attention_mask"
179
+ ],
180
+ "model_max_length": 1000000000000000019884624838656,
181
+ "pad_token": "<|endoftext|>",
182
+ "padding_side": "right",
183
+ "split_special_tokens": false,
184
+ "tokenizer_class": "PreTrainedTokenizerFast"
185
+ }
hallucination_detection/README.md ADDED
@@ -0,0 +1,117 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
7
8
+ ---
9
+
10
+ # Intrinsics for Hallucination Detection
11
+
12
+ ## Model Summary
13
+
14
+ This is a RAG-specific family of intrinsics fine-tuned for the hallucination detection task. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, together with a set of documents/passages on which that response is supposed to be based, the adapter outputs a hallucination label (faithful/partial/unfaithful/NA) for each sentence of the response.
15
+
16
+ We provide two intrinsics implemented as LoRA adapters trained over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.
17
+
18
+ <br/>
19
+
20
+ - **Developer:** IBM Research
21
+ - **Model type:** LoRA adapter for [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct) and [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
22
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
23
+
24
+ ## Intended use
25
+ This is a family of hallucination detection intrinsics that identifies hallucination risks for the sentences of the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages.
26
+
27
+ > [!TIP]
28
+ > Note: While you can invoke the hallucination detection intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation/validation tasks that would otherwise be required, including splitting the input documents and assistant response into sentences before calling the intrinsic, as well as validating the intrinsic's output. We next describe the input/output of the hallucination detection intrinsics when invoked through granite-common.
29
+
30
+ **Intrinsic input**: The hallucination detection intrinsic takes as input an OpenAI-compatible chat completion request. This request includes a list of conversation turns ending with the assistant's response (the response to be checked for hallucinations) and a list of reference documents on which the final assistant response should be grounded. See the code snippets in the Quickstart Example section below for examples of how to format the chat completion request as a JSON object.
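As a rough illustration, such a request is an ordinary OpenAI-style JSON object whose message list ends with the assistant response to be checked, with the reference documents attached alongside. The exact field layout (in particular, where the `documents` list is placed) is an assumption here, not the definitive granite-common wire format:

```python
import json

# Hypothetical OpenAI-compatible chat completion request for the hallucination
# detection intrinsic: the conversation ends with the assistant response to be
# checked, and the reference documents are attached as extra request data.
request = {
    "model": "hallucination_detection",
    "messages": [
        {"role": "user", "content": "Tell me about some yellow fish."},
        {"role": "assistant", "content": "Purple bumble fish are yellow. Green bumble fish are also yellow."},
    ],
    # Assumed placement of the reference documents; consult granite-common
    # for the authoritative field name and location.
    "extra_body": {
        "documents": [
            {"doc_id": "1", "text": "The only type of fish that is yellow is the purple bumble fish."}
        ]
    },
}
payload = json.dumps(request)  # ready to send to a chat completion endpoint
```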
31
+
32
+ **Intrinsic output**: The output of the hallucination detection intrinsic is formatted as the result of the original chat completion request, containing the hallucinations detected for the last assistant response. The hallucinations are provided in the form of a JSON array, whose items include the text and begin/end offsets of each response span (sentence), together with the faithfulness_likelihood of that sentence and the explanation for the faithfulness_likelihood.
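A downstream application can consume that array directly. The snippet below parses an illustrative output item; the values (and the exact offsets) are made up for illustration, though the field names follow the description above:

```python
import json

# Illustrative shape of the intrinsic's output: a JSON array with one entry
# per response sentence, carrying span offsets, a faithfulness likelihood,
# and an explanation. Values here are invented for the example.
raw = """[
  {"response_text": "Green bumble fish are also yellow.",
   "response_begin": 31, "response_end": 65,
   "faithfulness_likelihood": "unfaithful",
   "explanation": "The documents mention only purple bumble fish."}
]"""

for item in json.loads(raw):
    # Flag any sentence whose likelihood is not "faithful" for review.
    print(f"[{item['faithfulness_likelihood']}] {item['response_text']}")
```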
33
+
34
+ **Going from input to output**: When calling the intrinsic through granite-common, one should follow the steps below to transform the intrinsic input into the corresponding output. These steps are also exemplified in the code snippets included in the Quickstart Example section below. Given an input chat completion request, the request should be passed to the corresponding input processor (also referred to as IntrinsicsRewriter) provided by granite-common. The input processor converts the request to the format expected by the underlying hallucination detection model. This includes, among other things, splitting the last assistant response and the documents into sentences and prepending them with sentence IDs, as well as introducing an appropriate task-specific instruction. The input processor's result should then be passed to the underlying hallucination detection model for inference. The model identifies hallucinations using a compact representation consisting of sentence IDs in the last assistant response and documents. This output should finally be passed to the appropriate output processor (also referred to as IntrinsicsResultProcessor) provided by granite-common. The output processor converts the low-level raw model output to the final output by, among other things, mapping the sentence IDs back to response and document spans. The result is an application-friendly format ready for consumption by downstream applications.
35
+
36
+ ## Quickstart Example
37
+
38
+ The recommended way to call this intrinsic is through the [Mellea](https://mellea.ai) framework.
39
+ Here is some example code for calling this intrinsic from Mellea:
40
+ ```python
41
+ from mellea.backends.huggingface import LocalHFBackend
42
+ from mellea.stdlib.base import ChatContext, Document
43
+ from mellea.stdlib.chat import Message
44
+ from mellea.stdlib.intrinsics import rag
45
+ import json
46
+
47
+
48
+ backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-2b-instruct")
49
+ context = (
50
+ ChatContext()
51
+ .add(Message("assistant", "Hello there, how can I help you?"))
52
+ .add(Message("user", "Tell me about some yellow fish."))
53
+ )
54
+
55
+ assistant_response = "Purple bumble fish are yellow. Green bumble fish are also yellow."
56
+
57
+ documents = [
58
+ Document(
59
+ doc_id="1",
60
+ text="The only type of fish that is yellow is the purple bumble fish.",
61
+ )
62
+ ]
63
+
64
+ result = rag.flag_hallucinated_content(assistant_response, documents, context, backend)
65
+ print(f"Result of hallucination check: {json.dumps(result, indent=2)}")
66
+ ```
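Downstream code can then consume the result. Here is a short sketch assuming the output shape described in the Intrinsic Output section above (the sample entries and the 0.5 threshold are fabricated for illustration):

```python
# Sample result entries in the documented output shape; values are made up
# for illustration only.
result = [
    {"response_begin": 0, "response_end": 30,
     "response_text": "Purple bumble fish are yellow.",
     "faithfulness_likelihood": 1.0,
     "explanation": "Supported by document 1."},
    {"response_begin": 31, "response_end": 65,
     "response_text": "Green bumble fish are also yellow.",
     "faithfulness_likelihood": 0.0,
     "explanation": "Not supported by document 1."},
]

# Flag any sentence whose likelihood of being faithful falls below a
# threshold (0.5 here is an arbitrary choice for the sketch).
THRESHOLD = 0.5
flagged = [entry["response_text"] for entry in result
           if entry["faithfulness_likelihood"] < THRESHOLD]
print(flagged)  # ['Green bumble fish are also yellow.']
```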
67
+
68
+
69
+ ## Training Details
70
+
71
+ The process of generating the training data for the hallucination detection intrinsic consisted of two main steps:
72
+
73
+ - **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpus. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).
74
+
75
+ - **Faithfulness label generation:** For creating the faithfulness labels for the responses, we used a multi-step synthetic hallucination label and reasoning generation pipeline.
76
+ This process resulted in ~50K data instances, which were used to train the LoRA adapter.
77
+
78
+
79
+
80
+ ### Training Data
81
+
82
+
83
+
84
+ The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:
85
+
86
+ - [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
87
+
88
+ - [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
89
+
90
+ - [QuAC](https://huggingface.co/datasets/allenai/quac)
91
+
92
+
93
+
94
+
95
+ ## Evaluation
96
+
97
+ We evaluated the LoRA adapter on the QA portion of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) benchmark, comparing response-level hallucination detection performance between the LoRA adapter and the methods reported in the RAGTruth paper. Responses that obtain a faithfulness label of `partial` or `unfaithful` for at least one sentence are considered hallucinated.
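The response-level criterion above can be sketched as:

```python
def response_is_hallucinated(sentence_labels):
    # A response counts as hallucinated if at least one of its sentences
    # receives a faithfulness label of "partial" or "unfaithful".
    return any(label in ("partial", "unfaithful") for label in sentence_labels)

print(response_is_hallucinated(["faithful", "faithful"]))  # False
print(response_is_hallucinated(["faithful", "partial"]))   # True
```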
98
+
99
+
100
+
101
+ The results are shown in the table below; the numbers for the baselines are taken from the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) paper.
102
+
103
+
104
+ | Model | Precision | Recall | F1 |
105
+ |---|---|---|---|
106
+ | GPT 4o mini (prompted) | 46.8 | 59.6 | 52.4 |
107
+ | GPT 4o (prompted) | 49.5 | 60.1 | 54.3 |
108
+ | gpt-4-turbo (prompted) | 33.2 | 90.6 | 45.6 |
109
+ | [SelfCheckGPT](https://aclanthology.org/2023.emnlp-main.557.pdf) | 35.0 | 58.0 | 43.7 |
110
+ | [LMvLM](https://aclanthology.org/2023.emnlp-main.778.pdf) | 18.7 | 76.9 | 30.1 |
111
+ | Granite 3.3-2b_hallucination-detection_LoRA | 55.8 | 74.9 | 63.9 |
112
+ | Granite 3.3-8b_hallucination-detection_LoRA | 58.1 | 77.6 | 66.5 |
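As a quick arithmetic sanity check (not part of the evaluation code), F1 is the harmonic mean of precision and recall, and recomputing it from the rounded precision/recall values in the table reproduces the reported F1 scores to within rounding:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Granite 3.3-8b LoRA row from the table above: P=58.1, R=77.6, reported F1=66.5.
# Recomputing from the rounded P/R gives ~66.4; the small gap comes from the
# table reporting F1 computed on unrounded precision/recall.
print(round(f1(58.1, 77.6), 1))  # 66.4
```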
113
+
114
+
115
+ ## Model Card Author
116
+
117
+ [Chulaka Gunasekara](mailto:chulaka.gunasekara@ibm.com)
hallucination_detection/gpt-oss-20b/lora/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: openai/gpt-oss-20b
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
hallucination_detection/gpt-oss-20b/lora/adapter_config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "openai/gpt-oss-20b",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 16,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "k_proj",
28
+ "q_proj",
29
+ "o_proj",
30
+ "v_proj"
31
+ ],
32
+ "task_type": "CAUSAL_LM",
33
+ "trainable_token_indices": null,
34
+ "use_dora": false,
35
+ "use_rslora": false
36
+ }
hallucination_detection/gpt-oss-20b/lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6d324b0a7fad875c444762e7681d4ae1ebce10aa138c2823be6617f7a214a6bf
3
+ size 31876192
hallucination_detection/gpt-oss-20b/lora/chat_template.jinja ADDED
@@ -0,0 +1,331 @@
1
+ {#-
2
+ In addition to the normal inputs of `messages` and `tools`, this template also accepts the
3
+ following kwargs:
4
+ - "builtin_tools": A list, can contain "browser" and/or "python".
5
+ - "model_identity": A string that optionally describes the model identity.
6
+ - "reasoning_effort": A string that describes the reasoning effort, defaults to "medium".
7
+ #}
8
+
9
+ {#- Tool Definition Rendering ============================================== #}
10
+ {%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%}
11
+ {%- if param_spec.type == "array" -%}
12
+ {%- if param_spec['items'] -%}
13
+ {%- if param_spec['items']['type'] == "string" -%}
14
+ {{- "string[]" }}
15
+ {%- elif param_spec['items']['type'] == "number" -%}
16
+ {{- "number[]" }}
17
+ {%- elif param_spec['items']['type'] == "integer" -%}
18
+ {{- "number[]" }}
19
+ {%- elif param_spec['items']['type'] == "boolean" -%}
20
+ {{- "boolean[]" }}
21
+ {%- else -%}
22
+ {%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%}
23
+ {%- if inner_type == "object | object" or inner_type|length > 50 -%}
24
+ {{- "any[]" }}
25
+ {%- else -%}
26
+ {{- inner_type + "[]" }}
27
+ {%- endif -%}
28
+ {%- endif -%}
29
+ {%- if param_spec.nullable -%}
30
+ {{- " | null" }}
31
+ {%- endif -%}
32
+ {%- else -%}
33
+ {{- "any[]" }}
34
+ {%- if param_spec.nullable -%}
35
+ {{- " | null" }}
36
+ {%- endif -%}
37
+ {%- endif -%}
38
+ {%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%}
39
+ {#- Handle array of types like ["object", "object"] from Union[dict, list] #}
40
+ {%- if param_spec.type | length > 1 -%}
41
+ {{- param_spec.type | join(" | ") }}
42
+ {%- else -%}
43
+ {{- param_spec.type[0] }}
44
+ {%- endif -%}
45
+ {%- elif param_spec.oneOf -%}
46
+ {#- Handle oneOf schemas - check for complex unions and fallback to any #}
47
+ {%- set has_object_variants = false -%}
48
+ {%- for variant in param_spec.oneOf -%}
49
+ {%- if variant.type == "object" -%}
50
+ {%- set has_object_variants = true -%}
51
+ {%- endif -%}
52
+ {%- endfor -%}
53
+ {%- if has_object_variants and param_spec.oneOf|length > 1 -%}
54
+ {{- "any" }}
55
+ {%- else -%}
56
+ {%- for variant in param_spec.oneOf -%}
57
+ {{- render_typescript_type(variant, required_params) -}}
58
+ {%- if variant.description %}
59
+ {{- "// " + variant.description }}
60
+ {%- endif -%}
61
+ {%- if variant.default is defined %}
62
+ {{ "// default: " + variant.default|tojson }}
63
+ {%- endif -%}
64
+ {%- if not loop.last %}
65
+ {{- " | " }}
66
+ {% endif -%}
67
+ {%- endfor -%}
68
+ {%- endif -%}
69
+ {%- elif param_spec.type == "string" -%}
70
+ {%- if param_spec.enum -%}
71
+ {{- '"' + param_spec.enum|join('" | "') + '"' -}}
72
+ {%- else -%}
73
+ {{- "string" }}
74
+ {%- if param_spec.nullable %}
75
+ {{- " | null" }}
76
+ {%- endif -%}
77
+ {%- endif -%}
78
+ {%- elif param_spec.type == "number" -%}
79
+ {{- "number" }}
80
+ {%- elif param_spec.type == "integer" -%}
81
+ {{- "number" }}
82
+ {%- elif param_spec.type == "boolean" -%}
83
+ {{- "boolean" }}
84
+
85
+ {%- elif param_spec.type == "object" -%}
86
+ {%- if param_spec.properties -%}
87
+ {{- "{\n" }}
88
+ {%- for prop_name, prop_spec in param_spec.properties.items() -%}
89
+ {{- prop_name -}}
90
+ {%- if prop_name not in (param_spec.required or []) -%}
91
+ {{- "?" }}
92
+ {%- endif -%}
93
+ {{- ": " }}
94
+ {{ render_typescript_type(prop_spec, param_spec.required or []) }}
95
+ {%- if not loop.last -%}
96
+ {{-", " }}
97
+ {%- endif -%}
98
+ {%- endfor -%}
99
+ {{- "}" }}
100
+ {%- else -%}
101
+ {{- "object" }}
102
+ {%- endif -%}
103
+ {%- else -%}
104
+ {{- "any" }}
105
+ {%- endif -%}
106
+ {%- endmacro -%}
107
+
108
+ {%- macro render_tool_namespace(namespace_name, tools) -%}
109
+ {{- "## " + namespace_name + "\n\n" }}
110
+ {{- "namespace " + namespace_name + " {\n\n" }}
111
+ {%- for tool in tools %}
112
+ {%- set tool = tool.function %}
113
+ {{- "// " + tool.description + "\n" }}
114
+ {{- "type "+ tool.name + " = " }}
115
+ {%- if tool.parameters and tool.parameters.properties %}
116
+ {{- "(_: {\n" }}
117
+ {%- for param_name, param_spec in tool.parameters.properties.items() %}
118
+ {%- if param_spec.description %}
119
+ {{- "// " + param_spec.description + "\n" }}
120
+ {%- endif %}
121
+ {{- param_name }}
122
+ {%- if param_name not in (tool.parameters.required or []) -%}
123
+ {{- "?" }}
124
+ {%- endif -%}
125
+ {{- ": " }}
126
+ {{- render_typescript_type(param_spec, tool.parameters.required or []) }}
127
+ {%- if param_spec.default is defined -%}
128
+ {%- if param_spec.enum %}
129
+ {{- ", // default: " + param_spec.default }}
130
+ {%- elif param_spec.oneOf %}
131
+ {{- "// default: " + param_spec.default }}
132
+ {%- else %}
133
+ {{- ", // default: " + param_spec.default|tojson }}
134
+ {%- endif -%}
135
+ {%- endif -%}
136
+ {%- if not loop.last %}
137
+ {{- ",\n" }}
138
+ {%- else %}
139
+ {{- ",\n" }}
140
+ {%- endif -%}
141
+ {%- endfor %}
142
+ {{- "}) => any;\n\n" }}
143
+ {%- else -%}
144
+ {{- "() => any;\n\n" }}
145
+ {%- endif -%}
146
+ {%- endfor %}
147
+ {{- "} // namespace " + namespace_name }}
148
+ {%- endmacro -%}
149
+
150
+ {%- macro render_builtin_tools(browser_tool, python_tool) -%}
151
+ {%- if browser_tool %}
152
+ {{- "## browser\n\n" }}
153
+ {{- "// Tool for browsing.\n" }}
154
+ {{- "// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.\n" }}
155
+ {{- "// Cite information from the tool using the following format:\n" }}
156
+ {{- "// `【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`.\n" }}
157
+ {{- "// Do not quote more than 10 words directly from the tool output.\n" }}
158
+ {{- "// sources=web (default: web)\n" }}
159
+ {{- "namespace browser {\n\n" }}
160
+ {{- "// Searches for information related to `query` and displays `topn` results.\n" }}
161
+ {{- "type search = (_: {\n" }}
162
+ {{- "query: string,\n" }}
163
+ {{- "topn?: number, // default: 10\n" }}
164
+ {{- "source?: string,\n" }}
165
+ {{- "}) => any;\n\n" }}
166
+ {{- "// Opens the link `id` from the page indicated by `cursor` starting at line number `loc`, showing `num_lines` lines.\n" }}
167
+ {{- "// Valid link ids are displayed with the formatting: `【{id}†.*】`.\n" }}
168
+ {{- "// If `cursor` is not provided, the most recent page is implied.\n" }}
169
+ {{- "// If `id` is a string, it is treated as a fully qualified URL associated with `source`.\n" }}
170
+ {{- "// If `loc` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.\n" }}
171
+ {{- "// Use this function without `id` to scroll to a new location of an opened page.\n" }}
172
+ {{- "type open = (_: {\n" }}
173
+ {{- "id?: number | string, // default: -1\n" }}
174
+ {{- "cursor?: number, // default: -1\n" }}
175
+ {{- "loc?: number, // default: -1\n" }}
176
+ {{- "num_lines?: number, // default: -1\n" }}
177
+ {{- "view_source?: boolean, // default: false\n" }}
178
+ {{- "source?: string,\n" }}
179
+ {{- "}) => any;\n\n" }}
180
+ {{- "// Finds exact matches of `pattern` in the current page, or the page given by `cursor`.\n" }}
181
+ {{- "type find = (_: {\n" }}
182
+ {{- "pattern: string,\n" }}
183
+ {{- "cursor?: number, // default: -1\n" }}
184
+ {{- "}) => any;\n\n" }}
185
+ {{- "} // namespace browser\n\n" }}
186
+ {%- endif -%}
187
+
188
+ {%- if python_tool %}
189
+ {{- "## python\n\n" }}
190
+ {{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).\n\n" }}
191
+ {{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.\n\n" }}
192
+ {%- endif -%}
193
+ {%- endmacro -%}
194
+
195
+ {#- System Message Construction ============================================ #}
196
+ {%- macro build_system_message() -%}
197
+ {%- if model_identity is not defined %}
198
+ {%- set model_identity = "You are ChatGPT, a large language model trained by OpenAI." %}
199
+ {%- endif %}
200
+ {{- model_identity + "\n" }}
201
+ {{- "Knowledge cutoff: 2024-06\n" }}
202
+ {{- "Current date: " + strftime_now("%Y-%m-%d") + "\n\n" }}
203
+ {%- if reasoning_effort is not defined %}
204
+ {%- set reasoning_effort = "medium" %}
205
+ {%- endif %}
206
+ {{- "Reasoning: " + reasoning_effort + "\n\n" }}
207
+ {%- if builtin_tools %}
208
+ {{- "# Tools\n\n" }}
209
+ {%- set available_builtin_tools = namespace(browser=false, python=false) %}
210
+ {%- for tool in builtin_tools %}
211
+ {%- if tool == "browser" %}
212
+ {%- set available_builtin_tools.browser = true %}
213
+ {%- elif tool == "python" %}
214
+ {%- set available_builtin_tools.python = true %}
215
+ {%- endif %}
216
+ {%- endfor %}
217
+ {{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }}
218
+ {%- endif -%}
219
+ {{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }}
220
+ {%- if tools -%}
221
+ {{- "\nCalls to these tools must go to the commentary channel: 'functions'." }}
222
+ {%- endif -%}
223
+ {%- endmacro -%}
224
+
225
+ {#- Main Template Logic ================================================= #}
226
+ {#- Set defaults #}
227
+
228
+ {#- Render system message #}
229
+ {{- "<|start|>system<|message|>" }}
230
+ {{- build_system_message() }}
231
+ {{- "<|end|>" }}
232
+
233
+ {#- Extract developer message #}
234
+ {%- if messages[0].role == "developer" or messages[0].role == "system" %}
235
+ {%- set developer_message = messages[0].content %}
236
+ {%- set loop_messages = messages[1:] %}
237
+ {%- else %}
238
+ {%- set developer_message = "" %}
239
+ {%- set loop_messages = messages %}
240
+ {%- endif %}
241
+
242
+ {#- Render developer message #}
243
+ {%- if developer_message or tools %}
244
+ {{- "<|start|>developer<|message|>" }}
245
+ {%- if developer_message %}
246
+ {{- "# Instructions\n\n" }}
247
+ {{- developer_message }}
248
+ {{- "\n\n" }}
249
+ {%- endif %}
250
+ {%- if tools -%}
251
+ {{- "# Tools\n\n" }}
252
+ {{- render_tool_namespace("functions", tools) }}
253
+ {%- endif -%}
254
+ {{- "<|end|>" }}
255
+ {%- endif %}
256
+
257
+ {#- Render messages #}
258
+ {%- set last_tool_call = namespace(name=none) %}
259
+ {%- for message in loop_messages -%}
260
+ {#- At this point only assistant/user/tool messages should remain #}
261
+ {%- if message.role == 'assistant' -%}
262
+ {#- Checks to ensure the messages are being passed in the format we expect #}
263
+ {%- if "content" in message %}
264
+ {%- if "<|channel|>analysis<|message|>" in message.content or "<|channel|>final<|message|>" in message.content %}
265
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the content field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
266
+ {%- endif %}
267
+ {%- endif %}
268
+ {%- if "thinking" in message %}
269
+ {%- if "<|channel|>analysis<|message|>" in message.thinking or "<|channel|>final<|message|>" in message.thinking %}
270
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
271
+ {%- endif %}
272
+ {%- endif %}
273
+ {%- if "tool_calls" in message %}
274
+ {#- We need very careful handling here - we want to drop the tool call analysis message if the model #}
275
+ {#- has output a later <|final|> message, but otherwise we want to retain it. This is the only case #}
276
+ {#- when we render CoT/analysis messages in inference. #}
277
+ {%- set future_final_message = namespace(found=false) %}
278
+ {%- for future_message in loop_messages[loop.index:] %}
279
+ {%- if future_message.role == 'assistant' and "tool_calls" not in future_message %}
280
+ {%- set future_final_message.found = true %}
281
+ {%- endif %}
282
+ {%- endfor %}
283
+ {#- We assume max 1 tool call per message, and so we infer the tool call name #}
284
+ {#- in "tool" messages from the most recent assistant tool call name #}
285
+ {%- set tool_call = message.tool_calls[0] %}
286
+ {%- if tool_call.function %}
287
+ {%- set tool_call = tool_call.function %}
288
+ {%- endif %}
289
+ {%- if message.content and message.thinking %}
290
+ {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
291
+ {%- elif message.content and not future_final_message.found %}
292
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
293
+ {%- elif message.thinking and not future_final_message.found %}
294
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
295
+ {%- endif %}
296
+ {{- "<|start|>assistant to=" }}
297
+ {{- "functions." + tool_call.name + "<|channel|>commentary " }}
298
+ {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
299
+ {{- tool_call.arguments|tojson }}
300
+ {{- "<|call|>" }}
301
+ {%- set last_tool_call.name = tool_call.name %}
302
+ {%- elif loop.last and not add_generation_prompt %}
303
+ {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
304
+ {#- This is a situation that should only occur in training, never in inference. #}
305
+ {%- if "thinking" in message %}
306
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
307
+ {%- endif %}
308
+ {#- <|return|> indicates the end of generation, but <|end|> does not #}
309
+ {#- <|return|> should never be an input to the model, but we include it as the final token #}
310
+ {#- when training, so the model learns to emit it. #}
311
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
312
+ {%- else %}
313
+ {#- CoT is dropped during all previous turns, so we never render it for inference #}
314
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
315
+ {%- set last_tool_call.name = none %}
316
+ {%- endif %}
317
+ {%- elif message.role == 'tool' -%}
318
+ {%- if last_tool_call.name is none %}
319
+ {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
320
+ {%- endif %}
321
+ {{- "<|start|>functions." + last_tool_call.name }}
322
+ {{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }}
323
+ {%- elif message.role == 'user' -%}
324
+ {{- "<|start|>user<|message|>" + message.content + "<|end|>" }}
325
+ {%- endif -%}
326
+ {%- endfor -%}
327
+
328
+ {#- Generation prompt #}
329
+ {%- if add_generation_prompt -%}
330
+ <|start|>assistant
331
+ {%- endif -%}
hallucination_detection/gpt-oss-20b/lora/io.yaml ADDED
@@ -0,0 +1,81 @@
1
+ # Model name string, or null to use whatever is provided in the chat completion request
2
+ model: ~
3
+ # JSON schema of the model's output
4
+ response_format: |
5
+ {
6
+ "$defs": {
7
+ "HallucinationOutputEntry": {
8
+ "properties": {
9
+ "r": {
10
+ "minimum": 0,
11
+ "title": "Sentence Num",
12
+ "type": "integer"
13
+ },
14
+ "f": {
15
+ "title": "Is Faithful",
16
+ "type": "string",
17
+ "enum": ["faithful", "partial", "unfaithful"]
18
+ },
19
+ "e": {
20
+ "title": "Reasoning",
21
+ "type": "string"
22
+ }
23
+ },
24
+ "required": [
25
+ "r",
26
+ "e",
27
+ "f"
28
+ ],
29
+ "title": "HallucinationOutputEntry",
30
+ "type": "object"
31
+ }
32
+ },
33
+ "items": {
34
+ "$ref": "#/$defs/HallucinationOutputEntry"
35
+ },
36
+ "title": "HallucinationOutput",
37
+ "type": "array"
38
+ }
39
+ transformations:
40
+ # Use logprobs to replace "f" flag with a probability
41
+ - type: likelihood
42
+ categories_to_values:
43
+ "faithful": 1.0
44
+ "partial": 0.5
45
+ "unfaithful": 0.0
46
+ input_path: [~, "f"] # Null in path means wildcard
47
+ # Replace sentence number with sentence location and contents
48
+ - type: decode_sentences
49
+ source: "last_message"
50
+ input_path: [~, "r"] # Null in path means wildcard
51
+ # New fields to add for each sentence
52
+ output_names:
53
+ begin: "response_begin"
54
+ end: "response_end"
55
+ text: "response_text"
56
+ # Remove fields that we no longer need and rename some of the fields.
57
+ - type: project
58
+ input_path: []
59
+ retained_fields:
60
+ "response_begin": "response_begin"
61
+ "response_end": "response_end"
62
+ "response_text": "response_text"
63
+ "f": "faithfulness_likelihood"
64
+ "e": "explanation"
65
+ instruction: >
66
+ Split the last assistant response into individual sentences.
67
+ For each sentence in the last assistant response, identify the faithfulness
68
+ by comparing with the provided documents and generate the faithfulness reasoning
69
+ and faithfulness decision.
70
+ Ensure that your output includes all response sentence IDs,
71
+ and for each response sentence ID, provide the corresponding faithfulness
72
+ reasoning and faithfulness decision.
73
+ The output must be a json structure.
74
+ parameters:
75
+ # Current LoRA can be quite verbose in its explanations.
76
+ max_completion_tokens: 4096
77
+ sentence_boundaries:
78
+ last_message: "i"
79
+
80
+ # gpt-oss base model has no "documents" argument
81
+ docs_as_message: json
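The `likelihood` transformation declared in the io.yaml above replaces the categorical `f` flag with a scalar using the `categories_to_values` mapping and the model's logprobs. A minimal sketch of that mapping, assuming access to per-category log-probabilities (the logprob values below are made up for illustration):

```python
import math

# Mirrors categories_to_values from the io.yaml above.
categories_to_values = {"faithful": 1.0, "partial": 0.5, "unfaithful": 0.0}

def faithfulness_likelihood(logprobs):
    # Normalize the probability mass over the three category labels, then
    # take the expectation of the mapped values.
    probs = {c: math.exp(lp) for c, lp in logprobs.items()}
    total = sum(probs.values())
    return sum(categories_to_values[c] * p / total for c, p in probs.items())

# Mostly-confident "faithful" prediction yields a likelihood close to 1.0.
print(faithfulness_likelihood(
    {"faithful": -0.1, "partial": -3.0, "unfaithful": -5.0}))
```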
hallucination_detection/gpt-oss-20b/lora/special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|startoftext|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|return|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
hallucination_detection/gpt-oss-20b/lora/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0614fe83cadab421296e664e1f48f4261fa8fef6e03e63bb75c20f38e37d07d3
3
+ size 27868174
hallucination_detection/gpt-oss-20b/lora/tokenizer_config.json ADDED
@@ -0,0 +1,185 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "199998": {
4
+ "content": "<|startoftext|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "199999": {
12
+ "content": "<|endoftext|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "200000": {
20
+ "content": "<|reserved_200000|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "200001": {
28
+ "content": "<|reserved_200001|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "200002": {
36
+ "content": "<|return|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "200003": {
44
+ "content": "<|constrain|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "200004": {
52
+ "content": "<|reserved_200004|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "200005": {
60
+ "content": "<|channel|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "200006": {
68
+ "content": "<|start|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "200007": {
76
+ "content": "<|end|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "200008": {
84
+ "content": "<|message|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "200009": {
92
+ "content": "<|reserved_200009|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "200010": {
100
+ "content": "<|reserved_200010|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "200011": {
108
+ "content": "<|reserved_200011|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "200012": {
116
+ "content": "<|call|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "200013": {
124
+ "content": "<|reserved_200013|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "200014": {
132
+ "content": "<|reserved_200014|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "200015": {
140
+ "content": "<|reserved_200015|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "200016": {
148
+ "content": "<|reserved_200016|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "200017": {
156
+ "content": "<|reserved_200017|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "200018": {
164
+ "content": "<|endofprompt|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ }
171
+ },
172
+ "bos_token": "<|startoftext|>",
173
+ "clean_up_tokenization_spaces": false,
174
+ "eos_token": "<|return|>",
175
+ "extra_special_tokens": {},
176
+ "model_input_names": [
177
+ "input_ids",
178
+ "attention_mask"
179
+ ],
180
+ "model_max_length": 1000000000000000019884624838656,
181
+ "pad_token": "<|endoftext|>",
182
+ "padding_side": "right",
183
+ "split_special_tokens": false,
184
+ "tokenizer_class": "PreTrainedTokenizerFast"
185
+ }
query_rewrite/README.md ADDED
The diff for this file is too large to render. See raw diff
 
query_rewrite/gpt-oss-20b/lora/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "openai/gpt-oss-20b",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "k_proj",
+ "v_proj"
+ ],
+ "target_parameters": [
+ "7.mlp.experts.gate_up_proj",
+ "7.mlp.experts.down_proj",
+ "15.mlp.experts.gate_up_proj",
+ "15.mlp.experts.down_proj",
+ "23.mlp.experts.gate_up_proj",
+ "23.mlp.experts.down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
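The adapter config above targets the attention projections in every layer plus the MoE expert weights of three specific layers. As a minimal sketch (not part of the repository, and the inline JSON below is a hand-copied excerpt of the config, not the full file), the expert-layer indices and the effective LoRA scaling can be read straight out of the config:

```python
import json

# Hand-copied excerpt of adapter_config.json shown above.
config = json.loads("""
{
  "r": 32,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "k_proj", "v_proj"],
  "target_parameters": [
    "7.mlp.experts.gate_up_proj",
    "7.mlp.experts.down_proj",
    "15.mlp.experts.gate_up_proj",
    "15.mlp.experts.down_proj",
    "23.mlp.experts.gate_up_proj",
    "23.mlp.experts.down_proj"
  ]
}
""")

# Each target_parameters entry is "<layer index>.<parameter path>", so the
# set of adapted expert layers is the set of integer prefixes.
expert_layers = sorted({int(p.split(".")[0]) for p in config["target_parameters"]})
print(expert_layers)  # -> [7, 15, 23]

# Standard (non-rsLoRA) scaling applied to the LoRA update is alpha / r.
print(config["lora_alpha"] / config["r"])  # -> 1.0
```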
query_rewrite/gpt-oss-20b/lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa15f38f74bc8f34b42bd252cc9b557455bcb33370bec17ea3cd38305d6acc0b
+ size 219238968
query_rewrite/gpt-oss-20b/lora/chat_template.jinja ADDED
@@ -0,0 +1,397 @@
+ {#-
+ In addition to the normal inputs of `messages` and `tools`, this template also accepts the
+ following kwargs:
+ - "builtin_tools": A list, can contain "browser" and/or "python".
+ - "model_identity": A string that optionally describes the model identity.
+ - "reasoning_effort": A string that describes the reasoning effort, defaults to "medium".
+ #}
+
+ {#- Tool Definition Rendering ============================================== #}
+ {%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%}
+ {%- if param_spec.type == "array" -%}
+ {%- if param_spec['items'] -%}
+ {%- if param_spec['items']['type'] == "string" -%}
+ {{- "string[]" }}
+ {%- elif param_spec['items']['type'] == "number" -%}
+ {{- "number[]" }}
+ {%- elif param_spec['items']['type'] == "integer" -%}
+ {{- "number[]" }}
+ {%- elif param_spec['items']['type'] == "boolean" -%}
+ {{- "boolean[]" }}
+ {%- else -%}
+ {%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%}
+ {%- if inner_type == "object | object" or inner_type|length > 50 -%}
+ {{- "any[]" }}
+ {%- else -%}
+ {{- inner_type + "[]" }}
+ {%- endif -%}
+ {%- endif -%}
+ {%- if param_spec.nullable -%}
+ {{- " | null" }}
+ {%- endif -%}
+ {%- else -%}
+ {{- "any[]" }}
+ {%- if param_spec.nullable -%}
+ {{- " | null" }}
+ {%- endif -%}
+ {%- endif -%}
+ {%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%}
+ {#- Handle array of types like ["object", "object"] from Union[dict, list] #}
+ {%- if param_spec.type | length > 1 -%}
+ {{- param_spec.type | join(" | ") }}
+ {%- else -%}
+ {{- param_spec.type[0] }}
+ {%- endif -%}
+ {%- elif param_spec.oneOf -%}
+ {#- Handle oneOf schemas - check for complex unions and fallback to any #}
+ {%- set has_object_variants = false -%}
+ {%- for variant in param_spec.oneOf -%}
+ {%- if variant.type == "object" -%}
+ {%- set has_object_variants = true -%}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if has_object_variants and param_spec.oneOf|length > 1 -%}
+ {{- "any" }}
+ {%- else -%}
+ {%- for variant in param_spec.oneOf -%}
+ {{- render_typescript_type(variant, required_params) -}}
+ {%- if variant.description %}
+ {{- "// " + variant.description }}
+ {%- endif -%}
+ {%- if variant.default is defined %}
+ {{ "// default: " + variant.default|tojson }}
+ {%- endif -%}
+ {%- if not loop.last %}
+ {{- " | " }}
+ {% endif -%}
+ {%- endfor -%}
+ {%- endif -%}
+ {%- elif param_spec.type == "string" -%}
+ {%- if param_spec.enum -%}
+ {{- '"' + param_spec.enum|join('" | "') + '"' -}}
+ {%- else -%}
+ {{- "string" }}
+ {%- if param_spec.nullable %}
+ {{- " | null" }}
+ {%- endif -%}
+ {%- endif -%}
+ {%- elif param_spec.type == "number" -%}
+ {{- "number" }}
+ {%- elif param_spec.type == "integer" -%}
+ {{- "number" }}
+ {%- elif param_spec.type == "boolean" -%}
+ {{- "boolean" }}
+
+ {%- elif param_spec.type == "object" -%}
+ {%- if param_spec.properties -%}
+ {{- "{
+ " }}
+ {%- for prop_name, prop_spec in param_spec.properties.items() -%}
+ {{- prop_name -}}
+ {%- if prop_name not in (param_spec.required or []) -%}
+ {{- "?" }}
+ {%- endif -%}
+ {{- ": " }}
+ {{ render_typescript_type(prop_spec, param_spec.required or []) }}
+ {%- if not loop.last -%}
+ {{-", " }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- "}" }}
+ {%- else -%}
+ {{- "object" }}
+ {%- endif -%}
+ {%- else -%}
+ {{- "any" }}
+ {%- endif -%}
+ {%- endmacro -%}
+
+ {%- macro render_tool_namespace(namespace_name, tools) -%}
+ {{- "## " + namespace_name + "
+
+ " }}
+ {{- "namespace " + namespace_name + " {
+
+ " }}
+ {%- for tool in tools %}
+ {%- set tool = tool.function %}
+ {{- "// " + tool.description + "
+ " }}
+ {{- "type "+ tool.name + " = " }}
+ {%- if tool.parameters and tool.parameters.properties %}
+ {{- "(_: {
+ " }}
+ {%- for param_name, param_spec in tool.parameters.properties.items() %}
+ {%- if param_spec.description %}
+ {{- "// " + param_spec.description + "
+ " }}
+ {%- endif %}
+ {{- param_name }}
+ {%- if param_name not in (tool.parameters.required or []) -%}
+ {{- "?" }}
+ {%- endif -%}
+ {{- ": " }}
+ {{- render_typescript_type(param_spec, tool.parameters.required or []) }}
+ {%- if param_spec.default is defined -%}
+ {%- if param_spec.enum %}
+ {{- ", // default: " + param_spec.default }}
+ {%- elif param_spec.oneOf %}
+ {{- "// default: " + param_spec.default }}
+ {%- else %}
+ {{- ", // default: " + param_spec.default|tojson }}
+ {%- endif -%}
+ {%- endif -%}
+ {%- if not loop.last %}
+ {{- ",
+ " }}
+ {%- else %}
+ {{- "
+ " }}
+ {%- endif -%}
+ {%- endfor %}
+ {{- "}) => any;
+
+ " }}
+ {%- else -%}
+ {{- "() => any;
+
+ " }}
+ {%- endif -%}
+ {%- endfor %}
+ {{- "} // namespace " + namespace_name }}
+ {%- endmacro -%}
+
+ {%- macro render_builtin_tools(browser_tool, python_tool) -%}
+ {%- if browser_tool %}
+ {{- "## browser
+
+ " }}
+ {{- "// Tool for browsing.
+ " }}
+ {{- "// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.
+ " }}
+ {{- "// Cite information from the tool using the following format:
+ " }}
+ {{- "// `【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`.
+ " }}
+ {{- "// Do not quote more than 10 words directly from the tool output.
+ " }}
+ {{- "// sources=web (default: web)
+ " }}
+ {{- "namespace browser {
+
+ " }}
+ {{- "// Searches for information related to `query` and displays `topn` results.
+ " }}
+ {{- "type search = (_: {
+ " }}
+ {{- "query: string,
+ " }}
+ {{- "topn?: number, // default: 10
+ " }}
+ {{- "source?: string,
+ " }}
+ {{- "}) => any;
+
+ " }}
+ {{- "// Opens the link `id` from the page indicated by `cursor` starting at line number `loc`, showing `num_lines` lines.
+ " }}
+ {{- "// Valid link ids are displayed with the formatting: `【{id}†.*】`.
+ " }}
+ {{- "// If `cursor` is not provided, the most recent page is implied.
+ " }}
+ {{- "// If `id` is a string, it is treated as a fully qualified URL associated with `source`.
+ " }}
+ {{- "// If `loc` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.
+ " }}
+ {{- "// Use this function without `id` to scroll to a new location of an opened page.
+ " }}
+ {{- "type open = (_: {
+ " }}
+ {{- "id?: number | string, // default: -1
+ " }}
+ {{- "cursor?: number, // default: -1
+ " }}
+ {{- "loc?: number, // default: -1
+ " }}
+ {{- "num_lines?: number, // default: -1
+ " }}
+ {{- "view_source?: boolean, // default: false
+ " }}
+ {{- "source?: string,
+ " }}
+ {{- "}) => any;
+
+ " }}
+ {{- "// Finds exact matches of `pattern` in the current page, or the page given by `cursor`.
+ " }}
+ {{- "type find = (_: {
+ " }}
+ {{- "pattern: string,
+ " }}
+ {{- "cursor?: number, // default: -1
+ " }}
+ {{- "}) => any;
+
+ " }}
+ {{- "} // namespace browser
+
+ " }}
+ {%- endif -%}
+
+ {%- if python_tool %}
+ {{- "## python
+
+ " }}
+ {{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).
+
+ " }}
+ {{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.
+
+ " }}
+ {%- endif -%}
+ {%- endmacro -%}
+
+ {#- System Message Construction ============================================ #}
+ {%- macro build_system_message() -%}
+ {%- if model_identity is not defined %}
+ {%- set model_identity = "You are ChatGPT, a large language model trained by OpenAI." %}
+ {%- endif %}
+ {{- model_identity + "
+ " }}
+ {{- "Knowledge cutoff: 2024-06
+ " }}
+ {{- "Current date: " + strftime_now("%Y-%m-%d") + "
+
+ " }}
+ {%- if reasoning_effort is not defined %}
+ {%- set reasoning_effort = "medium" %}
+ {%- endif %}
+ {{- "Reasoning: " + reasoning_effort + "
+
+ " }}
+ {%- if builtin_tools %}
+ {{- "# Tools
+
+ " }}
+ {%- set available_builtin_tools = namespace(browser=false, python=false) %}
+ {%- for tool in builtin_tools %}
+ {%- if tool == "browser" %}
+ {%- set available_builtin_tools.browser = true %}
+ {%- elif tool == "python" %}
+ {%- set available_builtin_tools.python = true %}
+ {%- endif %}
+ {%- endfor %}
+ {{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }}
+ {%- endif -%}
+ {{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }}
+ {%- if tools -%}
+ {{- "
+ Calls to these tools must go to the commentary channel: 'functions'." }}
+ {%- endif -%}
+ {%- endmacro -%}
+
+ {#- Main Template Logic ================================================= #}
+ {#- Set defaults #}
+
+ {#- Render system message #}
+ {{- "<|start|>system<|message|>" }}
+ {{- build_system_message() }}
+ {{- "<|end|>" }}
+
+ {#- Extract developer message #}
+ {%- if messages[0].role == "developer" or messages[0].role == "system" %}
+ {%- set developer_message = messages[0].content %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set developer_message = "" %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+
+ {#- Render developer message #}
+ {%- if developer_message or tools %}
+ {{- "<|start|>developer<|message|>" }}
+ {%- if developer_message %}
+ {{- "# Instructions
+
+ " }}
+ {{- developer_message }}
+ {%- endif %}
+ {%- if tools -%}
+ {{- "
+
+ " }}
+ {{- "# Tools
+
+ " }}
+ {{- render_tool_namespace("functions", tools) }}
+ {%- endif -%}
+ {{- "<|end|>" }}
+ {%- endif %}
+
+ {#- Render messages #}
+ {%- set last_tool_call = namespace(name=none) %}
+ {%- for message in loop_messages -%}
+ {#- At this point only assistant/user/tool messages should remain #}
+ {%- if message.role == 'assistant' -%}
+ {#- Checks to ensure the messages are being passed in the format we expect #}
+ {%- if "content" in message %}
+ {%- if "<|channel|>analysis<|message|>" in message.content or "<|channel|>final<|message|>" in message.content %}
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the content field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
+ {%- endif %}
+ {%- endif %}
+ {%- if "thinking" in message %}
+ {%- if "<|channel|>analysis<|message|>" in message.thinking or "<|channel|>final<|message|>" in message.thinking %}
+ {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
+ {%- endif %}
+ {%- endif %}
+ {%- if "tool_calls" in message %}
+ {#- We assume max 1 tool call per message, and so we infer the tool call name #}
+ {#- in "tool" messages from the most recent assistant tool call name #}
+ {%- set tool_call = message.tool_calls[0] %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if message.content and message.thinking %}
+ {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
+ {%- elif message.content %}
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
+ {%- elif message.thinking %}
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
+ {%- endif %}
+ {{- "<|start|>assistant to=" }}
+ {{- "functions." + tool_call.name + "<|channel|>commentary " }}
+ {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
+ {{- tool_call.arguments|tojson }}
+ {{- "<|call|>" }}
+ {%- set last_tool_call.name = tool_call.name %}
+ {%- elif loop.last and not add_generation_prompt %}
+ {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
+ {#- This is a situation that should only occur in training, never in inference. #}
+ {%- if "thinking" in message %}
+ {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
+ {%- endif %}
+ {#- <|return|> indicates the end of generation, but <|end|> does not #}
+ {#- <|return|> should never be an input to the model, but we include it as the final token #}
+ {#- when training, so the model learns to emit it. #}
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
+ {%- else %}
+ {#- CoT is dropped during all previous turns, so we never render it for inference #}
+ {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
+ {%- set last_tool_call.name = none %}
+ {%- endif %}
+ {%- elif message.role == 'tool' -%}
+ {%- if last_tool_call.name is none %}
+ {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
+ {%- endif %}
+ {{- "<|start|>functions." + last_tool_call.name }}
+ {{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }}
+ {%- elif message.role == 'user' -%}
+ {{- "<|start|>user<|message|>" + message.content + "<|end|>" }}
+ {%- endif -%}
+ {%- endfor -%}
+
+ {#- Generation prompt #}
+ {%- if add_generation_prompt -%}
+ <|start|>assistant
+ {%- endif -%}
query_rewrite/gpt-oss-20b/lora/io.yaml ADDED
@@ -0,0 +1,22 @@
+ # Model name string, or null to use whatever is provided in the chat completion request
+ model: ~
+ # JSON schema of the model's output
+ response_format: |
+ {
+ "properties": {
+ "rewritten_question": {
+ "title": "Rewritten Question",
+ "type": "string"
+ }
+ },
+ "required": [
+ "rewritten_question"
+ ],
+ "title": "QueryRewriteOutput",
+ "type": "object"
+ }
+ transformations: ~
+ instruction: ~
+ parameters:
+ max_completion_tokens: 1024
+ sentence_boundaries: false
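The `response_format` schema above constrains the query_rewrite adapter to emit a JSON object with a single `rewritten_question` string. A minimal stdlib-only sketch of checking a completion against that schema (the `sample` completion string below is hypothetical, not real model output):

```python
import json

# The response_format schema from io.yaml above.
schema = json.loads("""
{
  "properties": {
    "rewritten_question": {"title": "Rewritten Question", "type": "string"}
  },
  "required": ["rewritten_question"],
  "title": "QueryRewriteOutput",
  "type": "object"
}
""")

def check_output(raw: str) -> str:
    """Parse a completion and verify the required string fields are present."""
    obj = json.loads(raw)
    for key in schema["required"]:
        if key not in obj:
            raise ValueError(f"missing required field: {key}")
        if schema["properties"][key]["type"] == "string" and not isinstance(obj[key], str):
            raise TypeError(f"{key} must be a string")
    return obj["rewritten_question"]

# Hypothetical completion from the query_rewrite adapter:
sample = '{"rewritten_question": "What is the capital of France?"}'
print(check_output(sample))  # -> What is the capital of France?
```

In practice a full validator such as `jsonschema` would cover the general case; this sketch only handles the flat, required-string shape used here.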
query_rewrite/gpt-oss-20b/lora/special_tokens_map.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|return|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|return|>"
+ }
query_rewrite/gpt-oss-20b/lora/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0614fe83cadab421296e664e1f48f4261fa8fef6e03e63bb75c20f38e37d07d3
+ size 27868174
query_rewrite/gpt-oss-20b/lora/tokenizer_config.json ADDED
@@ -0,0 +1,184 @@
+ {
+ "added_tokens_decoder": {
+ "199998": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "199999": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200000": {
+ "content": "<|reserved_200000|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200001": {
+ "content": "<|reserved_200001|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200002": {
+ "content": "<|return|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200003": {
+ "content": "<|constrain|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200004": {
+ "content": "<|reserved_200004|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200005": {
+ "content": "<|channel|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200006": {
+ "content": "<|start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200007": {
+ "content": "<|end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200008": {
+ "content": "<|message|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200009": {
+ "content": "<|reserved_200009|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200010": {
+ "content": "<|reserved_200010|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200011": {
+ "content": "<|reserved_200011|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200012": {
+ "content": "<|call|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200013": {
+ "content": "<|reserved_200013|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200014": {
+ "content": "<|reserved_200014|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200015": {
+ "content": "<|reserved_200015|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200016": {
+ "content": "<|reserved_200016|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200017": {
+ "content": "<|reserved_200017|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "200018": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|startoftext|>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|return|>",
+ "extra_special_tokens": {},
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|return|>",
+ "padding_side": "left",
+ "tokenizer_class": "PreTrainedTokenizerFast"
+ }
run_vllm.sh ADDED
@@ -0,0 +1,45 @@
+ #! /bin/bash
+
+ ################################################################################
+ # Shell script that starts a copy of vLLM with a base model plus all the
+ # available LoRA adapters in this repository.
+ #
+ # To run this script:
+ # 1. Install an appropriate build of vLLM for your machine (`pip install vllm`)
+ # 2. Install the Hugging Face CLI (`pip install -U "huggingface_hub[cli]"`)
+ # 3. Download the intrinsics library by running:
+ # hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
+ # 4. Edit the constants BASE_MODEL_NAME, BASE_MODEL_ORG, and PORT as needed
+ # 5. Run this script from the root of your local copy of rag-intrinsics-lib.
+ ################################################################################
+
+ BASE_MODEL_NAME=gpt-oss-20b
+ BASE_MODEL_ORG=openai
+ PORT=55555
+
+ export VLLM_API_KEY=rag_intrinsics_1234
+
+ # Find all LoRA adapters for the target base model.
+ LORAS=""
+ for item in "."/*; do
+ # Remove the "./"
+ name=$(basename -- "${item}")
+ if [ -d "./${name}/${BASE_MODEL_NAME}/lora" ]; then
+ LORAS+="${name}=./${name}/${BASE_MODEL_NAME}/lora "
+ fi
+ done
+
+
+ CMD="vllm serve ${BASE_MODEL_ORG}/${BASE_MODEL_NAME} \
+ --port ${PORT} \
+ --gpu-memory-utilization 0.45 \
+ --max-model-len 8192 \
+ --enable-lora \
+ --max_lora_rank 64 \
+ --lora-modules $LORAS"
+
+ echo $CMD
+ $CMD
+
+
+