Add files using upload-large-folder tool
- .gitattributes +1 -0
- README.md +6 -0
- plots/eval_loss_all_folds.png +3 -0
- plots/loss_curves_fold_1.png +0 -0
- plots/token_accuracy_curves_fold_1.png +0 -0
.gitattributes
CHANGED
```diff
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+plots/eval_loss_all_folds.png filter=lfs diff=lfs merge=lfs -text
```
README.md
CHANGED
```diff
@@ -79,14 +79,20 @@ The model was fine-tuned using **QLoRA** (4-bit quantization with LoRA) and then
 
 ### Evaluation Loss — All Folds
 
+![](plots/eval_loss_all_folds.png)
+
 All five folds show consistent and monotonically decreasing evaluation loss throughout training. By step 200, every fold converges to a final eval loss in the range of **0.19–0.24**, demonstrating stable learning without signs of overfitting across different data splits.
 
 ### Loss Curves — Fold 1
 
+![](plots/loss_curves_fold_1.png)
+
 For Fold 1 (the best-performing fold), training loss drops steeply from ~2.4 at initialization and quickly converges near the evaluation loss by step 10. Both train and eval loss then decrease together steadily, with no divergence — indicating no overfitting.
 
 ### Token Accuracy — Fold 1
 
+![](plots/token_accuracy_curves_fold_1.png)
+
 Token-level accuracy for Fold 1 climbs from ~0.51 at the start to **~0.94** by the final step. Train and eval accuracy track each other closely throughout, with eval accuracy slightly above train accuracy in the later steps.
 
 ## Usage
```
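As an aside on the "token accuracy" metric described in the README diff: the repository does not show how it is computed, but a common definition is the fraction of non-masked target tokens whose argmax prediction matches the label, with padding/prompt positions excluded via an ignore index (HF-style `-100`). A minimal sketch under that assumption:

```python
import numpy as np

def token_accuracy(logits, labels, ignore_index=-100):
    """Fraction of non-ignored tokens whose argmax prediction matches the label.

    logits: (batch, seq_len, vocab) array of model outputs.
    labels: (batch, seq_len) target token ids; positions equal to
            ignore_index (padding / prompt tokens) are excluded.
    """
    preds = logits.argmax(axis=-1)          # predicted token id per position
    mask = labels != ignore_index           # keep only scored positions
    return (preds[mask] == labels[mask]).mean()

# Tiny worked example: 3 of the 4 unmasked tokens are predicted correctly.
logits = np.zeros((1, 5, 4))
logits[0, np.arange(5), [1, 2, 3, 0, 1]] = 1.0   # argmax per position
labels = np.array([[1, 2, 0, 0, -100]])          # last position masked out
print(token_accuracy(logits, labels))            # 0.75
```

This is a sketch of the conventional metric, not necessarily the exact computation used for the plotted curves.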
plots/eval_loss_all_folds.png
ADDED (Git LFS)

plots/loss_curves_fold_1.png
ADDED

plots/token_accuracy_curves_fold_1.png
ADDED
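The per-fold eval-loss values behind a plot like `eval_loss_all_folds.png` are typically read from each run's `trainer_state.json`, which the Hugging Face `Trainer` writes with an `eval_loss` entry in `log_history` at every evaluation step. A hedged sketch of extracting one fold's curve (the file name and layout below follow the standard `Trainer` output, not anything shown in this commit):

```python
import json
from pathlib import Path

def eval_loss_curve(trainer_state_path):
    """Return (steps, eval_losses) recorded in a Trainer's trainer_state.json."""
    state = json.loads(Path(trainer_state_path).read_text())
    points = [(entry["step"], entry["eval_loss"])
              for entry in state["log_history"] if "eval_loss" in entry]
    steps, losses = zip(*points)
    return list(steps), list(losses)

# Demo with a minimal fake log (a real file is written by Trainer automatically).
fake = {"log_history": [
    {"step": 10, "loss": 2.4},        # training-loss entries lack "eval_loss"
    {"step": 10, "eval_loss": 1.1},
    {"step": 20, "eval_loss": 0.9},
]}
Path("trainer_state.json").write_text(json.dumps(fake))
print(eval_loss_curve("trainer_state.json"))  # ([10, 20], [1.1, 0.9])
```

Running this per fold and plotting each returned curve on one axis reproduces the shape of the "Evaluation Loss — All Folds" figure.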