
Read this before making a chat model: you will pull your hair out otherwise

#7
by qikp - opened

I noticed that this model lacks a generation_config.json. Among other things, that file dictates the stop (EOS) token used during generation. For standard text completion this is usually fine, but if you fine-tune a chat model from this checkpoint, you'll notice that it never stops generating, and in some cases it may even look like it never predicts the EOS token at all. This had me freaking out for days.

The fix is really simple, though. In your training pipeline, add `model.generation_config.eos_token_id = tokenizer.eos_token_id` and generation should stop correctly.
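A minimal sketch of what is going on, using `transformers.GenerationConfig` directly (the token id 50256 is GPT-2's `<|endoftext|>`; in a real pipeline you would read it from your tokenizer rather than hard-coding it):

```python
from transformers import GenerationConfig

# When a checkpoint ships no generation_config.json, the freshly constructed
# generation config carries no eos_token_id, so generate() has no stop signal.
gen_config = GenerationConfig()
print(gen_config.eos_token_id)  # None

# The fix: copy the tokenizer's EOS id onto the model's generation config.
# Inside a training pipeline this is the one-liner from above:
#   model.generation_config.eos_token_id = tokenizer.eos_token_id
gen_config.eos_token_id = 50256  # GPT-2's <|endoftext|> id
print(gen_config.eos_token_id)  # 50256
```

After this, `model.generate()` will treat that token as a stop token instead of running until `max_new_tokens`.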

Since Cerebras explicitly mentions fine-tuning this model into a chat model as a use case, and other people could be hitting the same issue, I decided to leave this note.

