
Read this before making a chat model: you will pull your hair out otherwise

#7
by qikp - opened

I noticed that this model lacks a generation_config.json. Among other things, that file dictates the stop (EOS) token used during generation. For standard text completion this is usually fine, but if you fine-tune a chat model from this checkpoint, you'll notice that it never stops generating, and in some cases it may even look like it never predicts the EOS token at all. This had me freaking out for days.

The fix is really simple, though. In your training pipeline, add `model.generation_config.eos_token_id = tokenizer.eos_token_id` and generation should stop correctly.
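A minimal sketch of what is going on, using `transformers.GenerationConfig` directly (the token id 50256 is GPT-2's `<|endoftext|>`; in a real pipeline you would read it from your tokenizer rather than hard-coding it):

```python
from transformers import GenerationConfig

# When a checkpoint ships no generation_config.json, the freshly constructed
# generation config carries no eos_token_id, so generate() has no stop signal.
gen_config = GenerationConfig()
print(gen_config.eos_token_id)  # None

# The fix: copy the tokenizer's EOS id onto the model's generation config.
# Inside a training pipeline this is the one-liner from above:
#   model.generation_config.eos_token_id = tokenizer.eos_token_id
gen_config.eos_token_id = 50256  # GPT-2's <|endoftext|> id
print(gen_config.eos_token_id)  # 50256
```

After this, `model.generate()` will treat that token as a stop token instead of running until `max_new_tokens`.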

Since Cerebras explicitly mentions fine-tuning this model into a chat model as a use case, and other people could be hitting the same issue, I decided to leave this note.

