Lower acceptance rate on tool-calling prompts compared to EAGLE-3

#6
by laixiaohang - opened

Hi, I've tested DFlash on my own dataset and found its acceptance rate to be comparable to, or slightly lower than, EAGLE-3's. My prompts are mainly tool-calling / function-calling related.
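
For reference, by acceptance rate I mean the fraction of drafted tokens the target model accepts per decode step. A minimal sketch of how that can be computed; the per-step counters here are hypothetical stand-ins for whatever the serving framework actually reports:

```python
# Minimal sketch: speculative-decoding acceptance metrics from
# per-step counts of proposed vs. accepted draft tokens.
# The input format is hypothetical; real frameworks expose these
# counters under framework-specific names.

def acceptance_stats(steps):
    """steps: list of (proposed, accepted) draft-token counts per decode step."""
    proposed = sum(p for p, _ in steps)
    accepted = sum(a for _, a in steps)
    rate = accepted / proposed if proposed else 0.0      # token-level acceptance rate
    mean_len = accepted / len(steps) if steps else 0.0   # mean accepted tokens per step
    return rate, mean_len

# Example: 3 decode steps, drafting 4 tokens each
rate, mean_len = acceptance_stats([(4, 3), (4, 1), (4, 2)])
print(f"acceptance rate: {rate:.2f}, mean accepted length: {mean_len:.2f}")
```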

Is tool-calling a known weak spot for the current checkpoint? Are there plans to improve this scenario (e.g., training on agent/tool-use data)?
Thanks!

Yeah, this model wasn't trained on tool-calling data. Collecting tool-calling data is somewhat difficult and slow, since we need to run in a real environment and collect multi-round interactions. We will try to collect tool-calling and agent data for the Kimi-K2.6 training, which should help improve performance on agentic tasks.

Got it, looking forward to it!

laixiaohang changed discussion status to closed
