Glad to see
Ujjwal Tyagi
What precision are you training the model in, BF16 or FP8?
I think the most likely reason you're seeing R1 hit 100% constantly is some training/validation overlap. When you moved from the 200k set to the 500k set, a portion of the validation samples probably ended up inside the training pool, so the model is essentially seeing the answers beforehand and memorizing them.
The best way to fix it would be to rebuild the validation split from a completely separate dataset (or at least re-split the full dataset with strict deduplication, so no caption/image pairs appear in both sets). Once the validation set is clean and never seen during training, the recall numbers should drop to something more realistic and you'll get a proper measure of generalization.
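A minimal sketch of the re-split idea, assuming the data is a list of (image_id, caption) pairs (the function name and field layout here are illustrative, not from any specific library): collapse exact duplicates first, then split, so an identical pair can never land in both train and validation.

```python
import hashlib
import random

def dedup_split(pairs, val_fraction=0.1, seed=42):
    """Split (image_id, caption) pairs into train/val with strict dedup:
    exact duplicate pairs are collapsed before splitting, so no example
    can appear in both sets."""
    # Collapse exact duplicates by hashing each pair.
    unique = {}
    for image_id, caption in pairs:
        key = hashlib.sha256(f"{image_id}\t{caption}".encode()).hexdigest()
        unique.setdefault(key, (image_id, caption))
    items = list(unique.values())
    # Deterministic shuffle so the split is reproducible.
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction))
    return items[n_val:], items[:n_val]  # (train, val)

# Duplicated pairs never leak across the split.
data = [("img1", "a cat"), ("img1", "a cat"),
        ("img2", "a dog"), ("img3", "a bird")]
train, val = dedup_split(data, val_fraction=0.25)
assert not set(train) & set(val)
```

For near-duplicates (re-crops, paraphrased captions) you would want fuzzier keys, e.g. perceptual image hashes or normalized caption text, rather than an exact SHA-256.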
This has happened to me many times. When the model starts memorizing the training data, I use these common methods:
Dropout: randomly turns off some neurons during training so the model doesn't rely on the same pathways and simply memorize the data.
Weight decay (L2 regularization): slightly penalizes big weights so the model learns simpler patterns instead of fitting exact samples.
Data augmentation: adds small variations to the data (image crops, jitter, caption noise) so the model sees slightly different versions instead of the exact same inputs.
Label smoothing: stops the model from being overly confident about the “correct” answer, which helps reduce memorization.
Early stopping: stop training once validation performance stops improving, so the model doesn't keep going and start memorizing.
Hard negative mining: give the model harder wrong examples so it actually learns the differences instead of just remembering pairs.
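Two of the methods above can be sketched in plain Python without any framework (the names `smooth_labels` and `EarlyStopping` are illustrative, not from a specific library; in practice you'd reach for your framework's built-ins, e.g. a dropout layer or the optimizer's weight-decay parameter):

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: move epsilon of the probability mass away from
    the 'correct' class and spread it uniformly, so targets are never
    hard 0/1 and the model can't be overconfident."""
    k = len(one_hot)
    return [y * (1 - epsilon) + epsilon / k for y in one_hot]

class EarlyStopping:
    """Stop training once validation loss hasn't improved for
    `patience` consecutive epochs."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

# Usage: check the stopper after each validation pass.
stopper = EarlyStopping(patience=2)
for val_loss in [1.0, 0.8, 0.9, 0.95]:
    if stopper.step(val_loss):
        break  # validation stopped improving two epochs in a row
```

With epsilon=0.1 and 4 classes, a target of [1, 0, 0, 0] becomes [0.925, 0.025, 0.025, 0.025], which still sums to 1.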
Wow, amazing
Which hardware are you using to train that model? And if you ever release the distilled data from the 5 BERT teacher models, that would also be really helpful.
That's great! Keep doing the work :)
Where is that model?
Oh wow
Interesting
That's good to hear, but all of those teams are doing large-scale distillation of both open- and closed-source models, so it's very common. Still, these Chinese models have too much censorship and are full of Chinese propaganda, so it's worthless to use them as base models, but they're good for distillation anyway ;)
Thanks a lot, now I can understand how you did it.
nice work!
On this platform, you can create a "soulmate" that matches your personality, and configure it with the skills, tools you want it to have, as well as the platforms it can integrate with (such as Telegram, Discord, etc.).
You can even create group chats, invite multiple agents and your friends to chat about recent events, discuss projects together, and so on.
On the one hand, I hope it can better accompany you in daily life through its unique memory mechanism, its self-feedback and iteration loop, and its modeling of users' emotions. On the other hand, I hope its skills, tools, and ability to handle complex task scenarios can help you work better.
Although the product as a whole has taken shape, there are still many areas that need adjustment and optimization. I also hope to rely on the strength of the community to do AI emotional companionship well.
This is the project introduction URL: https://opensoul-web.vercel.app
This is the GitHub project URL: https://github.com/NJX-njx/opensoul
@AdinaY @lilianweng @burtenshaw @clem
let's just do it
It would be really helpful if you could provide guidelines, instructions, and the process for how you write custom parsing, etc.
We can't trust Anthropic; they've even taken pirated data from Library Genesis and thousands of copyrighted songs, so our data is not safe.
Oh wow, looks interesting
So it can be good for training vision-language models, helpful for robotics too, and also good for training text-to-image models. Nice work! I wanted to know: how do you collect these kinds of amazing datasets?
934,191 image records index Eastern Europe and Northern Asia. Temporal links map historical views at identical coordinates across nine years.
Key Stats:
- 905,940 unique images
- Coverage spanning 2016 to 2025
- Average 14.3 historical links per location
Geographic bounds span 20.49° E to 152.32° E. Urban centers show higher data density.