We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of foundation models (LLMs) trained from scratch. Developed at SKT AI LABS, this corpus is not just a collection of data; it's a mission to decentralize high-grade AI training for regional languages and global knowledge.
Key Highlights:
• Massive Scale: A multi-terabyte corpus built toward 146T-level tokenization.
• Pure Quality: Curated from 500+ elite sources.
• Structured for MoE: Sharded into standardized 3.5GB units (the SKT series) for seamless distributed training.
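The post does not describe how the corpus is sharded. As a minimal sketch only, the following greedily packs documents, in order, into size-capped shards. The function name `shard_documents`, the `max_bytes` parameter, and reading "3.5GB" as 3.5 × 10^9 bytes (rather than GiB) are all assumptions; a real pipeline would stream shards to disk instead of holding them in memory.

```python
from typing import Iterable, List

# Assumption: "3.5GB" is read as 3.5 * 10**9 bytes; the post does not say GB vs GiB.
SHARD_BYTES = int(3.5 * 10**9)


def shard_documents(docs: Iterable[bytes], max_bytes: int = SHARD_BYTES) -> List[List[bytes]]:
    """Hypothetical helper: greedily pack documents into shards of at most max_bytes.

    A single document larger than max_bytes gets a shard of its own,
    because this sketch never splits a document across shards.
    """
    shards: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for doc in docs:
        size = len(doc)
        # Close the current shard if adding this document would overflow it.
        if current and current_size + size > max_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    # Flush the final, partially filled shard.
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    demo = shard_documents([b"a" * 4, b"b" * 4, b"c" * 3, b"d" * 10], max_bytes=8)
    print([sum(map(len, shard)) for shard in demo])
```

Greedy in-order packing keeps documents contiguous and shard sizes close to the cap; fixed-size shards like this make it easy to assign equal work to each worker in distributed training.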
Open for Collaboration!
We are looking for AI researchers, CUDA engineers, and data scientists to join us on this journey to build Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design, let's build the future together.
AMD summer hackathons are here! It's a chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris, Station F: July 5-6
🇮🇳 Mumbai: July 12-13
🇮🇳 Bengaluru: July 19-20
Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!
Enterprise orgs can now enable serverless Inference Providers for all members:
- includes $2 of free usage per org member (e.g. an Enterprise org with 1,000 members shares $2,000 in free credit each month)
- admins can set a monthly spend limit for the entire org
- works today with Together, fal, Novita, Cerebras and HF Inference.
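The pooled-credit arithmetic above is simple: the monthly pool is $2 times the member count. The sketch below is a hypothetical model, not the actual billing logic. The function names are illustrative, and the assumption that the admin spend limit caps paid usage beyond the free pool is ours; the post does not say exactly how the limit is applied.

```python
from typing import Optional

# From the post: $2 of free usage per org member per month.
FREE_CREDIT_PER_MEMBER_USD = 2.0


def monthly_free_credit(members: int) -> float:
    """Pooled monthly free credit shared by the whole org."""
    return members * FREE_CREDIT_PER_MEMBER_USD


def billable_usage(usage_usd: float, members: int, spend_limit_usd: Optional[float] = None) -> float:
    """Hypothetical: usage beyond the free pool is billed, capped by the admin spend limit.

    Assumption: the spend limit applies to paid usage only; the real semantics may differ.
    """
    billable = max(0.0, usage_usd - monthly_free_credit(members))
    if spend_limit_usd is not None:
        billable = min(billable, spend_limit_usd)
    return billable


if __name__ == "__main__":
    print(monthly_free_credit(1_000))      # pool for a 1,000-member org
    print(billable_usage(2_500, 1_000))    # usage beyond the pool
    print(billable_usage(2_500, 1_000, spend_limit_usd=300))
```

For the 1,000-member example in the post, the pool works out to $2,000 per month.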