paper/metaAI
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 45
models/misc
Snowflake/snowflake-arctic-embed-m-long Sentence Similarity • 0.1B • Updated Dec 13, 2024 • 20.1k • 38
si-pbc/hertz-dev Audio-to-Audio • Updated Nov 14, 2024 • 215
microsoft/orca-agentinstruct-1M-v1 Viewer • Updated Nov 1, 2024 • 1.05M • 1.93k • 461
paper/misc
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11, 2024 • 56
Stealing Part of a Production Language Model Paper • 2403.06634 • Published Mar 11, 2024 • 91
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21, 2024 • 16
dataset/text
HuggingFaceFW/fineweb Viewer • Updated Jul 11, 2025 • 52.5B • 582k • 2.78k
microsoft/orca-agentinstruct-1M-v1 Viewer • Updated Nov 1, 2024 • 1.05M • 1.93k • 461
ai4bharat/SeamlessAlign Viewer • Updated Nov 15, 2024 • 3.01M • 346 • 6