Attention Is All You Need (arXiv:1706.03762)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv:2101.03961)
Finetuned Language Models Are Zero-Shot Learners (arXiv:2109.01652)
Multitask Prompted Training Enables Zero-Shot Task Generalization (arXiv:2110.08207)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (arXiv:2112.06905)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
LaMDA: Language Models for Dialog Applications (arXiv:2201.08239)
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (arXiv:2201.11990)
Training language models to follow instructions with human feedback (arXiv:2203.02155)
PaLM: Scaling Language Modeling with Pathways (arXiv:2204.02311)
Training Compute-Optimal Large Language Models (arXiv:2203.15556)
OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
UL2: Unifying Language Learning Paradigms (arXiv:2205.05131)
Language Models are General-Purpose Interfaces (arXiv:2206.06336)
Improving alignment of dialogue agents via targeted human judgements (arXiv:2209.14375)
Scaling Instruction-Finetuned Language Models (arXiv:2210.11416)
GLM-130B: An Open Bilingual Pre-trained Model (arXiv:2210.02414)
Holistic Evaluation of Language Models (arXiv:2211.09110)
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100)
Galactica: A Large Language Model for Science (arXiv:2211.09085)
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization (arXiv:2212.12017)
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
PaLM-E: An Embodied Multimodal Language Model (arXiv:2303.03378)
GPT-4 Technical Report (arXiv:2303.08774)
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (arXiv:2304.01373)
PaLM 2 Technical Report (arXiv:2305.10403)
RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 (arXiv:2306.02707)
Textbooks Are All You Need (arXiv:2306.11644)
Textbooks Are All You Need II: phi-1.5 technical report (arXiv:2309.05463)
Mistral 7B (arXiv:2310.06825)
PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arXiv:2310.09199)
Zephyr: Direct Distillation of LM Alignment (arXiv:2310.16944)
CodeFusion: A Pre-trained Diffusion Model for Code Generation (arXiv:2310.17680)
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437)
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models (arXiv:2311.16079)
SeaLLMs -- Large Language Models for Southeast Asia (arXiv:2312.00738)
Kandinsky 3.0 Technical Report (arXiv:2312.03511)
Large Language Models for Mathematicians (arXiv:2312.04556)
FLM-101B: An Open LLM and How to Train It with $100K Budget (arXiv:2309.03852)
arXiv:2309.03450
Baichuan 2: Open Large-scale Language Models (arXiv:2309.10305)
Qwen Technical Report (arXiv:2309.16609)
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch (arXiv:2309.10706)
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (arXiv:2310.09478)
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models (arXiv:2308.13437)
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 (arXiv:2308.12067)
JudgeLM: Fine-tuned Large Language Models are Scalable Judges (arXiv:2310.17631)
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation (arXiv:2311.00272)
ChipNeMo: Domain-Adapted LLMs for Chip Design (arXiv:2311.00176)
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model (arXiv:2310.06266)
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models (arXiv:2312.04724)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
Generative Multimodal Models are In-Context Learners (arXiv:2312.13286)
Code Llama: Open Foundation Models for Code (arXiv:2308.12950)
Unsupervised Cross-lingual Representation Learning at Scale (arXiv:1911.02116)
YAYI 2: Multilingual Open-Source Large Language Models (arXiv:2312.14862)
Mini-GPTs: Efficient Large Language Models through Contextual Pruning (arXiv:2312.12682)
Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
LLM360: Towards Fully Transparent Open-Source LLMs (arXiv:2312.06550)
WizardLM: Empowering Large Language Models to Follow Complex Instructions (arXiv:2304.12244)
The Falcon Series of Open Language Models (arXiv:2311.16867)
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding (arXiv:2305.12031)
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge (arXiv:2303.14070)
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (arXiv:2306.00890)
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights (arXiv:2311.16075)
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model (arXiv:2311.11564)
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences (arXiv:2311.06025)
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (arXiv:2310.07276)
BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition (arXiv:2308.08625)
BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval (arXiv:2307.00589)
Radiology-GPT: A Large Language Model for Radiology (arXiv:2306.08666)
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks (arXiv:2305.17100)
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation (arXiv:2305.07804)
Llemma: An Open Language Model For Mathematics (arXiv:2310.10631)
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (arXiv:2309.11568)
Skywork: A More Open Bilingual Foundation Model (arXiv:2310.19341)
SkyMath: Technical Report (arXiv:2310.16713)
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (arXiv:2309.12284)
UT5: Pretraining Non autoregressive T5 with unrolled denoising (arXiv:2311.08552)
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model (arXiv:2312.11370)
Language Is Not All You Need: Aligning Perception with Language Models (arXiv:2302.14045)
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing (arXiv:2303.10845)
BloombergGPT: A Large Language Model for Finance (arXiv:2303.17564)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine (arXiv:2304.14454)
StarCoder: may the source be with you! (arXiv:2305.06161)
OctoPack: Instruction Tuning Code Large Language Models (arXiv:2308.07124)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862)
GeoGalactica: A Scientific Large Language Model in Geoscience (arXiv:2401.00434)
TinyLlama: An Open-Source Small Language Model (arXiv:2401.02385)
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (arXiv:2401.02954)
Mixtral of Experts (arXiv:2401.04088)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
WizardCoder: Empowering Code Large Language Models with Evol-Instruct (arXiv:2306.08568)
ChatQA: Building GPT-4 Level Conversational QA Models (arXiv:2401.10225)
Orion-14B: Open-source Multilingual Large Language Models (arXiv:2401.12246)
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence (arXiv:2401.14196)
Weaver: Foundation Models for Creative Writing (arXiv:2401.17268)
H2O-Danube-1.8B Technical Report (arXiv:2401.16818)
OLMo: Accelerating the Science of Language Models (arXiv:2402.00838)
GPT-NeoX-20B: An Open-Source Autoregressive Language Model (arXiv:2204.06745)
CroissantLLM: A Truly Bilingual French-English Language Model (arXiv:2402.00786)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (arXiv:2402.16840)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (arXiv:2402.14905)
Nemotron-4 15B Technical Report (arXiv:2402.16819)
StarCoder 2 and The Stack v2: The Next Generation (arXiv:2402.19173)
Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (arXiv:2403.05530)
Sailor: Open Language Models for South-East Asia (arXiv:2404.03608)
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (arXiv:2405.04434)
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (arXiv:2406.11931)
Aya 23: Open Weight Releases to Further Multilingual Progress (arXiv:2405.15032)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models (arXiv:2406.06563)
Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491)
The Llama 3 Herd of Models (arXiv:2407.21783)