Resources

Neural Networks, BERT, attention, Transformers, Word Embeddings, LLMs

  • Hastie, Tibshirani & Friedman (2009). The Elements of Statistical Learning (2nd ed.). Springer.
  • Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.
  • Wang et al. (2019) GLUE: A Multi-Task Benchmark And Analysis Platform For Natural Language Understanding
  • Hu et al. (2020) XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
  • Strubell et al. (2019) Energy and Policy Considerations for Deep Learning in NLP
  • Dodge et al. (2022) Measuring the Carbon Intensity of AI in Cloud Instances
  • Sheng et al. (2019) The Woman Worked as a Babysitter: On Biases in Language Generation
  • Gupta & Manning (2014) Improved pattern learning for bootstrapped entity extraction
  • Dou & Neubig (2021) Word Alignment by Fine-tuning Embeddings on Parallel Corpora
  • Karpathy, Andrej (2016) Yes you should understand Backprop
  • Karpathy, Andrej (2015) The Unreasonable Effectiveness of Recurrent Neural Networks
  • Olah, Christopher (2015) Understanding LSTM Networks
  • Olah, Christopher (2016) Attention and Augmented Recurrent Neural Networks
  • Mikolov et al. (2013) Efficient Estimation of Word Representations in Vector Space
  • Pennington et al. (2014) GloVe: Global Vectors for Word Representation
  • Bojanowski et al. (2016) Enriching Word Vectors with Subword Information
  • Peters et al. (2018) Deep contextualized word representations
  • Howard & Ruder (2018) Universal Language Model Fine-tuning for Text Classification
  • Devlin et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Alammar, Jay (2018) The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
  • Vaswani et al. (2017) Attention Is All You Need
  • Uszkoreit, Jakob (2017) Transformer: A Novel Neural Network Architecture for Language Understanding
  • Alammar, Jay (2018) The Illustrated Transformer
  • Adaloglou, Nikolas (2020) How Transformers work in deep learning and NLP: an intuitive introduction
  • Liu et al. (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach
  • Wolf et al. (2019) HuggingFace's Transformers: State-of-the-art Natural Language Processing
  • Sun et al. (2019) How to Fine-Tune BERT for Text Classification?
  • Brown et al. (2020) Language Models are Few-Shot Learners
  • Gao et al. (2020) Making Pre-trained Language Models Better Few-shot Learners
  • Gao, Tianyu (2021) Prompting: Better Ways of Using Language Models for NLP Tasks
  • Schick & Schütze (2021) Generating Datasets with Pretrained Language Models
  • Schick & Schütze (2021) Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference
  • Bender et al. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
  • Kirk et al. (2021) How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases
  • Schick et al. (2021) Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
  • Le Scao et al. (2022) BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  • Suau et al. (2022) Self-conditioning Pre-Trained Language Models
  • Agüera y Arcas (2022) Do Large Language Models Understand Us?
  • Touvron et al. (2023) LLaMA: Open and Efficient Foundation Language Models
  • Manakul et al. (2023) SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
  • Al-Kaswan & Izadi (2023) The (Ab)Use of Open Source Code to Train Large Language Models
  • Luccioni et al. (2023) Power Hungry Processing: Watts Driving the Cost of AI Deployment?
  • Yao et al. (2023) ReAct: Synergizing Reasoning and Acting in Language Models
  • Huyen (2025) AI Engineering
  • Warner et al. (2024) Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Fine Tuning and Inference
  • Chen et al. (2024) What is the Role of Small Models in the LLM Era: A Survey
  • Weng (2024) Extrinsic Hallucinations in LLMs
  • Mitchell (2025) LLMs and World Models
  • Vafa et al. (2024) Evaluating the World Model Implicit in a Generative Model
  • Feng et al. (2024), Were RNNs All We Needed?