Resources

Neural Networks, BERT, attention, Transformers, Word Embeddings, LLMs

  • Hastie, Tibshirani & Friedman (2009). The Elements of Statistical Learning (2nd ed.). Springer.
  • Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.
  • Wang et al. (2019) GLUE: A Multi-Task Benchmark And Analysis Platform For Natural Language Understanding
  • Hu et al. (2020) XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
  • Strubell et al. (2019) Energy and Policy Considerations for Deep Learning in NLP
  • Dodge et al. (2022) Measuring the Carbon Intensity of AI in Cloud Instances
  • Sheng et al. (2019) The Woman Worked as a Babysitter: On Biases in Language Generation
  • Gupta & Manning (2014) Improved pattern learning for bootstrapped entity extraction
  • Dou & Neubig (2021) Word Alignment by Fine-tuning Embeddings on Parallel Corpora
  • Karpathy, Andrej (2016) Yes you should understand Backprop
  • Karpathy, Andrej (2015) The Unreasonable Effectiveness of Recurrent Neural Networks
  • Olah, Christopher (2015) Understanding LSTM Networks
  • Olah, Christopher (2016) Attention and Augmented Recurrent Neural Networks
  • Mikolov et al. (2013) Efficient Estimation of Word Representations in Vector Space
  • Pennington et al. (2014) GloVe: Global Vectors for Word Representation
  • Bojanowski et al. (2016) Enriching Word Vectors with Subword Information
  • Peters et al. (2018) Deep contextualized word representations
  • Howard & Ruder (2018) Universal Language Model Fine-tuning for Text Classification
  • Devlin et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Alammar, Jay (2018) The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
  • Vaswani et al. (2017) Attention Is All You Need
  • Uszkoreit, Jakob (2017) Transformer: A Novel Neural Network Architecture for Language Understanding
  • Alammar, Jay (2018) The Illustrated Transformer
  • Adaloglou, Nikolas (2020) How Transformers work in deep learning and NLP: an intuitive introduction
  • Liu et al. (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach
  • Wolf et al. (2019) HuggingFace's Transformers: State-of-the-art Natural Language Processing
  • Sun et al. (2019) How to Fine-Tune BERT for Text Classification?
  • Brown et al. (2020) Language Models are Few-Shot Learners
  • Gao et al. (2020) Making Pre-trained Language Models Better Few-shot Learners
  • Gao, Tianyu (2021) Prompting: Better Ways of Using Language Models for NLP Tasks
  • Schick & Schütze (2021) Generating Datasets with Pretrained Language Models
  • Schick & Schütze (2021) Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference
  • Bender et al. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
  • Kirk et al. (2021) How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases
  • Schick et al. (2021) Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
  • Le Scao et al. (2022) BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  • Suau et al. (2022) Self-conditioning Pre-Trained Language Models
  • Agüera y Arcas (2022) Do Large Language Models Understand Us?
  • Touvron et al. (2023) LLaMA: Open and Efficient Foundation Language Models
  • Manakul et al. (2023) SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
  • Al-Kaswan & Izadi (2023) The (Ab)Use of Open Source Code to Train Large Language Models
  • Luccioni et al. (2023) Power Hungry Processing: Watts Driving the Cost of AI Deployment?
  • Yao et al. (2023) ReAct: Synergizing Reasoning and Acting in Language Models
  • Huyen (2025) AI Engineering
  • Warner et al. (2024) Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Fine Tuning and Inference
  • Chen et al. (2024) What is the Role of Small Models in the LLM Era: A Survey
  • Weng (2024) Extrinsic Hallucinations in LLMs
  • Mitchell (2025) LLMs and World Models
  • Vafa et al. (2024) Evaluating the World Model Implicit in a Generative Model
  • Feng et al. (2024), Were RNNs All We Needed?