📚 Embedding Aggregation for Text Classification: TF-IDF vs FastText vs ELMo¶
In this notebook, we explore how different vectorization strategies impact the performance of a logistic regression classifier on the IMDB sentiment dataset.
We will examine:
- TF-IDF bag-of-words
- FastText static word embeddings
- ELMo contextual embeddings
Our focus is on how to transform word embeddings into sentence embeddings and how these choices affect classification.
🧭 Goal¶
Implement logistic regression classifiers using three feature types:
- TF-IDF vectors
- FastText word embeddings aggregated into sentence vectors
- ELMo embeddings aggregated into sentence vectors
Compare three pooling strategies for embedding-based methods:
- Concatenation (fixed-length, truncated/padded)
- Mean pooling
- TF-IDF-weighted mean pooling
Benchmark against native ELMo sentence representations.
Analyze differences in word importance between TF-IDF and the embedding-based representations.
📌 Table of Contents¶
- Environment Setup & Imports
- Data Loading & Preprocessing
- TF-IDF Baseline
- FastText Embeddings
- Loading FastText
- Aggregation Strategies
- Classification
- ELMo Embeddings
- Model Comparison & Interpretation
- Conclusion
0. 🧠 Introduction¶
Turning raw text into numerical features is a key step in NLP pipelines. While TF-IDF treats each word as an independent feature, word embeddings capture semantic relationships but require pooling to create fixed-length sentence vectors.
Pooling strategies can dramatically influence downstream performance:
- Concatenation preserves positional information up to a fixed length.
- Mean pooling offers a simple average representation.
- TF-IDF-weighted mean emphasizes more informative words.
Why this matters:
Understanding how pooling choices impact classification helps in selecting appropriate strategies for tasks like sentiment analysis, document retrieval, and beyond.
1. ⚙️ Environment Setup & Imports¶
First, we install and import all the libraries we’ll need:
- Data & Modeling: `datasets`, `scikit-learn`, `numpy`, `pandas`
- TF-IDF: `TfidfVectorizer`
- FastText: `huggingface_hub`, `fasttext`
- ELMo: `tensorflow` and `tensorflow_hub`
- Classifier: `LogisticRegression`
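If any of these are missing from your environment, a cell like the one below can install them. This is a minimal sketch assuming the standard PyPI package names; pin versions as needed for your setup.

```python
# Install the dependencies used in this notebook (run once per environment).
# Versions are intentionally left unpinned here; adjust if you hit compatibility issues.
%pip install datasets scikit-learn numpy pandas tqdm matplotlib huggingface_hub fasttext tensorflow tensorflow_hub
```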
import os
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
# Hugging Face datasets
from datasets import load_dataset
# TF-IDF vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
# FastText downloader
from huggingface_hub import hf_hub_download
# TensorFlow & ELMo
import tensorflow as tf
import tensorflow_hub as hub
# Model and evaluation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
# Ensure reproducibility
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
2. 🗄️ Loading & Preprocessing the IMDB Dataset¶
We'll load the IMDB sentiment dataset from Hugging Face and prepare our train/test splits.
- The dataset has 25,000 labeled movie reviews for training and 25,000 for testing.
- Each example has a `text` field (the review) and a `label` (0 = negative, 1 = positive).
- We'll extract the texts and labels (labels as NumPy arrays) for downstream feature extraction.
# Load IMDB dataset
imdb = load_dataset("imdb")
# Extract train and test splits
train_texts = imdb["train"]["text"]
train_labels = np.array(imdb["train"]["label"])
test_texts = imdb["test"]["text"]
test_labels = np.array(imdb["test"]["label"])
# Quick sanity check
print(f"Train examples: {len(train_texts)}")
print(f"Test examples: {len(test_texts)}\n")
print("Example review (train):")
print(train_texts[0][:500], "...\n")
print("Label:", train_labels[0])
Train examples: 25000
Test examples: 25000

Example review (train):
I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attent ...

Label: 0
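As an extra sanity check, we can confirm that the two classes are balanced (the IMDB training and test splits each contain 12,500 negative and 12,500 positive reviews):

```python
# Count examples per class (index 0 = negative, index 1 = positive)
print("Train label counts:", np.bincount(train_labels))
print("Test label counts: ", np.bincount(test_labels))
```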
2bis 🛠️ Evaluation Utilities¶
Before we dive into TF-IDF vectorization and logistic regression, let's define a helper class to compute and plot our classification metrics (accuracy, precision, recall, F1) for each method we test.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
class Metrics:
def __init__(self):
self.results = {}
def run(self, y_true, y_pred, method_name):
# Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
# Store results
self.results[method_name] = {
'accuracy': accuracy,
'precision': precision,
'recall': recall,
'f1': f1,
}
def plot(self):
# Create subplots
fig, axs = plt.subplots(2, 2, figsize=(15, 10))
# Plot each metric
for i, metric in enumerate(['accuracy', 'precision', 'recall', 'f1']):
ax = axs[i//2, i%2]
values = [res[metric] * 100 for res in self.results.values()]
ax.bar(self.results.keys(), values)
ax.set_title(metric.capitalize())
ax.set_ylim(0, 100)
# Add values on top of bars
for j, v in enumerate(values):
ax.text(j, v + 0.5, f"{v:.2f}%", ha='center', va='bottom')
plt.tight_layout()
plt.show()
3. TF-IDF Baseline¶
In this section, we’ll build our first extrinsic evaluation: train a Logistic Regression classifier on TF-IDF representations of IMDB movie reviews. This will serve as a strong baseline before we move on to embedding-based features.
🔧 TF-IDF Vectorization & Model Training¶
We’ll convert each review into a TF-IDF vector (unigrams + bigrams, capped at 50k features, min_df = 5, stop_words = 'english'), then train a Logistic Regression model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
# Vectorize with TF-IDF
tfidf = TfidfVectorizer(max_features=50000, ngram_range=(1,2), min_df=5, stop_words='english')
X_train_tfidf = tfidf.fit_transform(train_texts)
X_test_tfidf = tfidf.transform(test_texts)
# Train Logistic Regression
clf_tfidf = LogisticRegression(max_iter=1000)
clf_tfidf.fit(X_train_tfidf, train_labels)
LogisticRegression(max_iter=1000)
📊 Evaluate on Test Set¶
Use our `Metrics` helper to compute and plot accuracy, precision, recall, and F1 for the TF-IDF baseline.
# Predict & evaluate
y_pred_tfidf = clf_tfidf.predict(X_test_tfidf)
metrics = Metrics()
metrics.run(test_labels, y_pred_tfidf, "TF-IDF Baseline")
metrics.plot()
✏️ Quick Comment on TF-IDF Baseline¶
Our TF-IDF + Logistic Regression baseline achieves 88.30% across accuracy, precision, recall, and F1. This is a strong starting point: a simple bag of n-grams still captures a lot of the sentiment signal in IMDB reviews.
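Because logistic regression over TF-IDF features is a linear model, we can also peek at which n-grams drive its predictions. A minimal sketch using the already-fitted `tfidf` and `clf_tfidf` (the exact terms you see will depend on the fitted vocabulary):

```python
# Inspect the n-grams with the largest positive and negative coefficients
feature_names = np.array(tfidf.get_feature_names_out())
coefs = clf_tfidf.coef_[0]

top_pos = feature_names[np.argsort(coefs)[-10:]][::-1]  # strongest push towards label 1 (positive)
top_neg = feature_names[np.argsort(coefs)[:10]]         # strongest push towards label 0 (negative)

print("Most positive n-grams:", top_pos)
print("Most negative n-grams:", top_neg)
```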
Now, let’s see how adding subword information via FastText embeddings changes our performance!
4. FastText Embedding Features¶
In this section, we’ll:
- Load the pre-trained FastText model from Hugging Face.
- Write helper functions to turn each review into:
- Concatenation of all word vectors
- Mean of word vectors
- TF-IDF-weighted average of word vectors
- Train a Logistic Regression classifier on each feature set.
- Compare results against our TF-IDF baseline.
from huggingface_hub import hf_hub_download
import fasttext
# Download & load FastText vectors
fasttext_model_path = hf_hub_download(repo_id="facebook/fasttext-en-vectors", filename="model.bin")
ft = fasttext.load_model(fasttext_model_path)
# Example: get vector for a word
print("Vector dim:", ft.get_word_vector("movie").shape)
Vector dim: (300,)
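A practical benefit of FastText is that vectors are built from character n-grams, so even misspelled or out-of-vocabulary words get sensible embeddings. A quick illustrative check (the misspelling is an arbitrary example; the similarity value will vary):

```python
# FastText composes vectors from subword n-grams, so a typo still lands near the correct word
v_movie = ft.get_word_vector("movie")
v_typo = ft.get_word_vector("moviie")  # out-of-vocabulary misspelling

cos = np.dot(v_movie, v_typo) / (np.linalg.norm(v_movie) * np.linalg.norm(v_typo))
print(f"Cosine similarity between 'movie' and 'moviie': {cos:.3f}")
```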
🔗 From Word Embeddings to Sentence Embeddings¶
In our sentiment classification task, each review is a whole sentence (or paragraph), but FastText (and other embedding models) give us one vector per word. To feed these into a Logistic Regression classifier, we need a fixed-length representation for each review.
We'll explore three common strategies to aggregate word vectors into a single sentence embedding:
- Concatenation – preserve word order (up to a limit)
- Mean pooling – simple average of all word vectors
- TF-IDF weighted average – emphasize important words by weighting
📦 FastText Sentence Embedding Strategies¶
Below are the helper functions we’ll use. They each take a list of raw texts and our FastText model, and output a 2D NumPy array where each row is the sentence embedding for one review.
4.1.1 ✂️ Concatenation of Word Vectors¶
- What it does:
  - Split the review into words.
  - Fetch each word's FastText vector, up to a fixed maximum length (`max_len`).
  - Concatenate them in order, padding with zeros if the review is shorter than `max_len`.
- Why: Keeps some sense of word order and fine-grained structure, at the cost of high dimensionality.
4.1.2 ⚖️ Mean Pooling of Word Vectors¶
- What it does:
  - Split the review into words.
  - Fetch each word's vector.
  - Compute the arithmetic mean across all vectors.
- Why: Produces a compact, order-agnostic summary of the sentence. Often surprisingly effective despite its simplicity.
4.1.3 📝 TF-IDF Weighted Average of Word Vectors¶
- What it does:
  - Precompute TF-IDF scores for each word in our corpus.
  - For each review, split into words and look up each word's TF-IDF weight.
  - Compute a weighted average of the FastText vectors, using TF-IDF weights to upweight rare/informative words.
- Why: Balances the simplicity of averaging with a way to emphasize words that carry more sentiment information.
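Concretely, if $\mathbf{v}_i$ is the FastText vector of the $i$-th word and $w_i$ its TF-IDF (here, IDF) weight, the sentence embedding is the weighted mean

$$\mathbf{s} = \frac{\sum_{i=1}^{n} w_i\,\mathbf{v}_i}{\sum_{i=1}^{n} w_i},$$

where words missing from the TF-IDF vocabulary are simply skipped (as in the implementation below).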
Next, we’ll implement these functions in code, generate feature matrices for our train and test sets, and train a Logistic Regression on each to see how they compare!
import scipy.sparse as sp
def ft_concat_embedding(texts, model):
"""Concatenate all word vectors for each text (pads/truncates to max_len)."""
max_len = 100 # for example
dims = model.get_word_vector("test").shape[0]
features = []
for doc in texts:
vecs = [model.get_word_vector(w) for w in doc.split()][:max_len]
# pad with zeros if shorter
if len(vecs) < max_len:
vecs.extend([np.zeros(dims)] * (max_len - len(vecs)))
features.append(np.concatenate(vecs))
return np.vstack(features)
def ft_mean_embedding(texts, model):
"""Mean of word vectors for each text."""
features = []
for doc in texts:
vecs = np.array([model.get_word_vector(w) for w in doc.split()])
features.append(vecs.mean(axis=0) if len(vecs) else np.zeros(model.get_word_vector("test").shape))
return np.vstack(features)
def ft_tfidf_weighted_embedding(texts, model, tfidf_vec, tfidf_matrix):
    """TF-IDF weighted average of word vectors.

    Note: the weights come from the vectorizer's global IDF scores; the per-document
    tfidf_matrix is accepted for interface compatibility but not used here.
    """
    features = []
    tfidf_vocab = tfidf_vec.vocabulary_
    idf = tfidf_vec.idf_
    for doc in texts:
        weights = []
        vecs = []
        for w in doc.split():
            # The vectorizer lowercases by default, so match its vocabulary in lowercase
            w_lower = w.lower()
            if w_lower in tfidf_vocab:
                weights.append(idf[tfidf_vocab[w_lower]])
                vecs.append(model.get_word_vector(w))
        if vecs:
            # Weighted mean: informative (high-IDF) words contribute more
            wgt = np.average(np.array(vecs), axis=0, weights=weights)
        else:
            # No in-vocabulary words: fall back to a zero vector of the right size
            wgt = np.zeros(model.get_word_vector("test").shape)
        features.append(wgt)
    return np.vstack(features)
# Example texts
sample_texts = [
"This movie was outstanding and full of suspense",
"I did not enjoy the plot, it was too predictable",
"An absolute masterpiece with brilliant acting"
]
tfidf_matrix = tfidf.transform(sample_texts) #we already fit the vectorizer on the corpus previously
# 3. Generate embeddings
emb_concat = ft_concat_embedding(sample_texts, ft)
emb_mean = ft_mean_embedding(sample_texts, ft)
emb_tf = ft_tfidf_weighted_embedding(sample_texts, ft, tfidf, tfidf_matrix)
# 4. Inspect shapes and a snippet of the first vector
print("Concatenation:", emb_concat.shape, "— first 5 values:", emb_concat[0][:5])
print("Mean pooling:", emb_mean.shape, "— first 5 values:", emb_mean[0][:5])
print("TF-IDF wgt:", emb_tf.shape, "— first 5 values:", emb_tf[0][:5])
Concatenation: (3, 30000) — first 5 values: [-0.03508458 0.10469877 0.00859013 0.10003099 0.02248639]
Mean pooling: (3, 300) — first 5 values: [-0.0043145 -0.00577394 -0.0089867 0.00266586 -0.00065803]
TF-IDF wgt: (3, 300) — first 5 values: [-0.01418748 -0.00709886 0.03651825 0.03428475 0.01234921]
🏗️ Building FastText Pipelines, Training & Evaluation¶
Now that we have our three sentence‐embedding strategies for FastText, we’ll:
- Generate features for train and test sets via each strategy
- Train a Logistic Regression on each feature set
- Evaluate and visualize results with our `Metrics` helper
This will let us directly compare how concatenation, mean pooling, and TF-IDF‐weighted pooling fare on movie‐review sentiment.
from tqdm import tqdm
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
# 3. Build three pipelines using our embedding functions + logistic regression
pipelines = {
"FT_Concat": Pipeline([
("embed", FunctionTransformer(lambda X: ft_concat_embedding(X, ft), validate=False)),
("clf", LogisticRegression(max_iter=1000))
]),
"FT_Mean": Pipeline([
("embed", FunctionTransformer(lambda X: ft_mean_embedding(X, ft), validate=False)),
("clf", LogisticRegression(max_iter=1000))
]),
"FT_TFIDF": Pipeline([
("embed", FunctionTransformer(
lambda X: ft_tfidf_weighted_embedding(X, ft, tfidf, tfidf.transform(X)),
validate=False
)),
("clf", LogisticRegression(max_iter=1000))
]),
}
for name, pipe in tqdm(pipelines.items()):
pipe.fit(train_texts, train_labels)
100%|██████████| 3/3 [07:30<00:00, 150.33s/it]
for name, pipe in pipelines.items():
preds = pipe.predict(test_texts)
metrics.run(test_labels, preds, method_name=name)
# 5. Visualize all results
metrics.plot()
🧐 Interpreting the FastText Results¶
Our FastText pipelines yielded the following F1 scores on the IMDB test set:
- FT_Concat: 69%
- FT_Mean: 79%
- FT_TFIDF: 80%
- TF-IDF Baseline: 88%
Why do these static‐embedding strategies fall short of a pure TF-IDF model?
Concatenation (69% F1)
Concatenating every word vector up to a fixed length creates a very high-dimensional feature space.
- Every review becomes one enormous vector whose layout depends on word order and padding.
- The model must learn to ignore “empty” (zero) slots and discover complex positional patterns.
- Signal is diluted by noise (rare words, padding vectors, and varying lengths), making it hard for a simple linear classifier to find a robust decision boundary.
Mean Pooling (79% F1)
Taking the unweighted average of all word vectors collapses the entire review into one compact summary.
- This reduces dimensionality and aggregates information, which helps a lot compared to concatenation.
- However, every word (including stop-words like “the”, “and”, “of”) contributes equally, so irrelevant tokens still dilute the signal.
- Word order and local contextual cues are lost: “not good” and “good not” become identical (see the quick check below).
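A quick check with the `ft_mean_embedding` helper from above makes this order-invariance concrete:

```python
# Mean pooling ignores word order: both phrases map to exactly the same vector
v1 = ft_mean_embedding(["not good"], ft)
v2 = ft_mean_embedding(["good not"], ft)
print("Identical sentence vectors:", np.allclose(v1, v2))  # True
```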
TF-IDF-Weighted Pooling (80% F1)
Weighting each word vector by its TF-IDF score improves slightly over the unweighted mean:
- Highly informative words (e.g. “horrible”, “masterpiece”) get more emphasis, while common words are down-weighted.
- Yet this still treats the review as a “bag of weighted vectors” with no sense of word adjacency or syntax.
- In practice, the marginal gain (<1%) shows that simple linear weighting can only go so far in capturing sentence-level nuance.
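To make the weighting tangible, we can look up a few IDF values from the fitted vectorizer. This is just an illustrative probe: the exact numbers depend on the fitted vocabulary, and stop-words such as “the” were already removed by `stop_words='english'`.

```python
# Rarer, more informative words carry a larger IDF weight in the weighted average
for w in ["movie", "good", "masterpiece", "horrible"]:
    if w in tfidf.vocabulary_:
        print(f"{w:12s} idf = {tfidf.idf_[tfidf.vocabulary_[w]]:.2f}")
    else:
        print(f"{w:12s} not in the TF-IDF vocabulary")
```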
🔑 Key Takeaway:
Static word embeddings—whether concatenated, averaged, or TF-IDF-weighted—cannot fully recover the rich structure, negation, and context that a bag-of-words TF-IDF representation already encodes very effectively. In the next section, we’ll see how contextual models like ELMo can dynamically adapt to word order and usage, potentially closing this performance gap.
5. 🌀 Contextual Embeddings with ELMo¶
ELMo (Embeddings from Language Models) generates contextualized word vectors: the same word has different representations depending on its sentence. This lets us capture negation, polysemy, and subtle syntactic cues that static methods miss.
In this section we will:
- Load the pre-trained ELMo model from TensorFlow Hub
- Wrap it in a scikit-learn transformer that mean-pools over tokens
- Train & evaluate a logistic regression on these sentence embeddings
from tqdm import tqdm
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.base import BaseEstimator, TransformerMixin
# Load the ELMo model
elmo_model = hub.load("https://tfhub.dev/google/elmo/3")
class ELMoTransformer(BaseEstimator, TransformerMixin):
"""A scikit-learn transformer for extracting mean-pooled ELMo embeddings."""
def __init__(self, model):
self.model = model
def fit(self, X, y=None):
return self
def transform(self, X):
# Process in batches for better performance
batch_size = 32 # You can adjust this based on your available memory
num_samples = len(X)
num_batches = (num_samples + batch_size - 1) // batch_size # Ceiling division
embeddings_list = []
for i in tqdm(range(num_batches)):
# Get current batch
start_idx = i * batch_size
end_idx = min((i + 1) * batch_size, num_samples)
batch_texts = X[start_idx:end_idx]
# Process batch
embedding = self.model.signatures["default"](
tf.constant(batch_texts)
)
# Extract embeddings from batch
batch_embeddings = embedding["default"].numpy()
embeddings_list.append(batch_embeddings)
# Concatenate all batches
return np.vstack(embeddings_list)
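Before pooling everything into sentence vectors, it's worth sanity-checking the “contextual” part of ELMo. The sketch below assumes the module's default signature also exposes per-token vectors under the `elmo` key (as documented on TF Hub) and compares the vector of “bank” in two different sentences:

```python
# Same word, different contexts: the per-token ELMo vectors should differ
sentences = ["I sat on the river bank", "I opened a bank account"]
out = elmo_model.signatures["default"](tf.constant(sentences))

tokens = out["elmo"].numpy()  # per-token embeddings, shape (batch, max_tokens, 1024)
bank_river = tokens[0, 5]     # "bank" is the 6th whitespace token of sentence 0
bank_money = tokens[1, 3]     # "bank" is the 4th whitespace token of sentence 1

cos = np.dot(bank_river, bank_money) / (np.linalg.norm(bank_river) * np.linalg.norm(bank_money))
print(f"Cosine similarity between the two 'bank' vectors: {cos:.3f}")  # below 1.0: context matters
```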
🧪 Example: Using `ELMoTransformer` on a Single Review¶
Below we take one IMDb review, pass it through our `ELMoTransformer`, and inspect the resulting 1024-dimensional sentence embedding.
# Grab a single test review
single_review = train_texts[0]
print("Raw review text:\n", single_review, "\n")
# Instantiate and transform
elmo_trans = ELMoTransformer(elmo_model)
single_emb = elmo_trans.transform([single_review])
# Inspect the output
print("ELMo embedding shape:", single_emb.shape)
print("First 5 dimensions of embedding:\n", single_emb[0][:5])
Raw review text:
I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far between, even then it's not shot like some cheaply made porno. While my countrymen mind find it shocking, in reality sex and nudity are a major staple in Swedish cinema. Even Ingmar Bergman, arguably their answer to good old boy John Ford, had sex scenes in his films.<br /><br />I do commend the filmmakers for the fact that any sex shown in the film is shown for artistic purposes rather than just to shock people and make money to be shown in pornographic theaters in America. I AM CURIOUS-YELLOW is a good film for anyone wanting to study the meat and potatoes (no pun intended) of Swedish cinema. But really, this film doesn't have much of a plot.

ELMo embedding shape: (1, 1024)
First 5 dimensions of embedding:
[-0.05402381 -0.11614838 -0.19337955 -0.11014549 0.09235258]
🏗️ Building the ELMo Pipeline, Training & Evaluation¶
We’ll now wire up our `ELMoTransformer` with a `LogisticRegression`, fit it on the IMDB training set, and evaluate on the test split using our `Metrics` helper.
from sklearn.linear_model import LogisticRegression

# First, create an instance of the ELMoTransformer class
elmo_transformer = ELMoTransformer(elmo_model)

# Then call the transform method to get mean-pooled sentence embeddings
train_elmo_embeddings = elmo_transformer.transform(train_texts)
test_elmo_embeddings = elmo_transformer.transform(test_texts)

# Now train the classifier on the ELMo sentence embeddings
clf = LogisticRegression(max_iter=1000)
clf.fit(train_elmo_embeddings, train_labels)
4%|▍ | 33/782 [2:14:25<50:50:52, 244.40s/it]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
---> train_elmo_embeddings = elmo_transformer.transform(train_texts)
     ...
--->     embedding = self.model.signatures["default"](tf.constant(batch_texts))
     ... (TensorFlow execution frames truncated) ...
KeyboardInterrupt:
⏳ ELMo Training: Time & Resources Note¶
Extracting ELMo embeddings for the full IMDb dataset is very time-consuming on a typical CPU-only setup: the interrupted run above was projected to take tens of hours for the training split alone. To run this end-to-end in a reasonable time, we highly recommend using a GPU-enabled environment or a cloud instance with a GPU.
Tip: If you don’t have access to a GPU right now, you can run this section on Google Colab, or skip the local training step and refer to Notebook #2, where we’ve precomputed the ELMo sentence embeddings and already trained the logistic regression. You can inspect the classification results and interpret the performance without waiting for the full training run.
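If you just want to exercise the pipeline locally, one option (not used for the reported results) is to subsample the dataset before extracting ELMo embeddings, for example:

```python
# Optional: work on a random subset so ELMo extraction finishes in minutes rather than hours.
# The subset size of 2,000 reviews per split is an arbitrary choice for illustration.
rng = np.random.default_rng(SEED)
train_idx = rng.choice(len(train_texts), size=2000, replace=False)
test_idx = rng.choice(len(test_texts), size=2000, replace=False)

small_train_texts = [train_texts[i] for i in train_idx]
small_train_labels = train_labels[train_idx]
small_test_texts = [test_texts[i] for i in test_idx]
small_test_labels = test_labels[test_idx]
```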
# Evaluate
y_elmo = clf.predict(test_elmo_embeddings)
metrics.run(test_labels, y_elmo, "ELMo")
metrics.plot()