Exploration of Representation Models & Text Classification
Representation Models
Hugging Face
Sentiment Classification
Author
Arun Koundinya Parasa
Published
February 24, 2025
In this module, we will explore the basics of two approaches to text classification using Encoder Transformers:
- Using BERT
- Using Label Encodings (Sentence Transformers)
We encourage you to explore this article to understand the background and intuition behind these two models.
In this article, we will also delve into sentiment classification through the following methods:
- Without training
- Using BERT LLM and Logistic Regression
- Using Sentence Transformers LLM and Logistic Regression
- Creating labels when they are not available
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.
All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
We can observe that all the related base files are loaded; this includes the model configuration, the model itself, and the vocabulary text.
Here we convert the raw data into numerical format using the tokenizer, which maps the text into token IDs using the downloaded vocab dictionary.
These tokens are passed into the model and the output is captured.
Since we are not training the model again, we tokenize only the test dataset.
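A minimal sketch of this step, assuming the SST-2 fine-tuned DistilBERT checkpoint and the review text column used later in this post (both names are assumptions for illustration):

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize only the test split, since the pretrained head is used as-is
test_tokens = tokenizer(
    raw_data['test']['review_combined_lemma'],   # assumed text column
    padding=True, truncation=True, return_tensors="tf"
)
model_output = model(test_tokens)   # logits feed the classification report below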
from sklearn.metrics import classification_report

tf.keras.backend.clear_session()
print(classification_report(raw_data['test']['class_index'], tf.argmax(model_output.logits, axis=1)))
Here, we can see that the default BERT foundation model gives us 80% accuracy, which is very good :).
BERT with Logistic Regression
from transformers import TFAutoModel

bert_model = TFAutoModel.from_pretrained(checkpoint)
bert_model.summary()
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertModel: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.bias', 'classifier.weight']
- This IS expected if you are initializing TFDistilBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFDistilBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.
Since we will be training the classifier layer ourselves, we load the model without its classifier layer using TFAutoModel. You can see the difference in the outputs of the two models.
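The reshaped_output fed to the classifier below is derived from the base model's last hidden state; one possible sketch, assuming we keep the first ([CLS]) token as the per-review feature vector (mean pooling would be another common choice):

# Sketch only: turn the base model's output into fixed-size features
train_tokens = tokenizer(
    raw_data['train']['review_combined_lemma'],   # assumed text column
    padding=True, truncation=True, return_tensors="tf"
)
bert_output = bert_model(train_tokens)
# last_hidden_state: (n_samples, seq_len, 768) -> keep one 768-dim vector per review
reshaped_output = bert_output.last_hidden_state[:, 0, :].numpy()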
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(reshaped_output, raw_data['train']['class_index'])
LogisticRegression()
We feed the BERT last-layer output to the Logistic Regression and train it.
On the test dataset, we can see that the accuracy has jumped from 80% to 85% with a mere logistic classifier at the end. Isn't it beautiful? However, the only drawback is that it consumes a lot of GPU memory.
I have reloaded the dataset to demonstrate that sentence transformers can handle larger datasets more efficiently compared to the BERT model shown earlier. Sentence transformers effortlessly convert text into embeddings, reducing memory usage for tokenization and subsequent model processing.
Although both models are based on BERT, sentence transformers offer better memory efficiency.
We loaded the model and converted both the train and the test data into embeddings.
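A minimal sketch of this step, assuming a 768-dimensional SentenceTransformer checkpoint such as all-mpnet-base-v2 (the checkpoint name and text column are assumptions that happen to match the embedding shape shown below):

from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("all-mpnet-base-v2")   # assumed model name
train_embeddings = st_model.encode(raw_data['train']['review_combined_lemma'],
                                   show_progress_bar=True)
test_embeddings = st_model.encode(raw_data['test']['review_combined_lemma'],
                                  show_progress_bar=True)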
train_embeddings.shape
(100000, 768)
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(max_iter=1000)
lr.fit(train_embeddings, raw_data['train']['class_index'])
LogisticRegression(max_iter=1000)
Furthermore, we trained a lightweight logistic regression model using those embeddings.
from sklearn.metrics import classification_report

y_pred = lr.predict(test_embeddings)
print(classification_report(raw_data['test']['class_index'], y_pred))
Here, we can see that our accuracy increased from 85% to 87%. However, we cannot directly attribute this improvement to the use of sentence transformers alone, as both BERT and sentence transformers capture the context of the information. That said, based on my understanding, sentence transformers are faster, more scalable, and reliable.
Creating Labels Using Sentence Transformers
Let’s assume that instead of predicting positive or negative sentiment, we want to classify sentiment on a 5-point Likert scale. Sentence transformers come in handy here, as they allow us to explore the similarity between the labels and the input text, helping us tag the input accordingly.
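One possible sketch of this similarity-based labeling, assuming cosine similarity between the review embeddings and the embedded label names (st_model and test_embeddings refer to the objects from the earlier sketch):

from sentence_transformers import util

labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
label_embeddings = st_model.encode(labels)

# Cosine similarity between every review embedding and every label embedding
similarity = util.cos_sim(test_embeddings, label_embeddings)   # shape (n_reviews, 5)
y_pred = similarity.argmax(dim=1).tolist()                     # index into `labels`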
labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]y_pred_labels = [labels[i] for i in y_pred]test_df = Dataset.to_pandas(raw_data['test'])y_pred_df = pd.DataFrame(y_pred_labels, columns=['Predicted_Labels'])combined_df = pd.concat([test_df.reset_index(drop=True), y_pred_df.reset_index(drop=True)], axis=1)combined_df
      class_index  review_combined_lemma                               __index_level_0__  Predicted_Labels
0     1            great book must preface saying not religious l...  23218              Negative
1     0            huge disappointment big time long term trevani...  20731              Very Negative
2     1            wayne tight cant hang turk album hot want howe...  39555              Negative
3     1            excellent read book elementary school probably...  147506             Positive
4     0            not anusara although book touted several anusa...  314215             Negative
...   ...          ...                                                 ...                ...
9995  0            left many question read book recently diagnose...  105263             Positive
9996  1            liked wontrom reading rest great book no doubt...  334968             Negative
9997  1            recorder product durable bought fourth grader ...   355111             Very Positive
9998  1            like book elizabeth von arnim enjoy gardening ...   95143              Negative
9999  0            disappointed copy book offered sale catalog wa...   158471             Negative

10000 rows × 4 columns
Woohoo!!! We have created our own custom predicted labels using sentence transformers. Although they might not be completely accurate, they help us arrive at a quick conclusion when we have no labels for the input text.
This programming article enhanced my understanding of how to use representation models in practice, providing new insights and uncovering exciting possibilities for leveraging embedding models. More to come—stay tuned!