DistilBERT Text Classification Using Keras

Oct 19, 2020

In the previous post, I covered text classification using BERT. This post covers DistilBERT, a smaller version of BERT developed and open-sourced by the team at HuggingFace. It is lighter and faster than BERT while roughly matching its performance.

DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT’s performance as measured on the GLUE language understanding benchmark.
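As a quick sanity check on the 40% figure, here is a minimal sketch of the arithmetic. The parameter counts are an assumption on my part: they are the rounded numbers reported in the DistilBERT paper (about 110M for BERT-base and 66M for DistilBERT), not values measured in this post.

```python
# Approximate parameter counts in millions, as reported in the DistilBERT paper.
# These are rounded published figures (an assumption), not measurements made here.
BERT_BASE_PARAMS_M = 110
DISTILBERT_PARAMS_M = 66


def relative_reduction(baseline, reduced):
    """Fraction by which `reduced` is smaller than `baseline`."""
    return (baseline - reduced) / baseline


reduction = relative_reduction(BERT_BASE_PARAMS_M, DISTILBERT_PARAMS_M)
print(f"Parameter reduction: {reduction:.0%}")
```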

DistilBERT Implementation in Keras.

First, the pretrained DistilBERT model is used to generate a sentence embedding (768 dimensions) for each example in the dataset.
Then a basic neural network (with Dense and Dropout layers) is trained on these embeddings for the classification task.
Finally, the model is evaluated.
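The three steps above can be sketched roughly as follows. This is not the notebook's exact code, only a minimal illustration under some assumptions: the sentence embedding is taken to be the hidden state at the first ([CLS]) position, as in the visual guide listed in the references; the checkpoint name `distilbert-base-uncased`, the layer sizes, the dropout rate, and the two-class setup are illustrative choices; and the installed `transformers` release is assumed to still ship the TensorFlow model classes. The helper names (`cls_embedding`, `embed_sentences`, `build_classifier`, `train_and_evaluate`) are hypothetical.

```python
import numpy as np


def cls_embedding(last_hidden_state):
    """Use the vector at position 0 ([CLS]) of each sequence as its sentence embedding.

    last_hidden_state: array of shape (batch, seq_len, hidden_size).
    Returns an array of shape (batch, hidden_size).
    """
    return np.asarray(last_hidden_state)[:, 0, :]


def embed_sentences(texts, model_name="distilbert-base-uncased", max_length=64):
    """Step 1: run pretrained DistilBERT over `texts`, one 768-dim vector per text."""
    # Imported lazily: the first call downloads the pretrained weights.
    from transformers import DistilBertTokenizerFast, TFDistilBertModel

    tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
    encoder = TFDistilBertModel.from_pretrained(model_name)
    encoded = tokenizer(list(texts), padding=True, truncation=True,
                        max_length=max_length, return_tensors="tf")
    outputs = encoder(encoded["input_ids"],
                      attention_mask=encoded["attention_mask"])
    # outputs[0] is the last hidden state: (batch, seq_len, 768)
    return cls_embedding(outputs[0].numpy())


def build_classifier(input_dim=768, num_classes=2):
    """Step 2: a basic network with Dense and Dropout layers (sizes are illustrative)."""
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_evaluate(x_train, y_train, x_test, y_test, epochs=5):
    """Steps 2 and 3: fit the classifier on embeddings, then report test accuracy."""
    clf = build_classifier(input_dim=x_train.shape[1])
    clf.fit(x_train, y_train, epochs=epochs, batch_size=32, verbose=0)
    loss, accuracy = clf.evaluate(x_test, y_test, verbose=0)
    return clf, accuracy
```

Given lists of strings and integer label arrays, the embeddings from `embed_sentences` would be passed straight into `train_and_evaluate`. Because DistilBERT is only used as a fixed feature extractor here, the embeddings can be computed once and cached; for a large dataset you would likely want to embed in batches rather than in a single call.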

DistilBERT Implementation

The TensorBoard visualization is not clearly visible here; the output of Cell 23 will look something like this:

Happy Learning:)

References

DistilBERT - transformers 3.3.0 documentation

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a…

huggingface.co

A Visual Guide to Using BERT for the First Time

Progress has been rapidly accelerating in machine learning models that process language over the…

jalammar.github.io

🏎 Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT

You can find the code to reproduce the training of DistilBERT along with pre-trained weights for DistilBERT here.

medium.com