DistilBERT Text classification using Keras

Oct 19, 2020



In the previous post, I covered text classification using BERT. This post covers DistilBERT, a smaller version of BERT developed and open-sourced by the team at HuggingFace. It is lighter and faster than BERT while roughly matching its performance.

DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT’s performance as measured on the GLUE language understanding benchmark.

DistilBERT Implementation in Keras.

  • First, a pretrained DistilBERT model is used to generate a sentence embedding (768 dimensions) for each example in the dataset.
  • Then a basic neural network (with Dense and Dropout layers) is trained on those embeddings for the classification task.
  • Finally, the trained model is evaluated.
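The three steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's exact code: the checkpoint name `distilbert-base-uncased`, the choice of the first-token hidden state as the sentence embedding, the layer sizes, the dropout rate, and the helper names `embed_sentences` and `build_classifier` are all assumptions. It also assumes a `transformers` release that still ships TensorFlow model classes.

```python
import numpy as np
import tensorflow as tf


def embed_sentences(texts, model_name="distilbert-base-uncased", max_length=64):
    """Return one 768-dimensional vector per text (assumed: first-token hidden state)."""
    # Imported lazily so the classifier head below works without transformers installed.
    from transformers import AutoTokenizer, TFAutoModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    encoder = TFAutoModel.from_pretrained(model_name)
    batch = tokenizer(list(texts), padding=True, truncation=True,
                      max_length=max_length, return_tensors="tf")
    outputs = encoder(batch["input_ids"], attention_mask=batch["attention_mask"])
    # last_hidden_state has shape (batch, seq_len, 768); index 0 is the [CLS] position.
    return outputs.last_hidden_state[:, 0, :].numpy()


def build_classifier(embedding_dim=768, num_classes=2, dropout_rate=0.2):
    """A small Dense + Dropout network on top of fixed sentence embeddings."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(embedding_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),  # layer sizes are illustrative
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With embeddings `X_train`, `X_test` from `embed_sentences` and integer labels `y_train`, `y_test`, training and evaluation would then be `model.fit(X_train, y_train, epochs=5)` followed by `model.evaluate(X_test, y_test)`, which returns the loss and accuracy. Because DistilBERT is only used as a fixed feature extractor here, the embeddings can be computed once and cached.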
DistilBERT Implementation
  • The TensorBoard visualization is not clearly visible here; the output of Cell 23 will look something like this:
TensorBoard Visualization
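For readers who want to reproduce a similar view, here is a hedged sketch of how training curves can be logged for TensorBoard with the standard Keras callback. The helper name `train_with_tensorboard`, the tiny network, and the hyperparameters are assumptions for illustration, not the notebook's actual cell.

```python
import numpy as np
import tensorflow as tf


def train_with_tensorboard(features, labels, log_dir, epochs=2):
    """Train a small classifier and write TensorBoard logs to log_dir."""
    num_classes = int(np.max(labels)) + 1
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(features.shape[1],)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # The callback writes per-epoch loss and metrics under log_dir.
    tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    history = model.fit(features, labels,
                        validation_split=0.25,
                        epochs=epochs,
                        batch_size=8,
                        callbacks=[tensorboard_cb],
                        verbose=0)
    return history
```

After training, the curves can be viewed by running `tensorboard --logdir <log_dir>` in a terminal, or `%load_ext tensorboard` followed by `%tensorboard --logdir <log_dir>` inside a notebook.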

Happy Learning:)





Data Scientist @Sprinklr | IIT Bombay | IIT (ISM) Dhanbad