In the previous blog, I covered text classification using BERT. In this blog, let's look at a smaller version of BERT: DistilBERT. DistilBERT was developed and open-sourced by the team at Hugging Face. It is a lighter and faster version of BERT that roughly matches its performance.
DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
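As a quick sanity check on the 40% figure, take the commonly reported parameter counts of roughly 66M for distilbert-base-uncased and roughly 110M for bert-base-uncased (rounded values, used here only for illustration):

$$1 - \frac{66\,\text{M}}{110\,\text{M}} = 1 - 0.6 = 0.4 = 40\%$$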
DistilBERT Implementation in Keras.
- First, the pretrained DistilBERT model was used to generate sentence embeddings (768 dimensions) for the dataset.
- Then a basic neural network (Dense and Dropout layers) was trained on these embeddings for the classification task.
- Finally, the model was evaluated.
- The TensorBoard visualization is not clearly visible in the embedded notebook; the output of Cell 23 will look something like this: [Figure: TensorBoard output of Cell 23]
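The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the checkpoint name `distilbert-base-uncased`, taking the first ([CLS]) token's final hidden state as the sentence embedding, the 256/64 layer sizes, the 0.2 dropout rate, `max_length=128`, integer class labels, and the helper names `embed_sentences`, `build_classifier`, and `train_and_evaluate` are all hypothetical choices for illustration. DistilBERT stays frozen and only the small Keras head is trained.

```python
import numpy as np
import tensorflow as tf

# Assumption: the base DistilBERT checkpoint, whose hidden size is 768.
MODEL_NAME = "distilbert-base-uncased"
EMBED_DIM = 768


def embed_sentences(sentences, batch_size=32, max_length=128):
    """Step 1 (hypothetical helper): one 768-d vector per sentence from pretrained DistilBERT.

    Assumption: the sentence embedding is the last hidden state of the first
    token ([CLS]); the original notebook may pool differently.
    """
    # Imported here so the classifier below can be used without downloading a model.
    from transformers import DistilBertTokenizer, TFDistilBertModel

    tokenizer = DistilBertTokenizer.from_pretrained(MODEL_NAME)
    encoder = TFDistilBertModel.from_pretrained(MODEL_NAME)
    chunks = []
    for start in range(0, len(sentences), batch_size):
        batch = tokenizer(
            list(sentences[start:start + batch_size]),
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="tf",
        )
        outputs = encoder(batch["input_ids"], attention_mask=batch["attention_mask"])
        # outputs[0] has shape (batch, seq_len, 768); keep token position 0 only.
        chunks.append(outputs[0][:, 0, :].numpy())
    return np.concatenate(chunks, axis=0)


def build_classifier(num_classes, dropout_rate=0.2):
    """Step 2 (hypothetical helper): a small Dense/Dropout network on the frozen embeddings.

    Layer sizes and dropout rate are illustrative assumptions.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(EMBED_DIM,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer class labels are assumed, hence the sparse loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_evaluate(x_train, y_train, x_test, y_test, num_classes,
                       log_dir="logs", epochs=5):
    """Steps 2-4 (hypothetical helper): train with TensorBoard logging, then evaluate."""
    model = build_classifier(num_classes)
    # Writes training/validation curves that TensorBoard can display.
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    model.fit(x_train, y_train, validation_split=0.1, epochs=epochs,
              batch_size=32, callbacks=[tensorboard], verbose=0)
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return model, test_loss, test_acc
```

In use, you would call `embed_sentences` on the training and test texts, pass the resulting arrays and labels to `train_and_evaluate`, and then run `tensorboard --logdir logs` to inspect the loss and accuracy curves. Computing the embeddings once and training only the head keeps each training run cheap, at the cost of not fine-tuning DistilBERT itself.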