See how ONNX can speed up CPU inference for the Huggingface transformers NLP pipeline with only a few changes.
First, a quick overview of the terms used here.
Transformers provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
The transformers pipeline is the simplest way to use a pretrained SOTA model for different NLP tasks like sentiment analysis, question answering, zero-shot classification, feature extraction, NER, etc., in just two lines of code.
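For example, with the plain transformers library, a sentiment-analysis pipeline really is just two lines (a minimal sketch using the default pretrained model):
from transformers import pipeline
# Downloads a default pretrained sentiment model on first use, then classifies text
nlp = pipeline("sentiment-analysis")
nlp("I like this combo of chicken starters!")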
ONNX stands for Open Neural Network Exchange. ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, and scikit-learn. Among other things, it can (see the export sketch after this list):
– Improve inference performance for different types of ML models.
– Reduce time and cost of training large models
– Train in Python but deploy into a C#/C++/Java app
– Run on different hardware and operating systems
– Support models created in several different frameworks
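To make the export idea concrete, here is a minimal sketch of converting a toy PyTorch model to ONNX and running it with ONNX Runtime (the file name and tensor names are illustrative assumptions, not part of onnx_transformers):
import torch
import onnxruntime as ort

# A toy PyTorch model exported to the ONNX format
model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx", input_names=["input"], output_names=["output"])

# Run the exported graph with the ONNX Runtime CPU engine
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0])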
Now that you have an overview of ONNX, let’s see how to use it for a faster transformer NLP pipeline.
Install transformers and onnx_transformers in Colab.
!pip install transformers
!pip install git+https://github.com/patil-suraj/onnx_transformers
Import the pipeline function from onnx_transformers,
from onnx_transformers import pipeline
Now, let’s use it for various NLP tasks.
Sentiment analysis:
Add onnx=True in the pipeline function. This is the same as the original transformers pipeline; only the onnx parameter is added.
nlp = pipeline("sentiment-analysis", onnx=True)
Let’s use nlp to get the sentiment of a text,
nlp("I like this combo of chicken starters!")
This gives the result below,
[{'label': 'POSITIVE', 'score': 0.9932807683944702}]
Now, let’s see the inference speed,
%timeit nlp("I like this combo of chicken starters!")
10 loops, best of 3: 23.7 ms per loop
This is fast for a CPU inference 🙂
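To quantify the gain, you can time the same call with onnx=False, which (assuming onnx_transformers then falls back to the vanilla PyTorch pipeline) gives a direct PyTorch-vs-ONNX comparison on the same machine:
nlp_pt = pipeline("sentiment-analysis", onnx=False)
%timeit nlp_pt("I like this combo of chicken starters!")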
Question answering:
Question answering is the task of extracting an answer from a candidate paragraph (the context), given a question.
Define the QA pipeline,
nlp_qa = pipeline('question-answering', onnx=True)
Let’s try this QA pipeline,
nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is Google based?')
This gives the answer below,
{'answer': 'Mountain View, California,', 'end': 291, 'score': 0.4882817566394806, 'start': 265}
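The start and end fields are character offsets into the context string, so you can recover the answer span directly (assuming the paragraph above is stored in a variable named context):
result = nlp_qa(context=context, question='Where is Google based?')
print(context[result['start']:result['end']])  # 'Mountain View, California,'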
Now, let’s check the inference speed of the QA ONNX pipeline,
%timeit nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is Google based?')
1 loop, best of 3: 230 ms per loop
Now let’s try a different, much smaller transformer model, mrm8488/bert-tiny-finetuned-squadv2, for QA,
nlp_qa = pipeline('question-answering', model="mrm8488/bert-tiny-finetuned-squadv2", onnx=True)
nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is Google based?')
Answer,
{'answer': 'Mountain View, California, United States.', 'end': 305, 'score': 0.017995649948716164, 'start': 265}
Bert-tiny model QA inference performance (faster than the default QA model above, though with a much lower confidence score),
%timeit nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is Google based?')
10 loops, best of 3: 141 ms per loop
Feature extraction:
The feature-extraction pipeline extracts the hidden states from the base transformer, which can be used as features in downstream tasks.
Define the feature-extraction pipeline,
nlp = pipeline("feature-extraction", onnx= True)
Let’s extract features for this text,
nlp('Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.')
It returns the hidden-state representation (one vector per token) for the above sequence.
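To inspect what comes back, you can convert the output to a NumPy array and check its shape (a quick sketch; one hidden-state vector per token):
import numpy as np
features = np.array(nlp('Google, LLC is an American multinational technology company.'))
print(features.shape)  # (1, number_of_tokens, hidden_size)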
Named entity recognition:
This pipeline extracts named entities for each token in the input sequence.
Define the NER pipeline,
nlp = pipeline("ner", onnx=True)
Extract named entities for below sequence,
nlp('Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.')
Here are the entities,
[{'entity': 'I-ORG', 'index': 1, 'score': 0.9994143843650818, 'word': 'Google'},
 {'entity': 'I-ORG', 'index': 2, 'score': 0.9844746589660645, 'word': ','},
 {'entity': 'I-ORG', 'index': 3, 'score': 0.998744547367096, 'word': 'LLC'},
 {'entity': 'I-MISC', 'index': 6, 'score': 0.9970664381980896, 'word': 'American'},
 {'entity': 'I-MISC', 'index': 13, 'score': 0.9974018931388855, 'word': 'Internet'},
 {'entity': 'I-ORG', 'index': 38, 'score': 0.9973472356796265, 'word': 'Google'},
 {'entity': 'I-LOC', 'index': 43, 'score': 0.9949518442153931, 'word': 'Mountain'},
 {'entity': 'I-LOC', 'index': 44, 'score': 0.9973859786987305, 'word': 'View'},
 {'entity': 'I-LOC', 'index': 46, 'score': 0.9987567067146301, 'word': 'California'},
 {'entity': 'I-LOC', 'index': 48, 'score': 0.9979965686798096, 'word': 'United'},
 {'entity': 'I-LOC', 'index': 49, 'score': 0.9937295317649841, 'word': 'States'}]
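Because the tokenizer works on sub-word tokens, multi-word entities such as 'Mountain View' come back as separate items. A small post-processing sketch that merges consecutive tokens sharing the same entity tag:
entities = nlp('Google, LLC corporate headquarters is located at Mountain View, California, United States.')

# Merge consecutive tokens that carry the same entity tag
merged = []
for ent in entities:
    if merged and merged[-1]['entity'] == ent['entity'] and ent['index'] == merged[-1]['index'] + 1:
        merged[-1]['word'] += ' ' + ent['word']
        merged[-1]['index'] = ent['index']
    else:
        merged.append({'entity': ent['entity'], 'word': ent['word'], 'index': ent['index']})
print(merged)  # e.g. groups 'Mountain View' and 'United States' into single entries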
Zero-shot classification:
A zero-shot classification model can classify data into classes it never saw during training.
Define zero-shot classification model,
classifier = pipeline("zero-shot-classification", onnx=True)
sequence = "For any budding cricketer, playing with or against MS Dhoni is a big deal" candidate_labels = ["cricket", "football", "basketball"] classifier(sequence, candidate_labels)
See how it correctly classifies the sequence as cricket,
{'labels': ['cricket', 'basketball', 'football'], 'scores': [0.9873027801513672, 0.00657124537974596, 0.006125985644757748], 'sequence': 'For any budding cricketer, playing with or against MS Dhoni is a big deal'}
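Since labels and scores are returned as parallel lists sorted by score, pairing them up takes one line:
result = classifier(sequence, candidate_labels)
print(dict(zip(result['labels'], result['scores'])))
# {'cricket': 0.987..., 'basketball': 0.0066..., 'football': 0.0061...}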
Here is the Colab link,
My other articles about Huggingface/transformers/BERT,
Text2TextGeneration pipeline by Huggingface transformers
Question answering using transformers and BERT
How to cluster text documents using BERT
How to do semantic document similarity using BERT
Zero-shot classification using Huggingface transformers
Summarize text document using transformers and BERT
Follow me on Twitter, Instagram, Pinterest, and Tumblr for new post notifications.