Question answering using transformers and BERT

How do you do question answering with Hugging Face Transformers and BERT? This post shows how to use the Transformers pipeline for question answering with BERT.

What are transformers?

Transformers provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with more than 32 pre-trained models in over 100 languages and deep interoperability between TensorFlow 2.0 and PyTorch.

What is question-answering in NLP?

Question answering is the task of extracting an answer from a pair made up of a candidate paragraph and a question.

What are the different types of QA systems?

There are two types: closed domain and open domain.

What is a closed domain QA system?

A closed-domain QA system extracts the answer from a given paragraph or document.

What is an open domain QA system?

An open-domain QA system extracts the answer to a given question from a large corpus of documents, such as Wikipedia.

Hugging Face Transformers has a pipeline called question-answering, which we will use here.

The question-answering pipeline uses a model fine-tuned on the SQuAD task.

Let’s see it in action.

1. Install the Transformers library in Colab.
!pip install transformers

Or install it locally,

pip install transformers

2. Import the transformers pipeline.

from transformers import pipeline

3. Set up the pipeline.

nlp_qa = pipeline('question-answering')

4. Pass the context and question to the pipeline.

nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', 
       question='Where is based Google ?')

This produces the output,

{'answer': 'Mountain View, California,',
 'end': 291,
 'score': 0.4971035420894623,
 'start': 265}
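The start and end values in this result are character offsets into the context string. A minimal sketch of how they map back to the answer text is shown below; the result dictionary is copied by hand from the output above rather than produced by running the model.

```python
# The same context string that was passed to the pipeline above.
context = (
    'Google, LLC is an American multinational technology company that '
    'specializes in Internet-related services and products, which include '
    'online advertising technologies, a search engine, cloud computing, '
    'software, and hardware.Google corporate headquarters located at '
    'Mountain View, California, United States.'
)

# Result copied from the output above (not recomputed here).
result = {'answer': 'Mountain View, California,',
          'end': 291,
          'score': 0.4971035420894623,
          'start': 265}

# Slicing the context with the reported offsets recovers the answer span.
span = context[result['start']:result['end']]
print(repr(span))
```

This is handy when you want to highlight the answer inside the original paragraph instead of only printing it.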

Now, let’s try a different model.

We will use a tiny transformer model called bert-tiny-5-finetuned-squadv2.

Load the bert-tiny model and its tokenizer as shown below,

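# Load the tokenizer and model explicitly, then build the pipeline from them
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline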
tokenizer = AutoTokenizer.from_pretrained("mrm8488/bert-tiny-5-finetuned-squadv2")
model = AutoModelForQuestionAnswering.from_pretrained("mrm8488/bert-tiny-5-finetuned-squadv2")

bert_tiny_nlp_qa = pipeline('question-answering', 
                            model = model,
                            tokenizer = tokenizer)
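As an alternative to loading the model and tokenizer objects yourself, pipeline also accepts a model identifier string and loads both for you. The sketch below assumes that behaviour; build_bert_tiny_qa and MODEL_ID are hypothetical names introduced only for illustration.

```python
# Same checkpoint identifier as used above.
MODEL_ID = "mrm8488/bert-tiny-5-finetuned-squadv2"


def build_bert_tiny_qa(model_id=MODEL_ID):
    """Build a question-answering pipeline from a model identifier string."""
    # Imported inside the function so that defining it does not require
    # transformers to be installed. Calling it does, and the first call
    # downloads the checkpoint.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


# Usage (downloads the model on first run):
# qa = build_bert_tiny_qa()
```

Either way of building the pipeline should give you the same kind of object to call with a context and a question.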

Pass the same question and context to the bert_tiny_nlp_qa pipeline.

bert_tiny_nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', 
                 question='Where is based Google ?')

This produces the following output,

{'answer': 'Mountain View, California, United States.',
 'end': 305,
 'score': 0.5947868227958679,
 'start': 265}

As you can see, both models point to the same location; bert-tiny simply returns a longer span that also includes the country.
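One rough way to compare the two results is by the character spans they report. The sketch below uses the start and end values copied from the two outputs above; span_overlap is a hypothetical helper, not part of the transformers library.

```python
# start/end values copied from the two outputs shown above.
default_result = {'answer': 'Mountain View, California,',
                  'start': 265, 'end': 291}
tiny_result = {'answer': 'Mountain View, California, United States.',
               'start': 265, 'end': 305}


def span_overlap(a, b):
    """Number of characters shared by two [start, end) answer spans."""
    return max(0, min(a['end'], b['end']) - max(a['start'], b['start']))


overlap = span_overlap(default_result, tiny_result)
default_len = default_result['end'] - default_result['start']

# If the overlap equals the length of the default answer, the bert-tiny
# span fully contains the default model's span.
print(overlap, default_len, overlap == default_len)
```

Here the default model's span sits entirely inside the bert-tiny span, so the two answers agree on the location and differ only in how much text they include.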

Performance comparison based on inference time

Now let’s see the inference time for these two pipelines on a CPU.

Inference time for the default model QA pipeline,

%timeit nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is based Google ?')


1 loop, best of 3: 875 ms per loop

Inference time for the bert-tiny model QA pipeline,

%timeit bert_tiny_nlp_qa(context='Google, LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware.Google corporate headquarters located at Mountain View, California, United States.', question='Where is based Google ?')


1 loop, best of 3: 233 ms per loop

As you can see, bert-tiny is about 3.75 times faster than the default pipeline (233 ms versus 875 ms per loop).
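%timeit only works inside IPython or a notebook. Outside of one, a similar measurement can be sketched with the standard timeit module, as below. best_time_ms and speedup are hypothetical helper names, and the 875 and 233 figures are the timings reported above, not new measurements.

```python
import timeit


def best_time_ms(fn, repeat=3, number=1):
    """Best time per call in milliseconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(fn, repeat=repeat, number=number)
    return min(timings) / number * 1000.0


def speedup(slow_ms, fast_ms):
    """How many times faster the fast measurement is than the slow one."""
    return slow_ms / fast_ms


# Ratio of the two timings reported above.
ratio = speedup(875, 233)
print(f"{ratio:.3f}")

# Usage with the pipelines from this post (assumes nlp_qa, context and
# question are already defined):
# best_time_ms(lambda: nlp_qa(context=context, question=question))
```

Timings depend heavily on the machine, so it is worth re-running the measurement on your own hardware before choosing a model.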

By Satyanarayan Bhanja

Machine learning engineer

