Extract data from Elasticsearch using Kibana – dev tools

Learn how to write queries that fetch data from Elasticsearch using Kibana Dev Tools, and understand the different components of Dev Tools.

Kibana has a convenient feature called “Dev Tools” for extracting data from Elasticsearch and interacting with it. Dev Tools contains four different tools that you can use to play with your data in Elasticsearch.

1. Setup Elasticsearch and Kibana

Make sure Elasticsearch and Kibana are set up and that some data is loaded. If not, follow these guides first.

Install Elasticsearch.

Install Kibana.

Load a CSV file into Elasticsearch.

2. What are the Dev Tools components

Dev Tools consists of four components: Console, Search Profiler, Grok Debugger and Painless Lab.

1. Console:

Console interacts with Elasticsearch using REST API. The Console UI has two panes: an editor pane (left) and a response pane (right). 

Use the editor to type requests and submit them to Elasticsearch. The results appear in the response pane.

Submit requests to ES using the green triangle button.

Use the wrench menu for other useful things like auto-indentation and copy as cURL.
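Since Console talks to Elasticsearch over its REST API, every Console request corresponds to a plain HTTP request. Here is a minimal Python sketch of that mapping, using only the standard library. The helper names `console_to_request` and `send` are mine, and `http://localhost:9200` is an assumption (the default local Elasticsearch address); adjust it for your cluster.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable on its default local address.
ES_URL = "http://localhost:9200"


def console_to_request(method, path, body=None, base_url=ES_URL):
    """Build the HTTP request that a Console line such as
    `GET /_cat/indices` (plus an optional JSON body) stands for."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    data = None
    headers = {}
    if body is not None:
        # Console bodies are JSON, so send them as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON reply (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building a request does not contact the cluster; only send() does.
request = console_to_request("GET", "/_cat/indices?format=json")
print(request.get_method(), request.full_url)
```

With a running cluster, `send(request)` would return the parsed reply. The “copy as cURL” option in the wrench menu gives you the same kind of translation for the command line.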

This is how the UI looks:

[Figure: Kibana Dev Tools UI]

Let’s run some queries and see the responses.

The requests we write in Console are similar to REST API calls, only much simpler.

Open the Kibana UI at http://localhost:5601/

List all indices:

Run the query below to list all indices in the Elasticsearch cluster.

GET /_cat/indices

The response lists every index along with its document count and the disk space it uses.

yellow open tweets                         HXmH78GtTaaN4_QEeqvuMg 1 1  28276   0  15.9mb  15.9mb

Basic query to check documents of the tweets index:

GET tweets/_search
{
  "size": 200,
  "query": {
    "match_all": {}
  }
}

Query explanation: return up to 200 documents from the tweets index.
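If you want to work with such a query outside Console, a small Python sketch like the one below shows the request body and how to pull the documents out of a response. The function names are hypothetical, and the sample response is a trimmed-down dictionary with the same shape as a Console search response, not real output.

```python
def match_all_query(size=10):
    """Request body equivalent to the Console query above.

    Elasticsearch returns 10 hits when `size` is omitted, so the
    default here mirrors that.
    """
    return {"size": size, "query": {"match_all": {}}}


def extract_sources(response):
    """Return only the original documents (_source) from a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed-down sample with the same shape as a search response.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "tweets",
                "_id": "c-lgnXQBhEpcX7VCh2b4",
                "_source": {"source": "GoldmanSachs", "symbols": "GS"},
            }
        ]
    }
}

print(match_all_query(200))
print(extract_sources(sample_response))
```

The documents you indexed always sit under `hits.hits[]._source`; everything else in the response is metadata about the search.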

The response is shown below. Only the first hit is shown here for simplicity.

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 200,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "c-lgnXQBhEpcX7VCh2b4",
        "_score" : 1.0,
        "_source" : {
          "id" : "1019696670777503700",
          "text" : "VIDEO: “I was in my office. I was minding my own business...” –David Solomon tells $GS interns how he learned he wa… https://t.co/QClAITywXV",
          "timestamp" : "Wed Jul 18 21:33:26 +0000 2018",
          "source" : "GoldmanSachs",
          "symbols" : "GS",
          "company_names" : "The Goldman Sachs",
          "url" : "https://twitter.com/i/web/status/1019696670777503745",
          "verified" : "True"
        }
      }

Get the tweets for a particular tweet source:

I want all the tweets where “source” is “GoldmanSachs”.

GET tweets/_search
{
  "query":{
    "match": {
      "source.keyword": "GoldmanSachs"
    }
  }
}
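The query above uses `match` on `source.keyword`. For exact values on a keyword field, a `term` query is a common alternative, and on a keyword field the two should return the same documents because the value is not split into words. Below is a minimal Python sketch that builds both bodies for comparison. The helper names are mine, and I am assuming `source.keyword` is the keyword sub-field that the default dynamic mapping creates for string fields.

```python
import json


def match_query(field, text):
    """Full-text style query; the text goes through the field's analyzer."""
    return {"query": {"match": {field: text}}}


def term_query(field, value):
    """Exact-value query; the value is compared as-is, without analysis."""
    return {"query": {"term": {field: value}}}


# Same field and value as the Console query above, in both styles.
print(json.dumps(match_query("source.keyword", "GoldmanSachs"), indent=2))
print(json.dumps(term_query("source.keyword", "GoldmanSachs"), indent=2))
```

Note that `term` on the analyzed `source` field (without `.keyword`) may not match, because the indexed text is usually lowercased by the standard analyzer.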

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.844339,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "c-lgnXQBhEpcX7VCh2b4",
        "_score" : 9.844339,
        "_source" : {
          "id" : "1019696670777503700",
          "text" : "VIDEO: “I was in my office. I was minding my own business...” –David Solomon tells $GS interns how he learned he wa… https://t.co/QClAITywXV",
          "timestamp" : "Wed Jul 18 21:33:26 +0000 2018",
          "source" : "GoldmanSachs",
          "symbols" : "GS",
          "company_names" : "The Goldman Sachs",
          "url" : "https://twitter.com/i/web/status/1019696670777503745",
          "verified" : "True"
        }
      }
    ]
  }
}

Get the tweets where “source” is “TheStalwart” and “company_names” is “Twitter”:
GET tweets/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "source.keyword": "TheStalwart"
          }
        },
        {
          "term": {
            "company_names.keyword": "Twitter"
          }
        }
      ]
    }
  }
}
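A `bool` query with several `must` clauses means “all of these conditions have to hold”. When the conditions come from code, the body can be generated instead of typed by hand. The sketch below is one way to do it in Python; `all_of` is a hypothetical helper, and it uses `term` for every clause, which is a simplification of the mixed `match`/`term` query above.

```python
import json


def all_of(exact_values):
    """Build a bool query whose `must` list holds one term clause per field.

    `exact_values` maps field names to the exact value each must have.
    """
    clauses = [{"term": {field: value}} for field, value in exact_values.items()]
    return {"query": {"bool": {"must": clauses}}}


body = all_of({
    "source.keyword": "TheStalwart",
    "company_names.keyword": "Twitter",
})
print(json.dumps(body, indent=2))
```

If you only need yes/no filtering and do not care about the relevance score, the same clauses can go under `filter` instead of `must`.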

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 19.688679,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "jOlgnXQBhEpcX7VCh2b4",
        "_score" : 19.688679,
        "_source" : {
          "id" : "1019743063328030700",
          "text" : """RT @josheidelson: Exclusive: Elon Musk called Sierra Club's executive director Saturday and "asked for some help via Twitter" the green gr…""",
          "timestamp" : "Thu Jul 19 00:37:46 +0000 2018",
          "source" : "TheStalwart",
          "symbols" : "TWTR",
          "company_names" : "Twitter",
          "url" : "",
          "verified" : "False"
        }
      }
    ]
  }
}

Get all verified “BTC” tweets:
GET tweets/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "symbols.keyword": "BTC"
          }
        },
        {
          "term": {
            "verified.keyword": "True"
          }
        }
      ]
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 10.158067,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "dulgnXQBhEpcX7VCh2b4",
        "_score" : 10.158067,
        "_source" : {
          "id" : "1019716662587740200",
          "text" : "Barry Silbert is extremely optimistic on bitcoin -- but predicts that 99% of new crypto entrants are “going to zero… https://t.co/mGMVo2cZgY",
          "timestamp" : "Wed Jul 18 22:52:52 +0000 2018",
          "source" : "MarketWatch",
          "symbols" : "BTC",
          "company_names" : "Bitcoin",
          "url" : "https://twitter.com/i/web/status/1019716662587740160",
          "verified" : "True"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "fOlgnXQBhEpcX7VCh2b4",
        "_score" : 10.158067,
        "_source" : {
          "id" : "1019721145396887600",
          "text" : "Hedge fund manager Marc Larsy says bitcoin $40K is possible https://t.co/54uPe0OWqT",
          "timestamp" : "Wed Jul 18 23:10:41 +0000 2018",
          "source" : "MarketWatch",
          "symbols" : "BTC",
          "company_names" : "Bitcoin",
          "url" : "https://on.mktw.net/2Ntr7k9",
          "verified" : "True"
        }
      }
    ]
  }
}

Text search example: tweets which are about “bitcoin crypto asset”:
GET tweets/_search
{
  "query": {
    "match": {
      "text": {
        "query": "bitcoin crypto asset"
      }
    }
  }
}
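By default a `match` query treats the words in the query text as alternatives, so a tweet that contains only “bitcoin” is still a hit. The `operator` parameter changes that. Here is a small Python sketch that builds the body either way; `text_match` is a name I made up for illustration.

```python
import json


def text_match(field, text, operator="or"):
    """Build a match query.

    operator="or"  -> a document matches if it contains any of the terms
    operator="and" -> a document matches only if it contains all terms
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Default behaviour, same meaning as the Console query above.
print(json.dumps(text_match("text", "bitcoin crypto asset"), indent=2))

# Stricter variant: all three words must appear in the tweet text.
print(json.dumps(text_match("text", "bitcoin crypto asset", operator="and"), indent=2))
```

The “and” variant should return far fewer hits than the default, since every term has to be present.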

The response contains the documents whose “text” field matches one or more of the terms in “bitcoin crypto asset”, ranked by relevance:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 790,
      "relation" : "eq"
    },
    "max_score" : 10.048518,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "JelgnXQBhEpcX7VCnsqB",
        "_score" : 10.048518,
        "_source" : {
          "id" : "1019678218834587600",
          "text" : "'Nuff said!  $TEL #telcoin #Telfam #crypto #Blockchain #ethereum #bitcoin $BTC $ETH https://t.co/dkRvaYzgcd",
          "timestamp" : "Wed Jul 18 20:20:06 +0000 2018",
          "source" : "invest_in_hd",
          "symbols" : "TEL",
          "company_names" : "TE Connectivity Ltd.",
          "url" : "https://twitter.com/CRYPTOVERLOAD/status/1017805976093822977",
          "verified" : "False"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "MOlgnXQBhEpcX7VCnsqB",
        "_score" : 9.573708,
        "_source" : {
          "id" : "1019678398640152600",
          "text" : "RT @invest_in_hd: 'Nuff said!  $TEL #telcoin #Telfam #crypto #Blockchain #ethereum #bitcoin $BTC $ETH https://t.co/dkRvaYzgcd",
          "timestamp" : "Wed Jul 18 20:20:49 +0000 2018",
          "source" : "vikvonm",
          "symbols" : "TEL",
          "company_names" : "TE Connectivity Ltd.",
          "url" : "https://twitter.com/CRYPTOVERLOAD/status/1017805976093822977",
          "verified" : "False"
        }
      }

“More like this”: let’s find tweets whose text is similar to “Good returns on Google stock”:
GET tweets/_search
{
  "query": {
    "more_like_this" : {
      "fields" : ["text"],
      "like" : "Good returns on Google stock",
      "min_term_freq" : 1,
      "max_query_terms" : 12
    }
  }
}
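The `more_like_this` body has a few knobs, so a small builder can make them easier to play with. This is a minimal Python sketch with a hypothetical helper name; the default values simply mirror the Console query above. As far as I know, Elasticsearch itself defaults `min_term_freq` to 2 and `max_query_terms` to 25, which is why `min_term_freq` is lowered to 1 here: in a short text like a tweet, most words appear only once.

```python
import json


def more_like_this(fields, like_text, min_term_freq=1, max_query_terms=12):
    """Build a more_like_this query.

    min_term_freq   -- how often a term must occur in the input text
                       before it is used for the search
    max_query_terms -- upper limit on how many terms are picked
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like_text,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = more_like_this(["text"], "Good returns on Google stock")
print(json.dumps(body, indent=2))
```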

The response contains tweets whose text resembles “Good returns on Google stock”:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5436,
      "relation" : "eq"
    },
    "max_score" : 12.626834,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "1-lgnXQBhEpcX7VCi3hf",
        "_score" : 12.626834,
        "_source" : {
          "id" : "1017721714644602900",
          "text" : "We combine indicators that give good returns when backtested. This one is Bearish on $AET https://t.co/6lQzEUnvca https://t.co/I9Ih54nJI8",
          "timestamp" : "Fri Jul 13 10:45:39 +0000 2018",
          "source" : "StockmetrixFeed",
          "symbols" : "AET",
          "company_names" : "Aetna Inc.",
          "url" : "https://stmx.me/1lrH",
          "verified" : "False"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "g-lgnXQBhEpcX7VCmrYI",
        "_score" : 10.996755,
        "_source" : {
          "id" : "1019570955725746200",
          "text" : "Stock price moves based on good news and analysts upgrade. $noc $lmt $ba so many good news on these clowns and get… https://t.co/tvbHYE26GH",
          "timestamp" : "Wed Jul 18 13:13:53 +0000 2018",
          "source" : "carmex212",
          "symbols" : "LMT",
          "company_names" : "Lockheed Martin Corporation",
          "url" : "https://twitter.com/i/web/status/1019570955725746179",
          "verified" : "False"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "tOlgnXQBhEpcX7VClZ9A",
        "_score" : 10.712406,
        "_source" : {
          "id" : "1019250580865323000",
          "text" : "Insider Trades Stock Picks Based on Artificial Intelligence: Returns up to 73.21% in 1 Year https://t.co/QI0ki5I613… https://t.co/KpO7CZDeGm",
          "timestamp" : "Tue Jul 17 16:00:49 +0000 2018",
          "source" : "i_Know_First",
          "symbols" : "CRM",
          "company_names" : "salesforce.com",
          "url" : "http://ow.ly/gmVr30kZmlv",
          "verified" : "False"
        }
      }

2. Search Profiler

Elasticsearch has a powerful Profile API which can be used to inspect and analyze your search queries.

The Search Profiler visualizes this profile, for example the cumulative time and the time taken by each component of the Elasticsearch query.

Let’s see the profile for one of our previous queries, the one that gets all verified “BTC” tweets:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "symbols.keyword": "BTC"
          }
        },
        {
          "term": {
            "verified.keyword": "True"
          }
        }
      ]
    }
  }
}

To profile this query, follow these steps:

i. Click on the “Search Profiler” tab.

ii. Write the index name in the “index” text-box on the left side.

iii. Write the query in the editor just below index text-box.

iv. Click on the “Profile” button on the left side bottom.

Now, on the right-hand side, you can see query performance figures such as “Cumulative time”, “Self time” and “Total time” for the sub-components of the Elasticsearch query.
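The same timing data is available without the UI: adding `"profile": true` to a search body makes Elasticsearch include a `profile` section in the response. The Python sketch below switches profiling on and walks that section. The function names are mine, and the sample response is hypothetical and heavily trimmed, with made-up timings; a real profile contains many more fields.

```python
def with_profiling(search_body):
    """Return a copy of the search body with the Profile API switched on."""
    profiled = dict(search_body)
    profiled["profile"] = True
    return profiled


def flatten_profile(response):
    """Yield (depth, query type, time in ms) for every profiled query node."""
    def walk(node, depth):
        yield depth, node["type"], node["time_in_nanos"] / 1_000_000
        for child in node.get("children", []):
            yield from walk(child, depth + 1)

    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in search["query"]:
                yield from walk(node, 0)


# Hypothetical, trimmed profile response; the timings are invented.
sample = {
    "profile": {"shards": [{"searches": [{"query": [
        {"type": "BooleanQuery", "time_in_nanos": 300000, "children": [
            {"type": "TermQuery", "time_in_nanos": 120000},
            {"type": "TermQuery", "time_in_nanos": 90000},
        ]},
    ]}]}]}
}

print(with_profiling({"query": {"match_all": {}}}))
for depth, query_type, millis in flatten_profile(sample):
    print("  " * depth + f"{query_type}: {millis:.2f} ms")
```

The nesting of `children` is what the Search Profiler draws as an expandable tree.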

Here is how it looks:

[Figure: Search Profiler]

3. Grok Debugger

You can build and debug grok patterns in the Kibana Grok Debugger before you use them in your data processing pipelines. 

Here is how you can use this.

i. Click on the “Grok Debugger” tab.

ii. Enter your log message into the field “Sample Data”.

iii. Write your grok pattern in the “Grok Pattern” field.

iv. Click on the “Simulate” button.

Let’s say my log record is as below:

55.3.244.1 GET /index.html 15824 0.043

I want to extract the client IP address, the HTTP method, the request path, the number of bytes, and the duration.

The grok pattern should look like this:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

After extraction, the structured data looks as below:

{
  "duration": "0.043",
  "request": "/index.html",
  "method": "GET",
  "bytes": "15824",
  "client": "55.3.244.1"
}
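Under the hood a grok pattern is a set of named regular expressions. To make that concrete, here is a rough Python sketch that translates the pattern above into a regex with named groups and applies it to the log line. The regexes in `GROK_TO_REGEX` are my own simplified stand-ins, not the real grok definitions (which are stricter), and `grok_to_regex` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the grok patterns used above (assumptions,
# not the official definitions).
GROK_TO_REGEX = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}


def grok_to_regex(pattern):
    """Translate %{SYNTAX:name} placeholders into named regex groups."""
    def replace(match):
        syntax, name = match.group(1), match.group(2)
        return f"(?P<{name}>{GROK_TO_REGEX[syntax]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", replace, pattern))


pattern = "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
line = "55.3.244.1 GET /index.html 15824 0.043"

match = grok_to_regex(pattern).fullmatch(line)
print(match.groupdict())
```

As in the Grok Debugger output, every extracted value is a string, including `bytes` and `duration`.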

This is how it looks:

[Figure: Grok Debugger]

4. Painless Lab

The Painless Lab is an interactive code editor that lets you test and debug Painless scripts in real time. It is part of the X-Pack bundle, and some X-Pack components require a paid license.

References:

https://www.elastic.co/guide/en/kibana/current/devtools-kibana.html

So, that was all about Kibana dev tools.


Happy learning. 🙂
