Elasticsearch is a distributed search and analytics engine that handles many types of data, including text, numeric, geospatial, structured, and unstructured data.
It is “elastic”, which means it can be scaled out to petabytes of data.
For search-heavy workloads, finding information is typically faster and easier than with a relational database.
- It is built on Apache Lucene. Apache Lucene is an open-source, high-performance text search engine written in Java.
- It is the core of the “Elastic Stack”, which provides tools for data ingestion, enrichment, storage, analysis, and visualization.
- The Elastic Stack consists of Elasticsearch, Logstash, and Kibana.
- It also has many lightweight data collection agents called Beats for pushing data to Elasticsearch.
How does it work?
- Data collection: Raw data, such as logs and system metrics, is collected from various sources, including web applications.
- Data ingestion: In this process, raw data is parsed, normalized, and enriched before it gets indexed into Elasticsearch.
- Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index.
- An index is a collection of documents that are related to each other.
- Elasticsearch stores data as a collection of JSON documents in these indices.
- Each document that gets indexed correlates a set of keys with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).
- Elasticsearch uses a data structure called an inverted index.
- An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
- During the indexing process, Elasticsearch stores documents and builds an inverted index to make the document data searchable in near real-time.
- Once the data is indexed in Elasticsearch, users can run complex queries against it and use aggregations to retrieve complex summaries of it.
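To make the steps above more concrete, here is a minimal sketch of what a document and a search request with an aggregation might look like as JSON. The index name `logs`, the field names, and the aggregation name `events_per_level` are made up for illustration. The `.keyword` sub-field assumes the default dynamic mapping for strings. Nothing is sent over the network; the code only builds the request bodies.

```python
import json

# A hypothetical log document: keys mapped to values of different types.
doc = {
    "message": "disk usage above threshold",   # string
    "level": "warning",                         # string
    "cpu_percent": 91.5,                        # number
    "acknowledged": False,                      # Boolean
    "timestamp": "2021-03-01T12:00:00Z",        # date (ISO 8601 string)
    "tags": ["disk", "infra"],                  # array of values
    "location": {"lat": 52.37, "lon": 4.89},    # geolocation
}

# A search body: a full-text match query plus a terms aggregation
# that counts matching documents per log level.
search_body = {
    "query": {"match": {"message": "disk usage"}},
    "aggs": {
        # "level.keyword" assumes the default dynamic mapping for strings.
        "events_per_level": {"terms": {"field": "level.keyword"}}
    },
}

# Serialized forms, as they would appear in an HTTP request body,
# for example in a POST to /logs/_search (index name is hypothetical).
doc_json = json.dumps(doc)
search_json = json.dumps(search_body)
```

The query part decides which documents match, while the `aggs` part summarizes the matching documents.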
Why use Elasticsearch?
- Elasticsearch is super fast:
- Elasticsearch is based on Lucene, which is a high-performance text search engine.
- Elasticsearch is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short, typically about one second.
- Because of this, Elasticsearch is well suited to time-sensitive use cases such as security analytics and infrastructure monitoring.
- Elasticsearch is distributed by nature:
- The documents stored in Elasticsearch are distributed across different containers known as shards.
- Shards are duplicated to provide redundant copies of the data in case of hardware failure.
- The distributed nature of Elasticsearch allows it to scale out to hundreds (or even thousands) of servers and handle petabytes of data.
- Elasticsearch comes with a wide set of features:
- In addition to its speed, scalability, and resiliency, Elasticsearch has a number of powerful built-in features that make storing and searching data even more efficient, such as data rollups and index lifecycle management.
- The Elastic Stack simplifies data ingest, visualization, and reporting:
- Integration with Beats and Logstash makes it easy to process data before indexing into Elasticsearch.
- Kibana provides real-time visualization of Elasticsearch data as well as UIs for quickly accessing application performance monitoring (APM), logs, and infrastructure metrics data.
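The idea of distributing documents across shards and keeping redundant copies can be sketched in a few lines. This is a simplified illustration, not Elasticsearch's actual routing or allocation logic: CRC32 stands in for the internal hash function, and the functions `pick_shard` and `place_copies` as well as the node names are hypothetical.

```python
import zlib

def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (for example a document ID) to a primary shard number."""
    # CRC32 is only a deterministic stand-in for the real internal hash.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

def place_copies(shard, nodes, num_replicas=1):
    """Hypothetical placement: the primary and its replicas go to distinct nodes."""
    copies = 1 + num_replicas
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as shard copies")
    return [nodes[(shard + i) % len(nodes)] for i in range(copies)]

nodes = ["node-a", "node-b", "node-c"]   # made-up node names
for doc_id in ["order-1", "order-2", "order-3"]:
    shard = pick_shard(doc_id, num_primary_shards=3)
    print(doc_id, "-> shard", shard, "on", place_copies(shard, nodes))
```

Because the same key always hashes to the same shard, a lookup by ID only has to ask one shard, and a copy on a second node survives the loss of the first.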
What are the Elasticsearch use cases?
- Add full-text search functionality to an app or website.
- Store and analyze logs, metrics, and security event data.
- Use machine learning to automatically model the behavior of your data in real time, for example to detect anomalies and forecast metrics.
- Automate business workflows using Elasticsearch as a storage engine.
- Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS).
- Store and process genetic data using Elasticsearch as a bioinformatics research tool.
- Visualize data in Elasticsearch using Kibana.
What is Elasticsearch good for?
Elasticsearch is good for fast, scalable, near real-time search across different kinds of data, such as text, logs, system metrics, and geospatial data.
Where is the data stored in Elasticsearch?
The JSON documents are stored in an index in Elasticsearch. Each index is split across different containers called shards, and each shard is a self-contained Lucene index.
As data is written to a shard, it is periodically published into new immutable Lucene segments on disk, at which point it becomes available for querying.
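The gap between writing a document and being able to search it can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `ToyShard` is hypothetical, everything lives in memory, and real Lucene segments hold index structures rather than raw text.

```python
class ToyShard:
    """Toy model of a shard: writes are buffered, a refresh publishes a segment."""

    def __init__(self):
        self._buffer = []     # documents written but not yet searchable
        self._segments = []   # published segments, stored as immutable tuples

    def write(self, text):
        self._buffer.append(text)

    def refresh(self):
        # Publish buffered documents as a new immutable segment.
        if self._buffer:
            self._segments.append(tuple(self._buffer))
            self._buffer = []

    def search(self, word):
        # Only published segments are visible to searches.
        word = word.lower()
        return [
            text
            for segment in self._segments
            for text in segment
            if word in text.lower().split()
        ]

shard = ToyShard()
shard.write("Disk usage high")
before = shard.search("disk")   # empty: not refreshed yet
shard.refresh()
after = shard.search("disk")    # now contains the document
```

In this model the document only shows up in `after`, which mirrors why search is described as near real-time rather than real-time.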
Why is Elasticsearch so fast?
Elasticsearch is a JSON document store built upon the Apache Lucene search engine. Lucene indexes documents in an inverted index, which lists every unique word that appears in any document and identifies all of the documents each word occurs in. As a result, it can look up the documents that contain a particular word directly instead of scanning every document, which makes queries faster.
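A minimal sketch of an inverted index shows why the lookup is cheap. It assumes the simplest possible analysis (lowercasing and splitting on whitespace); the function names and sample documents are made up, and Lucene's real data structures are far more sophisticated.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every unique word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted IDs of documents containing the word."""
    return sorted(index.get(word.lower(), set()))

docs = {
    1: "quick brown fox",
    2: "quick red fox jumps",
    3: "lazy brown dog",
}
index = build_inverted_index(docs)

print(search(index, "quick"))   # [1, 2]
print(search(index, "brown"))   # [1, 3]
print(search(index, "cat"))     # []
```

Each search is a single dictionary lookup, regardless of how many documents were indexed, whereas a scan would have to read every document.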
Why is Elasticsearch so popular?
- Elasticsearch provides low latency indexing of JSON documents and powerful search.
- It is easy to set up and configure.
- Its search API/query DSL is simple to follow.
- The query DSL for aggregating and filtering data is straightforward.
- The ELK stack (Elasticsearch + Logstash + Kibana) makes end-to-end data analysis easier.
- It has a built-in REST API for CRUD operations.
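As a rough illustration of the REST API for CRUD operations, the sketch below maps each operation to an HTTP method and path. It assumes the typeless document endpoints of Elasticsearch 7 and later; the helper `crud_request`, the index name `logs`, and the document fields are hypothetical, and the code only builds the request rather than sending it.

```python
import json

def crud_request(operation, index, doc_id, body=None):
    """Return (HTTP method, path, JSON body or None) for a CRUD operation."""
    routes = {
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),      # creates or overwrites
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),  # partial update
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    if operation not in routes:
        raise ValueError(f"unknown operation: {operation}")
    method, path = routes[operation]
    payload = json.dumps(body) if body is not None else None
    return method, path, payload

create = crud_request("create", "logs", "1", {"level": "warning"})
read = crud_request("read", "logs", "1")
# A partial update wraps the changed fields in a "doc" object.
update = crud_request("update", "logs", "1", {"doc": {"level": "error"}})
delete = crud_request("delete", "logs", "1")
```

Any HTTP client could then send these requests to a running cluster.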
How to install Elasticsearch?
I have written simple steps to install Elasticsearch: Install Elasticsearch in Ubuntu 20.04