Elasticsearch in Action: From Basics to Deploying a Search Service in a Docker Container

Welcome to this comprehensive guide on Elasticsearch, where we will explore the ins and outs of this powerful search and analytics engine, learn about its various use cases, and create a real-life search service using Docker containers. Whether you are a beginner looking to get started with Elasticsearch or an experienced developer seeking to enhance your knowledge, this blog post has something for everyone.

Elasticsearch has become an indispensable tool for many organizations, powering search and analytics for websites, applications, and infrastructure monitoring. Its robust capabilities, scalability, and ease of use make it a popular choice for handling large-scale data search and analysis tasks. In this post, we will first dive into the basics of Elasticsearch, understanding its key concepts and architecture. Next, we will discuss its diverse range of applications, from full-text search to log analysis and beyond.

After establishing a solid foundation, we will guide you through creating a real-life search service using Elasticsearch and Docker. We will cover the process step by step, from setting up the Elasticsearch Docker container to configuring the search service and integrating it with your application. By the end of this blog post, you will have a deeper understanding of Elasticsearch and the skills to build and deploy a search service in a Docker container.

Join us on this exciting journey as we unlock the full potential of Elasticsearch and learn how to harness its power to create efficient, scalable, and user-friendly search experiences.

What Is Elasticsearch and When Do You Need It?

Elasticsearch is an open-source, distributed search and analytics engine built on top of Apache Lucene. It is designed for high performance, reliability, and scalability, enabling you to search, analyze, and store large volumes of data quickly and efficiently. Elasticsearch provides near real-time search capabilities and advanced analytical features, making it suitable for various use cases across different industries and applications.

Typical situations where Elasticsearch is a good fit include:

  1. Full-text search: If you have a large amount of textual data and need to build search functionality with advanced features such as multi-language support, fuzzy matching, autocomplete, and faceted search, Elasticsearch is an excellent choice.
  2. Log and event data analysis: Elasticsearch is often used with Logstash and Kibana (the ELK Stack) for log data storage, processing, and visualization. If you need to analyze logs and event data from applications, systems, or infrastructure for monitoring, troubleshooting, or security purposes, Elasticsearch is a powerful solution.
  3. Analytics: Elasticsearch supports real-time analytics through its aggregation framework, allowing you to summarize and analyze large amounts of data quickly. This is useful for applications like business intelligence, data visualization, and reporting.
  4. Scalable data storage: If you require a highly scalable data storage solution that can handle structured and unstructured data, Elasticsearch’s distributed architecture enables horizontal scaling and easy management of large data volumes.
  5. Geospatial search and analysis: Elasticsearch supports geospatial data and provides geospatial search capabilities, making it suitable for location-based applications such as mapping, geofencing, and asset tracking.

Elasticsearch is a versatile and powerful search and analytics engine that can be used for a variety of applications where speed, scalability, and advanced search features are important. If you need to handle large-scale data search and analysis tasks, Elasticsearch is a popular and proven solution.

Is Elasticsearch an SQL Database?

Elasticsearch can be considered a type of NoSQL database, as it stores and retrieves data in a non-tabular, schema-free format using JSON documents. It is designed primarily as a search and analytics engine, but it can also function as a data store for specific use cases.

However, Elasticsearch is not a traditional relational database management system (RDBMS) like MySQL, PostgreSQL, or SQL Server, which use tables, rows, and columns to store and manage data. Instead, Elasticsearch uses a different data model, organizing data into indices, which are similar to tables, and documents, which are similar to rows in a table. Each document contains fields and their values, similar to columns in a table.

While Elasticsearch can be used as a data store, it’s important to note that it is not a drop-in replacement for traditional relational databases. Elasticsearch excels in search and analytics use cases, such as full-text search, log analysis, and data visualization. It may not be suitable for transactional or complex relational data management scenarios, where a traditional RDBMS would be a better fit.

Time for Some Action: Creating an Elasticsearch Service in a Docker Container

Now that we have a solid understanding of Elasticsearch and its capabilities, it’s time to put theory into practice and see it in action. In this section, we will walk you through the process of creating an Elasticsearch service running inside a Docker container. This approach simplifies the setup, configuration, and management of Elasticsearch, making it easy to create an isolated environment for your search service.

Without further ado, let’s dive into creating an Elasticsearch service inside a Docker container.

First, create a file named docker-compose.yml in your project directory and add the following content to set up a single Elasticsearch node:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - esdata:/usr/share/elasticsearch/data

volumes:
  esdata:

This configuration creates a single-node Elasticsearch cluster and binds the container’s ports 9200 and 9300 to the host machine, allowing you to access the Elasticsearch service using these ports.

Now start the Elasticsearch Docker container. In your terminal or command prompt, navigate to the directory containing the docker-compose.yml file and run the following command:

docker-compose up

This command starts the Elasticsearch service inside a Docker container. After a few moments, you should see log messages indicating that Elasticsearch is running and ready to accept requests.

Congratulations! You have successfully created an Elasticsearch service inside a Docker container. You can now interact with the Elasticsearch REST API via port 9200 on your host machine. In the following sections, we will explore how to index data, execute search queries, and perform various operations using the Elasticsearch API.
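Before moving on, it can help to confirm from code that the node is answering. The following is a minimal sketch using only the Python standard library. It assumes the port mapping from the Compose file above and a node that accepts plain HTTP without authentication; the function names are hypothetical and not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: the 9200 port mapping from docker-compose.yml, no authentication.
BASE_URL = "http://localhost:9200"


def fetch_cluster_info(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET the root endpoint, which returns basic cluster metadata as JSON."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_cluster(info: dict) -> str:
    """Format the cluster name and version from a root-endpoint response."""
    version = info.get("version", {}).get("number", "unknown")
    cluster = info.get("cluster_name", "unknown")
    return f"{cluster} (Elasticsearch {version})"
```

Once the container is up, calling describe_cluster(fetch_cluster_info()) should return a short summary string. If the node is still starting, urlopen raises an error, so a retry loop may be needed in practice.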

Configuring Elasticsearch: Creating and Mapping an Index

In Elasticsearch, the equivalent of a table in a traditional relational database is called an “index.” To create an index, you can use the Elasticsearch RESTful API.

Now we will create an index in Elasticsearch that will store and search data using the structure of our demo case. We’ll first define the index structure, explain each field, and then create the index and its mappings within Elasticsearch.

Index Structure

Our Elasticsearch index will have the following fields:

RelatedId: Integer (Primary key on our remote system)
CategoryId: Integer (Category IDs of our remote system)
LocationId: Integer (Location IDs of our remote system)
DateUpdated: Datetime (Last update date)
DateExpiration: Datetime (Expiration date)
Priority: Byte (Higher values are ordered first in search results)
SearchField: Text (Text field to search inside)

Creating the Index and Mapping

To create the index and configure its mappings, execute a PUT request to the Elasticsearch REST API with the following JSON payload.

Let’s call our index entity-search. In our case, you’ll need to send the PUT request to the following URL with the JSON body shown below it.

http://localhost:9200/entity-search
{
  "mappings": {
    "properties": {
      "RelatedId": {
        "type": "integer"
      },
      "CategoryId": {
        "type": "integer",
        "null_value": -1
      },
      "LocationId": {
        "type": "integer",
        "null_value": -1
      },
      "DateUpdated": {
        "type": "date"
      },
      "DateExpiration": {
        "type": "date"
      },
      "Priority": {
        "type": "byte"
      },
      "SearchField": {
        "type": "text"
      }
    }
  }
}

This request creates an index with the specified fields and types. We use null_value to handle nullable fields like CategoryId and LocationId.
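The same request can be prepared from code. The sketch below rebuilds the mapping from a small field table and wraps it in a PUT request using the Python standard library. The constants and function names are our own illustration, and the base URL assumes the Docker setup from earlier.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: port mapping from the Compose file
INDEX_NAME = "entity-search"

# Field name -> Elasticsearch field type, mirroring the mapping above.
FIELD_TYPES = {
    "RelatedId": "integer",
    "CategoryId": "integer",
    "LocationId": "integer",
    "DateUpdated": "date",
    "DateExpiration": "date",
    "Priority": "byte",
    "SearchField": "text",
}
# Fields that get a null_value of -1 in the mapping.
NULLABLE_FIELDS = {"CategoryId", "LocationId"}


def build_index_body() -> dict:
    """Build the JSON body for creating the index with its mappings."""
    properties = {}
    for name, field_type in FIELD_TYPES.items():
        properties[name] = {"type": field_type}
        if name in NULLABLE_FIELDS:
            properties[name]["null_value"] = -1
    return {"mappings": {"properties": properties}}


def create_index_request(base_url: str = BASE_URL, index: str = INDEX_NAME) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(build_index_body()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the prepared request to urllib.request.urlopen sends it. Keeping the body builder separate from the network call makes the mapping easy to inspect and test on its own.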

How can I list all existing indexes in Elasticsearch?

To list all the indexes in Elasticsearch, you can send a request to the /_cat/indices API endpoint. This endpoint provides information about your indexes in a human-readable, tabular format by default. You can also request the output in JSON format by adding the format=json query parameter.

http://localhost:9200/_cat/indices?format=json

The response looks like the example below. You can see our newly created index entity-search and another one called .geoip_databases, which comes with the Elasticsearch installation.

[
  {
    "health": "green",
    "status": "open",
    "index": ".geoip_databases",
    "uuid": "c7yK9hMdTiCrUIQDVCmAEg",
    "pri": "1",
    "rep": "0",
    "docs.count": "42",
    "docs.deleted": "0",
    "store.size": "40.5mb",
    "pri.store.size": "40.5mb"
  },
  {
    "health": "yellow",
    "status": "open",
    "index": "entity-search",
    "uuid": "FmimZcy5RXeuItByMvA6Kw",
    "pri": "1",
    "rep": "1",
    "docs.count": "0",
    "docs.deleted": "0",
    "store.size": "208b",
    "pri.store.size": "208b"
  }
]
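Note the "yellow" health of entity-search in the output above. On a single-node cluster this is expected: the index asks for one replica ("rep": "1"), and a replica is never placed on the same node as its primary, so it stays unassigned. As a small sketch, the helpers below (hypothetical names, standard library only) summarize a parsed response and pick out the indexes that are not green.

```python
def summarize_indices(rows: list) -> list:
    """Return (index name, health, document count) for each row of a
    parsed /_cat/indices?format=json response."""
    return [
        (row["index"], row["health"], int(row.get("docs.count") or 0))
        for row in rows
    ]


def indices_not_green(rows: list) -> list:
    """Return the names of indexes whose health is anything other than green."""
    return [row["index"] for row in rows if row["health"] != "green"]
```

For the sample response above, indices_not_green returns only entity-search.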

Adding a Document to the Index in Elasticsearch

Each record in our index is called a document in Elasticsearch, as in other NoSQL systems. It is similar to a row in a relational system. Now we’ll explain how to add and update documents in the Elasticsearch index. We’ll also add some dummy records to demonstrate the process.

To add a document to the index, execute a POST request to the Elasticsearch REST API with the following JSON payload.

http://localhost:9200/entity-search/_doc
{
  "RelatedId": 1,
  "CategoryId": 101,
  "LocationId": 201,
  "DateUpdated": "2023-03-26T10:00:00Z",
  "DateExpiration": "2023-04-26T10:00:00Z",
  "Priority": 5,
  "SearchField": "Sample text for searching"
}

This request adds a new document to the index with the specified field values and returns a JSON object like the one below.

{
  "_index": "entity-search",
  "_type": "_doc",
  "_id": "Ob7_PIcBZYBCqOaqTbsI",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

You can see there is an _id field in the response to adding a new document. Keep this value if you want to address this document directly in the future.
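If storing generated IDs is inconvenient, Elasticsearch also accepts a caller-chosen ID through PUT on /<index>/_doc/<id>. The sketch below prepares either form of request with the Python standard library. Reusing RelatedId as the document ID is our own assumption for illustration, and the function name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: port mapping from the Compose file


def build_index_document_request(doc: dict, doc_id=None,
                                 base_url: str = BASE_URL,
                                 index: str = "entity-search") -> urllib.request.Request:
    """Prepare (but do not send) a request that indexes one document.

    Without doc_id: POST /<index>/_doc, and Elasticsearch generates the _id.
    With doc_id:    PUT /<index>/_doc/<id>, which creates the document or
                    replaces an existing document that has the same _id.
    """
    if doc_id is None:
        url, method = f"{base_url}/{index}/_doc", "POST"
    else:
        url, method = f"{base_url}/{index}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
```

For example, build_index_document_request(doc, doc_id=doc["RelatedId"]) targets /entity-search/_doc/1 for the sample document, so the document can later be addressed without a lookup.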

Updating a Document in the Index in Elasticsearch

To update an existing document in the index, you need to know its _id, which Elasticsearch generates automatically when you add a document unless you specify one yourself. Execute a POST request to the Elasticsearch REST API with the following JSON payload, placing the document’s _id at the end of the URL (the example uses the _id returned above) and providing the updated values for the fields:

http://localhost:9200/entity-search/_update/Ob7_PIcBZYBCqOaqTbsI
{
  "doc": {
    "CategoryId": 102,
    "LocationId": 202,
    "DateUpdated": "2023-03-27T10:00:00Z",
    "Priority": 6,
    "SearchField": "Updated sample text for searching"
  }
}

Updating Elasticsearch documents by a query

If you didn’t save the _id and want to update your Elasticsearch document by, say, the RelatedId field, you first need to search for the document with a term query, and then use the Update API to update the document. Here’s the search request to find the document whose RelatedId equals 1:

http://localhost:9200/entity-search/_search
{
  "query": {
    "term": {
      "RelatedId": 1
    }
  },
  "_source": false
}

After finding the document, you can use the Update API to update it. To do this, you’ll need the _id of the document from the search results. Let’s say the _id is abcd1234, and you want to update the Priority field to 5. The update request would look like this

http://localhost:9200/entity-search/_update/abcd1234
{
  "doc": {
    "Priority": 5
  }
}

Replace abcd1234 with the actual _id of the document you want to update. Modify the doc object to include the fields you want to update and their new values.

Note that this process involves two separate HTTP requests: one for searching the document by RelatedId, and another for updating it using its _id.
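To make the two-step flow concrete, here is a minimal sketch of the glue between the two requests: pulling the _id values out of a parsed search response and building the partial-update body. The function names are hypothetical, and sending the requests is left out.

```python
def extract_document_ids(search_response: dict) -> list:
    """Return the _id of every hit in a parsed _search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def build_partial_update(fields: dict) -> dict:
    """Wrap the changed fields in the "doc" object expected by the Update API."""
    return {"doc": dict(fields)}
```

Each returned _id would then be used in a POST to /entity-search/_update/<_id> with the body from build_partial_update({"Priority": 5}).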

But isn’t it possible to update Elasticsearch documents by a query in a single HTTP call? Yes, you can do this using the Update By Query API.

Bulk Update Elasticsearch documents

The following example demonstrates how to update the Priority field of all documents with RelatedId equal to 1:

http://localhost:9200/entity-search/_update_by_query
{
  "query": {
    "term": {
      "RelatedId": 1
    }
  },
  "script": {
    "source": "ctx._source.Priority = params.newPriority",
    "params": {
      "newPriority": 5
    }
  }
}

Modify the query object to match the documents you want to update, and update the script object to set the new field values.

The Update By Query API allows you to update multiple documents that match a query in a single request. In this example, the query object finds all documents with RelatedId equal to 1, and the script object updates the Priority field to 5.

Keep in mind that the Update By Query API may have a performance impact if you are updating a large number of documents, as it needs to reindex the updated documents.

Additionally, the Update By Query API is part of the Elasticsearch reindex module, which might not be available in all Elasticsearch distributions. Make sure to check your Elasticsearch distribution’s documentation for details.
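When several fields need to change at once, writing the script by hand gets tedious. The sketch below generates an Update By Query body from a dictionary of new values, passing them as script params as in the example above. The function name is hypothetical, and the sketch assumes field names are simple identifiers that can appear after "ctx._source." in a script.

```python
def build_update_by_query(match_field: str, match_value, new_values: dict) -> dict:
    """Build an _update_by_query body that sets each field in new_values
    on every document where match_field equals match_value."""
    # One assignment per field; values are supplied through params
    # rather than being written into the script text.
    statements = [f"ctx._source.{name} = params.{name};" for name in new_values]
    return {
        "query": {"term": {match_field: match_value}},
        "script": {
            "source": " ".join(statements),
            "params": dict(new_values),
        },
    }
```

Calling build_update_by_query("RelatedId", 1, {"Priority": 5}) yields a body equivalent to the example above, with the parameter named after the field.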

Searching the Elasticsearch Index

Now that we’ve set up our Elasticsearch index, it’s time to dive into the searching capabilities it offers. In this section, we’ll explore how to search within our index and retrieve the relevant results based on different scenarios. By understanding and leveraging Elasticsearch’s powerful search features, you can effectively query your data and discover valuable insights tailored to your specific needs. Let’s get started with searching our Elasticsearch index!

To search the index, execute a POST request to the Elasticsearch REST API with the following JSON payload. In this example, we’ll create a search request to find all documents with a CategoryId of either 1 or 2 and a LocationId of 3. To achieve this, we will use the bool query with must and should clauses in combination with the terms query.

http://localhost:9200/entity-search/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "LocationId": 3
          }
        }
      ],
      "should": [
        {
          "terms": {
            "CategoryId": [1, 2]
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

This search query combines a must clause to match documents with a LocationId of 3, and a should clause with a terms query to match documents with a CategoryId of either 1 or 2. The minimum_should_match parameter is set to 1, which means that at least one of the conditions in the should clause must be satisfied for a document to be considered a match.
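In an application, the IDs usually come from user input rather than being hard-coded. As a minimal sketch, the hypothetical helper below builds the same bool query from parameters.

```python
def build_location_category_query(location_id: int, category_ids) -> dict:
    """Build the bool query shown above: LocationId must match, and
    CategoryId must be one of category_ids."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {"LocationId": location_id}}],
                "should": [{"terms": {"CategoryId": list(category_ids)}}],
                "minimum_should_match": 1,
            }
        }
    }
```

build_location_category_query(3, [1, 2]) reproduces the request body above and can be serialized with json.dumps before sending.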

Enhancing the Search

Now let’s develop our example further by adding a search term and sorting the results. We will modify our search query to include a match query for a search term and utilize the sort feature to order the results by Priority descending, followed by DateUpdated descending.

To perform this search, use the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "LocationId": 3
          }
        },
        {
          "match": {
            "SearchField": "your_search_term"
          }
        }
      ],
      "should": [
        {
          "terms": {
            "CategoryId": [1, 2]
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "sort": [
    { "Priority": "desc" },
    { "DateUpdated": "desc" }
  ]
}

This search query extends the previous example by adding a match query inside the must clause to match documents containing the specified search term in the SearchField. The sort field orders the results by Priority descending, followed by DateUpdated descending.

This request will return all documents in the entity-search index that match the specified criteria, ordered by Priority and DateUpdated.

Adding Pagination and Retrieving Total Count of an Elasticsearch Index Search

Finally, let’s extend our example once more by implementing pagination and retrieving the total count of search results. We will modify our search query to include the from and size parameters for pagination and use the _source field to control which fields are returned in the search results.

To perform this search, use the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "LocationId": 3
          }
        },
        {
          "match": {
            "SearchField": "your_search_term"
          }
        }
      ],
      "should": [
        {
          "terms": {
            "CategoryId": [1, 2]
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "Priority": "desc" },
    { "DateUpdated": "desc" }
  ],
  "_source": ["RelatedId"]
}

This search query extends the previous example by adding the from and size parameters for pagination. The from parameter specifies the zero-based offset of the first result to return, and the size parameter determines the number of documents to return per page. The _source field is set to return only the RelatedId field in the search results.

This request will return a paginated list of documents in the entity-search index that match the specified criteria, ordered by Priority and DateUpdated, along with the total count of matching documents.

To access the total count of matching documents, you can check the hits.total.value field in the search response.
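User interfaces usually think in page numbers rather than offsets. The sketch below converts a 1-based page number into from and size, and derives the page count from hits.total.value. The function names are hypothetical. Two default behaviors are worth keeping in mind: from plus size may not exceed 10,000 unless the index setting index.max_result_window is raised, and hits.total.value is only tracked exactly up to 10,000 matches (beyond that, hits.total.relation is "gte").

```python
import math


def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the zero-based "from" offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    return (page - 1) * size


def paginate(body: dict, page: int, size: int = 10) -> dict:
    """Return a copy of a search body with "from" and "size" set for the page."""
    paged = dict(body)
    paged["from"] = page_to_from(page, size)
    paged["size"] = size
    return paged


def total_pages(search_response: dict, size: int) -> int:
    """Compute the page count from hits.total.value in a parsed response."""
    total = search_response["hits"]["total"]["value"]
    return math.ceil(total / size)
```

For instance, page 3 with a size of 10 maps to from 20, and a response reporting 25 matching documents yields 3 pages.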

What Are the Alternatives to Elasticsearch?

While Elasticsearch is a powerful and popular search and analytics engine, there are other options available that may better suit your specific requirements or preferences. In this section, we will briefly introduce some Elasticsearch alternatives and compare their pros and cons to help you make an informed decision when choosing a search engine for your projects.

| Name | Description | Pros | Cons |
| --- | --- | --- | --- |
| Apache Solr | An open-source search platform built on Apache Lucene, designed for scalability and performance | Highly scalable, feature-rich, mature, and widely used | Steeper learning curve, complex configuration, slower development pace |
| Amazon CloudSearch | A fully managed search service provided by Amazon Web Services (AWS) | Easy to set up and maintain, automatically scales, integrates with other AWS services | Limited features compared to Elasticsearch, vendor lock-in, potentially higher cost |
| Algolia | A hosted search-as-a-service solution that focuses on providing fast and relevant search results | Excellent performance, easy to use, feature-rich, good documentation, strong community support | Limited customization, vendor lock-in, potentially higher cost for large-scale applications |
| Azure Cognitive Search | A fully managed search service provided by Microsoft Azure | Easy to set up and maintain, scales automatically, integrates with other Azure services | Limited features compared to Elasticsearch, vendor lock-in, potentially higher cost |

A comparison of Elasticsearch alternatives

Each alternative has its own strengths and weaknesses, so it’s essential to carefully consider your project’s requirements and constraints before making a decision. Factors such as performance, ease of use, scalability, customization, cost, and integration with your existing technology stack are all important considerations when evaluating search engine alternatives. By understanding the pros and cons of each option, you can choose the search engine that best aligns with your project’s needs and goals.

Conclusion: Unleashing the Power of Elasticsearch for Effective Search

In this blog post, we’ve covered the fundamentals of Elasticsearch, from understanding its core concepts to deploying and configuring a search service in a Docker container. We’ve demonstrated how to create an index, add documents, update documents with queries, and perform complex searches using Elasticsearch’s powerful query DSL.

We’ve also explored some alternatives to Elasticsearch, helping you make informed decisions when choosing a search engine for your projects. Elasticsearch has proven to be a powerful and flexible search engine, suitable for a wide range of use cases and applications. By combining Elasticsearch with Docker, you can quickly and easily deploy a robust search service that scales with your needs.

As you move forward, don’t hesitate to explore the Elasticsearch documentation and community resources to further expand your knowledge and skills. By investing time in learning and mastering Elasticsearch, you can unlock its full potential and transform the search experience for your users, helping them find the information they need quickly and efficiently.
