
Discover for data exploration in Elasticsearch - Mini Project: Build & Apply

📖 Scenario: You are working with a collection of sales data stored in Elasticsearch. You want to explore this data to find useful insights like total sales per product category.
🎯 Goal: Build a simple Elasticsearch query to discover and aggregate sales data by product category.
📋 What You'll Learn
Create an Elasticsearch index with sample sales data
Define a filter to select sales from the year 2023
Write an aggregation query to sum sales amounts by product category
Print the aggregation results
💡 Why This Matters
🌍 Real World
Exploring sales data helps businesses understand which product categories perform best and make informed decisions.
💼 Career
Data analysts and developers use Elasticsearch queries to explore and summarize large datasets quickly.
1. Create sample sales data index
Create an Elasticsearch index called sales with documents containing fields product_category (string), sale_amount (number), and sale_year (number). Add exactly these three documents: {"product_category": "Books", "sale_amount": 100, "sale_year": 2023}, {"product_category": "Electronics", "sale_amount": 250, "sale_year": 2023}, {"product_category": "Books", "sale_amount": 150, "sale_year": 2022}.
Hint: Use the PUT method (for example, PUT sales/_doc/1) to add each document to the sales index with the exact fields and values.
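
As an alternative to one PUT request per document, the three documents could be loaded in a single request through the `_bulk` API, which expects newline-delimited JSON. The sketch below only builds that request body in Python and does not contact a cluster; the helper name `build_bulk_body` and the document IDs 1 to 3 are assumptions made for illustration.

```python
import json

# The three documents from the step description.
docs = [
    {"product_category": "Books", "sale_amount": 100, "sale_year": 2023},
    {"product_category": "Electronics", "sale_amount": 250, "sale_year": 2023},
    {"product_category": "Books", "sale_amount": 150, "sale_year": 2022},
]


def build_bulk_body(index_name, documents):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, doc in enumerate(documents, start=1):
        # Action line: index this document into index_name under a hypothetical ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("sales", docs)
print(bulk_body)
```

The resulting string would be sent as the body of `POST _bulk` with the `application/x-ndjson` content type.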

2. Define filter for sales in 2023
Create a variable called filter_2023 that contains a filter to select documents where sale_year is exactly 2023.
Hint: Use a term filter to match sale_year exactly 2023.
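
A minimal sketch of the filter as a Python dictionary, assuming the query will later be serialized to JSON and sent to Elasticsearch. A `term` query matches documents whose field holds exactly the given value.

```python
# Term query: matches documents where sale_year is exactly 2023.
filter_2023 = {"term": {"sale_year": 2023}}

print(filter_2023)
```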

3. Write aggregation query for total sales by category
Create a variable called agg_query that contains an Elasticsearch query to filter by filter_2023 and aggregate total sale_amount by product_category using a terms aggregation and a sum sub-aggregation.
Hint: Use a bool query with a filter clause and a terms aggregation on product_category with a sum sub-aggregation on sale_amount.
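
One possible shape for the query is sketched below. Several details are assumptions rather than requirements of the exercise: the aggregation names `sales_by_category` and `total_sales` are made up, `"size": 0` is an optional choice that suppresses the document hits, and the `.keyword` subfield assumes the index uses default dynamic mapping (if `product_category` is mapped explicitly as `keyword`, the plain field name would be used instead).

```python
# Filter from the previous step, repeated so this snippet runs on its own.
filter_2023 = {"term": {"sale_year": 2023}}

agg_query = {
    # Optional: return only aggregation results, not the matching documents.
    "size": 0,
    # bool + filter restricts the aggregation to 2023 sales without scoring.
    "query": {"bool": {"filter": [filter_2023]}},
    "aggs": {
        # Hypothetical aggregation name: one bucket per product category.
        "sales_by_category": {
            # Assumes default dynamic mapping, which adds a .keyword subfield.
            "terms": {"field": "product_category.keyword"},
            "aggs": {
                # Hypothetical name: sum of sale_amount within each bucket.
                "total_sales": {"sum": {"field": "sale_amount"}}
            },
        }
    },
}

print(agg_query)
```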

4. Print aggregation results
Print the agg_query variable to display the full Elasticsearch query.
Hint: Use print(agg_query) to display the query.
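
A short sketch of the printing step, assuming `agg_query` is a Python dictionary. The query is repeated inline so the snippet runs on its own; its aggregation names and the `.keyword` subfield are illustrative assumptions. Note that `print(agg_query)` shows Python's dictionary representation, while `json.dumps` produces JSON that could be pasted into a request body.

```python
import json

# Illustrative query; aggregation names and the .keyword subfield are assumptions.
agg_query = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"sale_year": 2023}}]}},
    "aggs": {
        "sales_by_category": {
            "terms": {"field": "product_category.keyword"},
            "aggs": {"total_sales": {"sum": {"field": "sale_amount"}}},
        }
    },
}

# Python dict representation (single-quoted strings, not valid JSON).
print(agg_query)

# JSON representation, indented for readability.
query_json = json.dumps(agg_query, indent=2)
print(query_json)
```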

Practice

1. What is the main purpose of the Discover feature in Kibana, the interface used to explore Elasticsearch data?
easy
A. To explore and filter raw data in indexes
B. To create visual dashboards
C. To manage Elasticsearch cluster settings
D. To write complex aggregation queries

Solution

  1. Step 1: Understand Discover's role

    Discover is designed to let users explore raw data quickly and easily.
  2. Step 2: Compare with other features

    Dashboard creation and cluster management are separate features, not Discover's focus.
  3. Final Answer:

    To explore and filter raw data in indexes -> Option A
  4. Quick Check:

    Discover = Data exploration [OK]
Hint: Discover = explore raw data quickly [OK]
Common Mistakes:
  • Confusing Discover with Dashboard
  • Thinking Discover manages cluster settings
  • Assuming Discover creates complex queries
2. Which of the following is the correct syntax to filter data in Discover using a simple query?
easy
A. filter(status=200, extension=jpg)
B. WHERE status=200 AND extension=jpg
C. status:200 AND extension:jpg
D. SELECT * FROM index WHERE status=200

Solution

  1. Step 1: Identify Discover query syntax

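    Discover's query bar accepts field:value pairs joined by logical operators such as AND. This form works in Lucene query syntax and in KQL, the default query language in recent Kibana versions.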
  2. Step 2: Eliminate SQL and function syntax

    Options A, B, and D use function-call or SQL style, which is not valid in Discover queries.
  3. Final Answer:

    status:200 AND extension:jpg -> Option C
  4. Quick Check:

    Lucene syntax = status:200 AND extension:jpg [OK]
Hint: Use field:value with AND/OR in Discover queries [OK]
Common Mistakes:
  • Using SQL syntax instead of Lucene
  • Using function calls for filtering
  • Mixing query languages
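
Outside Discover, the same Lucene-style string can be passed to the Elasticsearch search API inside a `query_string` query. The sketch below only builds the request body; the variable names are illustrative.

```python
import json

# The query string from option C, unchanged.
lucene_query = "status:200 AND extension:jpg"

# A query_string query parses the string with Lucene-style syntax.
search_body = {"query": {"query_string": {"query": lucene_query}}}

print(json.dumps(search_body, indent=2))
```
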
3. Given the following Discover query: response:404 OR response:500, what data will be shown?
medium
A. All documents except those with response 404 or 500
B. Only documents with response code 404
C. Documents with response code 404 and 500 at the same time
D. Documents with response code 404 or 500

Solution

  1. Step 1: Understand OR operator in query

    The OR operator returns documents matching either condition, not both simultaneously.
  2. Step 2: Apply to response codes

    Documents with response 404 or response 500 will be included in results.
  3. Final Answer:

    Documents with response code 404 or 500 -> Option D
  4. Quick Check:

    OR means either condition matches [OK]
Hint: OR returns documents that match either condition [OK]
Common Mistakes:
  • Thinking OR means both conditions together
  • Confusing OR with AND
  • Assuming exclusion of matching documents
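
The OR behavior can be checked locally with a few made-up documents. This is a plain Python simulation of the matching logic, not a call to Elasticsearch, and the sample documents are invented for illustration.

```python
# Hypothetical sample documents, each with a single response code.
docs = [
    {"id": 1, "response": 200},
    {"id": 2, "response": 404},
    {"id": 3, "response": 500},
    {"id": 4, "response": 404},
]

# response:404 OR response:500 keeps a document if either condition holds.
matches = [d["id"] for d in docs if d["response"] == 404 or d["response"] == 500]

print(matches)  # [2, 3, 4]
```
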
4. You wrote this Discover query: status:200 AND extension=jpg. Why does it cause an error?
medium
A. Because '=' is not valid; use ':' for field-value pairs
B. Because AND cannot be used between conditions
C. Because 'status' is not a valid field name
D. Because 'jpg' should be in quotes

Solution

  1. Step 1: Check field-value syntax

    Discover uses field:value syntax, not field=value.
  2. Step 2: Validate operators and values

    AND is valid, 'status' is a common field, and quotes are optional for simple values.
  3. Final Answer:

    Because '=' is not valid; use ':' for field-value pairs -> Option A
  4. Quick Check:

    Use ':' not '=' in queries [OK]
Hint: Use colon ':' for field-value, not equals '=' [OK]
Common Mistakes:
  • Using '=' instead of ':'
  • Misunderstanding AND operator usage
  • Adding unnecessary quotes
5. You want to explore documents where the field user exists and the bytes field is greater than 1000. Which Discover query, written in Lucene syntax, achieves this?
hard
A. _exists_:user AND bytes >1000
B. _exists_:user AND bytes:{1000 TO *}
C. _exists_:user AND bytes:>=1000
D. user:* AND bytes:>1000

Solution

  1. Step 1: Check existence syntax

    Use _exists_:user to find documents where 'user' field exists.
  2. Step 2: Use range query for bytes > 1000

    Range syntax bytes:{1000 TO *} means bytes greater than 1000 (exclusive).
  3. Step 3: Verify other options

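    Option A omits the colon after bytes, so >1000 is not tied to the field. Option C uses >=, which also matches exactly 1000. Option D relies on the shorthand forms user:* and bytes:>1000 rather than the explicit _exists_ check and bracketed range this exercise asks for.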
  4. Final Answer:

    _exists_:user AND bytes:{1000 TO *} -> Option B
  5. Quick Check:

    Existence + range query = _exists_:user AND bytes:{1000 TO *} [OK]
Hint: Use _exists_ to check that a field is present and range syntax for greater-than comparisons [OK]
Common Mistakes:
  • Using wildcard * for existence check
  • Incorrect range syntax for greater than
  • Confusing inclusive and exclusive ranges
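
For comparison, the same condition can be written in Query DSL with an `exists` query and a `range` query using `gt` (strictly greater than). The sketch below builds that request body and then mimics the logic locally on made-up documents to show that a value of exactly 1000 is excluded. The local `matches` helper is a simplified stand-in, not Elasticsearch's actual matching code.

```python
import json

# Query DSL counterpart of: _exists_:user AND bytes:{1000 TO *}
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"exists": {"field": "user"}},
                {"range": {"bytes": {"gt": 1000}}},  # gt = exclusive lower bound
            ]
        }
    }
}
print(json.dumps(search_body, indent=2))

# Hypothetical documents covering the boundary and the missing-field case.
sample_docs = [
    {"user": "ana", "bytes": 1000},  # excluded: 1000 is not greater than 1000
    {"user": "ben", "bytes": 1001},  # included
    {"bytes": 5000},                 # excluded: no user field
]


def matches(doc):
    """Simplified local check: user is present and bytes is strictly above 1000."""
    return doc.get("user") is not None and doc.get("bytes", 0) > 1000


matched = [doc for doc in sample_docs if matches(doc)]
print(matched)  # [{'user': 'ben', 'bytes': 1001}]
```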