Elasticsearch query (~30 mins)

Machine learning anomaly detection in Elasticsearch - Mini Project: Build & Apply

Machine learning anomaly detection

📖 Scenario: You work for a company that collects website traffic data. You want to find unusual spikes or drops in the number of visitors using Elasticsearch's machine learning anomaly detection.

🎯 Goal: Create a simple anomaly detection job in Elasticsearch to identify unusual visitor counts over time.

📋 What You'll Learn

Create an index with sample visitor count data

Define a machine learning job configuration

Use the job to detect anomalies in visitor counts

Retrieve and interpret the anomaly detection results

💡 Why This Matters

🌍 Real World

Detecting unusual website traffic spikes helps companies respond quickly to potential issues or opportunities.

💼 Career

Anomaly detection skills are valuable for data scientists and engineers working with monitoring, security, and business intelligence.

Create the sample visitor count index

Create an Elasticsearch index called visitor_counts whose documents have a timestamp field (a date) and a count field (an integer). Then insert exactly these five documents:

{"timestamp": "2024-01-01T00:00:00Z", "count": 100}
{"timestamp": "2024-01-02T00:00:00Z", "count": 110}
{"timestamp": "2024-01-03T00:00:00Z", "count": 95}
{"timestamp": "2024-01-04T00:00:00Z", "count": 300}
{"timestamp": "2024-01-05T00:00:00Z", "count": 105}

Elasticsearch

# Create index visitor_counts and insert 5 documents with timestamp and count
# Your code here

Hint

Use PUT to create the index with mappings, then POST _bulk to insert documents.
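You can check that the bulk request indexed all five documents before moving on. This sketch assumes the requests above ran once against a fresh index. The explicit _refresh call is not part of the original steps. It makes the new documents searchable immediately rather than waiting for the periodic refresh.

```console
# Make the freshly indexed documents visible to search (added for this check)
POST /visitor_counts/_refresh

# Should report "count": 5 if the bulk request ran exactly once
GET /visitor_counts/_count

# List the documents in time order to see the 2024-01-04 value of 300 among the others
GET /visitor_counts/_search
{
  "size": 10,
  "sort": [{ "timestamp": "asc" }]
}
```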

Define anomaly detection job configuration

Create an anomaly detection job called visitor_count_anomaly_job that analyzes the visitor_counts index through a datafeed. Use timestamp as the time field and count as the field the detector analyzes.

Elasticsearch

PUT /visitor_counts
{
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "count": {"type": "integer"}
    }
  }
}

POST /visitor_counts/_bulk
{ "index": {} }
{ "timestamp": "2024-01-01T00:00:00Z", "count": 100 }
{ "index": {} }
{ "timestamp": "2024-01-02T00:00:00Z", "count": 110 }
{ "index": {} }
{ "timestamp": "2024-01-03T00:00:00Z", "count": 95 }
{ "index": {} }
{ "timestamp": "2024-01-04T00:00:00Z", "count": 300 }
{ "index": {} }
{ "timestamp": "2024-01-05T00:00:00Z", "count": 105 }

# Create ML job visitor_count_anomaly_job with timestamp and count fields
# Your code here

Hint

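Use PUT _ml/anomaly_detectors/visitor_count_anomaly_job with analysis_config (a detector on count), data_description (timestamp as the time_field), and a datafeed_config that points at the visitor_counts index.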
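After creating the job, you can read it back to confirm the detector, the datafeed it owns, and its state. This is a minimal sketch. It assumes the datafeed ID is datafeed-visitor_count_anomaly_job, which is the case when datafeed_config sets datafeed_id explicitly. If datafeed_id is omitted, the datafeed ID may default to the job ID instead, so check this for your version.

The job uses a 1d bucket_span and the data has one document per day. Each bucket therefore holds a single value, and mean(count) for a bucket is simply that day's count.

```console
# Job configuration, including analysis_config and data_description
GET _ml/anomaly_detectors/visitor_count_anomaly_job

# Job state: a newly created job reports "closed" until it is opened
GET _ml/anomaly_detectors/visitor_count_anomaly_job/_stats

# Datafeed created from datafeed_config; its "indices" should list visitor_counts
GET _ml/datafeeds/datafeed-visitor_count_anomaly_job
```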

Open the job and start the datafeed

Open the job, then start the datafeed datafeed-visitor_count_anomaly_job so it feeds documents from the visitor_counts index into the job. A datafeed can only start once its job is open.

Elasticsearch

PUT /visitor_counts
{
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "count": {"type": "integer"}
    }
  }
}

POST /visitor_counts/_bulk
{ "index": {} }
{ "timestamp": "2024-01-01T00:00:00Z", "count": 100 }
{ "index": {} }
{ "timestamp": "2024-01-02T00:00:00Z", "count": 110 }
{ "index": {} }
{ "timestamp": "2024-01-03T00:00:00Z", "count": 95 }
{ "index": {} }
{ "timestamp": "2024-01-04T00:00:00Z", "count": 300 }
{ "index": {} }
{ "timestamp": "2024-01-05T00:00:00Z", "count": 105 }

PUT _ml/anomaly_detectors/visitor_count_anomaly_job
{
  "description": "Detect anomalies in visitor counts",
  "analysis_config": {
    "bucket_span": "1d",
    "detectors": [
      {
        "function": "mean",
        "field_name": "count"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-visitor_count_anomaly_job",
    "indices": ["visitor_counts"]
  }
}

# Open the job, then start the datafeed datafeed-visitor_count_anomaly_job
# Your code here

Hint

Open the job first with POST _ml/anomaly_detectors/visitor_count_anomaly_job/_open, then use POST _ml/datafeeds/datafeed-visitor_count_anomaly_job/_start to start the datafeed.
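The full sequence for this step might look like the sketch below. The job must be open before its datafeed can start. The start and end values are assumptions chosen to cover the five sample days, and end is exclusive.

With an end time, the datafeed analyzes that historical range once and then stops. Without one, it keeps running in real time after catching up.

```console
# 1. Open the job so it can accept data (a datafeed cannot start on a closed job)
POST _ml/anomaly_detectors/visitor_count_anomaly_job/_open

# 2. Replay the historical range once; the datafeed stops on its own at "end"
POST _ml/datafeeds/datafeed-visitor_count_anomaly_job/_start
{
  "start": "2024-01-01T00:00:00Z",
  "end": "2024-01-06T00:00:00Z"
}

# 3. Check progress: "state" goes back to "stopped" once the range is processed
GET _ml/datafeeds/datafeed-visitor_count_anomaly_job/_stats
```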

Get anomaly detection results

Retrieve the anomaly detection results for the job visitor_count_anomaly_job and read the anomaly_score of each bucket.

Elasticsearch

PUT /visitor_counts
{
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "count": {"type": "integer"}
    }
  }
}

POST /visitor_counts/_bulk
{ "index": {} }
{ "timestamp": "2024-01-01T00:00:00Z", "count": 100 }
{ "index": {} }
{ "timestamp": "2024-01-02T00:00:00Z", "count": 110 }
{ "index": {} }
{ "timestamp": "2024-01-03T00:00:00Z", "count": 95 }
{ "index": {} }
{ "timestamp": "2024-01-04T00:00:00Z", "count": 300 }
{ "index": {} }
{ "timestamp": "2024-01-05T00:00:00Z", "count": 105 }

PUT _ml/anomaly_detectors/visitor_count_anomaly_job
{
  "description": "Detect anomalies in visitor counts",
  "analysis_config": {
    "bucket_span": "1d",
    "detectors": [
      {
        "function": "mean",
        "field_name": "count"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-visitor_count_anomaly_job",
    "indices": ["visitor_counts"]
  }
}

POST _ml/anomaly_detectors/visitor_count_anomaly_job/_open

POST _ml/datafeeds/datafeed-visitor_count_anomaly_job/_start

# Retrieve the bucket results and read each bucket's anomaly_score
# Your code here

Hint

Use GET _ml/anomaly_detectors/visitor_count_anomaly_job/results/buckets. Each bucket in the response has an anomaly_score field.
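One way to keep the response readable is to sort the buckets by time and trim the output with the standard filter_path parameter. Each bucket then shows only its timestamp and anomaly_score. The records endpoint adds the actual and typical values behind any anomaly.

This is a sketch, not a benchmark. Five daily buckets give the model very little history to learn from, so the scores may be low or even zero, including for the 300-visitor day.

```console
# Bucket-level results in time order; bucket timestamps come back as epoch milliseconds
GET _ml/anomaly_detectors/visitor_count_anomaly_job/results/buckets?filter_path=buckets.timestamp,buckets.anomaly_score
{
  "sort": "timestamp",
  "desc": false
}

# Record-level results, highest score first: the observed value and what the model expected
GET _ml/anomaly_detectors/visitor_count_anomaly_job/results/records?filter_path=records.timestamp,records.record_score,records.actual,records.typical
{
  "sort": "record_score",
  "desc": true
}
```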

Practice

1. What is the main purpose of machine learning anomaly detection in Elasticsearch?

A. To automatically find unusual patterns in data

B. To store large amounts of data efficiently

C. To create visual dashboards for data

D. To backup Elasticsearch clusters

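Solution

Step 1: Understand the anomaly detection goal. Anomaly detection learns the normal behavior of time-series data and flags values that deviate from it.

Step 2: Compare the options with that purpose. Storing data, building dashboards, and backing up clusters are handled by other Elasticsearch and Kibana features.

Final Answer: A. To automatically find unusual patterns in data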
