Machine learning anomaly detection in Elasticsearch - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to create an anomaly detection job with a unique ID.

Elasticsearch
PUT _ml/anomaly_detectors/[1]
{
  "description": "Detect anomalies in web traffic",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [{ "function": "mean", "field_name": "response_time" }]
  },
  "data_description": { "time_field": "timestamp" }
}
Options:
A. web_traffic_anomaly
B. create_job
C. anomaly_detector
D. ml_job_01
Common Mistakes
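Choosing a generic name like 'create_job' that says nothing about what the job detects (it is not a reserved word, just uninformative).
Including spaces, uppercase letters, or special characters in the job ID; job IDs are documented as lowercase alphanumerics, hyphens, and underscores, starting and ending with an alphanumeric character.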
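The naming rule for job IDs can be checked before sending the request. The following is a minimal Python sketch, assuming the documented rule (lowercase alphanumerics, hyphens, underscores, alphanumeric first and last character) and a 64-character limit; the helper name `is_valid_job_id` is hypothetical, and the exact rule may differ between Elasticsearch versions.

```python
import re

# Assumption: mirrors the documented job ID rule; the server's own check
# may be slightly more permissive depending on the version.
JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")

# Assumption: 64 characters is the maximum ID length.
MAX_JOB_ID_LENGTH = 64


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the assumed naming rule."""
    if not job_id or len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ["web_traffic_anomaly", "create_job", "Web Traffic", "job-"]:
        print(candidate, is_valid_job_id(candidate))
```

Under this rule all four options in the exercise are syntactically valid, so the choice comes down to which name best describes the job.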
2. Fill in the blank
medium

Complete the code to start the datafeed for the anomaly detection job.

Elasticsearch
POST _ml/datafeeds/datafeed-[1]/_start
Options:
A. job_02
B. datafeed_01
C. anomaly_job
D. web_traffic_anomaly
Common Mistakes
Using a datafeed name that does not match the job ID.
Starting the datafeed before the job has been created and opened.
3. Fill in the blank
hard

Fix the error in the anomaly detection job creation by completing the missing field name.

Elasticsearch
PUT _ml/anomaly_detectors/web_traffic_anomaly
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [{ "function": "mean", "field_name": "[1]" }]
  },
  "data_description": { "time_field": "timestamp" }
}
Options:
A. user_id
B. response_time
C. timestamp
D. status_code
Common Mistakes
Using 'timestamp' as a field to analyze, which is a time field.
Using categorical fields like 'user_id' or 'status_code' for mean function.
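The two mistakes above can be caught with a small pre-flight check. This is a hedged Python sketch: the `FIELD_TYPES` mapping and the helper `mean_field_problems` are hypothetical, and a real check would read the field types from the index mapping rather than from a hard-coded dictionary.

```python
from typing import Dict, List

# Hypothetical field types for illustration only; in practice these would
# come from the index mapping.
FIELD_TYPES: Dict[str, str] = {
    "response_time": "float",
    "status_code": "keyword",
    "user_id": "keyword",
    "timestamp": "date",
}

# Elasticsearch numeric field types that make sense as input to "mean".
NUMERIC_TYPES = {
    "byte", "short", "integer", "long",
    "float", "double", "half_float", "scaled_float",
}


def mean_field_problems(field_name: str, time_field: str,
                        field_types: Dict[str, str] = FIELD_TYPES) -> List[str]:
    """Return a list of reasons why field_name is a poor choice for mean."""
    problems: List[str] = []
    if field_name == time_field:
        problems.append("field is the time field, not a metric")
    field_type = field_types.get(field_name)
    if field_type is None:
        problems.append("field is not in the known mapping")
    elif field_type not in NUMERIC_TYPES:
        problems.append(f"field type '{field_type}' is not numeric")
    return problems


if __name__ == "__main__":
    for name in ["response_time", "timestamp", "user_id", "status_code"]:
        print(name, mean_field_problems(name, "timestamp"))
```

Only `response_time` passes with an empty problem list, which matches the expected answer.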
4. Fill in the blank
hard

Fill both blanks to filter data for anomaly detection to only include status code 500 errors.

Elasticsearch
PUT _ml/anomaly_detectors/error_500_anomaly
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [{ "function": "count", "by_field_name": "[1]" }]
  },
  "data_description": { "time_field": "timestamp" },
  "datafeed_config": {
    "indices": ["<index_name>"],
    "query": { "term": { "[2]": 500 } }
  }
}
Options:
A. status_code
B. response_time
D. user_id
Common Mistakes
Using different fields for grouping and filtering.
Filtering on a field that is not numeric or relevant.
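In Elasticsearch, a condition like the status-code filter above is typically expressed in the datafeed's `query`, because the datafeed decides which documents reach the job. The following Python sketch builds such a datafeed body; the helper name `build_datafeed_body` and the index pattern `web-logs-*` are assumptions for illustration, not values from the exercise.

```python
import json


def build_datafeed_body(job_id: str, index_pattern: str, field: str, value) -> dict:
    """Build a datafeed body whose query keeps only documents where field == value."""
    return {
        "job_id": job_id,
        "indices": [index_pattern],
        "query": {"term": {field: value}},
    }


# "web-logs-*" is a hypothetical index pattern; substitute your own.
body = build_datafeed_body("error_500_anomaly", "web-logs-*", "status_code", 500)

if __name__ == "__main__":
    # The printed JSON could be sent as the body of
    # PUT _ml/datafeeds/datafeed-error_500_anomaly
    print(json.dumps(body, indent=2))
```

Keeping the filter in the datafeed means the detector only ever counts documents that already match the 500 status code.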
5. Fill in the blank
hard

Fill all three blanks to create a job that detects high average CPU usage by host.

Elasticsearch
PUT _ml/anomaly_detectors/cpu_usage_anomaly
{
  "description": "Detect high CPU usage",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{ "function": "mean", "field_name": "[1]", "by_field_name": "[2]" }]
  },
  "data_description": { "time_field": "[3]" }
}
Options:
A. cpu_percent
B. host.name
C. timestamp
D. memory_usage
Common Mistakes
Using memory usage instead of CPU percent for CPU anomaly detection.
Using incorrect time field name.

Practice

1. What is the main purpose of machine learning anomaly detection in Elasticsearch?
easy
A. To automatically find unusual patterns in data
B. To store large amounts of data efficiently
C. To create visual dashboards for data
D. To backup Elasticsearch clusters

Solution

  1. Step 1: Understand anomaly detection goal

    Machine learning anomaly detection is designed to find unusual or unexpected patterns in data automatically.
  2. Step 2: Compare options with purpose

    Options B, C, and D describe other Elasticsearch features, not anomaly detection.
  3. Final Answer:

    To automatically find unusual patterns in data -> Option A
  4. Quick Check:

    Purpose of anomaly detection = find unusual patterns [OK]
Hint: Anomaly detection finds unusual data automatically [OK]
Common Mistakes:
  • Confusing anomaly detection with data storage
  • Thinking anomaly detection creates dashboards
  • Mixing anomaly detection with backup tasks
2. Which Elasticsearch API call starts the anomaly detection process by feeding data to the job?
easy
A. POST _ml/datafeeds/<feed_id>/_start
B. GET _ml/anomaly_detectors/<job_id>/results/buckets
C. PUT _ml/anomaly_detectors/<job_id>
D. DELETE _ml/anomaly_detectors/<job_id>

Solution

  1. Step 1: Identify datafeed start API

    The API that starts feeding data to an anomaly detection job is POST _ml/datafeeds/<feed_id>/_start, where <feed_id> is the datafeed linked to the job.
  2. Step 2: Eliminate other options

    GET retrieves results, PUT creates the job, and DELETE removes it.
  3. Final Answer:

    POST _ml/datafeeds/<feed_id>/_start -> Option A
  4. Quick Check:

    Start datafeed = POST _ml/datafeeds/<feed_id>/_start [OK]
Hint: Start a datafeed with POST on its _start endpoint [OK]
Common Mistakes:
  • Using GET instead of POST to start datafeed
  • Confusing job creation with starting datafeed
  • Deleting job instead of starting datafeed
3. Given this Elasticsearch ML job result snippet:
{"job_id":"sales_anomaly","results":[{"timestamp":1680000000000,"anomaly_score":75},{"timestamp":1680003600000,"anomaly_score":5}]}

Which timestamp shows a likely anomaly?
medium
A. Neither timestamp
B. 1680003600000
C. Both timestamps
D. 1680000000000

Solution

  1. Step 1: Understand anomaly score meaning

    Anomaly scores range from 0 to 100, and higher scores indicate more unusual data points. A score of 75 is high; a score of 5 is low.
  2. Step 2: Identify timestamp with high score

    The timestamp 1680000000000 has anomaly_score 75, indicating a likely anomaly.
  3. Final Answer:

    1680000000000 -> Option D
  4. Quick Check:

    High anomaly score = likely anomaly [OK]
Hint: Higher anomaly_score means more likely anomaly [OK]
Common Mistakes:
  • Choosing low anomaly score as anomaly
  • Selecting both timestamps without checking scores
  • Ignoring anomaly_score values
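The reasoning in this solution can be sketched in Python by filtering the snippet from the question on its `anomaly_score`. The threshold of 50 and the helper names are assumptions for illustration; choose a cut-off that suits your data.

```python
import json
from datetime import datetime, timezone

# Result snippet copied from the question above.
RESULT_JSON = (
    '{"job_id":"sales_anomaly","results":['
    '{"timestamp":1680000000000,"anomaly_score":75},'
    '{"timestamp":1680003600000,"anomaly_score":5}]}'
)

# Assumed cut-off between "likely anomaly" and "normal".
SCORE_THRESHOLD = 50


def likely_anomalies(payload: dict, threshold: float = SCORE_THRESHOLD) -> list:
    """Return the timestamps whose anomaly_score is at or above threshold."""
    return [r["timestamp"] for r in payload["results"]
            if r["anomaly_score"] >= threshold]


def to_utc_iso(epoch_ms: int) -> str:
    """Convert epoch milliseconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


flagged = likely_anomalies(json.loads(RESULT_JSON))

if __name__ == "__main__":
    for ts in flagged:
        print(ts, to_utc_iso(ts))
```

Only the first timestamp passes the threshold, which agrees with Option D.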
4. You created an anomaly detection job but see no results after starting the datafeed. What is a likely cause?
medium
A. The job was deleted before starting
B. The Elasticsearch cluster is offline
C. The datafeed is not running or has stopped
D. The anomaly scores are all zero

Solution

  1. Step 1: Check datafeed status

    If no results appear, the datafeed may never have started or may have stopped, so no data is reaching the job.
  2. Step 2: Evaluate other options

    Job deletion would prevent starting datafeed; cluster offline causes broader failures; zero scores still produce results.
  3. Final Answer:

    The datafeed is not running or has stopped -> Option C
  4. Quick Check:

    No results usually mean datafeed stopped [OK]
Hint: No results? Check if datafeed is running [OK]
Common Mistakes:
  • Assuming zero scores mean no results
  • Ignoring datafeed status
  • Blaming cluster offline without checking datafeed
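Checking the datafeed state is usually done with GET _ml/datafeeds/<feed_id>/_stats, which reports a `state` for each datafeed. The Python sketch below interprets such a response; the sample dictionary is a trimmed, illustrative stand-in rather than output captured from a real cluster, and the helper names are hypothetical.

```python
def datafeed_states(stats_response: dict) -> dict:
    """Map each datafeed_id to the state reported in the stats response."""
    return {
        feed["datafeed_id"]: feed["state"]
        for feed in stats_response.get("datafeeds", [])
    }


def feeds_not_started(stats_response: dict) -> list:
    """Return the IDs of datafeeds whose state is anything other than 'started'."""
    return [
        feed_id
        for feed_id, state in datafeed_states(stats_response).items()
        if state != "started"
    ]


# Illustrative, trimmed response; a real one carries more fields.
sample = {
    "count": 1,
    "datafeeds": [
        {"datafeed_id": "datafeed-web_traffic_anomaly", "state": "stopped"}
    ],
}

if __name__ == "__main__":
    print(feeds_not_started(sample))
```

A non-empty list here points at the datafeed rather than the job or the cluster as the first thing to investigate.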
5. You want to detect unusual spikes in website traffic using Elasticsearch ML anomaly detection. Which steps should you follow to set this up correctly?
hard
A. Backup traffic data, create index pattern, then visualize spikes
B. Create a job with traffic data, start datafeed, then analyze anomaly results
C. Create a dashboard, upload traffic logs, then run anomaly detection manually
D. Delete old data, create job without datafeed, then check results

Solution

  1. Step 1: Create ML job with traffic data

    Define an anomaly detection job using the website traffic data to analyze patterns.
  2. Step 2: Start the datafeed to feed data into the job

    Start the datafeed so the job can process incoming traffic data continuously.
  3. Step 3: Analyze the anomaly detection results

    Review the results to identify unusual spikes or anomalies in traffic.
  4. Final Answer:

    Create a job with traffic data, start datafeed, then analyze anomaly results -> Option B
  5. Quick Check:

    Job + datafeed + analyze = correct setup [OK]
Hint: Job creation + datafeed start + result check = setup [OK]
Common Mistakes:
  • Skipping datafeed start step
  • Confusing dashboards with anomaly detection setup
  • Deleting data before analysis
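The job, datafeed, and results steps can be laid out as an ordered list of requests. This Python sketch only builds the method and path pairs and sends nothing; the `datafeed-` prefix follows the naming used in the fill-in exercise earlier on this page, and the explicit datafeed-creation and job-open steps are included on the assumption that the datafeed is created separately from the job.

```python
from typing import List, Tuple


def setup_requests(job_id: str) -> List[Tuple[str, str]]:
    """Return the (method, path) pairs for a typical setup, in order."""
    feed_id = f"datafeed-{job_id}"  # naming convention assumed from the exercises
    return [
        ("PUT", f"_ml/anomaly_detectors/{job_id}"),          # create the job
        ("PUT", f"_ml/datafeeds/{feed_id}"),                 # create its datafeed
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open"),   # open the job
        ("POST", f"_ml/datafeeds/{feed_id}/_start"),         # start feeding data
        ("GET", f"_ml/anomaly_detectors/{job_id}/results/buckets"),  # read results
    ]


steps = setup_requests("web_traffic_anomaly")

if __name__ == "__main__":
    for method, path in steps:
        print(method, path)
```

Skipping the fourth request is the "skipping datafeed start step" mistake listed above: the job exists but never receives data.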