
Character filters in Elasticsearch

Introduction

Character filters preprocess the raw text of a field before the tokenizer splits it into tokens. They let you add, remove, or change characters early in the analysis process.

When to use
You want to remove HTML tags from text before searching.
You need to replace special characters like & or @ with words.
You want to fix common typos or symbols in user input before indexing.
You want to normalize text by changing accented letters to plain ones.
You want to remove invisible or control characters that cause errors.
Syntax
Elasticsearch
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": ["& => and", "@ => at"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["my_char_filter"],
          "tokenizer": "standard"
        }
      }
    }
  }
}

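Character filters run before the tokenizer. An analyzer can have zero or more character filters, and they are applied in the order they are listed.

Elasticsearch has three built-in character filter types: html_strip, mapping, and pattern_replace.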
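To check what the analyzer produces, you can call the _analyze API against an index that uses these settings. This sketch assumes a placeholder index named my_index that was created by sending the settings above in a PUT /my_index request. The sample text is only an illustration.

Elasticsearch
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Tom & Jerry @ home"
}

With these mappings the text becomes "Tom and Jerry at home". The standard tokenizer should then return the tokens Tom, and, Jerry, at, and home. The capital letters remain because this analyzer has no lowercase token filter.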

Examples
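This removes HTML tags from text and decodes HTML entities such as &amp;. The snippets in this section show only the char_filter object, which belongs inside settings.analysis as in the Syntax example. Because html_strip is built in, an analyzer can also list it directly by name. Defining your own copy is only needed to set options such as escaped_tags.
Elasticsearch
{
  "char_filter": {
    "my_html_strip": {
      "type": "html_strip"
    }
  }
}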
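You can try the built-in html_strip filter without creating an index by passing it to the _analyze API. The sample text is only an illustration.

Elasticsearch
POST /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "text": "<p>Fish &amp; <b>chips</b></p>"
}

The tags are stripped and &amp; is decoded to &. The standard tokenizer then drops & as punctuation, so the expected tokens are Fish and chips.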
This replaces the & character with the word 'and'.
Elasticsearch
{
  "char_filter": {
    "replace_amp": {
      "type": "mapping",
      "mappings": ["& => and"]
    }
  }
}
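The _analyze API also accepts an inline character filter definition. This is handy for testing a mapping before adding it to index settings. A minimal sketch with placeholder text:

Elasticsearch
POST /_analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["& => and"]
    }
  ],
  "text": "salt & pepper, salt&pepper"
}

"salt & pepper" should produce the tokens salt, and, and pepper. "salt&pepper" becomes the single token saltandpepper. The replacement is inserted exactly where the & was, and the spaces after => are not part of the replacement value.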
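This replaces control characters with a space. Java's \p{Cntrl} class also matches tabs and newlines. Using a space instead of an empty string keeps the words on either side of a tab or newline from being merged.
Elasticsearch
{
  "char_filter": {
    "remove_control": {
      "type": "pattern_replace",
      "pattern": "\\p{Cntrl}",
      "replacement": " "
    }
  }
}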
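To see the effect on a multi-line value, the request below defines a similar filter inline, with a single space as the replacement. It pairs the filter with the keyword tokenizer, which keeps the whole input as one token. The text is a placeholder.

Elasticsearch
POST /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\p{Cntrl}",
      "replacement": " "
    }
  ],
  "text": "line one\nline two"
}

This should return the single token "line one line two". With an empty replacement, the newline would simply disappear and the result would be "line oneline two".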
Sample Program

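This example replaces the & and @ symbols with the words "and" and "at" before tokenizing. Each replacement is inserted exactly where the symbol was. It therefore becomes a separate word only when the symbol already had spaces around it.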

Elasticsearch
{
  "settings": {
    "analysis": {
      "char_filter": {
        "replace_symbols": {
          "type": "mapping",
          "mappings": ["& => and", "@ => at"]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "char_filter": ["replace_symbols"],
          "tokenizer": "standard"
        }
      }
    }
  }
}


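# Example text to analyze: "Email me at user@example.com & stay safe!"

# After the character filter, the text becomes: "Email me at useratexample.com and stay safe!"
# The replacement goes exactly where the symbol was; the spaces after => are not part of it.

# Tokens produced: ["Email", "me", "at", "useratexample.com", "and", "stay", "safe"]
# The standard tokenizer keeps useratexample.com together, and no lowercase filter is configured.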
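The mapping filter inserts "at" directly between "user" and "example.com", so the email address stays glued together as useratexample.com. To make @ a separate word, one option is a pattern_replace filter, whose replacement string keeps its spaces. In this sketch the index, filter, and analyzer names are placeholders. The lowercase token filter is an addition that the example above does not use.

Elasticsearch
PUT /symbols_demo
{
  "settings": {
    "analysis": {
      "char_filter": {
        "amp_to_and": {
          "type": "mapping",
          "mappings": ["& => and"]
        },
        "at_to_word": {
          "type": "pattern_replace",
          "pattern": "@",
          "replacement": " at "
        }
      },
      "analyzer": {
        "symbols_analyzer": {
          "type": "custom",
          "char_filter": ["amp_to_and", "at_to_word"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

POST /symbols_demo/_analyze
{
  "analyzer": "symbols_analyzer",
  "text": "Email me at user@example.com & stay safe!"
}

The text should become "Email me at user at example.com and stay safe!". That gives the tokens email, me, at, user, at, example.com, and, stay, and safe. The standard tokenizer keeps example.com together because a period between letters does not split a word.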
Important Notes

Character filters only change the raw text before tokenizing.

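They cannot see or remove tokens; that is the job of token filters. They can only add, remove, or change characters, which in turn changes how the tokenizer splits the text.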

Use them to fix or clean text early in the analysis process.
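Character filters run before the tokenizer, and token filters run after it. You can inspect this order with the explain option of the _analyze API, which reports each stage's output separately. A minimal sketch with placeholder text:

Elasticsearch
POST /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "<b>Hello</b> World",
  "explain": true
}

The response should list the character filter output (the text with the tags removed) separately from the tokenizer and token filter output. The final tokens should be hello and world. Only the lowercase token filter works on tokens; html_strip only ever sees the raw characters.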

Summary

Character filters modify text before tokenizing.

They help clean or replace unwanted characters.

Common types include mapping and pattern_replace.