Information extraction patterns help computers find useful facts in text. They make it easier to pick out names, dates, or places automatically.
Information extraction patterns in NLP
Introduction
You want to find all names of people mentioned in news articles.
You need to extract dates and times from emails to schedule meetings.
You want to pull out product names and prices from online reviews.
You want to identify locations mentioned in travel blogs.
You want to organize large documents by extracting key facts like company names or events.
Syntax
pattern = [{'LOWER': 'name'}, {'IS_PUNCT': True, 'OP': '?'}, {'ENT_TYPE': 'PERSON'}]
Patterns are lists of dictionaries describing word features.
Each dictionary can check word text, punctuation, or entity types.
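The syntax above can be tried out with spaCy's Matcher. This is a minimal sketch, not a definitive setup: it assumes a blank English pipeline (tokenizer only), and the PERSON label is attached by hand purely for illustration, where a trained model would normally supply it. The rule name NAME_FIELD and the sample text are made up for this example.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

# Blank English pipeline: tokenizer only, no trained components needed.
nlp = spacy.blank("en")

matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": "name"},              # the word "name" in any casing
    {"IS_PUNCT": True, "OP": "?"},  # optional punctuation such as ":"
    {"ENT_TYPE": "PERSON"},         # one token labeled as a PERSON entity
]
matcher.add("NAME_FIELD", [pattern])  # hypothetical rule name

doc = nlp("Name: Alice")
# Assumption for illustration: label "Alice" by hand instead of running NER.
doc.ents = [Span(doc, 2, 3, label="PERSON")]

# Each match is a (match_id, start, end) triple of token indices.
name_matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(name_matches)
```

Each dictionary in the list is checked against exactly one token, so the three dictionaries here cover at most three tokens.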
Examples
This pattern finds phrases like 'born 1990 in Paris'.
pattern = [{'LOWER': 'born'}, {'IS_DIGIT': True}, {'LOWER': 'in'}, {'ENT_TYPE': 'GPE'}]
This finds organization names followed by the word 'headquarters'.
pattern = [{'ENT_TYPE': 'ORG'}, {'LOWER': 'headquarters'}]
This matches one or more words followed by 'street', useful for addresses.
pattern = [{'IS_ALPHA': True, 'OP': '+'}, {'LOWER': 'street'}]
Sample Model
This program uses a pattern to find phrases like 'born 1879 in Ulm' in text. It prints each matched phrase.
import spacy
from spacy.matcher import Matcher

# Load small English model
nlp = spacy.load('en_core_web_sm')

# Create matcher object
matcher = Matcher(nlp.vocab)

# Define pattern to find 'born' followed by a year and a place
pattern = [
    {'LOWER': 'born'},
    {'IS_DIGIT': True},
    {'LOWER': 'in'},
    {'ENT_TYPE': 'GPE'}
]
matcher.add('BORN_PATTERN', [pattern])

text = "Albert Einstein was born 1879 in Ulm. Marie Curie was born 1867 in Warsaw."

# Process text
doc = nlp(text)

# Find matches
matches = matcher(doc)

# Print matched spans
for match_id, start, end in matches:
    span = doc[start:end]
    print(f"Matched phrase: '{span.text}'")
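The sample program depends on the trained model labeling 'Ulm' and 'Warsaw' as GPE, which a statistical model may not always do. The following sketch is one possible way to test the same pattern deterministically. It assumes spaCy v3's built-in entity_ruler component and a blank English pipeline, and the two place entries are chosen only to fit this text.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline plus rule-based entity labels (stand-in for trained NER).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "Ulm"},     # illustrative entries only
    {"label": "GPE", "pattern": "Warsaw"},
])

matcher = Matcher(nlp.vocab)
matcher.add("BORN_PATTERN", [[
    {"LOWER": "born"},
    {"IS_DIGIT": True},
    {"LOWER": "in"},
    {"ENT_TYPE": "GPE"},
]])

doc = nlp("Albert Einstein was born 1879 in Ulm. Marie Curie was born 1867 in Warsaw.")

# Collect the text of every matched span.
born_phrases = [doc[start:end].text for _, start, end in matcher(doc)]
for phrase in born_phrases:
    print(f"Matched phrase: '{phrase}'")
```

Because 'ENT_TYPE' is checked one token at a time, this pattern covers only a single-token place name; a multi-token place would need an extra token dictionary or an operator.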
Important Notes
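Matching on 'TEXT' is case sensitive; use 'LOWER' to compare against the lowercase form of each token instead.
Use 'OP' to specify how many times a pattern part can repeat: '?' makes it optional, '+' requires one or more, and '*' allows zero or more.
Conditions on 'ENT_TYPE' depend on the pipeline's entity recognizer, so patterns work best with a language model that reliably labels entities like people or places.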
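The first two notes can be checked with a short experiment. This is a sketch under stated assumptions: a blank English pipeline, a hypothetical helper named match_texts, and made-up sample sentences. The greedy="LONGEST" argument to Matcher.add is used to keep only the longest of several overlapping matches.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; these patterns use no entity labels

def match_texts(pattern, text, greedy=None):
    """Hypothetical helper: run one pattern, return matched texts in document order."""
    matcher = Matcher(nlp.vocab)
    matcher.add("DEMO", [pattern], greedy=greedy)
    doc = nlp(text)
    hits = sorted(matcher(doc), key=lambda m: (m[1], m[2]))
    return [doc[start:end].text for _, start, end in hits]

# 'TEXT' compares the exact token text; 'LOWER' compares its lowercase form.
exact = match_texts([{"TEXT": "born"}], "Born in Ulm")
folded = match_texts([{"LOWER": "born"}], "Born in Ulm")

# '+' means one or more tokens, so shorter and longer candidates both match.
street = [{"IS_ALPHA": True, "OP": "+"}, {"LOWER": "street"}]
all_spans = match_texts(street, "at 12 Old Mill Street")
longest = match_texts(street, "at 12 Old Mill Street", greedy="LONGEST")

print("TEXT:", exact)
print("LOWER:", folded)
print("all '+' matches:", all_spans)
print("longest only:", longest)
```

With '+', the matcher reports every span that satisfies the pattern, so an address pattern usually needs a filter such as greedy="LONGEST" to get one result per address.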
Summary
Information extraction patterns help find specific facts in text automatically.
They use lists of word features to describe what to look for.
Patterns can find names, dates, places, and more by matching text and entity types.