Overview - cut (extract columns)

What is it?

The 'cut' command in Linux extracts specific sections from each line of text input. It is commonly used to select columns or fields from text files or command outputs. You specify which parts to extract by character position, byte position, or delimiter-separated fields. This helps you quickly get just the data you need from larger text streams.

Why it matters

Without 'cut', extracting specific columns from text files or command outputs would require more complex tools or manual editing. This command saves time and effort by letting you slice data easily, which is essential for processing logs, CSV files, or any structured text. It makes automation and scripting more efficient and less error-prone.

Where it fits

Before learning 'cut', you should understand basic Linux command line usage and text file formats. After mastering 'cut', you can explore more powerful text processing tools like 'awk' and 'sed' for advanced data manipulation.

Mental Model

Core Idea

Cut slices out specific columns or fields from lines of text, like using scissors to cut out parts of a paper strip.

Think of it like...

Imagine a paper strip with several colored sections side by side. Using scissors, you cut out only the red and blue sections to keep, ignoring the rest. 'cut' works the same way on text lines, selecting only the parts you want.

Input line:  | A | B | C | D | E |
cut command: |---|   |---|   |   |
Output line: | A |   | C |   |   |

Build-Up - 7 Steps

1

Foundation: Basic character extraction

Concept: Extract specific characters by position from each line.

The simplest use of 'cut' is to select characters by their position. For example, 'cut -c 1-5' extracts characters 1 through 5 of each line:

  $ echo "Hello World" | cut -c 1-5

Result

Hello

Understanding character-based extraction helps you grasp how 'cut' slices text precisely by position.

2

Foundation: Field extraction with delimiters
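A minimal sketch of field extraction, using made-up comma-separated sample data (name, age, city). '-d' sets the single-character delimiter and '-f' picks fields by number. When '-d' is omitted, 'cut' splits on the TAB character.

```shell
# Hypothetical sample data: name,age,city
printf 'alice,30,paris\nbob,25,london\n' | cut -d ',' -f 2
# 30
# 25

# With no -d option, cut uses TAB as the field delimiter
printf 'alice\t30\tparis\n' | cut -f 3
# paris
```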

3

Intermediate: Combining multiple fields
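A short illustration of selecting several fields with a comma-separated list. The input is a single sample line in /etc/passwd format, written inline rather than read from the system. Note that 'cut' always emits fields in their input order, whatever order the list gives.

```shell
# One /etc/passwd-style line (sample data, not read from the system)
line='root:x:0:0:root:/root:/bin/bash'

# Fields 1 and 7: user name and login shell
printf '%s\n' "$line" | cut -d ':' -f 1,7
# root:/bin/bash

# Listing the fields in reverse order does not reorder the output
printf '%s\n' "$line" | cut -d ':' -f 7,1
# root:/bin/bash
```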

4

Intermediate: Using ranges and open-ended fields
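A compact sketch of the three range forms on a made-up five-field line: 'N-M' is a closed range, 'N-' runs to the end of the line, and '-M' starts at the beginning. The same range syntax applies to '-c' and '-b'.

```shell
line='a,b,c,d,e'

printf '%s\n' "$line" | cut -d ',' -f 2-4   # b,c,d   (fields 2 through 4)
printf '%s\n' "$line" | cut -d ',' -f 3-    # c,d,e   (field 3 to end of line)
printf '%s\n' "$line" | cut -d ',' -f -2    # a,b     (start of line through field 2)
echo "Hello World" | cut -c 7-              # World   (open-ended character range)
```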

5

Intermediate: Byte vs character mode
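A sketch of the byte/character distinction, assuming UTF-8 text. The word 'héllo' is built from octal escapes so the example does not depend on how this file is encoded: 'é' is the two bytes 303 251. The '-b' results are the same everywhere; the '-c' result depends on whether your 'cut' implementation counts multi-byte characters, so no output is claimed for it.

```shell
# 'h', the two UTF-8 bytes of 'é' (octal 303 251), then 'llo'
word=$(printf 'h\303\251llo')

# -b counts bytes: bytes 1-2 are 'h' plus only the first byte of 'é',
# so the output ends in a truncated, invalid UTF-8 sequence
printf '%s\n' "$word" | cut -b 1-2

# Bytes 1-3 keep 'é' whole
printf '%s\n' "$word" | cut -b 1-3   # hé

# -c is meant to count characters: a multibyte-aware cut in a UTF-8 locale
# prints 'hé' here, while implementations that treat -c like -b do not
printf '%s\n' "$word" | cut -c 1-2
```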

6

Advanced: Handling missing delimiters gracefully
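A minimal sketch of the two cases worth knowing, using made-up input: a line with no delimiter at all is passed through whole unless '-s' is given, and a line that has the delimiter but too few fields produces an empty output line.

```shell
# Sample input: a normal line, a line with no comma, and a line with only two fields
input='a,b,c
no delimiter here
x,y'

# Default: the comma-less line is printed whole; 'x,y' has no field 3,
# so cut prints an empty line for it
printf '%s\n' "$input" | cut -d ',' -f 3
# c
# no delimiter here
# (empty line)

# -s suppresses lines that contain no delimiter
printf '%s\n' "$input" | cut -s -d ',' -f 3
# c
# (empty line)
```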

7

Expert: Limitations and alternatives to 'cut'
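Two limitations sketched with made-up input. First, every delimiter character separates a field, so runs of spaces produce empty fields; squeezing them with 'tr -s' or switching to 'awk' (which splits on runs of whitespace by default) avoids this. Second, 'cut' has no notion of CSV quoting, so a quoted comma is treated as a delimiter.

```shell
# 1. Runs of spaces: fields are "a", "", "", "b", "", "c"
printf 'a   b  c\n' | cut -d ' ' -f 2                # prints an empty line
printf 'a   b  c\n' | tr -s ' ' | cut -d ' ' -f 2    # b  (squeeze spaces first)
printf 'a   b  c\n' | awk '{ print $2 }'             # b  (awk default splitting)

# 2. Quoted CSV: the comma inside the quotes still splits the field
printf '"Smith, John",42\n' | cut -d ',' -f 2        # prints  John"  rather than 42
```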

Under the Hood

'cut' reads input line by line, then slices each line based on the specified mode: character positions, byte positions, or delimiter-separated fields. It uses simple parsing logic without loading the entire file into memory, making it fast and memory-efficient. For delimiter mode, it splits the line into fields by the delimiter and outputs the selected fields joined by the delimiter.
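One way to observe the streaming behavior described above: 'yes' (available on Linux and BSD systems, though not required by POSIX) repeats its argument forever, so the first pipeline below can only finish because 'cut' emits output as lines arrive instead of waiting for the end of input. The second command shows selected fields rejoined with the input delimiter; the '--output-delimiter' option is a GNU coreutils extension.

```shell
# Terminates after 3 lines: cut processes the endless stream incrementally
yes 'a,b,c' | cut -d ',' -f 2 | head -n 3
# b
# b
# b

# Selected fields are joined with the input delimiter
printf 'a,b,c\n' | cut -d ',' -f 1,3
# a,c

# GNU cut only: substitute a different output separator
printf 'a,b,c\n' | cut -d ',' -f 1,3 --output-delimiter=' '
```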

Why designed this way?

'cut' was designed as a lightweight, fast tool for simple column extraction in Unix pipelines. Its simplicity allows it to be used in scripts and chained commands without overhead. More complex parsing was left to other tools like 'awk' to keep 'cut' focused and efficient.

Input line
  │
  ▼
[cut reads line]
  │
  ├─ If -c or -b: slice characters or bytes by position
  │
  └─ If -d and -f: split line by delimiter into fields
       │
       ▼
  Select specified fields
       │
       ▼
  Output selected parts

Myth Busters - 4 Common Misconceptions

Quick: Does 'cut -c' always work correctly with all text characters? Commit to yes or no.

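Common Belief: Many believe 'cut -c' works perfectly on any text, including emojis and accented letters.

Reality: No. Whether '-c' counts characters or bytes depends on the implementation and locale. The GNU coreutils manual has long described '-c' as behaving the same as '-b', so GNU cut may split multi-byte characters such as 'é' or emoji.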

Quick: Does 'cut' remove the delimiter from output when extracting fields? Commit to yes or no.

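Common Belief: Some think 'cut' removes the delimiters between fields in its output.

Reality: No. Selected fields are joined with the input delimiter, so echo "a,b,c" | cut -d ',' -f 1,3 prints a,c. GNU cut can substitute another separator with --output-delimiter; otherwise, pipe the result through 'tr'.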

Quick: Will 'cut' output an error if a line lacks the delimiter? Commit to yes or no.

Common Belief: People often expect 'cut' to report an error or skip lines missing the delimiter.

Reality: No. A line with no delimiter is printed unchanged, in full. Add '-s' to suppress such lines.

Quick: Can 'cut' extract fields based on content or patterns? Commit to yes or no.

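Common Belief: Some believe 'cut' can select fields conditionally based on their content.

Reality: No. 'cut' selects only by position or field number. Filter lines with 'grep' first, or use 'awk' for content-based selection.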

Expert Zone

1

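'cut -b' counts bytes regardless of locale, so it can split a multi-byte UTF-8 character and emit an invalid byte sequence. Use '-c' where your implementation counts characters, or make sure byte ranges fall on character boundaries.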

2

The '-s' option is critical in pipelines: without it, lines that lack the delimiter pass through whole and get mixed in with the extracted fields, which can silently corrupt downstream data.

3

Combining 'cut' with other tools like 'tr' or 'paste' can create powerful text transformations beyond simple extraction.
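A small sketch of such combinations on made-up two-column data: 'paste -s' joins the extracted column into a single line, and 'tr' transforms the characters of the extracted values.

```shell
# Hypothetical sample data: name,age
data='alice,30
bob,25'

# cut extracts column 1; paste -s joins all lines with commas
printf '%s\n' "$data" | cut -d ',' -f 1 | paste -s -d ',' -
# alice,bob

# cut extracts column 1; tr upper-cases the ASCII letters
printf '%s\n' "$data" | cut -d ',' -f 1 | tr 'a-z' 'A-Z'
# ALICE
# BOB
```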

When NOT to use

'cut' is not suitable for parsing complex or irregular data formats, conditional extraction, or when fields contain the delimiter inside quotes. In such cases, use 'awk', 'sed', or specialized CSV parsers.

Production Patterns

In production, 'cut' is often used in shell scripts to quickly extract fixed columns from logs or CSV files before passing data to other tools. It is commonly combined with 'grep' and 'sort' in pipelines for efficient data filtering and reporting.
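A sketch of the grep, cut, and sort pattern on a hypothetical log format (LEVEL,component,message); the log lines are invented for illustration. 'grep' narrows the lines, 'cut' extracts one column, and 'sort' de-duplicates or prepares the data for counting.

```shell
# Hypothetical log lines in the format LEVEL,component,message
log='ERROR,db,timeout
INFO,web,ok
ERROR,web,500
ERROR,db,deadlock'

# Which components logged errors?
printf '%s\n' "$log" | grep '^ERROR,' | cut -d ',' -f 2 | sort -u
# db
# web

# Error count per component, most frequent first
printf '%s\n' "$log" | grep '^ERROR,' | cut -d ',' -f 2 | sort | uniq -c | sort -rn
```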

Connections

awk

'awk' covers the field extraction that 'cut' does and adds pattern matching and conditional logic.

Understanding 'cut' makes learning 'awk' easier because 'awk' can do everything 'cut' does and much more.
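A brief comparison on made-up data: the first 'awk' command does the same job as cut -d ',' -f 1, and the second adds a condition on another field, which 'cut' cannot express.

```shell
# Hypothetical sample data: name,age
data='alice,30
bob,25'

# Equivalent to: cut -d ',' -f 1
printf '%s\n' "$data" | awk -F ',' '{ print $1 }'
# alice
# bob

# Beyond cut: print field 1 only where field 2 is greater than 26
printf '%s\n' "$data" | awk -F ',' '$2 > 26 { print $1 }'
# alice
```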

Spreadsheet column filtering

'cut' is like filtering columns in a spreadsheet but done via command line on text files.

Knowing how 'cut' works helps you automate tasks that you might otherwise do manually in spreadsheet software.

Scissors and paper cutting

Both involve selecting and removing parts from a whole based on position or markers.

This cross-domain connection helps visualize how 'cut' slices text lines into pieces.

Common Pitfalls

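#1 Using 'cut -b' on UTF-8 text with multi-byte characters causes broken output.

Wrong approach: echo "é" | cut -b 1 (prints only the first of the two bytes that encode é, which displays as a broken character)

Correct approach: echo "é" | cut -c 1 (prints é when your 'cut' counts multi-byte characters in a UTF-8 locale; GNU cut may still treat '-c' like '-b')

Root cause: Confusing bytes with characters in multi-byte encodings.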

#2 Not using '-s' when some input lines may lack the delimiter causes those lines to appear in full in the output.

Wrong approach: printf 'a,b,c\nd\n' | cut -d ',' -f 2 (prints b, then d, because the comma-less line passes through unchanged)

Correct approach: printf 'a,b,c\nd\n' | cut -s -d ',' -f 2 (prints only b)

Root cause: Assuming 'cut' skips lines without the delimiter by default.

#3 Expecting 'cut' to remove the delimiters between extracted fields in its output.

Wrong approach: echo "a,b,c" | cut -d ',' -f 1,3 and expecting ac (it prints a,c)

Correct approach: echo "a,b,c" | cut -d ',' -f 1,3 | tr -d ',' (prints ac)

Root cause: Misunderstanding how 'cut' preserves delimiters in output.

Key Takeaways

'cut' is a simple, fast tool to extract specific characters or fields from text lines using positions or delimiters.

It works best on fixed-format data and is limited to slicing, not complex parsing or conditional logic.

Understanding the difference between byte and character modes is crucial for handling multi-byte text correctly.

The '-s' option helps avoid unexpected output when lines lack the delimiter, preventing data errors.

For advanced text processing, tools like 'awk' or 'sed' complement or replace 'cut' in scripts.