
cut (extract columns) in Linux CLI - Deep Dive

Overview - cut (extract columns)
What is it?
The 'cut' command in Linux extracts specific sections from each line of text input. It is commonly used to select columns or fields from text files or command outputs. You specify which parts to extract by character position, byte position, or delimiter-separated fields. This helps you quickly get just the data you need from larger text streams.
Why it matters
Without 'cut', extracting specific columns from text files or command outputs would require more complex tools or manual editing. This command saves time and effort by letting you slice data easily, which is essential for processing logs, CSV files, or any structured text. It makes automation and scripting more efficient and less error-prone.
Where it fits
Before learning 'cut', you should understand basic Linux command line usage and text file formats. After mastering 'cut', you can explore more powerful text processing tools like 'awk' and 'sed' for advanced data manipulation.
Mental Model
Core Idea
Cut slices out specific columns or fields from lines of text, like using scissors to cut out parts of a paper strip.
Think of it like...
Imagine a paper strip with several colored sections side by side. Using scissors, you cut out only the red and blue sections to keep, ignoring the rest. 'cut' works the same way on text lines, selecting only the parts you want.
Input line:  | A | B | C | D | E |
cut command: |---|   |---|   |   |
Output line: | A | C |
Build-Up - 7 Steps
1. Foundation: Basic character extraction
Concept: Extract specific characters by position from each line.
The simplest use of 'cut' is to select characters by their position. For example, 'cut -c 1-5' extracts characters 1 to 5 from each line.
Example:
$ echo "Hello World" | cut -c 1-5
Hello
Result
Hello
Understanding character-based extraction helps you grasp how 'cut' slices text precisely by position.
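A few more character lists, sketched with plain ASCII input so that character and byte positions coincide on every implementation. A list can combine single positions and ranges, and a range may be open at either end.

```shell
# Sample input is plain ASCII, so -c and -b would select the same thing here.
line='Hello World'

# Closed range: positions 1 to 5
printf '%s\n' "$line" | cut -c 1-5      # Hello

# Open-ended ranges: '-5' means 1-5, '7-' means 7 to the end of the line
printf '%s\n' "$line" | cut -c -5       # Hello
printf '%s\n' "$line" | cut -c 7-       # World

# Single positions and ranges mixed with commas
printf '%s\n' "$line" | cut -c 1,7-9    # HWor
```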
2. Foundation: Field extraction with delimiters
Concept: Extract fields separated by a delimiter like a comma or tab.
Most structured text separates fields with a delimiter such as a comma or a tab. Use '-d' to specify the delimiter and '-f' to select fields by number, counting from 1. If you omit '-d', 'cut' splits on tab characters.
Example:
$ echo "name,age,city" | cut -d ',' -f 2
age
Result
age
Knowing how to extract fields by delimiter is key for working with CSV or tabular data.
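Because 'cut' applies the same selection to every line, one command pulls a column out of a whole file. A sketch with made-up CSV data (note that the header row is processed like any other line), plus a tab-separated line cut without '-d':

```shell
# Hypothetical sample data: a header row plus two records
csv='name,age,city
alice,30,paris
bob,25,rome'

# Field 3 of every line, header included
printf '%s\n' "$csv" | cut -d ',' -f 3
# city
# paris
# rome

# Without -d, cut splits on TAB characters
printf 'alice\t30\tparis\n' | cut -f 2    # 30
```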
3. Intermediate: Combining multiple fields
Before reading on: do you think 'cut' can extract multiple non-adjacent fields at once? Commit to yes or no.
Concept: Select multiple fields, including non-adjacent ones, by listing them separated by commas.
You can extract several fields at once by listing them with commas.
Example:
$ echo "a,b,c,d,e" | cut -d ',' -f 1,3,5
a,c,e
Result
a,c,e
Knowing you can pick multiple fields at once makes 'cut' flexible for complex data extraction.
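One behavior that often surprises people: the field list acts like a set. Fields are printed in the order they appear in the input, however you list them, and a repeated field is printed once. A quick sketch:

```shell
# Listing order does not matter: both commands print the same thing
printf 'a,b,c,d,e\n' | cut -d ',' -f 5,1    # a,e
printf 'a,b,c,d,e\n' | cut -d ',' -f 1,5    # a,e

# A field listed twice is still printed once
printf 'a,b,c,d,e\n' | cut -d ',' -f 2,2    # b
```

If you need the fields in a different order, 'awk' can print them in any order you name.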
4. Intermediate: Using ranges and open-ended fields
Before reading on: does 'cut' allow selecting from a field to the end, like 'field 3 to end'? Commit to yes or no.
Concept: You can specify ranges like '3-5' or open-ended ranges like '3-' to select fields from a start point to the end.
Ranges let you select consecutive fields easily.
Example:
$ echo "1,2,3,4,5" | cut -d ',' -f 3-
3,4,5
Result
3,4,5
Ranges simplify extracting blocks of fields without listing each one.
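A sketch of the other range forms on the same sample row: a closed range, a range open at the start, overlapping selections (which are merged), and a range that runs past the last field.

```shell
row='1,2,3,4,5'

printf '%s\n' "$row" | cut -d ',' -f 2-4          # 2,3,4
printf '%s\n' "$row" | cut -d ',' -f -2           # 1,2   (same as 1-2)

# Overlapping pieces are merged, each field printed once
printf '%s\n' "$row" | cut -d ',' -f 1,2-3,3-4    # 1,2,3,4

# A range past the last field simply stops at the end of the line
printf '%s\n' "$row" | cut -d ',' -f 4-9          # 4,5
```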
5. Intermediate: Byte vs character mode
Before reading on: do you think 'cut -b' and 'cut -c' behave the same for all text? Commit to yes or no.
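Concept: 'cut' can extract by bytes (-b) or by characters (-c). The two can differ for multi-byte characters, such as accented letters or emojis in UTF-8.
Byte mode counts raw bytes, so it can split a multi-byte character in half. Character mode is meant to count characters as defined by the current locale (code points, not necessarily what you see as a single symbol). BSD/macOS 'cut', and GNU 'cut' on some distributions that patch it for multi-byte support, honor this in a UTF-8 locale. Upstream GNU coreutils has long documented '-c' as "the same as -b for now", so there '-c' can split characters too.
Example:
$ echo "é" | cut -b 1
(prints only the first byte of 'é', which shows up as a broken character)
$ echo "é" | cut -c 1
é    (with a multi-byte-aware 'cut' in a UTF-8 locale)
Result
Character mode preserves whole characters when your 'cut' supports it; byte mode may break them.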
Understanding this prevents data corruption when working with non-ASCII text.
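Since byte positions are fixed while character handling varies between implementations, a reliable way to see the difference is to inspect raw bytes. This sketch writes 'é' as its two UTF-8 bytes (octal 303 251), so the result does not depend on how the script file itself is encoded, and uses 'od' to show what 'cut' actually emits.

```shell
# 'é' in UTF-8 is two bytes: 0xC3 0xA9 (octal 303 251)
word=$(printf '\303\251t\303\251')   # "été": 5 bytes, 3 characters

# -b 1 keeps only the first byte: half of 'é', invalid UTF-8 on its own
printf '%s\n' "$word" | cut -b 1 | od -An -tx1     # bytes c3 0a

# -b 1-2 keeps both bytes of the first character
printf '%s\n' "$word" | cut -b 1-2                 # é

# Whether -c 1 gives both bytes or just one depends on your cut and locale;
# compare its bytes with the -b output above
printf '%s\n' "$word" | cut -c 1 | od -An -tx1
```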
6. Advanced: Handling missing delimiters gracefully
Before reading on: when a line lacks the delimiter, does 'cut' skip it? Commit to yes or no.
Concept: 'cut' outputs the whole line if the delimiter is missing unless you use the '-s' option to suppress such lines.
By default, lines without the delimiter are printed whole, which can cause unexpected output.
Example:
$ printf 'a,b,c\nd\n' | cut -d ',' -f 2
b
d
$ printf 'a,b,c\nd\n' | cut -s -d ',' -f 2
b
Result
With '-s', lines that lack the delimiter are dropped instead of being mixed into the output.
Knowing this helps avoid bugs when processing inconsistent data.
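The same trap shows up in config-style files, where comment lines or bare flags contain no '='. A sketch with hypothetical input:

```shell
# Hypothetical config-style input: two lines have no '=' at all
conf='# settings
host=example.org
port=8080
debug'

# Without -s, lines lacking '=' pass through whole and pollute the values
printf '%s\n' "$conf" | cut -d '=' -f 2
# # settings
# example.org
# 8080
# debug

# With -s, only lines containing '=' produce output
printf '%s\n' "$conf" | cut -s -d '=' -f 2
# example.org
# 8080
```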
7. Expert: Limitations and alternatives to 'cut'
Before reading on: can 'cut' handle complex patterns or conditional extraction? Commit to yes or no.
Concept: 'cut' is simple and fast but limited; it cannot handle complex patterns or conditional logic. Tools like 'awk' or 'sed' are better for advanced tasks.
'cut' only slices by fixed positions or by a single delimiter character. It cannot select fields based on their content, split on more than one delimiter, or treat a run of spaces as one separator (every space counts, so repeated spaces produce empty fields). Extracting the last field when lines have different numbers of fields is also awkward with 'cut'.
Alternative:
$ echo "a,b,c" | awk -F',' '{print $NF}'
c
Result
'cut' is best for simple, fixed-format data; use other tools for complex parsing.
Recognizing 'cut's limits guides you to the right tool for the job, improving script reliability.
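Two common workarounds, sketched with made-up input: awk's '$NF' for the last field, and squeezing repeated spaces with 'tr -s' before 'cut' sees them. The second trick still breaks if a line starts with spaces (field 1 becomes empty), whereas awk's default field splitting ignores leading blanks.

```shell
# Last field when rows have different lengths: awk's $NF
printf 'a,b,c\nd,e\n' | awk -F ',' '{print $NF}'
# c
# e

# Columns separated by runs of spaces: every space is a delimiter to cut
printf 'alice   30  paris\n' | cut -d ' ' -f 2              # empty line
# Squeeze each run of spaces to one space first
printf 'alice   30  paris\n' | tr -s ' ' | cut -d ' ' -f 2  # 30
```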
Under the Hood
'cut' reads input line by line, then slices each line based on the specified mode: character positions, byte positions, or delimiter-separated fields. It uses simple parsing logic without loading the entire file into memory, making it fast and memory-efficient. For delimiter mode, it splits the line into fields by the delimiter and outputs the selected fields joined by the delimiter.
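To make the per-line rules concrete, here is a hypothetical pure-shell imitation of 'cut -d DELIM -f N' for a single field. It only illustrates the behavior described above (streaming line by line, passing through lines without the delimiter, printing an empty line when the field does not exist); it is not how 'cut' is implemented, and it is far slower.

```shell
# mini_cut_field DELIM N: hypothetical teaching helper, not part of any tool.
# Reads stdin line by line and prints field N, mimicking 'cut -d DELIM -f N'.
# Uses global variables (delim, n, line, rest, i) for brevity.
mini_cut_field() {
  delim=$1
  n=$2
  # '|| [ -n "$line" ]' also handles a final line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"$delim"*) ;;                            # delimiter present: split
      *) printf '%s\n' "$line"; continue ;;     # no delimiter: whole line
    esac
    rest=$line
    i=1
    # Drop the first N-1 fields, one delimiter at a time
    while [ "$i" -lt "$n" ]; do
      case $rest in
        *"$delim"*) rest=${rest#*"$delim"} ;;
        *) rest=; break ;;                      # fewer than N fields: empty
      esac
      i=$((i + 1))
    done
    # Keep everything up to the next delimiter
    printf '%s\n' "${rest%%"$delim"*}"
  done
}

printf 'a,b,c\nd\n' | mini_cut_field ',' 2
# b
# d
```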
Why designed this way?
'cut' was designed as a lightweight, fast tool for simple column extraction in Unix pipelines. Its simplicity allows it to be used in scripts and chained commands without overhead. More complex parsing was left to other tools like 'awk' to keep 'cut' focused and efficient.
Input line
  │
  ▼
[cut reads line]
  │
  ├─ If -c or -b: slice characters or bytes by position
  │
  └─ If -d and -f: split line by delimiter into fields
       │
       ▼
  Select specified fields
       │
       ▼
  Output selected parts
Myth Busters - 4 Common Misconceptions
Quick: Does 'cut -c' always work correctly with all text characters? Commit to yes or no.
Common Belief: Many believe 'cut -c' works perfectly on any text, including emojis and accented letters.
Reality: Whether 'cut -c' respects multi-byte characters depends on the implementation and the locale. Without a UTF-8 locale, or with a 'cut' that treats '-c' like '-b' (as upstream GNU coreutils has historically done), it can split characters and produce broken output.
Why it matters: This can corrupt data or cause scripts to fail when processing international text.
Quick: Does 'cut' remove the delimiter from output when extracting fields? Commit to yes or no.
Common Belief: Some think 'cut' removes delimiters between fields in output.
Reality: 'cut' preserves the delimiter between extracted fields in output.
Why it matters: Misunderstanding this leads to incorrect assumptions about output format and downstream parsing errors.
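If you do want a different separator, here is a sketch of two options: translate it afterwards with 'tr', which works with any 'cut', or use the '--output-delimiter' option of GNU 'cut' where it is available. The 'tr' approach is safe here because the selected fields cannot themselves contain the delimiter.

```shell
# cut keeps the input delimiter between selected fields
printf 'a,b,c\n' | cut -d ',' -f 1,3                  # a,c

# Portable: change or drop the separator afterwards with tr
printf 'a,b,c\n' | cut -d ',' -f 1,3 | tr ',' ' '     # a c
printf 'a,b,c\n' | cut -d ',' -f 1,3 | tr -d ','      # ac

# GNU cut only (not available in every cut):
#   cut -d ',' -f 1,3 --output-delimiter=' '
```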
Quick: Will 'cut' output an error if a line lacks the delimiter? Commit to yes or no.
Common Belief: People often expect 'cut' to error or skip lines missing the delimiter.
Reality: By default, 'cut' outputs the entire line if the delimiter is missing, unless '-s' is used.
Why it matters: Unexpected output can cause data corruption or logic errors in scripts.
Quick: Can 'cut' extract fields based on content or patterns? Commit to yes or no.
Common Belief: Some believe 'cut' can select fields conditionally based on content.
Reality: 'cut' cannot do conditional extraction; it only slices by fixed positions or delimiters.
Why it matters: Trying to use 'cut' for complex parsing wastes time and leads to fragile scripts.
Expert Zone
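1. 'cut -b' always counts bytes, whatever the locale, so it can split a multi-byte character even when the locale is UTF-8 and the input is valid. POSIX defines '-n' to keep '-b' from splitting characters, but GNU 'cut' accepts '-n' and ignores it.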
2. The '-s' option is critical in pipelines to avoid mixing lines without delimiters, which can silently corrupt data.
3. Combining 'cut' with other tools like 'tr' or 'paste' can create powerful text transformations beyond simple extraction.
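For example, 'cut' always emits fields in input order, but cutting each column separately and joining the results with 'paste' lets you reorder them. A sketch with a hypothetical file in a scratch directory:

```shell
# Hypothetical sample file in a scratch directory
tmp=$(mktemp -d)
printf 'alice,30,paris\nbob,25,rome\n' > "$tmp/people.csv"

# Cut each wanted column into its own file...
cut -d ',' -f 3 "$tmp/people.csv" > "$tmp/cities"
cut -d ',' -f 1 "$tmp/people.csv" > "$tmp/names"

# ...then glue them back together line by line, in the new order
paste -d ',' "$tmp/cities" "$tmp/names"
# paris,alice
# rome,bob

rm -r "$tmp"
```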
When NOT to use
'cut' is not suitable for parsing complex or irregular data formats, conditional extraction, or when fields contain the delimiter inside quotes. In such cases, use 'awk', 'sed', or specialized CSV parsers.
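A small demonstration of the quoted-delimiter problem, using a hypothetical CSV record whose first field contains a comma:

```shell
# Hypothetical record: the first field is quoted and contains a comma
record='"Smith, John",42'

# cut splits on every comma, quoted or not, so field 2 is ' John"', not 42
printf '%s\n' "$record" | cut -d ',' -f 2
```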
Production Patterns
In production, 'cut' is often used in shell scripts to quickly extract fixed columns from logs or CSV files before passing data to other tools. It is commonly combined with 'grep' and 'sort' in pipelines for efficient data filtering and reporting.
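A sketch of such a pipeline, assuming a hypothetical space-separated log format (date, level, user, message): 'grep' keeps the error lines, 'cut' takes the user column, and 'sort -u' lists each user once.

```shell
# Hypothetical log lines: date level user message...
printf '%s\n' \
  '2024-05-01 ERROR alice disk full' \
  '2024-05-01 INFO bob login' \
  '2024-05-02 ERROR carol timeout' \
  '2024-05-02 ERROR alice timeout' |
  grep ' ERROR ' |      # keep error lines only
  cut -d ' ' -f 3 |     # third space-separated field: the user
  sort -u               # each user once, sorted
# alice
# carol
```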
Connections
awk
'awk' covers the same field extraction as 'cut' and adds pattern matching and conditional logic.
Understanding 'cut' makes learning 'awk' easier because 'awk' can do everything 'cut' does and much more.
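A side-by-side sketch on made-up data: the first two commands print the same column, and the last one adds a condition on field content, which 'cut' cannot express.

```shell
# Hypothetical sample data
data='alice,30,paris
bob,25,rome'

# Same output from cut and awk: the second field of each line
printf '%s\n' "$data" | cut -d ',' -f 2
printf '%s\n' "$data" | awk -F ',' '{print $2}'

# awk only: print the name when the age field is greater than 26
printf '%s\n' "$data" | awk -F ',' '$2 > 26 {print $1}'    # alice
```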
Spreadsheet column filtering
'cut' is like filtering columns in a spreadsheet but done via command line on text files.
Knowing how 'cut' works helps you automate tasks that you might otherwise do manually in spreadsheet software.
Scissors and paper cutting
Both involve selecting and removing parts from a whole based on position or markers.
This cross-domain connection helps visualize how 'cut' slices text lines into pieces.
Common Pitfalls
#1 Using 'cut -b' on UTF-8 text with multi-byte characters causes broken output.
Wrong approach: echo "é" | cut -b 1 (prints a broken character)
Correct approach: echo "é" | cut -c 1 (prints é when 'cut' is multi-byte aware and the locale is UTF-8; upstream GNU 'cut' may still print a single byte)
Root cause: Confusing bytes with characters in multi-byte encodings.
#2 Not using '-s' when input lines may lack the delimiter causes unexpected full-line output.
Wrong approach: printf 'a,b,c\nd\n' | cut -d ',' -f 2 (prints b, then d)
Correct approach: printf 'a,b,c\nd\n' | cut -s -d ',' -f 2 (prints only b)
Root cause: Assuming 'cut' skips lines without the delimiter by default.
#3 Expecting 'cut' to remove delimiters between extracted fields in output.
Wrong approach: echo "a,b,c" | cut -d ',' -f 1,3 and expecting 'ac' (it prints 'a,c')
Correct approach: echo "a,b,c" | cut -d ',' -f 1,3 | tr -d ',' (prints 'ac')
Root cause: Misunderstanding how 'cut' preserves delimiters in output.
Key Takeaways
'cut' is a simple, fast tool to extract specific characters or fields from text lines using positions or delimiters.
It works best on fixed-format data and is limited to slicing, not complex parsing or conditional logic.
Understanding the difference between byte and character modes is crucial for handling multi-byte text correctly.
The '-s' option helps avoid unexpected output when lines lack the delimiter, preventing data errors.
For advanced text processing, tools like 'awk' or 'sed' complement or replace 'cut' in scripts.