du (disk usage by directory) in Linux CLI - Deep Dive

Overview - du (disk usage by directory)
What is it?
The 'du' command in Linux shows how much disk space files and directories use. It helps you see which folders or files take up the most space on your storage. You can use it to check disk usage for a single directory or many directories at once. It works by adding up the sizes of all files inside the directories you specify.
Why it matters
Without 'du', it would be hard to find out what is filling up your disk. A full disk can make programs fail to save data, slow the system down, or stop services from working properly. 'du' helps you clean up space by showing where large files or folders are hiding. This saves time and prevents storage problems that can interrupt your work or system tasks.
Where it fits
Before learning 'du', you should know basic Linux commands like 'ls' to list files and 'cd' to change directories. After mastering 'du', you can learn about disk cleanup tools and scripting to automate space management. It fits into the journey of managing files and system resources efficiently on Linux.
Mental Model
Core Idea
'du' measures the total disk space used by files inside directories, helping you find where your storage is going.
Think of it like...
Imagine your disk as a big closet. 'du' is like a flashlight that shines inside each box (directory) to show how much space the stuff inside takes up, so you know which boxes are the heaviest.
Disk
├── Directory A (5MB)
│   ├── File 1 (2MB)
│   └── File 2 (3MB)
└── Directory B (10MB)
    ├── File 3 (6MB)
    └── Subdirectory B1 (4MB)
        └── File 4 (4MB)

'du' sums sizes bottom-up to show total per directory.
Build-Up - 7 Steps
1. Foundation: Basic disk usage with du
Concept: Learn how to run 'du' to see disk usage of a directory.
Run 'du' followed by a directory name to see how much space it uses. For example, 'du /home/user' prints one line for every directory under '/home/user', ending with the total for '/home/user' itself. Each line shows the size first, then the path. By default, GNU 'du' reports sizes in 1024-byte (1 KB) blocks.
Result
4096 /home/user/docs
8192 /home/user/music
12288 /home/user
Understanding the default output helps you see disk usage per folder, which is the core purpose of 'du'.
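A minimal sketch of the default behavior, assuming GNU coreutils 'du' and a throwaway tree created with 'mktemp -d' (the 'docs' and 'music' names and file names are only illustrative). Exact sizes depend on the filesystem, so the point is the shape of the output: one line per directory, size first, with the starting directory last.

```shell
#!/usr/bin/env bash
# Build a small scratch tree so the example does not depend on /home/user.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/docs" "$demo_root/music"
head -c 4096 /dev/zero > "$demo_root/docs/notes.txt"
head -c 8192 /dev/zero > "$demo_root/music/song.raw"

# Default output: "<size in 1 KiB blocks><TAB><path>", one line per directory.
basic_output=$(du "$demo_root")
printf '%s\n' "$basic_output"
```

Subdirectories are printed before their parent because 'du' can only report a directory's total after it has visited everything inside it.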
2. Foundation: Human-readable sizes with -h option
Concept: Use the '-h' flag to make sizes easier to read by showing KB, MB, or GB.
Run 'du -h /home/user' to see sizes like '4.0K', '8.0M', or '1.2G' instead of raw numbers. This makes it easier to understand disk usage without converting units yourself.
Result
4.0M /home/user/docs
8.0M /home/user/music
12M /home/user
Human-readable output saves time and reduces errors when interpreting disk usage.
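As a rough illustration, assuming GNU 'du', the same file can be measured both ways. The file name and its 3 MiB size are arbitrary choices for the sketch, and the exact figures may vary slightly between filesystems.

```shell
#!/usr/bin/env bash
# Write a 3 MiB file of real (non-sparse) data into a scratch directory.
h_root=$(mktemp -d)
trap 'rm -rf "$h_root"' EXIT
head -c 3145728 /dev/zero > "$h_root/blob.bin"

# Same file, two views: raw 1 KiB blocks versus a scaled figure with a suffix.
raw_size=$(du "$h_root/blob.bin" | cut -f1)
human_size=$(du -h "$h_root/blob.bin" | cut -f1)
echo "raw: $raw_size  human: $human_size"
```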
3. Intermediate: Summarize total size with -s
Before reading on: Do you think 'du -s' shows sizes of all subdirectories or just the total for the directory? Commit to your answer.
Concept: The '-s' option shows only the total size of the directory, not each subfolder separately.
Run 'du -sh /home/user' to get a single line with the total size of '/home/user' and all its contents combined.
Result
12M /home/user
Knowing how to get just the total size helps when you want a quick summary without clutter.
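The difference is easy to see by counting output lines. This sketch assumes GNU 'du' and uses a made-up nested tree ('a/b/c') purely for demonstration.

```shell
#!/usr/bin/env bash
# A scratch tree four directories deep (root, a, a/b, a/b/c).
sum_root=$(mktemp -d)
trap 'rm -rf "$sum_root"' EXIT
mkdir -p "$sum_root/a/b/c"
head -c 2048 /dev/zero > "$sum_root/a/b/c/deep.dat"

# Without -s: one line per directory in the tree (4 here).
full_lines=$(du "$sum_root" | wc -l)
# With -s: a single line for the argument itself, nested content included.
summary=$(du -sh "$sum_root")
echo "lines without -s: $full_lines"
echo "with -s: $summary"
```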
4. Intermediate: Limit depth with --max-depth
Before reading on: Does '--max-depth=1' show sizes of all nested folders or only top-level ones? Commit to your answer.
Concept: The '--max-depth' option controls how deep 'du' looks inside directories.
Run 'du -h --max-depth=1 /home/user' to see sizes of '/home/user' and its immediate subdirectories only, not deeper nested folders.
Result
4.0M /home/user/docs
8.0M /home/user/music
12M /home/user
Controlling depth helps focus on important directory levels and avoids overwhelming output.
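A small sketch of the effect, assuming GNU 'du' (the directory names are invented for the example). Note that '--max-depth' only limits which lines are printed; the sizes shown for the top-level folders still include everything nested below them.

```shell
#!/usr/bin/env bash
# Scratch tree with directories nested three levels under docs/.
depth_root=$(mktemp -d)
trap 'rm -rf "$depth_root"' EXIT
mkdir -p "$depth_root/docs/drafts/old" "$depth_root/music"
head -c 4096 /dev/zero > "$depth_root/docs/drafts/old/v1.txt"

# Every directory in the tree: 5 lines here.
all_lines=$(du "$depth_root" | wc -l)
# Only the starting directory and its direct children: 3 lines.
depth1_output=$(du -h --max-depth=1 "$depth_root")
printf '%s\n' "$depth1_output"
echo "full listing: $all_lines lines"
```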
5. Intermediate: Exclude files or directories with --exclude
Concept: You can tell 'du' to ignore certain files or folders using '--exclude'.
Run 'du -h --exclude="*.tmp" /home/user' to skip every file or directory whose name ends in '.tmp' when calculating sizes. Quote the pattern so the shell does not expand it before 'du' sees it. This helps ignore temporary or unwanted files.
Result
Shows disk usage without counting '.tmp' files.
Excluding files refines results to focus on meaningful data and saves time.
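To make the effect visible, this sketch (assuming GNU 'du') adds '-a' so that individual files are listed too. The file names 'keep.log' and 'scratch.tmp' are hypothetical.

```shell
#!/usr/bin/env bash
# Two files of equal size; only one matches the exclusion pattern.
ex_root=$(mktemp -d)
trap 'rm -rf "$ex_root"' EXIT
head -c 1000 /dev/zero > "$ex_root/keep.log"
head -c 1000 /dev/zero > "$ex_root/scratch.tmp"

# -a lists files as well as directories; the quoted pattern reaches du intact.
ex_output=$(du -a --exclude='*.tmp' "$ex_root")
printf '%s\n' "$ex_output"
```

The excluded file is neither listed nor added to the directory total.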
6. Advanced: Using du in scripts for cleanup
Before reading on: Can 'du' output be easily used in scripts to automate disk cleanup? Commit to your answer.
Concept: 'du' output can be parsed in scripts to find large directories and automate actions like deleting or archiving.
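Example script snippet:
large_dirs=$(du -h --max-depth=1 /home/user | sort -hr | head -n 3)
printf 'Top 3 largest directories:\n%s\n' "$large_dirs"
This finds the biggest folders to target for cleanup. The first line is normally '/home/user' itself, because its total includes everything below it. 'printf' is used because 'echo' does not expand '\n' by default in bash.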
Result
Top 3 largest directories:
12M /home/user
8.0M /home/user/music
4.0M /home/user/docs
Using 'du' in automation saves manual effort and prevents disk space issues proactively.
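One possible variant for scripts is sketched below, assuming GNU 'du' and 'find'. The helper name 'largest_subdirs' and the scratch directory names are hypothetical. It measures each immediate subdirectory separately in a fixed unit, so the parent's grand total never appears in the ranking and a plain numeric sort is enough.

```shell
#!/usr/bin/env bash
# largest_subdirs DIR N (hypothetical helper): print the N largest immediate
# subdirectories of DIR as "<KiB><TAB><path>", biggest first.
largest_subdirs() {
    # du -sk prints one KiB total per subdirectory passed to it by find.
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + |
        sort -rn |
        head -n "$2"
}

# Scratch data with clearly different sizes (random bytes, so not compressible).
script_root=$(mktemp -d)
trap 'rm -rf "$script_root"' EXIT
mkdir "$script_root/big" "$script_root/medium" "$script_root/small"
head -c 2097152 /dev/urandom > "$script_root/big/a.bin"
head -c 524288  /dev/urandom > "$script_root/medium/b.bin"
head -c 4096    /dev/urandom > "$script_root/small/c.bin"

top_two=$(largest_subdirs "$script_root" 2)
printf 'Top 2 largest directories:\n%s\n' "$top_two"
```

Anything destructive, such as deleting or archiving the directories found, is best kept as a separate, reviewed step rather than chained directly onto this output.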
7. Expert: Understanding disk blocks and du accuracy
Before reading on: Does 'du' measure exact file sizes or disk blocks used? Commit to your answer.
Concept: 'du' reports disk blocks used, which may differ from file sizes due to filesystem allocation and sparse files.
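'du' counts the disk space actually allocated, not the logical file size. For example, a sparse file with an apparent size of 1GB but only 10MB of written data shows as roughly 10MB in 'du', because the filesystem allocates no blocks for the unwritten holes. The reverse also happens: space is allocated in whole blocks, so a 1-byte file typically occupies a full block (often 4KB) on disk.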
Result
Sparse files show smaller sizes in 'du' than their apparent size.
Knowing this prevents confusion when 'du' output seems smaller than expected and helps diagnose storage issues.
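The gap between logical and allocated size can be reproduced with a file that is all hole. This sketch assumes GNU 'du' and 'truncate', and a filesystem that supports sparse files (most Linux filesystems do); the file name is arbitrary.

```shell
#!/usr/bin/env bash
# truncate sets the length without writing data, so the file is one big hole.
sparse_root=$(mktemp -d)
trap 'rm -rf "$sparse_root"' EXIT
truncate -s 100M "$sparse_root/sparse.img"

# Logical size in bytes (what ls -l would report).
apparent_bytes=$(du --apparent-size --block-size=1 "$sparse_root/sparse.img" | cut -f1)
# Bytes actually allocated on disk (what plain du measures).
allocated_bytes=$(du --block-size=1 "$sparse_root/sparse.img" | cut -f1)
echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_bytes bytes"
```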
Under the Hood
'du' works by walking through the directory tree recursively. For each file, it checks the disk blocks allocated and sums them up for each directory. It uses system calls to read file metadata and counts blocks, not just file sizes. This means it reflects actual disk space used, including filesystem overhead and sparse files.
Why designed this way?
'du' was designed to show real disk usage rather than just file sizes because filesystems allocate space in blocks. Reporting allocated blocks helps users understand true storage consumption. Alternatives like just summing file sizes would mislead users about actual disk space used, especially with sparse or compressed files.
Start
  │
  ▼
Read directory
  │
  ├─ For each file:
  │     └─ Get allocated disk blocks
  │
  ├─ For each subdirectory:
  │     └─ Recursively repeat
  │
  ▼
Sum blocks per directory
  │
  ▼
Display results
  │
  ▼
End
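The walk above can be approximated by hand. This is only a sketch of the idea, not how 'du' is implemented: it assumes GNU 'find' (whose '%b' format prints each entry's allocated 512-byte blocks), a tree without hard links, and hypothetical file names. On typical filesystems the two figures should agree.

```shell
#!/usr/bin/env bash
# Scratch tree: one file at the top, one in a subdirectory.
walk_root=$(mktemp -d)
trap 'rm -rf "$walk_root"' EXIT
mkdir -p "$walk_root/sub"
head -c 10000 /dev/urandom > "$walk_root/top.bin"
head -c 30000 /dev/urandom > "$walk_root/sub/nested.bin"

# Sum the 512-byte blocks of every file and directory, then convert to KiB,
# rounding up the way du does.
walked_kib=$(find "$walk_root" -printf '%b\n' |
    awk '{ blocks += $1 } END { print int((blocks + 1) / 2) }')
du_kib=$(du -sk "$walk_root" | cut -f1)
echo "manual walk: ${walked_kib} KiB, du -sk: ${du_kib} KiB"
```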
Myth Busters - 4 Common Misconceptions
Quick: Does 'du' show the exact file size or the disk space used? Commit to your answer.
Common Belief: 'du' shows the exact size of files as you see in file properties.
Reality: 'du' shows the disk space actually allocated, which can be less or more than the file size because of block allocation and sparse files.
Why it matters: Misunderstanding this leads to confusion when 'du' and 'ls -l' disagree, and to incorrect cleanup decisions.
Quick: Does 'du -s' list sizes of all subdirectories or just the total? Commit to your answer.
Common Belief:'du -s' lists sizes of all subdirectories separately.
Tap to reveal reality
Reality:'du -s' shows only the total size of the specified directory, not individual subfolders.
Why it matters:Expecting detailed output from '-s' can cause missed insights about which subfolders use space.
Quick: Does 'du' follow symbolic links by default? Commit to your answer.
Common Belief: 'du' counts sizes of files linked by symbolic links as if they were inside the directory.
Reality: 'du' does not follow symbolic links by default to avoid double counting or infinite loops.
Why it matters: Assuming links are counted can lead to overestimating disk usage or confusion about totals.
Quick: Can 'du' output be used directly in scripts without parsing? Commit to your answer.
Common Belief: 'du' output is always easy to parse and use in scripts without extra options.
Reality: Human-readable output ('-h') mixes K, M, and G suffixes, which plain numeric tools cannot compare. For scripts, pin the unit with '-k' or '--block-size', and consider '-0' (NUL-terminated lines) when file names may contain newlines.
Why it matters: Scripts may fail or misinterpret data if output is not standardized, causing automation errors.
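The following sketch, assuming GNU 'du' and made-up directory names, shows how a plain numeric sort is misled by mixed units. GNU 'sort -h' understands the suffixes, but arithmetic and comparisons inside a script still need plain integers.

```shell
#!/usr/bin/env bash
# Two directories: one holding about 900 KiB, one holding 2 MiB.
sort_root=$(mktemp -d)
trap 'rm -rf "$sort_root"' EXIT
mkdir "$sort_root/small" "$sort_root/large"
head -c 921600  /dev/urandom > "$sort_root/small/s.bin"
head -c 2097152 /dev/urandom > "$sort_root/large/l.bin"

# With -h, sort -n compares roughly "2M" against "900K" as 2 versus 900,
# so the larger directory is ranked as the smallest.
human_first=$(du -sh "$sort_root/small" "$sort_root/large" | sort -n | head -n 1 | cut -f2)
# With a fixed unit (-k, KiB) the first field is a plain integer and sorts correctly.
fixed_first=$(du -sk "$sort_root/small" "$sort_root/large" | sort -n | head -n 1 | cut -f2)

echo "smallest according to -h + sort -n: $human_first"
echo "smallest according to -k + sort -n: $fixed_first"
```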
Expert Zone
1. 'du' counts disk blocks, so files with holes (sparse files) appear smaller than their logical size.
2. Using '--apparent-size' changes 'du' to report file sizes ignoring block allocation, useful for comparing logical vs physical size.
3. Filesystem compression or deduplication can cause 'du' to report sizes that differ from actual physical storage used.
When NOT to use
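'du' walks the whole directory tree on every run, so it is a poor fit for real-time monitoring or for large network filesystems where each metadata lookup is slow. For a quick per-filesystem total, 'df' is faster; for interactive exploration, 'ncdu' may be more convenient; filesystem-specific utilities can also help. For exact file size analysis ignoring block allocation, use 'ls -l' or 'stat'.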
Production Patterns
System administrators use 'du' combined with scripts to generate disk usage reports, trigger alerts when space is low, or automate cleanup of large temporary files. It is often paired with 'cron' jobs and log rotation to maintain healthy storage.
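A threshold check of the kind a cron job might run could look roughly like this. It is a sketch under stated assumptions: GNU 'du', a hypothetical helper named 'over_limit', and arbitrary limits chosen for the demonstration.

```shell
#!/usr/bin/env bash
# over_limit DIR LIMIT_KIB (hypothetical helper): exit status 0 when DIR's
# total allocated size exceeds LIMIT_KIB, non-zero otherwise.
over_limit() {
    used_kib=$(du -sk "$1" | cut -f1)
    [ "$used_kib" -gt "$2" ]
}

report_root=$(mktemp -d)
trap 'rm -rf "$report_root"' EXIT
head -c 1048576 /dev/urandom > "$report_root/cache.bin"   # 1 MiB of data

# 100 KiB limit: the 1 MiB file pushes the directory over it.
if over_limit "$report_root" 100; then low_limit_status="ALERT"; else low_limit_status="ok"; fi
# 1 GiB limit (1048576 KiB): nowhere near it.
if over_limit "$report_root" 1048576; then high_limit_status="ALERT"; else high_limit_status="ok"; fi

echo "limit 100 KiB: $low_limit_status"
echo "limit 1 GiB:   $high_limit_status"
```

Returning an exit status rather than printing lets the caller decide what an alert means, for example sending mail or writing to a log.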
Connections
Filesystem Allocation
'du' builds on understanding how filesystems allocate disk blocks.
Knowing filesystem allocation helps interpret 'du' output correctly, especially for sparse or compressed files.
Automation Scripting
'du' output is often used as input for scripts that automate disk cleanup.
Understanding 'du' enables writing smarter scripts that manage disk space proactively.
Project Management Resource Tracking
'du' is like tracking resource usage in projects to find bottlenecks.
Just as 'du' finds disk space hogs, project managers find tasks consuming most time or budget to optimize resources.
Common Pitfalls
#1 Confusing file size with disk usage
Wrong approach: du -h /path/to/file # Output can differ from the size 'ls -lh' reports
Correct approach: du --apparent-size -h /path/to/file # Output reflects the logical file size
Root cause: Not knowing 'du' reports allocated blocks, not logical file size.
#2 Expecting 'du -s' to list all subdirectories
Wrong approach: du -s /home/user # Prints a single total, no breakdown by subdirectory
Correct approach: du -h --max-depth=1 /home/user # Shows sizes of immediate subdirectories
Root cause: Misunderstanding what '-s' summarizes.
#3 Counting symbolic links as full size
Wrong approach: du -hL /path/with/symlinks # '-L' follows symlinks, so targets are counted and inflate the size
Correct approach: du -h /path/with/symlinks # Default behavior (same as '--no-dereference'): symlinks counted as small links, not targets
Root cause: Not realizing 'du' does not follow symlinks by default, and that options such as '-L' or '-D' change this.
Key Takeaways
'du' measures disk space used by files and directories based on allocated blocks, not just file sizes.
Using options like '-h', '-s', and '--max-depth' tailors output for easier understanding and focused analysis.
'du' is essential for finding disk space hogs and managing storage efficiently on Linux systems.
Understanding filesystem allocation and sparse files helps interpret 'du' output accurately.
'du' output can be integrated into scripts to automate disk cleanup and monitoring tasks.