[Day 5] I Ran rm, So Why Isn't the Disk Space Back?

After Day 4 on fork/exec/file descriptors/pipes, one thing still nagged me.

In that post I wrote:

"The child duplicates that fd onto fd 1 with dup2(3, 1) — then closes the original fd 3."

Why close fd 3, though? What actually breaks if I don't?

For a single ls run, nothing breaks — the process exits and all fds close anyway. But pulling on this thread led somewhere much bigger: what a file's actual lifetime looks like, and why fd leaks crash production servers.

That's today's story.

1. A Weird Experiment: `rm` Didn't Free the Disk?

Let's start with the experiment. I made a 500MB file, had tail -f keep reading it, and then ran rm.

$ dd if=/dev/zero of=big2.txt bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.299413 s, 1.8 GB/s

$ df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root      122637288 30974300  86608232  27% /

$ tail -f big2.txt &
[2] 12345

$ rm big2.txt
$ ls big2.txt
ls: cannot access 'big2.txt': No such file or directory    ← file is "gone"

$ df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root      122637288 30974300  86608232  27% /         ← but the usage is identical?

ls says the file doesn't exist. But the disk usage is exactly the same as before the rm.

So where did the 500MB go?

2. Hunting for Clues in `/proc/PID/fd/`

Linux exposes every fd a process is holding as a symlink under /proc/PID/fd/. Let me check the tail process.

$ ls -l /proc/12345/fd/
total 0
lrwx------ 1 user user 64 Apr 20 16:58 0 -> /dev/pts/0
lrwx------ 1 user user 64 Apr 20 16:58 1 -> /dev/pts/0
lrwx------ 1 user user 64 Apr 20 16:58 2 -> /dev/pts/0
lr-x------ 1 user user 64 Apr 20 16:58 3 -> '/home/user/workspace/big2.txt (deleted)'

There's a (deleted) tag next to fd 3.

The file is "deleted" — but tail is still holding onto it via fd 3.
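The same (deleted) marker can be reproduced without a background tail. This is a minimal sketch, assuming Linux with /proc mounted; the temp file from mktemp and the choice of fd 3 are only for illustration.

```shell
# Hypothetical demo file; the name comes from mktemp, not from the article.
tmp=$(mktemp)
echo "hello" > "$tmp"

exec 3< "$tmp"    # this shell now holds the file open on fd 3
rm "$tmp"         # remove the file's only name

# /proc/self resolves to whichever process reads it. readlink inherits
# fd 3 from this shell, so it sees the same open file.
target=$(readlink /proc/self/fd/3)
echo "$target"    # the old path followed by " (deleted)"

exec 3<&-         # close fd 3; only now can the kernel free the data
```

For another process, such as the tail above, the path would be /proc/PID/fd/3 with that process's PID.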

3. What `rm` Actually Does

Here's where it clicked. rm doesn't do what I thought it did.

What rm actually does:

Removes the name "big2.txt" from the directory. The syscall behind it is called unlink, not delete.

It doesn't touch the data. It just detaches the label.

And rm doesn't send tail a memo saying "hey, I deleted the file you're reading." It just does its own job and exits.

So the state is:

big2.txt's inode (the real 500MB of data):
  - directory entries (names): 0       ← rm just removed this
  - open fds: 1                         ← tail is still holding fd 3
  
  total refs: 1

The reference count is 1. That's why the kernel won't reclaim the data.
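A small sketch makes the "names: 0, open fds: 1" state tangible: the name is gone, yet the data can still be read through the fd. The temp file and variable names are assumptions for the demo.

```shell
tmp=$(mktemp)
echo "still here" > "$tmp"

exec 3< "$tmp"      # names: 1, open fds: 1
rm "$tmp"           # names: 0, open fds: 1

if [ -e "$tmp" ]; then name_exists=yes; else name_exists=no; fi
data=$(cat <&3)     # read through the fd that is still open
exec 3<&-           # open fds: 0, the kernel may now free the blocks

echo "name exists: $name_exists, data via fd 3: $data"
# prints: name exists: no, data via fd 3: still here
```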

4. The Kernel's Rule: Reference Counting

This is the kernel's actual rule:

inode reference count = (number of names) + (number of open fds)

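The data isn't freed until this count hits zero. Strictly speaking, the kernel counts open file descriptions rather than fd numbers: copies made by dup2() or inherited across fork() share one reference, which is dropped only when the last copy is closed.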
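The "number of names" half of the formula can be observed directly as the inode's link count. A minimal sketch, assuming GNU coreutils (stat -c %h prints the link count) and a filesystem that supports hard links; the file names a and b are made up.

```shell
dir=$(mktemp -d)
echo "data" > "$dir/a"
links_one=$(stat -c %h "$dir/a")   # one name so far

ln "$dir/a" "$dir/b"               # second name for the same inode
links_two=$(stat -c %h "$dir/a")

rm "$dir/a"                        # one name gone, "b" keeps the data alive
links_after=$(stat -c %h "$dir/b")
data=$(cat "$dir/b")

echo "$links_one $links_two $links_after $data"   # prints: 1 2 1 data
rm "$dir/b"; rmdir "$dir"
```

Removing a is the same operation rm performed on big2.txt; the data survives here because a name, rather than an fd, still references it.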

So right now:

names: 0 (just removed by rm)
fds: 1 (held by tail)
total: 1 → data stays alive

To verify, I can kill tail and drop the count to 0.

5. Kill the Process, Get 500MB Back

$ df .
/dev/root      122637288 30974300  86608232  27% /      ← before kill

$ kill 12345

$ df .
/dev/root      122637288 30462292  87120240  26% /      ← after kill

Used: 30,974,300 → 30,462,292 KB.

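That is 512,008 KB freed, about 500 MiB. big2.txt itself accounts for 512,000 KB (524,288,000 bytes); the remaining 8 KB is most likely filesystem metadata or unrelated disk activity between the two df runs.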

The moment tail died, fd 3 closed, the reference count hit zero, and the kernel finally released the disk blocks.
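The numbers can be checked with shell arithmetic. The values below are copied from the df and dd output above; the variable names are only for this sketch.

```shell
used_before=30974300   # KB used, from df before the kill
used_after=30462292    # KB used, from df after the kill
file_bytes=524288000   # file size in bytes, from the dd output

freed_kb=$((used_before - used_after))
file_kb=$((file_bytes / 1024))
extra_kb=$((freed_kb - file_kb))

echo "freed=${freed_kb}KB file=${file_kb}KB extra=${extra_kb}KB"
# prints: freed=512008KB file=512000KB extra=8KB
```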

6. Back to Day 4's Question

The thing that nagged me from Day 4 — "why do we have to close fd 3?" — I can answer it now.

For short-lived programs like ls, it doesn't really matter. The process ends, all fds close automatically.

It matters for long-running processes — i.e., servers.

7. Same Principle, Different Face: fd Leaks

$ ulimit -n
1024

This is the soft limit on how many fds a process started from this shell can hold open at once. 1024 is a common default.

Imagine a server in pseudocode:

server loop:
  while (true):
    request = accept new HTTP request
    logfile = open("access.log")     ← 1 fd used
    write to log
    send response
    # forgot to close!

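Request 1 leaks 1 fd. Request 2 leaks a second. Every request adds one more.

The process also holds stdin, stdout, stderr, and its listening socket, so the limit is reached a little before request 1024. From then on, open() fails with EMFILE (Too many open files). The server can't open the log file anymore, and accept() fails the same way, so it can't take new connections either. Outage.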

This is an fd leak.

fd leak: fds pile up because opened resources are never closed. Once the process hits its limit (ulimit -n), new open() calls fail with "Too many open files."
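A leak can be watched as it happens by counting the entries under /proc/PID/fd. The sketch below is a rough illustration, assuming Linux; count_fds is a hypothetical helper, and fds 7 to 9 stand in for the log files the server forgot to close.

```shell
# Count open fds of a process; with no argument, inspect "self".
# /proc/self is the process doing the reading (here: ls), which inherits
# this shell's fds and opens one of its own, so only differences matter.
count_fds() { ls "/proc/${1:-self}/fd" | wc -l; }

before=$(count_fds)
exec 7< /dev/null 8< /dev/null 9< /dev/null   # open three fds, never close
leaked=$(count_fds)
exec 7<&- 8<&- 9<&-                           # the close() the server forgot
after=$(count_fds)

echo "before=$before leaked=$leaked after=$after soft limit=$(ulimit -n)"
```

Against a real server, count_fds PID sampled every few seconds would show the count climbing toward the limit instead of hovering around a steady value.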

8. Java Devs Already Know This Story

If you've written Java, you've seen try-with-resources:

// Old way — leak risk
FileInputStream fis = new FileInputStream("file.txt");
// ... process
fis.close();   // forget this → leak. exception thrown mid-block → leak.

// JDK 7+ — auto-close
try (FileInputStream fis = new FileInputStream("file.txt")) {
    // ... process
}   // close() called automatically when the block exits

What's the "resource" that Java warns you about leaking?

FileInputStream.close() ultimately makes a kernel close(fd) syscall.

Java:    fis.close()
   ↓
JVM:     native close0()
   ↓
Kernel:  close(fd)   ← ref count -1

Java's resource leak = the kernel's fd leak. Same phenomenon, viewed from a different layer.

DB connection pool exhausted because connections weren't closed? Same thing. From Java's perspective it's a "connection leak," but at the kernel level it's a socket fd that didn't close. A socket is an fd.

9. Three Phenomena, One Principle

Everything I learned today is the same rule wearing different masks:

Phenomenon	Through the reference-count lens
`rm` didn't free disk	names=0 + fds=1 = 1. Not yet freed.
Pipe doesn't hit EOF (Day 4)	A write-end fd is still held. Not EOF yet.
"Too many open files" error	The process's open-fd count reached its limit, so the next open() fails.

The kernel runs on one consistent rule: "Don't release a resource while something still references it." Files, pipes, sockets — all of them.

If you opened it, close it. The Unix first principle.

10. Why This Matters in Practice

Once you see the rule, a bunch of things start connecting:

/proc/PID/fd/ with (deleted) entries tells you why disk space isn't coming back when you expect it to.
When a server dies with "Too many open files," you can find which fds are leaking via /proc/PID/fd/.
Java's try-with-resources stops looking like language sugar and starts looking like a necessary answer to how kernels manage resources.
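One way to spot which fds are leaking is to group the /proc/PID/fd/ symlinks by target and look for a target that appears suspiciously often. This is a sketch under assumptions: fd_targets is a hypothetical helper, the temp file stands in for a leaked log file, and "self" is used so the demo needs no separate server process.

```shell
# Group a process's open fds by what they point at, most frequent first.
# $1 is a PID, or "self" for the demo below.
fd_targets() {
    for fd in "/proc/$1/fd"/*; do
        # Ignore fds that were closed between listing and reading.
        readlink "$fd" 2>/dev/null || true
    done | sort | uniq -c | sort -rn
}

# Demo: simulate a leak by opening the same file five times.
log=$(mktemp)
exec 5< "$log" 6< "$log" 7< "$log" 8< "$log" 9< "$log"
report=$(fd_targets self)
exec 5<&- 6<&- 7<&- 8<&- 9<&-
rm "$log"

printf '%s\n' "$report" | head -n 3
```

In the demo the temp file shows up with a count of 5. On a leaking server, the top line would typically be the forgotten log file or a pile of sockets.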

Coming from backend development into infrastructure, this connection is the edge. Being able to tell one coherent story from the close() in your application code, to the close(fd) syscall, to the ls -l /proc/PID/fd/ output — that's what the job needs.

Quick Reference

Command	What it does
`ls -l /proc/PID/fd/`	List every fd a process is holding
`ulimit -n`	Soft limit on fds a process can hold open simultaneously
`df .` (or `df -h .`)	Disk usage
`jobs -p`	PIDs of current background jobs
`kill PID`	Terminate a process

The core formula:

inode ref count = (number of names) + (number of open fds)
count == 0 → data is finally freed

Next: Why TTY shows up as ? — the story of parent-child process relationships.
