Text processing in Linux is a core skill for managing logs, configuration files, and data like CSVs. This chapter explores essential tools like cut, sort, uniq, wc, awk, and fzf to help you process text efficiently.
Summary
This chapter introduces text processing in Linux with cut, sort, uniq, wc, awk, fzf, and jq, equipping you to handle logs, CSVs, and JSON efficiently. These tools are essential in everyday Linux workflows.
Text processing in Linux
In this chapter, we’ll explore tools and techniques for working with text in Linux. You’ll learn how to use commands like cut, sort, uniq, and wc to manipulate and analyze text files. We’ll also cover the modern fuzzy finder fzf for interactive searching, as well as the classic groff and nroff for document formatting. By the end of this chapter, you’ll be able to efficiently process and extract meaningful information from text data.
Why Text Processing in Linux Matters
Text processing is vital for extracting data, sorting information, and analyzing logs. Whether you’re a system administrator or a beginner, mastering these tools simplifies tasks like parsing CSVs or finding errors in log files.
The cut Command
cut extracts specific columns or fields from files, ideal for structured data like CSVs.
Basic Usage of cut for Text Processing in Linux
$ cut -d',' -f1 file.csv
- -d',': Sets comma as the delimiter.
- -f1: Extracts the first field.
Examples
- Extract first column from a CSV:
$ cut -d',' -f1 file.csv
- Extract second and third columns:
$ cut -d',' -f2,3 file.csv
- Extract characters 1-5 from each line:
$ cut -c1-5 file.txt
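Field lists also accept open-ended ranges, which is handy when you want every column from some point onward. A quick sketch, using the same hypothetical file.csv:
- Extract the second column through the last:
$ cut -d',' -f2- file.csv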
The sort Command
sort arranges lines in a file, useful for organizing data.
Basic Usage
$ sort file.txt
Common Options
- -r: Reverse sort order.
- -n: Sort numerically.
- -k: Sort by a specific column.
Examples
- Sort alphabetically:
$ sort file.txt
- Sort numerically:
$ sort -n numbers.txt
- Sort CSV by second column:
$ sort -t',' -k2 file.csv
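Note that -k2 on its own sorts from the second field to the end of each line, and as text rather than numbers. To sort strictly by one numeric column, pin the key to a single field and add the n flag. A sketch, again assuming a hypothetical file.csv:
- Sort numerically by the second column only:
$ sort -t',' -k2,2n file.csv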
The uniq Command
uniq removes duplicate lines from a sorted file, often used with sort.
Basic Usage
$ sort file.txt | uniq
Common Options
- -c: Count occurrences.
- -d: Show only duplicates.
- -u: Show only unique lines.
Examples
- Remove duplicates:
$ sort file.txt | uniq
- Count occurrences:
$ sort file.txt | uniq -c
- Show duplicates:
$ sort file.txt | uniq -d
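The -u option listed above is the mirror image of -d: it keeps only the lines that occur exactly once.
- Show lines that appear only once:
$ sort file.txt | uniq -u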

The wc Command
wc (word count) counts lines, words, and bytes or characters in files.
Basic Usage
$ wc file.txt
Common Options
- -l: Count lines.
- -w: Count words.
- -c: Count bytes (use -m for multibyte character counts).
Examples
- Count lines:
$ wc -l file.txt
- Count words:
$ wc -w file.txt
- Count bytes (characters in plain ASCII text):
$ wc -c file.txt
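With no options, wc prints lines, words, and bytes in that order, and it works just as well at the end of a pipeline. A common sketch, counting entries in the current directory (assuming no filenames contain newlines):
- Count files and directories:
$ ls | wc -l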
Combining Commands for Text Processing in Linux
Pipelines combine commands for powerful text processing.
Example: Count Unique Words
$ cat file.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
- tr ' ' '\n': Splits words onto separate lines.
- sort | uniq -c: Counts each unique word.
- sort -nr: Sorts by count, descending.
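To try the pipeline without preparing a file, you can feed it inline sample text; the words here are just an illustration:
$ printf 'apple banana apple\nbanana apple\n' | tr ' ' '\n' | sort | uniq -c | sort -nr
Since apple occurs three times and banana twice, apple is listed first.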
Example: Find Most Common Error
$ grep "ERROR" logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
grep "ERROR": Filters error lines.cut -d' ' -f4-: Extracts error messages.head -n 1: Shows top error.
Advanced Text Processing with awk
awk is a versatile tool for complex text processing (covered in Chapter 13).
Examples
- Print second column of a CSV:
$ awk -F',' '{ print $2 }' file.csv
- Sum numbers:
$ awk '{ sum += $1 } END { print sum }' numbers.txt
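awk can also filter rows on a condition before printing. A small sketch, assuming a hypothetical file.csv whose third column is numeric:
- Print the first column of rows where the third column exceeds 100:
$ awk -F',' '$3 > 100 { print $1 }' file.csv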
Searching Files with find and fzf
Search tools complement text processing in Linux by locating the files and content you need. A more comprehensive note and tutorial on finding files in the Linux terminal appears on the next page of this blog.
find
Search files by name, size, or time.
- Find .txt files:
$ find . -name "*.txt"
- Find recent files:
$ find . -mtime -7
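find can also hand its matches to another command with -exec, which pairs nicely with the counting tools above. A sketch counting lines in every .log file under the current directory:
- Count lines in all matching files:
$ find . -name "*.log" -exec wc -l {} +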
fzf (Fuzzy Finder)
Interactive search tool for files and text.
- Search file content:
$ cat file.txt | fzf
- Search files:
$ fzf
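Because fzf prints the selected entry to standard output, it composes with other commands. For example (assuming vim is your editor):
- Open the file you pick in an editor:
$ vim "$(fzf)"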
Note: Install fzf if not present:
$ sudo apt install fzf
Document Formatting with groff and nroff
Format documents for man pages or reports.
groff
Typesetting system for professional documents.
- Create PostScript file:
$ groff -T ps file > output.ps
nroff
Lightweight formatting for terminals.
- Format man page:
$ nroff -man file.1 | less
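To experiment without an existing man page, you can write a minimal one with the standard man macros; the hello.1 name and contents here are just an illustration:
$ cat > hello.1 <<'EOF'
.TH HELLO 1
.SH NAME
hello \- print a greeting
.SH DESCRIPTION
A minimal example man page.
EOF
$ nroff -man hello.1 | less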
Modern Tool: jq for JSON
jq processes JSON data, common in APIs.
- Extract a JSON key:
$ echo '{"key": "value"}' | jq '.key'
Note: Install jq if not present:
$ sudo apt install jq
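jq filters also iterate over arrays, the usual shape of API responses. A quick sketch (the sample JSON is just an illustration):
- Extract a key from every element of an array:
$ echo '[{"name": "web"}, {"name": "db"}]' | jq '.[].name'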
Practice Time!
Test your skills with these tasks:
- Extract the third column of a CSV using cut.
- Find unique lines with sort and uniq.
- Count words in a file with wc.
- Find the most frequent error using grep, cut, and sort.
- Locate .log files with find.
- Search interactively with fzf.
Try This: Run sort file.txt | uniq -c and share your results on X with #LinuxCommandLine!
Glossary of Commands, Tools, and Shortcuts
| Command/Tool | Description |
|---|---|
| cut | Extracts fields or characters from files using delimiters. |
| sort | Sorts lines in a file alphabetically or numerically. |
| uniq | Removes or counts duplicate lines in a sorted file. |
| wc | Counts lines, words, or characters in a file. |
| awk | Processes and manipulates text with pattern matching. |
| find | Searches for files based on name, size, or time. |
| fzf | Interactive fuzzy finder for files and text. |
| jq | Processes and queries JSON data. |
| groff | Typesetting system for formatting documents. |
| nroff | Lightweight tool for formatting text, especially man pages. |
| tr | Translates or replaces characters in text. |