Text processing in Linux is a core skill for managing logs, configuration files, and data like CSVs. This chapter explores essential tools like cut, sort, uniq, wc, awk, and fzf to help you process text efficiently.
Summary
This chapter introduces text processing in Linux with cut, sort, uniq, wc, awk, fzf, and jq, equipping you to handle logs, CSVs, and JSON efficiently. These tools are essential in everyday Linux workflows.
Text processing in Linux
In this chapter, we’ll explore tools and techniques for working with text in Linux. You’ll learn how to use commands like cut, sort, uniq, and wc to manipulate and analyze text files. We’ll also cover the modern fuzzy finder fzf for interactive searching, as well as the classic groff and nroff for document formatting. By the end of this chapter, you’ll be able to efficiently process and extract meaningful information from text data.
Why Text Processing in Linux Matters
Text processing is vital for extracting data, sorting information, and analyzing logs. Whether you’re a system administrator or a beginner, mastering these tools simplifies tasks like parsing CSVs or finding errors in log files.
The cut Command
cut extracts specific columns or fields from files, ideal for structured data like CSVs.
Basic Usage of cut for Text Processing in Linux
$ cut -d',' -f1 file.csv
- -d',': Sets comma as the delimiter.
- -f1: Extracts the first field.
Examples
- Extract first column from a CSV:
$ cut -d',' -f1 file.csv
- Extract second and third columns:
$ cut -d',' -f2,3 file.csv
- Extract characters 1-5 from each line:
$ cut -c1-5 file.txt
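Field lists also accept open-ended ranges, which is handy when you want every column from some point onward. A quick sketch, using the same hypothetical file.csv:
- Extract the second column through the last:
$ cut -d',' -f2- file.csv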
The sort Command
sort arranges lines in a file, useful for organizing data.
Basic Usage
$ sort file.txt
Common Options
- -r: Reverse sort order.
- -n: Sort numerically.
- -k: Sort by a specific column.
Examples
- Sort alphabetically:
$ sort file.txt
- Sort numerically:
$ sort -n numbers.txt
- Sort CSV by second column:
$ sort -t',' -k2 file.csv
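Note that -k2 on its own sorts from the second field to the end of each line, and as text rather than numbers. To sort strictly by one numeric column, pin the key to a single field and add the n flag. A sketch, again assuming a hypothetical file.csv:
- Sort numerically by the second column only:
$ sort -t',' -k2,2n file.csv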
The uniq Command
uniq removes duplicate lines from a sorted file, often used with sort.
Basic Usage
$ sort file.txt | uniq
Common Options
- -c: Count occurrences.
- -d: Show only duplicates.
- -u: Show only unique lines.
Examples
- Remove duplicates:
$ sort file.txt | uniq
- Count occurrences:
$ sort file.txt | uniq -c
- Show duplicates:
$ sort file.txt | uniq -d
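The -u option listed above is the mirror image of -d: it keeps only the lines that occur exactly once.
- Show lines that appear only once:
$ sort file.txt | uniq -u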

The wc Command
wc (word count) counts lines, words, and bytes or characters in files.
Basic Usage
$ wc file.txt
Common Options
- -l: Count lines.
- -w: Count words.
- -c: Count bytes (use -m for multibyte character counts).
Examples
- Count lines:
$ wc -l file.txt
- Count words:
$ wc -w file.txt
- Count bytes (characters in plain ASCII text):
$ wc -c file.txt
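With no options, wc prints lines, words, and bytes in that order, and it works just as well at the end of a pipeline. A common sketch, counting entries in the current directory (assuming no filenames contain newlines):
- Count files and directories:
$ ls | wc -l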
Combining Commands for Text Processing in Linux
Pipelines combine commands for powerful text processing.
Example: Count Unique Words
$ cat file.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
- tr ' ' '\n': Splits words onto separate lines.
- sort | uniq -c: Counts each unique word.
- sort -nr: Sorts by count, descending.
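To try the pipeline without preparing a file, you can feed it inline sample text; the words here are just an illustration:
$ printf 'apple banana apple\nbanana apple\n' | tr ' ' '\n' | sort | uniq -c | sort -nr
Since apple occurs three times and banana twice, apple is listed first.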
Example: Find Most Common Error
$ grep "ERROR" logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
grep "ERROR": Filters error lines.cut -d' ' -f4-: Extracts error messages.head -n 1: Shows top error.
Advanced Text Processing with awk
awk is a versatile tool for complex text processing (covered in Chapter 13).
Examples
- Print second column of a CSV:
$ awk -F',' '{ print $2 }' file.csv
- Sum numbers:
$ awk '{ sum += $1 } END { print sum }' numbers.txt
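awk can also filter rows on a condition before printing. A small sketch, assuming a hypothetical file.csv whose third column is numeric:
- Print the first column of rows where the third column exceeds 100:
$ awk -F',' '$3 > 100 { print $1 }' file.csv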
Searching Files with find and fzf
Search tools complement text processing in Linux by locating the files and content you need. A more comprehensive note and tutorial on finding files in the Linux terminal appears on the next page of this blog.
find
Search files by name, size, or time.
- Find .txt files:
$ find . -name "*.txt"
- Find recent files:
$ find . -mtime -7
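find can also hand its matches to another command with -exec, which pairs nicely with the counting tools above. A sketch counting lines in every .log file under the current directory:
- Count lines in all matching files:
$ find . -name "*.log" -exec wc -l {} +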
fzf (Fuzzy Finder)
Interactive search tool for files and text.
- Search file content:
$ cat file.txt | fzf
- Search files:
$ fzf
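Because fzf prints the selected entry to standard output, it composes with other commands. For example (assuming vim is your editor):
- Open the file you pick in an editor:
$ vim "$(fzf)"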
Note: Install fzf if not present:
$ sudo apt install fzf
Document Formatting with groff and nroff
Format documents for man pages or reports.
groff
Typesetting system for professional documents.
- Create PostScript file:
$ groff -T ps file > output.ps
nroff
Lightweight formatting for terminals.
- Format man page:
$ nroff -man file.1 | less
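To experiment without an existing man page, you can write a minimal one with the standard man macros; the hello.1 name and contents here are just an illustration:
$ cat > hello.1 <<'EOF'
.TH HELLO 1
.SH NAME
hello \- print a greeting
.SH DESCRIPTION
A minimal example man page.
EOF
$ nroff -man hello.1 | less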
Modern Tool: jq for JSON
jq processes JSON data, common in APIs.
- Extract a JSON key:
$ echo '{"key": "value"}' | jq '.key'
Note: Install jq if not present:
$ sudo apt install jq
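jq filters also iterate over arrays, the usual shape of API responses. A quick sketch (the sample JSON is just an illustration):
- Extract a key from every element of an array:
$ echo '[{"name": "web"}, {"name": "db"}]' | jq '.[].name'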
Practice Time!
Test your skills with these tasks:
- Extract the third column of a CSV using cut.
- Find unique lines with sort and uniq.
- Count words in a file with wc.
- Find the most frequent error using grep, cut, and sort.
- Locate .log files with find.
- Search interactively with fzf.
Try This: Run sort file.txt | uniq -c and share your results on X with #LinuxCommandLine!
Glossary of Commands, Tools, and Shortcuts
| Command/Tool | Description |
|---|---|
| cut | Extracts fields or characters from files using delimiters. |
| sort | Sorts lines in a file alphabetically or numerically. |
| uniq | Removes or counts duplicate lines in a sorted file. |
| wc | Counts lines, words, or characters in a file. |
| awk | Processes and manipulates text with pattern matching. |
| find | Searches for files based on name, size, or time. |
| fzf | Interactive fuzzy finder for files and text. |
| jq | Processes and queries JSON data. |
| groff | Typesetting system for formatting documents. |
| nroff | Lightweight tool for formatting text, especially man pages. |
| tr | Translates or replaces characters in text. |