Ch 14: Regular Expressions in Linux

In this chapter, we’ll explore regular expressions (regex), a powerful tool for matching and manipulating text. You’ll learn how to use regex with tools like grep, sed, and awk to search, filter, and transform text efficiently. By the end of this chapter, you’ll be able to harness the full power of regex in your scripts and command-line workflows.

Summary

Mastering regular expressions in Linux with grep, egrep, sed, awk, and tools like ripgrep enables efficient text processing. These skills are vital for Linux users handling logs, scripts, and data.

In the second part of this chapter, we will learn about tools like grep. In the final segment on regular expressions in Linux, we will learn how to compare files with the cmp command and files or directories with the diff command.

Regular expressions in Linux are powerful tools for searching, filtering, and manipulating text. This chapter introduces regex with commands like grep, sed, and awk, plus modern alternatives like ripgrep, to help beginners process logs, scripts, and data efficiently.

Note: Most tools (grep, sed, awk) are pre-installed on Debian 12. For ripgrep, sd, or jq, install via:

$ sudo apt install ripgrep sd jq

Understanding Regular Expressions in Linux

Regular expressions (regex) are patterns for matching text, used in search, text processing, and validation. They combine literals, metacharacters, character classes, and anchors to define complex patterns.

Basic Regex Components

Literals: Exact characters (e.g., a matches “a”).
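Metacharacters: Special characters (e.g., . matches any single character, * matches zero or more of the preceding item, + matches one or more, ? matches zero or one). In basic regex (plain grep and sed), + and ? are literal unless written \+ and \? (a GNU extension); extended regex (grep -E, sed -E, awk) treats them as special.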
Character Classes: Sets of characters (e.g., [a-z] matches lowercase letters).
Anchors: Positions (e.g., ^ for line start, $ for line end).
Escaping: Use \ for literal metacharacters (e.g., \. matches a period).

Example: a+ matches one or more “a” characters, while a* matches zero or more, so a* also matches text that contains no “a” at all.
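A quick way to check these components is to pipe a few sample lines into grep and count matching lines. This is a minimal sketch, assuming GNU grep; the sample strings are made up for illustration.

```shell
#!/bin/sh
# Three sample lines with zero, one, and two "a" characters.
# a* can match the empty string, so every line matches (3).
star_count=$(printf 'b\nab\naab\n' | grep -cE 'a*')
# a+ needs at least one "a", so the line "b" is skipped (2).
plus_count=$(printf 'b\nab\naab\n' | grep -cE 'a+')

# Escaping: an unescaped . matches any character, \. only a literal period.
dot_any=$(printf 'file.txt\nfileXtxt\n' | grep -c 'file.txt')       # 2
dot_literal=$(printf 'file.txt\nfileXtxt\n' | grep -c 'file\.txt')  # 1

# Anchors: ^ ties the match to the start of the line.
anchored=$(printf 'abc\ncab\n' | grep -c '^a')                       # 1

printf 'a*=%s a+=%s dot=%s escaped-dot=%s ^a=%s\n' \
  "$star_count" "$plus_count" "$dot_any" "$dot_literal" "$anchored"
```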

Using `grep` for Regular Expressions in Linux

grep searches text files for lines matching a regex pattern, ideal for log analysis.

Basic Usage

$ grep "pattern" file.txt

Common Options

-i: Ignore case.
-v: Show non-matching lines.
-E: Use extended regex (+, ?, |).
-o: Show only matched parts.
-c: Count matching lines (not individual matches).
-n: Show line numbers.
-w: Match whole words.
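The sketch below exercises several of these options on a small log, and shows why -c counts lines rather than matches. It assumes GNU grep; the log contents are hypothetical and written to a temporary file.

```shell
#!/bin/sh
# Hypothetical log written to a temporary file for the demo.
log=$(mktemp)
printf 'error: disk full\nok\nERROR: error again\nwarning: low memory\n' > "$log"

# -i ignores case; -c counts matching LINES, so this is 2.
lines=$(grep -ic 'error' "$log")
# -o prints every match on its own line; wc -l then counts matches (3).
matches=$(grep -io 'error' "$log" | wc -l)
# -v inverts the match: lines without "error" in any case (2).
others=$(grep -ivc 'error' "$log")
# -n prefixes each hit with its line number.
numbered=$(grep -n '^warning' "$log")

rm -f "$log"
printf 'lines=%s matches=%s others=%s\n%s\n' "$lines" "$matches" "$others" "$numbered"
```

The third line of the log contains “error” twice, which is why grep -o | wc -l reports one more than grep -c.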

Examples

Find “error”:
```
$ grep "error" logfile.txt
```
Lines starting with “warning”:
```
$ grep "^warning" logfile.txt
```
Count “error” lines:
```
$ grep -c "error" logfile.txt
```
Match whole word “error”:
```
$ grep -w "error" logfile.txt
```
Search recursively:
```
$ grep -r "pattern" /path
```
Match “error” or “warning”:
```
$ grep -E "error|warning" logfile.txt
```
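To see how these examples differ in practice, the following sketch runs them against a throwaway log directory. It assumes GNU grep; the file name app.log and its contents are hypothetical, and -l (list only file names) is used to keep the recursive output short.

```shell
#!/bin/sh
# Hypothetical log in a temporary directory (also used for -r below).
dir=$(mktemp -d)
printf 'error: disk full\nerrors: 3 retries\nwarning: low memory\ninfo: all good\n' > "$dir/app.log"

# Plain "error" also matches inside "errors" (2 lines).
substring=$(grep -c 'error' "$dir/app.log")
# -w requires word boundaries on both sides, so "errors" is skipped (1).
whole_word=$(grep -cw 'error' "$dir/app.log")
# -E with | matches either alternative (3 lines).
either=$(grep -cE 'error|warning' "$dir/app.log")
# -r searches every file under the directory; -l prints only file names.
files=$(grep -rl 'warning' "$dir")

rm -rf "$dir"
printf 'substring=%s whole_word=%s either=%s\n%s\n' "$substring" "$whole_word" "$either" "$files"
```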

Using `egrep` with Regex

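egrep is equivalent to grep -E, supporting extended regex (+, ?, |) without escaping. GNU grep 3.8 and later (the version in Debian 12) prints a warning that egrep is obsolescent, so prefer grep -E in scripts.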

Examples

Match “error” or “warning”:
```
$ egrep "error|warning" logfile.txt
```
Match one or more digits:
```
$ egrep "[0-9]+" file.txt
```
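The phrase “without escaping” is easiest to see by comparing extended and basic regex on the same input. This sketch calls grep -E rather than egrep (same behavior, no warning) and assumes GNU grep, whose basic regex accepts \+ as an extension.

```shell
#!/bin/sh
# With -E (what egrep does), + means "one or more" (2 lines match).
ere=$(printf 'a1\nb22\nc\n' | grep -cE '[0-9]+')
# In basic regex, + is an ordinary character, so '[0-9]+' looks for a digit
# followed by a literal plus sign (0 lines). grep exits 1 when nothing
# matches, hence the "|| true".
bre_literal=$(printf 'a1\nb22\nc\n' | grep -c '[0-9]+' || true)
# GNU grep accepts \+ as "one or more" in basic regex (2 lines).
bre_escaped=$(printf 'a1\nb22\nc\n' | grep -c '[0-9]\+')

printf 'ERE=%s BRE-literal=%s BRE-escaped=%s\n' "$ere" "$bre_literal" "$bre_escaped"
```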

Using `sed` with Regex

sed (stream editor) transforms text, often for search-and-replace tasks.

Basic Usage

$ sed 's/pattern/replacement/' file.txt

Common Commands

s: Substitute text.
d: Delete lines.
p: Print lines (use with -n, which suppresses sed's automatic output).
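Here is a minimal sketch of the three commands on a hypothetical three-line file, assuming GNU sed. It also shows what happens when p is used without -n.

```shell
#!/bin/sh
input=$(mktemp)
printf 'alpha\nerror here\nbeta\n' > "$input"

# s substitutes text; the second output line becomes "error there".
substituted=$(sed 's/here/there/' "$input" | sed -n '2p')
# d deletes lines matching /error/ from the output.
deleted=$(sed '/error/d' "$input")
# -n turns off automatic printing, so p shows only matching lines.
printed=$(sed -n '/beta/p' "$input")
# Without -n, p prints a matching line twice (once by p, once automatically).
doubled=$(sed '/beta/p' "$input" | grep -c 'beta')

rm -f "$input"
printf 'substituted: %s\nprinted: %s\ndoubled: %s\n' "$substituted" "$printed" "$doubled"
printf 'after d:\n%s\n' "$deleted"
```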

Examples

Replace the first “foo” on each line with “bar”:
```
$ sed 's/foo/bar/' file.txt
```
Delete “error” lines:
```
$ sed '/error/d' file.txt
```
Replace every “foo” on each line (global flag g):
```
$ sed 's/foo/bar/g' file.txt
```
Edit file.txt in place, keeping the original as file.txt.bak:
```
$ sed -i.bak 's/foo/bar/' file.txt
```
Delete third line:
```
$ sed '3d' file.txt
```
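The sketch below runs these examples against a temporary copy of file.txt so the effect of g, -i.bak, and a line-number address can be checked side by side. It assumes GNU sed (the -i.bak form with no space is GNU syntax); the file contents are hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
printf 'foo foo\nfoo\nthird line\n' > "$work/file.txt"

# Without g, only the first "foo" on each line changes.
first_only=$(sed 's/foo/bar/' "$work/file.txt" | head -n 1)
# With g, every "foo" on the line changes.
global=$(sed 's/foo/bar/g' "$work/file.txt" | head -n 1)
# 3d deletes line 3 by number, leaving 2 lines.
remaining=$(sed '3d' "$work/file.txt" | wc -l)

# -i.bak rewrites file.txt and keeps the original as file.txt.bak.
sed -i.bak 's/foo/bar/g' "$work/file.txt"
edited=$(head -n 1 "$work/file.txt")
backup=$(head -n 1 "$work/file.txt.bak")

rm -rf "$work"
printf 'first: %s\nglobal: %s\nremaining: %s\nedited: %s\nbackup: %s\n' \
  "$first_only" "$global" "$remaining" "$edited" "$backup"
```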

Using `awk` with Regex

awk processes field-based data (e.g., CSVs, logs) and uses regex patterns to choose which lines an action runs on.

Basic Usage

$ awk '/pattern/ { action }' file.txt

Common Actions

print: Print lines or fields.
gsub(regex, replacement[, target]): Replace every match in the current line, or only in target (e.g., a single field).
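A minimal sketch of gsub, assuming any POSIX awk (gawk or Debian's default mawk). The inputs are made-up strings; the second command shows the optional third argument restricting the replacement to one field.

```shell
#!/bin/sh
# gsub edits $0 in place and returns the number of replacements made.
result=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, "+"); print $0, n }')
# With a target, only field 2 changes; awk rebuilds $0 using OFS (",").
field=$(printf 'x-y,p-q\n' | awk -F ',' -v OFS=',' '{ gsub(/-/, "_", $2); print }')

printf '%s\n%s\n' "$result" "$field"
```

The first command prints “a+b+c 2”; the second leaves the hyphen in field 1 alone and prints “x-y,p_q”.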

Examples

Print “error” lines:
```
$ awk '/error/ { print }' logfile.txt
```

Print first and third CSV fields:
```
$ awk -F ',' '{ print $1, $3 }' data.csv
```

Sum first column:
```
$ awk '{ total += $1 } END { print total }' numbers.txt
```
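Regex patterns in awk can also be tested against a single field with the ~ and !~ operators, which combines the CSV and counting ideas above. This is a sketch with a hypothetical host/status CSV, assuming any POSIX awk.

```shell
#!/bin/sh
csv=$(mktemp)
printf 'web01,error,disk full\nweb02,ok,fine\ndb01,error,timeout\n' > "$csv"

# ~ tests one field against a regex: print the host (field 1)
# when field 2 is exactly "error".
hosts=$(awk -F ',' '$2 ~ /^error$/ { print $1 }' "$csv")
# Count the matches and print the total in END; n + 0 prints 0 if none match.
count=$(awk -F ',' '$2 ~ /^error$/ { n++ } END { print n + 0 }' "$csv")
# !~ negates the test.
healthy=$(awk -F ',' '$2 !~ /^error$/ { print $1 }' "$csv")

rm -f "$csv"
printf 'hosts:\n%s\ncount: %s\nhealthy: %s\n' "$hosts" "$count" "$healthy"
```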

Common Regex Patterns

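Useful patterns for real-world tasks. They are loose heuristics rather than strict validators. The date and phone patterns use Perl-style shorthands (\d for a digit, \s for whitespace) that only grep -P understands; with grep -E, sed -E, or awk, write [0-9] for \d and the POSIX class [:space:] for \s.

Email: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
URL (scheme and host only): https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Date (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
Phone: \+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}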
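The following sketch checks the email, URL, and date patterns with grep -oE, using the POSIX [0-9] form of the date pattern so it does not depend on -P. The sample text is hypothetical, and GNU grep is assumed.

```shell
#!/bin/sh
# Hypothetical sample text with two dates, an email address, and a URL.
text='Released 2024-03-15, patched 2024-04-02. Mail admin@example.com or see https://example.org/docs'

# Date pattern in POSIX ERE: [0-9] instead of \d.
dates=$(printf '%s\n' "$text" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
# The email pattern is already plain ERE.
email=$(printf '%s\n' "$text" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
# The URL pattern stops after the host name, so "/docs" is not included.
url=$(printf '%s\n' "$text" | grep -oE 'https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

printf 'dates:\n%s\nemail: %s\nurl: %s\n' "$dates" "$email" "$url"
```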

Combining Tools for Regular Expressions in Linux

Pipelines combine regex tools for complex tasks.

Examples

Extract emails:
```
$ grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
```
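Extraction becomes more useful once it is piped into other tools. This sketch lowercases the extracted addresses and removes duplicates; it uses grep -oE because the email pattern contains no Perl-only syntax. The header lines are hypothetical, and GNU grep and coreutils are assumed.

```shell
#!/bin/sh
file=$(mktemp)
printf 'From: Alice@Example.com\nCc: bob@example.org\nReply-To: alice@example.com\n' > "$file"

# grep -o extracts each address, tr lowercases it, sort -u drops duplicates.
unique=$(grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$file" \
  | tr '[:upper:]' '[:lower:]' \
  | sort -u)
total=$(printf '%s\n' "$unique" | wc -l)

rm -f "$file"
printf '%s\n(%s unique)\n' "$unique" "$total"
```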

Replace dates (sed does not support \d, so use [0-9]):
```
$ sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' file.txt
```
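Besides masking dates, sed -E can capture parts of a match with parentheses and reuse them as \1, \2, \3 in the replacement. This sketch, assuming GNU sed and a made-up input line, masks the dates and, separately, rewrites them as DD.MM.YYYY; since sed's regex dialect has no \d, digits are written [0-9].

```shell
#!/bin/sh
line='Backup ran on 2024-03-15 and 2024-04-02'

# Mask every ISO date.
masked=$(printf '%s\n' "$line" | sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g')
# Capture year, month, and day, then write them back in reverse order.
reordered=$(printf '%s\n' "$line" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3.\2.\1/g')

printf '%s\n%s\n' "$masked" "$reordered"
```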

Modern Alternatives

Modern tools enhance regex tasks with better performance and usability.

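ripgrep (rg): A fast grep alternative that searches the current directory recursively and skips files ignored by .gitignore by default. Example: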
```
$ rg "error"
```
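sd: A simpler find-and-replace tool than sed; it takes the pattern and replacement as separate arguments and replaces every match by default. Example: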
```
$ echo "foo" | sd "foo" "bar"
```
jq: A command-line JSON processor; its test and sub functions also accept regexes. Example:
```
$ echo '{"name": "Alice"}' | jq '.name'
```
