In this chapter, we’ll explore regular expressions (regex), a powerful tool for matching and manipulating text. You’ll learn how to use regex with tools like grep, sed, and awk to search, filter, and transform text efficiently. By the end of this chapter, you’ll be able to harness the full power of regex in your scripts and command-line workflows.
Summary
Mastering regular expressions in Linux with grep, egrep, sed, awk, and tools like ripgrep enables efficient text processing. These skills are vital for Linux users handling logs, scripts, and data.
In the second part of this chapter, we will learn about tools like grep, and in the final segment on regular expressions in Linux, we will learn how to compare two or more files or folders using the cmp and diff commands.
Regular expressions in Linux are powerful tools for searching, filtering, and manipulating text. This chapter introduces regex with commands like grep, sed, and awk, plus modern alternatives like ripgrep, to help beginners process logs, scripts, and data efficiently.
Note: Most tools (grep, sed, awk) are pre-installed on Debian 12. For ripgrep or sd, install via:
$ sudo apt install ripgrep sd
Understanding Regular Expressions in Linux
Regular expressions (regex) are patterns for matching text, used in search, text processing, and validation. They combine literals, metacharacters, character classes, and anchors to define complex patterns.
Basic Regex Components
- Literals: Exact characters (e.g., a matches “a”).
- Metacharacters: Special characters (e.g., . matches any single character, * matches zero or more of the preceding item, + matches one or more, ? matches zero or one).
- Character Classes: Sets of characters (e.g., [a-z] matches lowercase letters).
- Anchors: Positions (e.g., ^ for line start, $ for line end).
- Escaping: Use \ for literal metacharacters (e.g., \. matches a period).
Example: a+ matches one or more “a” characters, while a* matches zero or more.
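To make the difference between the quantifiers concrete, here is a minimal sketch that feeds three made-up strings through grep -E. The sample strings and variable names are illustrative assumptions, not part of any real dataset.

```shell
# Sample input: one candidate string per line (illustrative data only).
sample=$(printf '%s\n' 'b' 'ab' 'aaab')

# a+ needs at least one "a", so the line "b" is filtered out.
plus_matches=$(printf '%s\n' "$sample" | grep -E 'a+b')

# a* also accepts zero "a" characters, so every line matches.
star_matches=$(printf '%s\n' "$sample" | grep -E 'a*b')

# ^ and $ anchor the pattern to the whole line; \. is a literal period,
# so "axb" is rejected while "a.b" is kept.
anchored=$(printf '%s\n' 'a.b' 'axb' | grep -E '^a\.b$')

printf 'a+b:\n%s\n' "$plus_matches"
printf 'a*b:\n%s\n' "$star_matches"
printf 'anchored:\n%s\n' "$anchored"
```

Dropping the backslash in the last pattern would let the unescaped . match any character, so both lines would pass.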
Using grep for Regular Expressions in Linux
grep searches text files for lines matching a regex pattern, which makes it well suited to log analysis.
Basic Usage
$ grep "pattern" file.txt
Common Options
- -i: Ignore case.
- -v: Show non-matching lines.
- -E: Use extended regex (+, ?, |).
- -o: Show only matched parts.
- -c: Count matching lines.
- -n: Show line numbers.
- -w: Match whole words.
Examples
- Find “error”:
$ grep "error" logfile.txt
- Lines starting with “warning”:
$ grep "^warning" logfile.txt
- Count “error” lines:
$ grep -c "error" logfile.txt
- Match whole word “error”:
$ grep -w "error" logfile.txt
- Search recursively:
$ grep -r "pattern" /path
- Match “error” or “warning”:
$ grep -E "error|warning" logfile.txt
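The options are easier to compare when they all run against the same input. The following sketch builds a small throwaway log (its contents and the variable names are invented for illustration) and counts what each variant matches.

```shell
# Build a small throwaway log so the options can be compared side by side.
log=$(mktemp)
printf '%s\n' \
  'warning: disk almost full' \
  'error: disk full' \
  'info: 3 errors logged' \
  'ERROR: retry failed' > "$log"

count_plain=$(grep -c 'error' "$log")         # substring match, case-sensitive
count_nocase=$(grep -ci 'error' "$log")       # -i also matches "ERROR"
count_word=$(grep -cw 'error' "$log")         # -w skips "errors"
first_warning=$(grep -n '^warning' "$log")    # -n prefixes the line number
either=$(grep -cE '^(error|warning)' "$log")  # alternation needs -E

echo "plain=$count_plain nocase=$count_nocase word=$count_word either=$either"
echo "$first_warning"
rm -f "$log"
```

Note that -c reports the number of matching lines, not the number of individual matches on those lines.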
Using egrep with Regex
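egrep is equivalent to grep -E: it supports extended regex (+, ?, |) without escaping. GNU grep 3.8 and later treat egrep as obsolescent and print a warning, so prefer grep -E in new scripts.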
Examples
- Match “error” or “warning”:
$ egrep "error|warning" logfile.txt
- Match one or more digits:
$ egrep "[0-9]+" file.txt
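What "without escaping" means in practice can be sketched as follows. The example assumes a grep that supports -o (GNU and BSD grep both do), and the input line is made up.

```shell
line='id 42 and id 1337'

# Extended regex (-E): + is a quantifier, so each run of digits is one match.
ere=$(printf '%s\n' "$line" | grep -oE '[0-9]+')

# Basic regex (the default): + is a literal plus sign, so nothing matches.
# "|| true" keeps the non-zero exit status of grep from aborting a script.
bre=$(printf '%s\n' "$line" | grep -o '[0-9]+' || true)

# Portable basic-regex spelling of "one or more": the interval \{1,\}
bre_interval=$(printf '%s\n' "$line" | grep -o '[0-9]\{1,\}')

printf 'ERE:\n%s\n' "$ere"
printf 'BRE with bare +: [%s]\n' "$bre"
printf 'BRE with interval:\n%s\n' "$bre_interval"
```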
Using sed with Regex
sed (stream editor) transforms text, often for search-and-replace tasks.
Basic Usage
$ sed 's/pattern/replacement/' file.txt
Common Commands
- s: Substitute text.
- d: Delete lines.
- p: Print lines.
Examples
- Replace “foo” with “bar”:
$ sed 's/foo/bar/' file.txt
- Delete “error” lines:
$ sed '/error/d' file.txt
- Global replace:
$ sed 's/foo/bar/g' file.txt
- In-place edit with backup:
$ sed -i.bak 's/foo/bar/' file.txt
- Delete third line:
$ sed '3d' file.txt
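The commands above can be tried safely on a scratch file. This is a minimal sketch: the file contents and variable names are assumptions chosen to show the difference between a first-match and a global substitution.

```shell
work=$(mktemp)
printf '%s\n' 'foo foo' 'error: bad foo' 'plain line' > "$work"

first_only=$(sed 's/foo/bar/' "$work")    # first match on each line only
all=$(sed 's/foo/bar/g' "$work")          # every match on each line
no_errors=$(sed '/error/d' "$work")       # drop lines matching /error/

# -n turns off automatic printing, so p prints only the matching lines.
only_errors=$(sed -n '/error/p' "$work")

# In-place edit; the untouched original is kept as "$work.bak".
sed -i.bak 's/foo/bar/g' "$work"
edited=$(cat "$work")
backup=$(cat "$work.bak")

printf '%s\n' "$first_only"
rm -f "$work" "$work.bak"
```

Without -i, sed writes its result to standard output and leaves the file unchanged, which makes it easy to preview a substitution before committing to it.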
Using awk with Regex
awk processes structured data (e.g., CSVs) using regex for pattern matching.
Basic Usage
$ awk '/pattern/ { action }' file.txt
Common Actions
- print: Print lines/fields.
- gsub: Global substitute.
Examples
- Print “error” lines:
$ awk '/error/ { print }' logfile.txt
- Print first and third CSV fields:
$ awk -F ',' '{ print $1, $3 }' data.csv
- Sum first column:
$ awk '{ total += $1 } END { print total }' numbers.txt
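As a rough illustration of how patterns, fields, and actions fit together, the sketch below runs a few awk programs over a tiny invented CSV. The column layout (name, status, count) is an assumption made for this example.

```shell
csv=$(mktemp)
printf '%s\n' 'alice,error,3' 'bob,ok,5' 'carol,error,4' > "$csv"

# Pattern only: print whole lines that match /error/.
error_lines=$(awk '/error/ { print }' "$csv")

# Match a single field with ~, then print another field from that line.
error_names=$(awk -F ',' '$2 ~ /^error$/ { print $1 }' "$csv")

# Accumulate the third column and report it once all input is read.
total=$(awk -F ',' '{ total += $3 } END { print total }' "$csv")

# gsub rewrites every match in the current line ($0) before printing.
masked=$(awk '{ gsub(/[0-9]+/, "N"); print }' "$csv")

printf '%s\n' "$error_names"
echo "total=$total"
rm -f "$csv"
```

Matching against a single field is often safer than matching the whole line, because a name such as “terror” would otherwise also satisfy /error/.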
Common Regex Patterns
Useful patterns for real-world tasks:
- Email:
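[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- URL:
https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- Date (YYYY-MM-DD):
\d{4}-\d{2}-\d{2}
- Phone:
\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}
Note: The \d shorthand, and \s inside a bracket expression, are Perl-style syntax. They work with grep -P, but grep -E, sed -E, and awk expect [0-9] and [[:space:]] instead.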
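These patterns can be checked against a sample string before they go into a script. The sketch below uses grep -E, so the date pattern is written with [0-9] rather than the Perl-style digit shorthand. The sample text and variable names are invented for illustration.

```shell
text='Released 2024-03-15, contact ops@example.com or see https://example.com/docs'

date_re='[0-9]{4}-[0-9]{2}-[0-9]{2}'
email_re='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
url_re='https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# -o prints only the matched text, one match per line.
found_date=$(printf '%s\n' "$text" | grep -oE "$date_re")
found_email=$(printf '%s\n' "$text" | grep -oE "$email_re")

# The URL pattern covers scheme and host only, so "/docs" is not captured.
found_url=$(printf '%s\n' "$text" | grep -oE "$url_re")

printf '%s\n' "$found_date" "$found_email" "$found_url"
```

Patterns like these check shape rather than validity: the date pattern would also accept 9999-99-99, so treat a match as a candidate and validate it separately if correctness matters.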
Combining Tools for Regular Expressions in Linux
Pipelines combine regex tools for complex tasks.
Examples
- Extract emails:
$ grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
- Replace dates:
$ sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' file.txt
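A pipeline chains these tools so that each stage refines the output of the previous one. The following is a sketch under the assumption of a small mail log whose lines and addresses are made up. It extracts the addresses, counts them, and separately masks the dates.

```shell
mail_log=$(mktemp)
printf '%s\n' \
  '2024-03-15 sent to ops@example.com' \
  '2024-03-16 bounced dev@example.org' \
  '2024-03-16 sent to ops@example.com' > "$mail_log"

# Extract every address, count duplicates, and list the most frequent first.
# The final awk stage only normalizes the padding that uniq -c adds.
email_counts=$(grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$mail_log" \
  | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

# Mask the dates; [0-9] is used because sed -E has no \d shorthand.
masked=$(sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' "$mail_log")

printf '%s\n' "$email_counts"
printf '%s\n' "$masked"
rm -f "$mail_log"
```

The sort before uniq -c matters: uniq only collapses adjacent duplicate lines, so unsorted input would be undercounted.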
Modern Alternatives
Modern tools enhance regex tasks with better performance and usability.
- ripgrep (rg): Faster grep. Example:
$ rg "error"
- sd: Simpler sed. Example:
$ echo "foo" | sd "foo" "bar"
- jq: JSON processing. Example:
$ echo '{"name": "Alice"}' | jq '.name'