In this section, we take a look at text processing with awk and sed. They are useful for advanced command-line users who want to apply regular expressions in Linux to tasks such as transforming and analyzing data efficiently in logs, reports, and more.
Summary
AWK and SED are powerful Linux tools for text manipulation. This chapter covers advanced AWK arrays, SED regex, combining both tools, and real-world applications, empowering beginners to process text like pros.
Learning Objectives: Learn advanced AWK and SED techniques, combine them for complex tasks, and apply best practices for efficient text processing.
Why Use Text Processing with AWK and SED?
AWK and SED streamline data extraction, transformation, and analysis, essential for log parsing, data cleaning, and automation in Linux workflows.
Review of Basic AWK and SED
- AWK: Processes structured data:
  $ awk '{print $1}' file.txt
- SED: Edits text streams:
  $ sed 's/old/new/g' file.txt
Advanced AWK Topics
Using AWK Arrays
Count occurrences in a file:
$ awk '{count[$1]++} END {for (i in count) print i, count[i]}' file.txt
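To see the array idiom in action, here is a minimal sketch against a small sample file (the file path and its contents are illustrative):

```shell
# Create a small sample file (illustrative path).
printf 'alpha\nbeta\nalpha\nalpha\nbeta\n' > /tmp/words.txt

# Tally the first field of each line into the associative array
# 'count', then print each key with its total. Piping through sort
# gives a stable order, since awk's for-in order is unspecified.
awk '{count[$1]++} END {for (i in count) print i, count[i]}' /tmp/words.txt | sort
# alpha 3
# beta 2
```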
Working with Multiple Files
Compare two files:
$ awk 'NR==FNR{a[$1];next} $1 in a' file1.txt file2.txt
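The NR==FNR test is true only while awk reads the first file (FNR resets per file, NR does not), so the first block stores keys and skips ahead, and the second block filters the second file. A small illustrative run (file names and contents are made up):

```shell
# IDs we care about, and a second file to filter against them.
printf '101\n102\n103\n' > /tmp/file1.txt
printf '102 beta\n104 delta\n101 alpha\n' > /tmp/file2.txt

# While reading file1 (NR==FNR), store $1 as an array key and skip.
# For file2, '$1 in a' prints only lines whose first field was seen.
awk 'NR==FNR{a[$1];next} $1 in a' /tmp/file1.txt /tmp/file2.txt
# 102 beta
# 101 alpha
```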
Advanced SED Topics
SED Regular Expressions
Replace digits with asterisks:
$ sed 's/[0-9]/*/g' file.txt
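For example, masking every digit in a line (the sample text is illustrative):

```shell
# The character class [0-9] matches one digit; the g flag
# replaces every match on the line, not just the first.
printf 'order 42 shipped, item 7\n' | sed 's/[0-9]/*/g'
# order ** shipped, item *
```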
Working with Multiple Lines
Delete blank lines:
$ sed '/^$/d' file.txt
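The address /^$/ matches lines where start-of-line is immediately followed by end-of-line, and d deletes them. A quick illustrative check:

```shell
# Blank lines (including consecutive ones) are removed;
# non-empty lines pass through untouched.
printf 'first\n\nsecond\n\n\nthird\n' | sed '/^$/d'
# first
# second
# third
```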
Using AWK and SED Together
Extract and format data:
$ awk '{print $1,$2}' file.txt | sed 's/ /,/g' > output.csv
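A small end-to-end sketch of this pipeline, with an illustrative input file:

```shell
# Sample whitespace-separated input (illustrative data).
printf 'ada lovelace 1815\nalan turing 1912\n' > /tmp/people.txt

# awk keeps the first two fields; sed turns the remaining
# space separator into a comma for CSV output.
awk '{print $1,$2}' /tmp/people.txt | sed 's/ /,/g' > /tmp/output.csv
cat /tmp/output.csv
# ada,lovelace
# alan,turing
```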
Real-World Examples
Data Processing
Convert log to CSV:
$ awk '{print $1","$3}' access.log | sed 's/ /,/g' > report.csv
Log File Analysis
Count unique IPs in logs:
$ awk '{print $1}' access.log | sort | uniq -c | awk '{print $2,$1}'
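Here is the same pipeline run on a tiny fake access log (the log contents are made up; real access logs put the client IP in the first field):

```shell
# Minimal fake access log: IP is the first field.
printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 GET /b\n' > /tmp/access.log

# Extract the IP column, count duplicates with sort | uniq -c,
# then swap the columns so each line reads "IP count".
awk '{print $1}' /tmp/access.log | sort | uniq -c | awk '{print $2,$1}'
# 10.0.0.1 2
# 10.0.0.2 1
```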
Best Practices and Tips
- Use single quotes to avoid shell interpolation.
- Test commands on sample data first.
- Optimize AWK with `next` to skip unnecessary processing.
- Comment complex SED scripts: `# Replace digits.`
Practical Examples
Filter and format logs:
$ awk '$2=="ERROR" {print $1,$3}' error.log | sed 's/ /: /g'
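For instance, with an illustrative three-column log (timestamp, level, message):

```shell
# Fake log: timestamp, level, message (illustrative data).
printf '12:00 ERROR disk\n12:01 INFO ok\n12:02 ERROR net\n' > /tmp/error.log

# awk keeps only ERROR lines (timestamp and message);
# sed then rewrites the space separator as ": ".
awk '$2=="ERROR" {print $1,$3}' /tmp/error.log | sed 's/ /: /g'
# 12:00: disk
# 12:02: net
```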
Extract specific columns:
$ awk -F',' '{print $2}' data.csv | sed 's/^ *//g'
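The sed step matters when the CSV uses ", " separators, since awk's second field then starts with a space. An illustrative run:

```shell
# CSV with a space after each comma (illustrative data).
printf 'id, name\n1, alice\n2, bob\n' > /tmp/data.csv

# -F',' splits on commas; print the second field, then strip
# the leading space left over from the ", " separator.
awk -F',' '{print $2}' /tmp/data.csv | sed 's/^ *//g'
# name
# alice
# bob
```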
Practice Time!
Test your skills:
1. Count word frequencies with AWK.
2. Replace text patterns with SED.
3. Combine AWK and SED to process a log file.
4. Analyze a CSV file for specific data.
Try This: Run `awk '{print $1}' file.txt | sed 's/a/b/g'` and share your success on X with #LinuxCommandLine!
AWK and SED Command Reference
| Command | Description |
|---|---|
| `awk '{print $1}'` | Prints first column. |
| `sed 's/old/new/g'` | Replaces text globally. |
| `awk -F','` | Sets field separator. |
| `sed '/^$/d'` | Deletes empty lines. |
Practice Time!
Test your skills:
- Find email addresses with `grep`.
- Replace “foo” with “bar” using `sed`.
- Print second CSV column with `awk`.
- Search recursively with `ripgrep`.

Try This: Run `grep -E "error|warning" logfile.txt` and share results on X with #LinuxCommandLine!
Glossary of Commands, Tools, and Shortcuts
Reference: For detailed documentation, visit Linux Manpages. For package installation, search on Debian APT.
| Command/Tool | Description |
|---|---|
| grep | Searches text for regex patterns. |
| egrep | Extended grep for advanced regex. |
| sed | Stream editor for text transformations. |
| awk | Processes structured text with regex. |
| ripgrep | Faster alternative to grep. |
| sd | Modern search-and-replace tool. |
| jq | Processes JSON data. |
That’s it for this chapter on regular expressions in Linux! You’ve now learned how to use regular expressions with grep, sed, and awk to search, filter, and transform text. In the next chapter, we’ll dive into text processing, using tools like cut, sort, uniq, and wc to manipulate text files. Until then, practice using regex to become more comfortable with its powerful capabilities.
Previous: Chapter 13 | Next: Chapter 15