Chapter 8. Awk Syntax and Basic Commands

Awk is a powerful language to manipulate and process text files. It is especially helpful when the lines in a text files are in a record format, i.e, when each line (record) contains multiple fields separated by a delimiter. Even when the input file is not in a record format, you can still use awk to do some basic file and data processing. You can also write programming logic using awk even when there are no input files that needs to be processed. In short, AWK is a powerful language that can come in handy to do daily routine jobs. The learning curve on AWK is much smaller than the learning curve on any other language. If you know C programming already, you'll appreciate how simple and easy it is to learn AWK. AWK was originally written by three developers -- A. Aho, B. W. Kernighan and P. Weinberger. So, the name AWK came from the initials of those three developers. The following are the three variations of AWK:
  • AWK is original AWK.
  • NAWK is new AWK.
  • GAWK is GNU AWK. All Linux distributions comes with GAWK. This is fully compatible with AWK and NAWK.
This book covers all the fundamentals of original AWK, and some advanced features available only in GAWK. On the systems that have either NAWK, or GAWK installed, you can still type awk, which will invoke nawk or gawk correspondingly. For example, on Linux, you'll see that awk is a symbolic link to gawk. So, executing awk (or) gawk on Linux system will invoke gawk.
$ ls -l /bin/awk /bin/gawk
lrwxrwxrwx 1 root root 4 Sep 1 07:38 /bin/awk -> gawk
-rwxr-xr-x 1 root root 320416 Mar 14 2007 /bin/gawk
For most of the awk examples in this book, the following three sample files are used. Please create these sample files in your home directory, and use them to try out all the awk examples shown in this book.

employee.txt sample file

employee.txt is a comma delimited file that contains 5 employee records in the following format:
employee-number,employee-name,employee-title
Create the file:
$ vi employee.txt
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager

items.txt sample file

items.txt is a comma delimited text file that contains 5 item records in the following format:
item-number,item-description,item-category,cost,quantityavailable
Create the file:
$ vi items.txt
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5

items-sold.txt sample file

items-sold.txt is a space delimited text file that contains 5 item records. Each record is for one particular item that contains the item number followed by number of items sold for that month (during the last 6 months). So, you'll see 7 fields in every record. Field 1 is the item-number. Field 2 through Field 7 are the total number of items sold in every month during the last 6 months. Following is the format of the items-sold.txt file.
item-number qty-sold-month1 qty-sold-month2 qty-sold-month3 qty-sold-month4 qty-sold-month5 qty-sold-month6
Create the file:
$ vi items-sold.txt
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6

51. Awk Command Syntax

Basic Awk Syntax:
awk -Fs '/pattern/ {action}' input-file
(or)
awk -Fs '{action}' intput-file
In the above syntax:
  • -F is the field separator. If you don't specify, it will use an empty space as field delimiter.
  • The /pattern/ and the {action} should be enclosed inside single quotes.
  • /pattern/ is optional. If you don't provide it, awk will process all the records from the input-file. If you specify a pattern, it will process only those records from the input-file that match the given pattern.
  • {action} - These are the awk programming commands, which can be one or multiple awk commands. The whole action block (including all the awk commands together) should be closed between { and }
  • input-file - The input file that needs to be processed.
Following is a very simple example demonstrating the awk syntax:
$ awk -F: '/mail/ {print $1}' /etc/passwd
mail
mailnull
In the above simple example:
  • -F: This indicates that the field separator in the input-file is colon :, i.e. the fields are separated by a colon. Please note that you can also enclose the field separator within double quotes. -F ":" is also valid.
  • /mail/ - This is the pattern. awk will process only the records that contains the keyword mail.
  • {print $1} - This is the action. This action block contains only one awk command, that prints the 1st field of the record that matches the pattern "mail"
  • /etc/passwd - This is the input file.

Awk Commands in a Separate File

When you have to process a lot of awk commands, you can specify the '/pattern/ {action}' inside an awk script file and invoke it as shown below.
awk -Fs -f myscript.awk input-file
The myscript.awk can have any file extension (or no extension). But, it is easier to keep the extension as .awk for easy maintenance. You can also specify the field separator in script file itself (more on this later), and just invoke it as shown below.
awk -f myscript.awk input-file

52. Awk Program Structure (BEGIN, body, END block)

A typical awk program has following three blocks.

BEGIN Block

Syntax of begin block:
BEGIN { awk-commands }
The begin block gets executed only once at the beginning, before awk starts executing the body block for all the lines in the input file.
  • The begin block is a good place to print report headers, and initialize variables.
  • You can have one or more awk commands in the begin block.
  • The keyword BEGIN should be specified in upper case.
  • Begin block is optional.

Body Block

Syntax of body block:
/pattern/ {action}
The body block gets executed once for every line in the input file.
  • If the input file has 10 records, the commands in the body block will be executed 10 times (once for each record in the input file).
  • There is no keyword for the body block. We discussed pattern and action previously.

END Block

Syntax of end block:
END { awk-commands }
The end block gets executed only once at the end, after awk completes executing the body block for all the lines in the input-file.
  • • The end block is a good place to print a report footer and do any clean-up activities.
  • • You can have one or more awk commands in the end block.
  • • The keyword END should be specified in upper case.
  • • End block is optional.
awk work example Awk work example The following simple example shows the three awk blocks in action.
$ awk 'BEGIN { FS=":";print "---header---" } \
/mail/ {print $1} \
END { print "---footer---"}' /etc/passwd
---header---
mail
mailnull
---footer---
Note: When you have a very long command, you can either type is on a single line, or split it to multiple lines by specifying a \ at the end of each line. The above example is typed in 3 lines with a \ at the end of line 1 and line 2. In the above example:
  • BEGIN { FS=":";print "---header---" } is the begin block, that sets the field separator variable FS (more on this later), and prints the header. This gets executed only once before the body loop.
  • /mail/ {print $1} is the body loop, that contains a pattern and an action. i.e. This searches for the keyword "mail" in the input file and prints the 1st field.
  • END { print "---footer---"}' is the end block, that prints the footer.
  • /etc/passwd is the input file. The body loop gets executed for every records in this file.
Instead of executing the above simple example from the command line, you can also execute it from a file. First, create the following myscript.awk file that contains the begin, body, and end loop:
$ vi myscript.awk
BEGIN {
  FS=":"
  print "---header---"
}
  /mail/ {
  print $1
}
  END {
  print "---footer---"
}
Next, execute the myscript.awk as shown below for the input file /etc/passwd:
$ awk -f myscript.awk /etc/passwd
---header---
mail
mailnull
---footer---
Please note that a comment inside a awk script starts with #. If you are writing a complex awk script, follow the best practice: write enough comments inside the *.awk file so that it will be easier for you to understand when you look at the file later. Following are some random simple examples that show you various combinations of awk blocks. Only the body block:
awk -F: '{ print $1 }' /etc/passwd
Begin, body, and end block:
awk -F: 'BEGIN { printf "username\n------\n"} \
{ print $1 } \
END { print "------" }' /etc/passwd
Begin, and body block:
awk -F: 'BEGIN { print "UID"} { print $3 }' /etc/passwd

A Note on using only a BEGIN Block:

Specifying only the begin block is valid awk syntax. When you don't specify a body loop, there is no point in specifying a input file, since only the body loop gets executed for the lines in the input file. So, use only the BEGIN block when you want to use an awk program to do things not related to file processing. In many of our examples below, we'll have only the BEGIN block, to explain how some of the awk programming components work. You can use this idea for anything that you see fit. A simple begin only example:
$ awk 'BEGIN { print "Hello World!" }'
Hello World!

Multiple Input Files

Please note that you can specify multiple input files. If you specify two input files, first the body block will be executed for all the lines in input-file1, next the body block will be executed for all the lines in input-file2. Multiple input file example:
$ awk 'BEGIN { FS=":";print "---header---" } \
/mail/ {print $1} \
END { print "---footer---"}' /etc/passwd /etc/group
---header---
mail
mailnull
mail
mailnull
---footer---
Please note that the BEGIN block and the END block will be executed only once, even when you specify multiple input-files.

53. Print Command

By default, the awk print command (without any argument) prints the full record as shown. The following example is equivalent to "cat employee.txt" command.
$ awk '{print}' employee.txt
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
You can also print specific fields in a record by passing $field-number as a print command argument. The following example is supposed to print only the employee name (field number 2) of every record.
$ awk '{print $2}' employee.txt
Doe,CEO
Smith,IT
Reddy,Sysadmin
Ram,Developer
Miller,Sales
Wait. It didn't work as expected. It printed from the last name until the end of the record. This is because the default field delimiter in Awk is space. Awk did exactly what we asked; it did print the 2nd field considering space as a delimiter. When the default space is used as delimiter, "101,John" became field-1 and "Doe,CEO" became field- 2 of the 1st record. So, the above awk example printed "Doe,CEO" as field-2. To solve this issue, we should instruct Awk to use comma (,) as field delimiter. Use option -F to indicate the field separator.
awk -F ',' '{print $2}' employee.txt
John Doe
Jason Smith
Raj Reddy
Anand Ram
Jane Miller
When there is only one character used for delimiter, any of the following forms works, i.e. you can specify the field delimiter within single quotes, or double quotes, or without any quotes as shown below.
awk -F ',' '{print $2}' employee.txt
awk -F "," '{print $2}' employee.txt
awk -F, '{print $2}' employee.txt
Note: You can also use the FS variable for this purpose. We'll review that in the awk built-in variables section. A simple report that prints employee name and title with a header and footer:
$ awk -F ',' 'BEGIN \
{ print "-------------\nName\tTitle\n-------------"} \
{ print $2,"\t",$3;} \
END { print "-------------"; }' employee.txt
-------------
Name Title
-------------
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager
-------------
In the above report the fields are not aligned properly. We'll look at how to do that in later sections. The above example does show how you can use BEGIN to print a header, and END to print a footer. Please note that field $0 represents the whole record. Both of the following examples are the same; each prints the whole lines from employee.txt.
awk '{print}' employee.txt
awk '{print $0}' employee.txt

54. AWK Pattern Matching

You can execute awk commands only for lines that match a particular pattern. For example, the following prints the names and titles of the Managers:
$ awk -F ',' '/Manager/ {print $2, $3}' employee.txt
Jason Smith IT Manager
Jane Miller Sales Manager
The following example prints the employee name whose Emp id is 102:
$ awk -F ',' '/^102/ {print "Emp id 102 is", $2}' employee.txt
Emp id 102 is Jason Smith