Chapter 9. Awk Builtin Variables

55. FS - Input Field Separator

The default field separator recognized by awk is space. If the records in your input file are delimited by anything other than space, you already know that you can specify the input field separator in the awk command line using option -F as shown below.

awk -F ',' '{print $2, $3}' employee.txt

You can also do the same using the FS (field separator) Awk built-in variable. You have to specify the FS in the BEGIN block as shown below.

awk 'BEGIN {FS=","} {print $2, $3}' employee.txt

You can have multiple awk statements in the BEGIN block. In the following example, we have both FS and a print command to print the headers inside the BEGIN block. Multiple commands inside the BEGIN or END block are separated by semi-colon.

awk 'BEGIN { FS=","; \
print "-------------\nName\tTitle\n-------------" } \
{ print $2,"\t",$3; } \
END {print "-------------"}' employee.txt

Please note that the default field separator is not just a single space. It actually matches one or more whitespace characters. The following employee-multiple-fs.txt file contains three different field separators in each record:

, Comma is the field separator after emp id
: Colon is the field separator after name
% Percentage is the field separator after title

Create the file:

$ vi employee-multiple-fs.txt
101,John Doe:CEO%10000
102,Jason Smith:IT Manager%5000
103,Raj Reddy:Sysadmin%4500
104,Anand Ram:Developer%4500
105,Jane Miller:Sales Manager%3000

When you encounter a file that contains different field separators, don't worry, FS can come to your rescue. You can specify MULTIPLE field separators using a regular expression. For example FS = "[,:%]" indicates that the field separator can be "," or ":" or "%". So, the following example will print the name and the title from the employee-multiple-fs.txt file that contains different field separators.

$ awk 'BEGIN {FS="[,:%]"} {print $2, $3}' employee-multiple-fs.txt
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager

56. OFS - Output Field Separator

FS is for input field separator. OFS is for output field separator. OFS is printed between consecutive fields in the output. By default, awk prints the output fields with space between the fields. Please note that we don't specify IFS for input field separator, we simply refer to it as FS. The following example prints the name and the salary with space between them. When you use a single print statement to print two variables by separating them with comma (as shown below), it will print the values of those two variables separated by space.

$ awk -F ',' '{print $2, $3}' employee.txt
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager

If you try to include a colon manually in the print statement between the fields, following will the output. Please note how there is an additional space before and after the colon. That is because, awk is still using space as the output field separator. The following print statement really printing three values (that are separated by comma) -- $2, :, and $4. As you already know when you use one print statement to print multiple values, the output will contain space in between them.

$ awk -F ',' '{print $2, ":", $3}' employee.txt
John Doe : CEO
Jason Smith : IT Manager
Raj Reddy : Sysadmin
Anand Ram : Developer
Jane Miller : Sales Manager

The right way to do is use the awk built-in variable OFS (output field separator), as shown below. Please note that there is no space before and after the colon in this example, as OFS replaces the default awk OFS (which is space) with the colon. The following print statement is printing two variables ($2 and $4) separated by comma, however the output will have colon separating them (instead of space), as our OFS is set to colon.

$ awk -F ',' 'BEGIN { OFS=":" } \
{ print $2, $3 }' employee.txt
John Doe:CEO
Jason Smith:IT Manager
Raj Reddy:Sysadmin
Anand Ram:Developer
Jane Miller:Sales Manager

Please also note the subtle difference between including a comma vs not including a comma in the print statement (when printing multiple variables). When you specify a comma in the print statement between different print values, awk will use the OFS. In the following example, the default OFS is used, so you'll see a space between the values in the output.

$ awk 'BEGIN { print "test1","test2" }'
test1 test2

When you don't separate values with a comma in the print statement, awk will not use the OFS; instead it will print the values with nothing in between.

$ awk 'BEGIN { print "test1" "test2" }'
test1test2

Let us assume that you have the following text file which contains the employee ids and names in a single line.

$ vi employee-one-line.txt
101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand
Ram:105,Jane Miller

In the above example, every record contains two fields (empid and name), and every record is separated by : (instead of a new line). The individual fields (empid and name) in the records are separated by comma. The default record separator used by awk is new line. If you are trying to print only the employee name, the following will not work for this example.

$ awk -F, '{print $2}' employee-one-line.txt
John Doe:102

In the above example, it is treating employee-one-line.txt as one single record, and comma as field delimiter. So, it prints "John Doe:102", as the 2nd field. If you want awk to treat this as 5 different lines (instead of a single line), and print employee name from each record, then you must specify the record separator as colon : as shown below.

$ awk -F, 'BEGIN { RS=":" } \
{ print $2 }' employee-one-line.txt
John Doe
Jason Smith
Raj Reddy
Anand Ram
Jane Miller

Let us assume that you have the following input file, where the records are separated by a "-" on it's own line. All the fields are on a separate line.

$ vi employee-change-fs-ofs.txt
101
John Doe
CEO
-
102
Jason Smith
IT Manager
-
103
Raj Reddy
Sysadmin
-
104
Anand Ram
Developer
-
105
Jane Miller
Sales Manager

In the above example, the field separator FS is new line, the record separator RS is "-" followed by a new line. So, if you want to print employee name and salary, you should do the following.

$ awk 'BEGIN { FS="\n"; RS="-\n"; OFS=":" } \
{print $2, $3}' employee-change-fs-ofs.txt
John Doe:CEO
Jason Smith:IT Manager
Raj Reddy:Sysadmin
Anand Ram:Developer
Jane Miller:Sales Manager

58. ORS - Output Record Separator

RS is for input record separator. ORS is for output record separator. Please note that we don't specify IRS for input record separator, we simply refer to it as RS. The following example adds a new line with "---" after each and every line output that is printed. By default, awk uses "\n" as ORS. In this example, we are using "\n---\n" as ORS to get the output as shown below.

$ awk 'BEGIN { FS=","; ORS="\n---\n" } \
{print $2, $3}' employee.txt
John Doe CEO
---
Jason Smith IT Manager
---
Raj Reddy Sysadmin
---
Anand Ram Developer
---
Jane Miller Sales Manager
---

The following example takes the records in employee.txt, and prints every field in its own line, separating each record with a separate line with "---".

$ awk 'BEGIN { FS=","; OFS="\n";ORS="\n---\n" } \
{print $1,$2,$3}' employee.txt
101
John Doe
CEO
---
102
Jason Smith
IT Manager
---
103
Raj Reddy
Sysadmin
---
104
Anand Ram
Developer
---
105
Jane Miller
Sales Manager
---

59. NR - Number of Records

NR is very helpful. When used inside the loop, this gives the line number. When used in the END block, this gives the total number of records in the file. Even thought NR stands for "Number of Records", it might be appropriate to call this as "Number of the Record", as it really gives you the line number of the current record. The following example shows how NR works in the body block, and in the END block:

$ awk 'BEGIN {FS=","} \
{print "Emp Id of record number",NR,"is",$1;} \
END {print "Total number of records:",NR}' employee.txt
Emp Id of record number 1 is 101
Emp Id of record number 2 is 102
Emp Id of record number 3 is 103
Emp Id of record number 4 is 104
Emp Id of record number 5 is 105
Total number of records: 5

60. FILENAME – Current File Name

$ awk '{ print FILENAME }' \
employee.txt employee-multiple-fs.txt
employee.txt
employee.txt
employee.txt
employee.txt
employee.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt

When you read the values from the standard input, FILENAME variable will be set to the value of "-" as shown below. In the following example, since we didn't give any input-file, you should type the record in the standard input. In this example, I typed the 1st line "John Doe", and awk printed the last two lines. You have to press “Ctrl-C” to stop reading from stdin.

$ awk '{print "Last name:", $2; \
print "Filename:", FILENAME}'
John Doe
Last name: Doe
Filename: -

The above is also true when you pipe the input to awk from another program, as shown below. The following also will print FILENAME as "-".

$ echo "John Doe" | awk '{print "Last name:", $2; print "Filename:", FILENAME}'
Last name: Doe
Filename: -

Note: FILENAME inside the BEGIN block will return empty value "", as the BEGIN block is for the whole awk program, and not for any specific file.

61. FNR - File "Number of Record"

We already know that "NR" is "Number of Records" (or "Number of the Record"), which prints the current line number of the file that is getting processed. How will NR behave when we give have two input files? NR keeps growing between multiple files. When the body block starts processing the 2nd file, NR will not be reset to 1, instead it will continue from the last NR number value of the previous file. In the following example 1st file has 5 records, 2nd file has 5 records. As you see below, when the body loop is processing the 2nd file, NR starts from 6 (instead of 1). Finally, in the END block, NR gives the total number of records of both the files combined.

$ awk 'BEGIN {FS=","} \
{print FILENAME ": record number",NR,"is",$1;} \
END {print "Total number of records:",NR}' \
employee.txt employee-multiple-fs.txt
employee.txt: record number 1 is 101
employee.txt: record number 2 is 102
employee.txt: record number 3 is 103
employee.txt: record number 4 is 104
employee.txt: record number 5 is 105
employee-multiple-fs.txt: record number 6 is 101
employee-multiple-fs.txt: record number 7 is 102
employee-multiple-fs.txt: record number 8 is 103
employee-multiple-fs.txt: record number 9 is 104
employee-multiple-fs.txt: record number 10 is 105
Total number of records: 10

In the above example, we have two input files (employee.txt and employee-multiple-fs.txt). Each file has 5 records each. So, NR continued incrementing after the 1st file is processed. FNR will give you record number within the current file. So, when awk finishes executing the body block for the 1st file and starts the body block the next file, FNR will start from 1 again.

$ awk 'BEGIN {FS=","} \
{print FILENAME ": record number",FNR,"is",$1;} \
END {print "Total number of records:",NR}' \
employee.txt employee-multiple-fs.txt
employee.txt: record number 1 is 101
employee.txt: record number 2 is 102
employee.txt: record number 3 is 103
employee.txt: record number 4 is 104
employee.txt: record number 5 is 105
employee-multiple-fs.txt: record number 1 is 101
employee-multiple-fs.txt: record number 2 is 102
employee-multiple-fs.txt: record number 3 is 103
employee-multiple-fs.txt: record number 4 is 104
employee-multiple-fs.txt: record number 5 is 105
Total number of records: 10

The following example shows both NR and FNR:

$ vi fnr.awk
BEGIN { FS="," }
{ printf "FILENAME=%s NR=%s FNR=%s\n", FILENAME, NR, FNR; }
END { printf "END Block: NR=%s FNR=%s\n", NR, FNR }

$ awk -f fnr.awk employee.txt employee-multiple-fs.txt
FILENAME=employee.txt NR=1 FNR=1
FILENAME=employee.txt NR=2 FNR=2
FILENAME=employee.txt NR=3 FNR=3
FILENAME=employee.txt NR=4 FNR=4
FILENAME=employee.txt NR=5 FNR=5
FILENAME=employee-multiple-fs.txt NR=6 FNR=1
FILENAME=employee-multiple-fs.txt NR=7 FNR=2
FILENAME=employee-multiple-fs.txt NR=8 FNR=3
FILENAME=employee-multiple-fs.txt NR=9 FNR=4
FILENAME=employee-multiple-fs.txt NR=10 FNR=5
END Block: NR=10 FNR=5

Удаленное администрирование серверов. Настройка, оптимизация, поддержка.