Chapter 13. Additional Awk Commands


85. Pretty Printing Using printf

Printf is very flexible and makes report printing relatively easy, because it lets you print the output exactly the way you want it.

Syntax:

printf "print format", variable1, variable2, etc.

Special Characters in the printf Format

The following are some of the special characters (escape sequences) that can be used inside a printf format string.

\n -New Line
\t -Tab
\v -Vertical Tab
\b -Backspace
\r -Carriage Return
\f -Form Feed

The following prints "Line 1" and "Line 2" in separate lines using newline:

$ awk 'BEGIN { printf "Line 1\nLine 2\n" }'
Line 1
Line 2

The following prints different fields separated by tabs, with 2 tabs after "Field 1":

$ awk 'BEGIN { printf "Field 1\t\tField 2\tField 3\tField 4\n" }'
Field 1 Field 2 Field 3 Field 4

The following prints vertical tabs after every field:

$ awk 'BEGIN { printf "Field 1\vField 2\vField 3\vField 4\n" }'
Field 1
  Field 2
    Field 3
      Field 4

The following prints a backspace after every field except "Field 4". On a terminal, each backspace moves the cursor back over the last digit of the field, and the next field overwrites it. For example, "Field 1" is displayed as "Field ", because its last character is erased by the backspace. However, the last field "Field 4" is displayed as it is, because there is no \b after it.

$ awk 'BEGIN \
{ printf "Field 1\bField 2\bField 3\bField 4\n" }'
Field Field Field Field 4

In the following example, after printing every field, we do a carriage return (\r) and print the next value on top of the value just printed. This means that the final output you see is only "Field 4", because it was the last thing printed on top of all the previous fields.

$ awk 'BEGIN \
{ printf "Field 1\rField 2\rField 3\rField 4\n" }'
Field 4

Print Uses OFS, ORS Values

When you print multiple values separated by commas using the print command (not printf), it uses the values of the OFS and ORS built-in variables to decide how to print the fields: OFS is placed between the values, and ORS is written after each record.

The following example shows how the simple print statement "print $2,$3" is affected by the OFS and ORS values.

$ cat print.awk
BEGIN {
  FS=",";
  OFS=":";
  ORS="\n--\n";
}
{
  print $2,$3
}

$ awk -f print.awk items.txt
HD Camcorder:Video
--
Refrigerator:Appliance
--
MP3 Player:Audio
--
Tennis Racket:Sports
--
Laser Printer:Office
--

Printf doesn't Use OFS, ORS Values

Printf doesn't use the OFS and ORS values. It uses only what is specified in the "format" field of the printf command as shown in the example below.

$ cat printf1.awk
BEGIN {
  FS=",";
  OFS=":";
  ORS="\n--\n";
}
{
  printf "%s^^%s\n\n", $2, $3
}

$ awk -f printf1.awk items.txt
HD Camcorder^^Video

Refrigerator^^Appliance

MP3 Player^^Audio

Tennis Racket^^Sports

Laser Printer^^Office


Printf Format Specifiers

s -String
c -Single Character
d -Decimal integer
e -Exponential Floating point
f -Fixed Floating point
g -Uses either e or f, depending on which gives the shorter output for the given input
o -Octal
x -Hexadecimal
% -Prints the percentage symbol

The following example shows the basic usage of the format specifiers:

$ cat printf-format.awk
BEGIN {
  printf "s--> %s\n", "String"
  printf "c--> %c\n", "String"
  printf "s--> %s\n", 101.23
  printf "d--> %d\n", 101.23
  printf "e--> %e\n", 101.23
  printf "f--> %f\n", 101.23
  printf "g--> %g\n", 101.23
  printf "o--> %o\n", 8
  printf "x--> %x\n", 16
  printf "percentage--> %%\n"
}

$ awk -f printf-format.awk
s--> String
c--> S
s--> 101.23
d--> 101
e--> 1.012300e+02
f--> 101.230000
g--> 101.23
o--> 10
x--> 10
percentage--> %

Print with Fixed Column Width (Basic)

To create a fixed column width report, you have to specify a number immediately after the % in the format specifier. This number indicates the minimum number of characters to be printed. When the input string is shorter than the specified number, spaces are added to the left to make it fixed width.

The following example displays the basic use of the printf statement with a number specified immediately after the %:

$ cat printf-width.awk
BEGIN {
  FS=","
  printf "%3s\t%10s\t%10s\t%5s\t%3s\n","Num","Description","Type","Price","Qty"
  printf "-----------------------------------------------------\n"
}
{
  printf "%3d\t%10s\t%10s\t%g\t%d\n", $1,$2,$3,$4,$5
}

$ awk -f printf-width.awk items.txt
Num Description Type Price Qty
--------------------------------------------------
101 HD Camcorder Video 210 10
102 Refrigerator Appliance 850 2
103 MP3 Player Audio 270 15
104 Tennis Racket Sports 190 20
105 Laser Printer Office 475 5

Notice that the output is a bit ragged, even though we specified the exact width. That's because the width we specify is actually the minimum width, not the absolute size; if the input string has more characters than that, the whole string will be printed. So, you should really pay attention to how many characters you want to print.

If you want to print a fixed column width even when the input string is longer than the number specified, use the substr function, or add a dot before the number in the format specifier (as explained later).

In the previous example, the second field was wider than the 10-character width specified, so the result was not what was intended. In the following example, spaces are added to the left to print "Good" as a 6-character string:

$ awk 'BEGIN { printf "%6s\n", "Good" }'
  Good

The whole string is printed here even though you specified 6 character width:

$ awk 'BEGIN { printf "%6s\n", "Good Boy!" }'
Good Boy!

Print with Fixed Width (Left Justified)

When the input string is shorter than the specified width, and you would like it to be left justified (padded with spaces on the right), put a minus sign (-) immediately after the % and before the number.

"%6s" is right justified as shown below:

$ awk 'BEGIN { printf "|%6s|\n", "Good" }'
|  Good|

"%-6s" is left justified as shown below:

$ awk 'BEGIN { printf "|%-6s|\n", "Good" }'
|Good  |

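The ragged columns in the printf-width.awk output come from guessing the widths up front. One way around that, sketched below under the assumption of comma-separated input like items.txt, is to read all rows first, record the longest value in each column with length(), and then build the printf format string at run time. align_columns is a hypothetical helper name, not a standard command.

```shell
# align_columns: hypothetical helper that pads each comma-separated
# column to the width of its longest value (the last column is not padded).
align_columns() {
  awk -F',' '
  {
    # Remember every field and the widest value seen in each column.
    for (i = 1; i <= NF; i++) {
      cell[NR, i] = $i
      if (length($i) > width[i]) width[i] = length($i)
    }
    if (NF > maxnf) maxnf = NF
  }
  END {
    for (r = 1; r <= NR; r++) {
      line = ""
      for (i = 1; i < maxnf; i++) {
        fmt = "%-" width[i] "s  "        # e.g. "%-12s  "
        line = line sprintf(fmt, cell[r, i])
      }
      print line cell[r, maxnf]          # last column as is
    }
  }'
}

printf '101,HD Camcorder,Video\n102,Refrigerator,Appliance\n103,MP3 Player,Audio\n' | align_columns
```

Because the widths come from the data, adding a longer description later widens the column automatically instead of breaking the layout.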
Print with Dollar Amount

To add a dollar symbol before the price value, just add the dollar symbol before the format specifier in the printf as shown below.

$ cat printf-width2.awk
BEGIN {
  FS=","
  printf "%-3s\t%-10s\t%-10s\t%-5s\t%-3s\n",
  "Num","Description","Type","Price","Qty"
  printf "-----------------------------------------------------\n"
}
{
  printf "%-3d\t%-10s\t%-10s\t$%-.2f\t%-d\n", $1,$2,$3,$4,$5
}

$ awk -f printf-width2.awk items.txt
Num Description Type Price Qty
-------------------------------------------------
101 HD Camcorder Video $210.00 10
102 Refrigerator Appliance $850.00 2
103 MP3 Player Audio $270.00 15
104 Tennis Racket Sports $190.00 20
105 Laser Printer Office $475.00 5

Print with Leading Zeros

By default values are right justified with space added to the left

$ awk 'BEGIN { printf "|%5s|\n", "100" }'
|  100|

To pad a number with 0's on the left (instead of spaces), add a zero (0) before the width, i.e. instead of "%5d", use "%05d" as the format specifier. The zero flag is meant for numeric formats such as d and f; with %s, some awk versions (including newer GAWK) still pad with spaces.

$ awk 'BEGIN { printf "|%05d|\n", 100 }'
|00100|

The following example uses the leading zero format identifier for the Qty field.

$ cat printf-width3.awk
BEGIN {
  FS=","
  printf "%-3s\t%-10s\t%-10s\t%-5s\t%-3s\n",
  "Num","Description","Type","Price","Qty"
  printf "-----------------------------------------------------\n"
}
{
  printf "%-3d\t%-10s\t%-10s\t$%-.2f\t%03d\n", $1,$2,$3,$4,$5
}

$ awk -f printf-width3.awk items.txt
Num Description Type Price Qty
-------------------------------------------------
101 HD Camcorder Video $210.00 010
102 Refrigerator Appliance $850.00 002
103 MP3 Player Audio $270.00 015
104 Tennis Racket Sports $190.00 020
105 Laser Printer Office $475.00 005

Print Absolute Fixed Width String Value

As we have already shown, when the input string contains more characters than the width given in the format specifier, the whole string is printed:

$ awk 'BEGIN { printf "%6s\n", "Good Boy!" }'
Good Boy!

To print a maximum of ONLY 6 characters, add a dot before the number, i.e. instead of "%6s", use "%.6s", which prints only the first 6 characters of the input string, even when the input string is longer than that, as shown below.

$ awk 'BEGIN { printf "%.6s\n", "Good Boy!" }'
Good B

The above may not work on every version of awk: it worked on GAWK 3.1.5, but not on GAWK 3.1.7.

So, a more reliable way to print a fixed number of characters is to use the substr function as shown below.

$ awk 'BEGIN \
{ printf "%6s\n", substr("Good Boy!",1,6) }'
Good B
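Combining the two techniques gives a field that is always exactly the requested width: substr() cuts long values, and a "%-Ns" format built at run time pads short ones. fit_field below is a hypothetical helper used only for illustration; the | characters just make the padding visible.

```shell
# fit_field: hypothetical helper that always prints exactly WIDTH
# characters: substr() truncates, and "%-Ns" pads on the right.
fit_field() {
  awk -v s="$1" -v w="$2" 'BEGIN {
    fmt = "|%-" w "s|\n"          # e.g. "|%-6s|\n"
    printf fmt, substr(s, 1, w)
  }'
}

fit_field "Good Boy!" 6   # |Good B|
fit_field "Good" 6        # |Good  |
```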

Dot . Precision

A dot before the number in format identifier indicates the precision.

The following example shows how a dot before a number works for the numeric format specifiers d, e, f, and g, printing the number "101.23" with .1 and with .4. For d, the precision is the minimum number of digits (so .4 gives 0101); for e and f, it is the number of digits after the dot; for g, it is the total number of significant digits.

$ cat dot.awk
BEGIN {
  print "----Using .1----"
  printf ".1d--> %.1d\n", 101.23
  printf ".1e--> %.1e\n", 101.23
  printf ".1f--> %.1f\n", 101.23
  printf ".1g--> %.1g\n", 101.23
  print "----Using .4----"
  printf ".4d--> %.4d\n", 101.23
  printf ".4e--> %.4e\n", 101.23
  printf ".4f--> %.4f\n", 101.23
  printf ".4g--> %.4g\n", 101.23
}

$ awk -f dot.awk
----Using .1----
.1d--> 101
.1e--> 1.0e+02
.1f--> 101.2
.1g--> 1e+02
----Using .4----
.4d--> 0101
.4e--> 1.0123e+02
.4f--> 101.2300
.4g--> 101.2

Print Report to File

You can redirect the output of a print or printf statement to a specific output file inside the awk script. In the following example, the first printf statement has "> report.txt", which creates the report.txt file and sends the output of that statement to it. All the subsequent printf statements have ">> report.txt", which appends their output to the existing report.txt file.

$ cat printf-width4.awk
BEGIN {
  FS=","
  printf "%-3s\t%-10s\t%-10s\t%-5s\t%-3s\n",
  "Num","Description","Type","Price","Qty" > "report.txt"
  printf "-----------------------------------------------------\n" >> "report.txt"
}
{
  if ($5 > 10)
  printf "%-3d\t%-10s\t%-10s\t$%-.2f\t%03d\n",
  $1,$2,$3,$4,$5 >> "report.txt"
}

$ awk -f printf-width4.awk items.txt
$ cat report.txt
Num Description Type Price Qty
-------------------------------------------------
103 MP3 Player Audio $270.00 015
104 Tennis Racket Sports $190.00 020
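One detail worth knowing when mixing > and >>: within a single awk run, > truncates the file only when it is first opened, and later > writes go to the already-open file, so they append. Calling close() on the file makes the next > truncate it again. The sketch below demonstrates both cases; the scratch file name is a hypothetical choice.

```shell
report="${TMPDIR:-/tmp}/awk_redirect_demo.$$"   # hypothetical scratch file

# Within one awk run, ">" truncates only when the file is first opened;
# the second ">" writes to the still-open file, so it appends.
awk -v out="$report" 'BEGIN {
  print "first line"  > out
  print "second line" > out
}'
first_run=$(cat "$report")

# After close(), the next ">" reopens the file and truncates it again.
awk -v out="$report" 'BEGIN {
  print "old" > out
  close(out)
  print "new" > out
}'
second_run=$(cat "$report")

printf '%s\n---\n%s\n' "$first_run" "$second_run"
rm -f "$report"
```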

The other method is to leave "> report.txt" and ">> report.txt" out of the printf statements, and instead redirect the output to report.txt when executing the awk script, as shown below.

$ cat printf-width5.awk
BEGIN {
  FS=","
  printf "%-3s\t%-10s\t%-10s\t%-5s\t%-3s\n",
  "Num","Description","Type","Price","Qty"
  printf "-----------------------------------------------------\n"
}
{
  if ($5 > 10)
  printf "%-3d\t%-10s\t%-10s\t$%-.2f\t%03d\n",
  $1,$2,$3,$4,$5
}

$ awk -f printf-width5.awk items.txt > report.txt
$ cat report.txt
Num Description Type Price Qty
-------------------------------------------------
103 MP3 Player Audio $270.00 015
104 Tennis Racket Sports $190.00 020

86. Built-in Awk Numeric Functions

Awk has built-in functions for several numeric, string, input, and output operations. We discuss some of them here.

Awk int(n) Function

The int() function returns the integer part of the given argument n, which can be any number with or without a fractional part. If you give a whole number as the argument, this function returns the same number; for a floating point number, it truncates toward zero, so int(-5.223) is -5, not -6.

Int Function Example:

$ awk 'BEGIN{
  print int(3.534);
  print int(4);
  print int(-5.223);
  print int(-5);
}'

The above command produces the following output.

3
4
-5
-5
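Because int() truncates toward zero, it does not round, and it does not behave like a mathematical floor for negative numbers. The sketch below builds both on top of int(); floor_and_round, floor, and round are hypothetical names (standard awk has no built-in floor() or round()), and round uses the round-half-away-from-zero convention.

```shell
# floor_and_round: hypothetical helper; prints int(x), floor(x), round(x).
floor_and_round() {
  awk -v x="$1" '
  function floor(v,   t) {         # t is a local variable
    t = int(v)
    return (t > v) ? t - 1 : t     # step down for negative fractions
  }
  function round(v) {              # round half away from zero
    return (v < 0) ? -int(-v + 0.5) : int(v + 0.5)
  }
  BEGIN { x += 0; print int(x), floor(x), round(x) }'
}

floor_and_round -5.223   # -5 -6 -5
floor_and_round 2.5      # 2 2 3
floor_and_round -2.5     # -2 -3 -3
```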

Awk log(n) Function

The log(n) function returns the natural logarithm of the given argument n. The number n must be positive; for 0 or a negative number, log() does not return a usable value, as shown below.

Log Function Example:

$ awk 'BEGIN{
  print log(12);
  print log(0);
  print log(1);
  print log(-1);
}'
2.48491
-inf
0
nan

In the above output, log(0) is negative infinity, shown as -inf, and log(-1) returns nan (Not a Number).

Note: You might also get the following warning message for the log(-1): awk: cmd. line:4: warning: log: received negative argument -1
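awk provides only the natural logarithm. Logarithms in other bases follow from the change-of-base rule log_b(x) = log(x) / log(b); the hypothetical log_base helper below applies it and refuses inputs where the result would be -inf or nan. print formats the result with OFMT (%.6g by default), so tiny floating point errors such as 2.9999999999999996 are shown as 3.

```shell
# log_base: hypothetical helper; prints log(x) in base b.
log_base() {
  awk -v x="$1" -v b="$2" 'BEGIN {
    x += 0; b += 0
    # log() needs x > 0, and the base must be positive and not 1.
    if (x <= 0 || b <= 0 || b == 1) { print "undefined"; exit }
    print log(x) / log(b)
  }'
}

log_base 1000 10   # 3
log_base 8 2       # 3
log_base 0 10      # undefined
```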

Awk sqrt(n) Function

The sqrt(n) function returns the positive square root of the given number n. This function requires a non-negative number; it returns nan if you give a negative number as the argument.

Sqrt Function Example:

$ awk 'BEGIN{
  print sqrt(16);
  print sqrt(0);
  print sqrt(-12);
}'
4
0
nan

Awk exp(n) Function

The exp(n) function provides e to the power of n.

Exp Function Example:

$ awk 'BEGIN{
  print exp(123434346);
  print exp(0);
  print exp(-12);
}'
inf
1
6.14421e-06

In the above output, exp(123434346) is printed as inf (infinity), because the result is too large to be represented.

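Together, exp() and log() give you arbitrary roots and powers, since x^(1/n) = exp(log(x) / n) for positive x. The hypothetical nth_root helper below is a sketch of that identity; awk's ^ operator (for example 27^(1/3)) is the more direct way to do the same thing.

```shell
# nth_root: hypothetical helper; prints the n-th root of a positive x.
nth_root() {
  awk -v x="$1" -v n="$2" 'BEGIN {
    x += 0; n += 0
    if (x <= 0 || n == 0) { print "undefined"; exit }
    print exp(log(x) / n)
  }'
}

nth_root 27 3      # 3
nth_root 1024 10   # 2
```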
Awk sin(n) Function

The sin(n) function gives the sine of n, with n in radians.

Sine Function Example:

$ awk 'BEGIN {
  print sin(90);
  print sin(45);
}'
0.893997
0.850904
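Because sin() works in radians, sin(90) above is not 1. A common approach is to convert degrees to radians first, getting pi from atan2(0, -1), since awk has no built-in pi constant. sin_deg is a hypothetical helper name.

```shell
# sin_deg: hypothetical helper; sine of an angle given in degrees.
sin_deg() {
  awk -v d="$1" 'BEGIN {
    pi = atan2(0, -1)             # 3.14159...
    print sin(d * pi / 180)       # degrees -> radians
  }'
}

sin_deg 90   # 1
sin_deg 30   # 0.5
```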

Awk cos(n) Function

The cos(n) function returns the cosine of n, with n in radians.

Cosine Function Example:

$ awk 'BEGIN {
  print cos(90);
  print cos(45);
}'
-0.448074
0.525322
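awk has cos() and sin() but no tan(); the tangent can be computed as sin(x) / cos(x). The hypothetical tan_deg helper below takes degrees and converts them to radians first (it does not guard against angles such as 90 degrees, where cos() is close to zero).

```shell
# tan_deg: hypothetical helper; tangent of an angle given in degrees.
tan_deg() {
  awk -v d="$1" 'BEGIN {
    pi = atan2(0, -1)
    r = d * pi / 180              # degrees -> radians
    print sin(r) / cos(r)
  }'
}

tan_deg 45   # 1
tan_deg 0    # 0
```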

Awk atan2(m,n) Function

This function gives you the arc-tangent of m/n in radians.

Atan2 Function Example:

$ awk 'BEGIN { print atan2(30,45) }'
0.588003
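The reason to use atan2(m, n) rather than an arc-tangent of the ratio m/n is that atan2 looks at the signs of both arguments, so it can tell apart points such as (1, 1) and (-1, -1) that have the same ratio. The hypothetical angle_deg helper below returns the angle of the point (x, y) in degrees, between -180 and 180.

```shell
# angle_deg: hypothetical helper; angle of the point (x, y) in degrees.
angle_deg() {
  awk -v x="$1" -v y="$2" 'BEGIN {
    pi = atan2(0, -1)
    print atan2(y, x) * 180 / pi  # note the (y, x) argument order
  }'
}

angle_deg 1 1     # 45
angle_deg -1 -1   # -135
```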

87. Awk Random Number Generator

Awk rand() Function

rand() is used to generate a random number between 0 and 1: the value can be 0, but it is always less than 1. The numbers look random within one awk run, but they are predictable from run to run. Awk uses a fixed algorithm that starts from the same seed on every run, so unless you change the seed with srand(), the same sequence is produced each time.

The following example generates 1000 random numbers between 0 and 99, and shows how often each number was generated.

Generate 1000 random numbers (between 0 and 99):

$ cat rand.awk
BEGIN {
  while (i < 1000) {
    n = int(rand() * 100);   # an integer from 0 to 99
    rnd[n]++;
    i++;
  }
  for (i = 0; i < 100; i++) {
    print i, "Occurred", rnd[i] + 0, "times";
  }
}

$ awk -f rand.awk
0 Occurred 6 times
1 Occurred 16 times
2 Occurred 12 times
3 Occurred 6 times
4 Occurred 13 times
5 Occurred 13 times
6 Occurred 8 times
7 Occurred 7 times
8 Occurred 16 times
9 Occurred 9 times
10 Occurred 6 times
11 Occurred 9 times
12 Occurred 17 times
13 Occurred 12 times

From the above output, you can see that rand() often generates the same number more than once: with 1000 draws spread over 100 possible values, each value shows up about 10 times.

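To get random integers in some other range, scale and shift: int(rand() * k) is an integer from 0 to k-1, so lo + int(rand() * (hi - lo + 1)) covers lo to hi inclusive. The hypothetical rand_range helper below uses that formula and seeds from the time of day with srand(), so each run gives different numbers.

```shell
# rand_range: hypothetical helper; prints N random integers from LO to HI.
rand_range() {
  awk -v lo="$1" -v hi="$2" -v n="$3" 'BEGIN {
    lo += 0; hi += 0; n += 0
    srand()                        # seed from the time of day
    for (i = 0; i < n; i++)
      print lo + int(rand() * (hi - lo + 1))
  }'
}

rand_range 5 50 3    # three numbers between 5 and 50
```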
Awk srand(n) Function

srand(n) initializes (seeds) the random number generator with the given argument n. Every run that uses the same seed produces the same sequence of random numbers. If no argument is given, awk uses the time of day as the seed.

Generate 5 unique random numbers between 5 and 49:

$ cat srand.awk
BEGIN {
  srand(5);   # Initialize the seed with 5.
  total=5;    # Generate 5 numbers in total.
  min=5;      # Smallest number to keep.
  max=50;     # int(rand() * max) is always less than 50.
  count=0;
  while (count < total) {
    rnd = int(rand() * max);
    # Keep the number only if it is in range and not picked before.
    if (rnd >= min && array[rnd] == 0) {
      count++;
      array[rnd]++;
    }
  }
  for (i=min; i<max; i++) {
    if (array[i])
      print i;
  }
}

$ awk -f srand.awk
9
15
26
37
39

The above srand.awk does the following:

  • Uses the rand() function to generate a random number, multiplies it by the maximum value, and truncates it with int() to get a whole number less than 50.
  • Checks whether the number is at least 5 and does not already exist in the array. If so, it stores it in the array and increments the loop count. It generates 5 numbers using this logic.
  • Finally in the for loop, it loops from minimum to maximum, and prints each index that contains any value.
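The main use of an explicit seed is reproducibility: two runs with the same srand() argument produce the same sequence, which helps when testing a script that uses rand(). The hypothetical seeded_draws helper below makes that easy to check; the actual numbers depend on the awk implementation.

```shell
# seeded_draws: hypothetical helper; prints COUNT numbers from 0 to 99
# after seeding the generator with SEED.
seeded_draws() {
  awk -v seed="$1" -v count="$2" 'BEGIN {
    srand(seed + 0)              # same seed, same sequence
    for (i = 0; i < count + 0; i++)
      printf "%d ", int(rand() * 100)
    print ""
  }'
}

seeded_draws 42 5
seeded_draws 42 5   # prints the same numbers as the line above
```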

88. Generic String Functions

The following are the common awk string functions that are available in all flavors of awk.

Index Function

The index function can be used to get the index (location) of the given string (or character) in an input string.

In the following example, string "Cali" is located in the string "CA is California" at location number 7.

You can also use index to check whether a given string (or character) is present in an input string. If the given string is not present, it will return the location as 0, which means the given string doesn't exist, as shown below.

$ cat index.awk
BEGIN {
  state="CA is California"
  print "String CA starts at location",index(state,"CA");
  print "String Cali starts at location",index(state,"Cali");
  if (index(state,"NY")==0)
  print "String NY is not found in:", state
}

$ awk -f index.awk
String CA starts at location 1
String Cali starts at location 7
String NY is not found in: CA is California
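Because index() returns the position of the first match only, counting all occurrences takes a loop that keeps searching in the rest of the string with substr(). count_substr is a hypothetical helper; it counts non-overlapping matches and is case sensitive.

```shell
# count_substr: hypothetical helper; counts non-overlapping, case
# sensitive occurrences of the 2nd argument in the 1st.
count_substr() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    if (t == "") { print 0; exit }     # avoid looping forever on ""
    n = 0
    while ((p = index(s, t)) > 0) {
      n++
      s = substr(s, p + length(t))     # continue after this match
    }
    print n
  }'
}

count_substr "CA is California" "Ca"   # 1
count_substr "banana" "an"             # 2
```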

Length Function

The length function returns the length of a string. In the following example, we print the total number of characters in each record of the items.txt file.

$ awk '{print length($0)}' items.txt
29
32
27
31
30

Split Function

Syntax:

split(input-string,output-array,separator)

This split function splits a string into individual array elements and returns the number of elements created. It takes the following three arguments.

  • input-string: This is the input string that needs to be split into multiple strings.
  • output-array: This array will contain the split strings as individual elements.
  • separator: The separator that should be used to split the input-string.

For this example, the original items-sold.txt file is slightly changed to have different field delimiters, i.e. a colon to separate the item number and the quantity sold. Within quantity sold, the individual quantities are separated by comma.

So, in order for us to calculate the total number of items sold for a particular item, we should take the 2nd field (which is all the quantities sold delimited by comma), split them using comma separator and store the substrings in an array, then loop through the array to add the quantities.

$ cat items-sold1.txt
101:2,10,5,8,10,12
102:0,1,4,3,0,2
103:10,6,11,20,5,13
104:2,3,4,0,6,5
105:10,2,5,7,12,6
$ cat split.awk
BEGIN {
  FS=":"
}
{
  split($2,quantity,",");
  total=0;
  for (x in quantity)
  total=total+quantity[x];
  print "Item", $1, ":", total, "quantities sold";
}

$ awk -f split.awk items-sold1.txt
Item 101 : 47 quantities sold
Item 102 : 10 quantities sold
Item 103 : 65 quantities sold
Item 104 : 20 quantities sold
Item 105 : 42 quantities sold
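split() also returns the number of elements it created, and the array is indexed from 1. A for (x in array) loop, as used in split.awk, is fine for adding up values, but it does not guarantee any order, so when the order matters it is safer to loop from 1 to the returned count. The hypothetical list_quantities helper below does that for the items-sold1.txt format.

```shell
# list_quantities: hypothetical helper; prints the quantities of each
# "item:q1,q2,..." record in their original order.
list_quantities() {
  awk -F':' '{
    n = split($2, qty, ",")      # n = number of elements, qty[1..n]
    line = $1 ":"
    for (i = 1; i <= n; i++)
      line = line " " qty[i]
    print line, "(" n " values)"
  }'
}

echo "101:2,10,5,8,10,12" | list_quantities   # 101: 2 10 5 8 10 12 (6 values)
```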

Substr Function

Syntax:

substr(input-string, location, length)

The substr function extracts a portion of a given string. In the above syntax:

  • input-string: The input string containing the substring.
  • location: The starting location of the substring.
  • length: The total number of characters to extract from the starting location. This parameter is optional. When you don't specify it, substr extracts the rest of the characters from the starting location.

The following example starts extracting from the 5th character and prints the rest of the line. The first 3 characters are the item number and the 4th character is the comma delimiter, so this skips the item number and prints the rest.

$ awk '{print substr($0,5)}' items.txt
HD Camcorder,Video,210,10
Refrigerator,Appliance,850,2
MP3 Player,Audio,270,15
Tennis Racket,Sports,190,20
Laser Printer,Office,475,5

The following starts at the 1st character of the 2nd field and prints 5 characters:

$ awk -F"," '{print substr($2,1,5)}' items.txt
HD Ca
Refri
MP3 P
Tenni
Laser
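substr() can also take the end of a string: since positions start at 1, the last n characters start at position length(s) - n + 1. The hypothetical last_chars helper below adds a guard for strings shorter than n.

```shell
# last_chars: hypothetical helper; prints the last N characters of S.
last_chars() {
  awk -v s="$1" -v n="$2" 'BEGIN {
    start = length(s) - n + 1
    if (start < 1) start = 1     # string shorter than n: return it all
    print substr(s, start)
  }'
}

last_chars "Camcorder" 3   # der
last_chars "MP3" 5         # MP3
```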

89. GAWK/NAWK String Functions

These string functions are available only in GAWK and NAWK flavors.

Sub Function

Syntax:

sub(original-string,replacement-string,string-variable)
  • sub stands for substitution.
  • original-string: This is the original string that needs to be replaced. This can also be a regular expression.
  • replacement-string: This is the replacement string.
  • string-variable: This acts as both input and output string variable. You have to be careful with this, as after the successful substitution, you lose the original value in this string-variable.

In the following example:

  • original-string: This is the regular expression C[Aa], which matches either "CA" or "Ca"
  • replacement-string: When the original-string is found, replace it with "KA"
  • string-variable: Before executing the sub, the variable contains the input string. Once the replacement is done, the variable contains the output string.

Please note that sub replaces only the 1st occurrence of the match.

$ cat sub.awk
BEGIN {
  state="CA is California"
  sub("C[Aa]","KA",state);
  print state;
}
$ awk -f sub.awk
KA is California

The 3rd parameter string-variable is optional. When it is not specified, awk uses $0 (the current line), as shown below. This example replaces the first occurrence of "10" in each record with "20"; in items.txt that is always the start of the item number, so 101 becomes 201, 102 becomes 202, etc.

$ awk '{ sub("10","20"); print $0; }' items.txt
201,HD Camcorder,Video,210,10
202,Refrigerator,Appliance,850,2
203,MP3 Player,Audio,270,15
204,Tennis Racket,Sports,190,20
205,Laser Printer,Office,475,5

When a successful substitution happens, the sub function returns 1, otherwise it returns 0.

Print the record only when a successful substitution occurs:

$ awk '{ if (sub("HD","High-Def")) print $0; }' items.txt
101,High-Def Camcorder,Video,210,10
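In the replacement string of sub() (and gsub()), the & character stands for the text that was actually matched, which is handy with regular expressions; to insert a literal ampersand, write "\\&" in the awk string. The hypothetical bracket_first_number helper below uses & to wrap the first number of each record in brackets.

```shell
# bracket_first_number: hypothetical helper; "&" in the replacement
# is replaced by the text that the regular expression matched.
bracket_first_number() {
  awk '{ sub(/[0-9]+/, "[&]"); print }'
}

echo "101,HD Camcorder,Video,210,10" | bracket_first_number
# [101],HD Camcorder,Video,210,10
```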

Gsub Function

gsub stands for global substitution. gsub is exactly the same as sub, except that all occurrences of original-string are changed to replacement-string.

In the following example, both "CA" and "Ca" are changed to "KA":

$ cat gsub.awk
BEGIN {
  state="CA is California"
  gsub("C[Aa]","KA",state);
  print state;
}
$ awk -f gsub.awk
KA is KAlifornia

As with sub, the 3rd parameter is optional. When it is not specified, awk will use $0 as shown below.

The following example replaces all the occurrences of "10" in the line with "20". So, other than changing the item number, it also changes other numeric fields in the record if they contain "10".

$ awk '{ gsub("10","20"); print $0; }' items.txt
201,HD Camcorder,Video,220,20
202,Refrigerator,Appliance,850,2
203,MP3 Player,Audio,270,15
204,Tennis Racket,Sports,190,20
205,Laser Printer,Office,475,5
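Like sub(), gsub() returns a value: the number of substitutions it made. Replacing a character with itself therefore counts its occurrences without changing the record. count_commas below is a hypothetical helper that counts the field separators in each line of items.txt-style input.

```shell
# count_commas: hypothetical helper; gsub() returns the number of
# replacements, and replacing "," with "," leaves the record unchanged.
count_commas() {
  awk '{ n = gsub(/,/, ","); print n }'
}

echo "101,HD Camcorder,Video,210,10" | count_commas   # 4
```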

Match Function and RSTART, RLENGTH Variables

The match function searches the input-string for a given string (or regular expression). It returns the position where the match starts, or 0 when there is no match, so its result can be used directly as a condition.

Syntax:

match(input-string,search-string)
  • input-string: This is the input-string that needs to be searched.
  • search-string: This is the string to search for in the input-string. This can also be a regular expression.

The following example searches for the string "Cali" in the state string variable. If present, it prints a successful message.

$ cat match.awk
BEGIN {
  state="CA is California"
  if (match(state,"Cali")) {
  print substr(state,RSTART,RLENGTH),"is present in:",
  state;
}
}
$ awk -f match.awk
Cali is present in: CA is California

Match sets the following two special variables. The above example uses them in the substr function call to print the matched text in the success message.

  • RSTART - The starting location of the match (0 when there is no match).
  • RLENGTH - The length of the matched text (-1 when there is no match).

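Since match() finds only the first match, extracting every match takes a loop: cut out substr(s, RSTART, RLENGTH), then continue searching after it. The hypothetical all_numbers helper below prints all runs of digits in each input line.

```shell
# all_numbers: hypothetical helper; prints every run of digits found in
# each input line, separated by spaces.
all_numbers() {
  awk '{
    s = $0; out = ""
    while (match(s, /[0-9]+/)) {
      num = substr(s, RSTART, RLENGTH)
      out = (out == "") ? num : out " " num
      s = substr(s, RSTART + RLENGTH)    # keep searching after this match
    }
    print out
  }'
}

echo "101,HD Camcorder,Video,210,10" | all_numbers   # 101 210 10
```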
90. GAWK String Functions

tolower and toupper are available in GAWK (and also in NAWK and other POSIX-compliant versions of awk). As the names suggest, these functions convert the given string to lower case or upper case as shown below.

$ awk '{print tolower($0)}' items.txt
101,hd camcorder,video,210,10
102,refrigerator,appliance,850,2
103,mp3 player,audio,270,15
104,tennis racket,sports,190,20
105,laser printer,office,475,5

$ awk '{print toupper($0)}' items.txt
101,HD CAMCORDER,VIDEO,210,10
102,REFRIGERATOR,APPLIANCE,850,2
103,MP3 PLAYER,AUDIO,270,15
104,TENNIS RACKET,SPORTS,190,20
105,LASER PRINTER,OFFICE,475,5
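tolower(), toupper(), and substr() combine naturally, for example to capitalize only the first letter of each field. The hypothetical capitalize_fields helper below assumes comma-separated input; assigning to $i makes awk rebuild the record using OFS.

```shell
# capitalize_fields: hypothetical helper; upper-cases the first letter
# of every comma-separated field and lower-cases the rest.
capitalize_fields() {
  awk -F',' 'BEGIN { OFS = "," }
  {
    for (i = 1; i <= NF; i++)
      $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2))
    print
  }'
}

echo "105,LASER PRINTER,office,475,5" | capitalize_fields
# 105,Laser printer,Office,475,5
```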

91. Awk Argument Processing (ARGC, ARGV, ARGIND)

The built-in variables we discussed earlier, FS, NF, RS, NR, FILENAME, OFS, and ORS, are available on all versions of awk (including nawk and gawk).

  • The built-in variables discussed in this hack are available only on nawk and gawk.
  • Use ARGC and ARGV to pass parameters to the awk script from the command line.
  • ARGC contains the total number of arguments passed to the awk script, plus one for the awk command itself.
  • ARGV is an array that contains all the arguments passed to the awk script, at indexes 0 through ARGC-1.
  • When you pass 5 arguments, ARGC will contain the value 6.
  • ARGV[0] contains the name of the awk program (usually awk).

The following simple arguments.awk shows how ARGC and ARGV behave:

$ cat arguments.awk
BEGIN {
  print "ARGC=",ARGC
  for (i = 0; i < ARGC; i++)
  print ARGV[i]
}

$ awk -f arguments.awk arg1 arg2 arg3 arg4 arg5
ARGC= 6
awk
arg1
arg2
arg3
arg4
arg5

In the following example:

  • We are passing parameters to the script in the format "--paramname paramvalue".
  • The awk script can take the item number and the quantity as arguments.
  • If you use "--item 104 --qty 25" as arguments to the awk script, it will set the quantity to 25 for item number 104.
  • If you use "--item 105 --qty 3" as arguments to the awk script, it will set the quantity to 3 for item number 105.
$ cat argc-argv.awk
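BEGIN {
  FS=",";
  OFS=",";
  for (i=0; i<ARGC; i++) {
    # Save each option value, then delete the option and its value
    # from ARGV so that awk does not try to read them as input files.
    if (ARGV[i]=="--item") {
      itemnumber=ARGV[i+1];
      delete ARGV[i]
      i++;
      delete ARGV[i]
    } else if (ARGV[i]=="--qty") {
      quantity=ARGV[i+1];
      delete ARGV[i]
      i++;
      delete ARGV[i]
    }
  }
}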
{
  if ($1==itemnumber)
  print $1,$2,$3,$4,quantity
  else
  print $0;
}

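The -- tells awk to stop processing its own options, so that --item and --qty are passed on to the script (without it, some awk versions reject them as unrecognized options).

$ awk -f argc-argv.awk -- --item 104 --qty 25 items.txt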
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,25
105,Laser Printer,Office,475,5
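For simple cases, a common alternative to parsing ARGV by hand is the standard -v option, which sets awk variables before the BEGIN block runs. The hypothetical set_qty helper below performs the same update as argc-argv.awk, reading comma-separated records from standard input.

```shell
# set_qty: hypothetical helper; -v assigns item and qty before BEGIN runs.
set_qty() {
  awk -F',' -v item="$1" -v qty="$2" '
  BEGIN { OFS = "," }
  {
    if ($1 == item) $5 = qty    # assigning a field rebuilds $0 with OFS
    print
  }'
}

printf '103,MP3 Player,Audio,270,15\n104,Tennis Racket,Sports,190,20\n' | set_qty 104 25
# 103,MP3 Player,Audio,270,15
# 104,Tennis Racket,Sports,190,25
```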

In gawk, ARGIND holds the index in the ARGV array of the file that is currently being processed, so ARGV[ARGIND] gives you the current file name from inside the body block.

When you are processing only one file in an awk script, the ARGIND will be 1, and ARGV[ARGIND] will give the file name that is currently getting processed.

The following example contains only the body block, that prints the value of the ARGIND, and the current file name from the ARGV[ARGIND]

$ cat argind.awk
{
print "ARGIND:", ARGIND
print "Current file:", ARGV[ARGIND]
}

When you call the above example with two files, it prints these two lines for each and every line of the input files. This gives you an idea of what is stored in ARGIND and ARGV[ARGIND].

$ awk -f argind.awk items.txt items-sold.txt
ARGIND: 1
Current file: items.txt
ARGIND: 1
Current file: items.txt
ARGIND: 1
Current file: items.txt
ARGIND: 1
Current file: items.txt
ARGIND: 1
Current file: items.txt
ARGIND: 2
Current file: items-sold.txt
ARGIND: 2
Current file: items-sold.txt
ARGIND: 2
Current file: items-sold.txt
ARGIND: 2
Current file: items-sold.txt
ARGIND: 2
Current file: items-sold.txt
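ARGIND exists only in gawk. In other awks, a portable way to notice that a new input file has started is the test FNR == 1, because FNR restarts at 1 for every file, while FILENAME holds the current file name. The hypothetical lines_per_file helper below prints each file name with its number of lines (empty files are skipped, since they never reach FNR == 1).

```shell
# lines_per_file: hypothetical helper; prints "file: count" for each
# non-empty input file, detecting file changes with FNR == 1.
lines_per_file() {
  awk '
  FNR == 1 && NR > 1 { print prev ": " count }   # previous file finished
  FNR == 1           { count = 0 }
                     { count++; prev = FILENAME }
  END                { if (NR > 0) print prev ": " count }
  ' "$@"
}

# Usage (file names are examples):
#   lines_per_file items.txt items-sold.txt
```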

92. OFMT

The OFMT built-in variable is available only in NAWK and GAWK.

When a number is converted to a string for printing, awk uses the OFMT format to decide how to print the value. The default value is "%.6g", which prints at most 6 significant digits, counting the digits on both sides of the decimal point.

When using g, you count all the digits on both sides of the dot. For example, "%.4g" means a total of 4 significant digits will be printed, including the digits on both sides of the dot.

When using f, you count ONLY the digits on the right side of the dot. For example, "%.4f" means 4 digits will be printed on the right side of the dot. The number of digits on the left side of the dot doesn't matter here.

The following ofmt.awk example shows how the output will be printed when using various OFMT values (for both g and f).

$ cat ofmt.awk
BEGIN {
  total=143.123456789;
  print "---using g----"
  print "Default OFMT:", total;
  OFMT="%.3g";
  print "%.3g OFMT:", total;
  OFMT="%.4g";
  print "%.4g OFMT:", total;
  OFMT="%.5g";
  print "%.5g OFMT:", total;
  OFMT="%.6g";
  print "%.6g OFMT:", total;
  print "---using f----"
  OFMT="%.0f";
  print "%.0f OFMT:", total;
  OFMT="%.1f";
  print "%.1f OFMT:", total;
  OFMT="%.2f";
  print "%.2f OFMT:", total;
  OFMT="%.3f";
  print "%.3f OFMT:", total;
}

$ awk -f ofmt.awk
---using g----
Default OFMT: 143.123
%.3g OFMT: 143
%.4g OFMT: 143.1
%.5g OFMT: 143.12
%.6g OFMT: 143.123
---using f----
%.0f OFMT: 143
%.1f OFMT: 143.1
%.2f OFMT: 143.12
%.3f OFMT: 143.123
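Keep in mind that OFMT only affects numbers printed with print; printf always follows its own format string. The hypothetical ofmt_demo helper below divides two numbers and prints the result both ways.

```shell
# ofmt_demo: hypothetical helper; prints a / b once with print (OFMT)
# and once with printf (its own format).
ofmt_demo() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    OFMT = "%.2f"
    x = a / b
    print x                 # formatted with OFMT
    printf "%.4f\n", x      # OFMT is ignored
  }'
}

ofmt_demo 10 3
# 3.33
# 3.3333
```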

93. GAWK Built-in Environment Variables

IGNORECASE and ERRNO, discussed in this section, are available only in GAWK. ENVIRON is also available in NAWK and other POSIX-compliant versions of awk.

ENVIRON

This is very helpful when you want to access shell environment variables in your awk script. ENVIRON is an array that contains all the environment values. The index to the ENVIRON array is the environment variable name.

For example, the array element ENVIRON["PATH"] will contain the value of the PATH environment variable.

The following example prints all the available environment variables and their values.

$ cat environ.awk
BEGIN {
  OFS="="
  for(x in ENVIRON)
  print x,ENVIRON[x];
}

Partial output is shown below.

$ awk -f environ.awk
SHELL=/bin/bash
PATH=/home/ramesh/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
HOME=/home/ramesh
TERM=xterm
USERNAME=ramesh
DISPLAY=:0.0
AWKPATH=.:/usr/share/awk
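A typical use of ENVIRON is reading a setting with a fallback value when the variable is not set. The hypothetical env_or_default helper below checks membership with the in operator before reading the element; REPORT_DIR is just an example variable name.

```shell
# env_or_default: hypothetical helper; prints ENVIRON[name] if the
# variable is set, otherwise the given default value.
env_or_default() {
  awk -v name="$1" -v def="$2" 'BEGIN {
    if (name in ENVIRON) print ENVIRON[name]
    else                 print def
  }'
}

export REPORT_DIR=/tmp/reports
env_or_default REPORT_DIR /var/tmp        # /tmp/reports
env_or_default NO_SUCH_VARIABLE fallback  # fallback (if unset)
```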

IGNORECASE

By default IGNORECASE is set to 0. So, the awk program is case sensitive.

When you set IGNORECASE to 1, the awk program becomes case insensitive. This will affect regular expression and string comparisons.

The following will not print anything, as it is looking for "video" with lower case "v". But, the items.txt file contains only "Video" with upper case "V".

$ awk '/video/ {print}' items.txt

However when you set IGNORECASE to 1, and search for "video", it will print the line containing "Video", as it will not do a case sensitive pattern match.

$ awk 'BEGIN{IGNORECASE=1} /video/ {print}' items.txt
101,HD Camcorder,Video,210,10

As you see in the example below, this works for both string and regular expression comparisons.

$ cat ignorecase.awk
BEGIN {
  FS=",";
  IGNORECASE=1;
}
{
  if ($3 == "video") print $0;
  if ($2 ~ "TENNIS") print $0;
}

$ awk -f ignorecase.awk items.txt
101,HD Camcorder,Video,210,10
104,Tennis Racket,Sports,190,20
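Because IGNORECASE is gawk-only, scripts that must run on other awks usually lowercase the text with tolower() before comparing or matching. The hypothetical find_video helper below is the portable equivalent of the $3 == "video" test in ignorecase.awk.

```shell
# find_video: hypothetical helper; prints records whose 3rd field is
# "video" in any mix of upper and lower case.
find_video() {
  awk -F',' 'tolower($3) == "video"'
}

printf '101,HD Camcorder,Video,210,10\n102,Refrigerator,Appliance,850,2\n' | find_video
# 101,HD Camcorder,Video,210,10
```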

ERRNO

When there is an error while using I/O operations (for example: getline), the ERRNO variable will contain the corresponding error message.

The following example is trying to read a file that doesn't exist using getline. In this case the ERRNO variable will contain "No such file or directory" message.

$ cat errno.awk
{
  print $0;
  x = getline < "dummy-file.txt"
  if ( x == -1 )
  print ERRNO
  else
  print $0;
}

$ awk -f errno.awk items.txt
101,HD Camcorder,Video,210,10
No such file or directory
102,Refrigerator,Appliance,850,2
No such file or directory
103,MP3 Player,Audio,270,15
No such file or directory
104,Tennis Racket,Sports,190,20
No such file or directory
105,Laser Printer,Office,475,5
No such file or directory
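ERRNO only explains why an I/O operation failed; the failure itself is reported by the return value of getline, which is 1 when a line was read, 0 at end of file, and -1 on error, in any awk. The hypothetical first_line helper below relies only on that return value, so it also works where ERRNO is not available.

```shell
# first_line: hypothetical helper; prints the first line of a file or
# a message, using only the return value of getline.
first_line() {
  awk -v f="$1" 'BEGIN {
    rc = (getline line < f)     # 1 = line read, 0 = end of file, -1 = error
    if (rc > 0)       print line
    else if (rc == 0) print "empty file"
    else              print "cannot read " f
    close(f)
  }'
}

first_line /no/such/file.txt    # cannot read /no/such/file.txt
```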

94. Awk Profiler - pgawk

The pgawk program is used to create an execution profile of your awk program. Using pgawk, you can view how many times each awk statement (and each user-defined function) was executed. (In gawk 4.1 and later, pgawk has been merged into gawk itself; use gawk --profile instead.)

First, create a sample awk program that we'll run through pgawk to see what the profiler output looks like.

$ cat profiler.awk
BEGIN {
  FS=",";
  print "Report Generated On:" strftime("%a %b %d %H:%M:%S %Z %Y",systime());
}
{
  if ( $5 <= 5 )
  print "Buy More: Order", $2, "immediately!"
  else
  print "Sell More: Give discount on", $2,
  "immediately!"
}
END {
  print "----"
}

Next, execute the sample awk program using pgawk (instead of just calling awk).

$ pgawk -f profiler.awk items.txt
Report Generated On:Mon Jan 31 08:35:59 PST 2011
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!
----

By default, pgawk writes the profile to a file called awkprof.out. You can specify your own profile output file name using the --profile option as shown below.

$ pgawk --profile=myprofiler.out -f profiler.awk items.txt

View the default awkprof.out to understand the execution counts of the individual awk statements.

$ cat awkprof.out
# gawk profile, created Mon Jan 31 08:35:59 2011
# BEGIN block(s)
BEGIN {
1    FS = ","
1    print ("Report Generated On:" strftime("%a %b %d %H:%M:%S %Z %Y", systime()))
}
# Rule(s)
5  {
5    if ($5 <= 5) { # 2
2      print "Buy More: Order", $2, "immediately!"
3    } else {
3      print "Sell More: Give discount on", $2, "immediately!"
     }
}
# END block(s)
END {
1    print "----"
}

While reading the awkprof.out, please keep the following in mind:

  • The column on the left contains a number. This indicates how many times that particular awk statement was executed. For example, the print statement in BEGIN executed only once (duh!), while the main rule executed 5 times, once for each line of items.txt.
  • For any condition check there are two numbers: one on the left side, and another on the right side after the parenthesis. The left number indicates how many times the condition was checked. The right number indicates how many times it was true. In the above example, the if condition was checked 5 times, but it was true only 2 times, as indicated by # 2 next to the if statement.

95. Bit Manipulation

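Just like C, awk can manipulate bits. You might not need this in your day-to-day awk programming, but it shows how much you can do with awk. Note that the bit manipulation functions (and, or, xor, compl, lshift and rshift) are gawk extensions and are not available in every awk.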

The following table shows the single digit decimal numbers and their binary equivalents.

Decimal  Binary
0        0
1        1
2        10
3        11
4        100
5        101
6        110
7        111
8        1000
9        1001
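The conversions in the table can be reproduced by repeatedly dividing by 2: each remainder is the next bit, starting from the rightmost one. A minimal sketch in plain awk, where to_binary is a hypothetical helper name used only for this illustration:

```shell
#!/bin/sh
# to_binary N: hypothetical helper that prints the binary form of a
# non-negative integer N by repeated division by 2.
to_binary() {
  awk -v n="$1" 'BEGIN {
    if (n == 0) { print 0; exit }
    s = ""
    while (n > 0) {
      s = (n % 2) s     # the remainder is the lowest bit; prepend it
      n = int(n / 2)    # drop the lowest bit
    }
    print s
  }'
}

to_binary 9     # prints 1001
to_binary 25    # prints 11001
```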

AND

For an AND output to be 1, both the bits should be 1.

  • 0 and 0 = 0
  • 0 and 1 = 0
  • 1 and 0 = 0
  • 1 and 1 = 1

For example, let us AND decimal 15 and 25. The result is binary 01001, which is decimal 9.

  • 15 = 01111
  • 25 = 11001
  • 15 and 25 = 01001

OR

For an OR output to be 1, at least one of the bits should be 1.

  • 0 or 0 = 0
  • 0 or 1 = 1
  • 1 or 0 = 1
  • 1 or 1 = 1

For example, let us OR decimal 15 and 25. The result is binary 11111, which is decimal 31.

  • 15 = 01111
  • 25 = 11001
  • 15 or 25 = 11111

XOR

For an XOR output to be 1, exactly one of the bits should be 1. When both bits are 1, xor returns 0.

  • 0 xor 0 = 0
  • 0 xor 1 = 1
  • 1 xor 0 = 1
  • 1 xor 1 = 0

For example, let us XOR decimal 15 and 25. The result is binary 10110, which is decimal 22.

  • 15 = 01111
  • 25 = 11001
  • 15 xor 25 = 10110
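The three truth tables above can be applied one bit position at a time with ordinary arithmetic, which also works in awks that lack gawk's and(), or() and xor() functions. A sketch, assuming non-negative integer inputs; bitops is a hypothetical helper, not a built-in:

```shell
#!/bin/sh
# bitops A B: hypothetical helper that prints the AND, OR and XOR of two
# non-negative integers by walking their bits from lowest to highest.
bitops() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    and_r = 0; or_r = 0; xor_r = 0
    bit = 1                        # value of the current bit position
    while (a > 0 || b > 0) {
      x = a % 2; y = b % 2         # lowest bit of each number
      if (x && y)  and_r += bit    # AND: both bits are 1
      if (x || y)  or_r  += bit    # OR: at least one bit is 1
      if (x != y)  xor_r += bit    # XOR: exactly one bit is 1
      a = int(a / 2); b = int(b / 2)
      bit *= 2
    }
    print "AND: " and_r
    print "OR: " or_r
    print "XOR: " xor_r
  }'
}

bitops 15 25    # prints AND: 9, OR: 31 and XOR: 22 on separate lines
```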

Complement

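Complement turns every 0 bit into 1 and every 1 bit into 0. The result depends on how many bits you consider: gawk's compl() works on a much wider integer than the 5 bits used below, so compl(15) returns a large number rather than 16.

For example, let us complement decimal 15 using 5 bits.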

  • 15 = 01111
  • 15 compl = 10000
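Within a fixed width of w bits, the complement of n equals (2^w - 1) - n, because 2^w - 1 is a row of w one-bits and subtracting n from it flips each bit of n. A sketch, assuming 0 <= n < 2^w; complw is a hypothetical helper:

```shell
#!/bin/sh
# complw N W: hypothetical helper that complements N within W bits.
# 2^W - 1 is a mask of W one-bits; subtracting N flips every bit of N.
complw() {
  awk -v n="$1" -v w="$2" 'BEGIN { print (2 ^ w - 1) - n }'
}

complw 15 5    # prints 16, which is binary 10000
```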

Left Shift

This function shifts the bits to the left side by the number of positions you specify. 0s are shifted in from the right side.

For example, let us left shift decimal 15 two times. The result is binary 111100, which is decimal 60.

  • 15 = 1111
  • lshift twice = 111100

Right Shift

This function shifts the bits to the right side by the number of positions you specify. 0s are shifted in from the left side, and the bits shifted out on the right are discarded.

For example, let us right shift decimal 15 two times. The result is binary 0011, which is decimal 3.

  • 15 = 1111
  • rshift twice = 0011
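Shifting left by k positions multiplies a number by 2^k, and shifting right by k positions divides it by 2^k and drops the remainder. For non-negative integers you can therefore get the same results as lshift() and rshift() with plain arithmetic. A sketch; shifts is a hypothetical helper:

```shell
#!/bin/sh
# shifts N K: hypothetical helper that shifts a non-negative integer N
# left and right by K bit positions using multiplication and division.
shifts() {
  awk -v n="$1" -v k="$2" 'BEGIN {
    print "LSHIFT: " n * 2 ^ k         # each left shift doubles the value
    print "RSHIFT: " int(n / 2 ^ k)    # each right shift halves it; int() drops the shifted-out bits
  }'
}

shifts 15 2    # prints LSHIFT: 60 and RSHIFT: 3
```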

Awk Example using Bit Functions

$ cat bits.awk
BEGIN {
  number1=15
  number2=25
  print "AND: " and(number1,number2);
  print "OR: " or(number1,number2)
  print "XOR: " xor(number1,number2)
  print "LSHIFT: " lshift(number1,2)
  print "RSHIFT: " rshift(number1,2)
}

$ awk -f bits.awk
AND: 9
OR: 31
XOR: 22
LSHIFT: 60
RSHIFT: 3

96. User Defined Functions

Awk allows you to define your own functions. This is extremely helpful when you are writing a lot of awk code and end up repeating certain pieces of code. Those pieces can be moved into a user defined function.

Syntax:

function fn-name(parameters)
{
  function-body
}

In the above syntax:

  • fn-name is the function name: Just like an awk variable name, a user defined function name must begin with a letter or an underscore. The rest of the characters can be letters, digits, or underscores. Keywords and built-in function names cannot be used as function names.
  • parameters: Multiple parameters are separated by commas. You can also create a user defined function without any parameters.
  • function-body: One or more awk statements.

If you've already used a name for a variable inside the awk program, you cannot use the same name for your user defined function.

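The following example creates a simple user defined function called discount that reduces the price ($4) by the specified percentage. For example, discount(10) returns the price with a 10% discount. Because fields are global, the function can read $4 directly even though it is not passed as a parameter.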

For any items where the quantity is <= 10, it gives 10% discount, otherwise it gives 50% discount.

$ cat function.awk
BEGIN {
  FS=","
  OFS=","
}
{
  if ($5 <= 10)
    print $1,$2,$3,discount(10),$5
  else
    print $1,$2,$3,discount(50),$5
}
function discount(percentage)
{
  return $4 - ($4*percentage/100)
}

$ awk -f function.awk items.txt
101,HD Camcorder,Video,189,10
102,Refrigerator,Appliance,765,2
103,MP3 Player,Audio,135,15
104,Tennis Racket,Sports,95,20
105,Laser Printer,Office,427.5,5
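In awk, scalar arguments such as the percentage above are passed by value, while arrays are passed by reference. A function can therefore change an array it receives, but not a plain variable of its caller. A small sketch to illustrate; by_value_demo and its bump function are hypothetical names:

```shell
#!/bin/sh
# by_value_demo: hypothetical helper showing how awk passes arguments.
by_value_demo() {
  awk '
    # n is a scalar: the function gets a copy, so the caller sees no change.
    # arr is an array: the function works on the caller array itself.
    function bump(n, arr) {
      n++
      arr["count"]++
    }
    BEGIN {
      v = 1
      a["count"] = 1
      bump(v, a)
      print v, a["count"]    # prints 1 2
    }'
}

by_value_demo
```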

Another good use of creating a custom function is to print debug messages.

Following is a simple mydebug function:

$ cat function-debug.awk
{
  i=2; total=0;
  while (i <= NF) {
    mydebug("quantity is " $i);
    total = total + $i;
    i++;
  }
  print "Item", $1, ":", total, "quantities sold";
}
function mydebug ( message ) {
  printf("DEBUG[%d]>%s\n", NR, message )
}

Partial output is shown below.

$ awk -f function-debug.awk items-sold.txt
DEBUG[1]>quantity is 2
DEBUG[1]>quantity is 10
DEBUG[1]>quantity is 5
DEBUG[1]>quantity is 8
DEBUG[1]>quantity is 10
DEBUG[1]>quantity is 12
Item 101 : 47 quantities sold
DEBUG[2]>quantity is 0
DEBUG[2]>quantity is 1
DEBUG[2]>quantity is 4
DEBUG[2]>quantity is 3
DEBUG[2]>quantity is 0
DEBUG[2]>quantity is 2
Item 102 : 10 quantities sold
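In function-debug.awk, i and total are global variables, so any function that used the same names would overwrite them. Awk has no separate declaration for local variables; the usual convention is to list them as extra parameters after the real ones, separated by extra spaces, and never pass them. A sketch of the same total calculation moved into a function; sum_sold and total_from are hypothetical names:

```shell
#!/bin/sh
# sum_sold: hypothetical helper that totals fields 2..NF of each input line.
sum_sold() {
  awk '
    # start is the only real parameter; i and sum are locals, declared
    # as extra parameters that callers never pass.
    function total_from(start,    i, sum) {
      sum = 0
      for (i = start; i <= NF; i++)
        sum += $i
      return sum
    }
    { print "Item", $1, ":", total_from(2), "quantities sold" }'
}

printf '101 2 10 5 8 10 12\n102 0 1 4 3 0 2\n' | sum_sold
# prints: Item 101 : 47 quantities sold
#         Item 102 : 10 quantities sold
```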

97. Language Independent Output (Internationalization)

When you write an awk script to print a report, you might specify the report header and footer information using the print command, with the static header and footer values in English. What if you want to produce the report in some other language? You might end up copying the awk script to another awk script and modifying all the print statements to display the static values in that language.

Probably an easier way is to use internationalization, where you use the same awk script but change the static values of the output at run time.

This technique is also helpful when you have a huge program, but you end-up changing the printed static output frequently for some reason. Or you might want the users to customize the awk output by changing the static displayed text to something of their own.

This simple example shows the four high-level steps to implement internationalization in gawk.

Step 1: Create text domain

Create a text domain and bind it to the directory where the awk program should look for the text domain. In this example it is set to the current directory.

$ cat iteminfo.awk
BEGIN {
  FS=","
  TEXTDOMAIN = "item"
  bindtextdomain(".")
  print _"START_TIME:" strftime("%a %b %d %H:%M:%S %Z %Y",systime());
  printf "%-3s\t", _"Num";
  printf "%-10s\t", _"Description"
  printf "%-10s\t", _"Type"
  printf "%-5s\t", _"Price"
  printf "%-3s\n", _"Qty"
  printf _"-----------------------------------------------------\n"
}
{
  printf "%-3d\t%-10s\t%-10s\t$%-.2f\t%03d\n",$1,$2,$3,$4,$5
}

Note: The above example has _ in front of all the strings that can be translated. Having _ (underscore) in front of a string doesn't change how it is printed when no translation is available, so the script runs as usual, as shown below.

$ awk -f iteminfo.awk items.txt
START_TIME:Sat Mar 05 09:15:13 PST 2011
Num Description Type Price Qty
-----------------------------------------------------
101 HD Camcorder Video $210.00 010
102 Refrigerator Appliance $850.00 002
103 MP3 Player Audio $270.00 015
104 Tennis Racket Sports $190.00 020
105 Laser Printer Office $475.00 005

Step 2: Generate .po

$ gawk --gen-po -f iteminfo.awk > iteminfo.po

$ cat iteminfo.po
#: iteminfo.awk:5
msgid "START_TIME:"
msgstr ""

#: iteminfo.awk:6
msgid "Num"
msgstr ""

#: iteminfo.awk:7
msgid "Description"
msgstr ""

#: iteminfo.awk:8
msgid "Type"
msgstr ""

#: iteminfo.awk:9
msgid "Price"
msgstr ""

#: iteminfo.awk:10
msgid "Qty"
msgstr ""

#: iteminfo.awk:11
msgid "-----------------------------------------------------\n"
msgstr ""

Now, modify this portable object file and change the message strings as needed. For example, to display "Report Generated On:" instead of "START_TIME:", edit the iteminfo.po file and change the msgstr right below the msgid for "START_TIME:".

$ cat iteminfo.po
#: iteminfo.awk:5
msgid "START_TIME:"
msgstr "Report Generated On:"

Note: In this example, the rest of the msgstr strings are left empty.

Step 3: Create message object

Create a message object file from the portable object file using the msgfmt command.

If all the msgstr entries in iteminfo.po are empty, msgfmt will not produce any message object file, as shown below.

$ msgfmt -v iteminfo.po
0 translated messages, 7 untranslated messages.

Since we translated one message, msgfmt creates the messages.mo file.

$ msgfmt -v iteminfo.po
1 translated message, 6 untranslated messages.
$ ls -1 messages.mo
messages.mo

Create a message directory for your locale under the current directory, and move the message object file into it.

$ mkdir -p en_US/LC_MESSAGES
$ mv messages.mo en_US/LC_MESSAGES/item.mo

Note: The destination file name should match the name we gave in the TEXTDOMAIN variable of the original awk file. TEXTDOMAIN = "item"

Step 4: Verify the message

Now the output no longer displays "START_TIME:". It displays the translated string "Report Generated On:" instead (this assumes your locale is en_US, matching the en_US directory created above).

$ gawk -f iteminfo.awk items.txt
Report Generated On:Sat Mar 05 09:19:19 PST 2011
Num Description Type Price Qty
-----------------------------------------------------
101 HD Camcorder Video $210.00 010
102 Refrigerator Appliance $850.00 002
103 MP3 Player Audio $270.00 015
104 Tennis Racket Sports $190.00 020
105 Laser Printer Office $475.00 005

98. Two Way Communication

Gawk can communicate with an external process in both directions using the "|&" operator, which provides two way communication. Note that "|&" is a gawk extension.

The following simple sed example substitutes the word "Awk" with "Sed and Awk".

$ echo "Awk is great" | sed 's/Awk/Sed and Awk/'
Sed and Awk is great

To understand how the two way communication from Awk works, the following awk script simulates the above simple example using "|&"

$ cat two-way.awk
BEGIN {
  command = "sed 's/Awk/Sed and Awk/'"
  print "Awk is Great!" |& command
  close(command,"to");
  command |& getline tmp
  print tmp;
  close(command);
}
$ awk -f two-way.awk
Sed and Awk is Great!

In the above example:

  • command = "sed 's/Awk/Sed and Awk/'" -- This is the command with which we establish two way communication from awk. It is a simple sed substitute command that replaces "Awk" with "Sed and Awk".
  • print "Awk is Great!" |& command -- This sends the input to the command, i.e. the input to the sed substitute command is "Awk is Great!". The "|&" indicates two way communication: the output of the print statement on the left side of "|&" becomes the input of the command on the right side.
  • close(command,"to") -- Once all the input has been sent, close the "to" end of the two way pipe. This sends end-of-file to sed, so that it finishes processing and writes out its result.
  • command |& getline tmp -- Read the output of the command using getline. The output of sed is stored in the variable "tmp".
  • print tmp -- This prints the output.
  • close(command) -- Finally, close the command completely.

Two way communication can come in handy when you rely heavily on output from external programs.
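Because "|&" exists only in gawk, a common workaround in other awks is a different technique: send the data through a one-way pipe into a temporary file, close the pipe so the command finishes, and then read the file back with getline. A sketch of that workaround, assuming the mktemp utility is available; sort_via_tmp is a hypothetical helper:

```shell
#!/bin/sh
# sort_via_tmp: hypothetical helper that sorts its standard input through
# the external sort command and reads the result back into awk,
# using a temporary file instead of gawk's |& two way pipe.
sort_via_tmp() {
  tmp=$(mktemp) || return 1
  awk -v tmp="$tmp" '
    { print | ("sort > " tmp) }     # one-way pipe: feed each line to sort
    END {
      close("sort > " tmp)          # sends end-of-file and waits for sort to finish
      while ((getline line < tmp) > 0)
        print "sorted:", line       # the command output, now back inside awk
      close(tmp)
    }'
  rm -f "$tmp"
}

printf 'pear\napple\nfig\n' | sort_via_tmp
# prints: sorted: apple
#         sorted: fig
#         sorted: pear
```

As with close(command,"to") above, closing the pipe is what lets sort see end-of-file; sort cannot produce any output until it has read all of its input.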

99. Awk System Function

You can use the system built-in function to execute system commands. Please note that there is a difference between two way communication and system command.

In "|&", you can pass the output of any awk command as input to an external command, and you can receive the output from the external command in your awk program (basically it is two way communication).

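Using the system function, you pass a string that is executed as a command, just as if you had typed it on the OS command line. The command's output goes directly to standard output; it is not captured by your awk program. The value that system() returns is the command's exit status (which is not the same as two way communication).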

The following are some simple examples of calling the pwd and date commands from awk:

$ awk 'BEGIN { system("pwd") }'
/home/ramesh

$ awk 'BEGIN { system("date") }'
Sat Mar 5 09:19:47 PST 2011
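What system() hands back to your awk program is the command's exit status, not its output. A short sketch that prints that status; run_status is a hypothetical helper:

```shell
#!/bin/sh
# run_status CMD: hypothetical helper that runs CMD through awk's system()
# and prints the exit status that system() returns.
run_status() {
  awk -v cmd="$1" 'BEGIN {
    status = system(cmd)    # any output of cmd goes straight to stdout
    print "exit status:", status
  }'
}

run_status true     # prints: exit status: 0
run_status false    # prints a non-zero exit status
```

Checking this value is how an awk program can react when an external command, such as the mail command in the next example, fails.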

When you are executing a long awk program, you might want it to send an email when the program starts and when it ends. The following example shows how you can use system command in the BEGIN and END block to send you an email when it starts and completes.

$ cat system.awk
BEGIN {
  system("echo 'Started' | mail -s 'Program system.awk started..' ramesh@thegeekstuff.com");
}
{
  split($2,quantity,",");
  total=0;
  for (x in quantity)
  total=total+quantity[x];
  print "Item", $1, ":", total, "quantities sold";
}
END {
  system("echo 'Completed' | mail -s 'Program system.awk completed..' ramesh@thegeekstuff.com");
}

$ awk -f system.awk items-sold.txt
Item 101 : 2 quantities sold
Item 102 : 0 quantities sold
Item 103 : 10 quantities sold
Item 104 : 2 quantities sold
Item 105 : 10 quantities sold

100. Timestamp Functions

These are available only in GAWK.

As you see from the example below, systime() returns the time in POSIX epoch time, i.e. the number of seconds elapsed since January 1, 1970.

$ awk 'BEGIN { print systime() }'
1299345651

The systime function becomes more useful when you use the strftime function to convert the epoch time to a readable format.

The following example displays the current timestamp in a readable format using systime and strftime function.

$ awk 'BEGIN { print strftime("%c",systime()) }'
Sat 05 Mar 2011 09:21:10 AM PST
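Since systime() and strftime() are gawk extensions, one possible fallback on other awks is to ask the date command for the epoch seconds through getline. A sketch, assuming a date command that supports the %s format (GNU and BSD date both do, but POSIX does not require it); epoch_now is a hypothetical helper:

```shell
#!/bin/sh
# epoch_now: hypothetical helper that prints the current epoch time
# without systime(), by reading the output of "date +%s".
epoch_now() {
  awk 'BEGIN {
    cmd = "date +%s"
    if ((cmd | getline now) > 0)   # read one line of command output
      print now
    close(cmd)
  }'
}

epoch_now
```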

The following awk script shows various possible date formats.

$ cat strftime.awk
BEGIN {
  print "--- basic formats --"
  print strftime("Format 1: %m/%d/%Y %H:%M:%S",systime())
  print strftime("Format 2: %m/%d/%y %I:%M:%S %p",systime())
  print strftime("Format 3: %m-%b-%Y %H:%M:%S",systime())
  print strftime("Format 4: %m-%b-%Y %H:%M:%S %Z",systime())
  print strftime("Format 5: %a %b %d %H:%M:%S %Z %Y",systime())
  print strftime("Format 6: %A %B %d %H:%M:%S %Z %Y",systime())
  print "--- quick formats --"
  print strftime("Format 7: %c",systime())
  print strftime("Format 8: %D",systime())
  print strftime("Format 9: %F",systime())
  print strftime("Format 10: %T",systime())
  print strftime("Format 11: %x",systime())
  print strftime("Format 12: %X",systime())
  print "--- single line format with %t--"
  print strftime("%Y %t%B %t%d",systime())
  print "--- multi line format with %n --"
  print strftime("%Y%n%B%n%d",systime())
}

$ awk -f strftime.awk
--- basic formats --
Format 1: 03/05/2011 09:26:03
Format 2: 03/05/11 09:26:03 AM
Format 3: 03-Mar-2011 09:26:03
Format 4: 03-Mar-2011 09:26:03 PST
Format 5: Sat Mar 05 09:26:03 PST 2011
Format 6: Saturday March 05 09:26:03 PST 2011
--- quick formats --
Format 7: Sat 05 Mar 2011 09:26:03 AM PST
Format 8: 03/05/11
Format 9: 2011-03-05
Format 10: 09:26:03
Format 11: 03/05/2011
Format 12: 09:26:03 AM
--- single line format with %t--
2011 March 05
--- multi line format with %n --
2011
March
05

Following are the various time format identifiers you can use in the strftime function. Please note that all the abbreviations shown below depend on your locale setting. These examples are shown for English (en).

Basic Time Formats:

  • %m Month as two digits. January is shown as 01.
  • %b Abbreviated month name. January is shown as Jan.
  • %B Full month name. January is shown as January.
  • %d Day of the month as two digits. The 4th of the month is shown as 04.
  • %Y Year as four digits. For example: 2011.
  • %y Year as two digits. 2011 is shown as 11.
  • %H Hour in 24-hour format. 1 p.m. is shown as 13.
  • %I Hour in 12-hour format. 1 p.m. is shown as 01.
  • %p Displays AM or PM. Use this along with the %I 12-hour format.
  • %M Minute as two digits. 9 minutes is shown as 09.
  • %S Seconds as two digits. 5 seconds is shown as 05.
  • %a Abbreviated day of the week. Monday is shown as Mon.
  • %A Full day of the week. Monday is shown as Monday.
  • %Z Time zone. Pacific Standard Time is shown as PST.
  • %n Displays a newline character.
  • %t Displays a tab character.

Quick Time Formats:

  • %c Displays the date in current locale full format. For example: Fri 11 Feb 2011 02:45:03 AM PST
  • %D Quick date format. Same as %m/%d/%y
  • %F Quick date format. Same as %Y-%m-%d
  • %T Quick time format. Same as %H:%M:%S
  • %x Date format based on your locale.
  • %X Time format based on your locale.

101. getline Command

As you already know, the body block of an awk script gets executed once for every line in the input file. You don't have any control over it, as awk does it automatically.

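However, using the getline command, you can control the reading of lines from the input file (or from some other file). Plain getline sets the built-in variables $0, NF, NR and FNR. The other forms update fewer variables: "getline var" sets var, NR and FNR; "getline < file" sets $0 and NF; "getline var < file" sets only var; "command | getline" sets $0, NF and NR; and "command | getline var" sets var and NR.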

Simple getline

$ awk -F"," '{getline; print $0;}' items.txt
102,Refrigerator,Appliance,850,2
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5

When you just specify getline in the body block, awk reads the next line from the input-file. In this example, the 1st statement in the body block is getline. So, even though awk already read the 1st line from the input-file, getline reads the next line, as we are explicitly requesting the next line from the input-file. So, executing 'print $0' after getline makes awk print the 2nd line.

Here is how it works:

  • At the beginning of the body block, before executing any statement, awk reads the 1st line of the items.txt and stores it in $0
  • getline - we are forcing awk to read the next line from the input file and store it in the built-in $0 variable.
  • print $0 - since the 2nd line is read into $0, print $0 will print the 2nd line (And not the 1st line).
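  • The body block continues in the same way for the rest of the lines in items.txt, printing the even numbered lines. On the last pass, awk reads line 5 and getline finds no more input: it returns 0 and leaves $0 unchanged, so line 5 (105) is printed as well.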

getline to a variable

You can also get the next line from the input file into a variable (instead of reading it to $0).
The following example prints only the odd numbered lines.

$ awk -F"," '{getline tmp; print $0;}' items.txt
101,HD Camcorder,Video,210,10
103,MP3 Player,Audio,270,15
105,Laser Printer,Office,475,5

Here is how it works:

  • At the beginning of the body block, before executing any statement, awk reads the 1st line of the items.txt and stores it in $0
  • getline tmp - We are forcing awk to read the next line from the input file and store it in the tmp variable.
  • print $0 - $0 still contains the 1st line, as "getline tmp" didn't overwrite the value of $0. So, print $0 will print the 1st line (and not the 2nd line).
  • The body block continues in the same way for rest of the lines in the items.txt and prints only the odd numbered lines.

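The following example prints both $0 and tmp. As you see below, $0 contains the odd numbered lines and tmp contains the even numbered lines. On the last pass there is no line left to read, so getline tmp returns 0 and tmp keeps its previous value; that is why line 104 appears twice.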

$ awk -F"," '{getline tmp; print "$0->", $0; print "tmp->", tmp;}' items.txt
$0-> 101,HD Camcorder,Video,210,10
tmp-> 102,Refrigerator,Appliance,850,2
$0-> 103,MP3 Player,Audio,270,15
tmp-> 104,Tennis Racket,Sports,190,20
$0-> 105,Laser Printer,Office,475,5
tmp-> 104,Tennis Racket,Sports,190,20
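To avoid printing a stale value when the input has an odd number of lines, check getline's return value before using the line it read. A sketch that pairs consecutive lines; pair_lines is a hypothetical helper:

```shell
#!/bin/sh
# pair_lines: hypothetical helper that prints consecutive input lines in
# pairs, marking a last line that has no partner.
pair_lines() {
  awk '{
    first = $0
    if ((getline second) > 0)     # 1: a next line was read into second
      print first " | " second
    else                          # 0: no line left to pair with
      print first " | (none)"
  }'
}

printf '101\n102\n103\n' | pair_lines
# prints: 101 | 102
#         103 | (none)
```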

getline from a different file

The previous two examples read the line from the given input-file itself. Using getline you can also read lines from a different file (than the current input-file) as shown below.

Switch back and forth between two files, printing lines from each.

$ awk -F"," '{print $0; getline < "items-sold.txt"; print $0;}' items.txt
101,HD Camcorder,Video,210,10
101 2 10 5 8 10 12
102,Refrigerator,Appliance,850,2
102 0 1 4 3 0 2
103,MP3 Player,Audio,270,15
103 10 6 11 20 5 13
104,Tennis Racket,Sports,190,20
104 2 3 4 0 6 5
105,Laser Printer,Office,475,5
105 10 2 5 7 12 6

Here is how it works:

  • At the beginning of the body block, before executing any statement, awk reads the 1st line of items.txt and stores it in $0
  • print $0 - Prints the 1st line from items.txt
  • getline < "items-sold.txt" - Reads the 1st line from items-sold.txt and stores it in $0.
  • print $0 - Prints the 1st line from items-sold.txt (not from items.txt)
  • The body block continues in the same way for the rest of the lines in items.txt and items-sold.txt

getline from a different file to a variable

Rather than reading both files into $0, you can also use the "getline var" format to read lines from a different file into a variable.

Switch back and forth between two files, printing lines from each (using tmp var).

$ awk -F"," '{print $0; getline tmp < "items-sold.txt"; print tmp;}' items.txt
101,HD Camcorder,Video,210,10
101 2 10 5 8 10 12
102,Refrigerator,Appliance,850,2
102 0 1 4 3 0 2
103,MP3 Player,Audio,270,15
103 10 6 11 20 5 13
104,Tennis Racket,Sports,190,20
104 2 3 4 0 6 5
105,Laser Printer,Office,475,5
105 10 2 5 7 12 6

This is identical to the previous example except that it stores the lines from the second file in the variable tmp.
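A common use of "getline var < file" is to load a whole lookup file in the BEGIN block before the main input is processed. The following sketch loads item names from a comma separated items file and prints them next to each line of a space separated sales file; the field layout is assumed to match items.txt and items-sold.txt above, and join_names is a hypothetical helper:

```shell
#!/bin/sh
# join_names ITEMS SOLD: hypothetical helper. ITEMS is comma separated
# (number,name,...); SOLD is space separated (number qty qty ...).
join_names() {
  awk -v items="$1" '
    BEGIN {
      # Read every line of the lookup file; stop at end of file (0)
      # or on error (-1).
      while ((getline line < items) > 0) {
        split(line, f, ",")
        name[f[1]] = f[2]      # item number -> item name
      }
      close(items)
    }
    { print $1, name[$1] }     # main input is the SOLD file
  ' "$2"
}

# Demo with small sample files
items=$(mktemp); sold=$(mktemp)
printf '101,HD Camcorder,Video,210,10\n102,Refrigerator,Appliance,850,2\n' > "$items"
printf '101 2 10 5 8 10 12\n102 0 1 4 3 0 2\n' > "$sold"
join_names "$items" "$sold"
# prints: 101 HD Camcorder
#         102 Refrigerator
rm -f "$items" "$sold"
```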

getline to execute external command

You can also use getline to execute a UNIX command and get its output.

The following example gets the output of the date command and prints it. Please note that you should also close the command that you just executed as shown below. The output of the date command is stored in the $0 variable.

Use this method to print timestamp on your report's header or footer.

$ cat getline1.awk
BEGIN {
  FS=",";
  "date" | getline
  close("date")
  print "Timestamp:" $0
}
{
  if ( $5 <= 5 )
    print "Buy More: Order", $2, "immediately!"
  else
    print "Sell More: Give discount on", $2,
      "immediately!"
}

$ awk -f getline1.awk items.txt
Timestamp:Sat Mar 5 09:29:22 PST 2011
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!

Instead of storing the output in the $0 variable, you can also store it in any awk variable (for example: timestamp) as shown below.

$ cat getline2.awk
BEGIN {
  FS=",";
  "date" | getline timestamp
  close("date")
  print "Timestamp:" timestamp
}
{
  if ( $5 <= 5 )
    print "Buy More: Order", $2, "immediately!"
  else
    print "Sell More: Give discount on", $2,
      "immediately!"
}

$ awk -f getline2.awk items.txt
Timestamp:Sat Mar 5 09:38:22 PST 2011
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!
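The date command prints a single line, so one getline is enough. When a command prints several lines, call getline in a loop until it stops returning 1, then close the command. A sketch; number_output is a hypothetical helper:

```shell
#!/bin/sh
# number_output CMD: hypothetical helper that runs CMD and prints each
# line of its output with a line number.
number_output() {
  awk -v cmd="$1" 'BEGIN {
    n = 0
    # Each call reads the next line of command output into line;
    # the loop stops at end of output (0) or on error (-1).
    while ((cmd | getline line) > 0)
      printf "%d: %s\n", ++n, line
    close(cmd)    # lets the same command be run again later
  }'
}

number_output "echo alpha; echo beta"
# prints: 1: alpha
#         2: beta
```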