Chapter 12. Awk Associative Arrays

77. Assigning Array Elements

Arrays in awk are extremely powerful when compared to the traditional arrays that you might have used in other programming languages. In awk, arrays are associative, i.e. an array contains multiple index/value pairs. The indexes don't need to be a continuous set of numbers; an index can be a string or a number, and you don't need to specify the size of the array. Syntax:
arrayname[string]=value
  • arrayname is the name of the array.
  • string is the index of the array element.
  • value is the value assigned to that element of the array.

Accessing elements of the AWK array

If you want to access a particular element in an array, you use the format arrayname[index], which gives you the value assigned to that index. The following is a simple array assignment example:
$ cat array-assign.awk
BEGIN {
  item[101]="HD Camcorder";
  item[102]="Refrigerator";
  item[103]="MP3 Player";
  item[104]="Tennis Racket";
  item[105]="Laser Printer";
  item[1001]="Tennis Ball";
  item[55]="Laptop";
  item["na"]="Not Available";
  print item["101"];
  print item[102];
  print item["103"];
  print item[104];
  print item["105"];
  print item[1001];
  print item[55];
  print item["na"];
}
$ awk -f array-assign.awk
HD Camcorder
Refrigerator
MP3 Player
Tennis Racket
Laser Printer
Tennis Ball
Laptop
Not Available
Please note the following in the above example:
  • Array indexes are not in sequence. They don't even have to start from 0 or 1. Here they run from 101 through 105, then jump to 1001, then drop to 55, and the last one is the string index "na".
  • Array indexes can be strings. The last item in this array has a string index, i.e. "na".
  • You don't need to initialize or even define the array in awk, and you don't need to specify the total array size before you use it.
  • The naming convention of an awk array is the same as the naming convention of an awk variable.
From awk's point of view, the index of the array is always a string. Even when you pass a number for the index, awk converts it to a string and uses that as the index. Both of the following are the same.
item[101]="HD Camcorder"
item["101"]="HD Camcorder"
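The following is a minimal sketch of this equivalence. The file name array-index.awk and the values are assumptions made for illustration. It assigns through both forms and counts the elements, and it also shows that "1" and "01" are two different indexes, because indexes are compared as strings.

```shell
cat > array-index.awk <<'EOF'
BEGIN {
  item[101] = "HD Camcorder";
  item["101"] = "Blu-ray Player";   # same element: 101 is converted to "101"

  n = 0;
  for (k in item)
    n++;
  print "Elements:", n;
  print "item[101] is", item[101];

  # "1" and "01" are different strings, so they are different indexes.
  item["1"] = "one";
  item["01"] = "zero-one";
  print item[1], "/", item["01"];
}
EOF
awk -f array-index.awk
```

The second assignment overwrites the first, so the array holds a single element at that point, while item[1] and item["01"] keep separate values.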

78. Referring to Array Elements

You can directly print an array element using the print command as shown below, or you can assign the array item to another variable for additional manipulation inside the awk program.
print item[101]
x=item[105]
If you refer to an array element that doesn't exist, awk automatically creates that element with the given index and assigns a null (empty string) value to it. To avoid this, check whether the index exists before accessing the array element, using the following if condition syntax. The condition is true if the index exists in the array.
if ( index in arrayname )
The following is a simple array reference example:
$ cat array-refer.awk
BEGIN {
  x = item[55];
  if ( 55 in item )
    print "Array index 55 contains",item[55];
  item[101]="HD Camcorder";
  if ( 101 in item )
    print "Array index 101 contains",item[101];
  if ( 1010 in item )
    print "Array index 1010 contains",item[1010];
}

$ awk -f array-refer.awk
Array index 55 contains
Array index 101 contains HD Camcorder
In the above example:
  • item[55] has not been assigned a value. But it is referred to in "x = item[55]", so awk automatically creates this array element with a null value.
  • item[101] is assigned a value. So, when you check for index 101, it is present.
  • item[1010] does not exist. So, when you check for index 1010, it is not present.
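The difference between testing an index and referring to an element can be sketched as follows. The file name array-in.awk and the helper function count() are hypothetical, introduced only for this illustration; count() simply loops over the array to return the number of elements.

```shell
cat > array-in.awk <<'EOF'
# count() is a hypothetical helper that returns the number of elements.
function count(arr,   k, n) {
  n = 0;
  for (k in arr)
    n++;
  return n;
}
BEGIN {
  item[101] = "HD Camcorder";

  # Testing with "in" does not create the element.
  if (!(55 in item))
    print "55 is not in the array";
  print "Elements after the in test:", count(item);

  # Comparing item[55] to "" refers to the element, which creates it.
  if (item[55] == "")
    print "item[55] is empty";
  print "Elements after the reference:", count(item);
}
EOF
awk -f array-in.awk
```

The element count stays at 1 after the "in" test and grows to 2 after the comparison, which is why the "in" test is the safer way to check for an index.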

79. Browse the Array using For Loop

If you want to access all the array elements, you can use a special form of the for loop to go through all the indexes of an array. Syntax:
for (var in arrayname)
  actions
  • var is any variable name
  • in is a keyword
  • arrayname is the name of the array.
  • actions is a list of awk statements to be executed. If you want to execute more than one action, they have to be enclosed within braces. The loop executes the list of actions once for each element in the array, setting the variable var to the index of the corresponding element.
In the following example, x in "for (x in item)" can be any variable; it holds the index. Note that the loop has no condition that controls how many times it runs. You don't need to know how many items are in the array, as the awk for loop visits every item before exiting. The order in which the items are visited is not specified by awk, so don't rely on it. The following is a simple for loop example that loops through all the elements in the item array and prints them.
$ cat array-for-loop.awk
BEGIN {
  item[101]="HD Camcorder";
  item[102]="Refrigerator";
  item[103]="MP3 Player";
  item[104]="Tennis Racket";
  item[105]="Laser Printer";
  item[1001]="Tennis Ball";
  item[55]="Laptop";
  item["na"]="Not Available";
  for (x in item)
    print item[x];
}

$ awk -f array-for-loop.awk
Laptop
HD Camcorder
Refrigerator
MP3 Player
Tennis Racket
Laser Printer
Not Available
Tennis Ball
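The output above is not in the order in which the elements were assigned, because awk does not guarantee any particular order for this kind of loop. When the order matters, one common approach is to pipe the lines through the sort command. The following is a minimal sketch of that approach; the file name array-for-sorted.awk and the variable cmd are assumptions, and it relies on a sort command being available on the system.

```shell
cat > array-for-sorted.awk <<'EOF'
BEGIN {
  item[101] = "HD Camcorder";
  item[1001] = "Tennis Ball";
  item[55] = "Laptop";

  # The for-in order is unspecified, so let sort(1) order the lines
  # numerically by the index in the first column.
  cmd = "sort -n";
  for (x in item)
    print x, item[x] | cmd;
  close(cmd);
}
EOF
awk -f array-for-sorted.awk
```

The lines come out ordered by index (55, 101, 1001) regardless of the order in which awk visits the elements.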

80. Delete Array Element

If you want to remove the element at a particular index of an array, use the awk delete statement. Once you delete an element from an awk array, you can no longer obtain its value. Syntax:
delete arrayname[index];
The loop command below removes all elements from an array.
for (var in array)
  delete array[var]
In GAWK, you can specify the following single command to delete all the elements from an array.
delete array
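The following is a small sketch of emptying an array with the loop form. It also shows split("", array), another idiom you may come across for the same purpose; it works because split removes all existing elements from the target array before storing the pieces. The file name array-clear.awk and the helper function count() are hypothetical names used only here.

```shell
cat > array-clear.awk <<'EOF'
# count() is a hypothetical helper that returns the number of elements.
function count(arr,   k, n) {
  n = 0;
  for (k in arr)
    n++;
  return n;
}
BEGIN {
  item[101] = "HD Camcorder";
  item[102] = "Refrigerator";
  print "Before:", count(item);

  for (x in item)        # loop form: delete the elements one by one
    delete item[x];
  print "After loop:", count(item);

  item[103] = "MP3 Player";
  split("", item);       # split() empties the target array first
  print "After split:", count(item);
}
EOF
awk -f array-clear.awk
```

Both approaches leave the array with zero elements.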
Also, as shown in the example below, item[103]="" does not delete the array element. It just stores a null value in it.
$ cat array-delete.awk
BEGIN {
  item[101]="HD Camcorder";
  item[102]="Refrigerator";
  item[103]="MP3 Player";
  item[104]="Tennis Racket";
  item[105]="Laser Printer";
  item[1001]="Tennis Ball";
  item[55]="Laptop";
  item["na"]="Not Available";
  delete item[102];
  item[103]="";
  delete item[104];
  delete item[1001];
  delete item["na"];
  for (x in item)
    print "Index",x,"contains",item[x];
}
$ awk -f array-delete.awk
Index 55 contains Laptop
Index 101 contains HD Camcorder
Index 103 contains
Index 105 contains Laser Printer
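The difference between deleting an element and assigning an empty string to it can also be checked with the "in" test. The following is a minimal sketch; the file name array-delete-check.awk and the variable state are assumptions made for illustration.

```shell
cat > array-delete-check.awk <<'EOF'
BEGIN {
  item[102] = "Refrigerator";
  item[103] = "MP3 Player";

  delete item[102];   # removes the element
  item[103] = "";     # keeps the element, with an empty value

  state = (102 in item) ? "still present" : "gone";
  print "Index 102 is", state;
  state = (103 in item) ? "still present" : "gone";
  print "Index 103 is", state;
  print "Length of item[103]:", length(item[103]);
}
EOF
awk -f array-delete-check.awk
```

Index 102 is reported as gone, while index 103 is still present with a value of length 0.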

81. Multi Dimensional Array

Awk supports only one-dimensional arrays. But the beauty of awk is that you can simulate a multi-dimensional array using a one-dimensional array. Suppose you want to create the following 2 x 2 array.
10 20
30 40
In the above example, item at location "1,1" is 10, item at location "1,2" is 20, etc. Do the following to assign 10 to location "1,1".
item["1,1"]=10
Even though you've given "1,1" as the index, it is not two indexes. It is just one index with the string "1,1". So, in the above example, you are really storing the value 10 in a one-dimensional array at index "1,1".
$ cat array-multi.awk
BEGIN {
  item["1,1"]=10;
  item["1,2"]=20;
  item["2,1"]=30;
  item["2,2"]=40;
  for (x in item)
    print item[x];
}

$ awk -f array-multi.awk
10
20
30
40
Now, what happens when you don't enclose the indexes within quotes? i.e. item[1,1] (instead of item["1,1"]), as shown in the example below.
$ cat array-multi2.awk
BEGIN {
  item[1,1]=10;
  item[1,2]=20;
  item[2,1]=30;
  item[2,2]=40;
  for (x in item)
    print item[x];
}

$ awk -f array-multi2.awk
30
40
10
20
The above sample program still works, but there is a difference. When you don't enclose the indexes within quotes, awk joins the subscripts into a single string, using a subscript separator whose default value is "\034". When you specify item[1,2], it is translated to item["1\0342"], i.e. the two subscripts are converted to strings and combined with \034 in between. When you specify item["1,2"], nothing is translated; "1,2" is used as is, as the index of a one-dimensional array. This is demonstrated in the example below.
$ cat array-multi3.awk
BEGIN {
  item["1,1"]=10;
  item["1,2"]=20;
  item[2,1]=30;
  item[2,2]=40;
  for (x in item)
    print "Index",x,"contains",item[x];
}

$ awk -f array-multi3.awk
Index 1,1 contains 10
Index 1,2 contains 20
Index 2#1 contains 30
Index 2#2 contains 40
In the above example:
  • Indexes "1,1" and "1,2" are enclosed in quotes. So, each is treated as a plain one-dimensional array index, and awk does not use the subscript separator. The index gets printed as is.
  • Indexes 2,1 and 2,2 are not enclosed in quotes. So, each is treated as a multi-dimensional array index, and awk uses the subscript separator. The indexes are "2\0341" and "2\0342". The non-printable character "\034" between the subscripts is shown as "#" in the output above; how it is displayed depends on your terminal.
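Because the combined index is a single string, you can recover the individual subscripts by splitting it on the subscript separator, which awk keeps in the built-in variable SUBSEP. The following is a minimal sketch; the file name array-multi-split.awk and the names part and cmd are assumptions, and the output is piped through sort only to make the order predictable.

```shell
cat > array-multi-split.awk <<'EOF'
BEGIN {
  item[1,1] = 10;
  item[1,2] = 20;
  item[2,1] = 30;
  item[2,2] = 40;

  cmd = "sort";
  for (x in item) {
    # x is one string such as "1" SUBSEP "2"; split it back into its parts.
    split(x, part, SUBSEP);
    print "row", part[1], "col", part[2], "=", item[x] | cmd;
  }
  close(cmd);
}
EOF
awk -f array-multi-split.awk
```

Each output line shows the two subscripts separately, for example "row 1 col 2 = 20".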

82. SUBSEP - Subscript Separator

You can change the default subscript separator to anything you like using the SUBSEP variable. In the following example, SUBSEP is set to colon.
$ cat array-multi4.awk
BEGIN {
  SUBSEP=":";
  item["1,1"]=10;
  item["1,2"]=20;
  item[2,1]=30;
  item[2,2]=40;
  for (x in item)
    print "Index",x,"contains",item[x];
}

$ awk -f array-multi4.awk
Index 1,1 contains 10
Index 1,2 contains 20
Index 2:1 contains 30
Index 2:2 contains 40
In the above example, indexes "1,1" and "1,2" didn't use SUBSEP because they were enclosed in quotes. So, for a multi-dimensional awk array, the best practice is not to enclose the comma-separated index list within quotes, as shown below.
$ cat array-multi5.awk
BEGIN {
  SUBSEP=":";
  item[1,1]=10;
  item[1,2]=20;
  item[2,1]=30;
  item[2,2]=40;
  for (x in item)
    print "Index",x,"contains",item[x];
}

$ awk -f array-multi5.awk
Index 1:1 contains 10
Index 1:2 contains 20
Index 2:1 contains 30
Index 2:2 contains 40
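When the subscripts are numbers in a known range, you can also walk the array row by row with ordinary nested for loops, and test for a cell with the (i,j) in arrayname form of the "in" condition. The following is a minimal sketch under those assumptions; the file name array-multi-grid.awk and the variables r, c, row, sep and cell are made up for this illustration.

```shell
cat > array-multi-grid.awk <<'EOF'
BEGIN {
  item[1,1] = 10;
  item[1,2] = 20;
  item[2,1] = 30;
  # item[2,2] is deliberately left unassigned.

  for (r = 1; r <= 2; r++) {
    row = "";
    for (c = 1; c <= 2; c++) {
      # (r,c) in item tests the combined index without creating it.
      cell = ((r,c) in item) ? item[r,c] : "-";
      sep = (c > 1) ? " " : "";
      row = row sep cell;
    }
    print row;
  }
}
EOF
awk -f array-multi-grid.awk
```

The first row prints as "10 20" and the second as "30 -", with "-" marking the cell that was never assigned.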

83. Sort Array Values using asort

The asort function, which is available in gawk, sorts the array values and stores them in indexes from 1 through n, where n is the total number of elements in the array. Suppose you have two elements in the array: item["something"]="B - I'm big b" and item["notsure"]="A - I'm big a". After an asort function call, the array will be sorted based on the values to: item[1]="A - I'm big a" and item[2]="B - I'm big b". In the following example, we have array indexes with various nonconsecutive numbers and strings. After the asort, the array values will be sorted and stored in the indexes 1,2,3,4,... Please note that asort returns the total number of items in the array.
$ cat asort.awk
BEGIN {
  item[101]="HD Camcorder";
  item[102]="Refrigerator";
  item[103]="MP3 Player";
  item[104]="Tennis Racket";
  item[105]="Laser Printer";
  item[1001]="Tennis Ball";
  item[55]="Laptop";
  item["na"]="Not Available";
  print "------Before asort------"
  for (x in item)
    print "Index",x,"contains",item[x];
  total = asort(item);
  print "------After asort------"
  for (x in item)
    print "Index",x,"contains",item[x];
  print "Return value from asort:", total;
}

$ awk -f asort.awk
------Before asort------
Index 55 contains Laptop
Index 101 contains HD Camcorder
Index 102 contains Refrigerator
Index 103 contains MP3 Player
Index 104 contains Tennis Racket
Index 105 contains Laser Printer
Index na contains Not Available
Index 1001 contains Tennis Ball
------After asort------
Index 4 contains MP3 Player
Index 5 contains Not Available
Index 6 contains Refrigerator
Index 7 contains Tennis Ball
Index 8 contains Tennis Racket
Index 1 contains HD Camcorder
Index 2 contains Laptop
Index 3 contains Laser Printer
Return value from asort: 8
In the above example, after the asort, the array elements are not printed from indexes 1 through 8, because a for-in loop does not visit the indexes in any guaranteed order. You can print them from 1 through 8 as shown in the example below.
$ cat asort1.awk
BEGIN {
  item[101]="HD Camcorder";
  item[102]="Refrigerator";
  item[103]="MP3 Player";
  item[104]="Tennis Racket";
  item[105]="Laser Printer";
  item[1001]="Tennis Ball";
  item[55]="Laptop";
  item["na"]="Not Available";
  total = asort(item);
  for (i=1; i<= total; i++)
    print "Index",i,"contains",item[i];
}

$ awk -f asort1.awk
Index 1 contains HD Camcorder
Index 2 contains Laptop
Index 3 contains Laser Printer
Index 4 contains MP3 Player
Index 5 contains Not Available
Index 6 contains Refrigerator
Index 7 contains Tennis Ball
Index 8 contains Tennis Racket
As you may have noticed in the above examples, once asort is executed, you'll lose the original indexes forever. So, instead of overwriting the original array with the new indexes, you might want to create a new array with the new indexes. In the following example, the original array "item" is not modified. Instead, the "itemnew" array will contain the new indexes. i.e. itemnew[1], itemnew[2], itemnew[3], etc.
total = asort(item, itemnew);
Again, remember that asort sorts the array values. But, instead of using the original indexes, it uses new indexes from 1 through n. Original indexes are lost.
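The two-parameter form can be sketched as follows. This assumes gawk is installed under the name gawk; the file name asort2.awk is also an assumption. The sorted values go into itemnew, and the original item array keeps its indexes.

```shell
cat > asort2.awk <<'EOF'
BEGIN {
  item[101] = "HD Camcorder";
  item[55] = "Laptop";
  item["na"] = "Not Available";

  total = asort(item, itemnew);   # sorted copy goes into itemnew

  for (i = 1; i <= total; i++)
    print "itemnew[" i "] =", itemnew[i];

  # The original indexes are still there.
  print "item[55] =", item[55];
  print "item[\"na\"] =", item["na"];
}
EOF
gawk -f asort2.awk
```

After the call, item[55] and item["na"] still return their original values, while itemnew[1] through itemnew[3] hold the values in sorted order.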

84. Sort Array Indexes using asorti

Just like sorting array values, you can take all the array indexes, sort them, and store them in a new array using asorti. The following example shows how asorti differs from asort. Keep the following in mind:
  • asorti sorts the indexes (not the values) and stores them as values.
  • If you specify asorti(state), you'll lose the original values, i.e. the indexes will now become the values. So, to be on the safe side, always specify two parameters to the asorti function, i.e. asorti(state,stateabbr). This way, the original array (state) is not overwritten.
$ cat asorti.awk
BEGIN {
  state["TX"]="Texas";
  state["PA"]="Pennsylvania";
  state["NV"]="Nevada";
  state["CA"]="California";
  state["AL"]="Alabama";
  print "----- Function: asort -----"
  total = asort(state,statedesc);
  for (i=1; i<= total; i++)
    print "Index",i,"contains",statedesc[i];
  print "----- Function: asorti -----"
  total = asorti(state,stateabbr);
  for (i=1; i<= total; i++)
    print "Index",i,"contains",stateabbr[i];
}

$ awk -f asorti.awk
----- Function: asort -----
Index 1 contains Alabama
Index 2 contains California
Index 3 contains Nevada
Index 4 contains Pennsylvania
Index 5 contains Texas
----- Function: asorti -----
Index 1 contains AL
Index 2 contains CA
Index 3 contains NV
Index 4 contains PA
Index 5 contains TX
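Since the two-parameter asorti leaves the original array untouched, the sorted indexes can be used to walk the original array in index order. The following is a minimal sketch of that use, assuming gawk is installed under the name gawk; the file name asorti-walk.awk is an assumption.

```shell
cat > asorti-walk.awk <<'EOF'
BEGIN {
  state["TX"] = "Texas";
  state["PA"] = "Pennsylvania";
  state["NV"] = "Nevada";
  state["CA"] = "California";
  state["AL"] = "Alabama";

  # stateabbr[1..total] holds the sorted indexes of state.
  total = asorti(state, stateabbr);
  for (i = 1; i <= total; i++)
    print stateabbr[i], "=", state[stateabbr[i]];
}
EOF
gawk -f asorti-walk.awk
```

Each line pairs an abbreviation with its state name, starting with "AL = Alabama" and ending with "TX = Texas".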