Awk¶

Execution Model¶

Given the following file

line 1
line 2

When this file is passed through awk, every line is tested against each pattern in order:

Pattern1 { ACTIONS; }
Pattern2 { ACTIONS; }
Pattern3 { ACTIONS; }
Pattern4 { ACTIONS; }

If “line 1” matches Pattern1, its ACTIONS are executed. “line 1” is then tested against Pattern2; if it doesn’t match, awk moves on to Pattern3, and so on. Once every pattern has been tried, “line 2” goes through the same process.
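As a rough illustration of this model, the sketch below feeds the two-line file through three rules. The regexes and the `trace` variable are made up for the demo, and it assumes a POSIX shell with `awk` on the PATH.

```shell
# Every input line is checked against every pattern, top to bottom.
trace=$(printf 'line 1\nline 2\n' | awk '
/1/    { print "Pattern1 matched: " $0 }
/2/    { print "Pattern2 matched: " $0 }
/line/ { print "Pattern3 matched: " $0 }
')
printf '%s\n' "$trace"
```

Each line produces output for every rule it matches before awk reads the next line.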

Data types¶

Strings and Numbers.¶

Awk likes to convert them into each other. Strings can be interpreted as numerals to convert their values to numbers. If the string doesn’t look like a numeral, it’s 0. – https://ferd.ca/awk-in-20-minutes.html

Variables can be declared in the ACTIONS part of your code using the = operator. An uninitialized variable has the default value "", which is treated as 0 in a numeric context.
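A small sketch of these conversions, run entirely in a BEGIN block so no input is needed. The names `conv` and `unset` are hypothetical, and a POSIX shell with `awk` is assumed.

```shell
conv=$(awk 'BEGIN {
    print "3" + 4                  # numeric-looking string becomes a number
    print "abc" + 1                # non-numeric string counts as 0
    print 3 " apples"              # number becomes a string when concatenated
    print unset + 0, "[" unset "]" # uninitialized: 0 as a number, "" as a string
}')
printf '%s\n' "$conv"
```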

Arrays¶

var[key] = value
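Arrays are associative, so any string or number can serve as a key. The following sketch counts how often each word appears in the first field; the array name `seen` and the sample input are illustrative only, and a POSIX shell with `awk` is assumed.

```shell
counts=$(printf 'GET\nPOST\nGET\n' | awk '
{ seen[$1]++ }                 # the key is created on first use, starting from 0
END {
    print "GET", seen["GET"]
    print "POST", seen["POST"]
}')
printf '%s\n' "$counts"
```

The keys are printed explicitly here because the order of a `for (key in seen)` loop is not specified.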

Patterns¶

Awk regular expressions are POSIX extended regular expressions (ERE), not PCRE. gawk adds some extensions on top, such as \y for word boundaries and the \s and \w shorthands.

/admin/  { ... }  # any line that contains 'admin'
/^admin/ { ... } # lines that begin with 'admin'

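Patterns cannot capture specific groups to make them available in the ACTIONS part of the code. (As an extension, gawk’s match() accepts an optional third array argument that receives the captured groups.)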

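&&          # and
||          # or
!           # not
"23" == 23  # true
!=          # not equal
>           # greater than
<           # less than
>=          # greater than or equal
<=          # less than or equal
%           # modulo
2^2         # exponentiation => 4
+=          # add and assign
-=          # subtract and assign
/=          # divide and assign
%=          # modulo and assign
^=          # exponentiate and assign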

Patterns are optional:

{ ACTIONS }

This runs ACTIONS for every line of input.

BEGIN matches before any line of input has been read; this is where you initialize variables and state in your script.

END matches after the whole input has been handled.
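To show how BEGIN, a pattern-less rule, and END fit together, here is a minimal sketch that sums a column of numbers. The variable names `report` and `total` are hypothetical, and a POSIX shell with `awk` is assumed.

```shell
report=$(printf '3\n4\n5\n' | awk '
BEGIN { total = 0; print "start" }   # runs once, before any input is read
      { total += $1 }                # no pattern: runs for every line
END   { print "total:", total }      # runs once, after the last line
')
printf '%s\n' "$report"
```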

Fields¶

Fields are (by default) separated by whitespace.

$0 represents the entire line.

$1 to $n represent the individual fields, that is, the bits separated by whitespace.

For example, the following line breaks down as:

$1         $2    $3
00:34:23   GET   /foo/bar.html
\_____________  _____________/
              $0

# Hack attempt?
/admin.html$/ && $2 == "DELETE" {
  print "Hacker Alert!";
}

You can modify the line by assigning to its fields: assigning to a field rebuilds $0, and assigning to $0 re-splits the fields. So $0 = "foo" would overwrite the line before it’s evaluated by the next pattern.
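A short sketch of that behaviour, reusing the log line from above: the first rule changes a field and the second rule sees the modified line. The `rewritten` variable is hypothetical, and a POSIX shell with `awk` is assumed.

```shell
rewritten=$(printf '00:34:23 GET /foo/bar.html\n' | awk '
{ $2 = "POST" }   # assigning to a field rebuilds $0
{ print $0 }      # this later rule sees the modified line
')
printf '%s\n' "$rewritten"
```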

Actions¶

{ print $0; } # prints $0
{ exit; } # ends program
{ next; } # skips to the next line of input (like continue in a loop over the input lines)
{ a=$1; b=$0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)
{ if (BOOLEAN) { ACTION }
    else if (BOOLEAN) { ACTION }
    else { ACTION }
}
{ for (i=1; i<x; i++) { ACTION } }
{ for (item in c) { ACTION } }

# ternary operator (needs both a "then" and an "else" branch)
{ print ((count == 0) ? "Foo" : "Bar") }

# while loop
{ while (a < 10) {
    print "foo"
    a++
  }
}

# regular expression matching
{ if ("foo" ~ "^fo+$")
    print "Bar!"
}

{ if ("foo" in bar_list)
    print "Foo"
}

# Most AWK implementations have these standard math functions
{
    sin(a)
    cos(a)
    atan2(a, b)
    exp(a)
    log(a)
    sqrt(a)
    int(5.34) # => 5, truncates toward zero
    srand()  # seeds the random number generator (by default with the time of day)
    rand() # random number n with 0 <= n < 1
}

Variables are all global, except for function parameters.

Functions¶

{ somecall($2) }

# scalar arguments are passed by value; arrays are passed by reference
function name(parameter-list) {
     ACTIONS; # same actions as usual
}

# return is a valid keyword
function add1(val) {
     return val+1;
}

Built-in Functions

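Function parameters are local to the function.
AWK lets you declare more parameters than the caller passes; the extra ones start out empty and act as local variables. By convention they are set apart with extra spaces, as in string_functions below.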
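The extra-parameter idiom can be sketched as follows. The function `sum_to` and the variables `i`, `sum`, and `scope` are made up for this example, and a POSIX shell with `awk` is assumed.

```shell
scope=$(awk '
# "i" and "sum" are extra parameters, used here as local variables.
function sum_to(n,    i, sum) {
    for (i = 1; i <= n; i++) sum += i
    return sum
}
BEGIN {
    i = 99                  # a global variable with the same name
    print sum_to(4), i      # the global i is untouched by the call
}')
printf '%s\n' "$scope"
```

Without the extra parameters, `i` and `sum` inside the function would be globals and the call would overwrite the outer `i`.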

String Functions¶

function string_functions(    localvar, arr) {

    # Search and replace, first instance (sub) or all instances (gsub)
    # Both return the number of replacements made
    localvar = "fooooobar"
    sub("fo+", "Meet me at the ", localvar) # localvar => "Meet me at the bar"
    gsub("e+", ".", localvar) # localvar => "M.t m. at th. bar"

    # Search for a string that matches a regular expression
    # index() does the same thing, but doesn't allow a regular expression
    match(localvar, "t") # => 3, since the first 't' is the third character

    # Split on a delimiter
    split("foo-bar-baz", arr, "-") # arr => ["foo", "bar", "baz"], indexed from 1

    # Other useful stuff
    sprintf("%s %d %d %d", "Testing", 1, 2, 3) # => "Testing 1 2 3"
    substr("foobar", 2, 3) # => "oob"
    substr("foobar", 4) # => "bar"
    length("foo") # => 3
    tolower("FOO") # => "foo"
    toupper("foo") # => "FOO"
}

via https://learnxinyminutes.com/docs/awk/

IO¶

AWK will automatically open a file handle for you when you use something that needs one

print "foobar" >"/tmp/foobar.txt"  # Looks similar to shell scripting, opens file for you
close("/tmp/foobar.txt")  # Will close file opened by AWK

# Call into the shell:
system("echo foobar") # => prints foobar

# Reads the next line of the current input and stores it in localvar
getline localvar

# Reads a line from a pipe
"echo foobar" | getline localvar # localvar => "foobar"
close("echo foobar")


# Reads a line from a file and stores in localvar
getline localvar <"/tmp/foobar.txt"
close("/tmp/foobar.txt")

# This will print the second and fourth fields in the line
print $2, $4

# Prints the number of fields on this line
print NF

# Print the last field on this line
print $NF
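As a minimal sketch of the pipe form of getline, the command string is kept in a variable so that the same string can be passed to close(). The names `piped`, `cmd`, and `line` are hypothetical, and a POSIX shell with `awk` is assumed.

```shell
piped=$(awk 'BEGIN {
    cmd = "echo foobar"
    cmd | getline line      # read one line of the command output into line
    close(cmd)              # close the pipe using the same command string
    print "read:", line
}')
printf '%s\n' "$piped"
```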

Special Variables¶

BEGIN { # Can be modified by the user
    FS  = ",";  # Field Separator
    RS  = "\n"; # Record Separator (lines)
    OFS = " ";  # Output Field Separator
    ORS = "\n"; # Output Record Separator (lines)
}
{ # Maintained by awk itself as it reads input
    NF           # Number of fields in the current record (line)
    NR           # Number of records (lines, by default) read so far
    ARGC         # Number of command-line arguments
    ARGV         # Array of command-line arguments
}

FILENAME  # name of the file being processed

END {  # run after processing all the text files

}
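A small sketch of these variables working together on comma-separated input. The sample data and the `csv` variable are invented for the demo, and a POSIX shell with `awk` is assumed.

```shell
csv=$(printf 'alice,30,admin\nbob,25,user\n' | awk '
BEGIN { FS = ","; OFS = " | " }   # split input on commas, join output with " | "
      { print NR, $1, $NF }       # record number, first field, last field
END   { print "records: " NR }    # NR still holds the total in END
')
printf '%s\n' "$csv"
```

The commas in a print statement are what insert OFS; plain concatenation, as in the END rule, does not.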

Example¶

$ awk -f foo.awk logs.txt
$ awk '{ print $1, $(NF-2) }' logs.txt
$ grep something logs.txt | awk '{print $1}'