Awk¶
Execution Model¶
Given the following file
line 1
line 2
When this file is passed through awk, every line of the document is tested against each pattern in turn:
Pattern1 { ACTIONS; }
Pattern2 { ACTIONS; }
Pattern3 { ACTIONS; }
Pattern4 { ACTIONS; }
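If “line 1” matches Pattern1, that rule's ACTIONS are executed. The same line
is then tested against Pattern2; if it doesn’t match, awk skips to Pattern3,
and so on. Once every pattern has been tried, awk reads “line 2” and starts
again at Pattern1.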
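A minimal sketch of this model, using the two-line file from above and three made-up rules (the rule labels A, B and C are only for illustration). Each line is tested against every rule, so one line can trigger several actions:

```shell
# Each input line is tested against every rule, top to bottom.
out=$(printf 'line 1\nline 2\n' | awk '
/line/ { print "rule A matched: " $0 }
/1/    { print "rule B matched: " $0 }
/2/    { print "rule C matched: " $0 }
')
printf '%s\n' "$out"
```

Rule A fires for both lines, rule B only for “line 1” and rule C only for “line 2”.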
Data types¶
Strings and Numbers.¶
Awk likes to convert them into each other. Strings can be interpreted as numerals to convert their values to numbers. If the string doesn’t look like a numeral, it’s 0. – https://ferd.ca/awk-in-20-minutes.html
Variables can be declared in the ACTIONS part of your code using the =
operator. Uninitialized variables default to "" (the empty string), which
counts as 0 in numeric context.
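The conversions can be seen in a short sketch. The variable name unset_var is hypothetical; it is simply never assigned:

```shell
out=$(awk 'BEGIN {
    print "3" + 4            # the string "3" is converted to the number 3
    print "abc" + 1          # "abc" does not look like a number, so it counts as 0
    print 3 " apples"        # the number 3 is converted to a string and concatenated
    print unset_var + 0      # an uninitialized variable is 0 in numeric context
    print "[" unset_var "]"  # and the empty string in string context
}')
printf '%s\n' "$out"
```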
Arrays¶
var[key] = value
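Arrays are associative: keys can be any string or number. A small sketch, assuming made-up input lines whose first field is an HTTP method, that counts how often each key appears:

```shell
out=$(printf 'GET /a\nPOST /b\nGET /c\n' | awk '
{ count[$1]++ }    # key is the first field, value is a counter
END {
    print "GET", count["GET"]
    print "POST", count["POST"]
    if (!("PUT" in count)) print "PUT never seen"
}')
printf '%s\n' "$out"
```

Testing with `in` does not create the key, whereas reading count["PUT"] directly would.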
Patterns¶
Awk regular expressions are POSIX extended regular expressions, not PCRE.
gawk supports some additional regex extensions.
/admin/ { ... } # any line that contains 'admin'
/^admin/ { ... } # lines that begin with 'admin'
Patterns cannot capture specific groups to make them available in the
ACTIONS part of the code.
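One common workaround, sketched here on a made-up input line, is to call match() and then cut the matched text out with substr(), using the RSTART and RLENGTH variables that match() sets:

```shell
out=$(printf 'user=alice id=42\n' | awk '
match($0, /id=[0-9]+/) {
    # RSTART is where the match begins, RLENGTH is how long it is.
    # Skip the 3 characters of "id=" to keep only the digits.
    print substr($0, RSTART + 3, RLENGTH - 3)
}')
printf '%s\n' "$out"
```

match() returns 0 when nothing matches, so using it as the pattern also skips lines without an id.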
&& # and
|| # or
! # not
"23" == 23 # true
!=
>
<
>=
<=
% # modulo
2^2 # exponentiation (4)
+=
-=
*=
/=
%=
^=
Patterns are optional:
{ ACTIONS }
runs ACTIONS for every line of input.
BEGIN matches before any line of input is read; this is where you initialize variables and state in your script.
END matches after the whole input has been handled.
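A minimal sketch combining the three kinds of rule, assuming input with one number per line (the variable name total is arbitrary):

```shell
out=$(printf '10\n20\n12\n' | awk '
BEGIN { total = 0 }              # runs once, before the first line
      { total += $1 }            # no pattern: runs for every line
END   { print "total:", total }  # runs once, after the last line
')
printf '%s\n' "$out"
```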
Fields¶
By default, fields are separated by whitespace. $0 represents the entire
line; $1 to $n represent the individual fields.
For example, in the following line:
$1         $2    $3
00:34:23   GET   /foo/bar.html
\_____________  _____________/
              $0
# Hack attempt?
/admin\.html$/ && $2 == "DELETE" {
print "Hacker Alert!";
}
You can modify the line by assigning to its fields. So $0 = "foo" would
overwrite the line before it’s evaluated by the next pattern.
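A short sketch of both directions, reusing the log line from above. Setting OFS to a comma here is only to make the rebuild visible:

```shell
out=$(printf '00:34:23 GET /foo/bar.html\n' | awk '
BEGIN { OFS = "," }
{
    $2 = "POST"     # assigning to a field rebuilds $0, joined with OFS
    print $0
    $0 = "a b c"    # assigning to $0 re-splits the fields
    print $2, NF
}')
printf '%s\n' "$out"
```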
Actions¶
{ print $0; } # prints $0
{ exit; } # stops reading input; END rules still run before the program ends
{ next; } # skips the remaining rules and moves to the next line of input (like continue in a loop)
{ a=$1; b=$0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)
{ if (BOOLEAN) { ACTION }
else if (BOOLEAN) { ACTION }
else { ACTION }
}
{ for (i=1; i<x; i++) { ACTION } }
{ for (item in c) { ACTION } }
# ternary operator
{ print (count == 0 ? "Foo" : "Bar") }
{ while (a < 10) {
print "foo"
a++
}
}
# regular expression matching
{ if ("foo" ~ "^fo+$")
print "Bar!"
}
# array membership: true if "foo" is a key of bar_list
{ if ("foo" in bar_list)
print "Foo"
}
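# Most AWK implementations provide these standard math functions
{
sin(a)
cos(a)
atan2(a, b) # arc tangent of a / b
exp(a)
log(a) # natural logarithm
sqrt(a)
int(5.34) # => 5 (truncates toward zero)
srand() # seeds the random number generator (by default with the time of day)
rand() # random number x with 0 <= x < 1
}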
All variables are global, except function parameters.
Functions¶
{ somecall($2) }
# scalar arguments are passed by value, arrays by reference
function name(parameter-list) {
ACTIONS; # same actions as usual
}
# return is a valid keyword
function add1(val) {
return val+1;
}
- Function arguments are local to the function
- AWK allows you to declare more function parameters than callers pass; the extras start out empty and are commonly used as local variables
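The two points above combine into the usual idiom for local variables. A sketch, with a hypothetical function sum_to; the extra spaces in the parameter list are only a convention to separate real arguments from locals:

```shell
out=$(awk '
# "i" and "sum" are never passed by callers; listing them as extra
# parameters makes them local to the function.
function sum_to(n,    i, sum) {
    for (i = 1; i <= n; i++) sum += i
    return sum
}
BEGIN {
    i = 99             # global i
    print sum_to(4)    # 1 + 2 + 3 + 4
    print i            # still 99: the function used its own i
}')
printf '%s\n' "$out"
```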
String Functions¶
function string_functions( localvar, arr) {
# Search and replace, first instance (sub) or all instances (gsub)
# Both return number of matches replaced
localvar = "fooooobar"
sub("fo+", "Meet me at the ", localvar) # localvar => "Meet me at the bar"
gsub("e+", ".", localvar) # localvar => "M.t m. at th. bar"
# Search for a string that matches a regular expression
# index() does the same thing, but doesn't allow a regular expression
match(localvar, "t") # => 3, since the 't' is now the third character
# Split on a delimiter
split("foo-bar-baz", arr, "-") # arr => ["foo", "bar", "baz"], indexed from 1
# Other useful stuff
sprintf("%s %d %d %d", "Testing", 1, 2, 3) # => "Testing 1 2 3"
substr("foobar", 2, 3) # => "oob"
substr("foobar", 4) # => "bar"
length("foo") # => 3
tolower("FOO") # => "foo"
toupper("foo") # => "FOO"
}
IO¶
AWK will automatically open a file handle for you when you use something that needs one
print "foobar" >"/tmp/foobar.txt" # Looks similar to shell scripting, opens file for you
close("/tmp/foobar.txt") # Will close file opened by AWK
# Call into the shell:
system("echo foobar") # => prints foobar
# Reads the next line of the current input and stores it in localvar
getline localvar
# Reads a line from a pipe
"echo foobar" | getline localvar # localvar => "foobar"
close("echo foobar")
# Reads a line from a file and stores in localvar
getline localvar <"/tmp/foobar.txt"
close("/tmp/foobar.txt")
# This will print the second and fourth fields in the line
print $2, $4
# Prints the number of fields on this line
print NF
# Print the last field on this line
print $NF
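The pipe form of getline shown above reads a single line. To read everything a command prints, it is usually wrapped in a loop. A sketch, assuming a made-up command string held in a variable named cmd:

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"          # hypothetical command, run by the shell
    while ((cmd | getline line) > 0)    # > 0 means a line was read
        n++
    close(cmd)                          # close the pipe when done
    print n, line
}')
printf '%s\n' "$out"
```

Comparing against > 0 matters: getline returns 0 at end of input and -1 on error, so a bare while (cmd | getline line) could loop forever on an error.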
Special Variables¶
BEGIN { # Can be modified by the user
FS = ","; # Field Separator
RS = "\n"; # Record Separator (lines)
OFS = " "; # Output Field Separator
ORS = "\n"; # Output Record Separator (lines)
}
{ # Set automatically by AWK as it reads input
NF # Number of fields in the current record (line)
NR # Number of records (lines) read so far
ARGC # Number of command-line arguments
ARGV # Array of command-line arguments
}
FILENAME # Name of the file being processed
END { # Runs after all input files have been processed
}
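A sketch showing several of these variables together, assuming made-up comma-separated input:

```shell
out=$(printf 'alice,30,admin\nbob,25,user\n' | awk '
BEGIN { FS = ","; OFS = " | " }   # split input on commas, join output with " | "
{ print NR, $1, $NF, NF }         # line number, first field, last field, field count
')
printf '%s\n' "$out"
```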
Example¶
$ awk -f foo.awk logs.txt
$ awk '{ print $1, $(NF-2) }' logs.txt
$ grep something logs.txt | awk '{ print $1 }'