Awk is a command-line interface (CLI) tool for slicing text.

Good guide on awk here. Another good tutorial here.

Always put the awk program in single quotes, so the shell does not try to expand $1, $2 and so on itself. $0 means the whole current line.

Example Data

Date        Open        High        Low         Close       Volume     Adj Close
2016-03-24  98.639999   98.849998   97.07       98.360001   10646900   98.360001
2016-03-23  99.75       100.389999  98.809998   99.589996   8292300    99.589996
2016-03-22  100.480003  101.519997  99.199997   99.839996   9039500    99.839996
2016-03-21  101.150002  102.099998  99.50       101.059998  9562900    101.059998
2016-03-18  100.50      102.410004  100.010002  101.120003  15437300   101.120003

Read in data not tab separated

By default awk splits fields on runs of spaces and tabs. Can specify another separator with the -F flag:

cat mydata.csv | awk -F, '{print $5}'     # split columns on commas

Can also specify a regex for -F:

cat mydata.csv | awk -F'[,-]' '{print $3, "--", $0}'
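
A quick way to check what a separator does is to feed awk a single line and print NF next to a few fields. The sketch below uses one shortened CSV row in the shape of the example data (the row itself is an assumption, not a file from these notes). With -F'[,-]' the date is split into three fields, so every later column shifts by two:

```shell
# One illustrative row: date, open, high
row='2016-03-24,98.639999,98.849998'

# Split on commas only: the date stays intact in $1
by_comma=$(printf '%s\n' "$row" | awk -F, '{print NF, $1}')

# Split on commas OR hyphens: year, month and day become $1, $2, $3
by_both=$(printf '%s\n' "$row" | awk -F'[,-]' '{print NF, $1, $3, $4}')

echo "$by_comma"   # 3 2016-03-24
echo "$by_both"    # 5 2016 24 98.639999
```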

Print 2nd Column

cat mydata.tsv | awk '{print $2}'

Print 1st, 6th, 5th Column

Commas between values insert the output separator (a single space by default):

cat mydata.tsv | awk '{print $1, $6, $5}'

Print CSV or another format

Commas (or another separator) in quotes will be printed as-is:

cat mydata.tsv | awk '{print $1 "," $6}'

The proper way to specify the output separator in AWK is to use OFS:

cat mydata.tsv | awk 'BEGIN {OFS=","} {print $1, $6}'
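
One detail worth knowing: OFS is only applied when print gets comma-separated arguments, or when awk rebuilds $0 after a field has been assigned. A minimal sketch of the common $1 = $1 idiom that forces that rebuild, using a shortened row as stand-in data:

```shell
row='2016-03-24  98.639999   98.849998'

# Printing $0 untouched: OFS is not applied, the original spacing is kept
untouched=$(printf '%s\n' "$row" | awk 'BEGIN {OFS=","} {print $0}')

# Assigning any field (even to itself) makes awk rebuild $0 using OFS
rebuilt=$(printf '%s\n' "$row" | awk 'BEGIN {OFS=","} {$1 = $1; print $0}')

echo "$untouched"  # 2016-03-24  98.639999   98.849998
echo "$rebuilt"    # 2016-03-24,98.639999,98.849998
```

Handy for converting a whole whitespace-separated line to CSV without listing every column.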

Match Text

Can be used as an alternative to grep.

echo -e "Test\nMatch" | awk '/Test/'

Regex Match Text

Can be used as an alternative to grep.

echo -e "2015-04-05\n2016-07-08\n2015-12-31" | awk '/^2015-/'

Regex Match Text on a column

Only matches lines where the given column matches the regex:

cat mydata.tsv | awk '$1 ~ /^2015-/'

Everything but matched Text

Can be used as an alternative to grep -v.

echo -e "Test\nMatch" | awk '! /Test/'

Comparing Values

$2 == 124.47   # equality
$2 != 124.47   # inequality

$2 > 124.47    # greater than
$2 >= 124.47   # greater than or equal
$2 < 124.47    # smaller than
$2 <= 124.47   # smaller than or equal

$2 ~ /^10.$/   # regex match
$2 !~ /^10.$/  # regex negated match  -- this one might be new

Logical Operators

$1 ~ /^2015/ && $6 > 20000000  # and -- high volume in 2015
$6 < 1000000 || $6 > 20000000  # or  -- low or high volume
! /^2015/                      # not -- not in 2015
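
The conditions above are only patterns, so they need to sit inside an awk program to do anything. A small sketch against the example data, written to a temporary file so it runs on its own. The NR > 1 guard to skip the header row is an addition here, not something from the original notes:

```shell
# Write the example data to a temporary file so the snippet is self-contained
data=$(mktemp)
cat > "$data" <<'EOF'
Date        Open        High        Low         Close       Volume     Adj Close
2016-03-24  98.639999   98.849998   97.07       98.360001   10646900   98.360001
2016-03-23  99.75       100.389999  98.809998   99.589996   8292300    99.589996
2016-03-22  100.480003  101.519997  99.199997   99.839996   9039500    99.839996
2016-03-21  101.150002  102.099998  99.50       101.059998  9562900    101.059998
2016-03-18  100.50      102.410004  100.010002  101.120003  15437300   101.120003
EOF

# Skip the header, then keep days with High above 100 AND volume under 10 million
quiet_highs=$(awk 'NR > 1 && $3 > 100 && $6 < 10000000 {print $1}' "$data")

echo "$quiet_highs"
# 2016-03-23
# 2016-03-22
# 2016-03-21

rm -f "$data"
```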

Built-in variables

For one file:

NR: number of records (lines) processed since AWK started
NF: the number of fields (columns) on the current line

On multiple files:

FNR: see NR above, but resets to 1 when it hits a new file
FILENAME: the name of the current file being processed
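
To see how these four differ, a minimal sketch with two throwaway files (the names one.txt and two.txt are made up for the example):

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# cd into the directory so FILENAME prints as a short relative name
report=$(cd "$dir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)

echo "$report"
# one.txt 1 1 3
# one.txt 2 2 2
# two.txt 3 1 1

rm -rf "$dir"
```

NR keeps counting across files, while FNR starts again at 1 for two.txt.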

User defined variables

No need to declare variables in AWK; they are created on first use.

Print a running match count before each result:

cat mydata.tsv | awk '/^2015-/ {count++; print count, $0}'

What do variables initialize to?

awk 'BEGIN {print x + 2}'          # => 2
awk 'BEGIN {x = x + 2; print x}'   # => 2
awk 'BEGIN {print x}'              # => <blank> -- empty string, really

Initialise variables on the command line (with -v):

cat mydata.tsv | awk -v col=6 '{print $col}'

Can do the same for OFS (the separator to print with):

cat netflix.tsv | awk -v OFS=, '{print $1, $6}'
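
Because the awk program sits in single quotes, shell variables are not expanded inside it, so -v is also the usual way to pass a shell value in. A sketch with a hypothetical threshold variable and three shortened rows (date and volume only):

```shell
min_volume=10000000   # hypothetical threshold held in a shell variable

big_days=$(printf '%s\n' \
  '2016-03-24 10646900' \
  '2016-03-23 8292300' \
  '2016-03-18 15437300' |
  awk -v min="$min_volume" '$2 > min + 0 {print $1}')   # "+ 0" forces a numeric comparison

echo "$big_days"
# 2016-03-24
# 2016-03-18
```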

Special Patterns - Print at Start & End

If you want to print a header or footer, you can use BEGIN or END. BEGIN is triggered before processing any line. END is triggered after all lines are processed.

Example:

cat mydata.tsv | awk 'BEGIN {print "REPORT for XXXX"} {print}'
cat mydata.tsv | awk '{print} END {print NR}'
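
END is also where totals collected during the run usually get printed. A minimal sketch that averages a column, using rounded made-up closing prices rather than the real example data:

```shell
avg=$(printf '%s\n' \
  'Date Close' \
  '2016-03-24 98' \
  '2016-03-23 100' \
  '2016-03-22 102' |
  awk 'NR > 1 {sum += $2; n++}
       END {if (n) printf "%d rows, average %.2f\n", n, sum / n}')

echo "$avg"   # 3 rows, average 100.00
```

The if (n) guard avoids a division by zero when the input has no data rows.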

Multiple Conditions

cat mydata.tsv | awk '/^2016-03-24/ {print} $4 == 96.43 {print}' # Prints lines that match either condition
# Can also do the same this way:
cat mydata.tsv | awk '/^2016-03-24/; $4 == 96.43'

If one line matches both conditions it is printed twice. To stop at the first matching rule, use the next keyword:

cat mydata.tsv | awk '/^2016-03-24/ {print; next} $4 == 97.07 {print}'
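
A small check of what next changes, counting output lines for a single shortened row that satisfies both conditions:

```shell
row='2016-03-24  98.639999   98.849998   97.07'

# Both rules match, so the line comes out twice
without_next=$(printf '%s\n' "$row" |
  awk '/^2016-03-24/ {print} $4 == 97.07 {print}' | wc -l | tr -d ' ')

# next skips the remaining rules for this line, so it comes out once
with_next=$(printf '%s\n' "$row" |
  awk '/^2016-03-24/ {print; next} $4 == 97.07 {print}' | wc -l | tr -d ' ')

echo "$without_next"  # 2
echo "$with_next"     # 1
```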

Arrays

To sum volume by year:

cat mydata.csv | awk -F'[,-]' '{volume[$1] += $8} END { for(year in volume) print year, volume[year]}'

Explanation:

Split on commas or hyphens
Accumulate volume ($8) in a dictionary, using year as the key ($1)
The volume dictionary is created automatically, as we use it that way
At the END, print each year and its volume sum
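
A runnable sketch of the same idea on a few made-up CSV rows spanning two years. Two things here are additions rather than part of the original one-liner: NR > 1 skips the header row, and the result is piped through sort because for (key in array) visits keys in no guaranteed order:

```shell
totals=$(printf '%s\n' \
  'Date,Open,High,Low,Close,Volume,Adj Close' \
  '2016-03-24,98.6,98.8,97.0,98.3,100,98.3' \
  '2016-03-23,99.7,100.3,98.8,99.5,200,99.5' \
  '2015-12-31,90.0,91.0,89.0,90.5,50,90.5' |
  awk -F'[,-]' 'NR > 1 {volume[$1] += $8}
                END {for (year in volume) print year, volume[year]}' |
  sort)

echo "$totals"
# 2015 50
# 2016 300
```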

Embedding in shell scripts

Same as above example in arrays, but in a script:

#!/bin/bash

cat "$@" | awk -F'[,-]' '

{volume[$1] += $8}

END {
  for(year in volume) {
    print year, volume[year]
  }
}
'

Explanation:

Normal bash script
cat "$@" passes through the files named as arguments, or whatever is piped to the script if there are none
awk keeps reading between the opening ' and the closing ', which allows multiple lines
This can make the code more readable
Can add a pipe after the last quote (') if needed for the next program
