Site Map - skip to main content - dyslexic font - mobile - text - print

Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


From Wed 20th December 2017 all the media from the HPR feeds will be served via a redirect towards Archive.org. Please notify admin@HPR of any issues.

Our hosting is kindly provided by Josh from AnHonestHost.com. We would appreciate it if you could donate to help reduce his costs in funding the hosting.
He is also accepting bitcoins to 1KsxJr9HtsdaUeU7yaV9bk9bQi21UPBtUq

hpr2238 :: Gnu Awk - Part 6

Looking more deeply into Awk's regular expressions

<< First, < Previous, Latest >>

Host Image
Hosted by Dave Morriss on 2017-03-01 is flagged as Explicit and is released under a CC-BY-SA license.
Listen in ogg, spx, or mp3 format. | Comments (0)

Part of the series: Learning Awk

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

Gnu Awk - Part 6

Introduction

This is the sixth episode of the “Learning Awk” series that b-yeezi and I are doing.

Recap of the last episode

Regular expressions

In the last episode we saw regular expressions in the ‘pattern’ part of a ‘pattern {action}’ sequence. Such a sequence is called a ‘RULE’, (as we have seen in earlier episodes).

$1 ~ /p[elu]/ {print $0}

Meaning: If field 1 contains a ‘p’ followed by one of ‘e’, ‘l’ or ‘u’ print the whole line.

$2 ~ /e{2}/ {print $0}

Meaning: If field 2 contains two instances of letter ‘e’ in sequence, print the whole line.

It is usual to enclose the regular expression in slashes, which make it a regexp constant.

We had a look at many of the operators used in regular expressions in episode 5. Unfortunately, some small errors crept into the list of operators mentioned in that episode. These are incorrect:

  • \A (beginning of a string)
  • \z (end of a string)
  • \b (on a word boundary)

The first two operators exist, in languages like Perl and Ruby, but not in GNU Awk.

For the ‘\b’ sequence the GNU manual says:

In other GNU software, the word-boundary operator is ‘\b’. However, that conflicts with the awk language’s definition of ‘\b’ as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using ‘\y’ for the GNU ‘\b’ appears to be the lesser of two evils.

The corrected list of operators is discussed later in this episode.

Replacement

Last episode we saw the built-in functions that use regular expressions for manipulating strings. These are sub, gsub and gensub. Regular expressions are used in other functions but we will look at them later.

We will be looking at sub, gsub and gensub in more detail in this episode.

Long notes

I have written out a set of longer notes for this episode and these are available here.


Comments

Subscribe to the comments RSS feed.

Leave Comment

Note to Verbose Commenters
If you can't fit everything you want to say in the comment below then you really should record a response show instead.

Note to Spammers
All comments are moderated. All links are checked by humans. We strip out all html. Feel free to
record a show about yourself, or your industry, or any other topic we may find interesting. We also check shows for spam :).

Provide feedback
Your Name/Handle:
Title:
Comment:
Anti Spam Question: What does the P in HPR stand for ?
Are you a spammer →
Who hosted this show →
What does HPR mean to you ?