
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.

The passing of FiftyOneFifty

It is with deep sadness we announce that another of our hosts and friends Donald Grier, known to us as FiftyOneFifty, has passed away.

FiftyOneFifty's frat brother Randy Hall has written a lovely piece. The team at Linuxlugcast are preparing our own tribute. If you want to contribute an audio file, you can email Honkeymagoo or join the show.

Our thoughts go out to his friends and family at this difficult time.

hpr2091 :: Everyday Unix/Linux Tools for data processing

In this episode, I give some examples of common and uncommon tools for processing data files


Hosted by b-yeezi on 2016-08-08. This episode is flagged as Clean and is released under a CC-BY-SA license.
Tags: linux,unix,data,commandline.

Here are some of the tools I use to process and clean data from all manner of customers:


The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It’ll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
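
To give a feel for this kind of renaming, here is a rough stand-in that uses only standard shell tools. It is not detox's own algorithm and its output will not necessarily match what detox produces; the file names are invented for the demonstration, and everything runs inside a throwaway temporary directory.

```shell
#!/bin/sh
# Rough stand-in for the kind of renaming detox performs.
# NOT detox's algorithm; the file names below are invented for this demo.
workdir=$(mktemp -d)
touch "$workdir/Q3 report (final).txt" "$workdir/client data.csv"

for path in "$workdir"/*; do
    name=$(basename "$path")
    # Turn every run of characters other than letters, digits, dot,
    # underscore and hyphen into a single underscore.
    safe=$(printf '%s' "$name" | tr -cs 'A-Za-z0-9._-' '_')
    # Rename only when the name actually changed.
    [ "$name" = "$safe" ] || mv -- "$path" "$workdir/$safe"
done

# Capture the resulting names, show them, then clean up.
renamed=$(ls "$workdir")
printf '%s\n' "$renamed"
rm -rf "$workdir"
```

A loop like this can silently collide when two messy names map to the same clean name, which is one reason to prefer a purpose-built tool for real data.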

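See other episodes for great sed information. I like to remove DOS end-of-line and end-of-file characters. With a literal carriage return in the pattern (typed as Ctrl-V Ctrl-M and shown here as ^M):

sed -i 's/^M//g' *.txt

Or, with the \r escape that GNU sed understands:
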
sed -i 's/\r//g' *.txt
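
A minimal sketch of this clean-up on a throwaway file, assuming GNU sed (for -i and the \r and \x1a escapes). The sample file and its contents are made up. Two details are assumptions rather than the episode's exact commands: the carriage-return pattern is anchored with $ so that only line-ending carriage returns are removed, and Ctrl-Z (hex 1a) is treated as the DOS end-of-file marker.

```shell
#!/bin/sh
# Sketch: strip DOS line endings and a trailing Ctrl-Z from a sample file.
# Assumes GNU sed; the file name and contents are invented for this demo.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"

# Two CRLF-terminated lines followed by a Ctrl-Z byte (octal 032).
printf 'alpha\r\nbeta\r\n\032' > "$sample"

# First expression: drop the carriage return at the end of each line.
# Second expression: drop any Ctrl-Z characters.
sed -i -e 's/\r$//' -e 's/\x1a//g' "$sample"

# Count any carriage returns or Ctrl-Z bytes left behind.
leftover=$(tr -cd '\r\032' < "$sample" | wc -c | tr -d ' ')
cleaned=$(cat "$sample")

printf 'leftover control bytes: %s\n' "$leftover"
printf '%s\n' "$cleaned"
rm -rf "$workdir"
```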

Command-line tools

  • ack
  • awk
  • detox
  • grep
  • pandoc
  • pdftotext -layout
  • sed
  • unix2dos and dos2unix
  • wget
  • curl
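
As a small illustration of how two of the tools in this list can be pointed at a data file, the sketch below uses awk to flag rows whose field count differs from the header's and grep to count matching lines. The CSV contents and column names are invented, and splitting on commas like this assumes there are no quoted commas in the data.

```shell
#!/bin/sh
# Sketch: basic sanity checks on a delimited file with awk and grep.
# The data is invented; quoted commas would need a real CSV parser.
workdir=$(mktemp -d)
data="$workdir/orders.csv"
printf '%s\n' 'id,customer,total' '1,acme,10.50' '2,globex' '3,initech,7.25' > "$data"

# awk: remember the header's field count, then report the line number
# and text of every row with a different number of fields.
bad_rows=$(awk -F',' 'NR == 1 { want = NF; next } NF != want { print NR ": " $0 }' "$data")

# grep: count the lines that mention a given customer, ignoring case.
acme_count=$(grep -ci 'acme' "$data")

printf 'malformed rows:\n%s\n' "$bad_rows"
printf 'lines mentioning acme: %s\n' "$acme_count"
rm -rf "$workdir"
```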

R libraries

  • RCurl
  • XML
  • rvest
  • tm
  • xlsx

Python libraries

Vim tricks

  • buffer searches (:vim /pattern/ ##)
  • Ack plugin
  • bufdo (:bufdo %s/pattern/replace/ge | update)

Other tools



Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp


Thanks, this is a genius tool. Never heard of it before.

Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon

I love detox

detox -vr *

wow what an excellent tool.

Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss

Thanks for mentioning 'ack'

Wow! I had never encountered 'ack' before. It's amazing.

I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self-mortification), and I found I could do things like:

ack --shell --pager=more psql .

There's no other easy way to do this that I know of.

Thanks very much for pointing this one out.

Comment #4 posted on 2016-08-21T14:53:50Z by ivor


I always love vim tips. So I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. Most of them I know about and use all that are relevant to me very frequently. So now I'm going to subscribe...
