
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes every weekday Monday through Friday.


hpr2698 :: XSV for fast CSV manipulations - Part 1

Written in Rust, xsv is my new favorite tool for manipulating csv files


Hosted by Mr. Young on 2018-12-05. Flagged as Clean and released under a CC-BY-SA license.
Tags: CSV, XSV. Comments: 3.
The show is available on the Internet Archive at: https://archive.org/details/hpr2698


Duration: 00:30:37

general.

XSV for fast CSV manipulations - Part 1: Basic Usage

https://github.com/BurntSushi/xsv

Introduction

xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable:

  1. Simple tasks should be easy.
  2. Performance trade-offs should be exposed in the CLI.
  3. Composition should not come at the expense of performance.

We will be using the CSV file provided in the documentation.

Commands covered in this episode

  • count - Count the rows of CSV data
  • headers - Show the headers of CSV data, or show the intersection of all headers between many CSV files
  • index - Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file.
  • frequency - Build frequency tables of each column in CSV data.
  • stats - Show basic types and statistics of each column in the CSV file (e.g., mean, standard deviation, median, range).
  • sort - Sort CSV data
  • select - Select or re-order columns from CSV data.
  • slice - Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
  • search - Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows.
  • table - Show aligned output of any CSV data using elastic tabstops.
  • flatten - A flattened view of CSV records. Useful for viewing one record at a time.
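
To make a few of these commands concrete, here is a minimal Python sketch of what count, headers, frequency, select, slice and search compute. It is only a rough analogue written with the standard library, not how xsv itself is implemented, and it makes no attempt at xsv's speed or indexing. The sample data and all function names are made up for illustration; this is not the CSV file from the xsv documentation.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data, invented for this sketch.
SAMPLE = """Country,City,Population
us,Boston,650000
us,Austin,950000
fr,Paris,2100000
fr,Lyon,520000
"""


def read_rows(text):
    """Parse CSV text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    return headers, list(reader)


def count(rows):
    """Analogue of count: number of data rows, header excluded."""
    return len(rows)


def frequency(headers, rows, column):
    """Analogue of frequency for one column: (value, count) pairs."""
    i = headers.index(column)
    return Counter(row[i] for row in rows).most_common()


def select(headers, rows, columns):
    """Analogue of select: keep or re-order columns by name."""
    idx = [headers.index(c) for c in columns]
    return list(columns), [[row[i] for i in idx] for row in rows]


def slice_rows(rows, start, end):
    """Analogue of slice: rows from start (inclusive) to end (exclusive)."""
    return rows[start:end]


def search(rows, pattern):
    """Analogue of search: keep rows where any single field matches the regex."""
    rx = re.compile(pattern)
    return [row for row in rows if any(rx.search(field) for field in row)]


if __name__ == "__main__":
    headers, rows = read_rows(SAMPLE)
    print("count:", count(rows))
    print("headers:", headers)
    print("frequency of Country:", frequency(headers, rows, "Country"))
    print("select City, Country:", select(headers, rows, ["City", "Country"]))
    print("slice 1..3:", slice_rows(rows, 1, 3))
    print("search 'on$':", search(rows, r"on$"))
```

Each helper takes rows in and returns rows out, so they can be chained in the same spirit as piping one xsv command into another.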

Comments


Comment #1 posted on 2018-12-05 14:58:29 by Mike Ray

Good timing

What a brilliant tool and a great show.

This has come at a good time for me as I am deep into a large screen-scraping project which is yielding complex CSV files with many columns.

Like b-yeezi I frequently get involved with textual data manipulation in all kinds of formats. I did not know about xsv and have often had to guess the ordinal position of specific columns, and have to do all kinds of slicing and dicing operations.

Not easy at the best of times, and time consuming. All the more so if you can't easily guess the column position because you can't see.

So the timing of this show is great for me. And this is real hacking.

Comment #2 posted on 2018-12-16 20:24:11 by Dave Morriss

This is a great bit of software

Thanks for this.

I just listened to the show and immediately thought of several applications of xsv in what I do. I have installed it and am learning my way around it. Definitely a great addition to the toolkit.

Comment #3 posted on 2018-12-20 18:12:22 by Klaatu

Neato

I don't encounter CSV all that often, but this is a great tool to know about. Thanks!
