Intro to `qsv count`¶

In this notebook we'll be covering examples of using qsv's count command.

This notebook uses qsv, an open-source CSV data wrangling toolkit available as a command line tool. You may learn more at https://github.com/jqnatividad/qsv.

Table of Contents¶

Setup

1.1 Environment Notes
1.2 Downloading qsv
1.3 Downloading a CSV Data Set

Let's Use qsv count!

2.1 Option: --help
2.2 Running qsv count On Our CSV
2.3 Option: --human-readable, -H
2.4 Option: --no-headers, -n
2.5 Option: --width
- 2.5.1 Understanding the --width Option's Output

Bash Use Cases

3.1 String Interpolation

Python Use Cases

4.1 Running qsv count on User's Input File Path

Conclusion

Part 1: Setup¶

1.1 Environment Notes¶

The notebook was run on Google Colab based on an Ubuntu 22.04 LTS environment, so you may need to modify the commands if you're running on a different OS (i.e. Windows) or missing any dependencies.
Commands are prepended by an exclamation point ! in this Jupyter notebook environment to execute them, but should be removed when using Bash on a terminal.

1.2 Downloading qsv¶

First, let's download qsv into our notebook from the releases page. We'll use qsv 0.111.0:

In [25]:

# Downloading the .zip file that contains qsv
!curl -LO https://github.com/jqnatividad/qsv/releases/download/0.111.0/qsv-0.111.0-x86_64-unknown-linux-gnu.zip
# Unzipping the .zip file into a folder
!unzip -o qsv-0.111.0-x86_64-unknown-linux-gnu.zip -d qsv-0.111.0-files
# Moving the qsv binary file from the folder into /bin to use the qsv command anywhere on our system
!cp qsv-0.111.0-files/qsv /bin

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 73.3M  100 73.3M    0     0  42.5M      0  0:00:01  0:00:01 --:--:-- 76.7M
Archive:  qsv-0.111.0-x86_64-unknown-linux-gnu.zip
  inflating: qsv-0.111.0-files/README  
  inflating: qsv-0.111.0-files/qsv   
  inflating: qsv-0.111.0-files/qsv_glibc-2.31  
  inflating: qsv-0.111.0-files/qsv_glibc-2.31_rust_version_info.txt  
  inflating: qsv-0.111.0-files/qsv_nightly  
  inflating: qsv-0.111.0-files/qsv_nightly_rust_version_info.txt  
  inflating: qsv-0.111.0-files/qsvdp  
  inflating: qsv-0.111.0-files/qsvdp_glibc-2.31  
  inflating: qsv-0.111.0-files/qsvdp_nightly  
  inflating: qsv-0.111.0-files/qsvlite  
  inflating: qsv-0.111.0-files/qsvlite_glibc-2.31  
  inflating: qsv-0.111.0-files/qsvlite_nightly

1.3 Downloading a CSV Data Set¶

Here is the main CSV data set I'll be using:

Data set	Source	Download Link	Rounded size
Indicators of Anxiety or Depression Based on Reported Frequency of Symptoms During Last 7 Days	https://catalog.data.gov/dataset/indicators-of-anxiety-or-depression-based-on-reported-frequency-of-symptoms-during-last-7-	https://data.cdc.gov/api/views/8pt5-q6wp/rows.csv?accessType=DOWNLOAD	2.1 MB

Let's download the data set into our notebook as data.csv.

In [26]:

# Downloading the .csv file as data.csv
!curl https://data.cdc.gov/api/views/8pt5-q6wp/rows.csv?accessType=DOWNLOAD -o data.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2144k    0 2144k    0     0  1894k      0 --:--:--  0:00:01 --:--:-- 1895k

Part 2: Let's Use `qsv count`!¶

Time to explore with qsv count! Let's start by simply getting the help message for qsv count.

2.1 Option: `--help`¶

As with any qsv command, we'll use the --help option to get the help message:

In [27]:

!qsv count --help

Prints a count of the number of records in the CSV data.

Note that the count will not include the header row (unless --no-headers is
given).

For examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_count.rs.

Usage:
    qsv count [options] [<input>]
    qsv count --help

count options:
    -H, --human-readable   Comma separate row count.
    --width                Also return the length of the longest record.
                           The count and width are separated by a semicolon.

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will be included in
                           the count.

2.2 Running `qsv count` On Our CSV¶

We may start by getting the default output for qsv count by running it on our data set. This should get us the number of non-header records (rows) in our CSV:

In [28]:

!qsv count data.csv

That's 13,671 non-header rows of data!

2.3 Option: `--human-readable`, `-H`¶

With the --human-readable option (or its alias -H), qsv should automatically add commas in the appropriate places to help us read the number better.

In [29]:

!qsv count data.csv --human-readable

13,671

2.4 Option: `--no-headers`, `-n`¶

What if we we want to also include the header row in the count (therefore counting all the rows in the CSV)?

We can use the --no-headers (or -n) option to include the header row in the count.

We should expect 13,672 as our output, including the commas by also using -H.

In [30]:

!qsv count data.csv -n -H

13,672

2.5 Option: `--width`¶

There's one more option that you might not expect.

What if we wanted to find out how long the longest row is in our data set, based on the number of characters it has?

The --width option should return the length of the longest record. The count and width are separated by a semicolon.

In [31]:

!qsv count data.csv --width

13671;237

2.5.1 Understanding the `--width` Option's Output¶

The longest record has 237 characters. But you may have some questions about this width output:

Does the width include the header if we don't specify the --no-headers option?
Does the width include the commas within the rows that separate the field values?

Let's find out with this simple CSV file we'll name sample.csv:

In [32]:

# Write our data to sample.csv
!echo 'letter,number' > sample.csv
!echo 'alpha,13' >> sample.csv
!echo 'beta,24' >> sample.csv
# Display the data from sample.csv
!cat sample.csv

letter,number
alpha,13
beta,24

First let's use --width without --no-headers.

Our initial assumption is that if the headers are not included then we should get 8 as the width because there are 8 total characters in the row alpha,13 when you also include the comma ,.

In [33]:

!qsv count sample.csv --width

2;9

Hmm... We get 9. Why is that?

In our CSV data, there's a sort of hidden character at the end of each row: the newline character \n. This is included as a character in the width for our rows, so we simply add 1 to our estimate of 8. We can also see that the comma is included in the width output.

To further verify both of these claims, let's run the command with --no-headers to try and include the header row in the width output. Based on what we've learned so far, we can expect that all characters including the commas , between field values and the newline \n at the end of the longest row are included in the width output. So for the header row letter,number we should expect a width of 12 + 1 + 1 = 14:

In [34]:

!qsv count sample.csv --width --no-headers

3;14

Awesome! Now that you have a better understanding of qsv count, try it out for yourself!

Part 3: Bash Use Cases¶

3.1 String Interpolation¶

Let's say I want to write a sentence that dynamically includes the count of a CSV file within it. For example, I want to print out:

There are 1,000,000 non-header rows of data in the data set!

The 1,000,000 is arbitrary, that is, it should be the output from using qsv count on a CSV file. Here's a Bash script using the echo command we can use to achieve this:

In [35]:

!echo "There are $(qsv count data.csv -H) non-header rows of data in the data set!"

There are 13,671 non-header rows of data in the data set!

Part 4: Python Use Cases¶

4.1 Running `qsv count` on User's Input File Path¶

Let's say we want to run a Python script where the user can simply enter the path to the CSV file (in our case we can just write data.csv) and then get the output of running qsv count on it. We can use the subprocess module to run qsv commands and print the output. Here's a sample script with comments to help understand how it works:

In [36]:

import subprocess

# Get user input for the CSV path
csv_path = input('Enter the path to your CSV file: ')

# Run qsv count on the CSV file with the -H option
command = ["qsv", "count", csv_path, "-H"]

# Get the qsv count output
subprocess_output = subprocess.run(command, capture_output=True)
# stdout - standard stream of output from our subprocess that runs count
# decode - convert the stdout output from bytes to string
# strip - remove any spaces/newline characters in the output
count = subprocess_output.stdout.decode().strip()

# Print the output of qsv count within a sentence
print(f"There are {count} non-header rows of data in the data set!")

Enter the path to your CSV file: data.csv
There are 13,671 non-header rows of data in the data set!

With this script we can now simply provide the file path and then get the qsv count output for it! Of course you may expand on this script with improvements such as:

Verifying the file exists and is a CSV.
Error handling with a try/except block and printing stderr.

5. Conclusion¶

In this notebook we covered example usage of qsv count for tallying the number of rows in a CSV file. We discussed all the options that are available for qsv count, and we also went further to discover how qsv count can be integrated in Bash and Python.

Intro to qsv count¶

Table of Contents¶

Part 1: Setup¶

1.1 Environment Notes¶

1.2 Downloading qsv¶

1.3 Downloading a CSV Data Set¶

Part 2: Let's Use qsv count!¶

2.1 Option: --help¶

2.2 Running qsv count On Our CSV¶

2.3 Option: --human-readable, -H¶

2.4 Option: --no-headers, -n¶

2.5 Option: --width¶

2.5.1 Understanding the --width Option's Output¶