Intro to jq

General usage

jq --help
jq - commandline JSON processor [version 1.4]
Usage: jq [options] <jq filter> [file...]
For a description of the command line options and
how to write jq filters (and why you might want to)
see the jq manpage, or the online documentation at
http://stedolan.github.com/jq

Try looking at the raw JSON:

cat data.json
{"took":23,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":614,"max_score":1,"hits":[…]}}

Pretty ugly, right? Let's see if we can't make it more readable.

Basic filters

The absolute simplest (and least interesting) filter is .

It simply takes its input and produces it unchanged as output.

jq . < data.json
{
  "took": 23,
  "timed_out": false,
  "_shards": { … },
  "hits": {
    "total": 614,
    "max_score": 1,
    "hits": [ … ]
  }
}

Wow, that's a lot of data! What if we just want to know what the top-level keys are? The built-in function "keys" comes to the rescue: it returns an object's keys as a sorted array.

jq 'keys' < data.json
[
  "_shards",
  "hits",
  "timed_out",
  "took"
]

Much better! Now that we can see which keys there are, let's look at what's actually stored under the _shards key. The simplest useful filter is .foo

When given a JSON object (aka dictionary or hash) as input, it produces the value at the key "foo", or null if there's none present.

jq '._shards' < data.json
{
  "total": 5,
  "successful": 5,
  "failed": 0
}
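
As a minimal sketch of the .foo behaviour described above, the snippet below runs three lookups against a small inline document. The "sample" variable and the "no_such_key" name are made up for illustration and are not part of data.json; the -c flag just prints compact output.

```shell
# Hypothetical inline sample, shaped like the top of data.json.
sample='{"took":23,"_shards":{"total":5,"successful":5,"failed":0}}'

# A key that exists produces its value.
present=$(printf '%s' "$sample" | jq -c '.took')

# A key that does not exist produces null rather than an error.
missing=$(printf '%s' "$sample" | jq -c '.no_such_key')

# Lookups can be chained to reach nested values.
nested=$(printf '%s' "$sample" | jq -c '._shards.failed')

echo "$present"   # 23
echo "$missing"   # null
echo "$nested"    # 0
```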

Object construction, array querying

We can create a custom object using object construction syntax similar to JavaScript's: { key: value }. The difference is that each value is a jq expression evaluated against the input. Keys are taken literally unless you wrap them in parentheses, in which case they are evaluated as expressions too:

jq '{ shards_successful: ._shards.successful, hits_total: .hits.total }' < data.json
{
  "shards_successful": 5,
  "hits_total": 614
}
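
To make the note about parenthesized keys concrete, here is a small sketch comparing a literal key with a computed one. The "sample" document and its "name" and "value" fields are assumptions invented for this example.

```shell
# Hypothetical inline sample; not taken from data.json.
sample='{"name":"took","value":23}'

# A bare key is used literally: the output key is the word "name".
literal=$(printf '%s' "$sample" | jq -c '{ name: .value }')

# A parenthesized key is evaluated: the output key is the value of .name.
computed=$(printf '%s' "$sample" | jq -c '{ (.name): .value }')

echo "$literal"    # {"name":23}
echo "$computed"   # {"took":23}
```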

Array slicing takes a start index (inclusive) and an end index (exclusive), so [0:5] selects the first five elements:

jq '{ total: .hits.total, hits: .hits.hits[0:5] }' < data.json
{
  "total": 614,
  "hits": [
    {
      "_index": "exampledata",
      "_type": "datapoint",
      "_id": "AU7-aFXJPJ54-NJYtvNb",
      "_score": 1,
      "_source": {
        "id": 174,
        "date": "2014-12-14T03:49:00Z",
        "value": 2
      }
    },
    { … }
  ]
}

Not bad, but let's clean up those hits to make them more readable. As in a shell, we can "pipe" filters and functions together with |, making the output of one the input of the next. We'll also use the built-in "map" function, which applies a filter to every element of an array:

jq '{ total: .hits.total, hits: .hits.hits[0:5]|map(._source) }' < data.json
{
  "total": 614,
  "hits": [
    {
      "id": 174,
      "date": "2014-12-14T03:49:00Z",
      "value": 2
    },
    { … }
  ]
}

Pretty good! Though perhaps we don't want to leak those object ids, so let's remove them with the built-in "del" function:

jq '{ total: .hits.total, hits: .hits.hits[0:5]|map(._source|del(.id)) }' < data.json
{
  "total": 614,
  "hits": [
    {
      "date": "2014-12-14T03:49:00Z",
      "value": 2
    },
    { … }
  ]
}
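
The slice, pipe, map and del steps can also be tried on a self-contained input. The sketch below uses a made-up three-element "hits" array (an assumption, much smaller than the real data) and also shows that map(f) behaves like the longer form [.[] | f].

```shell
# Hypothetical sample shaped like the .hits.hits array in data.json.
hits='[{"_source":{"id":1,"value":2}},{"_source":{"id":2,"value":7}},{"_source":{"id":3,"value":9}}]'

# Take the first two hits, keep only _source, and drop its id.
with_map=$(printf '%s' "$hits" | jq -c '.[0:2] | map(._source | del(.id))')

# The same pipeline with map written out: iterate, filter, collect.
expanded=$(printf '%s' "$hits" | jq -c '.[0:2] | [.[] | ._source | del(.id)]')

echo "$with_map"   # [{"value":2},{"value":7}]
echo "$expanded"   # [{"value":2},{"value":7}]
```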

If you want to experiment without installing jq, there's an online playground available. Otherwise, pre-built binaries are available for Linux, OS X, Solaris, and Windows, and jq can also be installed through popular package managers such as Homebrew on OS X and Aptitude on Ubuntu and Debian Linux.

Mat Gadd