Laser's cool website :)

Stop Parsing (unstructured) Text

Or don’t, I’m not your father.

The situation

Sometimes you want one very specific part of a command’s output. The classic problem is getting the IP address of an interface. Even today, the answers your favorite search engine returns for this will most likely involve grep, sed, awk and the like.

But what if I told you there’s a better way?

Using JSON for structuring data

Some tools can output their data as JSON. That data can be processed with the venerable jq utility. You get more control over the processing, and the output format is less likely to change than human-readable text. Let’s look at what that means in practice.

Yes, I’m aware that we’re still parsing text…

The IP problem

Let’s say I want to get the current public IPv6 address from my network interface. A good first step is to run the utility you’re using with JSON output and pipe it into jq with the identity filter (.). This is fancy-talk for “just give the data back to me”. If jq can’t work with the data you’ll get an error; otherwise the output is formatted for better readability. Let’s give that a try.

$ ip -6 -json address show scope global dev end0 | jq .
[
  {
    "ifindex": 2,
    "ifname": "end0",
    "flags": [
      "BROADCAST",
      "MULTICAST",
      "UP",
      "LOWER_UP"
    ],
    "mtu": 1500,
    "qdisc": "mq",
    "operstate": "UP",
    "group": "default",
    "txqlen": 1000,
    "altnames": [
      "enx463a53cb99e5"
    ],
    "addr_info": [
      {
        "family": "inet6",
        "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 6726,
        "preferred_life_time": 3126
      },
      {
        "family": "inet6",
        "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "deprecated": true,
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 1815,
        "preferred_life_time": 0
      },
      {
        "family": "inet6",
        "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 6726,
        "preferred_life_time": 3126
      },
      {}
    ]
  }
]
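
As a side note on the error behaviour mentioned earlier, here is a minimal sketch of using the identity filter as a sanity check before writing any real filter. The helper name is_json and both input strings are made up for illustration; the only thing relied on is that jq exits with a non-zero status when its input doesn’t parse.

```shell
# Hypothetical helper: succeeds only if stdin parses as JSON.
is_json() {
  jq . >/dev/null 2>&1
}

# Hand-written inputs, not real tool output.
if printf '%s\n' '[{"ifname": "end0"}]' | is_json; then
  echo "JSON: safe to write a filter against it"
fi

if ! printf '%s\n' 'end0: not JSON at all' | is_json; then
  echo "not JSON: check the output flag of the tool"
fi
```

In a script, this kind of check lets you bail out early instead of feeding garbage into a longer pipeline.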

The output of ip is an array (denoted by the outer square brackets) containing a single object (denoted by the curly braces just inside them). We are interested in the "addr_info" property, which holds another array. Let’s unwrap that first, which is very straightforward:

$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[]'
{
  "family": "inet6",
  "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{
  "family": "inet6",
  "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "deprecated": true,
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 1815,
  "preferred_life_time": 0
}
{
  "family": "inet6",
  "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{}

We get four objects, the last one being empty. Let’s remove that one, because I know from experience that it’ll cause issues:

$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[] | select (length > 0)'
{
  "family": "inet6",
  "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{
  "family": "inet6",
  "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "deprecated": true,
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 1815,
  "preferred_life_time": 0
}
{
  "family": "inet6",
  "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
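
One likely reason the empty object causes trouble (my reading, not necessarily the only one): as soon as a string function such as startswith is applied to a property, the empty object yields null for that property, and jq treats that as an error. A minimal sketch on a hand-written, trimmed sample rather than real ip output:

```shell
# Hand-written sample: one address entry plus an empty object.
sample='[{"addr_info":[{"local":"fdaa:66e:6af0:0:443a:53ff:fecb:99e5"},{}]}]'

# Without a guard: .local is null for {}, and startswith on null is an error,
# so jq exits with a non-zero status.
if printf '%s\n' "$sample" \
    | jq '.[].addr_info[] | select(.local | startswith("fdaa"))' >/dev/null 2>&1; then
  unguarded="ok"
else
  unguarded="failed"
fi

# With the length check first, `and` is already false for {} and the
# startswith call is never reached for it.
guarded=$(printf '%s\n' "$sample" \
  | jq --raw-output '.[].addr_info[]
      | select((length > 0) and (.local | startswith("fdaa")))
      | .local')

echo "unguarded: $unguarded"
echo "guarded:   $guarded"
```

The sketch spells the iterator as .addr_info[] instead of .addr_info.[]; as far as I know, jq releases before 1.7 only accept the former, while newer ones accept both.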

A good start. Now to our actual criteria. In my case, I want the public address, not the one starting with "fdaa" (that’s a Unique Local Address); selecting the opposite works much the same way. The address also shouldn’t be deprecated. From the results, we’re interested in the value of the "local" property. The criteria of the select operation can be chained with the and operator (and others, see https://jqlang.org/manual/#and-or-not).

$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[] | select ((length > 0) and (.deprecated != true) and (.local | startswith("fdaa") | not)).local'

This gives the following:

"2001:9e8:473c:bd00:443a:53ff:fecb:99e5"

We’re almost done. We want the string without quotation marks, because the shell would treat them as literal characters. In other words, we want the raw data, and for this jq has --raw-output. Let’s test that we read the value correctly by comparing it to what an online service returns as our address:

$ [[ $(ip -6 -json address show scope global dev end0 | jq --raw-output '.[].addr_info.[] | select ((length > 0) and (.deprecated != true) and (.local | startswith("fdaa") | not)).local') == $(curl https://api6.ipify.org) ]] && echo "Same string"
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100    38  100    38    0     0     78      0 --:--:-- --:--:-- --:--:--    78
Same string

Looking good!
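
If you need this more than once, the filter can be wrapped in a small shell function. The following is only a sketch under a few assumptions: the function name public_ipv6 and the idea of passing the unwanted prefix as an argument are mine, and the sample is a hand-trimmed copy of the output shown earlier so it can be tried on a machine without an end0 interface. The prefix is handed to jq with --arg instead of being pasted into the filter string.

```shell
# Hypothetical helper: read `ip -json address show` output on stdin and print
# every non-deprecated address that does not start with the given prefix.
public_ipv6() {
  skip_prefix=${1:-fdaa}   # prefix to exclude; defaults to the ULA prefix used here
  jq --raw-output --arg skip "$skip_prefix" '
    .[].addr_info[]
    | select((length > 0)
             and (.deprecated != true)
             and (.local | startswith($skip) | not))
    | .local'
}

# Hand-trimmed sample based on the output shown earlier.
sample='[{"ifname":"end0","addr_info":[
  {"family":"inet6","local":"2001:9e8:473c:bd00:443a:53ff:fecb:99e5","scope":"global"},
  {"family":"inet6","local":"2001:9e8:4738:2a00:443a:53ff:fecb:99e5","scope":"global","deprecated":true},
  {"family":"inet6","local":"fdaa:66e:6af0:0:443a:53ff:fecb:99e5","scope":"global"},
  {}
]}]'

printf '%s\n' "$sample" | public_ipv6
```

On a real system, the function would be fed from the tool itself: ip -6 -json address show scope global dev end0 | public_ipv6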

Other use-cases

Quite a few tools support JSON output, for example journalctl, ffprobe and mediainfo off the top of my head. This isn’t a complete guide to using all of them, but let’s look at two more examples:

$ ffprobe -of json -show_streams big_buck_bunny_480p_h264.mov  | jq --raw-output '.streams.[] | select (.codec_type == "video").bit_rate'
(lovely ffprobe information)
2899884
$ LASTCONNECT=$(journalctl -b -u murmur -o json | jq --raw-output -s '. | map(select (.MESSAGE | contains("New connection"))) | last | .__REALTIME_TIMESTAMP')
$ date -d @${LASTCONNECT:0:-6}
Sat May  3 07:01:17 PM UTC 2025

Finding out what the last command does, and why it’s different from the others so far, is left as an exercise for the reader.

Conclusion

With JSON output, extracting data from CLI utilities is much more exact. I believe you should filter as early as possible (meaning already at the tool that outputs your JSON), but for more complex scenarios, or ones the tool can’t handle itself, this is a powerful tool.
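
To make the “filter early” point concrete, here is a rough sketch of what the jq side has to do when the tool is asked for everything instead. The sample is hand-written and only shaped like unfiltered ip -json address show output; the lo entry, the IPv4 address and the link-local address are assumptions for illustration, not data from my machine.

```shell
# Hand-written sample: two interfaces, mixed address families and scopes.
sample='[
  {"ifname":"lo","addr_info":[
    {"family":"inet6","local":"::1","scope":"host"}
  ]},
  {"ifname":"end0","addr_info":[
    {"family":"inet","local":"192.0.2.10","scope":"global"},
    {"family":"inet6","local":"2001:9e8:473c:bd00:443a:53ff:fecb:99e5","scope":"global"},
    {"family":"inet6","local":"fe80::443a:53ff:fecb:99e5","scope":"link"}
  ]}
]'

# jq now has to redo what `-6`, `scope global` and `dev end0` did at the source.
global_v6=$(printf '%s\n' "$sample" | jq --raw-output '
  .[]
  | select(.ifname == "end0")
  | .addr_info[]
  | select(.family == "inet6" and .scope == "global")
  | .local')

echo "$global_v6"
```

It works, but every condition the tool could have applied for you is one more thing to get right in the filter.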

#linux