Stop Parsing (unstructured) Text
Or don’t, I’m not your father.
The situation
Sometimes, you want to get a very specific part of a command output.
The classic problem is to get the IP address of an interface.
Even today, the results for this query on your favorite search engine
will most likely involve grep, sed, awk and the like.
But what if I told you there’s a better way?
Using JSON for structuring data
Some tools can output their data in JSON format.
That data can be processed with the venerable jq utility.
You get more control over the processing,
and the output format is less likely to change.
Let’s look at what that means in practice.
Yes, I’m aware that we’re still parsing text…
The IP problem
Let’s say I want to get the current public IPv6 address from my network interface.
A good first step is to run the utility you’re using with JSON output
and pipe it into jq with the identity filter (.).
This is fancy-talk for “just give the data back to me”.
If jq can’t work with the data, you’ll get an error;
otherwise the output comes back formatted for better readability.
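As a minimal sketch of both behaviours, the following feeds jq one valid JSON document and one line of plain text. Both inputs are made up for illustration and are not real tool output; the only assumption is that jq is installed.

```shell
# Valid JSON passes through the identity filter and is pretty-printed.
valid_output=$(printf '%s' '{"ifname":"end0","mtu":1500}' | jq .)
printf '%s\n' "$valid_output"

# Plain text is not JSON, so jq exits with a non-zero status,
# which a script can check before going any further.
if printf '%s' 'end0: mtu 1500' | jq . >/dev/null 2>&1; then
    invalid_status="accepted"
else
    invalid_status="rejected"
fi
printf 'plain text was %s\n' "$invalid_status"
```

Checking the exit status like this is a cheap way to notice that a tool did not actually produce JSON, for example because the JSON flag was misspelled.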
Let’s give that a try.
$ ip -6 -json address show scope global dev end0 | jq .
[
  {
    "ifindex": 2,
    "ifname": "end0",
    "flags": [
      "BROADCAST",
      "MULTICAST",
      "UP",
      "LOWER_UP"
    ],
    "mtu": 1500,
    "qdisc": "mq",
    "operstate": "UP",
    "group": "default",
    "txqlen": 1000,
    "altnames": [
      "enx463a53cb99e5"
    ],
    "addr_info": [
      {
        "family": "inet6",
        "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 6726,
        "preferred_life_time": 3126
      },
      {
        "family": "inet6",
        "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "deprecated": true,
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 1815,
        "preferred_life_time": 0
      },
      {
        "family": "inet6",
        "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
        "prefixlen": 64,
        "scope": "global",
        "dynamic": true,
        "mngtmpaddr": true,
        "noprefixroute": true,
        "valid_life_time": 6726,
        "preferred_life_time": 3126
      },
      {}
    ]
  }
]
Looking at the data, we get an array (denoted by the outer square brackets)
containing a single object (denoted by the curly braces just inside them).
We are interested in the data contained in the "addr_info" property,
which holds another array. Let’s unwrap that data first because that is very straightforward:
$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[]'
{
  "family": "inet6",
  "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{
  "family": "inet6",
  "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "deprecated": true,
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 1815,
  "preferred_life_time": 0
}
{
  "family": "inet6",
  "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{}
We get four objects, the last one being empty. Let’s remove that one, because I know from experience that it’ll create issues once we start inspecting properties it doesn’t have:
$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[] | select (length > 0)'
{
  "family": "inet6",
  "local": "2001:9e8:473c:bd00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
{
  "family": "inet6",
  "local": "2001:9e8:4738:2a00:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "deprecated": true,
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 1815,
  "preferred_life_time": 0
}
{
  "family": "inet6",
  "local": "fdaa:66e:6af0:0:443a:53ff:fecb:99e5",
  "prefixlen": 64,
  "scope": "global",
  "dynamic": true,
  "mngtmpaddr": true,
  "noprefixroute": true,
  "valid_life_time": 6726,
  "preferred_life_time": 3126
}
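To illustrate why the empty object is worth filtering out, here is a small sketch on made-up sample data (the address uses the 2001:db8::/32 documentation prefix and is not taken from the output above). It assumes a filter that calls startswith on .local. For the empty object, .local is null, and startswith only accepts strings.

```shell
# Hypothetical sample data: one entry with an address plus a trailing empty object.
sample='[{"local":"2001:db8::1"},{}]'

# Without the guard, startswith receives null for the empty object and jq aborts.
if printf '%s' "$sample" | jq '.[] | select(.local | startswith("fdaa") | not)' >/dev/null 2>&1; then
    unguarded="ok"
else
    unguarded="error"
fi

# With the guard, the empty object is dropped before .local is inspected.
guarded=$(printf '%s' "$sample" | jq --raw-output '.[] | select((length > 0) and (.local | startswith("fdaa") | not)).local')

printf 'unguarded: %s\nguarded: %s\n' "$unguarded" "$guarded"
```

The guard works because the length check comes first: for the empty object it is already false, so the part after the and is never evaluated.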
A good start. Now to our actual criteria.
In my case, I don’t want the address starting with "fdaa"
(it’s a Unique Local Address), but the public one.
The approach for the opposite case is very similar.
The address also shouldn’t be deprecated.
From the results, we’re interested in the value of the "local" property.
The criteria of the select operation can be chained with the and operator
(and others, see https://jqlang.org/manual/#and-or-not).
$ ip -6 -json address show scope global dev end0 | jq '.[].addr_info.[] | select ((length > 0) and (.deprecated != true) and (.local | startswith("fdaa") | not)).local'
This gives the following:
"2001:9e8:473c:bd00:443a:53ff:fecb:99e5"
We’re almost done. We want the string without quotation marks,
because the shell would treat them as literal characters.
We want the raw data. For this, jq has --raw-output.
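The difference is easy to see on a tiny made-up document (the address is from the 2001:db8::/32 documentation prefix, not from my interface). This sketch captures the same value with and without --raw-output and compares both against the plain address.

```shell
# A JSON string as jq emits it by default, quotation marks included.
quoted=$(printf '%s' '{"local":"2001:db8::1"}' | jq '.local')
# The same value with --raw-output: only the characters of the string itself.
raw=$(printf '%s' '{"local":"2001:db8::1"}' | jq --raw-output '.local')

printf 'default: %s (%d characters)\n' "$quoted" "${#quoted}"
printf 'raw:     %s (%d characters)\n' "$raw" "${#raw}"

# Only the raw form compares equal to the plain address.
[ "$raw" = "2001:db8::1" ] && echo "raw matches"
[ "$quoted" = "2001:db8::1" ] || echo "quoted does not match"
```

The two extra characters in the default output are the quotation marks, which is exactly what would break a string comparison in the shell.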
Let’s test that we read the value correctly
by comparing to what an online service returns as our address:
$ [[ $(ip -6 -json address show scope global dev end0 | jq --raw-output '.[].addr_info.[] | select ((length > 0) and (.deprecated != true) and (.local | startswith("fdaa") | not)).local') == $(curl https://api6.ipify.org) ]] && echo "Same string"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 38 100 38 0 0 78 0 --:--:-- --:--:-- --:--:-- 78
Same string
Looking good!
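For the opposite case mentioned earlier, picking the Unique Local Address instead of the public one, a sketch could look like the following. The sample data is hypothetical and only shaped like the addr_info array above, so it runs without a matching network interface. The path is written as .addr_info[] here because, as far as I know, the .addr_info.[] spelling needs a fairly recent jq, while the form without the extra dot is also accepted by older releases.

```shell
# Hypothetical sample data shaped like the ip output above;
# the public addresses use the 2001:db8::/32 documentation prefix.
sample='[{"addr_info":[
  {"local":"2001:db8:1::1","scope":"global"},
  {"local":"2001:db8:2::1","scope":"global","deprecated":true},
  {"local":"fdaa:66e:6af0::1","scope":"global"},
  {}
]}]'

# Same criteria as before, minus the "not":
# keep the non-deprecated address that starts with "fdaa".
ula=$(printf '%s' "$sample" | jq --raw-output '.[].addr_info[] | select((length > 0) and (.deprecated != true) and (.local | startswith("fdaa"))).local')

printf '%s\n' "$ula"
```

On a real system, the printf that feeds the sample data would be replaced by the ip command used throughout this section.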
Other use-cases
Quite a few tools support JSON output,
for example journalctl, ffprobe and mediainfo, off the top of my head.
This isn’t a complete guide to using all of them, but let’s look at two more examples:
$ ffprobe -of json -show_streams big_buck_bunny_480p_h264.mov | jq --raw-output '.streams.[] | select (.codec_type == "video").bit_rate'
(lovely ffprobe information)
2899884
$ LASTCONNECT=$(journalctl -b -u murmur -o json | jq --raw-output -s '. | map(select (.MESSAGE | contains("New connection"))) | last | .__REALTIME_TIMESTAMP')
$ date -d @${LASTCONNECT:0:-6}
Sat May 3 07:01:17 PM UTC 2025
Finding out what the last command does and why it’s different from the others so far is left as an exercise for the reader.
Conclusion
Using the JSON format, extracting data from CLI utilities is much more exact. I believe you should filter as early as possible (meaning already at the tool that outputs your JSON), but for more complex or otherwise impossible scenarios, this is a powerful tool.