Wrangling JSON and YAML: Practical jq and yq Techniques for Linux

The Headache of Structured Data in the Terminal

I remember a 3:00 AM incident where I had to find a specific container image version buried in a 2,400-line Kubernetes deployment file. I tried using grep and awk, but YAML’s nested structure made it a nightmare. One tiny indentation change broke my regular expression entirely. If you have ever stared at a 50KB wall of minified JSON from a curl response, you know the frustration.

DevOps workflows and backend systems run on JSON and YAML. We deal with them in AWS CLI outputs, Docker Compose files, and GitHub Actions. Managing this data with standard text tools is like trying to eat soup with a fork. It is simply the wrong tool for the job.

Why Traditional Tools Fail

Standard utilities like grep, sed, and awk are line-oriented. They treat your file as a sequence of strings separated by newlines. However, JSON and YAML are tree-oriented structures. A key named “status” might appear 15 times in different nested objects. Running grep "status" gives you 15 lines of output with zero context about where they belong.
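
To make the difference concrete, here is a minimal sketch using a made-up document (the db and jobs keys are hypothetical). grep can only say that a line matched, while jq can report the full path to each status key.

```shell
# Hypothetical sample: "status" appears at four different nesting levels.
json='{"status":"ok","db":{"status":"degraded"},"jobs":[{"status":"done"},{"status":"failed"}]}'

# grep counts matching lines; the whole document is one line, so it prints 1.
printf '%s\n' "$json" | grep -c '"status"'

# jq walks the tree: list the path and value of every string stored under a "status" key.
status_paths=$(printf '%s\n' "$json" | jq -r '
  paths(type == "string") as $p
  | select($p[-1] == "status")
  | "\($p | map(tostring) | join(".")) = \(getpath($p))"
')
printf '%s\n' "$status_paths"
```

The jq half prints one line per key, such as db.status = degraded, so each value arrives with the context grep drops.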

Minified files create another hurdle. JSON parsers don’t care about whitespace, but grep does. A single-line file and a 100-line indented file are identical to an application, yet they break manual parsing scripts. This lack of structural awareness is why manual regex often fails when a schema shifts slightly.
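
A quick way to see this is to feed both layouts through jq and compare the results. The sketch below uses a hypothetical two-key document. The line counts differ, but the normalized output does not.

```shell
# The same hypothetical document, once indented and once minified.
pretty='{
  "name": "web",
  "replicas": 3
}'
minified='{"name":"web","replicas":3}'

# Line-oriented tools see two very different files...
pretty_lines=$(printf '%s\n' "$pretty" | grep -c '')
minified_lines=$(printf '%s\n' "$minified" | grep -c '')
echo "lines: pretty=$pretty_lines minified=$minified_lines"

# ...but once jq normalizes them (-c compact, -S sorted keys) they are identical.
a=$(printf '%s\n' "$pretty" | jq -cS '.')
b=$(printf '%s\n' "$minified" | jq -cS '.')
if [ "$a" = "$b" ]; then echo "same document"; fi
```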

Choosing the Right Path

When you need to process these formats, you generally have three options:

Manual Scripting (Python/Node.js): You can write a script using json.load(). It works, but writing 10 lines of code just to check a single value is slow and inefficient.
Standard Linux Tools: As mentioned, grep and sed are brittle. They often fail on complex nested data.
Dedicated Parsers (jq and yq): These tools understand data syntax. They treat files as searchable objects rather than flat text.

Instead of writing and maintaining a Python script for every lookup, you can run a single jq command. jq is a small C program with no interpreter to start, so it typically gets through a 10MB JSON file in a fraction of a second. On a production Ubuntu server with only 4GB of RAM, this efficiency matters. It lets you query data with short path expressions such as .profile.email, chained together with pipes.

Getting Set Up

Most distributions do not include these tools by default. jq is in the standard repositories, while yq is easiest to install as a prebuilt binary from its GitHub releases page. On Ubuntu or Debian, installation takes seconds:

sudo apt update
sudo apt install jq -y

# For yq, this guide uses the Go version by Mike Farah (the binary below is the amd64 build)
sudo wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq
sudo chmod +x /usr/bin/yq
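
After installing, it is worth confirming that both tools are actually on your PATH. This is a small sketch, not part of either tool's official instructions. The variable name missing is my own.

```shell
# Report which of the two tools are on PATH and print their versions.
missing=""
for tool in jq yq; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" --version
  else
    echo "$tool: not found on PATH"
    missing="$missing $tool"
  fi
done
```

For yq, the version line should mention mikefarah. If it does not, you may have a different tool with the same name, and the commands in this article may not behave as shown.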

Using jq for JSON Processing

The first step with jq is usually just making the data readable. If you receive a minified response from a GitHub API call, run:

# Quote the URL so the shell does not treat ? as a glob character; -L follows redirects
curl -sL 'https://api.github.com/repos/stedolan/jq/commits?per_page=1' | jq '.'

The '.' filter is the identity filter, the simplest tool in the box. It passes the input through unchanged, but jq pretty-prints the result and adds color coding when the output goes to a terminal.
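
Because jq has to parse the input before it can print it, the same step doubles as a validity check. The sketch below wraps that in a helper function. The name is_valid_json is my own, not something jq provides.

```shell
# jq exits non-zero when the input is not valid JSON.
# The "empty" filter produces no output, so only the exit status matters here.
is_valid_json() {
  printf '%s' "$1" | jq empty >/dev/null 2>&1
}

if is_valid_json '{"ok": true}'; then echo "valid"; fi
if ! is_valid_json '{"ok": true'; then echo "invalid: missing closing brace"; fi

# -c does the opposite of pretty-printing: it minifies to a single line.
compact=$(printf '%s' '{ "ok" : true }' | jq -c '.')
echo "$compact"
```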

Extracting Specific Values

Let’s get specific. If you have a JSON object for a user and only need their email, use dot notation. It drills down through the hierarchy quickly:

# Input: {"id": 1, "profile": {"email": "user@example.com"}}
echo '{"id": 1, "profile": {"email": "user@example.com"}}' | jq '.profile.email'
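
Two details matter as soon as you use the result in a script: string output keeps its JSON quotes unless you ask for raw output, and a missing key gives null instead of an error. Here is a minimal sketch with a placeholder record. The phone key is deliberately absent.

```shell
# Hypothetical user record (the address is a placeholder).
user='{"id": 1, "profile": {"email": "user@example.com"}}'

# Default output is JSON, so strings keep their double quotes.
quoted=$(printf '%s' "$user" | jq '.profile.email')

# -r (raw output) strips the quotes, which is what shell variables usually need.
email=$(printf '%s' "$user" | jq -r '.profile.email')

# A missing key yields null rather than an error; // supplies a fallback value.
phone=$(printf '%s' "$user" | jq -r '.profile.phone // "n/a"')

echo "quoted=$quoted email=$email phone=$phone"
```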

Arrays and Filtering

Arrays introduce more complexity, but jq handles them gracefully. Suppose you need the names of all users who are currently “active”:

# Sample: [{"name": "Alice", "active": true}, {"name": "Bob", "active": false}]
jq '.[] | select(.active == true) | .name' users.json

The .[] operator iterates over the array. Then, select() filters the items, and .name pulls the final value.
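
The same pipeline can be shaped in a few ways depending on what the next step needs. This sketch uses an inline sample with a third, made-up user so that it runs without a users.json file.

```shell
users='[{"name":"Alice","active":true},{"name":"Bob","active":false},{"name":"Cara","active":true}]'

# One raw name per line, convenient for shell loops.
active_names=$(printf '%s' "$users" | jq -r '.[] | select(.active) | .name')

# map() applies a filter to every element and keeps the result as an array.
active_json=$(printf '%s' "$users" | jq -c 'map(select(.active) | .name)')

# length counts the elements that survived the filter.
active_count=$(printf '%s' "$users" | jq 'map(select(.active)) | length')

printf '%s\n' "$active_names"
echo "$active_json ($active_count active)"
```

Since .active is already a boolean, select(.active) behaves the same as select(.active == true) for this data.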

Using yq for YAML Processing

Switching to YAML is easy because yq uses a very similar syntax. If you are checking a docker-compose.yml file to see which images your services use, run this command:

yq '.services[].image' docker-compose.yml
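
If you want to try this without touching a real project, the sketch below writes a small hypothetical Compose file to a temporary path and queries it. It assumes the Mike Farah yq (v4) installed earlier. The service names and image tags are made up.

```shell
# Write a small hypothetical Compose file to a temporary location.
compose=$(mktemp)
cat > "$compose" <<'EOF'
services:
  web:
    image: nginx:1.25
  db:
    image: postgres:16
EOF

# All images, one per line.
images=$(yq '.services[].image' "$compose")

# The image of a single named service.
web_image=$(yq '.services.web.image' "$compose")

printf '%s\n' "$images"
echo "web uses $web_image"
rm -f "$compose"
```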

Editing Files in Place

Editing is where yq truly stands out. I use it in CI/CD pipelines to update image tags before a deployment. It avoids the mess of temporary files:

yq -i '.services.web.image = "my-app:v2.0.5"' docker-compose.yml

The -i flag modifies the file directly. This is much safer than using sed, which might accidentally replace the wrong string elsewhere in the file.
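
In a pipeline the tag usually comes from a variable rather than a literal. One way to handle that, sketched here under the assumption that you are on yq v4, is the strenv() operator, which reads an environment variable as a string. The variable name TAG and the sample file are hypothetical.

```shell
compose=$(mktemp)
cat > "$compose" <<'EOF'
services:
  web:
    image: my-app:v2.0.4
  db:
    image: postgres:16
EOF

# TAG stands in for a value your CI system would provide.
TAG="v2.0.5" yq -i '.services.web.image = ("my-app:" + strenv(TAG))' "$compose"

# Only the addressed key changes; the db service is left alone.
new_web=$(yq '.services.web.image' "$compose")
db_after=$(yq '.services.db.image' "$compose")
echo "web=$new_web db=$db_after"
rm -f "$compose"
```

Passing the value through the environment avoids splicing shell variables into the yq expression, which keeps quoting problems out of the picture.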

Cross-format Workflows

Sometimes you have YAML but your API requires JSON. You can pipe these tools together to bridge the gap: yq converts the document, and jq queries the result like any other JSON stream.

# Convert YAML to JSON, then extract a specific value
yq -o=json '.' config.yaml | jq '.database.port'
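
Here is a self-contained version of that pipeline, assuming yq v4 and a made-up config file. It also shows that for a simple lookup yq alone gives the same answer, so the extra hop through jq pays off mainly when you need jq-specific filters or a JSON payload.

```shell
config=$(mktemp)
cat > "$config" <<'EOF'
database:
  host: db.internal
  port: 5432
EOF

# yq converts YAML to JSON; jq then queries it.
port=$(yq -o=json '.' "$config" | jq '.database.port')

# For a simple lookup, yq can read the value directly.
port_direct=$(yq '.database.port' "$config")

echo "port=$port direct=$port_direct"
rm -f "$config"
```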

Sometimes the Terminal Isn’t Enough

Terminal tools are efficient, but sometimes a visual interface helps. This is especially true when debugging complex structures with colleagues or validating snippets copied from messy logs. In those cases, I use ToolCraft.

Their JSON Formatter & Validator is excellent because it runs entirely in your browser. Since it is client-side, your production data never hits their servers. If you need to switch formats, the YAML ↔ JSON Converter is faster than looking up yq flags when you are in a hurry.

Final Thoughts

Learning jq and yq saves hours of manual searching. Start with basic key lookups. As you get comfortable, explore map() and reduce. These tools allow you to treat infrastructure as code with actual precision. Once you stop treating JSON as plain text, your command line becomes much more powerful.
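
As a starting point for that next step, here is a minimal sketch of map() and reduce on a made-up list of log files. Both approaches compute the same total. reduce is the more general of the two because it can build any value, not just a sum.

```shell
files='[{"name":"a.log","size":120},{"name":"b.log","size":300},{"name":"c.log","size":80}]'

# map() transforms every element; add sums the resulting array.
total_map=$(printf '%s' "$files" | jq 'map(.size) | add')

# reduce folds the array into one value: start at 0, add each size in turn.
total_reduce=$(printf '%s' "$files" | jq 'reduce .[] as $f (0; . + $f.size)')

# reduce can also build objects, for example a name-to-size lookup table.
by_name=$(printf '%s' "$files" | jq -c 'reduce .[] as $f ({}; .[$f.name] = $f.size)')

echo "map=$total_map reduce=$total_reduce"
echo "$by_name"
```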