Deep pcap Analysis with tshark on Linux: Filter Protocols, Extract Fields, and Generate Traffic Statistics

You’ve got a pcap file from tcpdump or Wireshark. Now you need to answer something from it. If you’re SSHed into a remote server or working inside a CI pipeline, the Wireshark GUI isn’t an option. tshark fills that gap — it’s the CLI front-end to Wireshark’s full dissection engine. Once you learn the field extraction syntax, raw packet captures become structured data you can query directly from the shell.

I’ve used this on production incident investigations — parsing captures from multiple nodes simultaneously, including files well over 100MB. It handles that without complaint.


Approach Comparison: tshark vs tcpdump vs Scripted Dissection

When analyzing an existing pcap file, you have three realistic paths:

Option 1: Re-read with tcpdump

Reading a pcap back with tcpdump -r file.pcap is lightweight and available almost everywhere. But tcpdump’s output is line-oriented text with limited protocol awareness: good for a quick eyeball, not useful for extracting specific fields or computing per-protocol byte counts.

Option 2: Parse raw bytes with Python (scapy / dpkt)

scapy gives you full Python control. Iterate every packet, build any statistic you want. The downside is real: setup overhead, sluggish performance on files above 50MB, and you end up reimplementing protocol parsing that Wireshark already handles correctly — including edge cases you’d never think of.

Option 3: tshark with field extraction

tshark ships with Wireshark and understands 3000+ protocols out of the box. Filter with display filters (same syntax as Wireshark’s GUI), extract named fields into tab-separated columns, and pipe the output to awk, sort, or Python. It’s the fastest path from a pcap file to a concrete answer.

Pros and Cons

tshark strengths

Full Wireshark dissection engine — correct protocol parsing, including TLS handshakes, HTTP/2, DNS over TCP, etc.
Display filter syntax is expressive and well-documented
Field names are stable and predictable (ip.src, tcp.dstport, http.request.uri)
Built-in statistics commands (-z flag) that skip external processing entirely
Works on live interfaces and existing pcap files with the same flags

tshark weaknesses

Much larger install footprint than tcpdump: tshark pulls in libwireshark with its full dissector set and dependencies (though not the Qt GUI)
Memory climbs on very large captures when building conversation tables
The -z statistics flags have inconsistent syntax across major versions
Not installed by default on minimal server images

When to pick tcpdump instead

Spot-checking a capture or confirming a packet reached a host? tcpdump is faster to type and almost always available. Reach for tshark when you have a specific question: “How many DNS queries returned NXDOMAIN?” or “Which IPs sent the most bytes?”

Recommended Setup

Install tshark on the analysis machine — you don’t need it on the capture node, just copy the pcap file over:

# Debian / Ubuntu
sudo apt install tshark -y

# RHEL / Rocky / AlmaLinux
sudo dnf install wireshark-cli -y

# Arch
sudo pacman -S wireshark-cli

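Group membership only matters for live capture; reading an existing pcap file needs nothing beyond read permission on the file. If you do plan to capture on this machine, the Debian installer asks whether non-root users should be able to capture packets. Say yes, then add yourself to the wireshark group: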

sudo usermod -aG wireshark $USER
newgrp wireshark
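If you want to confirm the group change took effect in your current shell before trying a live capture, a quick check is enough. The sketch below assumes bash and uses only id -nG; the in_group function name is hypothetical.

```shell
# Hypothetical helper: succeed if the current process belongs to group $1.
# id -nG lists the group names of the current process, space-separated.
in_group() {
  local g
  for g in $(id -nG); do
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

# Example: warn before attempting a live capture
if ! in_group wireshark; then
  echo "not in the wireshark group yet; run 'newgrp wireshark' or log in again" >&2
fi
```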

Check which version you have — field names occasionally shift between major releases:

tshark --version | head -1
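One place where the version matters in practice: the TLS key log preference changed names when Wireshark 3.0 renamed its SSL dissector to TLS. A script can branch on that first line. This is a minimal sketch: keylog_pref is a hypothetical helper, and it assumes the line starts with “TShark (Wireshark) X.Y.Z” as in recent releases.

```shell
# Hypothetical helper: read the first line of `tshark --version` on stdin and
# print the TLS key log preference name that release expects.
# Assumes Wireshark 3.0+ uses "tls.", older releases "ssl.".
keylog_pref() {
  local major
  # Keep only the major version from "TShark (Wireshark) X.Y.Z ..."
  major=$(sed -n 's/^TShark (Wireshark) \([0-9][0-9]*\)\..*/\1/p')
  if [ -z "$major" ]; then
    echo "could not parse tshark version" >&2
    return 1
  fi
  if [ "$major" -ge 3 ]; then
    echo "tls.keylog_file"
  else
    echo "ssl.keylog_file"
  fi
}

# Usage (requires tshark):
#   tshark --version | head -1 | keylog_pref
```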

Implementation Guide

Step 1: Inspect a pcap file quickly

Before filtering, start with a quick headcount:

# Count total packets
tshark -r capture.pcap 2>/dev/null | wc -l

# Print first 20 packets with protocol summary
tshark -r capture.pcap -c 20

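The 2>/dev/null discards tshark’s stderr chatter, such as the “Capturing on” banner printed during live captures or the warning printed when running as root. Add it to every tshark command you’re piping, but drop it while debugging: it also hides genuine errors like a malformed display filter.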
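If you’d rather not lose real errors, one option is to keep stderr aside and only show it when the command fails. A sketch, assuming bash and mktemp; quiet_run is a hypothetical wrapper, not a tshark feature:

```shell
# Hypothetical wrapper: run a command with stderr captured to a temp file,
# discard that output on success, and replay it on failure so real errors
# (e.g. a malformed display filter) are not silently lost.
quiet_run() {
  local errfile status
  errfile=$(mktemp) || return 1
  "$@" 2>"$errfile"
  status=$?
  if [ "$status" -ne 0 ]; then
    cat "$errfile" >&2
  fi
  rm -f "$errfile"
  return "$status"
}

# Usage (requires tshark):
#   quiet_run tshark -r capture.pcap -Y 'dns' | wc -l
```

Stdout passes straight through, so the wrapper can sit at the head of any pipeline.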

Step 2: Filter by protocol with display filters

Display filters share the same syntax as Wireshark’s filter bar. The key distinction from tcpdump BPF syntax: display filters operate on decoded protocol fields, not raw bytes. That means you can filter on dns.flags.rcode instead of trying to match bytes at specific offsets.

# Only DNS traffic
tshark -r capture.pcap -Y 'dns' 2>/dev/null

# HTTP requests (not responses)
tshark -r capture.pcap -Y 'http.request' 2>/dev/null

# TCP RST packets — useful for detecting connection rejections
tshark -r capture.pcap -Y 'tcp.flags.reset == 1' 2>/dev/null

# DNS queries that returned NXDOMAIN
tshark -r capture.pcap -Y 'dns.flags.rcode == 3' 2>/dev/null

# Traffic between two specific hosts
tshark -r capture.pcap -Y 'ip.addr == 10.0.1.5 and ip.addr == 10.0.1.1' 2>/dev/null
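When the same host pair comes up repeatedly during an investigation, building the filter string in one place avoids typos. A minimal sketch; pair_filter is a hypothetical helper, and the optional port clause is an illustrative extension using the standard tcp.port and udp.port fields:

```shell
# Hypothetical helper: build a display filter for traffic between two hosts,
# optionally narrowed to one TCP or UDP port.
# ip.addr matches either source or destination, so requiring both addresses
# selects packets exchanged between the pair in either direction.
pair_filter() {
  local a=$1 b=$2 port=${3:-}
  local filter="ip.addr == $a and ip.addr == $b"
  if [ -n "$port" ]; then
    filter="$filter and (tcp.port == $port or udp.port == $port)"
  fi
  printf '%s\n' "$filter"
}

# Usage (requires tshark):
#   tshark -r capture.pcap -Y "$(pair_filter 10.0.1.5 10.0.1.1 443)" 2>/dev/null
```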

Step 3: Extract specific fields into columns

The -T fields flag combined with -e fieldname gives you tab-separated output ready for awk, sort, or Python. No manual parsing — just column extraction from decoded protocol data.

# Extract source IP, destination IP, and destination port for every TCP packet
tshark -r capture.pcap -Y 'tcp' -T fields \
  -e ip.src \
  -e ip.dst \
  -e tcp.dstport \
  2>/dev/null

# Extract DNS query names and response codes
tshark -r capture.pcap -Y 'dns' -T fields \
  -e dns.qry.name \
  -e dns.flags.rcode \
  2>/dev/null

# Extract HTTP method, host, and URI for every HTTP request
tshark -r capture.pcap -Y 'http.request' -T fields \
  -e http.request.method \
  -e http.host \
  -e http.request.uri \
  2>/dev/null
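Because the output is plain tab-separated text, aggregation is ordinary text processing. As a sketch (requests_per_host is a hypothetical helper name), this counts requests per http.host from the three-column HTTP extraction above:

```shell
# Hypothetical helper: read tab-separated "method<TAB>host<TAB>uri" lines and
# print request counts per host, busiest first (ties sorted by host name).
requests_per_host() {
  awk -F'\t' '
    NF >= 2 { count[$2]++ }          # column 2 is http.host
    END { for (h in count) printf "%d\t%s\n", count[h], h }
  ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage (requires tshark):
#   tshark -r capture.pcap -Y 'http.request' -T fields \
#     -e http.request.method -e http.host -e http.request.uri 2>/dev/null \
#     | requests_per_host
```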

Don’t know the exact field name? In Wireshark, click a field in the packet details pane and its filter name appears in the status bar at the bottom. Or search the field registry directly:

tshark -G fields 2>/dev/null | grep -i 'user.agent'
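The registry output is itself tab-separated, so it can be trimmed to just the names you need. A sketch assuming the -G fields layout described in the tshark man page (field records start with F, display name in column 2, filter abbreviation in column 3); find_fields is a hypothetical helper:

```shell
# Hypothetical helper: search `tshark -G fields` output (read on stdin) for
# field abbreviations containing a substring (case-insensitive) and print
# "abbrev<TAB>display name". Protocol records ("P" lines) are skipped.
find_fields() {
  awk -F'\t' -v pat="$1" '
    $1 == "F" && index(tolower($3), tolower(pat)) { print $3 "\t" $2 }
  '
}

# Usage (requires tshark):
#   tshark -G fields 2>/dev/null | find_fields user_agent
```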

Step 4: Build traffic statistics with -z

The -z flag invokes tshark’s built-in statistics modules. They’re usually faster than piping field output to external tools because the aggregation happens inside tshark: no per-packet text has to be formatted, pushed through a pipe, and parsed again.

# Protocol hierarchy — percentage breakdown by protocol
tshark -r capture.pcap -z io,phs -q 2>/dev/null

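# IP conversations: frames and bytes per direction for each address pair.
# Newer releases may print byte counts with units (e.g. "12 kB"), so this
# table doesn't sort reliably with sort -n
tshark -r capture.pcap -z conv,ip -q 2>/dev/null

# Top talkers: bytes sent per source IP, summed from raw fields
tshark -r capture.pcap -Y 'ip' -T fields -E occurrence=f -e ip.src -e frame.len 2>/dev/null \
  | awk '{bytes[$1] += $2} END {for (ip in bytes) print bytes[ip], ip}' \
  | sort -rn | head -20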

# DNS query/response summary
tshark -r capture.pcap -z dns,tree -q 2>/dev/null

# HTTP request methods and response codes
tshark -r capture.pcap -z http,tree -q 2>/dev/null

# TCP conversations: frames and bytes per direction, start time, and duration per pair
tshark -r capture.pcap -z conv,tcp -q 2>/dev/null

The -q flag suppresses per-packet output so you only see the statistics table.
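The io,phs tree is indented text with frames:N bytes:N on each line. If you want it as sortable columns, a short awk pass is enough. This sketch (phs_table is hypothetical) assumes that line format and discards the indentation, so the hierarchy depth is lost:

```shell
# Hypothetical helper: turn `-z io,phs -q` output (read on stdin) into
# "frames<TAB>bytes<TAB>protocol" rows sorted by bytes, largest first.
# Header and separator lines are skipped because they lack frames:/bytes:.
phs_table() {
  awk '
    /frames:[0-9]+ bytes:[0-9]+/ {
      split($2, f, ":"); split($3, b, ":")
      printf "%s\t%s\t%s\n", f[2], b[2], $1
    }
  ' | sort -t "$(printf '\t')" -k2,2nr
}

# Usage (requires tshark):
#   tshark -r capture.pcap -z io,phs -q 2>/dev/null | phs_table | head
```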

Step 5: Combine field extraction with Unix tools for custom statistics

Built-in stats don’t cover every question. When you need a custom answer, pipe field output through standard Unix tools:

# Count DNS queries per domain, sorted by frequency
tshark -r capture.pcap -Y 'dns.flags.response == 0' -T fields \
  -e dns.qry.name 2>/dev/null \
  | sort | uniq -c | sort -rn | head -20

# Find top destination ports being contacted
tshark -r capture.pcap -Y 'tcp.flags.syn == 1 and tcp.flags.ack == 0' \
  -T fields -e tcp.dstport 2>/dev/null \
  | sort | uniq -c | sort -rn

# Extract all HTTP User-Agent strings (useful for detecting bots)
tshark -r capture.pcap -Y 'http.user_agent' -T fields \
  -e http.user_agent 2>/dev/null \
  | sort -u
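The same pattern extends to ratios. As a sketch, assuming you feed it dns.qry.name and dns.flags.rcode from DNS responses (nxdomain_ratio is a hypothetical helper), this reports how many responses per name were NXDOMAIN out of the total:

```shell
# Hypothetical helper: read "qname<TAB>rcode" lines for DNS responses and
# print "nxdomain<TAB>total<TAB>qname", names with the most failures first.
nxdomain_ratio() {
  awk -F'\t' '
    NF >= 2 {
      total[$1]++
      if ($2 == 3) nx[$1]++          # rcode 3 = NXDOMAIN
    }
    END {
      for (q in total) printf "%d\t%d\t%s\n", nx[q] + 0, total[q], q
    }
  ' | sort -t "$(printf '\t')" -k1,1nr -k2,2nr -k3,3
}

# Usage (requires tshark):
#   tshark -r capture.pcap -Y 'dns.flags.response == 1' -T fields \
#     -E occurrence=f -e dns.qry.name -e dns.flags.rcode 2>/dev/null \
#     | nxdomain_ratio | head -20
```

A name that fails every time usually points at a typo or a decommissioned record; one that fails intermittently is more likely a resolver problem.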

Step 6: Export filtered packets to a new pcap

Sometimes you need a slice of the capture — to share with a colleague or feed into another tool without moving a 500MB file:

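# Save only DNS traffic to a new file
# (-F pcap keeps classic pcap format; tshark otherwise writes pcapng,
# whatever the file extension)
tshark -r capture.pcap -Y 'dns' -F pcap -w dns_only.pcap 2>/dev/null

# Save traffic to/from a specific host
tshark -r capture.pcap -Y 'ip.addr == 192.168.1.100' -F pcap -w host_traffic.pcap 2>/dev/null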
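To slice out several hosts at once, a loop over the same command works. A sketch assuming bash; slice_hosts and the host_<address>.pcap naming are hypothetical choices, and IPv6 addresses are matched with ipv6.addr instead of ip.addr:

```shell
# Hypothetical helper: write one classic-pcap slice per host from a capture.
# Usage: slice_hosts FILE HOST [HOST...]
# Prints each output filename; ":" becomes "_" so IPv6 names are safe.
slice_hosts() {
  local file=$1 host out filt
  shift
  for host in "$@"; do
    out="host_${host//:/_}.pcap"
    case $host in
      *:*) filt="ipv6.addr == $host" ;;   # IPv6 counterpart of ip.addr
      *)   filt="ip.addr == $host" ;;
    esac
    tshark -r "$file" -Y "$filt" -F pcap -w "$out" 2>/dev/null || return 1
    echo "$out"
  done
}

# Usage (requires tshark):
#   slice_hosts capture.pcap 192.168.1.100 10.0.1.5
```

Each host costs a full pass over the input, which is fine for a handful of hosts but slow on multi-gigabyte captures.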

Practical script: quick pcap summary

Here’s a shell script I keep around for incident triage. Drop a pcap on it and get an immediate overview. Each section is a separate pass over the file, so runtime grows with capture size:

#!/bin/bash
# Usage: ./pcap-summary.sh capture.pcap
FILE="${1:-capture.pcap}"

if [ ! -r "$FILE" ]; then
  echo "Cannot read $FILE" >&2
  exit 1
fi

echo "=== Packet count ==="
tshark -r "$FILE" 2>/dev/null | wc -l

echo "=== Protocol breakdown ==="
tshark -r "$FILE" -z io,phs -q 2>/dev/null

# Sum bytes per source IP from raw fields; the -z conv,ip table is not
# reliably sortable with sort -n
echo "=== Top 10 talkers (bytes sent) ==="
tshark -r "$FILE" -Y 'ip' -T fields -E occurrence=f -e ip.src -e frame.len 2>/dev/null \
  | awk '{bytes[$1] += $2} END {for (ip in bytes) print bytes[ip], ip}' \
  | sort -rn | head -10

echo "=== DNS failures (NXDOMAIN) ==="
tshark -r "$FILE" -Y 'dns.flags.rcode == 3' -T fields \
  -e dns.qry.name 2>/dev/null | sort | uniq -c | sort -rn

echo "=== TCP RST sources ==="
tshark -r "$FILE" -Y 'tcp.flags.reset == 1' -T fields \
  -e ip.src -e tcp.srcport 2>/dev/null | sort | uniq -c | sort -rn | head -10

On high-throughput services, pcap files grow into the gigabytes fast. Two habits make the biggest difference. First, keep the display filter as tight as possible: tshark still dissects every packet to evaluate it, but a narrow filter keeps the output, and everything downstream of it, small. Second, always pair -z with -q; without -q, per-packet output floods your terminal alongside the statistics table.

A Few Things to Watch Out For

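Multi-valued fields: A single packet can carry several values for one field, such as dns.a in a response with multiple A records, or dns.qry.name in the rare query with more than one question. tshark joins them with commas by default; that character is set by -E aggregator=, while -E separator= only controls the character between columns. Use -E occurrence=f to keep just the first value, or, when extracting a single field, pipe through tr ',' '\n' for one value per line.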
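Encrypted traffic: tshark can decrypt TLS if you supply a key log file: -o tls.keylog_file:/path/to/sslkeylog.log (releases before Wireshark 3.0 call this preference ssl.keylog_file). Set SSLKEYLOGFILE before starting a client whose TLS library honors it, such as Firefox or Chrome, to generate the file. The capture must also contain the handshakes of the sessions you want to decrypt.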
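Large files: For captures over 1GB, split with editcap -c 100000 big.pcap split/out.pcap first (create the split/ directory beforehand; editcap writes a numbered series of files), then process each chunk in parallel. Streams and conversations that cross a chunk boundary get cut, so reassembled fields can go missing and per-chunk counts have to be summed afterwards.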
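Splitting means per-chunk results have to be combined. A sketch, assuming each chunk’s result was produced with uniq -c as in the earlier examples; merge_counts is a hypothetical helper, and the usage comment launches every chunk at once (cap concurrency, e.g. with xargs -P, on a small machine):

```shell
# Hypothetical helper: sum "count key" lines (the shape uniq -c emits) from
# several files, or from stdin if no files are given, then sort by total.
merge_counts() {
  awk '
    {
      n = $1
      sub(/^[ \t]*[0-9]+[ \t]+/, "")   # strip the leading count, keep the key
      total[$0] += n
    }
    END { for (k in total) printf "%d %s\n", total[k], k }
  ' "$@" | sort -k1,1nr -k2
}

# Usage sketch (requires editcap and tshark):
#   mkdir -p split counts
#   editcap -c 100000 big.pcap split/out.pcap
#   for f in split/*; do
#     tshark -r "$f" -Y 'dns.flags.rcode == 3' -T fields -e dns.qry.name 2>/dev/null \
#       | sort | uniq -c >"counts/$(basename "$f").txt" &
#   done
#   wait
#   merge_counts counts/*.txt | head -20
```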