Stop Shipping Vulnerabilities: Automating SAST with Semgrep

Security tutorial - IT technology blog

The 2 AM Wake-up Call

I once spent a frantic early morning fending off a brute-force attack on a production server. That experience changed my perspective: reactive security is a losing game. Waiting for an alert to trigger means you’ve already lost ground.

You have to bake defenses into the code itself. A figure commonly attributed to NIST puts the cost of fixing a vulnerability in production at 30 to 100 times the cost of catching it during development. The exact multiplier is debatable, but the direction is not. For a small team, that’s the difference between a minor patch and a week of lost productivity.

I’ve watched teams accidentally push AWS secret keys or SQL injection flaws because they were sprinting toward a deadline. Manual code reviews are essential, but humans are fallible, especially at 4 PM on a Friday. This is where Static Application Security Testing (SAST) fills the gap. I’ve found Semgrep to be the most effective tool for stopping these leaks before they reach your main branch.

Why Flaws Slip Through the Cracks

Security debt usually isn’t a result of negligence. It happens because the environment moves faster than the checks. I’ve identified three specific bottlenecks that cause most leaks:

  • Subjective Reviews: One developer might catch a lack of input validation, while another only checks for naming conventions. Without automation, your security posture depends entirely on who is reviewing the PR.
  • Framework Blind Spots: Modern stacks are deep. It is easy to forget that a specific method, like dangerouslySetInnerHTML in React or mark_safe in Django, opens a direct path for XSS.
  • The “Pre-Release” Bottleneck: Running heavy scans only on staging creates a massive pile of bugs right when you want to ship. It turns security into a roadblock rather than a feature.

The Feedback Loop Problem

Developers don’t need more 500-page PDF reports. Legacy security tools often act as “black boxes” that take three hours to run and return 40% false positives. When a tool cries wolf, engineers stop listening. To be effective, security feedback must be fast, transparent, and live where the code lives—inside GitHub, GitLab, or Bitbucket.

How Semgrep Compares to Legacy Tools

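I tested several scanners before landing on Semgrep. Grep-style checks treat code as flat text, but Semgrep parses it into an Abstract Syntax Tree (AST) and matches patterns against that structure. A pattern such as exec(...) therefore matches a real call however it is formatted, and ignores the same text sitting in a comment or a string. Semgrep’s dataflow features (constant propagation and taint tracking) go a step further: they can flag exec(user_input) even when the value is assigned dozens of lines before the call.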
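To make the difference between text matching and structural matching concrete, here is a minimal sketch that uses Python's standard ast module. It is not how Semgrep is implemented; the find_exec_calls helper and the SOURCE sample are made up for illustration.

```python
import ast

# Sample input: "exec(" appears three times, but only one is a real call.
SOURCE = '''
# exec(user_input) in a comment is not a call
note = "exec(user_input) inside a string is not a call either"

def run(user_input):
    exec(
        user_input
    )
'''


def find_exec_calls(source: str) -> list[int]:
    """Return the line numbers of every call to the bare name `exec`."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "exec"
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Naive text search: every line containing the substring "exec(".
text_hits = [
    number
    for number, line in enumerate(SOURCE.splitlines(), start=1)
    if "exec(" in line
]
ast_hits = find_exec_calls(SOURCE)

print("text search flags lines:", text_hits)
print("AST search flags lines:", ast_hits)
```

The text search flags the comment, the string, and the call, while the AST walk reports only the call, even though that call is split across several lines.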

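| Feature       | Manual Review | Legacy SAST (e.g., SonarQube) | Semgrep (Modern SAST)  |
|---------------|---------------|-------------------------------|------------------------|
| Scan Speed    | Days          | 20+ Minutes                   | < 2 Minutes            |
| Rule Language | Human Brain   | Proprietary/Java              | Simple YAML            |
| Noise Level   | Moderate      | High (False Positives)        | Low (Highly Targeted)  |
| Setup Time    | Instant       | Hours/Days                    | 5 Minutes              |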

Semgrep can process over 1,000 security rules against 100,000 lines of code in seconds. That speed makes it viable for every single commit, not just nightly builds.

Integrating Semgrep into GitHub Actions

The goal is to make Semgrep a “gatekeeper.” If a PR introduces a critical flaw, the build should fail immediately. Here is the workflow I use for my production environments.

1. Local Testing

Catching errors on your machine is faster than waiting for a CI runner. Install Semgrep via Homebrew (brew install semgrep) or pip to get started:

# Install via pip
python3 -m pip install semgrep

# Run a scan using community-vetted rules
semgrep scan --config auto

The --config auto flag is a lifesaver. It automatically detects your project’s languages and pulls the relevant security policies from the Semgrep Registry.

2. The GitHub Actions Workflow

Drop this configuration into .github/workflows/semgrep.yml. It triggers on every pull request that targets main, and again on every push to main, so no new vulnerabilities enter the branch unnoticed.

name: Security Scan

on:
  pull_request:
    branches: ["main"]
  push:
    branches: ["main"]

permissions:
  contents: read
  security-events: write  # needed to upload SARIF results

jobs:
  semgrep_scan:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # A single scan writes the SARIF report and, because of --error,
      # exits non-zero when there are findings.
      - name: Run Semgrep
        run: semgrep scan --config p/default --sarif --output=semgrep.sarif --error

      # always() keeps the upload running even when the scan step failed.
      - name: Upload to GitHub Security Tab
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif

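The --error flag is the most important part of this script. It makes Semgrep exit with a non-zero code whenever the scan reports findings, which fails the check. To fail only on high-severity issues, add --severity ERROR so that lower-severity findings are filtered out. The failed check blocks the merge only if you mark it as required in your branch protection rules.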
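If you want finer control than the exit code alone, one option is to have Semgrep emit JSON (semgrep scan --json) and let a small script decide which severities block the merge. The sketch below assumes the report layout Semgrep's JSON output uses (a results list whose entries carry check_id, path, start.line, and extra.severity). The BLOCKING set, the helper names, and the sample findings are hypothetical.

```python
import json

# Assumption: only ERROR-level findings should block a merge.
BLOCKING = {"ERROR"}


def blocking_findings(report: dict, blocking=frozenset(BLOCKING)) -> list[str]:
    """Return one 'path:line rule-id' string per finding with a blocking severity."""
    found = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "")
        if severity in blocking:
            found.append(
                f'{result["path"]}:{result["start"]["line"]} {result["check_id"]}'
            )
    return found


def exit_code(report: dict) -> int:
    """Return 1 when at least one blocking finding exists, otherwise 0."""
    return 1 if blocking_findings(report) else 0


# Hand-written sample in the shape of a Semgrep JSON report.
# The rule IDs and paths are made up for this example.
SAMPLE = json.loads("""
{
  "results": [
    {"check_id": "example.sql-injection", "path": "app/db.py",
     "start": {"line": 42}, "extra": {"severity": "ERROR"}},
    {"check_id": "example.print-statement", "path": "app/util.py",
     "start": {"line": 7}, "extra": {"severity": "WARNING"}}
  ]
}
""")

for finding in blocking_findings(SAMPLE):
    print("BLOCKING:", finding)
print("exit code:", exit_code(SAMPLE))
```

In CI you would load the real report with json.load and pass the result of exit_code to sys.exit, so warnings stay visible without failing the build.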

Custom Rules for Team Standards

Every team has specific “don’ts.” Maybe you want to ban MD5 hashing because it’s prone to collision attacks, or perhaps you want to ensure no one uses a specific internal library incorrectly. You can write a custom rule in minutes.

Create semgrep-rules.yaml:

rules:
  - id: ban-insecure-hashing
    patterns:
      - pattern: hashlib.md5(...)
    message: "MD5 is insecure. Use SHA-256 for all hashing requirements."
    languages: [python]
    severity: ERROR

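This turns your internal security policy into executable code that never forgets a rule. Run it locally with semgrep scan --config semgrep-rules.yaml, or pass it as an additional --config flag in the CI command so it runs alongside p/default.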
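A custom rule deserves a test of its own. Semgrep's rule-testing convention marks expected matches with a ruleid comment and expected non-matches with an ok comment on the line above the code. The file below is a hypothetical semgrep-rules.py placed next to semgrep-rules.yaml; the function names are illustrative, and depending on your Semgrep version the runner is invoked as semgrep --test or semgrep test.

```python
import hashlib


def legacy_checksum(data: bytes) -> str:
    """The kind of code the rule is meant to catch."""
    # ruleid: ban-insecure-hashing
    return hashlib.md5(data).hexdigest()


def checksum(data: bytes) -> str:
    """The replacement the rule message asks for."""
    # ok: ban-insecure-hashing
    return hashlib.sha256(data).hexdigest()
```

The annotations are ordinary comments, so the file stays valid Python and the two functions double as a before-and-after example for reviewers.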

Managing False Positives

Security tools aren’t psychics. Sometimes a flagged exec() call is genuinely necessary for a specific administrative task. Instead of disabling the scanner, use localized ignores to maintain the audit trail.

# nosemgrep: python.lang.security.audit.dangerous-exec
exec(internal_config_string) # Validated during boot process

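I always insist that my team include a comment explaining each # nosemgrep tag. This makes future security audits much faster. Copy the rule ID from the scan output so the suppression targets only that rule; a bare # nosemgrep silences every rule on the line.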
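To keep those suppressions auditable, it helps to be able to list them on demand. The following is a rough sketch of such an inventory script, not a Semgrep feature: find_suppressions and scan_tree are hypothetical helpers, and the regular expression only covers the comment forms shown in this article.

```python
import re
from pathlib import Path

# Matches "nosemgrep" with an optional ": rule.id" suffix.
NOSEMGREP = re.compile(r"nosemgrep(?::\s*(?P<rule>[\w.\-]+))?")


def find_suppressions(text: str, filename: str = "<memory>") -> list[dict]:
    """Return one record per line that contains a nosemgrep marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            hits.append(
                {
                    "file": filename,
                    "line": number,
                    # A marker without a rule ID suppresses every rule on that line.
                    "rule": match.group("rule") or "(all rules)",
                }
            )
    return hits


def scan_tree(root: str, pattern: str = "*.py") -> list[dict]:
    """Collect suppressions from every file under root that matches pattern."""
    hits = []
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend(find_suppressions(text, str(path)))
    return hits


if __name__ == "__main__":
    for hit in scan_tree("."):
        print(f'{hit["file"]}:{hit["line"]}  {hit["rule"]}')
```

Running it before an audit gives reviewers a single list of every place the scanner was told to look away, along with the rule that was silenced.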

Building a Security Culture

Semgrep is more than just a bug hunter; it’s a teaching tool. When a developer sees a security warning inside their PR, they learn about the vulnerability in real time, while the code is still fresh in their mind. This creates a continuous feedback loop that improves the team’s coding skills over time. Since putting this automated guard in place, I’ve seen a significant drop in critical vulnerabilities reaching our staging environment, and I finally get to sleep through the night.
