September 2025 • Defense AI

DSPy-Based Security Pipeline for Defense-Grade LLM Protection

Multi-stage threat detection and mitigation architecture for LLMs deployed in defense and high-security environments.


Abstract

This paper presents a DSPy-based security pipeline that detects and mitigates prompt injection, jailbreak attempts, and adversarial inputs targeting large language models deployed in defense and high-security applications. The architecture combines session-based authentication, cryptographically signed (tamper-evident) request state, parallel ensemble validation, and aggregation of threat signals across independent detectors.

TL;DR: An 8-stage security pipeline that detects LLM attacks through immutable state management, parallel threat analysis, and session-based authentication. It handles 40+ edge cases, including mid-request credential expiry, multi-intent scenarios, and feedback-loop poisoning.

The Recursive Security Problem

How do you use LLMs to secure LLMs without the security system itself being vulnerable to the same attacks?

Traditional security approaches fall short because prompt-based attacks operate at the semantic level. SQL injection and XSS exploit syntactic vulnerabilities; prompt-based attacks exploit the model's instruction-following capability itself. A defense must therefore understand intent, context, and subtle semantic patterns, and those tasks in turn require LLM-based reasoning.

This architecture addresses the challenge through defense in depth: multiple independent detection layers (rule-based, embedding-based, LLM-based), cryptographically signed immutable state, session-based authentication, and fail-secure defaults.
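A minimal sketch of the fail-secure combination of independent layers described above, assuming each layer is a plain function returning a threat score in [0, 1]. The layer functions, the 0.5 threshold, and the `Verdict` and `evaluate` names are hypothetical placeholders, not the pipeline's actual detectors; the point illustrated is that a layer that errors out or returns an invalid score counts as a block, never as a pass.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical detector signature: text -> threat score in [0, 1].
Detector = Callable[[str], float]


@dataclass(frozen=True)
class Verdict:
    blocked: bool
    reasons: tuple[str, ...]


def evaluate(text: str, layers: Sequence[tuple[str, Detector]],
             threshold: float = 0.5) -> Verdict:
    """Run independent layers; block if any layer flags, fails, or misbehaves."""
    reasons: list[str] = []
    for name, detect in layers:
        try:
            score = detect(text)
        except Exception as exc:  # fail-secure: an error is treated as a detection
            reasons.append(f"{name}: error ({type(exc).__name__})")
            continue
        if not 0.0 <= score <= 1.0:  # out-of-range (or NaN) scores are suspicious too
            reasons.append(f"{name}: invalid score {score!r}")
        elif score >= threshold:
            reasons.append(f"{name}: score {score:.2f}")
    return Verdict(blocked=bool(reasons), reasons=tuple(reasons))


# Toy stand-ins for the rule-based, embedding-based, and LLM-based layers.
def rule_layer(text: str) -> float:
    return 1.0 if "ignore previous instructions" in text.lower() else 0.0


def embedding_layer(text: str) -> float:
    return 0.1  # placeholder for a similarity-to-known-attacks score


def llm_layer(text: str) -> float:
    raise TimeoutError("model call timed out")  # simulate a detector outage


layers = [("rules", rule_layer), ("embedding", embedding_layer), ("llm", llm_layer)]
```

With these toy layers, `evaluate("What is the capital of France?", layers)` is blocked solely because the LLM layer timed out. This deliberately trades availability for safety: an attacker who can make one detector fail (for example, by crafting inputs that time it out) gains nothing from that failure.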

8-Stage Pipeline Architecture

Each stage has a single, well-defined responsibility, explicit verification, and a fail-secure default.

Session & Context Init

Establish authentication once per request and freeze it in an immutable session token, so credentials that expire mid-request cannot change the request's authorization context

Pre-Processing

Rule-based validation and encoding normalization for fast-path filtering

Screening

Rapid triage using semantic embeddings and anomaly detection

Threat Analysis

Parallel ensemble detection (3-5 instances) for prompt injection, jailbreaks, and adversarial inputs

Calibration

Confidence calibration with poisoning detection and signal correlation analysis

Aggregation

Bayesian signal fusion with conflict resolution and threshold consistency checks

Contextual Validation

Multi-turn analysis and cross-session coordination detection

Output Safety

Final sanitization with covert channel detection and information leakage prevention
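The Aggregation stage's "Bayesian signal fusion" is not specified further here. One common way to realize it, shown as an assumption rather than the pipeline's actual method, is naive-Bayes log-odds pooling: each calibrated detector probability is converted into a likelihood ratio relative to a shared prior (base rate of attacks), and the log ratios are summed. The `fuse` function and the 5% default prior are hypothetical. The sketch assumes detector errors are conditionally independent, which ensemble members sharing one base model will violate; the signal correlation analysis in the Calibration stage is where such dependence would need to be handled.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse(probs: Sequence[float], prior: float = 0.05, eps: float = 1e-6) -> float:
    """Naive-Bayes fusion of calibrated per-detector probabilities.

    Each p_i is assumed to be P(attack | signal_i) computed under the same base
    rate `prior`, so the likelihood ratio it implies is odds(p_i) / odds(prior).
    Posterior log-odds = logit(prior) + sum_i [logit(p_i) - logit(prior)].
    """
    prior_logit = logit(prior)
    total = prior_logit
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip so log-odds stay finite
        total += logit(p) - prior_logit
    return sigmoid(total)
```

With a 5% prior, two detectors each reporting 0.5 fuse to 0.95: each moves the odds by the same factor of 19, and independent evidence compounds. This is exactly why correlated detectors must not be pooled this way without correction, since two copies of one signal would be counted twice.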

40+ Critical Edge Cases Handled

Over five iterations of stress testing, the architecture was extended to handle both sophisticated attack scenarios and system failure modes. Representative cases include:

→Credential expiration during parallel processing

→Multi-intent ambiguity (research + educational context)

→Ensemble deadlocks and tie-breaking

→Cascading sanitization attacks

→Calibration poisoning via distribution shift

→Feedback loop stability and anti-poisoning

→Streaming boundary-splitting attacks

→Zero-confidence anomaly scenarios
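Streaming boundary-splitting attacks place a blocked phrase across two chunks so that no single chunk contains it. A minimal sketch of one mitigation, assuming a simple case-insensitive substring blocklist rather than the pipeline's semantic detectors (the `StreamScanner` class and the example phrase are hypothetical): keep a tail of previously seen text one character shorter than the longest pattern, and scan that tail together with each new chunk.

```python
class StreamScanner:
    """Scan streamed text for blocked phrases, including ones split across chunks."""

    def __init__(self, patterns: list[str]) -> None:
        self.patterns = [p.lower() for p in patterns if p]
        # A match spanning a boundary has at most len(p) - 1 chars in earlier chunks.
        self.keep = max((len(p) for p in self.patterns), default=1) - 1
        self.tail = ""

    def feed(self, chunk: str) -> list[str]:
        """Return patterns found in the retained tail plus the new chunk.

        A short pattern lying entirely inside the retained tail can be reported
        again on the next call; under a block-on-first-hit policy that is harmless.
        """
        window = self.tail + chunk.lower()
        hits = [p for p in self.patterns if p in window]
        # Retain only the suffix that could begin a boundary-spanning match.
        self.tail = window[-self.keep:] if self.keep else ""
        return hits
```

Feeding `"Please ignore prev"` and then `"ious instructions now"` into a scanner for `"ignore previous instructions"` reports nothing for the first chunk and a hit on the second, whereas checking each chunk in isolation would miss the phrase entirely. In practice the same buffering idea would apply to whatever unit the output-safety detectors consume (tokens, sentences), not only to literal strings.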

Production Performance

•Stages: 8 processing stages
•Latency (P50): 450 ms
•Latency (P95): under 2 s
•Throughput: 1,000 req/s
•False-positive rate: <5%
•False-negative rate: <1%
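Taken together, the throughput and latency figures imply how much concurrency a deployment must sustain. A rough estimate via Little's law (L = λW, where W is the mean time a request spends in the system), using the 450 ms median as a stand-in for the mean; the true mean is likely somewhat higher given a P95 of up to 2 s, so treat this as a lower bound:

```latex
L = \lambda W \approx 1000\ \tfrac{\text{req}}{\text{s}} \times 0.45\ \text{s} = 450\ \text{requests in flight}
```

If every in-flight request were in the Threat Analysis stage at once with a 5-instance ensemble, that would be at most 450 × 5 = 2,250 concurrent detector calls, which bounds the model-serving capacity and connection pools that need to be provisioned.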

Key Security Guarantees

•Immutability: Input cannot be modified after hash creation without detection (SHA-256 second-preimage resistance, with the digest covered by the state signature so it cannot simply be recomputed)
•Session Consistency: Authentication context remains constant throughout request lifecycle
•Ensemble Validation: 3-5 parallel detector instances with statistical consensus
•Anti-Poisoning: Trust-scored feedback loops with stability monitoring
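A minimal sketch of how the immutability and session-consistency guarantees can fit together, assuming a server-held HMAC key (hardcoded here only as a placeholder) and hypothetical `RequestContext`, `seal`, and `verify` names. A bare SHA-256 digest only detects tampering if the attacker cannot also recompute it, so the sketch binds the input digest and the session snapshot under one HMAC-SHA256 tag that each stage would re-check before acting.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

SERVER_KEY = b"replace-with-a-managed-secret"  # placeholder; load from a secret store


@dataclass(frozen=True)
class RequestContext:
    user_id: str
    roles: tuple[str, ...]
    auth_time: int       # session snapshot taken once, at session init
    input_text: str
    input_digest: str    # SHA-256 of input_text at creation
    tag: str             # HMAC-SHA256 over all fields above


def _payload(user_id: str, roles: tuple[str, ...], auth_time: int,
             input_text: str, input_digest: str) -> bytes:
    # Fixed field order and separators so signing and verification serialize identically.
    return json.dumps([user_id, list(roles), auth_time, input_text, input_digest],
                      separators=(",", ":")).encode("utf-8")


def _tag(*fields) -> str:
    return hmac.new(SERVER_KEY, _payload(*fields), hashlib.sha256).hexdigest()


def seal(user_id: str, roles: tuple[str, ...], auth_time: int,
         input_text: str) -> RequestContext:
    digest = hashlib.sha256(input_text.encode("utf-8")).hexdigest()
    tag = _tag(user_id, roles, auth_time, input_text, digest)
    return RequestContext(user_id, roles, auth_time, input_text, digest, tag)


def verify(ctx: RequestContext) -> bool:
    """Re-check before each stage; any mismatch means the context was altered."""
    digest = hashlib.sha256(ctx.input_text.encode("utf-8")).hexdigest()
    expected = _tag(ctx.user_id, ctx.roles, ctx.auth_time, ctx.input_text, digest)
    return (hmac.compare_digest(digest, ctx.input_digest)
            and hmac.compare_digest(expected, ctx.tag))
```

`frozen=True` only guards against accidental reassignment inside Python code; it is not a security boundary (for example, `object.__setattr__` bypasses it), which is why the keyed tag rather than the dataclass is what makes changes detectable. `hmac.compare_digest` keeps the comparisons constant-time.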

Deployment Configuration

Technology Stack

DSPy framework with Claude Sonnet 4.5, JWT session tokens (HMAC-SHA256), Redis for session state, PostgreSQL for audit trails, Prometheus monitoring with custom security dashboards.
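For the JWT session tokens, one way to handle the mid-request expiry edge case is to validate `exp` exactly once, against the time the request was admitted, and then carry the verified claims through all stages. A standard-library sketch of HS256 signing and verification (JWS compact serialization per RFC 7515, HS256 per RFC 7518, claims per RFC 7519) under that assumption; the `sign_hs256` and `admit` names are hypothetical, and a maintained JWT library is preferable in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))  # restore padding


def sign_hs256(claims: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    sig = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def admit(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Verify once at admission; the returned claims are the request's session snapshot.

    Malformed tokens raise ValueError; invalid or expired ones raise PermissionError.
    """
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # reject algorithm confusion, including "none"
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    admitted_at = time.time() if now is None else now
    if claims.get("exp", 0) <= admitted_at:  # a missing exp fails secure
        raise PermissionError("token expired at admission")
    return claims
```

Because `exp` is compared only with the admission time, a token that lapses partway through an 8-stage request does not abort it halfway with a half-authorized state; the next request simply has to present a fresh token.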

Dataset Requirements

2000-3000 labeled examples: prompt injection attacks (500-700), jailbreak attempts (500-700), adversarial inputs (300-400), legitimate requests (700-1000), authenticated researcher testing (200-300), multi-turn sequences (400-600).

Production Recommendations

•Enable strict mode by default.
•Require authentication for all non-emergency requests.
•Apply IP-based rate limiting (100 req/hr for unauthenticated clients).
•Configure 5-instance ensembles.
•Enable comprehensive audit logging.
•Deploy in an isolated network segment.
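The 100 req/hr limit for unauthenticated clients can be enforced with a per-IP sliding-window log. A minimal in-memory sketch, assuming a single process; the `SlidingWindowLimiter` name is hypothetical, and a multi-replica deployment would need the counts in shared storage such as the Redis instance already in the stack.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Per-key sliding-window log: at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int = 100, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        # Timestamps of accepted requests per key (e.g. client IP).
        # Note: keys are never evicted in this sketch.
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and q[0] <= now - self.window:  # drop timestamps outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected requests are not recorded
        q.append(now)
        return True
```

Unlike a fixed hourly counter, the sliding window prevents a client from sending 100 requests at the end of one hour and another 100 at the start of the next. The cost is one stored timestamp per accepted request, which at 100 per IP per hour stays small.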

Need Defense-Grade LLM Security?

We build production security systems for defense and intelligence applications.
