Research · September 2025 · Defense AI

DSPy-Based Security Pipeline for Defense-Grade LLM Protection

Multi-stage threat detection and mitigation architecture for LLMs deployed in defense and high-security environments.

Abstract

This paper presents a DSPy-based security pipeline designed to detect and mitigate prompt injection, jailbreak attempts, and adversarial inputs targeting large language models deployed in defense and high-security applications. The architecture combines session-based authentication, cryptographic immutability guarantees, parallel ensemble validation, and multi-signal threat aggregation.

TL;DR: An 8-stage security pipeline that detects LLM attacks through immutable state management, parallel threat analysis, and session-based authentication. Handles 40+ edge cases including mid-request credential expiry, multi-intent scenarios, and feedback loop poisoning.

The Recursive Security Problem

How do you use LLMs to secure LLMs without the security system itself being vulnerable to the same attacks?

Traditional security approaches fall short because LLMs operate at the semantic level. SQL injection and XSS exploit syntactic vulnerabilities; prompt-based attacks instead exploit the model's instruction-following capability. The defense system must therefore understand intent, context, and subtle semantic patterns, which are tasks that themselves require LLM-based reasoning.

This architecture addresses the challenge through defense in depth: multiple independent detection layers (rule-based, embedding-based, LLM-based), cryptographically signed immutable state, session-based authentication, and fail-secure defaults.
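A minimal sketch of what "cryptographically signed immutable state" could look like, assuming the raw input is sealed once at stage 01 and re-verified by later stages. The `SignedInput`, `seal`, and `verify` names and the hard-coded `SIGNING_KEY` are hypothetical illustrations, not the pipeline's actual interfaces; a real deployment would load the key from a secrets store.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical signing key for illustration only; never hard-code a real key.
SIGNING_KEY = b"example-key-do-not-use"


@dataclass(frozen=True)
class SignedInput:
    """Immutable snapshot of the raw input, taken once at stage 01."""
    text: str
    digest: str     # hex SHA-256 of the UTF-8 encoded text
    signature: str  # hex HMAC-SHA256 of the digest under SIGNING_KEY


def seal(text: str, key: bytes = SIGNING_KEY) -> SignedInput:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    signature = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return SignedInput(text, digest, signature)


def verify(sealed: SignedInput, key: bytes = SIGNING_KEY) -> bool:
    """Recompute digest and signature; any mismatch means the state was altered."""
    digest = hashlib.sha256(sealed.text.encode("utf-8")).hexdigest()
    expected = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    # Constant-time comparisons avoid leaking how many characters matched.
    return hmac.compare_digest(digest, sealed.digest) and hmac.compare_digest(
        expected, sealed.signature
    )
```

Because the dataclass is frozen, accidental in-place mutation raises an error, and the HMAC catches substitution by anyone who does not hold the key, which a bare SHA-256 digest alone would not.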

8-Stage Pipeline Architecture

Each stage has a single, well-defined responsibility, with explicit verification and fail-secure defaults.

01. Session & Context Init: Establish authentication with immutable session tokens, so that credentials expiring mid-request cannot affect the request in flight
02. Pre-Processing: Rule-based validation and encoding normalization for fast-path filtering
03. Screening: Rapid triage using semantic embeddings and anomaly detection
04. Threat Analysis: Parallel ensemble detection (3-5 instances) for prompt injection, jailbreaks, and adversarial inputs
05. Calibration: Confidence calibration with poisoning detection and signal correlation analysis
06. Aggregation: Bayesian signal fusion with conflict resolution and threshold consistency checks
07. Contextual Validation: Multi-turn analysis and cross-session coordination detection
08. Output Safety: Final sanitization with covert channel detection and information leakage prevention
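Stage 06 is described only as "Bayesian signal fusion". One common reading, sketched here as an assumption, is naive-Bayes fusion in log-odds space: if each detector emits a calibrated probability p_i computed under a shared prior, and detectors are conditionally independent given the true label, then logit(P) = logit(prior) + sum_i [logit(p_i) - logit(prior)]. The function names and the 5% prior are illustrative, not taken from the pipeline.

```python
import math
from typing import Sequence


def _logit(p: float, eps: float = 1e-6) -> float:
    # Clamp so a detector reporting exactly 0 or 1 cannot yield infinite log-odds.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse(probs: Sequence[float], prior: float = 0.05) -> float:
    """Naive-Bayes fusion of per-detector threat probabilities into one posterior.

    Assumes every p_i is a calibrated posterior computed under the same prior,
    and that detectors are conditionally independent given the true label.
    """
    prior_lo = _logit(prior)
    # Each detector contributes its evidence: its log-odds minus the prior log-odds.
    log_odds = prior_lo + sum(_logit(p) - prior_lo for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

The clamp keeps one overconfident detector from silently overriding every other signal. Independence rarely holds for ensembles built on the same model, which is one plausible reason for the signal correlation analysis in stage 05.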

40+ Critical Edge Cases Handled

Through five iterations of stress testing, the architecture evolved to handle sophisticated attack scenarios and system failure modes. Representative cases include:

  • Credential expiration during parallel processing
  • Multi-intent ambiguity (research + educational context)
  • Ensemble deadlocks and tie-breaking
  • Cascading sanitization attacks
  • Calibration poisoning via distribution shift
  • Feedback loop stability and anti-poisoning
  • Streaming boundary-splitting attacks
  • Zero-confidence anomaly scenarios
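The edge cases above are named without a mechanism. For streaming boundary-splitting attacks, one standard approach, shown here as a sketch, is to carry the last (longest pattern length - 1) characters of each chunk into the next scan, so a phrase split across chunk boundaries is still seen whole. The `StreamingScanner` class is hypothetical and uses plain substring matching for brevity; in this pipeline the buffered window would presumably be handed to the semantic detectors instead.

```python
from typing import Iterable, List


class StreamingScanner:
    """Case-insensitive scan for banned phrases that may straddle chunk boundaries."""

    def __init__(self, patterns: Iterable[str]):
        self.patterns: List[str] = [p.lower() for p in patterns if p]
        # Carry over (longest pattern - 1) chars: enough to complete any split match.
        self.keep = max((len(p) for p in self.patterns), default=1) - 1
        self.tail = ""

    def feed(self, chunk: str) -> List[str]:
        window = self.tail + chunk.lower()
        start = len(self.tail)
        hits = []
        for p in self.patterns:
            # Only report matches that end inside the new chunk, so a phrase
            # already reported from the carried-over tail is not reported twice.
            if window.find(p, max(0, start - len(p) + 1)) != -1:
                hits.append(p)
        self.tail = window[-self.keep:] if self.keep else ""
        return hits
```

A per-chunk scan without the carried-over tail would miss "ignore prev" + "ious instructions"; with it, the match is reported on the chunk that completes the phrase.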

Production Performance

  • Stages: 8 processing stages
  • Latency (P50): 450 ms
  • Latency (P95): <2 s
  • Throughput: 1,000 req/s
  • False positive rate: <5%
  • False negative rate: <1%

Key Security Guarantees
  • Immutability: Input cannot be modified after its hash is created without detection (backed by SHA-256's second-preimage resistance)
  • Session Consistency: Authentication context remains constant throughout request lifecycle
  • Ensemble Validation: 3-5 parallel detector instances with statistical consensus
  • Anti-Poisoning: Trust-scored feedback loops with stability monitoring
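The ensemble guarantee does not specify the consensus rule. The sketch below assumes a simple quorum-plus-majority vote in which timed-out or failed instances count as abstentions and any tie resolves to block, which is one way to realize the fail-secure tie-breaking listed among the edge cases. `ensemble_verdict` and its `quorum` default are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

BLOCK, ALLOW = "block", "allow"


def ensemble_verdict(votes: Sequence[Optional[str]], quorum: int = 3) -> str:
    """Majority vote over parallel detector instances with fail-secure defaults.

    A vote of None stands for an instance that timed out or raised an error.
    """
    valid = [v for v in votes if v in (BLOCK, ALLOW)]
    if len(valid) < quorum:
        return BLOCK  # too few healthy instances to trust an "allow"
    counts = Counter(valid)
    if counts[ALLOW] > counts[BLOCK]:
        return ALLOW
    return BLOCK  # ties and block-majorities both resolve to block
```

Requiring a strict allow-majority means a deadlocked even-sized ensemble can never let a request through by default.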

Deployment Configuration

Technology Stack

DSPy framework with Claude Sonnet 4.5, JWT session tokens (HMAC-SHA256), Redis for session state, PostgreSQL for audit trails, Prometheus monitoring with custom security dashboards.
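JWT tokens signed with HMAC-SHA256 correspond to the `HS256` algorithm. The sketch below, written with the standard library only so it is self-contained, verifies the token a single time at stage 01 and freezes the result into a `SessionContext`; later stages read that snapshot instead of re-checking `exp`, so a credential that expires mid-request cannot flip the decision partway through. `issue_hs256`, `open_session`, and `SessionContext` are hypothetical names, and production code should use a vetted JWT library rather than a hand-rolled verifier.

```python
import base64
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))  # restore padding


def issue_hs256(claims: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256)
    return f"{header}.{payload}.{_b64url(sig.digest())}"


@dataclass(frozen=True)
class SessionContext:
    subject: str
    expires_at: int
    validated_at: float  # the single instant at which expiry was checked


def open_session(token: str, key: bytes, now: Optional[float] = None) -> SessionContext:
    """Verify once at stage 01 and freeze the result for the whole request."""
    now = time.time() if now is None else now
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected alg")  # never let the token pick its algorithm
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise PermissionError("token expired")
    return SessionContext(claims["sub"], claims["exp"], now)
```

The `SessionContext` could then be cached in Redis under a request ID so that parallel stage-04 workers all see the same authentication state.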

Dataset Requirements

2000-3000 labeled examples: prompt injection attacks (500-700), jailbreak attempts (500-700), adversarial inputs (300-400), legitimate requests (700-1000), authenticated researcher testing (200-300), multi-turn sequences (400-600).
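A quick arithmetic check on these figures, assuming the categories are disjoint: the per-category ranges sum to 2600-3700, so the stated 2000-3000 total is reachable only in its 2600-3000 band, and not every category can sit at the top of its range at once. If categories overlap (for example, multi-turn sequences that are also injection attempts), the constraint loosens; the text does not say which applies. The dictionary keys are illustrative labels for the listed categories.

```python
from typing import Dict, Optional, Tuple

# (min, max) counts per category, as listed in the dataset requirements.
CATEGORIES: Dict[str, Tuple[int, int]] = {
    "prompt_injection": (500, 700),
    "jailbreak": (500, 700),
    "adversarial": (300, 400),
    "legitimate": (700, 1000),
    "authenticated_researcher": (200, 300),
    "multi_turn": (400, 600),
}
STATED_TOTAL = (2000, 3000)


def total_range(categories: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Smallest and largest dataset size implied by disjoint per-category ranges."""
    return (sum(lo for lo, _ in categories.values()),
            sum(hi for _, hi in categories.values()))


def feasible_band(categories: Dict[str, Tuple[int, int]],
                  stated: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Totals satisfying both the category ranges and the stated total, if any."""
    lo, hi = total_range(categories)
    band = (max(lo, stated[0]), min(hi, stated[1]))
    return band if band[0] <= band[1] else None
```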

Production Recommendations

Enable strict mode by default, require authentication for all non-emergency requests, apply IP-based rate limiting (100 requests/hour for unauthenticated clients), configure 5-instance ensembles, enable comprehensive audit logging, and deploy in an isolated network segment.
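A minimal in-process sketch of the per-IP limit (100 requests per hour for unauthenticated clients) using a sliding window of request timestamps. The `SlidingWindowLimiter` class is hypothetical; given Redis in the stack, a production version would more likely keep this state in Redis so the limit holds across workers.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Per-IP sliding window: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 3600.0):
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected requests are not recorded
        q.append(now)
        return True
```

A sliding window avoids the burst that a fixed hourly counter allows at the hour boundary, at the cost of storing one timestamp per accepted request.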
