DSPy-Based Security Pipeline for Defense-Grade LLM Protection
Multi-stage threat detection and mitigation architecture for LLMs deployed in defense and high-security environments.
Abstract
This paper presents a comprehensive DSPy-based security pipeline designed to detect and mitigate prompt injection, jailbreaking attempts, and adversarial inputs in large language models deployed for defense and high-security applications. The architecture implements session-based authentication, cryptographic immutability guarantees, parallel ensemble validation, and sophisticated threat aggregation.
TL;DR: An 8-stage security pipeline that detects LLM attacks through immutable state management, parallel threat analysis, and session-based authentication. Handles 40+ edge cases including mid-request credential expiry, multi-intent scenarios, and feedback loop poisoning.
The Recursive Security Problem
How do you use LLMs to secure LLMs without the security system itself being vulnerable to the same attacks?
Traditional security approaches fail because LLMs operate at the semantic level. Unlike SQL injection or XSS attacks, which exploit syntactic vulnerabilities, prompt-based attacks exploit the model's instruction-following capability itself. The defense system must understand intent, context, and subtle semantic patterns, and those tasks themselves require LLM-based reasoning.
This architecture addresses the challenge through defense in depth: multiple independent detection layers (rule-based, embedding-based, LLM-based), cryptographically signed immutable state, session-based authentication, and fail-secure defaults.
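As a rough illustration of how independent layers and fail-secure defaults might combine, the sketch below runs detectors in order and blocks the request if any layer flags it or fails. The names `Verdict`, `rule_layer`, and `run_layers`, and the phrase list, are hypothetical and not taken from the pipeline itself; the embedding-based and LLM-based layers would plug in as further callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A detector returns True when the input looks malicious (assumed interface).
Detector = Callable[[str], bool]


@dataclass(frozen=True)
class Verdict:
    layer: str     # name of the layer that decided, or "all"
    blocked: bool
    reason: str


def rule_layer(text: str) -> bool:
    """Cheap rule-based check; the phrase list is illustrative only."""
    patterns = ("ignore previous instructions", "disregard your system prompt")
    lowered = text.lower()
    return any(p in lowered for p in patterns)


def run_layers(text: str, layers: List[Tuple[str, Detector]]) -> Verdict:
    """Run each layer in order and stop at the first one that blocks."""
    for name, detect in layers:
        try:
            flagged = detect(text)
        except Exception as exc:
            # Fail secure: a crashing detector blocks the request
            # instead of silently letting it through.
            return Verdict(name, True, f"detector error: {type(exc).__name__}")
        if flagged:
            return Verdict(name, True, "flagged")
    return Verdict("all", False, "no layer flagged the input")
```

The key design choice is in the `except` branch: an unavailable detector is treated the same as a positive detection, so an attacker cannot bypass a layer by making it fail.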
8-Stage Pipeline Architecture
Each stage has a single, well-defined responsibility, with explicit verification and fail-secure defaults.
40+ Critical Edge Cases Handled
Through five iterations of stress testing, the architecture evolved to handle sophisticated attack scenarios and system failure modes.
Production Performance
- Immutability: Input cannot be modified after hash creation without detection (SHA-256 collision resistance)
- Session Consistency: Authentication context remains constant throughout request lifecycle
- Ensemble Validation: 3-5 parallel detector instances with statistical consensus
- Anti-Poisoning: Trust-scored feedback loops with stability monitoring
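Two of the guarantees above lend themselves to a short sketch: sealing the input with a SHA-256 digest so later tampering is detectable, and combining 3-5 detector votes. This is a minimal illustration under stated assumptions, not the pipeline's actual code: `SealedInput`, `seal`, `verify`, and `consensus` are hypothetical names, and simple majority voting stands in for the unspecified "statistical consensus".

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)  # frozen: attribute assignment raises an error
class SealedInput:
    text: str
    digest: str


def seal(text: str) -> SealedInput:
    """Record the SHA-256 digest of the input when it enters the pipeline."""
    return SealedInput(text, hashlib.sha256(text.encode("utf-8")).hexdigest())


def verify(sealed: SealedInput) -> bool:
    """Recompute the digest; a mismatch means the text changed after sealing."""
    current = hashlib.sha256(sealed.text.encode("utf-8")).hexdigest()
    return hmac.compare_digest(current, sealed.digest)


def consensus(votes: Sequence[bool], min_votes: int = 3) -> bool:
    """Return True (block) on a majority of 'malicious' votes.

    Assumption: simple majority. Too few votes also blocks, so a
    partially failed ensemble fails secure.
    """
    if len(votes) < min_votes:
        return True
    return sum(votes) * 2 > len(votes)
```

A bare digest only detects modification when the digest itself is stored somewhere the attacker cannot rewrite; signing it with a secret key would be needed for the signed state the architecture describes.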
Deployment Configuration
Technology Stack
DSPy framework with Claude Sonnet 4.5, JWT session tokens (HMAC-SHA256), Redis for session state, PostgreSQL for audit trails, Prometheus monitoring with custom security dashboards.
Dataset Requirements
2000-3000 labeled examples: prompt injection attacks (500-700), jailbreak attempts (500-700), adversarial inputs (300-400), legitimate requests (700-1000), authenticated researcher testing (200-300), multi-turn sequences (400-600).
Production Recommendations
Enable strict mode by default, require authentication for non-emergency requests, implement IP-based rate limiting (100 req/hr unauthenticated), configure 5-instance ensembles, enable comprehensive audit logging, deploy in isolated network segment.
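The IP-based rate limit could be implemented in several ways; the sketch below uses an in-memory sliding window with the recommended default of 100 requests per hour. `SlidingWindowLimiter` is a hypothetical name, and the in-process dictionary is an assumption made to keep the example self-contained; a multi-node deployment would need shared state instead.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each IP."""

    def __init__(self, limit: int = 100, window_seconds: float = 3600.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter deterministic and easy to test.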