Static code analysis + security posture audit for OWASP LLM Top 10 vulnerabilities. Find vulnerabilities in your AI code before deployment.
$ pip install aisentry
LLMs are being deployed to production faster than ever, but security is an afterthought.
Teams rush AI features to production without security testing. No one checks if the model can be jailbroken, tricked into leaking data, or manipulated via prompt injection.
Prompt injection, insecure output handling, model theft — these aren't theoretical. They're actively exploited. Yet most teams don't know they exist, let alone how to detect them.
Existing SAST tools don't understand LLM-specific vulnerabilities. Covering them means juggling multiple tools, different configurations, and separate reports.
Security teams lack a single CLI that covers the full OWASP LLM Top 10: no standardized output format, no way to track vulnerabilities across code and runtime.
aisentry combines static code analysis and security posture audit in a single, unified tool.
Evaluated against 10 major LLM frameworks, including LangChain, LlamaIndex, vLLM, and the OpenAI Python SDK.
Patterns that generic SAST tools (Semgrep, Bandit) cannot detect:
Note: For general patterns (eval/exec/SQL), use aisentry + Bandit together. See methodology →
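For a concrete sense of what "LLM-specific" means, here is an illustrative Python sketch (not taken from aisentry's rule set): untrusted input spliced straight into a prompt (LLM01) and model output trusted as HTML (LLM02). Generic SAST rules see ordinary string formatting here; an LLM-aware scanner treats the prompt as a sink and the model's reply as tainted data.

```python
# Illustrative only -- not aisentry's actual rules. Assumes the openai>=1.0 SDK.
from openai import OpenAI

client = OpenAI()

def summarize_ticket(user_ticket: str) -> str:
    # LLM01 Prompt Injection: untrusted input is interpolated directly into the
    # prompt, so "ignore previous instructions..." payloads reach the model.
    prompt = f"You are a support bot. Summarize this ticket:\n{user_ticket}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def render_answer(user_ticket: str) -> str:
    # LLM02 Insecure Output Handling: the model's reply is embedded in HTML
    # unescaped, so an injected "<script>..." payload becomes stored XSS.
    return f"<div class='answer'>{summarize_ticket(user_ticket)}</div>"
```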
| Repository | Files | Findings | Findings/File |
|---|---|---|---|
| LangChain | 2,501 | 170 | 0.07 |
| LlamaIndex | 4,088 | 999 | 0.24 |
| Haystack | 523 | 45 | 0.09 |
| LiteLLM | 2,792 | 1,623 | 0.58 |
| DSPy | 231 | 98 | 0.42 |
| OpenAI Python | 1,134 | 40 | 0.04 |
| Guidance | 149 | 31 | 0.21 |
| vLLM | 2,239 | 1,245 | 0.56 |
| Semantic Kernel | 1,241 | 30 | 0.02 |
| Text Gen WebUI | 93 | 131 | 1.41 |
| **Total** | **14,991** | **4,412** | **0.29** |
| Category | Recall | Precision | F1 |
|---|---|---|---|
| LLM07: Insecure Plugin | 100% | 87.5% | 93.3% |
| LLM04: Model DoS | 66.7% | 100% | 80.0% |
| LLM09: Overreliance | 66.7% | 100% | 80.0% |
| LLM02: Insecure Output | 70.0% | 77.8% | 73.7% |
| LLM01: Prompt Injection | 66.7% | 80.0% | 72.7% |
| LLM06: Sensitive Info | 71.4% | 55.6% | 62.5% |
| LLM08: Excessive Agency | 50.0% | 75.0% | 60.0% |
| LLM03: Training Poisoning | 40.0% | 100% | 57.1% |
| LLM05: Supply Chain | 60.0% | 54.5% | 57.1% |
| LLM10: Model Theft | 28.6% | 100% | 44.4% |
All metrics are computed against a ground truth testbed with labeled vulnerabilities across 10 OWASP categories. Results are fully reproducible.
Automatically filter common false positives with 88% accuracy using ML-trained heuristics. Typical suppressions:

- `model.eval()` - not Python's dangerous `eval()`; it switches a model to evaluation mode.
- `cursor.execute(...)` - not Python's dangerous `exec()`; it runs a SQL query through the database driver.
- Data URIs such as `data:image/png;base64,...` - embedded assets, not leaked secrets.
- Placeholder values such as `your-api-key` or `sk-test-xxx` - examples, not real credentials.
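The snippet below illustrates the kind of distinction these heuristics draw (assuming PyTorch and a DB-API-style cursor; it is not aisentry's actual filter code):

```python
# Illustrative only -- not the actual filtering heuristics.
# Assumes PyTorch and a DB-API cursor (e.g. psycopg2) are available.
import torch

def real_finding(user_expr: str):
    # Python's eval() on untrusted input: a genuine code-execution risk.
    return eval(user_expr)

def suppressed_eval(model: torch.nn.Module) -> torch.nn.Module:
    # torch.nn.Module.eval() only toggles evaluation mode (dropout, batch norm);
    # it has nothing to do with Python's eval() and should not be reported.
    model.eval()
    return model

def suppressed_execute(cursor, user_id: int):
    # A parameterized DB-API execute() runs a SQL query; it is not Python's exec().
    cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    return cursor.fetchall()

AVATAR_URI = "data:image/png;base64,iVBORw0KGgo="  # embedded asset, not a secret
EXAMPLE_KEY = "sk-test-xxx"                         # placeholder, not a real credential
```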
For ML-based classification trained on 1,000 labeled findings, install with the [ml] extra:
pip install aisentry[ml]
Whether you're a security engineer, developer, or platform team — we've got you covered.
Install, scan, test. It's that simple.
Clone any LLM project and generate reports in seconds:
```bash
# Install
pip install aisentry

# Clone a sample LLM project
git clone https://github.com/langchain-ai/langchain.git
cd langchain

# Generate HTML report (interactive, with audit)
aisentry scan ./libs/langchain -o html -f report.html

# Generate JSON report (for automation)
aisentry scan ./libs/langchain -o json -f report.json

# Generate SARIF report (for GitHub Code Scanning)
aisentry scan ./libs/langchain -o sarif -f report.sarif

# View the HTML report
open report.html
```
See example reports from real-world scans.
Install from PyPI with pip
Static analysis + security posture audit
For runtime testing, use Garak
```
aisentry v1.0.0

Scanning ./my-project...
✓ Analyzed 47 files
✓ Ran 10 OWASP detectors
✓ Evaluated 61 security controls

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 FINDINGS SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
■ CRITICAL  1  Prompt Injection (LLM01)
■ HIGH      2  Insecure Output (LLM02)
■ MEDIUM    3  Secrets Exposure (LLM06)
─────────────────────────────────────
Vulnerability Score: 67/100
Security Posture:    72/100
Maturity Level:      Developing

✓ Report saved to report.html
```
Cloud providers, local models, or custom endpoints — we support them all.
`scan` - Static code analysis for OWASP LLM Top 10 vulnerabilities in your source code.
`audit` - Security posture assessment evaluating 61 controls across 10 categories.
For live runtime testing, we recommend Garak.
Current metrics: 75.4% precision, 63.0% recall, 68.7% F1 score.
We outperform Semgrep and Bandit on LLM-specific vulnerabilities. Use `--mode strict` for fewer false positives.
Currently Python-only. JavaScript/TypeScript support is planned. The architecture is extensible — see CONTRIBUTING.md if you want to help add new parsers.
Yes! Use `-o sarif` for GitHub Code Scanning, or `-o json` for custom integrations.
See CI/CD integration docs.
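As a sketch of a custom integration, the script below gates a CI job on the JSON report produced by `aisentry scan . -o json -f report.json`. The field names (`findings`, `severity`) are assumptions for illustration; check an actual report for the real schema.

```python
# Minimal CI gate sketch: fail the build when the scan report contains
# critical findings. The "findings"/"severity" field names are assumptions --
# verify them against a real report.json before relying on this.
import json
import sys

with open("report.json") as fh:
    report = json.load(fh)

critical = [f for f in report.get("findings", []) if f.get("severity") == "CRITICAL"]

if critical:
    print(f"{len(critical)} critical finding(s), failing the build.")
    sys.exit(1)
print("No critical findings.")
```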
No. All analysis runs 100% locally. Your source code never leaves your machine.
We welcome contributions! Check out our CONTRIBUTING.md for guidelines.
Good first issues are labeled good-first-issue on GitHub.
Features you won't find anywhere else. Open-source and community-driven.
Full coverage of OWASP LLM Top 10
88% accuracy filtering false positives
61 controls across 10 categories
CI/CD ready with GitHub integration
Scan Model Context Protocol servers for over-permissioned tools
Trace agent chains in LangGraph, CrewAI, AutoGen
Vector DB injection, unsafe loaders, PII in embeddings
Real-time scanning in your editor
Static analysis for injection vectors in prompts
Detect missing guardrail layers (NeMo Guardrails, Llama Guard, Guardrails AI)
PII detection, poisoning indicators in training data
Audit LiteLLM, AI Gateway proxy configs
Image prompt injection, audio input sanitization
Generate Garak tests from static findings
Full support for JS/TS LLM applications
Cloud dashboard with trend tracking