# LLM Security Checklist

### **1. OWASP Top 10 for LLM Applications (2025)**

***

#### **1.1 Prompt Injection**

* [ ] Test for **Direct Prompt Injection** where crafted inputs alter behavior unexpectedly.

- ⬇️ **Sample Attack Scenarios**:
  * An attacker injects a prompt in a chatbot to bypass guidelines, query private data stores, and escalate privileges.
  * **Payload splitting:** malicious prompts are fragmented to evade detection but manipulate the LLM when combined.

* [ ] Validate against **Indirect Prompt Injection** by testing inputs from external sources.
  * ⬇️ **Sample Attack Scenarios**:
    * Summarizing a webpage with hidden instructions, causing the LLM to exfiltrate private conversation details.
    * Using Retrieval-Augmented Generation (RAG) to inject modified content in a repository, leading to misleading outputs.
* [ ] Ensure defenses against **Jailbreaking** attempts to bypass safety protocols.
* [ ] Conduct adversarial tests for **Multimodal Prompt Injection** (hidden instructions in images, audio, etc.).
  * ⬇️ **Sample Attack Scenario**:
    * A malicious prompt embedded in an image alters the model’s behavior when processed with text.
* [ ] Evaluate risks of **Adversarial Suffix Attacks** and multilingual/obfuscated input strategies.

***

#### **1.2 Sensitive Information Disclosure**

* [ ] Test for **Training Data Leakage** using specific queries.
* [ ] Validate system prevention of **PII or Confidential Data Extraction**.
  * ⬇️ **Sample Attack Scenario**:
    * An attacker queries the model repeatedly to infer sensitive training data patterns.
* [ ] Verify output sanitization to avoid unintended **System Prompt Disclosure**.

***

#### **1.3 Supply Chain Vulnerabilities**

* [ ] Audit dependencies for vulnerabilities in the **MLOps Pipeline**.
* [ ] Test integrity and authenticity of third-party components in the pipeline.
  * ⬇️ **Sample Attack Scenario**:
    * A compromised pre-trained model dependency introduces malicious behaviors in production.
* [ ] Ensure proper version control and immutability for LLM components.

***

#### **1.4 Data and Model Poisoning**

* [ ] Test for resistance to **Adversarial Training Data Insertion**.
  * ⬇️ **Sample Attack Scenario**:
    * Poisoned training data subtly biases an LLM to produce harmful or incorrect outputs under specific prompts.
* [ ] Monitor for unauthorized modifications of training data.
* [ ] Validate input data integrity during model fine-tuning.

***

#### **1.5 Improper Output Handling**

* [ ] Validate output to ensure compliance with safety and relevance constraints.
  * ⬇️ **Sample Attack Scenario**:
    * An LLM produces responses that violate content policies when queried with edge-case inputs.
* [ ] Test that sensitive or harmful content cannot bypass output filters.

***

#### **1.6 Excessive Agency**

* [ ] Test for improper escalation of **autonomous agent permissions**.
  * ⬇️ **Sample Attack Scenario**:
    * An LLM autonomously escalates privileges to execute unauthorized API calls.
* [ ] Validate agent actions to prevent risky or unintended decisions.

***

#### **1.7 System Prompt Leakage**

* [ ] Verify that system prompts remain inaccessible through direct or indirect queries.
  * ⬇️ **Sample Attack Scenario**:
    * An attacker uses adversarial prompts to infer and extract system-level prompt templates.
* [ ] Monitor for leakage through metadata, logs, or embedded queries.

***

#### **1.8 Vector and Embedding Weaknesses**

* [ ] Test **vector database query security** against unauthorized access.
  * ⬇️ **Sample Attack Scenario**:
    * An attacker exploits embedding similarity searches to infer sensitive stored vectors.
* [ ] Validate embedding sanitization to prevent injection or retrieval flaws.

***

#### **1.9 Misinformation Risks**

* [ ] Test for generation of **factually incorrect or biased outputs**.
  * ⬇️ **Sample Attack Scenario**:
    * An attacker manipulates LLM responses to spread false narratives by exploiting content sourcing flaws.
* [ ] Validate retrieval-augmented generation (RAG) for accurate and grounded sourcing.

***

#### **1.10 Unbounded Consumption**

* [ ] Test for **resource exhaustion vulnerabilities**, including memory and API limits.
  * ⬇️ **Sample Attack Scenario**:
    * Malicious inputs cause an LLM to perform excessive computations, leading to denial-of-service or unexpected costs.
* [ ] Monitor for abusive usage patterns.
* [ ] Test rate-limiting of Models, APIs, etc.

***

### **2. Additional Categories**

#### **2.1 Input and Output Security**

* [ ] Perform extensive **input validation** for injection attacks (e.g., SQL, XSS, command).
* [ ] Ensure outputs are sanitized and properly encoded.
* [ ] Prevent sensitive data from being accidentally returned in outputs.

***

#### **2.2 Orchestrator Security**

* [ ] Enforce **access control policies** (RBAC, ABAC) to restrict orchestrator-level permissions.
* [ ] Test for identity manipulation and unauthorized API calls.
* [ ] Test multi-factor authentication for orchestrator interfaces.

***

#### **2.3 Incident Response and Monitoring**

* [ ] Enable comprehensive logging of interactions for **audit and forensic purposes**.
* [ ] Regularly conduct tabletop exercises to test incident response to LLM-related threats.
* [ ] Create clear post-incident analysis methodologies

## References

* [OWASP Top 10 for LLM Applications 2025 PDF](https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-v2025.pdf)
* [MITRE ATLAS](https://atlas.mitre.org/matrices/ATLAS)
* [Syncubes](https://www.syncubes.com/llm-pentesting-checklist)
* [PortSwigger's recommendations](https://portswigger.net/web-security/llm-attacks).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://playbook.sidthoviti.com/ai-security/llm-security-checklist.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
