# LLM Security1

## LLM Security Testing and Mitigation Checklist

### Initial Access and Reconnaissance

* [ ] **Hyper-Personalized Attacks**: Test LLM for responses that could be manipulated for spear phishing.
* [ ] **Customer Impersonation Risks**: Check for potential generation of impersonation or spoofing content.
* [ ] **Malicious Input Detection**: Develop detection mechanisms for harmful or hostile prompts.
* [ ] **Social Engineering**: Assess susceptibility to prompts that generate misinformation or influence users.

### Execution and Persistence

* [ ] **Direct Prompt Injection**: Attempt direct injections to alter LLM behavior directly.
* [ ] **Indirect Prompt Injection**: Test inputs for prompts that may bypass initial model instructions.
* [ ] **Command Injection**: Attempt injection within LLM-processed inputs for unauthorized command execution.
* [ ] **System Instruction Manipulation**: Test for user manipulation of underlying instructions.

### Defense Evasion

* [ ] **Jailbreaking Attempts**: Simulate bypass attempts to subvert ethical and content guidelines.
* [ ] **Insider Threat Management**: Assess controls for authorized users potentially misusing LLMs.
* [ ] **Role-Based Access**: Verify LLM adherence to role-based access restrictions.
* [ ] **Secure Output Filtering**: Ensure filtering of harmful or sensitive content in generated responses.

### Credential Access and Privilege Escalation

* [ ] **Access Control Testing**: Check for unauthorized access to data or privileged LLM features.
* [ ] **Unauthorized API Calls**: Test if LLM can be manipulated to make unauthorized API requests.
* [ ] **Sensitive Data Extraction**: Attempt to retrieve sensitive data through crafted prompts.
* [ ] **ACL Synchronization**: Verify synchronization of ACLs in vector database and storage systems.

### Collection and Exfiltration

* [ ] **Training Data Exposure**: Test for exposure of training data through specific user inputs.
* [ ] **Data Leakage in Similarity Searches**: Assess the risk of data leakage in similarity search results.
* [ ] **Sensitive Information Disclosure**: Probe for unintended disclosure of sensitive information.
* [ ] **Memory Poisoning**: Test for manipulation of LLM’s memory or context across sessions.

### Impact

* [ ] **Sandbox Escape Testing**: Ensure generated code remains sandboxed to prevent unauthorized execution.
* [ ] **Malicious Code Injection**: Attempt injection of malicious code via crafted prompts.
* [ ] **Unauthorized Imports**: Verify that unauthorized libraries cannot be imported within generated code.
* [ ] **Resource Limits**: Test for enforcement of resource usage and execution time limits.

### Command and Control

* [ ] **Unauthorized API Requests**: Check for unauthorized calls made by the LLM to external APIs.
* [ ] **Confused Deputy Attacks**: Evaluate multi-system interactions for potential confused deputy risks.
* [ ] **Identity Propagation**: Verify that identity is consistently propagated in LLM-driven API requests.

### Trust Boundary Mapping and Secure Integration

* [ ] **Threat Modeling of LLM Components**: Map trust boundaries in LLM architecture, identifying risk points.
* [ ] **Secure Integration with Systems**: Ensure secure integration points and trust boundaries.
* [ ] **Orchestrator Security**: Test for secure identity handling and error processing in orchestration.
* [ ] **Cache Security**: Verify secure management and access control for LLM cache layers.

### Data Security and Access Control

* [ ] **Data Classification and Protection**: Validate how sensitive data is handled and classified within LLM.
* [ ] **Access Control Policies**: Implement least privilege and defense-in-depth.
* [ ] **Training Pipeline Security**: Ensure secure management of data, models, and algorithms in training pipeline.
* [ ] **Vector Database Security**: Confirm document-level and query-level access controls in vector storage.

### MLOps Pipeline Security

* [ ] **Training Data Poisoning**: Test for resilience against malicious data insertion.
* [ ] **Model Versioning Security**: Verify proper access control in model versioning systems.
* [ ] **Supply Chain Vulnerabilities**: Assess third-party dependency security within the ML pipeline.
* [ ] **Training Artifacts Access Control**: Check access controls on training logs and artifacts.

### Input Validation and Sanitization

* [ ] **SQL Injection Testing**: Attempt SQL injections in queries generated by the LLM.
* [ ] **XSS Vulnerability Testing**: Check for XSS vulnerabilities in LLM-generated outputs.
* [ ] **Special Character Handling**: Verify secure handling of special characters in user inputs.

### Output Validation and Filtering

* [ ] **Harmful Content Filtering**: Ensure filters block malicious or sensitive content.
* [ ] **Sensitive Data Filtering**: Validate filtering for PII and sensitive data in outputs.
* [ ] **Handling of PII**: Confirm LLM’s handling of PII aligns with data protection standards.
* [ ] **Context Leakage Prevention**: Verify that context from one user session does not bleed into another.

### Incident Response and Monitoring

* [ ] **Logging and Monitoring**: Ensure all actions are logged with proper audit storage.
* [ ] **Automated Response**: Implement automated response mechanisms for detected threats.
* [ ] **Incident Response Drills**: Conduct regular tabletop exercises for LLM-specific threats.
* [ ] **Red Teaming Exercises**: Include LLM-related risks in red teaming and vulnerability assessments.

***

This checklist provides structured security tests and mitigations for each relevant threat area, incorporating both your security requirements and the MITRE ATLAS framework. Let me know if you'd like any adjustments or further customization.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://playbook.sidthoviti.com/ai-security/old-drafts/llm-security1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
