PromptEval Documentation
Professional testing framework for LLM applications. Validate AI responses with ML-powered semantic matching.
v1.0.0 · Python 3.8+ · MIT License
Quick Start
Get started with PromptEval in under 5 minutes:
# Install PromptEval
pip install prompteval
# Create a test file
cat > test_example.yaml << EOF
tests:
  - name: greeting_test
    prompt: "Say hello"
    expected: "Hello! How can I help you?"
    threshold: 0.85
EOF

# Run tests
prompteval run test_example.yaml

Installation
Requirements
- Python 3.8 or higher
- pip or conda
- API key from your LLM provider
Install via pip
pip install prompteval

Install from source
git clone https://github.com/prompteval/prompteval.git
cd prompteval
pip install -e .

Verify installation
prompteval --version
# Output: PromptEval v1.0.0

Configuration
Configure PromptEval using environment variables or a config file:
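A common convention (assumed here, not confirmed by PromptEval's internals) is that environment variables override config-file values, which override built-in defaults. A minimal sketch of that resolution order, with illustrative names:

```python
import os

# Built-in fallbacks, matching the defaults documented below
DEFAULTS = {"model": "gpt-4", "threshold": "0.85"}

def get_setting(name, file_config):
    """Resolution order: PROMPTEVAL_<NAME> env var, then config file, then default."""
    env_value = os.environ.get(f"PROMPTEVAL_{name.upper()}")
    if env_value is not None:
        return env_value
    return file_config.get(name, DEFAULTS.get(name))

file_config = {"threshold": "0.9"}
os.environ["PROMPTEVAL_MODEL"] = "gpt-4"      # env var wins for model
os.environ.pop("PROMPTEVAL_THRESHOLD", None)  # no env override for threshold
print(get_setting("model", file_config))      # from environment
print(get_setting("threshold", file_config))  # from config file
```

Either mechanism works on its own; mixing them is useful for keeping secrets in the environment and everything else in `prompteval.yaml`.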
Environment Variables
# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Model Settings
export PROMPTEVAL_MODEL="gpt-4"
export PROMPTEVAL_TEMPERATURE="0.7"
# Semantic Matching
export PROMPTEVAL_THRESHOLD="0.85"
export PROMPTEVAL_EMBEDDING_MODEL="text-embedding-ada-002"

Config File
# prompteval.yaml
api:
  provider: openai
  model: gpt-4
  temperature: 0.7

semantic:
  threshold: 0.85
  embedding_model: text-embedding-ada-002

output:
  format: html
  path: ./reports
  verbose: true

Writing YAML Tests
Define your tests in simple, readable YAML files:
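Under the hood, test files like the ones below are plain YAML; a hypothetical loader (PyYAML assumed, names illustrative) shows the shape a suite takes after parsing:

```python
import yaml  # PyYAML, assumed available

raw = """
tests:
  - name: greeting_test
    prompt: "Say hello"
    expected: "Hello! How can I help you?"
    threshold: 0.85
"""

suite = yaml.safe_load(raw)
for test in suite["tests"]:
    # threshold is optional per test; fall back to the documented default
    threshold = test.get("threshold", 0.85)
    print(test["name"], threshold)
```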
Basic Test Structure
tests:
  - name: customer_support_greeting
    prompt: "Greet the customer"
    expected: "Hello! How can I help you today?"
    threshold: 0.85

  - name: product_recommendation
    prompt: "Recommend a laptop for coding"
    expected: "I recommend a laptop with good CPU and RAM"
    threshold: 0.80
    contains:
      - "CPU"
      - "RAM"
      - "laptop"

Advanced Features
tests:
  - name: complex_test
    prompt: "Explain quantum computing"
    # Multiple validation methods
    expected: "Quantum computing uses quantum mechanics"
    threshold: 0.85
    # Required keywords
    contains:
      - "quantum"
      - "computing"
    # Forbidden content
    not_contains:
      - "classical"
      - "traditional"
    # Length constraints
    min_length: 50
    max_length: 500
    # Custom validation
    custom_validator: "validators.check_technical_accuracy"
    # Timeout
    timeout: 30

Semantic Validation
PromptEval uses ML-powered semantic matching to validate meaning, not just text:
Traditional Testing ❌
assert response == "Hello, how can I help?"
# Fails if response is "Hi! How may I assist?"

PromptEval ✅
validate_semantic(
    response="Hi! How may I assist?",
    expected="Hello, how can I help?",
    threshold=0.85,
)
# Passes with 94% similarity

How It Works
- Convert text to embeddings using state-of-the-art models
- Calculate cosine similarity between vectors
- Compare against threshold (default 0.85)
- Generate detailed similarity report
HTTP Adapter
Test any API endpoint with zero configuration:
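The `response_path` values used below (e.g. `choices[0].message.content`) are dotted paths with list indices into the JSON reply. A minimal resolver, offered as an illustrative sketch rather than PromptEval's actual implementation, might look like:

```python
import re

def resolve_path(data, path):
    """Walk a path like 'choices[0].message.content' through nested dicts/lists."""
    for part in path.split("."):
        key = re.match(r"^\w+", part).group(0)      # dict key before any [i]
        data = data[key]
        for idx in re.findall(r"\[(\d+)\]", part):  # then each list index
            data = data[int(idx)]
    return data

reply = {"choices": [{"message": {"content": "Hello!"}}]}
print(resolve_path(reply, "choices[0].message.content"))
```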
OpenAI
tests:
  - name: openai_test
    adapter: http
    endpoint: https://api.openai.com/v1/chat/completions
    method: POST
    headers:
      Authorization: "Bearer ${OPENAI_API_KEY}"
      Content-Type: application/json
    body:
      model: gpt-4
      messages:
        - role: user
          content: "{{prompt}}"
    response_path: choices[0].message.content
    expected: "{{expected}}"

Anthropic Claude
tests:
  - name: claude_test
    adapter: http
    endpoint: https://api.anthropic.com/v1/messages
    method: POST
    headers:
      x-api-key: "${ANTHROPIC_API_KEY}"
      anthropic-version: "2023-06-01"
      Content-Type: application/json
    body:
      model: claude-3-opus-20240229
      max_tokens: 1024
      messages:
        - role: user
          content: "{{prompt}}"
    response_path: content[0].text
    expected: "{{expected}}"

Custom API
tests:
  - name: custom_api_test
    adapter: http
    endpoint: https://your-api.com/chat
    method: POST
    headers:
      Authorization: "Bearer ${YOUR_API_KEY}"
    body:
      input: "{{prompt}}"
    response_path: data.response
    expected: "{{expected}}"

CI/CD Integration
GitHub Actions
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install PromptEval
        run: pip install prompteval
      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: prompteval run tests/
      - name: Upload report
        uses: actions/upload-artifact@v2
        with:
          name: test-report
          path: reports/

GitLab CI
llm_tests:
  stage: test
  image: python:3.9
  script:
    - pip install prompteval
    - prompteval run tests/
  artifacts:
    paths:
      - reports/
  only:
    - main
    - merge_requests

API Reference
Command Line
# Run tests
prompteval run <file_or_directory>
# Run with custom config
prompteval run tests/ --config=custom.yaml
# Generate report only
prompteval report results.json
# Validate YAML syntax
prompteval validate tests/
# Show version
prompteval --version

Python API
from prompteval import TestRunner, SemanticValidator

# Initialize runner
runner = TestRunner(config_path="prompteval.yaml")

# Run tests
results = runner.run("tests/")

# Semantic validation
validator = SemanticValidator(threshold=0.85)
similarity = validator.compare(
    text1="Hello, how can I help?",
    text2="Hi! How may I assist?",
)
print(f"Similarity: {similarity:.2%}")

Examples
Customer Support Bot
tests:
  - name: greeting
    prompt: "Hello"
    expected: "Hi! How can I help you?"
    threshold: 0.85

  - name: refund_request
    prompt: "I want a refund"
    expected: "I'll help you with that refund"
    contains: ["refund", "help"]

  - name: product_question
    prompt: "Tell me about your premium plan"
    expected: "Our premium plan includes..."
    contains: ["premium", "plan"]

Code Assistant
tests:
  - name: python_function
    prompt: "Write a function to reverse a string"
    contains:
      - "def"
      - "return"
      - "[::-1]"  # or check for "reversed" instead
    not_contains:
      - "import"

  - name: code_explanation
    prompt: "Explain what this does: [1,2,3].map(x => x*2)"
    expected: "This maps each element and doubles it"
    threshold: 0.80

Content Moderation
tests:
  - name: safe_content
    prompt: "Is this safe: 'Hello friend'"
    expected: "Yes, this content is safe"
    threshold: 0.85

  - name: unsafe_content
    prompt: "Is this safe: [offensive content]"
    expected: "No, this content violates policy"
    contains: ["no", "violate", "unsafe"]

Ready to Get Started?
Join 50+ companies using PromptEval to ship AI features faster.