PromptEval Documentation
Professional testing framework for LLM applications. Validate AI responses with ML-powered semantic matching.
v1.0.0 · Python 3.8+ · MIT License
Quick Start
Get started with PromptEval in under 5 minutes:
# Install PromptEval
pip install prompteval
# Create a test file
cat > test_example.yaml << EOF
tests:
  - name: greeting_test
    prompt: "Say hello"
    expected: "Hello! How can I help you?"
    threshold: 0.85
EOF

# Run tests
prompteval run test_example.yaml

Installation
Requirements
- Python 3.8 or higher
- pip or conda
- API key from your LLM provider
Install via pip
pip install prompteval

Install from source
git clone https://github.com/prompteval/prompteval.git
cd prompteval
pip install -e .

Verify installation
prompteval --version
# Output: PromptEval v1.0.0

Configuration
Configure PromptEval using environment variables or a config file:
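A common convention (assumed here, not confirmed by PromptEval's internals) is that environment variables override config-file values, which override built-in defaults. A minimal sketch of that resolution order, with illustrative names:

```python
import os

# Built-in fallbacks, matching the defaults documented below
DEFAULTS = {"model": "gpt-4", "threshold": "0.85"}

def get_setting(name, file_config):
    """Resolution order: PROMPTEVAL_<NAME> env var, then config file, then default."""
    env_value = os.environ.get(f"PROMPTEVAL_{name.upper()}")
    if env_value is not None:
        return env_value
    return file_config.get(name, DEFAULTS.get(name))

file_config = {"threshold": "0.9"}
os.environ["PROMPTEVAL_MODEL"] = "gpt-4"      # env var wins for model
os.environ.pop("PROMPTEVAL_THRESHOLD", None)  # no env override for threshold
print(get_setting("model", file_config))      # from environment
print(get_setting("threshold", file_config))  # from config file
```

Either mechanism works on its own; mixing them is useful for keeping secrets in the environment and everything else in `prompteval.yaml`.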
Environment Variables
# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Model Settings
export PROMPTEVAL_MODEL="gpt-4"
export PROMPTEVAL_TEMPERATURE="0.7"
# Semantic Matching
export PROMPTEVAL_THRESHOLD="0.85"
export PROMPTEVAL_EMBEDDING_MODEL="text-embedding-ada-002"

Config File
# prompteval.yaml
api:
  provider: openai
  model: gpt-4
  temperature: 0.7

semantic:
  threshold: 0.85
  embedding_model: text-embedding-ada-002

output:
  format: html
  path: ./reports
  verbose: true

Writing YAML Tests
Define your tests in simple, readable YAML files:
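Under the hood, test files like the ones below are plain YAML; a hypothetical loader (PyYAML assumed, names illustrative) shows the shape a suite takes after parsing:

```python
import yaml  # PyYAML, assumed available

raw = """
tests:
  - name: greeting_test
    prompt: "Say hello"
    expected: "Hello! How can I help you?"
    threshold: 0.85
"""

suite = yaml.safe_load(raw)
for test in suite["tests"]:
    # threshold is optional per test; fall back to the documented default
    threshold = test.get("threshold", 0.85)
    print(test["name"], threshold)
```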
Basic Test Structure
tests:
  - name: customer_support_greeting
    prompt: "Greet the customer"
    expected: "Hello! How can I help you today?"
    threshold: 0.85

  - name: product_recommendation
    prompt: "Recommend a laptop for coding"
    expected: "I recommend a laptop with good CPU and RAM"
    threshold: 0.80
    contains:
      - "CPU"
      - "RAM"
      - "laptop"

Advanced Features
tests:
  - name: complex_test
    prompt: "Explain quantum computing"
    # Multiple validation methods
    expected: "Quantum computing uses quantum mechanics"
    threshold: 0.85
    # Required keywords
    contains:
      - "quantum"
      - "computing"
    # Forbidden content
    not_contains:
      - "classical"
      - "traditional"
    # Length constraints
    min_length: 50
    max_length: 500
    # Custom validation
    custom_validator: "validators.check_technical_accuracy"
    # Timeout
    timeout: 30

Semantic Validation
PromptEval uses ML-powered semantic matching to validate meaning, not just text:
Traditional Testing ❌
assert response == "Hello, how can I help?"
# Fails if response is "Hi! How may I assist?"

PromptEval ✅
validate_semantic(
    response="Hi! How may I assist?",
    expected="Hello, how can I help?",
    threshold=0.85,
)
# Passes with 94% similarity

How It Works
- Convert text to embeddings using state-of-the-art models
- Calculate cosine similarity between vectors
- Compare against threshold (default 0.85)
- Generate detailed similarity report
HTTP Adapter
Test any API endpoint with zero configuration:
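The `response_path` values used below (e.g. `choices[0].message.content`) are dotted paths with list indices into the JSON reply. A minimal resolver, offered as an illustrative sketch rather than PromptEval's actual implementation, might look like:

```python
import re

def resolve_path(data, path):
    """Walk a path like 'choices[0].message.content' through nested dicts/lists."""
    for part in path.split("."):
        key = re.match(r"^\w+", part).group(0)      # dict key before any [i]
        data = data[key]
        for idx in re.findall(r"\[(\d+)\]", part):  # then each list index
            data = data[int(idx)]
    return data

reply = {"choices": [{"message": {"content": "Hello!"}}]}
print(resolve_path(reply, "choices[0].message.content"))
```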
OpenAI
tests:
  - name: openai_test
    adapter: http
    endpoint: https://api.openai.com/v1/chat/completions
    method: POST
    headers:
      Authorization: "Bearer ${OPENAI_API_KEY}"
      Content-Type: application/json
    body:
      model: gpt-4
      messages:
        - role: user
          content: "{{prompt}}"
    response_path: choices[0].message.content
    expected: "{{expected}}"

Anthropic Claude
tests:
  - name: claude_test
    adapter: http
    endpoint: https://api.anthropic.com/v1/messages
    method: POST
    headers:
      x-api-key: "${ANTHROPIC_API_KEY}"
      anthropic-version: "2023-06-01"
      Content-Type: application/json
    body:
      model: claude-3-opus-20240229
      max_tokens: 1024
      messages:
        - role: user
          content: "{{prompt}}"
    response_path: content[0].text
    expected: "{{expected}}"

Custom API
tests:
  - name: custom_api_test
    adapter: http
    endpoint: https://your-api.com/chat
    method: POST
    headers:
      Authorization: "Bearer ${YOUR_API_KEY}"
    body:
      input: "{{prompt}}"
    response_path: data.response
    expected: "{{expected}}"

CI/CD Integration
GitHub Actions
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install PromptEval
        run: pip install prompteval
      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: prompteval run tests/
      - name: Upload report
        uses: actions/upload-artifact@v2
        with:
          name: test-report
          path: reports/

GitLab CI
llm_tests:
  stage: test
  image: python:3.9
  script:
    - pip install prompteval
    - prompteval run tests/
  artifacts:
    paths:
      - reports/
  only:
    - main
    - merge_requests

API Reference
Command Line
# Run tests
prompteval run <file_or_directory>
# Run with custom config
prompteval run tests/ --config=custom.yaml
# Generate report only
prompteval report results.json
# Validate YAML syntax
prompteval validate tests/
# Show version
prompteval --version

Python API
from prompteval import TestRunner, SemanticValidator

# Initialize runner
runner = TestRunner(config_path="prompteval.yaml")

# Run tests
results = runner.run("tests/")

# Semantic validation
validator = SemanticValidator(threshold=0.85)
similarity = validator.compare(
    text1="Hello, how can I help?",
    text2="Hi! How may I assist?",
)
print(f"Similarity: {similarity:.2%}")

Examples
Customer Support Bot
tests:
  - name: greeting
    prompt: "Hello"
    expected: "Hi! How can I help you?"
    threshold: 0.85

  - name: refund_request
    prompt: "I want a refund"
    expected: "I'll help you with that refund"
    contains: ["refund", "help"]

  - name: product_question
    prompt: "Tell me about your premium plan"
    expected: "Our premium plan includes..."
    contains: ["premium", "plan"]

Code Assistant
tests:
  - name: python_function
    prompt: "Write a function to reverse a string"
    contains:
      - "def"
      - "return"
      - "[::-1]"  # or check for "reversed" instead
    not_contains:
      - "import"

  - name: code_explanation
    prompt: "Explain what this does: [1,2,3].map(x => x*2)"
    expected: "This maps each element and doubles it"
    threshold: 0.80

Content Moderation
tests:
  - name: safe_content
    prompt: "Is this safe: 'Hello friend'"
    expected: "Yes, this content is safe"
    threshold: 0.85

  - name: unsafe_content
    prompt: "Is this safe: [offensive content]"
    expected: "No, this content violates policy"
    contains: ["no", "violate", "unsafe"]

Ready to Get Started?
Join 50+ companies using PromptEval to ship AI features faster.