PromptEval Documentation

Professional semantic testing framework for LLM applications.

v1.0.0 · Python 3.10+ · API Key Auth

πŸ§ͺ Semantic Validation

ML-powered matching using sentence transformers (85%+ accuracy)

πŸ” API Key Authentication

Secure access with pe_-prefixed API keys

πŸ“Š Usage Tracking

Monitor test consumption with monthly limits

πŸ”„ CI/CD Ready

Integrate with GitHub Actions, GitLab CI, etc.

Quick Start

Get started with PromptEval in under 5 minutes:

1. Install PromptEval

pip install prompteval-core

2. Set your API Key

# Option A: Environment variable (recommended)
export PROMPTEVAL_API_KEY=pe_your_api_key_here

# Option B: Pass directly to commands
prompteval run adapter.yml --api-key pe_xxxxx

3. Check your account status

prompteval status

# Output:
# ============================================================
# πŸ“Š PROMPTEVAL ACCOUNT STATUS
# ============================================================
#
#    πŸ“¦ License: abc123def456...
#    πŸ“‹ Plan: PRO
#    βœ… Status: Active
#
#    πŸ“ˆ Usage This Month:
#       [β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘] 40.0%
#       Tests used:      200
#       Tests remaining: 300
#       Monthly limit:   500
#
#    πŸ”‘ API Keys: 2 active
# ============================================================
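
The usage bar above is just the fraction of the monthly limit consumed (200 of 500 tests = 40.0%). As an illustration of how that line is derived (not the CLI's actual rendering code):

```python
# Render a usage bar from tests used vs. monthly limit.
# Illustrative sketch only; the prompteval CLI's real rendering may differ.
def usage_bar(used, limit, width=20):
    filled = round(width * used / limit)   # blocks to fill
    empty = width - filled                 # blocks left light
    return "[" + "█" * filled + "░" * empty + f"] {100 * used / limit:.1f}%"

print(usage_bar(200, 500))
# [████████░░░░░░░░░░░░] 40.0%
```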

4. Create test files

adapter.yml - Your LLM endpoint configuration:

name: my-llm-api
description: My LLM Testing

endpoint:
  url: https://api.example.com/v1/chat
  method: POST
  timeout: 30

request:
  headers:
    Content-Type: application/json
  
  template:
    prompt: "{{PROMPT}}"
    type: "{{TYPE}}"
    max_tokens: 150

# Response extraction
response:
  type: json
  path: choices.0.message.content

# Validation
validation:
  ml_threshold: 0.75
  use_semantic: true

# Test execution (these override config.ini if present)
execution:
  parallel_limit: 10
  batch_delay: 0.3
  output_dir: ./output
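
The response.path value is a dot-separated path into the JSON response body, with numeric segments treated as list indices. A minimal sketch of how such a path resolves (illustrative only, not PromptEval's actual extractor):

```python
# Resolve a dot-path like "choices.0.message.content" against a parsed
# JSON response. Illustrative sketch, not the library's extraction code.
import json

body = json.loads(
    '{"choices": [{"message": {"content": "The capital of France is Paris"}}]}'
)

def extract(data, path):
    """Walk a dot-separated path; numeric segments index into lists."""
    for part in path.split("."):
        data = data[int(part)] if part.isdigit() else data[part]
    return data

print(extract(body, "choices.0.message.content"))
# The capital of France is Paris
```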

tests.yml - Your test cases:

tests:
  - name: duration_basic
    id: FIT-001
    description: Question about training duration
    prompt: "Ask about training duration"
    context:
      PROMPT: "Ask about training duration"
      TYPE: "duration"
    expected: "Desde cuando estas entrenando este ejercicio o rutina"
    variants:
      - "Hace cuanto tiempo empezaste con este entrenamiento"
      - "Cuanto tiempo llevas entrenando este ejercicio"
      - "Por favor, dime cuΓ‘ndo empezaste con este entrenamiento."
      - "CuΓ©ntame, ΒΏdesde cuΓ‘ndo entrenas asΓ­?"
    threshold: 0.70
    tags:
      - duration

5. Run tests

prompteval run adapter.yml --tests tests.yml

# Output:
# πŸš€ Running evaluation...
#    Adapter: adapter.yml
#
# ==================================================
# πŸ“Š EVALUATION RESULTS
# ==================================================
#    Total tests:  2
#    βœ… Passed:    2
#    ❌ Failed:    0
#    ⚠️  Errors:    0
#    Success rate: 100.0%
#    Duration:     1523.45ms
# ==================================================
#
# πŸ’Ύ Results saved: results.json

6. Generate HTML report

prompteval report results.json --output report.html

Installation

Requirements

  • Python 3.10 or higher
  • Valid PromptEval API Key

Install via pip

pip install prompteval-core

Verify installation

prompteval --help

# Output shows available commands:
# run, validate, report, status, licenses

CLI Reference

Commands Overview

| Command | Description |
| --- | --- |
| prompteval run | Execute tests against your LLM |
| prompteval validate | Validate YAML syntax (no API call) |
| prompteval report | Generate HTML report from results |
| prompteval status | Show account status and quota |
| prompteval licenses | List your licenses |

Run Tests

# Basic usage
prompteval run adapter.yml --tests tests.yml --api-key pe_xxxxx

# With environment variable
export PROMPTEVAL_API_KEY=pe_xxxxx
prompteval run adapter.yml --tests tests.yml

# Custom output file
prompteval run adapter.yml -t tests.yml -o my_results.json

# Against local/staging API
prompteval run adapter.yml -t tests.yml -u http://localhost:8000

Validate YAML

# Validate test file (doesn't consume quota)
prompteval validate tests.yml

# Output:
# πŸ” Validating: tests.yml
# βœ… Valid YAML - 5 test cases found

Generate Report

# Basic report
prompteval report results.json

# Custom output
prompteval report results.json --output my-report.html

Account Status

# Check quota and usage
prompteval status --api-key pe_xxxxx

# Or with env variable
prompteval status

Python SDK

Basic Usage

from prompteval import PromptEval

# Initialize client
client = PromptEval(api_key="pe_xxxxx")

# Run tests from files
result = client.run_from_files(
    adapter_path="adapter.yml",
    tests_path="tests.yml"
)

# Check results
print(f"Success rate: {result.success_rate}%")
print(f"Passed: {result.total_passed}/{result.total_tests}")

if result.success:
    print("βœ… All tests passed!")
else:
    print("❌ Some tests failed:")
    for test in result.failed_tests:
        print(f"   - {test.id_test}: {test.similarity:.1%} similarity")

Account Methods

from prompteval import PromptEval

client = PromptEval(api_key="pe_xxxxx")

# Get licenses
licenses = client.get_licenses()
for lic in licenses:
    print(f"Plan: {lic.plan}")
    print(f"Tests remaining: {lic.tests_remaining}")

# Get usage details
usage = client.get_usage()
print(f"Tests this month: {usage['tests_this_month']}")

Error Handling

from prompteval import PromptEval
from prompteval.exceptions import (
    AuthenticationError,
    QuotaExceededError,
    RateLimitError,
    APIError
)

client = PromptEval(api_key="pe_xxxxx")

try:
    result = client.run_from_files("adapter.yml", "tests.yml")
except AuthenticationError:
    print("Invalid API key")
except QuotaExceededError:
    print("Monthly quota exceeded - upgrade your plan")
except RateLimitError:
    print("Too many requests - slow down")
except APIError as e:
    print(f"API error: {e}")
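
Rate limits usually clear on their own, so a common pattern is to wrap the run in a small backoff-and-retry helper. run_with_retry below is a hypothetical helper sketched here for illustration, not part of the SDK:

```python
# Hypothetical backoff helper (not part of the prompteval SDK): retry a
# callable a few times, doubling the sleep after each failed attempt.
import time

def run_with_retry(run_fn, retryable=(Exception,), max_attempts=3, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return run_fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage sketch with the client from above:
# result = run_with_retry(
#     lambda: client.run_from_files("adapter.yml", "tests.yml"),
#     retryable=(RateLimitError,),
# )
```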

CI/CD Integration

GitHub Actions

name: LLM Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install PromptEval
        run: pip install prompteval-core
      
      - name: Run LLM Tests
        env:
          PROMPTEVAL_API_KEY: ${{ secrets.PROMPTEVAL_API_KEY }}
        run: |
          prompteval run adapter.yml --tests tests.yml
      
      - name: Generate Report
        if: always()
        run: prompteval report results.json --output report.html
      
      - name: Upload Report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-report
          path: report.html

GitLab CI

llm-tests:
  image: python:3.11
  script:
    - pip install prompteval-core
    - prompteval run adapter.yml --tests tests.yml
    - prompteval report results.json --output report.html
  artifacts:
    paths:
      - report.html
    when: always
  variables:
    PROMPTEVAL_API_KEY: $PROMPTEVAL_API_KEY

Pricing Plans

| Feature | Free | Pro | Team | Enterprise |
| --- | --- | --- | --- | --- |
| Price | $0/mo | $99/mo | $299/mo | Custom |
| Tests/Month | 30 | 1,000 | 5,000 | On Demand |
| API Keys | 1 | 1 | 5 | On Demand |
| Semantic Validation | ✅ | ✅ | ✅ | ✅ |
| HTML Reports | ✅ | ✅ | ✅ | ✅ |
| CI/CD Integration | ❌ | ✅ | ✅ | ✅ |
| Priority Support | ❌ | ❌ | ✅ | ✅ |
| SLA | ❌ | ❌ | ❌ | ✅ |

Semantic Validation

PromptEval uses sentence transformers to compute semantic similarity between expected and actual LLM outputs.

How It Works

  1. Both expected and actual responses are converted to embeddings
  2. Cosine similarity is calculated between embeddings
  3. If similarity β‰₯ threshold, the test passes

Example

Expected: "The capital of France is Paris"
Actual:   "Paris is the capital city of France"
Similarity: 94% βœ… (threshold: 75%)
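
The pass/fail decision in step 3 is a threshold on the cosine of the two embedding vectors. A self-contained sketch of that step, using toy 3-dimensional vectors (real sentence-transformer embeddings have hundreds of dimensions):

```python
# Cosine similarity between two embedding vectors, compared to a threshold.
# The vectors here are toy values for illustration; in practice they come
# from a sentence-transformer model.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

expected_emb = [0.12, 0.80, 0.55]  # toy embedding of the expected answer
actual_emb = [0.10, 0.78, 0.60]    # toy embedding of the actual answer

similarity = cosine_similarity(expected_emb, actual_emb)
print(f"similarity={similarity:.2f}, passed={similarity >= 0.75}")
```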

Threshold Guidelines

| Threshold | Use Case |
| --- | --- |
| 0.90+ | Exact facts, numbers, specific answers |
| 0.75-0.89 | General responses, similar meaning |
| 0.60-0.74 | Loose matching, related topics |
| <0.60 | Very flexible matching |

Ready to Get Started?

Get your API key today and start testing your LLM applications professionally.