# PowerMem Benchmark
A comprehensive benchmarking suite for PowerMem, including a REST API server for managing memories and a load testing tool based on the LOCOMO dataset.
## Overview
The PowerMem Benchmark suite consists of two main components:
- **Benchmark Server** (`benchmark/server/`): A FastAPI-based REST API server that provides:
  - Memory storage and management
  - Semantic search capabilities
  - Token usage tracking
  - Support for multiple database backends (OceanBase, PostgreSQL)
- **Load Testing Tool** (`benchmark/lomoco/`): A comprehensive benchmarking tool that:
  - Tests memory addition and search performance
  - Evaluates response quality using multiple metrics
  - Measures latency and token consumption
  - Uses the LOCOMO dataset for realistic testing scenarios
## Quick Start
### 1. Start the Benchmark Server
```bash
# Install dependencies
pip install -e .

# Configure environment
cp benchmark/server/.env.example benchmark/server/.env
# Edit benchmark/server/.env with your settings

# Start the server
uvicorn benchmark.server.main:app --host 0.0.0.0 --port 8000 --reload
```
### 2. Run Load Tests
```bash
# Install load testing dependencies
pip install -r benchmark/lomoco/requirements.txt

# Configure environment
cd benchmark/lomoco
cp .env.example .env
# Edit .env with your API keys and server URL

# Run tests
bash run.sh results
```
## Benchmark Server
### Prerequisites
- Python 3.10 or higher
- pip or poetry for dependency management
- Access to OpenAI API (or compatible API endpoint)
- Database: OceanBase or PostgreSQL (depending on your configuration)
### Installation
1. **Install dependencies**

   From the project root:

   ```bash
   pip install -e .
   ```

   Or install specific dependencies:

   ```bash
   pip install fastapi uvicorn python-dotenv powermem
   ```

2. **Configure environment variables**

   Copy the example environment file:

   ```bash
   cp benchmark/server/.env.example benchmark/server/.env
   ```

   Edit `benchmark/server/.env` and configure:

   - `OPENAI_API_KEY`: Your OpenAI API key (required)
   - `EMBEDDER_API_KEY`: Optional, separate API key for embeddings (defaults to `OPENAI_API_KEY`)
   - Database configuration (OceanBase or PostgreSQL)
   - Other settings as needed

   See `benchmark/server/.env.example` for all available configuration options.
### Configuration
All configuration is done through environment variables. The server automatically loads a `.env` file from the `benchmark/server/` directory.
#### Required Environment Variables

- `OPENAI_API_KEY`: Your OpenAI API key
#### Optional Environment Variables

- `EMBEDDER_API_KEY`: Separate API key for embeddings (defaults to `OPENAI_API_KEY`)
- `DB_TYPE`: Database type - `oceanbase` or `postgres` (default: `oceanbase`)
- `TOKEN_COUNTING`: Enable token counting - `true` or `false` (default: `true`)
- `LLM_MODEL`: LLM model name (default: `gpt-4o`)
- `LLM_TEMPERATURE`: LLM temperature (default: `0.2`)
- `EMBEDDER_MODEL`: Embedding model name (default: `text-embedding-3-small`)
- `EMBEDDER_DIMS`: Embedding dimensions (default: `1536`)
For database-specific configuration, see `benchmark/server/.env.example`.
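If you want to sanity-check your configuration before starting the server, a small script can load the same `.env` file and print the resolved values. This is only an illustrative sketch based on the variables documented above (it does not mirror the server's internal code) and assumes `python-dotenv` is installed:

```python
import os

from dotenv import load_dotenv

# Load the server's env file explicitly (the server itself does this automatically).
load_dotenv("benchmark/server/.env")

# Required variable
if not os.getenv("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY environment variable is required")

# Optional variables, shown with the defaults documented above
resolved = {
    "DB_TYPE": os.getenv("DB_TYPE", "oceanbase"),
    "TOKEN_COUNTING": os.getenv("TOKEN_COUNTING", "true"),
    "LLM_MODEL": os.getenv("LLM_MODEL", "gpt-4o"),
    "LLM_TEMPERATURE": float(os.getenv("LLM_TEMPERATURE", "0.2")),
    "EMBEDDER_MODEL": os.getenv("EMBEDDER_MODEL", "text-embedding-3-small"),
    "EMBEDDER_DIMS": int(os.getenv("EMBEDDER_DIMS", "1536")),
}
for key, value in resolved.items():
    print(f"{key} = {value}")
```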
### Starting the Server
#### Method 1: Using uvicorn (Recommended)

From the project root:

```bash
uvicorn benchmark.server.main:app --host 0.0.0.0 --port 8000 --reload
```

The `--reload` flag enables auto-reload during development.
#### Method 2: Production Mode

For production with multiple workers:

```bash
uvicorn benchmark.server.main:app --host 0.0.0.0 --port 8000 --workers 4
```
#### Method 3: Using the Python Module

```bash
python -m uvicorn benchmark.server.main:app --host 0.0.0.0 --port 8000
```
#### Method 4: From the Server Directory

If you're in the `benchmark/server` directory:

```bash
cd benchmark/server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
### Accessing the API

Once the server is running, you can access:

- API Documentation (Swagger UI): http://localhost:8000/docs
- Alternative API Documentation (ReDoc): http://localhost:8000/redoc
- API Root: http://localhost:8000/
### API Endpoints
The server provides the following main endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/memories` | Create new memories |
| GET | `/memories` | Get all memories (with filters) |
| GET | `/memories/{memory_id}` | Get a specific memory |
| PUT | `/memories/{memory_id}` | Update a memory |
| DELETE | `/memories/{memory_id}` | Delete a memory |
| DELETE | `/memories` | Delete all memories (with filters) |
| POST | `/search` | Search memories |
| GET | `/memories/{memory_id}/history` | Get memory history |
| POST | `/reset` | Reset all memories |
| POST | `/configure` | Update server configuration |
| GET | `/token_count` | Get token usage statistics |
| POST | `/reset_token_count` | Reset token count |
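To get a feel for the request flow, here is a minimal sketch that adds a memory, searches it, and reads the token counter. The request bodies are assumptions (field names like `messages`, `user_id`, and `query` are illustrative, not confirmed by this README); check the interactive docs at http://localhost:8000/docs for the actual schemas:

```python
import requests

BASE_URL = "http://localhost:8000"

# Create a memory (payload shape is hypothetical -- verify against /docs)
resp = requests.post(
    f"{BASE_URL}/memories",
    json={
        "messages": [{"role": "user", "content": "I prefer tea over coffee."}],
        "user_id": "demo-user",
    },
)
print(resp.status_code, resp.json())

# Search memories (again, field names are illustrative)
resp = requests.post(
    f"{BASE_URL}/search",
    json={"query": "What does the user like to drink?", "user_id": "demo-user"},
)
print(resp.json())

# Read token usage statistics
print(requests.get(f"{BASE_URL}/token_count").json())
```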
## Load Testing (LOCOMO)
The LOCOMO benchmark tool performs comprehensive evaluations of memory systems using the LOCOMO dataset. It tests both memory addition and search capabilities, measuring performance, quality, and resource consumption.
### Prerequisites for Load Testing
1. **Benchmark server must be running**

   - Start the benchmark server first (see [Starting the Server](#starting-the-server))
   - The server should be accessible at the URL specified in your configuration

2. **Install LOCOMO dependencies**

   From the project root:

   ```bash
   pip install -r benchmark/lomoco/requirements.txt
   ```

   Or from the `lomoco` directory:

   ```bash
   cd benchmark/lomoco
   pip install -r requirements.txt
   ```
### Configuring Environment Variables for Load Testing
1. **Create the environment configuration file**

   ```bash
   cd benchmark/lomoco
   cp .env.example .env
   ```

2. **Edit the `.env` file**

   Open `benchmark/lomoco/.env` and configure the following variables:

   ```bash
   # OpenAI API configuration
   MODEL="qwen3-max"                             # or "gpt-4o", "gpt-4", etc.
   OPENAI_BASE_URL="https://api.openai.com/v1"   # or your API endpoint
   OPENAI_API_KEY="your_api_key_here"            # Your OpenAI API key

   # API configuration - must match your running server
   API_BASE_URL="http://127.0.0.1:8000"          # URL of the benchmark server
   ```

   **Important Configuration Notes:**

   - `API_BASE_URL`: Must match the URL where your benchmark server is running (a quick connectivity check is sketched below)
     - Default: `http://127.0.0.1:8000` (if the server runs on localhost:8000)
     - If the server runs on a different port, update accordingly
   - `MODEL`: The LLM model to use for answer generation
   - `OPENAI_API_KEY`: API key for the LLM service
   - `OPENAI_BASE_URL`: Base URL for the LLM API (OpenAI or a compatible service)
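Before launching a long run, it can save time to confirm that `API_BASE_URL` actually points at a live server. A minimal check, assuming the `requests` and `python-dotenv` packages are available:

```python
import os

import requests
from dotenv import load_dotenv

# Read benchmark/lomoco/.env (run this from the benchmark/lomoco directory)
load_dotenv()

base_url = os.getenv("API_BASE_URL", "http://127.0.0.1:8000")
try:
    resp = requests.get(base_url, timeout=5)
    print(f"Server reachable at {base_url} (HTTP {resp.status_code})")
except requests.ConnectionError:
    raise SystemExit(f"Cannot reach {base_url} -- is the benchmark server running?")
```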
### Running Load Tests
#### Full Test Suite
1. **Ensure the benchmark server is running**

   In a separate terminal:

   ```bash
   uvicorn benchmark.server.main:app --host 0.0.0.0 --port 8000
   ```

2. **Run the complete test script**

   From the `benchmark/lomoco` directory:

   ```bash
   cd benchmark/lomoco
   bash run.sh [output_folder]
   ```

   Or from the project root:

   ```bash
   cd benchmark/lomoco && bash run.sh results
   ```

   The `output_folder` parameter is optional (defaults to `results`).

3. **What the script does**

   The `run.sh` script performs the following steps (a Python sketch of the same flow follows this list):

   1. Resets the token count on the server
   2. Records the initial token count
   3. Runs memory addition experiments (`add` method)
   4. Runs memory search experiments (`search` method)
   5. Records the final token count
   6. Generates evaluation metrics
   7. Displays the evaluation results
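For reference, the same flow can be driven from Python. The sketch below mirrors steps 1-5 above and must be run from `benchmark/lomoco`; it assumes `/token_count` returns JSON (the exact response shape is not specified here, so the payloads are written to disk verbatim). Metrics generation and display (steps 6-7) are left to the existing tooling:

```python
import json
import os
import subprocess

import requests

BASE_URL = "http://127.0.0.1:8000"   # must match API_BASE_URL in .env
OUTPUT_FOLDER = "results"
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# Steps 1-2: reset the server-side token counter and record the starting count
requests.post(f"{BASE_URL}/reset_token_count")
with open(os.path.join(OUTPUT_FOLDER, "token1.json"), "w") as f:
    json.dump(requests.get(f"{BASE_URL}/token_count").json(), f)

# Steps 3-4: run the add and search experiments
subprocess.run(["python3", "run_experiments.py", "--method", "add",
                "--output_folder", OUTPUT_FOLDER], check=True)
subprocess.run(["python3", "run_experiments.py", "--method", "search",
                "--output_folder", OUTPUT_FOLDER, "--top_k", "30"], check=True)

# Step 5: record the final token count; the difference between the two
# files is the total token consumption of the run
with open(os.path.join(OUTPUT_FOLDER, "token2.json"), "w") as f:
    json.dump(requests.get(f"{BASE_URL}/token_count").json(), f)
```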
#### Running Individual Test Methods
You can also run individual test methods manually:
**Memory Addition Test:**

```bash
cd benchmark/lomoco
python3 run_experiments.py --method add --output_folder results
```

**Memory Search Test:**

```bash
cd benchmark/lomoco
python3 run_experiments.py --method search --output_folder results --top_k 30
```
**Available Options** (a hypothetical `argparse` sketch follows this list):

- `--method`: Test method - `add` or `search` (default: `add`)
- `--chunk_size`: Chunk size for processing (default: 1000)
- `--top_k`: Number of top memories to retrieve (default: 30)
- `--filter_memories`: Enable memory filtering
- `--is_graph`: Use graph-based search
- `--num_chunks`: Number of chunks to process (default: 1)
- `--output_folder`: Output directory for results (default: `results/`)
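To make the option semantics concrete, here is a hypothetical reconstruction of the CLI surface described above. The real parser lives in `run_experiments.py` and may differ in details (for instance, whether the boolean options are flags or take values):

```python
import argparse

# Hypothetical parser mirroring the documented options; defaults are taken
# from the list above. Not the actual code from run_experiments.py.
parser = argparse.ArgumentParser(description="LOCOMO benchmark experiments")
parser.add_argument("--method", choices=["add", "search"], default="add")
parser.add_argument("--chunk_size", type=int, default=1000)
parser.add_argument("--top_k", type=int, default=30)
parser.add_argument("--filter_memories", action="store_true")
parser.add_argument("--is_graph", action="store_true")
parser.add_argument("--num_chunks", type=int, default=1)
parser.add_argument("--output_folder", default="results/")

args = parser.parse_args()
print(args)
```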
### Understanding Test Results
After running the tests, results are stored in the output folder (default: `results/`):
#### Output Files
- `results.json`: Detailed results for each question and conversation
  - Contains all question-answer pairs with retrieved memories
  - Includes timing information for each operation
- `evaluation_metrics.json`: Computed evaluation metrics
  - BLEU scores
  - F1 scores
  - LLM judge scores
- `evaluation.txt`: Human-readable evaluation summary
  - Total server execution time
  - Total requests processed
  - Average request time
  - P95 latency (95th percentile)
  - Token consumption statistics
- `token1.json` and `token2.json`: Token counts before and after the tests
  - Used to calculate total token consumption
### Evaluation Metrics
The benchmark evaluates performance using multiple metrics (a minimal computation sketch follows this list):

- **BLEU Score**: Measures similarity between the model response and the ground truth
  - Range: 0.0 to 1.0 (higher is better)
  - Based on n-gram overlap
- **F1 Score**: Harmonic mean of precision and recall
  - Range: 0.0 to 1.0 (higher is better)
  - Measures answer quality
- **LLM Score**: Binary score (0 or 1) from LLM judge evaluation
  - 1 = correct answer, 0 = incorrect answer
  - Determined by an LLM judge
- **Token Consumption**: Number of tokens used for answer generation
  - Includes both prompt and completion tokens
  - Tracked before and after the tests
- **Latency Metrics**:
  - Average request time: mean time per request
  - P95 latency: 95th percentile request time
  - Total execution time: total time for all operations
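To make these definitions concrete, here is a self-contained sketch of how BLEU, token-level F1, and P95 latency can be computed. It illustrates the formulas above rather than reproducing the benchmark's actual evaluation code; it assumes `nltk` is installed and uses simple whitespace tokenization:

```python
import math
from collections import Counter

from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu


def bleu(prediction: str, ground_truth: str) -> float:
    """BLEU from n-gram overlap, smoothed so short answers don't collapse to 0."""
    reference = [ground_truth.lower().split()]
    hypothesis = prediction.lower().split()
    return sentence_bleu(reference, hypothesis,
                         smoothing_function=SmoothingFunction().method1)


def f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, truth = prediction.lower().split(), ground_truth.lower().split()
    overlap = sum((Counter(pred) & Counter(truth)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def p95(latencies: list[float]) -> float:
    """95th percentile request time (nearest-rank method)."""
    ordered = sorted(latencies)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]


print(round(f1("tea over coffee", "the user prefers tea over coffee"), 3))  # 0.667
```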
## Troubleshooting
### Server Issues
"OPENAI_API_KEY environment variable is required"
- Solution: Create a
.envfile inbenchmark/server/directory - Verify that
OPENAI_API_KEYis set in the.envfile - Check that the file is being loaded (server logs will show the path)
**Database connection errors**

- Solution:
  - Check your database configuration in `.env`
  - Ensure your database server is running
  - Verify connection credentials (host, port, user, password)
  - Test database connectivity separately
**Port already in use**

- Solution:
  - Change the port: `uvicorn benchmark.server.main:app --port 8001`
  - Or find and kill the process using the port: `lsof -ti:8000 | xargs kill -9`
**OpenAI API 404 Not Found error**

- Solution:
  - Check that your `OPENAI_BASE_URL` is correct, for example: `https://api.openai.com/v1`
**OpenAI API 429 rate limiting**

- Solution:
  - Reduce concurrency: adjust `max_workers` in `methods/add.py` if needed
  - Use more API keys (a multi-API-key proxy solution will be published in the future)
**Module not found errors**

- Solution:
  - Install dependencies: `pip install -e .`
  - Verify you're running from the project root directory
  - Check the Python path and virtual environment activation
### Load Testing Issues
"api_base_url is not set"
- Solution:
- Create
.envfile inbenchmark/lomoco/directory - Verify that
API_BASE_URLis set correctly - Ensure the URL matches your running server address
- Create
**Connection refused errors**

- Solution:
  - Verify that the benchmark server is running
  - Check that `API_BASE_URL` in `.env` matches the server URL
  - Ensure the server is accessible from the test machine
"model is not set" or "openai_api_key is not set"
- Solution:
- Check that
MODELandOPENAI_API_KEYare set inbenchmark/lomoco/.env - Verify the API key is valid
- Ensure no extra quotes or spaces in the values
- Check that
**Import errors**

- Solution:
  - Install all dependencies: `pip install -r benchmark/lomoco/requirements.txt`
  - Ensure you're running from the correct directory
  - Check your Python version (3.10+ is required)
**Slow performance**

- Solution:
  - The tests use multi-threading (32 workers by default)
  - Reduce concurrency: adjust `max_workers` in `methods/add.py` if needed
  - Consider running the tests on a machine with more resources
  - Check server performance and the database connection pool size
**Missing NLTK data or SentenceTransformer model**

- Solution:
  - If NLTK data is not found, download it:

    ```python
    import nltk

    nltk.download("punkt", quiet=True)
    nltk.download("wordnet", quiet=True)
    ```

  - If the SentenceTransformer model is not found, download it by initializing it once:

    ```python
    from sentence_transformers import SentenceTransformer

    # Initialize the SentenceTransformer model (it will be cached and reused)
    sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
    ```
## License
See the main project LICENSE file for details.