asyncmy Benchmark Suite

Comprehensive performance benchmarks for asyncmy, comparing against mysqlclient, pymysql, and aiomysql.

Test Environment

  • CPU: Apple Mac Studio (M4 Max)
  • Memory: 64GB
  • Python: 3.14
  • MySQL: 9.6.0
  • Test Data: 100,000 rows with realistic schema

Performance Summary

| Test | Winner | asyncmy Rank | Notes |
|------|--------|--------------|-------|
| Large Result Set (50k rows) | mysqlclient | #2/4 | ~0.090s - 2x faster than aiomysql |
| Concurrent Queries (50 queries) | Variable | #1-2/2 | Very close to aiomysql |
| Connection Pool (2k queries) | 🏆 asyncmy | #1/2 | Consistently 22-28% faster |
| Batch Insert (10k rows) | Variable | #1-4/4 | Results vary significantly |

Key Insights

  • Connection Pool: asyncmy consistently shows 22-28% better throughput than aiomysql
  • Large Result Set: asyncmy is 2x faster than pymysql/aiomysql, close to mysqlclient
  • Concurrent Queries: asyncmy and aiomysql are comparable (within margin of error)
  • Batch Insert: Results vary by run; all libraries perform similarly

Recent Optimizations (v0.2.12)

Five major performance improvements were implemented:

  1. Buffer Management - Eliminated redundant memory copies in packet reading (fast path optimization)
  2. DateTime Parsing - Replaced regex with fast string slicing for datetime/date/time conversions
  3. Row Parsing - Pre-allocated lists and C-level indexing replacing Python list.append()
  4. Protocol Parsing - Inlined read_length_coded_string with fast path for common cases (length < 251)
  5. Batch Operations - Pre-process values and use list accumulation + join for executemany
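To make optimization #2 concrete, here is a minimal pure-Python sketch of parsing MySQL's fixed-layout `YYYY-MM-DD HH:MM:SS[.ffffff]` text with offset slicing instead of a regex. This is an illustration of the technique, not asyncmy's actual (Cython-level) implementation:

```python
from datetime import datetime

def parse_datetime(value: str) -> datetime:
    """Parse a MySQL DATETIME string using fixed-offset slicing.

    The wire format is fixed-width up to the optional fraction, so
    each field lives at a known offset - no pattern matching needed.
    """
    # Optional fractional seconds after position 19, padded to microseconds
    microsecond = int(value[20:].ljust(6, "0")) if len(value) > 19 else 0
    return datetime(
        int(value[0:4]),    # year
        int(value[5:7]),    # month
        int(value[8:10]),   # day
        int(value[11:13]),  # hour
        int(value[14:16]),  # minute
        int(value[17:19]),  # second
        microsecond,
    )
```

Because every `int()` call works on a short slice at a constant offset, this avoids the per-call overhead of compiling/matching a regex and extracting groups.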

These optimizations resulted in:

  • asyncmy consistently leads in connection pool benchmarks (22-28% faster)
  • Large result set parsing improved - now 2x faster than aiomysql
  • Close to mysqlclient performance in data-intensive workloads

Test Scenarios

1. Large Result Set (large_resultset.py)

Tests the efficiency of fetching and processing large datasets in a single query.

What it measures:

  • Packet reading efficiency (optimized buffer management)
  • Data parsing speed (optimized datetime conversion)
  • Memory efficiency

Test: Fetch 50,000 rows with all column types (int, varchar, datetime, decimal, text)

Results (typical):

1. mysqlclient      0.085s  (1.00x vs best)
2. asyncmy          0.090s  (0.94x vs best)
3. pymysql          0.165s  (0.52x vs best)
4. aiomysql         0.170s  (0.50x vs best)
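The parsing speedup above comes partly from optimization #3 (pre-allocated rows instead of `list.append()`). A pure-Python sketch of that idea follows; asyncmy applies it with C-level indexing via Cython, so the real gain is larger than this illustration suggests:

```python
def parse_rows_appending(raw_rows, converters):
    """Baseline shape: grow each row with list.append()."""
    result = []
    for raw in raw_rows:
        row = []
        for value, conv in zip(raw, converters):
            row.append(conv(value))
        result.append(tuple(row))
    return result

def parse_rows_preallocated(raw_rows, converters):
    """Optimized shape: allocate each row once, then fill by index."""
    n_cols = len(converters)
    result = [None] * len(raw_rows)
    for i, raw in enumerate(raw_rows):
        row = [None] * n_cols
        for j in range(n_cols):
            row[j] = converters[j](raw[j])
        result[i] = tuple(row)
    return result
```

Both functions produce identical output; the second avoids repeated list growth and attribute lookups on `append` in the hot loop.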

2. Concurrent Queries (concurrent.py)

Tests async libraries' ability to run multiple queries concurrently.

What it measures:

  • Connection handling under concurrent load
  • Async I/O efficiency
  • Query interleaving capabilities

Test: Execute 50 concurrent SELECT queries (each query gets its own connection)

Results (typical):

1. asyncmy/aiomysql  ~0.015s - Results vary, both libraries perform similarly

Note: Synchronous libraries (mysqlclient, pymysql) cannot efficiently handle this scenario without threads. The difference between asyncmy and aiomysql is within margin of error.
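The concurrency pattern this benchmark exercises can be sketched without a database: launch all queries at once with `asyncio.gather`, so total wall time approaches one round trip instead of fifty. Here `asyncio.sleep` stands in for the "open a connection and run one SELECT" step; the real benchmark issues actual queries via the driver:

```python
import asyncio

async def run_query(i: int) -> int:
    # Stand-in for opening a connection and executing one SELECT;
    # the sleep simulates the network round trip.
    await asyncio.sleep(0.01)
    return i

async def main(n: int = 50) -> list:
    # All n "queries" are in flight concurrently, so the event loop
    # interleaves their I/O waits.
    return await asyncio.gather(*(run_query(i) for i in range(n)))

results = asyncio.run(main())
```

`gather` preserves submission order in its result list, which makes it easy to pair each result with its query.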

3. Connection Pool (pool.py)

Tests connection pool performance with concurrent query load.

What it measures:

  • Connection acquisition/release efficiency
  • Connection reuse patterns
  • Pool management under concurrent access

Test: Execute 2,000 queries using a connection pool (5-20 connections)

Results (typical):

1. asyncmy          ~0.190s  (1.00x vs best) ⭐ WINNER - ~10,500 queries/sec
2. aiomysql         ~0.240s  (0.78x vs best) - ~8,300 queries/sec

Insight: asyncmy's connection pool consistently shows 22-28% better throughput than aiomysql across multiple runs.
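The acquire/release pattern a pool must make cheap can be illustrated with a toy fixed-size pool built on `asyncio.Queue`. This is deliberately simplified (no connection health checks, no min/max resizing) and is not asyncmy's implementation; it only shows the contention shape the benchmark measures - 2,000 simulated queries funneled through a handful of shared "connections":

```python
import asyncio

class MiniPool:
    """Toy fixed-size pool: acquire blocks until a connection is free."""

    def __init__(self, connections):
        self._free = asyncio.Queue()
        for conn in connections:
            self._free.put_nowait(conn)

    async def acquire(self):
        return await self._free.get()

    def release(self, conn):
        self._free.put_nowait(conn)

async def worker(pool, n_queries, counter):
    for _ in range(n_queries):
        conn = await pool.acquire()
        try:
            await asyncio.sleep(0)  # stand-in for cur.execute(...)
            counter[0] += 1
        finally:
            pool.release(conn)     # always return the connection

async def main():
    pool = MiniPool([object() for _ in range(5)])  # 5 "connections"
    counter = [0]
    # 50 workers x 40 queries each = 2,000 queries through the pool
    await asyncio.gather(*(worker(pool, 40, counter) for _ in range(50)))
    return counter[0]

total = asyncio.run(main())
```

Every query pays the acquire/release round trip, which is why pool overhead dominates this benchmark and why asyncmy's leaner pool shows up as 22-28% higher throughput.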

4. Batch Insert (batch_insert.py)

Tests bulk insert performance using executemany().

What it measures:

  • Batch operation efficiency
  • SQL statement building
  • Network/protocol overhead

Test: Insert 10,000 rows using executemany()

Results (variable):

Results vary significantly between runs. All libraries perform similarly,
with rankings changing between mysqlclient, pymysql, asyncmy, and aiomysql.
Typical throughput: 80,000-110,000 rows/sec for all libraries.
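Optimization #5 ("list accumulation + join for executemany") refers to building one multi-row INSERT by collecting formatted value groups in a list and joining once, rather than repeated string concatenation. A simplified sketch of that construction follows; the value formatting here is intentionally naive, since real drivers perform proper per-type escaping:

```python
def build_bulk_insert(table, columns, rows):
    """Build a single multi-row INSERT statement.

    Value groups are accumulated in a list and joined once at the end,
    which is O(n) overall, unlike repeated += on a string.
    NOTE: fmt() is a toy formatter for illustration only - it is not
    safe against SQL injection for untrusted input.
    """
    def fmt(v):
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    groups = []
    for row in rows:
        groups.append("(" + ", ".join(fmt(v) for v in row) + ")")
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(groups)
    )
```

Sending one statement with 10,000 value groups amortizes protocol overhead across the whole batch, which is why all four libraries land in the same 80,000-110,000 rows/sec band.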

Running the Benchmarks

Prerequisites

  1. MySQL server running and accessible
  2. Python ≥ 3.9 with required packages installed

Setup

# Set database connection parameters
export MYSQL_HOST=localhost
export MYSQL_USER=root
export MYSQL_PASS=password

# Optional: specify port (default: 3306)
export MYSQL_PORT=3306

Run All Benchmarks

# Run complete benchmark suite
python -m benchmark.run_all

This will:

  1. Create test database and populate with 100,000 rows
  2. Run all 4 benchmark tests
  3. Generate a summary report
  4. Clean up test data

Run Individual Tests

# Test large result set performance
python -m benchmark.large_resultset

# Test concurrent query performance
python -m benchmark.concurrent

# Test connection pool performance
python -m benchmark.pool

# Test batch insert performance
python -m benchmark.batch_insert

Test Data Schema

The benchmark uses a realistic table schema:

CREATE TABLE benchmark_data (
    id INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    score DECIMAL(10,2) NOT NULL,
    is_active TINYINT NOT NULL,
    data TEXT,
    PRIMARY KEY (id),
    INDEX idx_user_id (user_id),
    INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
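One plausible way to generate rows matching this schema (every column except the AUTO_INCREMENT `id`) is sketched below. The exact generator the suite uses may differ; this just shows tuples shaped for `executemany()` against `benchmark_data`:

```python
import random
from datetime import datetime, timedelta
from decimal import Decimal

def make_row(i: int) -> tuple:
    """One insertable tuple for benchmark_data (id is auto-generated)."""
    created = datetime(2024, 1, 1) + timedelta(seconds=i)
    return (
        i % 1000,                                      # user_id
        f"user_{i:06d}",                               # username VARCHAR(50)
        f"user_{i:06d}@example.com",                   # email VARCHAR(100)
        created,                                       # created_at
        created,                                       # updated_at
        Decimal(random.randrange(0, 1_000_000)) / 100, # score DECIMAL(10,2)
        i % 2,                                         # is_active
        "payload " * 10,                               # data TEXT
    )

rows = [make_row(i) for i in range(100)]
```

Mixing int, varchar, datetime, decimal, and text columns per row is what makes the large-result-set and batch-insert tests exercise every converter path.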

Benchmark Configuration

Default settings (can be modified in benchmark/__init__.py):

ROW_COUNT = 100000           # Total test rows
BATCH_SIZE = 10000          # Batch operation size
CONCURRENT_COUNT = 50       # Number of concurrent operations

Performance Tips

Based on benchmark results, asyncmy performs best when:

  1. Using connection pools - asyncmy's pool implementation is highly optimized (22-28% faster than aiomysql)
  2. Fetching large result sets - Use asyncmy for queries returning 1,000+ rows (2x faster than aiomysql)
  3. Processing datetime-heavy data - Optimized datetime parsing provides significant gains
  4. Handling concurrent workloads - Async architecture allows efficient I/O interleaving

Comparison with Other Libraries

vs mysqlclient (sync)

  • Competitive for large result sets (only 5-10% slower)
  • Better for concurrent workloads (asyncmy uses async I/O)
  • Similar for batch inserts (results vary)

vs pymysql (sync)

  • 2x faster for large result sets
  • Much better for concurrent workloads
  • Better connection pool (asyncmy has native pool)

vs aiomysql (async)

  • 2x faster for large result sets (optimized parsing)
  • 22-28% faster connection pool throughput (consistent)
  • Comparable for concurrent queries (within margin of error)
  • Similar for batch inserts (results vary)

Contributing

To add new benchmarks:

  1. Create a new file: benchmark/your_benchmark.py
  2. Implement test functions following the existing pattern
  3. Add a run_benchmark() function that returns sorted results
  4. Import and call your test in benchmark/run_all.py

Example template:

"""
Benchmark: Your Test Name

Description of what this test measures.
"""
import asyncio
import time
from benchmark import connection_kwargs

async def test_asyncmy():
    # Your test implementation
    start = time.perf_counter()  # monotonic clock, preferred for timing
    # ... test code ...
    return time.perf_counter() - start

def run_benchmark():
    results = {}

    # Run tests and collect results (asyncio.run creates and
    # closes the event loop for us)
    results['asyncmy'] = asyncio.run(test_asyncmy())

    # Return results sorted fastest-first
    return sorted(results.items(), key=lambda x: x[1])

License

Apache-2.0