
SWIG Performance Issue: .NET String Marshaling Creates 2.5x Performance Gap vs Java JNI #524

@justadreamer


Problem Statement

We've identified and profiled a significant performance bottleneck in SWIG-generated .NET bindings compared to Java bindings for the same native C++ library. The .NET implementation is 2.5x slower than Java despite both using identical native code.

Performance Comparison

| Implementation | Detections/Second | Performance Gap |
| --- | --- | --- |
| Java (JNI) | 540,541 | Baseline |
| .NET (P/Invoke) | 212,745 | 2.5x slower |
| .NET (with UTF-8 workaround) | ~357,000 | 1.5x slower |

Profiling Results

We instrumented the .NET code to measure time spent in different operations during device detection:

Without UTF-8 Preprocessing:

Total detections: 5,000
Total time: 100.0 ms

Time breakdown:
  Process():        66.0 ms (66.0%) ← SWIG-generated native call
  Other operations: 34.0 ms (34.0%)

Per-detection: 0.0200 ms/detection

With UTF-8 Preprocessing Workaround:

Total detections: 5,000  
Total time: 44.0 ms

Time breakdown:
  Process():        26.0 ms (59.1%) ← Same native call, 2.5x faster
  Other operations: 18.0 ms (40.9%)

Per-detection: 0.0088 ms/detection (2.27x improvement)

Root Cause Analysis

String Marshaling Overhead

66% of .NET detection time is spent in the SWIG-generated Process() call due to string marshaling.

The marshaling overhead occurs at the P/Invoke boundary when calling native methods like:

[DllImport("FiftyOne.DeviceDetection.Hash.Engine.OnPremise.Native.dll")]
public static extern IntPtr EngineHashSwig_process__SWIG_1(HandleRef jarg1, string jarg2);

Every string parameter undergoes automatic marshaling by the .NET runtime:

.NET (Generated P/Invoke):

[DllImport("Native.dll")]
public static extern void MapStringStringSwig_Add(HandleRef jarg1, string jarg2, string jarg3);
  • UTF-16 to ASCII conversion happens for every string parameter (required by native code)
  • This is fundamentally more expensive than Java's UTF-8 to ASCII conversion
  • .NET runtime performs this conversion at every P/Invoke call
  • The same strings are converted repeatedly in high-throughput scenarios
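To make the hidden cost concrete, the work the runtime performs for each `string` parameter is roughly equivalent to the following manual marshaling (an illustrative sketch, not the actual runtime implementation; the point is that a native buffer is allocated, filled, and freed on every call):

```csharp
using System;
using System.Runtime.InteropServices;

// Roughly what the runtime does per string argument at the P/Invoke
// boundary: allocate a native buffer and convert UTF-16 to ANSI/ASCII.
IntPtr native = Marshal.StringToHGlobalAnsi("user-agent");
try
{
    // ... the native function would receive this pointer ...
}
finally
{
    // The buffer is released after the call; the next call with the
    // same string repeats the allocation and conversion from scratch.
    Marshal.FreeHGlobal(native);
}
```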

Java (Generated JNI):

public final static native void Evidence_AddFromBytes(long jarg1, EvidenceBaseSwig jarg1_, byte[] jarg2, byte[] jarg4);
  • Strings are converted to UTF-8 byte arrays in Java code before the JNI call (ASCII is a subset of UTF-8)
  • The encoding cost is paid once, in managed code, rather than at every native-call boundary
  • Byte arrays are passed directly to native code with minimal marshaling cost
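The one-time conversion on the Java side can be sketched as follows; `Swig.asBytes` in the snippet below is assumed to behave like this hypothetical helper, which additionally caches the encoded bytes so hot evidence keys are encoded only once:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: encode a String to UTF-8 bytes once and cache
// the result, so repeated evidence keys are not re-encoded on every
// detection. (Illustrative; not the actual Swig.asBytes implementation.)
class EvidenceBytes {
    private static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    static byte[] asBytes(String s) {
        return CACHE.computeIfAbsent(s,
                k -> k.getBytes(StandardCharsets.UTF_8));
    }
}
```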

Evidence Processing Comparison

.NET Approach:

// String passed directly - UTF-16 to ASCII conversion overhead
relevantEvidence.Add(new KeyValuePair<string, string>(
    evidenceItem.Key,           // UTF-16 → ASCII conversion at P/Invoke
    evidenceItem.Value.ToString())); // UTF-16 → ASCII conversion at P/Invoke

Java Approach:

// Efficient byte array conversion - UTF-8 to ASCII (minimal overhead)
relevantEvidence.addFromBytes(
    Swig.asBytes(evidenceItem.getKey()),     // UTF-8 → ASCII (straightforward)
    Swig.asBytes(evidenceItem.getValue().toString()));

Our Current Workaround

We implemented a proof-of-concept UTF-8 preprocessing approach in our performance benchmarks to demonstrate the impact of string marshaling:

// Pre-encode to UTF-8 bytes and back to string
var utf8Bytes = Encoding.UTF8.GetBytes(strValue);
utf8Evidence[kvp.Key] = Encoding.UTF8.GetString(utf8Bytes);

This preprocessing was implemented experimentally to test whether string encoding affects performance. Under controlled conditions with 5,000 detections:

  • Process() time reduced from 66.0ms to 26.0ms (2.5x faster)
  • Overall performance improved by 2.27x (0.0200 to 0.0088 ms/detection)
  • Process() percentage dropped from 66% to 59.1% of total execution time

While this preprocessing does show measurable improvement, it's not a practical solution because:

  • .NET strings remain UTF-16 internally regardless of preprocessing
  • Users would need to preprocess all their evidence data
  • The improvement suggests SWIG's string marshaling could be optimized
  • Java achieves even better performance without any preprocessing

The key finding is that 66% of execution time is spent in the SWIG-generated Process() call. Further analysis comparing with Java's performance reveals:

  • ~86% of the Process() time is string marshaling overhead (0.01135 ms out of 0.0132 ms per detection)
  • Only ~14% is actual native code execution (0.00185 ms out of 0.0132 ms per detection)
  • Java's efficient byte[] marshaling achieves near-native performance (0.00185 ms/detection)
  • .NET's string marshaling adds 6x more overhead than the actual native computation

This confirms that SWIG's .NET string marshaling is the primary bottleneck, not the native library performance.

Potential Solutions

Since the native code requires ASCII strings, some conversion is unavoidable. However, SWIG could generate more efficient bindings in several ways:

  1. Adopt Java's approach for .NET: Generate methods that accept byte[] parameters

    public static extern void Evidence_AddFromBytes(HandleRef jarg1, byte[] key, byte[] value);

    This would allow developers to control when conversion happens and cache results.

  2. Generate overloads: Provide both string and byte[] versions

    public void Add(string key, string value); // Convenience method
    public void AddBytes(byte[] key, byte[] value); // Performance method
  3. Use unsafe code: Generate methods that accept pinned byte arrays or pointers to avoid repeated conversions

  4. Batch operations: Generate methods that accept arrays of key-value pairs to amortize marshaling overhead
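For option 1, a starting point might be custom C# typemaps along these lines (an untested sketch against SWIG's documented `ctype`/`imtype`/`cstype`/`csin` typemap slots; the caller becomes responsible for NUL-terminating the byte arrays, and .NET pins a `byte[]` argument for the duration of the P/Invoke call, so the native side sees the raw bytes without a copy):

```
// sketch.i -- hypothetical typemaps, not verified against SWIG 4.0.2
%typemap(ctype)  const char * "char *"
%typemap(imtype) const char * "byte[]"   // intermediary P/Invoke signature
%typemap(cstype) const char * "byte[]"   // public C# API signature
%typemap(csin)   const char * "$csinput" // pass the array straight through
%typemap(in)     const char * %{ $1 = (char *)$input; %}
```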

Questions for SWIG Community

  1. Why do Java and .NET use different approaches? Java uses byte[] parameters while .NET uses string parameters for the same interface. Is this intentional?

  2. String Marshaling Optimization: Are there SWIG directives, typemaps, or configuration options that can make .NET string marshaling as efficient as Java's byte array approach?

  3. Custom Typemaps: Can we create custom typemaps for .NET that use byte[] parameters similar to Java's approach?

  4. UTF-8 Native Support: Are there plans to optimize .NET string marshaling in SWIG to avoid the per-call UTF-16 conversion overhead?

  5. Best Practices: What are the recommended approaches for high-performance string handling in SWIG .NET bindings when processing thousands of strings per second?

SWIG Configuration

  • Version: 4.0.2
  • Languages: C# (.NET 8.0) and Java (OpenJDK 21)
  • Native Library: C++ with extensive string processing
  • Use Case: High-throughput device detection (target: 500K+ operations/sec)
  • Evidence: Typically 10-20 string key-value pairs per detection

SWIG Interface Files

.NET SWIG Interface:

Java SWIG Interface:

Reproducible Test Case

Benchmark Code Locations

Required Data Files

Both benchmarks require:

  • TAC-HashV41.hash (device detection data file - or use 51Degrees-LiteV4.1.hash from Git LFS)
  • 20000 Evidence Records.yml (test evidence file)

These can be obtained from: https://github.com/51Degrees/device-detection-data
Note: The repository uses Git LFS, so make sure to run git lfs pull after cloning

Core Library Repositories

Expected Outcome

We're seeking guidance on achieving Java-level performance in .NET bindings through SWIG configuration rather than application-level workarounds. The 2.5x performance gap makes .NET unsuitable for high-throughput scenarios where Java excels.

Any insights on optimizing SWIG-generated .NET string marshaling would be greatly appreciated! We're happy to test patches or provide additional profiling data as needed.
