
SWIG Performance Issue: .NET String Marshaling Creates 2.5x Performance Gap vs Java JNI #524

@justadreamer


Problem Statement

We've identified and profiled a significant performance bottleneck in SWIG-generated .NET bindings compared to Java bindings for the same native C++ library. The .NET implementation is 2.5x slower than Java despite both using identical native code.

Performance Comparison

| Implementation | Detections/Second | Performance Gap |
| --- | --- | --- |
| Java (JNI) | 540,541 | Baseline |
| .NET (P/Invoke) | 212,745 | 2.5x slower |
| .NET (with UTF-8 workaround) | ~357,000 | 1.5x slower |

Profiling Results

We instrumented the .NET code to measure time spent in different operations during device detection:

Without UTF-8 Preprocessing:

Total detections: 5,000
Total time: 100.0 ms

Time breakdown:
  Process():        66.0 ms (66.0%) ← SWIG-generated native call
  Other operations: 34.0 ms (34.0%)

Per-detection: 0.0200 ms/detection

With UTF-8 Preprocessing Workaround:

Total detections: 5,000  
Total time: 44.0 ms

Time breakdown:
  Process():        26.0 ms (59.1%) ← Same native call, 2.5x faster
  Other operations: 18.0 ms (40.9%)

Per-detection: 0.0088 ms/detection (2.27x improvement)

Root Cause Analysis

String Marshaling Overhead

66% of .NET detection time is spent in the SWIG-generated Process() call due to string marshaling.

The marshaling overhead occurs at the P/Invoke boundary when calling native methods like:

[DllImport("FiftyOne.DeviceDetection.Hash.Engine.OnPremise.Native.dll")]
public static extern IntPtr EngineHashSwig_process__SWIG_1(HandleRef jarg1, string jarg2);

Every string parameter undergoes automatic marshaling by the .NET runtime:

.NET (Generated P/Invoke):

[DllImport("Native.dll")]
public static extern void MapStringStringSwig_Add(HandleRef jarg1, string jarg2, string jarg3);
  • UTF-16 to ASCII conversion happens for every string parameter (required by native code)
  • This is fundamentally more expensive than Java's UTF-8 to ASCII conversion
  • .NET runtime performs this conversion at every P/Invoke call
  • The same strings are converted repeatedly in high-throughput scenarios
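To make the hidden cost concrete, the work the runtime performs for each `string` parameter is roughly equivalent to the following manual marshaling (an illustrative sketch, not the actual runtime implementation; the point is that a native buffer is allocated, filled, and freed on every call):

```csharp
using System;
using System.Runtime.InteropServices;

// Roughly what the runtime does per string argument at the P/Invoke
// boundary: allocate a native buffer and convert UTF-16 to ANSI/ASCII.
IntPtr native = Marshal.StringToHGlobalAnsi("user-agent");
try
{
    // ... the native function would receive this pointer ...
}
finally
{
    // The buffer is released after the call; the next call with the
    // same string repeats the allocation and conversion from scratch.
    Marshal.FreeHGlobal(native);
}
```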

Java (Generated JNI):

public final static native void Evidence_AddFromBytes(long jarg1, EvidenceBaseSwig jarg1_, byte[] jarg2, byte[] jarg4);
  • Strings are converted to UTF-8 byte arrays in Java code before the JNI call (ASCII is a subset of UTF-8)
  • The encoding cost is paid once, in managed code, rather than at every native-call boundary
  • Byte arrays are passed directly to native code with minimal marshaling cost
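The one-time conversion on the Java side can be sketched as follows; `Swig.asBytes` in the snippet below is assumed to behave like this hypothetical helper, which additionally caches the encoded bytes so hot evidence keys are encoded only once:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: encode a String to UTF-8 bytes once and cache
// the result, so repeated evidence keys are not re-encoded on every
// detection. (Illustrative; not the actual Swig.asBytes implementation.)
class EvidenceBytes {
    private static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    static byte[] asBytes(String s) {
        return CACHE.computeIfAbsent(s,
                k -> k.getBytes(StandardCharsets.UTF_8));
    }
}
```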

Evidence Processing Comparison

.NET Approach:

// String passed directly - UTF-16 to ASCII conversion overhead
relevantEvidence.Add(new KeyValuePair<string, string>(
    evidenceItem.Key,           // UTF-16 → ASCII conversion at P/Invoke
    evidenceItem.Value.ToString())); // UTF-16 → ASCII conversion at P/Invoke

Java Approach:

// Efficient byte array conversion - UTF-8 to ASCII (minimal overhead)
relevantEvidence.addFromBytes(
    Swig.asBytes(evidenceItem.getKey()),     // UTF-8 → ASCII (straightforward)
    Swig.asBytes(evidenceItem.getValue().toString()));

Our Current Workaround

We implemented a proof-of-concept UTF-8 preprocessing approach in our performance benchmarks to demonstrate the impact of string marshaling:

// Pre-encode to UTF-8 bytes and back to string
var utf8Bytes = Encoding.UTF8.GetBytes(strValue);
utf8Evidence[kvp.Key] = Encoding.UTF8.GetString(utf8Bytes);

This preprocessing was implemented experimentally to test whether string encoding affects performance. Under controlled conditions with 5,000 detections:

  • Process() time reduced from 66.0ms to 26.0ms (2.5x faster)
  • Overall performance improved by 2.27x (0.0200 to 0.0088 ms/detection)
  • Process() percentage dropped from 66% to 59.1% of total execution time

While this preprocessing does show measurable improvement, it's not a practical solution because:

  • .NET strings remain UTF-16 internally regardless of preprocessing
  • Users would need to preprocess all their evidence data
  • The improvement suggests SWIG's string marshaling could be optimized
  • Java achieves even better performance without any preprocessing

The key finding is that 66% of execution time is spent in the SWIG-generated Process() call. Further analysis comparing with Java's performance reveals:

  • ~86% of the Process() time is string marshaling overhead (0.01135 ms out of 0.0132 ms per detection)
  • Only ~14% is actual native code execution (0.00185 ms out of 0.0132 ms per detection)
  • Java's efficient byte[] marshaling achieves near-native performance (0.00185 ms/detection)
  • .NET's string marshaling adds 6x more overhead than the actual native computation

This confirms that SWIG's .NET string marshaling is the primary bottleneck, not the native library performance.

Potential Solutions

Since the native code requires ASCII strings, some conversion is unavoidable. However, SWIG could generate more efficient bindings in several ways:

  1. Adopt Java's approach for .NET: Generate methods that accept byte[] parameters

    public static extern void Evidence_AddFromBytes(HandleRef jarg1, byte[] key, byte[] value);

    This would allow developers to control when conversion happens and cache results.

  2. Generate overloads: Provide both string and byte[] versions

    public void Add(string key, string value); // Convenience method
    public void AddBytes(byte[] key, byte[] value); // Performance method
  3. Use unsafe code: Generate methods that accept pinned byte arrays or pointers to avoid repeated conversions

  4. Batch operations: Generate methods that accept arrays of key-value pairs to amortize marshaling overhead
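For option 1, a starting point might be custom C# typemaps along these lines (an untested sketch against SWIG's documented `ctype`/`imtype`/`cstype`/`csin` typemap slots; the caller becomes responsible for NUL-terminating the byte arrays, and .NET pins a `byte[]` argument for the duration of the P/Invoke call, so the native side sees the raw bytes without a copy):

```
// sketch.i -- hypothetical typemaps, not verified against SWIG 4.0.2
%typemap(ctype)  const char * "char *"
%typemap(imtype) const char * "byte[]"   // intermediary P/Invoke signature
%typemap(cstype) const char * "byte[]"   // public C# API signature
%typemap(csin)   const char * "$csinput" // pass the array straight through
%typemap(in)     const char * %{ $1 = (char *)$input; %}
```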

Questions for SWIG Community

  1. Why do Java and .NET use different approaches? Java uses byte[] parameters while .NET uses string parameters for the same interface. Is this intentional?

  2. String Marshaling Optimization: Are there SWIG directives, typemaps, or configuration options that can make .NET string marshaling as efficient as Java's byte array approach?

  3. Custom Typemaps: Can we create custom typemaps for .NET that use byte[] parameters similar to Java's approach?

  4. UTF-8 Native Support: Are there plans to optimize .NET string marshaling in SWIG to avoid the per-call UTF-16 conversion overhead?

  5. Best Practices: What are the recommended approaches for high-performance string handling in SWIG .NET bindings when processing thousands of strings per second?

SWIG Configuration

  • Version: 4.0.2
  • Languages: C# (.NET 8.0) and Java (OpenJDK 21)
  • Native Library: C++ with extensive string processing
  • Use Case: High-throughput device detection (target: 500K+ operations/sec)
  • Evidence: Typically 10-20 string key-value pairs per detection

SWIG Interface Files

.NET SWIG Interface:

Java SWIG Interface:

Reproducible Test Case

Benchmark Code Locations

Required Data Files

Both benchmarks require:

  • TAC-HashV41.hash (device detection data file - or use 51Degrees-LiteV4.1.hash from Git LFS)
  • 20000 Evidence Records.yml (test evidence file)

These can be obtained from: https://github.com/51Degrees/device-detection-data
Note: The repository uses Git LFS, so make sure to run git lfs pull after cloning

Core Library Repositories

Expected Outcome

We're seeking guidance on achieving Java-level performance in .NET bindings through SWIG configuration rather than application-level workarounds. The 2.5x performance gap makes .NET unsuitable for high-throughput scenarios where Java excels.

Any insights on optimizing SWIG-generated .NET string marshaling would be greatly appreciated! We're happy to test patches or provide additional profiling data as needed.
