SWIG Performance Issue: .NET String Marshaling Creates 2.5x Performance Gap vs Java JNI
Problem Statement
We've identified and profiled a significant performance bottleneck in SWIG-generated .NET bindings compared to Java bindings for the same native C++ library. The .NET implementation is 2.5x slower than Java despite both using identical native code.
Performance Comparison
| Implementation | Detections/Second | Performance Gap |
|----------------|-------------------|-----------------|
| Java (JNI) | 540,541 | Baseline |
| .NET (P/Invoke) | 212,745 | 2.5x slower |
| .NET (with UTF-8 workaround) | ~357,000 | 1.5x slower |
Profiling Results
We instrumented the .NET code to measure time spent in different operations during device detection:
Without UTF-8 Preprocessing:

```text
Total detections: 5,000
Total time:       100.0 ms
Time breakdown:
  Process():        66.0 ms (66.0%)  ← SWIG-generated native call
  Other operations: 34.0 ms (34.0%)
Per-detection:      0.0200 ms/detection
```

With UTF-8 Preprocessing Workaround:

```text
Total detections: 5,000
Total time:       44.0 ms
Time breakdown:
  Process():        26.0 ms (59.1%)  ← Same native call, 2.5x faster
  Other operations: 18.0 ms (40.9%)
Per-detection:      0.0088 ms/detection (2.27x improvement)
```
Root Cause Analysis
String Marshaling Overhead
66% of .NET detection time is spent in the SWIG-generated Process() call due to string marshaling.
The marshaling overhead occurs at the P/Invoke boundary when calling native methods like:
```csharp
[DllImport("FiftyOne.DeviceDetection.Hash.Engine.OnPremise.Native.dll")]
public static extern IntPtr EngineHashSwig_process__SWIG_1(HandleRef jarg1, string jarg2);
```
Every string parameter undergoes automatic marshaling by the .NET runtime:
.NET (Generated P/Invoke):

```csharp
[DllImport("Native.dll")]
public static extern void MapStringStringSwig_Add(HandleRef jarg1, string jarg2, string jarg3);
```

- UTF-16 to ASCII conversion happens for every string parameter (ASCII is what the native code requires)
- The .NET runtime performs this conversion at every P/Invoke call, allocating a temporary native buffer each time
- The same strings are converted repeatedly in high-throughput scenarios
- The caller has no way to pre-encode or cache the converted strings, unlike Java's `byte[]` route
Java (Generated JNI):

```java
public final static native void Evidence_AddFromBytes(long jarg1, EvidenceBaseSwig jarg1_, byte[] jarg2, byte[] jarg4);
```

- The Java bindings encode each string to a UTF-8 `byte[]` in managed code via `Swig.asBytes` (ASCII is a subset of UTF-8, so the native code can consume the bytes directly)
- The encoding happens before the native call, so it can be done once and the bytes reused across calls
- Byte arrays cross the JNI boundary with minimal marshaling cost, with no per-call charset conversion
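For comparison, modern .NET can marshal a string directly to UTF-8 at the P/Invoke boundary with `UnmanagedType.LPUTF8Str`. This is not what SWIG 4.0.2 generates; the declaration below is a hypothetical variant of the generated binding. It still performs a conversion on every call, it only avoids the ANSI code-page path, so whether it narrows the gap would need measuring:

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeMethodsUtf8
{
    // Hypothetical variant of the SWIG-generated declaration: the runtime
    // encodes the UTF-16 string to UTF-8 bytes (ASCII input round-trips
    // unchanged) instead of converting through the ANSI code page.
    [DllImport("Native.dll")]
    public static extern void MapStringStringSwig_Add(
        HandleRef jarg1,
        [MarshalAs(UnmanagedType.LPUTF8Str)] string jarg2,
        [MarshalAs(UnmanagedType.LPUTF8Str)] string jarg3);
}
```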
Evidence Processing Comparison
.NET Approach:

```csharp
// String passed directly - UTF-16 to ASCII conversion overhead
relevantEvidence.Add(new KeyValuePair<string, string>(
    evidenceItem.Key,                 // UTF-16 → ASCII conversion at P/Invoke
    evidenceItem.Value.ToString()));  // UTF-16 → ASCII conversion at P/Invoke
```
Java Approach:

```java
// Efficient byte array conversion - UTF-8 to ASCII (minimal overhead)
relevantEvidence.addFromBytes(
    Swig.asBytes(evidenceItem.getKey()),               // UTF-8 → ASCII (straightforward)
    Swig.asBytes(evidenceItem.getValue().toString()));
```
Our Current Workaround
We implemented a proof-of-concept UTF-8 preprocessing approach in our performance benchmarks to demonstrate the impact of string marshaling:
```csharp
// Pre-encode to UTF-8 bytes and back to string
var utf8Bytes = Encoding.UTF8.GetBytes(strValue);
utf8Evidence[kvp.Key] = Encoding.UTF8.GetString(utf8Bytes);
```
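Expanded into the loop our benchmark runs before calling Process(), the workaround looks roughly like this (the surrounding method and the evidence collection type are ours, shown only to make the snippet self-contained):

```csharp
using System.Collections.Generic;
using System.Text;

static class EvidencePreprocessing
{
    // Round-trip every evidence value through UTF-8 before it reaches the
    // engine. The strings are still UTF-16 when they come back; this only
    // demonstrates that the encoding path affects Process() timing.
    public static Dictionary<string, string> ToUtf8Evidence(
        IEnumerable<KeyValuePair<string, object>> evidence)
    {
        var utf8Evidence = new Dictionary<string, string>();
        foreach (var kvp in evidence)
        {
            var strValue = kvp.Value.ToString();
            var utf8Bytes = Encoding.UTF8.GetBytes(strValue);
            utf8Evidence[kvp.Key] = Encoding.UTF8.GetString(utf8Bytes);
        }
        return utf8Evidence;
    }
}
```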
This preprocessing was implemented experimentally to test whether string encoding affects performance. Under controlled conditions with 5,000 detections:
- Process() time reduced from 66.0ms to 26.0ms (2.5x faster)
- Overall performance improved by 2.27x (0.0200 to 0.0088 ms/detection)
- Process() percentage dropped from 66% to 59.1% of total execution time
While this preprocessing does show measurable improvement, it's not a practical solution because:
- .NET strings remain UTF-16 internally regardless of preprocessing
- Users would need to preprocess all their evidence data
- The improvement suggests SWIG's string marshaling could be optimized
- Java achieves even better performance without any preprocessing
The key finding is that 66% of execution time is spent in the SWIG-generated Process() call. Further analysis comparing with Java's performance reveals:
- ~86% of the Process() time is string marshaling overhead (0.01135 ms out of 0.0132 ms per detection)
- Only ~14% is actual native code execution (0.00185 ms out of 0.0132 ms per detection)
- Java's efficient byte[] marshaling achieves near-native performance (0.00185 ms/detection)
- .NET's string marshaling adds 6x more overhead than the actual native computation
This confirms that SWIG's .NET string marshaling is the primary bottleneck, not the native library performance.
Potential Solutions
Since the native code requires ASCII strings, the conversion is unavoidable. However, SWIG could generate more efficient bindings by:
- **Adopt Java's approach for .NET**: Generate methods that accept `byte[]` parameters:

  ```csharp
  public static extern void Evidence_AddFromBytes(HandleRef jarg1, byte[] key, byte[] value);
  ```

  This would allow developers to control when conversion happens and cache results.

- **Generate overloads**: Provide both `string` and `byte[]` versions:

  ```csharp
  public void Add(string key, string value);       // Convenience method
  public void AddBytes(byte[] key, byte[] value);  // Performance method
  ```

- **Use unsafe code**: Generate methods that accept pinned byte arrays or pointers to avoid repeated conversions.

- **Batch operations**: Generate methods that accept arrays of key-value pairs to amortize per-call marshaling overhead.
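To make the `byte[]` suggestion concrete, here is a minimal sketch of what the binding plus a caller-side cache could look like. Both the `Evidence_AddFromBytes` entry point and the `EvidenceByteCache` class are hypothetical; they do not exist in the current generated code:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

internal static class NativeMethodsBytes
{
    // Hypothetical byte[]-accepting binding, mirroring the Java JNI shape.
    // The runtime pins each array and passes a pointer to its first
    // element - no per-call charset conversion, no temporary buffer.
    [DllImport("Native.dll")]
    public static extern void Evidence_AddFromBytes(
        HandleRef jarg1, byte[] key, byte[] value);
}

// Caller-side cache: encode each evidence string to UTF-8 once and reuse
// the bytes across detections, the analogue of Java's Swig.asBytes plus
// caching. Illustrative only; the native code expects C strings, so a
// trailing NUL byte is appended.
internal sealed class EvidenceByteCache
{
    private readonly Dictionary<string, byte[]> _cache = new();

    public byte[] AsBytes(string value)
    {
        if (!_cache.TryGetValue(value, out var bytes))
        {
            bytes = new byte[Encoding.UTF8.GetByteCount(value) + 1];
            Encoding.UTF8.GetBytes(value, 0, value.Length, bytes, 0);
            // bytes[^1] is already 0: the NUL terminator for native code
            _cache[value] = bytes;
        }
        return bytes;
    }
}
```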
Questions for SWIG Community
- **Why do Java and .NET use different approaches?** Java uses `byte[]` parameters while .NET uses `string` parameters for the same interface. Is this intentional?

- **String Marshaling Optimization**: Are there SWIG directives, typemaps, or configuration options that can make .NET string marshaling as efficient as Java's byte array approach?

- **Custom Typemaps**: Can we create custom typemaps for .NET that use `byte[]` parameters, similar to Java's approach?

- **UTF-8 Native Support**: Are there plans to optimize .NET string marshaling in SWIG to avoid UTF-16 → UTF-8 conversion overhead?

- **Best Practices**: What are the recommended approaches for high-performance string handling in SWIG .NET bindings when processing thousands of strings per second?
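On the custom typemaps question, one possible starting point might be typemaps along these lines. This is an untested sketch for `const char *` parameters using the C# module's typemap slots, and we would welcome corrections if there is a more idiomatic route:

```swig
/* Untested sketch: surface const char * parameters as byte[] in C#.
   The intermediary PInvoke layer declares byte[], which the runtime pins
   and passes as a raw pointer, so no per-call string conversion occurs.
   The caller is responsible for supplying NUL-terminated ASCII/UTF-8
   bytes (e.g. from a cache as shown above). */
%typemap(ctype)  const char * "char *"
%typemap(imtype) const char * "byte[]"
%typemap(cstype) const char * "byte[]"
%typemap(csin)   const char * "$csinput"
%typemap(in)     const char * %{ $1 = ($1_ltype)$input; %}
```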
SWIG Configuration
- Version: 4.0.2
- Languages: C# (.NET 8.0) and Java (OpenJDK 21)
- Native Library: C++ with extensive string processing
- Use Case: High-throughput device detection (target: 500K+ operations/sec)
- Evidence: Typically 10-20 string key-value pairs per detection
SWIG Interface Files
.NET SWIG Interface:
Java SWIG Interface:
Reproducible Test Case
Benchmark Code Locations
.NET Performance Benchmark:

```shell
cd Examples/OnPremise/Performance-Console && dotnet run
```

Java Performance Benchmark:

```shell
cd console && mvn compile && mvn exec:java -Dexec.mainClass="fiftyone.devicedetection.examples.console.PerformanceBenchmark"
```

Required Data Files
Both benchmarks require:
- `TAC-HashV41.hash` (device detection data file - or use `51Degrees-LiteV4.1.hash` from Git LFS)
- `20000 Evidence Records.yml` (test evidence file)

These can be obtained from: https://github.com/51Degrees/device-detection-data
Note: The repository uses Git LFS, so make sure to run `git lfs pull` after cloning
Core Library Repositories
Expected Outcome
We're seeking guidance on achieving Java-level performance in .NET bindings through SWIG configuration rather than application-level workarounds. The 2.5x performance gap makes .NET unsuitable for high-throughput scenarios where Java excels.
Any insights on optimizing SWIG-generated .NET string marshaling would be greatly appreciated! We're happy to test patches or provide additional profiling data as needed.