From c7cf48a38f099abb4e2909d36cf43f8033c05f0f Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Wed, 15 Apr 2020 09:56:16 -0700 Subject: [PATCH 01/11] Add WASM RFC Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 559 ++++++++++++++++++++++++ 1 file changed, 559 insertions(+) create mode 100644 rfcs/2020-04-15-wasm-foreign-modules.md diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md new file mode 100644 index 0000000000000..a24b91f4baa6d --- /dev/null +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -0,0 +1,559 @@ +# RFC - - WASM Foreign Module Support + +Introduce Vector's foreign module support, a way for users to add custom functionality to Vector. This RFC introduces +WASM modules specifically, but we envision the possibility of other implementations (these will be a different RFC). +This RFC introduces Transforms and will be later amended with Source and Sink functionality. + + +## Motivation + +There is a plurality of reasons for us to wish for flexible, practical foreign module support to Vector. These may also + be regarded as 'Plugins' or 'Extensions.' WASM is the most versatile and obvious choice for a first implementation as + it can be targetted by a number of languages, and is intended to be OS, arch, and platform independent. + +Since the concept of a foreign module is very abstract, this section will instead discuss a number of seemingly +unrelated topics, all asking for different features. While foreign module support does not resolve all these problems, +it gives us or our users a path to a solution. + + +### Protobufs & other custom codecs + +We noted in discussions with several current and prospective users that custom protobuf support in the form of an +`encoding` option or a transform. + +Protobufs in their `proto` form aren't usable by the prevailing Rust Protobuf libraries in a dynamic way. Both +`prost` and `rust-protobuf` focus almost solely on compiling support for specific Protobufs **into** a Rust program. +While a `serde-protobuf` crate exists, we found it to be an order of magnitude slower than the other two options, we +also noted it lacked serialization support. + +While evaluating the possibility of adding serialization we noted that any implementation that operated dynamically +would be quite slow. Notably, it would **it would seriously impact overall event processing infrastructure** if Vector +was responsible for a slowdown. These organizations are choosing to use protobufs usually for speed and/or compatibility +guarantees. + +We should enable and encourage these goals, our users should be empowered to capture their events, and not worried about +processing load. So the question became: *How do we let users opt-in to custom protobuf support efficiently and without +great pain?* + + +### Increasing build complexity and our slow build times + +Recently, we've been discussing the trade offs associated with different code modularization techniques. We currently +make fairly heavy use of the Rust feature flag system. + +This is great because it enables us to build in multi-platform support as well as specialized (eg only 1 source) builds. +We also noted builds with only 1 or two feature flags are **significantly** faster than builds with all features +supported. + +One downside is our current strategy can require tests (See the `make check-features` task) to ensure their correctness. +The other is that we sometimes need to spend a lot of time weaving through `cfg(feature = "..")` flags. + +Having some solution to cleanly package independent modules of functionality would allow us to have a core Vector which +builds fast while keeping modularity features. + + +### Language runtime support + +Vector currently bundles a [`lua` transform](https://vector.dev/docs/reference/transforms/lua/), and there is ongoing +work on a [`javascript` transform](https://github.com/timberio/vector/issues/667). + +Currently, each language requires as separate runtime, and possibly separate integration subtleties like needing to call +a [GC](https://github.com/timberio/vector/pull/1990). It would be highly convenient if we could focus our efforts on one +consistent API for all languages, and spend this time saved providing a better, faster, more safe experience. + + +### A Plugin System + +As Vector grows and supports more integrations, our binary size and build complexity will inevitable grow. In informal +discussions we've previously noted the [module system of Terraform](https://www.terraform.io/docs/commands/init.html). +This kind of architecture can help us simplify the codebase, reduce binary sizes, formalize our APIs, and open Vector up +to a third-party module ecosystem. + +Terraform typically runs on CIs and developer machines, making portability valuable. For Terraform, users can follow the +[guide](https://www.terraform.io/docs/extend/writing-custom-providers.html) to write a Go-based provider, which they can +then distribute as a portable binary. Notably: This means either distributing an unoptimized binary, distributing a lot +of optimized binaries, or requiring folks build their own optimized binaries. Terraform doesn't care much about speed, +so unoptimized binaries are acceptable. + +Vector is in a slightly different position than Terraform though! Vector runs primarily in servers and even end user +machines. We can't expect users to have build tooling installed on their servers (it's a security risk!) and we +definitely can't expect it on an end-user machines. Most people just aren't that interested in computers. Vector has +different performance needs, too. While folk's aren't generally wanting to execute terraform providers hundreds of +thousands (or millions) of times per second, they are absolutely doing that with Vector. + +When we're processing the firehose of events originating from a modern infrastructure every millisecond counts. Vector +needs a way to ship portable, *optimizable* modules if we ever hope of making this a reality. + + +### Simplifying common basic transforms + +We would like to avoid asking users to chain together a number of transforms to accomplish simple tasks. This is +evidenced by issues like [#750](https://github.com/timberio/vector/issues/750), +[#1653](https://github.com/timberio/vector/issues/1653), and [#1926](https://github.com/timberio/vector/issues/1926). +Having to, for example, use two transforms just to add 1 field and remove another is very verbose. + +We noted that the existing lua runtime was able to accomplish these tasks quite elegantly, however it was an order of +magnitude slower than a native transform. + +> TODO: Proof + +Users shouldn't pay a high price just for a few lines saved in a configuration file. They shouldn't feel frustration +when building these kinds of pipelines either. + + +### Exploring DSL driven pipelines + +We've discussed various refinements to our pipelines such as +[Pipelines Proposal V2](https://github.com/timberio/vector/issues/1679) and +[Proposal: Compose Transform](https://github.com/timberio/vector/issues/1653). Some of these discussions have been about +improving our TOML syntax, others about new syntax entirely, and others about how we could let users write in the +language of their choice. + +The path forward is unclear, but if we choose to adopt something new we must focus on providing a good user experience +as well as performance. Having a fast, simple, portable compile target could make this an easier effort. + + +### Ecosystem Harmony with Tremor + +We exist in an ecosystem, [Tremor](https://www.tremor.rs/) exists and we'd really love to be able to work in harmony +somehow. While both tools process events, Vector focuses on acting as a host-level or container-level last/first mile +router, Tremor focuses on servicing demanding workloads from it's position as an infrastructure-wide service. + +What if we could provide users of Tremor and Vector with some sort of familiar shared experience? What if we could share +functionality? What kind of framework could satisfy both our needs? There are so many questions! + +> TODO: We need to talk to them. + + +## Prior Art + +We have an existing `lua` runtime existing as transform, and there is currently an +[issue](https://github.com/timberio/vector/issues/667) regarding an eventual `javascript` transform. Neither of these +features reach the scope of this proposal, still, it is valuable to learn. + +While evaluating the `lua` transform to solve the Protobuf problem described in Motivations, we benchmarked an +implementation of `add_fields` that was already performing at approximately the same performance as the `serde-protobuf` +solution. + +We also noted that the existing `lua` v1 transform was quite simple. While this makes for nice usability, it means +things like on-start initializations aren't possible. In [v2](https://github.com/timberio/vector/pull/2000) `lua` has +learnt many new tricks, and this WASM support aims to match them. + +We also noted the capabilities and speed of WASM (particularly WASI) mean we could eventually support things like +sockets and files. + +Indeed, it is possible we will be able to let Vector build and run Lua code as a wasm module in the future. + + +## Guide-level Proposal + +Vector contains a WebAssembly (WASM) engine which allows you to write custom functionality for Vector. You might already +be familiar with the [Javascript](https://github.com/timberio/vector/pull/721) or +[Lua](https://vector.dev/docs/reference/transforms/lua/) transforms, WebAssembly is a bit different than that. + +While it enables more functionality and possibly better speeds, it also introduces complication to your system. + + +### When to build your own module + +Using the WASM engine you can write your own sources, transforms, sinks, codecs in any language that compiles down to a +`.wasm` file. Vector can then compile this file down to an optimized, platform specific module which it can use. This +allows Vector to work with internal or uncommon protocols or services, support new language transforms, or just write a +special transform to do *exactly* what you want. With the WASM engine, Vector puts you in command. + +There is a trade-off though! You'll need to embolden yourself, as WASM is a fledgling technology. When it first loads a +WASM file, Vector needs to spend some time building an optimized module before it can load it. You'll also be +responsible for the safety and correctness of your code, if your module crashes on an event, Vector will re-initialize +the module and try again on the next event. This should be modelled roughly off +[Erlang's OTP](https://erlang.org/doc/design_principles/sup_princ.html). + +Here's some examples of when a WASM module is a good fit: + +* You need to support a specific protobuf type. +* You have a source or sink which is not currently supported by Vector. +* You want to write a complex transform in your favorite language. + +Here's some examples of where WASM probably isn't right for you: + +* You only need basic functionality already offered by Vector. +* Your desired language doesn't support WASM as a compile target. +* You are using Vector on a non-Linux X86-64 bit compatible target. (Lucet, our Engine, only supports Linux at this +current time. Work on other operating systems is ongoing!) +* You need a battle-proven system, WASM support in Vector is young! + + +### How Vector modules work + +Vector modules are `.wasm` files (WASI or not) exposing a predefined interface. We provide a `foreign_modules::guest` +module for Rust projects located in `lib/foreign-modules` of the source tree, and we will be providing bindings for +other languages as we see user demand. + +Modules are sandboxed and can only interact with Vector over the Foreign Function Interface (FFI) which Vector exposes. +For each instance of a module, Vector holds some context. For example, when a transform module processes an event, the +context of the module contains an `Event` type. The guest can use calls like `get("foo")` and Vector perform the action +and return the result to the guest. + +A guest module is not able to access the memory of the Vector instance, but Vector is free to modify the guest memory. +For example, behind the scenes of a `get` call, the Guest must allocate some memory for Vector to write the result into. + +Due to the current semantics of the `vector::Event` type, passing data between WASM modules and the Vector host involves +serialization to JSON. We're investigating ways to speed this up. + + * It may be possible to add a `repr(C)` flag to Event. + * It may be worthwhile to explore an `Event` refactor. + * Other serialization options may be much faster but less compatible. + + +### Initialization + + +Regardless of whether of not the module uses WASI, or has access to a language specific binding, it can interact with +Vector. Vector works by invoking the modules public interface. First, as soon as the module is loaded, `register` is +called once, globally, per module. In this call, the module can configure how Vector sees it and talks to it. During +this time, the module can do any one time setup it needs. (Eg. Opening a socket, a file). + +If using the Vector guest API this can be done via: + +```rust +use foreign_modules::{Registration, roles::Sink}; + +#[no_mangle] +pub extern "C" fn init() -> *mut Registration { + &mut Registration::transform() + .set_wasi(true) as *mut Registration +} +``` + +What happens next depends on which type of module it is. + +### Module types + +* `Transform`: Modules of this role can be used as a transform `type`. + + ```rust + use foreign_modules::hostcall; + + // TODO: Add better FFI error handling! + #[no_mangle] + pub extern "C" fn process() -> usize { + if let Some(value) = hostcall::get("field").unwrap() { + hostcall::insert("field", value).unwrap(); + } + 0 + } + ``` + +> **TODO:** Sources, sinks, and codecs have no POC or formal specification, below is a work in progress. + +* `Source`: Modules of this role can be used as a source `type`. + + ```rust + use foreign_modules::hostcall; + + #[no_mangle] + // TODO: Add better FFI error handling! + pub extern "C" fn start() { + // TODO + } + ``` + +* `Sink`: Modules of this role can be used as a sink `type`. + + ```rust + use foreign_modules::hostcall; + + #[no_mangle] + // TODO: Add better FFI error handling! + pub extern "C" fn process() { + if let Some(value) = hostcall::get("field").unwrap() { + hostcall::insert("field", value).unwrap(); + } + } + ``` + +* `Codec`: + + ```rust + use foreign_modules::hostcall; + // TODO: Add better FFI error handling! + #[no_mangle] + pub extern "C" fn process() { + if let Some(value) = hostcall::get("field").unwrap() { + hostcall::insert("field", value).unwrap(); + } + } + ``` + + +### Your first module (Rust) + +To create your first module, start with a working [Rust toolchain](https://rustup.rs/) add the `wasm32-wasi` +toolchain's `nightly` version: + +```bash +rustup target add wasm32-wasi --toolchain nightly +``` + +Then, create a module: + +```bash +cargo +nightly init --lib echo +``` + +In your `Cargo.toml` fill in: + +```toml +[lib] +crate-type = ["cdylib"] + +[dependencies] +foreign_modules = { path = "/path/to/your/clone/of/vector/lib/foreign-modules"} +serde = { version = "1", features = ["derive"] } +``` + +Next, in your `lib.rs` file: + +```rust +use foreign_modules::{Registration, hostcall}; +use serde::{Serialize, Deserialize}; + +#[no_mangle] +pub extern "C" fn init() -> *mut Registration { + &mut Registration::transform() + .set_wasi(true) as *mut Registration +} + +// TODO: Add better FFI error handling! +#[no_mangle] +pub extern "C" fn process() -> usize { + let result = hostcall::get("message"); + + match result.unwrap() { + Some(value) => { + hostcall::insert("echo", value); + 0 + }, + None => { + 0 + }, + } +} + +#[no_mangle] +pub extern "C" fn shutdown() { + (); +} +``` + +Now to build it: + +```bash +cargo +nightly build --target wasm32-wasi --release +``` + +In your `target/wasm-wasi/release/` verify that the `echo.wasm` file exists. Next edit your Vector `config.toml`: + +```toml +data_dir = "/var/lib/vector/" +dns_servers = [] + +[sources.source0] +max_length = 102400 +type = "stdin" + +[transforms.demo] +inputs = ["source0"] +type = "wasm" +module = "target/wasm32-wasi/release/echo.wasm" + +[sinks.sink0] +healthcheck = true +inputs = ["demo"] +type = "console" +encoding = "json" +buffer.type = "memory" +buffer.max_events = 500 +buffer.when_full = "block" + +[[tests]] + name = "demo-tester" + [tests.input] + insert_at = "demo" + type = "log" + [tests.input.log_fields] + "message" = "foo" + [[tests.outputs]] + extract_from = "demo" + [[tests.outputs.conditions]] + "echo.equals" = "foo" +``` + +Now try `vector test config.toml` and Vector will go ahead and build an optimized `cache/echo.so` artifact then run it +for a unit test. + +```bash +$ vector test config.toml +Running config.toml tests +test config.toml: demo-tester ... passed +``` + +Congratulations, you're ready for a new frontier! + +### Resources & templates + +* [Rustinomicon](https://doc.rust-lang.org/nomicon/) +* [Lucet Docs](https://bytecodealliance.github.io/lucet/) +* [WASI](https://wasi.dev/) + + +## Sales Pitch + +Integrating a flexible, practical, simple WASM module support means we can: + +* Expose one common interface to any existing and future languages supported by WASM. +* Allow large-scale users to use custom encodings. +* Support and advertise a plugin system. +* Empower us to modularize Vector's components and increase maintainability. +* Share a common plugin format with Tremor. +* In the future, support something like AssemblyScript as a first class language. + + +## Drawbacks + +WASM is still **very young**. We will be early adopters and will pay a high adoption price. Additionally, we may find +ourselves using unstable functionality of some crates, including a non-trivial amount of `unsafe`, which may lead to +unforseen consequences. + +Our **binary size will bloom**. We currently (in unoptimized POC implementations) see our binary size grow by over 60 +MB. The Lucet team is aware of this issue, and in some preliminary discussions this binary size increase is unexpected and +we expect it to improve. If this does not improve, we can consider adopting `wasmtime`, lucet's sister JIT implementation +which has a smaller binary footprint. + +Our team will need to diagnose, support, and evolve an arguably rather **complex API** for WASM modules to work +correctly. This will mean we'll have to think of our system very abstractly, and has many questions around API stability +and compatibility. + +This feature is **advanced and may invoke user confusion**. Novice or casual users may not fully grasp the subtleties +and limitations of WASM. We must practice caution around the user experience of this feature to ensure novices and +advanced users can understand what is happening in a Vector pipeline utilizing WASM modules. + + +## Outstanding Questions + +Since this is a broad reaching feature with a number of green sky ideas, we should discuss these questions: + + +### Agree on breadth and depth of implementation + +We should support modules so broadly? + +* As sinks? For this we should consider how we handle batching, partitioning, etc, more. + - **Temporary Conclusion:** Held off on deciding. Some interest. +* As sources? This requires thinking about how we can do event sourcing. + - **Temporary Conclusion:** Held off on deciding. Some interest. +* What about the idea of codecs being separate? + - **Temporary Conclusion:** Held off on deciding. Some interest. +* This RFC makes some provisions for a future Event refactor, is it possible that this might happen? + - **Temporary Conclusion:** Held off on deciding. Some interest. + + +### Consider supporting a blessed compiler + +Our original discussions included the idea of a blessed compiler and a UX similar to our existing `lua` or `javascript` +transform. Indeed, our initial implementation should include just a transform, as it's by far the easiest place to +introduce this feature. + +During investigation of the initial POC we determined that our most desirable "blessed" language would be +AssemblyScript, unfortunately, it's currently not able to run inside Lucet. Work is ongoing and we expect it to be +possible soon. + +**Temporary Conclusion:** AssemblyScript and other languages we evaluated did not provide a way to package themselves as +WASM modules, making it hard to integrate with them. We are planning to revisit this at a later date when the ecosystem +is more mature. + + +### Consider API tradeoffs + +We should consider if we are satisfied with the idea of hostcalls being used for `get` and other API. We could also let +the host call the Guest allocate function and then pass it a C String pointer to let it work on. This, however, requires +serializing and deserializing the entire Event each time, which is a huge performance bottleneck. + +We could also consider adopting a new strategy for our `event` type. However, that work is outside the scope of this +RFC. The current structure of the `Event` type is already discussed in +[#1891](https://github.com/timberio/vector/issues/1891). + +We should also consider if we want to change how we handle codecs, since it is likely that WASM module use cases will +include wanting to add codec support to already existing sources. + +**Temporary Conclusion:** We will adopt hostcalls for the time being to allow for a minimal POC. We would like to +investigate these choices later, and we will make a decision before stabilizing WASM modules. + + +### Consider Packaging Approach + +We will need to expose some Vector guest API to users. This will require ABI compatibility with Vector, so it makes +sense to keep it in the same repository (and maybe the same crate?) as Vector. We can consider either making Vector able +to be built by WASI targets (more feature flags) or using a workspace and having the guest API as a workspace member we +publish. + +A couple good questions to consider: + +* Do we one day want to support a cross platform `cargo install vector` option for installing a Vector binary? +* Should a user import `vector::...` to use the Vector guest API in their Rust module? +* Vector's internal API is largely undocumented and kind of a mess. Should we hide/clean the irrelevant stuff? + +**Conclusion:** We are including a workspace member `foreign_modules` in the `lib/foreign-modules` directory. +This may be renamed and/or published in the future. + + +### Consider observability + +How can we let users see what's happening in WASM modules? Can we use tracing somehow? Lucet supports tracing, perhaps +we could hook in somehow? + +**Conclusion:** Preliminary results show we may be able to allow guests to write with the current tracing subscriber. + + +### Consider Platform Support + +There is ongoing work to support more platforms in Lucet, here are how things stand today: + + +Platform support: + +* [x] Linux (x86_64) +* [ ] Linux (ARMv7) (Likely never) +* [ ] Linux (Aarch64) (Probably coming, ARM contributing) +* [ ] Mac (x86_64) + + https://github.com/bytecodealliance/lucet/pull/437 +* [ ] Windows (x86_64) + + https://github.com/bytecodealliance/wasmtime/pull/1216 + + https://github.com/bytecodealliance/lucet/issues/442 + + https://github.com/bytecodealliance/lucet/pull/437 +* [ ] FreeBSD (x86_64) + + https://github.com/bytecodealliance/lucet/pull/419 + + https://github.com/bytecodealliance/lucet/pull/437 + +**Temporary Conclusion:** The Lucet team desires to support other platforms in the near future. Since the lion's share +of Vector usage is on X86_64 Linux, and this platform is already supported, we decided to adopt Lucet for now. We could +consider adopting `wasmtime` in the future if lucet does not eventually reach it in terms of platform parity. + + +## Plan of attack + +Incremental steps that execute this change. + +- v0.1: (Done) This draft forms the groundwork for this RFC, and demos a POC of how a theoretical user could use this + for a protobuf decoding transform, and permitting the first `wasm` transform test to pass. +- v0.2: The POC of the initial transform has it's v1 +- v0.3: We have benchmarks of the initial transform POC. +- v0.4: Compile artifact caching is implemented, so Vector doesn't unnecessarily recompile wasm modules. +- v0.5: Guest API expanded to support majority of Event API. +- v0.6: We talk to a couple known users who would be interested in this feature and let them test it out. +- v0.7: This RFC is amended to include Sinks and Sources information. +- v0.8: This RFC is amended to include information about possible codec changes. +- v0.9: Source and Sink implementation POCs made. +- v0.10: Final APIs specced and tested. Source, Sink, Transform POCs are in tree and running as tests/benches. +- v1.0: This RFC is complete and we announce with the above guide. + +Note: This can be filled out during the review process. From 40de4c52c58580e96ee73fa32f1133e1cbaea16e Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Wed, 15 Apr 2020 10:43:02 -0700 Subject: [PATCH 02/11] Fix lints Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 50 ++++++++++++------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index a24b91f4baa6d..181e3a88d3bf5 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -199,9 +199,9 @@ For example, behind the scenes of a `get` call, the Guest must allocate some mem Due to the current semantics of the `vector::Event` type, passing data between WASM modules and the Vector host involves serialization to JSON. We're investigating ways to speed this up. - * It may be possible to add a `repr(C)` flag to Event. - * It may be worthwhile to explore an `Event` refactor. - * Other serialization options may be much faster but less compatible. +* It may be possible to add a `repr(C)` flag to Event. +* It may be worthwhile to explore an `Event` refactor. +* Other serialization options may be much faster but less compatible. ### Initialization @@ -448,13 +448,13 @@ Since this is a broad reaching feature with a number of green sky ideas, we shou We should support modules so broadly? * As sinks? For this we should consider how we handle batching, partitioning, etc, more. - - **Temporary Conclusion:** Held off on deciding. Some interest. + * **Temporary Conclusion:** Held off on deciding. Some interest. * As sources? This requires thinking about how we can do event sourcing. - - **Temporary Conclusion:** Held off on deciding. Some interest. + * **Temporary Conclusion:** Held off on deciding. Some interest. * What about the idea of codecs being separate? - - **Temporary Conclusion:** Held off on deciding. Some interest. + * **Temporary Conclusion:** Held off on deciding. Some interest. * This RFC makes some provisions for a future Event refactor, is it possible that this might happen? - - **Temporary Conclusion:** Held off on deciding. Some interest. + * **Temporary Conclusion:** Held off on deciding. Some interest. ### Consider supporting a blessed compiler @@ -486,7 +486,7 @@ We should also consider if we want to change how we handle codecs, since it is l include wanting to add codec support to already existing sources. **Temporary Conclusion:** We will adopt hostcalls for the time being to allow for a minimal POC. We would like to -investigate these choices later, and we will make a decision before stabilizing WASM modules. +investigate these choices lat er, and we will make a decision before stabilizing WASM modules. ### Consider Packaging Approach @@ -525,14 +525,14 @@ Platform support: * [ ] Linux (ARMv7) (Likely never) * [ ] Linux (Aarch64) (Probably coming, ARM contributing) * [ ] Mac (x86_64) - + https://github.com/bytecodealliance/lucet/pull/437 + * https://github.com/bytecodealliance/lucet/pull/437 * [ ] Windows (x86_64) - + https://github.com/bytecodealliance/wasmtime/pull/1216 - + https://github.com/bytecodealliance/lucet/issues/442 - + https://github.com/bytecodealliance/lucet/pull/437 + * https://github.com/bytecodealliance/wasmtime/pull/1216 + * https://github.com/bytecodealliance/lucet/issues/442 + * https://github.com/bytecodealliance/lucet/pull/437 * [ ] FreeBSD (x86_64) - + https://github.com/bytecodealliance/lucet/pull/419 - + https://github.com/bytecodealliance/lucet/pull/437 + * https://github.com/bytecodealliance/lucet/pull/419 + * https://github.com/bytecodealliance/lucet/pull/437 **Temporary Conclusion:** The Lucet team desires to support other platforms in the near future. Since the lion's share of Vector usage is on X86_64 Linux, and this platform is already supported, we decided to adopt Lucet for now. We could @@ -543,17 +543,17 @@ consider adopting `wasmtime` in the future if lucet does not eventually reach it Incremental steps that execute this change. -- v0.1: (Done) This draft forms the groundwork for this RFC, and demos a POC of how a theoretical user could use this +* v0.1: (Done) This draft forms the groundwork for this RFC, and demos a POC of how a theoretical user could use this for a protobuf decoding transform, and permitting the first `wasm` transform test to pass. -- v0.2: The POC of the initial transform has it's v1 -- v0.3: We have benchmarks of the initial transform POC. -- v0.4: Compile artifact caching is implemented, so Vector doesn't unnecessarily recompile wasm modules. -- v0.5: Guest API expanded to support majority of Event API. -- v0.6: We talk to a couple known users who would be interested in this feature and let them test it out. -- v0.7: This RFC is amended to include Sinks and Sources information. -- v0.8: This RFC is amended to include information about possible codec changes. -- v0.9: Source and Sink implementation POCs made. -- v0.10: Final APIs specced and tested. Source, Sink, Transform POCs are in tree and running as tests/benches. -- v1.0: This RFC is complete and we announce with the above guide. +* v0.2: The POC of the initial transform has it's v1 +* v0.3: We have benchmarks of the initial transform POC. +* v0.4: Compile artifact caching is implemented, so Vector doesn't unnecessarily recompile wasm modules. +* v0.5: Guest API expanded to support majority of Event API. +* v0.6: We talk to a couple known users who would be interested in this feature and let them test it out. +* v0.7: This RFC is amended to include Sinks and Sources information. +* v0.8: This RFC is amended to include information about possible codec changes. +* v0.9: Source and Sink implementation POCs made. +* v0.10: Final APIs specced and tested. Source, Sink, Transform POCs are in tree and running as tests/benches. +* v1.0: This RFC is complete and we announce with the above guide. Note: This can be filled out during the review process. From 025b5f673cf4622ce5d211600ddc46a3a9dd5c11 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Wed, 15 Apr 2020 10:52:15 -0700 Subject: [PATCH 03/11] Add title Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index 181e3a88d3bf5..58bd95a047b2c 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -1,4 +1,4 @@ -# RFC - - WASM Foreign Module Support +# RFC #2341 - 2020-04-15 - WASM Foreign Module Support Introduce Vector's foreign module support, a way for users to add custom functionality to Vector. This RFC introduces WASM modules specifically, but we envision the possibility of other implementations (these will be a different RFC). From 1d20b54aab558429c19d83ba6695f50fa31ce4d2 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Mon, 4 May 2020 14:48:37 -0700 Subject: [PATCH 04/11] Some rewording Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 26 ++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index 58bd95a047b2c..72b01ce510ca3 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -1,18 +1,18 @@ -# RFC #2341 - 2020-04-15 - WASM Foreign Module Support +# RFC #2341 - 2020-04-15 - WASM Plugin Support -Introduce Vector's foreign module support, a way for users to add custom functionality to Vector. This RFC introduces -WASM modules specifically, but we envision the possibility of other implementations (these will be a different RFC). +Introduce plugin support to Vector, providing a way for users to add custom functionality to Vector. This RFC introduces +WASM plugins specifically, but we envision the possibility of other implementations that may reuse ideas from this RFC. This RFC introduces Transforms and will be later amended with Source and Sink functionality. ## Motivation -There is a plurality of reasons for us to wish for flexible, practical foreign module support to Vector. These may also - be regarded as 'Plugins' or 'Extensions.' WASM is the most versatile and obvious choice for a first implementation as - it can be targetted by a number of languages, and is intended to be OS, arch, and platform independent. +There is a plurality of reasons for us to wish for flexible, practical plugin support to Vector. These may also +be regarded as 'Foreign Modules' or 'Extensions.' WASM is the most versatile and obvious choice for an implementation +as it can be targeted by a number of languages and is intended to be OS, arch, and platform independent. -Since the concept of a foreign module is very abstract, this section will instead discuss a number of seemingly -unrelated topics, all asking for different features. While foreign module support does not resolve all these problems, +Since the concept of a plugin is a abstract, this section will instead discuss a number of seemingly +unrelated topics, all asking for different features. While plugin support does not resolve all these problems, it gives us or our users a path to a solution. @@ -23,8 +23,8 @@ We noted in discussions with several current and prospective users that custom p Protobufs in their `proto` form aren't usable by the prevailing Rust Protobuf libraries in a dynamic way. Both `prost` and `rust-protobuf` focus almost solely on compiling support for specific Protobufs **into** a Rust program. -While a `serde-protobuf` crate exists, we found it to be an order of magnitude slower than the other two options, we -also noted it lacked serialization support. +While a `serde-protobuf` crate exists and can do dynamic decoding, we found it to be an order of magnitude slower than +the other two options, we also noted it lacked serialization support. While evaluating the possibility of adding serialization we noted that any implementation that operated dynamically would be quite slow. Notably, it would **it would seriously impact overall event processing infrastructure** if Vector @@ -59,10 +59,10 @@ work on a [`javascript` transform](https://github.com/timberio/vector/issues/667 Currently, each language requires as separate runtime, and possibly separate integration subtleties like needing to call a [GC](https://github.com/timberio/vector/pull/1990). It would be highly convenient if we could focus our efforts on one -consistent API for all languages, and spend this time saved providing a better, faster, more safe experience. +consistent API for all languages, and spend this time saved providing a better, faster, safer experience. -### A Plugin System +### Modularized functionality As Vector grows and supports more integrations, our binary size and build complexity will inevitable grow. In informal discussions we've previously noted the [module system of Terraform](https://www.terraform.io/docs/commands/init.html). @@ -95,7 +95,7 @@ Having to, for example, use two transforms just to add 1 field and remove anothe We noted that the existing lua runtime was able to accomplish these tasks quite elegantly, however it was an order of magnitude slower than a native transform. -> TODO: Proof +> TODO Users shouldn't pay a high price just for a few lines saved in a configuration file. They shouldn't feel frustration when building these kinds of pipelines either. From ca3e5f5a78651d55e7a56e8cbeb80eebb4101666 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Mon, 4 May 2020 15:36:24 -0700 Subject: [PATCH 05/11] Add lua benches Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index 72b01ce510ca3..88acf87706254 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -95,7 +95,10 @@ Having to, for example, use two transforms just to add 1 field and remove anothe We noted that the existing lua runtime was able to accomplish these tasks quite elegantly, however it was an order of magnitude slower than a native transform. -> TODO +```bash +lua_add_fields/native time: [85.110 ms 85.159 ms 85.222 ms] +lua_add_fields/v1 time: [303.43 ms 306.10 ms 308.00 ms] +``` Users shouldn't pay a high price just for a few lines saved in a configuration file. They shouldn't feel frustration when building these kinds of pipelines either. From 413a2e3002b4a68dd160acecabe2ee93477b5bc8 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Tue, 5 May 2020 08:57:40 -0700 Subject: [PATCH 06/11] Touchup RFC Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 99 +++++++------------------ 1 file changed, 28 insertions(+), 71 deletions(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index 88acf87706254..d810faa59d2af 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -125,7 +125,7 @@ router, Tremor focuses on servicing demanding workloads from it's position as an What if we could provide users of Tremor and Vector with some sort of familiar shared experience? What if we could share functionality? What kind of framework could satisfy both our needs? There are so many questions! -> TODO: We need to talk to them. +We are in communication with them and this RFC may be amended in the future to include cross compatibility details. ## Prior Art @@ -160,21 +160,21 @@ While it enables more functionality and possibly better speeds, it also introduc ### When to build your own module Using the WASM engine you can write your own sources, transforms, sinks, codecs in any language that compiles down to a -`.wasm` file. Vector can then compile this file down to an optimized, platform specific module which it can use. This -allows Vector to work with internal or uncommon protocols or services, support new language transforms, or just write a -special transform to do *exactly* what you want. With the WASM engine, Vector puts you in command. +`.wasm` or `.wat` file. Vector can then compile this file down to an optimized, platform specific module which it can +use. This allows Vector to work with internal or uncommon protocols or services, support new language transforms, or +just write a special transform to do *exactly* what you want. With the WASM engine, Vector puts you in command. There is a trade-off though! You'll need to embolden yourself, as WASM is a fledgling technology. When it first loads a WASM file, Vector needs to spend some time building an optimized module before it can load it. You'll also be responsible for the safety and correctness of your code, if your module crashes on an event, Vector will re-initialize -the module and try again on the next event. This should be modelled roughly off +the module and try again on the next event. We try to follow the ideas of [Erlang's OTP](https://erlang.org/doc/design_principles/sup_princ.html). Here's some examples of when a WASM module is a good fit: * You need to support a specific protobuf type. -* You have a source or sink which is not currently supported by Vector. -* You want to write a complex transform in your favorite language. +* You require a source or sink which is not currently supported by Vector. +* You want to write a complex transform in Rust. (Other languages will be library supported soon) Here's some examples of where WASM probably isn't right for you: @@ -187,24 +187,19 @@ current time. Work on other operating systems is ongoing!) ### How Vector modules work -Vector modules are `.wasm` files (WASI or not) exposing a predefined interface. We provide a `foreign_modules::guest` -module for Rust projects located in `lib/foreign-modules` of the source tree, and we will be providing bindings for +Vector modules are `.wasm` files (WASI or not) exposing a predefined interface. We provide a `vector_wasm` +crate for Rust projects located in `lib/vector-wasm` of the source tree, and we will be providing bindings for other languages as we see user demand. Modules are sandboxed and can only interact with Vector over the Foreign Function Interface (FFI) which Vector exposes. For each instance of a module, Vector holds some context. For example, when a transform module processes an event, the -context of the module contains an `Event` type. The guest can use calls like `get("foo")` and Vector perform the action -and return the result to the guest. +context of the module contains an event, as well as the configuration used to create the transform. A guest module is not able to access the memory of the Vector instance, but Vector is free to modify the guest memory. For example, behind the scenes of a `get` call, the Guest must allocate some memory for Vector to write the result into. Due to the current semantics of the `vector::Event` type, passing data between WASM modules and the Vector host involves -serialization to JSON. We're investigating ways to speed this up. - -* It may be possible to add a `repr(C)` flag to Event. -* It may be worthwhile to explore an `Event` refactor. -* Other serialization options may be much faster but less compatible. +serialization. This may change in the future, resulting in an increase major version of the `vector-wasm` crate. ### Initialization @@ -212,7 +207,7 @@ serialization to JSON. We're investigating ways to speed this up. Regardless of whether of not the module uses WASI, or has access to a language specific binding, it can interact with Vector. Vector works by invoking the modules public interface. First, as soon as the module is loaded, `register` is -called once, globally, per module. In this call, the module can configure how Vector sees it and talks to it. During +called once per module. In this call, the module can configure how Vector sees it and talks to it. During this time, the module can do any one time setup it needs. (Eg. Opening a socket, a file). If using the Vector guest API this can be done via: @@ -246,61 +241,22 @@ What happens next depends on which type of module it is. } ``` -> **TODO:** Sources, sinks, and codecs have no POC or formal specification, below is a work in progress. - -* `Source`: Modules of this role can be used as a source `type`. - - ```rust - use foreign_modules::hostcall; - - #[no_mangle] - // TODO: Add better FFI error handling! - pub extern "C" fn start() { - // TODO - } - ``` - -* `Sink`: Modules of this role can be used as a sink `type`. - - ```rust - use foreign_modules::hostcall; - - #[no_mangle] - // TODO: Add better FFI error handling! - pub extern "C" fn process() { - if let Some(value) = hostcall::get("field").unwrap() { - hostcall::insert("field", value).unwrap(); - } - } - ``` - -* `Codec`: - - ```rust - use foreign_modules::hostcall; - // TODO: Add better FFI error handling! - #[no_mangle] - pub extern "C" fn process() { - if let Some(value) = hostcall::get("field").unwrap() { - hostcall::insert("field", value).unwrap(); - } - } - ``` +> **TODO:** Sources, sinks, and codecs have no POC or formal specification. ### Your first module (Rust) To create your first module, start with a working [Rust toolchain](https://rustup.rs/) add the `wasm32-wasi` -toolchain's `nightly` version: +toolchain: ```bash -rustup target add wasm32-wasi --toolchain nightly +rustup target add wasm32-wasi ``` Then, create a module: ```bash -cargo +nightly init --lib echo +cargo init --lib echo ``` In your `Cargo.toml` fill in: @@ -310,14 +266,15 @@ In your `Cargo.toml` fill in: crate-type = ["cdylib"] [dependencies] -foreign_modules = { path = "/path/to/your/clone/of/vector/lib/foreign-modules"} +# This will be published at a later date. +vector-wasm = { path = "/path/to/your/clone/of/vector/lib/vector-wasm"} serde = { version = "1", features = ["derive"] } ``` Next, in your `lib.rs` file: ```rust -use foreign_modules::{Registration, hostcall}; +use vector_wasm::{Registration, hostcall}; use serde::{Serialize, Deserialize}; #[no_mangle] @@ -425,7 +382,7 @@ Integrating a flexible, practical, simple WASM module support means we can: WASM is still **very young**. We will be early adopters and will pay a high adoption price. Additionally, we may find ourselves using unstable functionality of some crates, including a non-trivial amount of `unsafe`, which may lead to -unforseen consequences. +unforeseen consequences. Our **binary size will bloom**. We currently (in unoptimized POC implementations) see our binary size grow by over 60 MB. The Lucet team is aware of this issue, and in some preliminary discussions this binary size increase is unexpected and @@ -433,8 +390,8 @@ we expect it to improve. If this does not improve, we can consider adopting `was which has a smaller binary footprint. Our team will need to diagnose, support, and evolve an arguably rather **complex API** for WASM modules to work -correctly. This will mean we'll have to think of our system very abstractly, and has many questions around API stability -and compatibility. +correctly. This will mean we'll have to think of our system very abstractly, and opens many questions around API +stability and compatibility. This feature is **advanced and may invoke user confusion**. Novice or casual users may not fully grasp the subtleties and limitations of WASM. We must practice caution around the user experience of this feature to ensure novices and @@ -463,7 +420,7 @@ We should support modules so broadly? ### Consider supporting a blessed compiler Our original discussions included the idea of a blessed compiler and a UX similar to our existing `lua` or `javascript` -transform. Indeed, our initial implementation should include just a transform, as it's by far the easiest place to +transform. Indeed, our initial implementation includes just a transform, as it's by far the easiest place to introduce this feature. During investigation of the initial POC we determined that our most desirable "blessed" language would be @@ -489,7 +446,7 @@ We should also consider if we want to change how we handle codecs, since it is l include wanting to add codec support to already existing sources. **Temporary Conclusion:** We will adopt hostcalls for the time being to allow for a minimal POC. We would like to -investigate these choices lat er, and we will make a decision before stabilizing WASM modules. +investigate these choices later, and we will make a decision before stabilizing WASM modules. ### Consider Packaging Approach @@ -505,7 +462,7 @@ A couple good questions to consider: * Should a user import `vector::...` to use the Vector guest API in their Rust module? * Vector's internal API is largely undocumented and kind of a mess. Should we hide/clean the irrelevant stuff? -**Conclusion:** We are including a workspace member `foreign_modules` in the `lib/foreign-modules` directory. +**Conclusion:** We are including a workspace member `vector-wasm` in the `vector-wasm` directory. This may be renamed and/or published in the future. @@ -548,9 +505,9 @@ Incremental steps that execute this change. * v0.1: (Done) This draft forms the groundwork for this RFC, and demos a POC of how a theoretical user could use this for a protobuf decoding transform, and permitting the first `wasm` transform test to pass. -* v0.2: The POC of the initial transform has it's v1 -* v0.3: We have benchmarks of the initial transform POC. -* v0.4: Compile artifact caching is implemented, so Vector doesn't unnecessarily recompile wasm modules. +* v0.2: (Done) The POC of the initial transform has it's v1 +* v0.3: (Done) We have benchmarks of the initial transform POC. +* v0.4: (Done) Compile artifact caching is implemented, so Vector doesn't unnecessarily recompile wasm modules. * v0.5: Guest API expanded to support majority of Event API. * v0.6: We talk to a couple known users who would be interested in this feature and let them test it out. * v0.7: This RFC is amended to include Sinks and Sources information. From 127876bda68d8119bef4104b068a729f29bc3ec2 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Wed, 6 May 2020 16:32:08 -0700 Subject: [PATCH 07/11] Add benchmarks of native and lua Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 6 +- rfcs/2020-04-15-wasm-plugins/lua-8.svg | 455 ++++++++++++++++++ rfcs/2020-04-15-wasm-plugins/native-8.svg | 410 ++++++++++++++++ website/docs/reference/transforms/wasm.md | 143 ++++++ website/docs/reference/transforms/wasm.md.erb | 28 ++ 5 files changed, 1038 insertions(+), 4 deletions(-) create mode 100644 rfcs/2020-04-15-wasm-plugins/lua-8.svg create mode 100644 rfcs/2020-04-15-wasm-plugins/native-8.svg create mode 100644 website/docs/reference/transforms/wasm.md create mode 100644 website/docs/reference/transforms/wasm.md.erb diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index d810faa59d2af..41926a2da9e94 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -95,10 +95,8 @@ Having to, for example, use two transforms just to add 1 field and remove anothe We noted that the existing lua runtime was able to accomplish these tasks quite elegantly, however it was an order of magnitude slower than a native transform. -```bash -lua_add_fields/native time: [85.110 ms 85.159 ms 85.222 ms] -lua_add_fields/v1 time: [303.43 ms 306.10 ms 308.00 ms] -``` +![The Lua transform (v2) operating on a log event with 8 string fields.](./2020-04-15-wasm-plugins/lua-8.svg) +![The native `add_fields` transform operating on a log event with 8 string fields.](./2020-04-15-wasm-plugins/native-8.svg) Users shouldn't pay a high price just for a few lines saved in a configuration file. They shouldn't feel frustration when building these kinds of pipelines either. diff --git a/rfcs/2020-04-15-wasm-plugins/lua-8.svg b/rfcs/2020-04-15-wasm-plugins/lua-8.svg new file mode 100644 index 0000000000000..920b3a4aafc55 --- /dev/null +++ b/rfcs/2020-04-15-wasm-plugins/lua-8.svg @@ -0,0 +1,455 @@ + + + +Gnuplot +Produced by GNUPLOT 5.2 patchlevel 8 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0 + + + + + 1 + + + + + 2 + + + + + 3 + + + + + 4 + + + + + 5 + + + + + 6 + + + + + 7 + + + + + 8 + + + + + 11.1 + + + + + 11.2 + + + + + 11.3 + + + + + 11.4 + + + + + 11.5 + + + + + 11.6 + + + + + 11.7 + + + + + 11.8 + + + + + 11.9 + + + + + 0 + + + + + 0.5 + + + + + 1 + + + + + 1.5 + + + + + 2 + + + + + 2.5 + + + + + 3 + + + + + 3.5 + + + + + 4 + + + + + 4.5 + + + + + + + + + Iterations (x 103) + + + + + Density (a.u.) + + + + + Average time (us) + + + + + wasm/add_fields/lua/8 + + + + + PDF + + + PDF + + + + + + + + + + Mean + + + + + Mean + + + + + + "Clean" sample + + + + + "Clean" sample + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Mild outliers + + + Mild outliers + + + + + + + + + Severe outliers + + + Severe outliers + + + + + + + + gnuplot_plot_6 + + + + + + gnuplot_plot_7 + + + + gnuplot_plot_8 + + + + gnuplot_plot_9 + + + + + + + + + + + + + + diff --git a/rfcs/2020-04-15-wasm-plugins/native-8.svg b/rfcs/2020-04-15-wasm-plugins/native-8.svg new file mode 100644 index 0000000000000..2fee255e59615 --- /dev/null +++ b/rfcs/2020-04-15-wasm-plugins/native-8.svg @@ -0,0 +1,410 @@ + + + +Gnuplot +Produced by GNUPLOT 5.2 patchlevel 8 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0 + + + + + 10 + + + + + 20 + + + + + 30 + + + + + 40 + + + + + 50 + + + + + 60 + + + + + 0.9 + + + + + 0.92 + + + + + 0.94 + + + + + 0.96 + + + + + 0.98 + + + + + 1 + + + + + 1.02 + + + + + 0 + + + + + 5 + + + + + 10 + + + + + 15 + + + + + 20 + + + + + 25 + + + + + 30 + + + + + + + + + Iterations (x 103) + + + + + Density (a.u.) + + + + + Average time (us) + + + + + wasm/add_fields/native/8 + + + + + PDF + + + PDF + + + + + + + + + + Mean + + + + + Mean + + + + + + "Clean" sample + + + + + "Clean" sample + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Mild outliers + + + Mild outliers + + + + + + + + gnuplot_plot_5 + + + + + + gnuplot_plot_6 + + + + gnuplot_plot_7 + + + + gnuplot_plot_8 + + + + + + + + + + + + + + diff --git a/website/docs/reference/transforms/wasm.md b/website/docs/reference/transforms/wasm.md new file mode 100644 index 0000000000000..b8538fab8ccf7 --- /dev/null +++ b/website/docs/reference/transforms/wasm.md @@ -0,0 +1,143 @@ +--- +last_modified_on: "2020-05-06" +component_title: "WASM" +description: "The Vector `wasm` transform accepts and outputs `log` events, allowing you to execute **experimental** WASM plugins." +event_types: ["log"] +function_category: "program" +issues_url: https://github.com/timberio/vector/issues?q=is%3Aopen+is%3Aissue+label%3A%22transform%3A+wasm%22 +sidebar_label: "wasm|[\"log\"]" +source_url: https://github.com/timberio/vector/tree/master/src/transforms/wasm.rs +status: "beta" +title: "WASM Transform" +--- + +import Fields from '@site/src/components/Fields'; +import Field from '@site/src/components/Field'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +The Vector `wasm` transform +accepts and outputs [`log`][docs.data-model.log] events, allowing you to +execute **experimental** WASM plugins. + + + +## Configuration + + + + +```toml title="vector.toml" +[transforms.my_transform_id] + type = "wasm" # required + inputs = ["my-source-or-transform-id"] # required + heap_max_size = 10485760 # optional, default + module = "./modules/example.wasm" # required +``` + + + + +```toml title="vector.toml" +[transforms.my_transform_id] + type = "wasm" # required + inputs = ["my-source-or-transform-id"] # required + heap_max_size = 10485760 # optional, default + module = "./modules/example.wasm" # required +``` + + + + +## Options + + + + +### heap_max_size + +The maximum size of the heap of this module, in bytes. (This includes the +module itself, default is 10 MB.) + + + + + + +### module + +The file path of the `.wasm` module. + + + + + + +## Examples + +Given the following configuration: + + + +```toml title="vector.toml" +[transforms.test] + inputs = [...] + type = "wasm" + module = "module.wasm" + wasi = true +``` + +Accompanied by a `module.wasm` file built via `cargo +nightly --target wasm32-wasi ...`, Vector will use the module as a +custom transform. + +## How It Works + +### Environment Variables + +Environment variables are supported through all of Vector's configuration. +Simply add `${MY_ENV_VAR}` in your Vector configuration file and the variable +will be replaced before being evaluated. + +You can learn more in the +[Environment Variables][docs.configuration#environment-variables] section. + + +[docs.configuration#environment-variables]: /docs/setup/configuration/#environment-variables +[docs.data-model.log]: /docs/about/data-model/log/ diff --git a/website/docs/reference/transforms/wasm.md.erb b/website/docs/reference/transforms/wasm.md.erb new file mode 100644 index 0000000000000..1c574fa4811ef --- /dev/null +++ b/website/docs/reference/transforms/wasm.md.erb @@ -0,0 +1,28 @@ +<%- component = metadata.transforms.wasm -%> + +<%= component_header(component) %> + +## Configuration + +<%= component_config_example(component) %> + +## Options + +<%= fields(component.specific_options_list, heading_depth: 3) %> + +<%- if component.env_vars_list.any? -%> +## Env Vars + +<%= fields(component.env_vars_list, heading_depth: 3) %> + +<%- end -%> +<%= component_fields(component, heading_depth: 2) -%> +<%- if component.examples.any? -%> +## Examples + +<%= component_examples(component) %> + +<%- end -%> +## How It Works [[sort]] + +<%= component_sections(component) %> \ No newline at end of file From c6fcff4854559fbfd314c6493c4bbaee67274edc Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Wed, 6 May 2020 17:07:36 -0700 Subject: [PATCH 08/11] Review and refine RFC Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-foreign-modules.md | 116 +++++++++++++----------- 1 file changed, 61 insertions(+), 55 deletions(-) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-foreign-modules.md index 41926a2da9e94..d2850a499181f 100644 --- a/rfcs/2020-04-15-wasm-foreign-modules.md +++ b/rfcs/2020-04-15-wasm-foreign-modules.md @@ -72,17 +72,17 @@ to a third-party module ecosystem. Terraform typically runs on CIs and developer machines, making portability valuable. For Terraform, users can follow the [guide](https://www.terraform.io/docs/extend/writing-custom-providers.html) to write a Go-based provider, which they can then distribute as a portable binary. Notably: This means either distributing an unoptimized binary, distributing a lot -of optimized binaries, or requiring folks build their own optimized binaries. Terraform doesn't care much about speed, -so unoptimized binaries are acceptable. +of optimized binaries (per architecture), or requiring folks build their own optimized binaries. Terraform doesn't care +much about speed, so unoptimized binaries are acceptable. Vector is in a slightly different position than Terraform though! Vector runs primarily in servers and even end user machines. We can't expect users to have build tooling installed on their servers (it's a security risk!) and we -definitely can't expect it on an end-user machines. Most people just aren't that interested in computers. Vector has -different performance needs, too. While folk's aren't generally wanting to execute terraform providers hundreds of +definitely can't expect it on end-user machines. *Most people just aren't that interested in computers.* Vector has +different performance needs, too. While folks aren't generally wanting to execute terraform providers hundreds of thousands (or millions) of times per second, they are absolutely doing that with Vector. When we're processing the firehose of events originating from a modern infrastructure every millisecond counts. Vector -needs a way to ship portable, *optimizable* modules if we ever hope of making this a reality. +needs a way to ship portable, *optimizable* plugins if we ever hope of making this a reality. ### Simplifying common basic transforms @@ -123,7 +123,8 @@ router, Tremor focuses on servicing demanding workloads from it's position as an What if we could provide users of Tremor and Vector with some sort of familiar shared experience? What if we could share functionality? What kind of framework could satisfy both our needs? There are so many questions! -We are in communication with them and this RFC may be amended in the future to include cross compatibility details. +We are in communication with them and this RFC will amended in the future to include cross compatibility details if any +emerge. ## Prior Art @@ -138,26 +139,28 @@ solution. We also noted that the existing `lua` v1 transform was quite simple. While this makes for nice usability, it means things like on-start initializations aren't possible. In [v2](https://github.com/timberio/vector/pull/2000) `lua` has -learnt many new tricks, and this WASM support aims to match them. +learnt many new tricks, and this WASM support shall match them. We also noted the capabilities and speed of WASM (particularly WASI) mean we could eventually support things like sockets and files. -Indeed, it is possible we will be able to let Vector build and run Lua code as a wasm module in the future. +Indeed, it is possible we will be able to let Vector build and run Lua code as a WASM module in the future. ## Guide-level Proposal Vector contains a WebAssembly (WASM) engine which allows you to write custom functionality for Vector. You might already -be familiar with the [Javascript](https://github.com/timberio/vector/pull/721) or -[Lua](https://vector.dev/docs/reference/transforms/lua/) transforms, WebAssembly is a bit different than that. +be familiar with the [Lua](https://vector.dev/docs/reference/transforms/lua/) transform, WebAssembly is a bit different. -While it enables more functionality and possibly better speeds, it also introduces complication to your system. +While it enables more functionality and possibly better speeds, it also introduces complication to your system. You'll +need to learn a bit about Foreign Function Interfaces (FFI), a bit about memory management, and build your WASM +artifact. Don't worry: We'll walk you through it. +It's a new frontier, and you've got a friend in Vector. ### When to build your own module -Using the WASM engine you can write your own sources, transforms, sinks, codecs in any language that compiles down to a +Using the WASM engine you can write your own transforms, sinks, codecs in any language that compiles down to a `.wasm` or `.wat` file. Vector can then compile this file down to an optimized, platform specific module which it can use. This allows Vector to work with internal or uncommon protocols or services, support new language transforms, or just write a special transform to do *exactly* what you want. With the WASM engine, Vector puts you in command. @@ -168,52 +171,54 @@ responsible for the safety and correctness of your code, if your module crashes the module and try again on the next event. We try to follow the ideas of [Erlang's OTP](https://erlang.org/doc/design_principles/sup_princ.html). -Here's some examples of when a WASM module is a good fit: +Some examples of when a WASM plugin is a good fit: * You need to support a specific protobuf type. * You require a source or sink which is not currently supported by Vector. -* You want to write a complex transform in Rust. (Other languages will be library supported soon) +* You want to write a complex transform in Rust. (Other languages will be library supported according to demand and + resources) -Here's some examples of where WASM probably isn't right for you: +Some examples of where WASM plugin probably isn't right for you: * You only need basic functionality already offered by Vector. -* Your desired language doesn't support WASM as a compile target. -* You are using Vector on a non-Linux X86-64 bit compatible target. (Lucet, our Engine, only supports Linux at this -current time. Work on other operating systems is ongoing!) +* Your desired language doesn't support compiling to WASM with WASI. +* You are using Vector on a non-Linux X86-64 bit compatible target. ([Lucet](https://bytecodealliance.github.io/lucet), + our Engine, only supports Linux at this current time. Work on other operating systems is ongoing!) * You need a battle-proven system, WASM support in Vector is young! -### How Vector modules work +### How Vector plugins work -Vector modules are `.wasm` files (WASI or not) exposing a predefined interface. We provide a `vector_wasm` +Vector plugins are `.wasm` files (WASI or not) exposing a predefined interface. We provide a `vector_wasm` crate for Rust projects located in `lib/vector-wasm` of the source tree, and we will be providing bindings for other languages as we see user demand. -Modules are sandboxed and can only interact with Vector over the Foreign Function Interface (FFI) which Vector exposes. +Plugins are sandboxed and can only interact with Vector over the Foreign Function Interface (FFI) which Vector exposes. For each instance of a module, Vector holds some context. For example, when a transform module processes an event, the context of the module contains an event, as well as the configuration used to create the transform. A guest module is not able to access the memory of the Vector instance, but Vector is free to modify the guest memory. For example, behind the scenes of a `get` call, the Guest must allocate some memory for Vector to write the result into. -Due to the current semantics of the `vector::Event` type, passing data between WASM modules and the Vector host involves +Due to the current semantics of the `vector::Event` type, passing data between WASM plugin and the Vector host involves serialization. This may change in the future, resulting in an increase major version of the `vector-wasm` crate. ### Initialization -Regardless of whether of not the module uses WASI, or has access to a language specific binding, it can interact with -Vector. Vector works by invoking the modules public interface. First, as soon as the module is loaded, `register` is +Regardless of whether of not the plugin uses WASI, or has access to a language specific binding, it can interact with +Vector. Vector works by invoking the plugins public interface. First, as soon as the module loads, `register` is called once per module. In this call, the module can configure how Vector sees it and talks to it. During -this time, the module can do any one time setup it needs. (Eg. Opening a socket, a file). +this time, the module can do one time setup it needs. (Eg. Opening a socket, a file). If using the Vector guest API this can be done via: ```rust -use foreign_modules::{Registration, roles::Sink}; +use vector_wasm::{Registration, roles::Sink}; #[no_mangle] +// TODO: This is unsafe -- Needs a fix. pub extern "C" fn init() -> *mut Registration { &mut Registration::transform() .set_wasi(true) as *mut Registration @@ -227,15 +232,19 @@ What happens next depends on which type of module it is. * `Transform`: Modules of this role can be used as a transform `type`. ```rust - use foreign_modules::hostcall; - - // TODO: Add better FFI error handling! + use vector_wasm::hostcall; #[no_mangle] - pub extern "C" fn process() -> usize { - if let Some(value) = hostcall::get("field").unwrap() { - hostcall::insert("field", value).unwrap(); - } - 0 + pub extern "C" fn process(data: u64, length: u64) -> usize { + let data = unsafe { + std::ptr::slice_from_raw_parts_mut(data as *mut u8, length as usize) + .as_mut() + .unwrap() + }; + let mut event: BTreeMap = serde_json::from_slice(data).unwrap(); + event.insert("new_field".into(), "new_value".into()); + event.insert("new_field_2".into(), "new_value_2".into()); + hostcall::emit(serde_json::to_vec(&event).unwrap()); + 1 } ``` @@ -272,31 +281,28 @@ serde = { version = "1", features = ["derive"] } Next, in your `lib.rs` file: ```rust -use vector_wasm::{Registration, hostcall}; -use serde::{Serialize, Deserialize}; +use serde_json::Value; +use std::collections::BTreeMap; +use vector_wasm::{hostcall, Registration}; #[no_mangle] pub extern "C" fn init() -> *mut Registration { - &mut Registration::transform() - .set_wasi(true) as *mut Registration + &mut Registration::transform().set_wasi(true) as *mut Registration } -// TODO: Add better FFI error handling! #[no_mangle] -pub extern "C" fn process() -> usize { - let result = hostcall::get("message"); - - match result.unwrap() { - Some(value) => { - hostcall::insert("echo", value); - 0 - }, - None => { - 0 - }, - } +pub extern "C" fn process(data: u64, length: u64) -> usize { + let data = unsafe { + std::ptr::slice_from_raw_parts_mut(data as *mut u8, length as usize) + .as_mut() + .unwrap() + }; + let mut event: BTreeMap = serde_json::from_slice(data).unwrap(); + event.insert("new_field".into(), "new_value".into()); + event.insert("new_field_2".into(), "new_value_2".into()); + hostcall::emit(serde_json::to_vec(&event).unwrap()); + 1 } - #[no_mangle] pub extern "C" fn shutdown() { (); @@ -366,7 +372,7 @@ Congratulations, you're ready for a new frontier! ## Sales Pitch -Integrating a flexible, practical, simple WASM module support means we can: +Integrating a flexible, practical, simple WASM plugin support means we can: * Expose one common interface to any existing and future languages supported by WASM. * Allow large-scale users to use custom encodings. @@ -403,7 +409,7 @@ Since this is a broad reaching feature with a number of green sky ideas, we shou ### Agree on breadth and depth of implementation -We should support modules so broadly? +We should support plugins so broadly? * As sinks? For this we should consider how we handle batching, partitioning, etc, more. * **Temporary Conclusion:** Held off on deciding. Some interest. @@ -466,7 +472,7 @@ This may be renamed and/or published in the future. ### Consider observability -How can we let users see what's happening in WASM modules? Can we use tracing somehow? Lucet supports tracing, perhaps +How can we let users see what's happening in WASM plugins? Can we use tracing somehow? Lucet supports tracing, perhaps we could hook in somehow? **Conclusion:** Preliminary results show we may be able to allow guests to write with the current tracing subscriber. From 63d3f415b127fc84dcf51cd614fa82428cbe3f6b Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Thu, 7 May 2020 09:50:58 -0700 Subject: [PATCH 09/11] Move rfc Signed-off-by: Ana Hobden --- ...0-04-15-wasm-foreign-modules.md => 2020-04-15-wasm-plugins.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{2020-04-15-wasm-foreign-modules.md => 2020-04-15-wasm-plugins.md} (100%) diff --git a/rfcs/2020-04-15-wasm-foreign-modules.md b/rfcs/2020-04-15-wasm-plugins.md similarity index 100% rename from rfcs/2020-04-15-wasm-foreign-modules.md rename to rfcs/2020-04-15-wasm-plugins.md From 342da62994e4579d9e2008635e6744958c846306 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Thu, 7 May 2020 09:51:52 -0700 Subject: [PATCH 10/11] Update Registration docs Signed-off-by: Ana Hobden --- rfcs/2020-04-15-wasm-plugins.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/2020-04-15-wasm-plugins.md b/rfcs/2020-04-15-wasm-plugins.md index d2850a499181f..2882e84817bfa 100644 --- a/rfcs/2020-04-15-wasm-plugins.md +++ b/rfcs/2020-04-15-wasm-plugins.md @@ -219,9 +219,9 @@ use vector_wasm::{Registration, roles::Sink}; #[no_mangle] // TODO: This is unsafe -- Needs a fix. -pub extern "C" fn init() -> *mut Registration { - &mut Registration::transform() - .set_wasi(true) as *mut Registration +pub extern "C" fn init() { + Registration::transform() + .register() } ``` From 41d90b7df1995429d179c89a4f1c176d72a263f1 Mon Sep 17 00:00:00 2001 From: Ana Hobden Date: Thu, 7 May 2020 10:01:20 -0700 Subject: [PATCH 11/11] Remove files that weren't created. Signed-off-by: Ana Hobden --- website/docs/reference/transforms/wasm.md | 143 ------------------ website/docs/reference/transforms/wasm.md.erb | 28 ---- 2 files changed, 171 deletions(-) delete mode 100644 website/docs/reference/transforms/wasm.md delete mode 100644 website/docs/reference/transforms/wasm.md.erb diff --git a/website/docs/reference/transforms/wasm.md b/website/docs/reference/transforms/wasm.md deleted file mode 100644 index b8538fab8ccf7..0000000000000 --- a/website/docs/reference/transforms/wasm.md +++ /dev/null @@ -1,143 +0,0 @@ ---- -last_modified_on: "2020-05-06" -component_title: "WASM" -description: "The Vector `wasm` transform accepts and outputs `log` events, allowing you to execute **experimental** WASM plugins." -event_types: ["log"] -function_category: "program" -issues_url: https://github.com/timberio/vector/issues?q=is%3Aopen+is%3Aissue+label%3A%22transform%3A+wasm%22 -sidebar_label: "wasm|[\"log\"]" -source_url: https://github.com/timberio/vector/tree/master/src/transforms/wasm.rs -status: "beta" -title: "WASM Transform" ---- - -import Fields from '@site/src/components/Fields'; -import Field from '@site/src/components/Field'; -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; - -The Vector `wasm` transform -accepts and outputs [`log`][docs.data-model.log] events, allowing you to -execute **experimental** WASM plugins. - - - -## Configuration - - - - -```toml title="vector.toml" -[transforms.my_transform_id] - type = "wasm" # required - inputs = ["my-source-or-transform-id"] # required - heap_max_size = 10485760 # optional, default - module = "./modules/example.wasm" # required -``` - - - - -```toml title="vector.toml" -[transforms.my_transform_id] - type = "wasm" # required - inputs = ["my-source-or-transform-id"] # required - heap_max_size = 10485760 # optional, default - module = "./modules/example.wasm" # required -``` - - - - -## Options - - - - -### heap_max_size - -The maximum size of the heap of this module, in bytes. (This includes the -module itself, default is 10 MB.) - - - - - - -### module - -The file path of the `.wasm` module. - - - - - - -## Examples - -Given the following configuration: - - - -```toml title="vector.toml" -[transforms.test] - inputs = [...] - type = "wasm" - module = "module.wasm" - wasi = true -``` - -Accompanied by a `module.wasm` file built via `cargo +nightly --target wasm32-wasi ...`, Vector will use the module as a -custom transform. - -## How It Works - -### Environment Variables - -Environment variables are supported through all of Vector's configuration. -Simply add `${MY_ENV_VAR}` in your Vector configuration file and the variable -will be replaced before being evaluated. - -You can learn more in the -[Environment Variables][docs.configuration#environment-variables] section. - - -[docs.configuration#environment-variables]: /docs/setup/configuration/#environment-variables -[docs.data-model.log]: /docs/about/data-model/log/ diff --git a/website/docs/reference/transforms/wasm.md.erb b/website/docs/reference/transforms/wasm.md.erb deleted file mode 100644 index 1c574fa4811ef..0000000000000 --- a/website/docs/reference/transforms/wasm.md.erb +++ /dev/null @@ -1,28 +0,0 @@ -<%- component = metadata.transforms.wasm -%> - -<%= component_header(component) %> - -## Configuration - -<%= component_config_example(component) %> - -## Options - -<%= fields(component.specific_options_list, heading_depth: 3) %> - -<%- if component.env_vars_list.any? -%> -## Env Vars - -<%= fields(component.env_vars_list, heading_depth: 3) %> - -<%- end -%> -<%= component_fields(component, heading_depth: 2) -%> -<%- if component.examples.any? -%> -## Examples - -<%= component_examples(component) %> - -<%- end -%> -## How It Works [[sort]] - -<%= component_sections(component) %> \ No newline at end of file