diff --git a/ASimpleReply.md b/ASimpleReply.md
index b208658f6..c99b69829 100644
--- a/ASimpleReply.md
+++ b/ASimpleReply.md
@@ -45,15 +45,15 @@ In any case, this problem (and its solution) seem... trivial to me. Doesn't car
## Visitation order
#### "Finding a good visitation order is hard"
-Yes, so don't bother. Use a worklist! Hard guaranteed linear time to completion, fast and simple. Back-edges are needed and they are well worth the cost. I start C2 without them, couldn't get closure with a few passes, added backedges and everything got lots *faster*. Edge maintenance code deals with the work, so I don't have to think about it, its all cache-resident L1 hits. Fast, fast, fast.
+Yes, so don't bother. Use a worklist! Hard guaranteed linear time to completion, fast and simple. Back-edges are needed and they are well worth the cost. I started C2 without them, couldn't get closure with a few passes, added backedges and everything got lots *faster*. Edge maintenance code deals with the work, so I don't have to think about it; it's all cache-resident L1 hits. Fast, fast, fast.
## Other things?
#### "Dead code elimination"
-free with backedges; basically when the backedge/ref-count drops to zero, recursively delete nodes. There's maybe 10 lines dedicated to this in Simple.
+Free with backedges; basically when the backedge/ref-count drops to zero, recursively delete nodes. There's maybe 10 lines dedicated to this in Simple.
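+
+Those ~10 lines are not reproduced here, but a minimal sketch of the idea might look like this (a hypothetical `Node` with mirrored input/use lists; all names are assumptions, not Simple's actual code):
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical Node with mirrored edge lists; not Simple's actual class.
+class Node {
+    final List<Node> inputs = new ArrayList<>(); // defs this node uses
+    final List<Node> uses   = new ArrayList<>(); // nodes that use this node
+
+    // Called when this node's use count has dropped to zero.
+    void kill() {
+        assert uses.isEmpty();         // only dead nodes may be killed
+        for (Node def : inputs) {
+            def.uses.remove(this);     // unhook the back-edge
+            if (def.uses.isEmpty())
+                def.kill();            // the def just went dead too
+        }
+        inputs.clear();
+    }
+}
+```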
#### "Hard to introduce new control flow"
-Hello compiler writers? Sorry, can't hardly believe this one; editing a CFG is morally the same as editing the SoN, what's the issue? In the specific `min` case, the goal is to pick a place in the CFG to expand... which means a place needs to be picked, which for SoN usually means after Global Code Motion when "places" (e.g. classic CFG) is available again.
+Hello compiler writers? Sorry, I can hardly believe this one; editing a CFG is morally the same as editing the SoN, what's the issue? In the specific `min` case, the goal is to pick a place in the CFG to expand... which means a place needs to be picked, which for SoN usually means after Global Code Motion, when "places" (e.g. a classic CFG) are available again.
#### "Hard to figure out what's inside a loop"
Again, Hello compiler writers? C2 certainly does lots of aggressive loop optimizations - which start by running a standard SCC, finding loop headers, walking the graph area constrained by the loop, producing a loop body for further manipulations... yada yada. Basically, I build loops and a loop tree via the Olde Fashioned method of running SCC.
@@ -62,7 +62,7 @@ Again, Hello compiler writers? C2 certainly does lots of aggressive loop optimi
Compiling is fast. He-said-She-said. Neen-neener. Data? Perf discussions with facts? Apples to apples comparisons? No? So how about "compilation speed vs code quality didn't meet our goals"?
#### "Cache-unfriendly"
-Same (lack of) argument. Certainly when I did C2 I was highly focused on cache-friendly and based on the vast compile speedup I obtained over all competitions (at that time every AOT compiler, GCC, LLVM) I was largely successful. C2 compiles take far less footprint (both cache and memory) that my "competitors". Mostly C2 compiles, despite aggressive inlining, fit easily in L2, nearly always in L1. Getting this right is important, so I suspect other things happened to make it miss for V8. But also... no numbers from me either, so neen-neener again.
+Same (lack of) argument. Certainly when I did C2 I was highly focused on cache-friendliness and, based on the vast compile speedup I obtained over all the competition (at that time every AOT compiler, GCC, LLVM), I was largely successful. C2 compiles take a far smaller footprint (both cache and memory) than my "competitors". Mostly C2 compiles, despite aggressive inlining, fit easily in L2, nearly always in L1. Getting this right is important, so I suspect other things happened to make it miss for V8. But also... no numbers from me either, so neen-neener he-said-she-said again.
## postlog
I'll point out most of the above discussion is about things that surely made V8's life harder - and slower - and are things I never would have done. OTOH I've not done a JS compiler, so I have no facts here either.
diff --git a/README.md b/README.md
index 368d21b32..4fe43a435 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,7 @@ intermediate representation.
The Simple language is styled after a subset of C or Java.
+* [Chapter 0](chapter00/README.md): Motivation
* [Chapter 1](chapter01/README.md): Script that returns an integer literal, i.e., an empty function that takes no arguments and returns a single integer value. The `return` statement.
* [Chapter 2](chapter02/README.md): Simple binary arithmetic such as addition, subtraction, multiplication, division with constants. Peephole optimization / simple constant folding.
diff --git a/chapter00/README.md b/chapter00/README.md
new file mode 100644
index 000000000..4858d662e
--- /dev/null
+++ b/chapter00/README.md
@@ -0,0 +1,188 @@
+# Chapter 0: Prolog and Motivation
+
+This repo is intended to demonstrate the Sea-of-Nodes compiler IR, and contains
+a fully fledged stand-alone compiler.
+
+It is intended to be *readable* and *debuggable*, fast to learn and modify. It
+is **not** designed to be a super fast compiler, although it is pretty quick.
+It could be made much quicker with modest effort, and that may happen in some
+later chapter.
+
+As a demonstration compiler, it is **not** intended to be production ready
+(although it has a fairly large test suite).
+
+This Ahead-of-Time compiler is not intended (yet) to be a Just-In-Time
+compiler, although that may happen in a later chapter.
+
+## Target Audience
+
+My target audience is both traditional compiler writers and medium-skill
+programmers curious about compilers. I do expect some basic knowledge of how
+compilers work: things like source code in and binaries out. It will
+definitely help to have some jargon words sorted out (see the end of this
+document) and to be comfortable with undergraduate-level graph algorithms.
+
+
+## Why Java
+
+Simple is written in Java. Why? Because Java has shown itself to be fast to
+learn, write, and *debug*. It is certainly fast enough for large-scale batch
+jobs (see [The One Billion Row Challenge](https://github.com/gunnarmorling/1brc)).
+
+A Just-In-Time variant of Simple may revisit the implementation language
+decision.
+
+
+## Jargon
+
+Each of these terms is common in compilers, and there is a wealth of literature
+available online for each of them. Wikipedia is a great starting point for
+learning more.
+
+
+* IR: Intermediate Representation - source code is hard to directly manipulate,
+  and machines only understand Machine Code. The IR bridges this gap; source
+  code is translated into an IR, the IR is manipulated (e.g. type-checked and
+  optimized), and finally the IR is converted to machine code (binary format).
+
+  The notion of an IR generally covers a high-level CFG view, a mid-level BB
+  view, and a low-level instruction or opcode view.
+
+
+* BB: Basic Block - a collection of IR instructions or opcodes. These are all
+  expected to execute from start to finish without any changes in control flow.
+  At different points in time, the opcodes might represent some high-level
+  language concept (e.g. allocation, or a function call), or might represent a
+  direct hardware instruction (`add r1,r2,r3` or `call malloc`).
+
+  In a traditional compiler, all opcodes are kept inside BBs, and some care has
+  to be taken to move opcodes from one BB to another.
+
+  In the Sea of Nodes compiler, this restriction is dropped until right before
+  code generation.
+
+
+* CFG: Control Flow Graph - a *graph*, where the *nodes* are BBs and the
+  *edges* represent changes in program execution flow. In the Sea of Nodes
+  compiler, the BB notion is mostly dropped and some normal-ish opcodes are the
+  CFG nodes.
+
+
+* Nodes - nodes in a graph. For a traditional compiler there are two kinds of
+  nodes: nodes in a CFG are BBs, and nodes in a BB are instructions. For Sea of
+  Nodes, this distinction is blurred. The same kinds of Nodes and Edges are used
+  for both control flow and data.
+
+
+* DU: Def-Use edges - a graph edge from a Defining node to a Using node. These
+  edges track the flow of values through a program and are required to give the
+  program its meaning. The reverse, Use-Def edges, are very useful for program
+  optimizations. Some early program IRs don't start with these, or periodically
+  build them from program names, and then throw them away.
+
+  Sea of Nodes starts with these def-use edges (and use-def) and maintains them
+  throughout the compiler's lifetime.
+
+  The def-use edges can be thought of as a mapping between a node and its outputs.
+  Similarly, use-def edges can be thought of as a mapping between a node and its inputs.
+
+  To represent the SoN graph visually we draw `use-def` edges, but nodes also
+  have `def-use` edges; it is just a matter of how we look at them.
+
+
+* SSA: Static Single Assignment - a program shape where all program values are
+  statically assigned exactly once. Example before SSA:
+
+```
+  if( rand ) x=2; // A first assignment to x
+  else x=3;       // A second assignment to x
+  print(x);
+```
+
+  Here `x` is assigned twice. SSA form will rename the `x` variables and add a
+  `Phi` function to make a program with the same semantics but all (renamed)
+  variables are assigned exactly once (x_0, x_1, x_2 are all assigned once):
+
+```
+  if( rand ) x_0=2;
+  else x_1=3;
+  x_2 = phi( x_0, x_1 );
+  print(x_2);
+```
+
+  Most source languages do not start this way. Most modern compilers move to SSA
+  at some point, because it allows for fast and simple optimizations (e.g. SCCP).
+
+
+* Phi, PhiNode - variously called "funny", "fake", or just the Greek `Φ`
+  character. A pure mathematical "function" which picks between its arguments
+  based on control flow, and a key part of SSA form. Scare quotes on "function"
+  because normal functions do not pick their arguments based on control flow.
+  Phi functions are generally implemented with zero cost in machine code, by
+  carefully arranging machine registers.
+
+
+* AST: Abstract Syntax Tree - another IR variant, where the lexical structure
+  of the program has been converted to a tree. After some tree-based work, the
+  AST is generally converted to a graph-based IR, described above, because trees
+  are hard to optimize with.
+  This is very common in most compilers, and Sea of Nodes skips this step.
+
+
+* SCC: Strongly Connected Components - Tarjan's algorithm for finding loops
+  in a graph. Loops carry most of the work in a program, so it's important to
+  optimize them more heavily. The acronym does *not* end in a "P"; see SCCP.
+
+
+* SCCP: Sparse Conditional Constant Propagation - A particularly fast and
+  simple way to analyze an IR. Used to optimize programs, generally by
+  replacing computations (which require some work) with constants (which require
+  almost no work). The acronym ends in a "P"; see SCC.
+
+
+* "Peep", or Peephole Optimization - a transformation which relies on only
+  local information, as-if viewing the program "through a peephole". Something
+  like replacing `add x+0` with `x`, which removes an `add` instruction. This
+  transformation is correct without regard to the rest of the program. All
+  compilers do some kind of peephole optimizations; Sea of Nodes makes extensive
+  use of these.
+
+
+* Fixed Point - If, in a series of possible transformations, we keep applying
+  transforms until no more apply - and we hit the same state no matter what order
+  we apply those transforms - we have hit a *fixed point*. This is a key
+  mathematical term, and you can find plenty of material about it online. The
+  SCCP optimization will stop once it hits a *fixed point*.
+
+  For Sea of Nodes, we use this concept for Peeps as well, applying peephole
+  transformations iteratively from a worklist until no more apply. By careful
+  design we will hit a *fixed point* - our program graph IR will be the same
+  shape, regardless of transformation ordering. This lets us run the peepholes
+  from a simple worklist algorithm (a sketch of such a loop appears after this
+  list).
+
+
+* Types - a set of values that a particular program point or Node can take on at
+  runtime. In the program semantics literature, types are sometimes described
+  by the allowed operations (`int` types can `add`, and `pointer` types can
+  `dereference`), and sometimes described as a set of values.
+
+  For Sea of Nodes, we use the "set of values" type concept, and we don't
+  actually have any use for a separate value implementation. The sets of
+  values can get fancy; integer types are represented with a range like
+  `[0..9]`, integer constants by a one-element range such as `[3]`, and the
+  maximum integer type is `[MIN_INT..MAX_INT]`. Types exist for integers, floats,
+  structs/class/records, function pointers, memory and control flow.
+
+  Types are an important class in Sea of Nodes, and like nodes and edges they
+  are manipulated all the time.
+
+
+* Lattice - a mathematical concept representing relationships between members
+  of a set. See [Lattice](https://en.wikipedia.org/wiki/Lattice_(order)).
+
+  For Sea of Nodes, we use a Lattice over a set of Types (which themselves are
+  sets of values). Our lattice has a number of important mathematical
+  properties - it is symmetric, complete, and bounded (ranked) - and these
+  allow our random-order worklist algorithms to hit a *fixed point* in fast
+  linear time. For actual day-to-day work in the compiler, the lattice is in
+  the background and is hardly ever seen.
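+
+As promised under "Fixed Point" above, here is a minimal sketch of such a
+worklist loop. The `Node` API used here (`peephole`, `uses`, `replaceWith`)
+is hypothetical, invented for illustration only:
+
+```java
+import java.util.ArrayDeque;
+import java.util.List;
+
+// Sketch only: a worklist loop driving peepholes to a fixed point.
+// This Node API is an assumption, not Simple's actual interface.
+interface Node {
+    Node peephole();            // return a simplified replacement, or null
+    List<Node> uses();          // nodes consuming this node's value
+    void replaceWith(Node x);   // rewire all uses to point at x instead
+}
+
+final class PeepholeWorklist {
+    static void iterate(ArrayDeque<Node> work) {
+        while (!work.isEmpty()) {          // terminates at the fixed point
+            Node n = work.pop();
+            Node x = n.peephole();         // try the local rewrite rules
+            if (x == null || x == n) continue;
+            for (Node use : n.uses())
+                work.push(use);            // users may now simplify further
+            n.replaceWith(x);              // commit the rewrite
+        }
+    }
+}
+```
+
+Because types only move one way up a bounded lattice, a loop like this runs in
+linear time and stops at the same final graph shape regardless of pop order.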
\ No newline at end of file
diff --git a/chapter01/README.md b/chapter01/README.md
index 17e24ea0c..7513bb263 100644
--- a/chapter01/README.md
+++ b/chapter01/README.md
@@ -33,20 +33,35 @@ To implement this simple language, we introduce a few key components and data st
Here is the [complete language grammar](docs/01-grammar.md) for this chapter.
+## Assumptions
-## Implementation Language
+We assume that the reader is familiar with traditional linear intermediate
+representations, and is familiar with terms such as Basic Block, Control Flow
+Graph, etc. A brief description is given in [Chapter
+0](../chapter00/README.md). If necessary the reader can consult a standard
+compiler textbook or online materials.
-Our implementation language is Java. We chose Java as it is widely available and understood.
+## Architecture
-## Assumptions
+We construct the intermediate Sea of Nodes (SoN) representation directly as we
+parse the language, producing nodes for every interesting piece of program text.
-We assume that the reader is familiar with traditional linear intermediate representations, and is familiar with terms such as Basic Block, Control Flow Graph, etc. No attempt is made to explain these topics.
-If necessary the reader can consult a standard compiler text book.
+The nodes are optimized during and after parsing, reaching a *fixed point* of
+optimization. In later chapters we will add more language features and nodes
+to match, and optimizations for those new nodes.
-## Architecture
+There is no Abstract Syntax Tree representation, although one is very common
+in compilers. The reason for this is to demonstrate a key benefit of the SoN IR:
+a number of peephole optimizations can be performed while parsing a language.
+This aspect is more fully explored from [Chapter 2](../chapter02/README.md)
+onwards.
+
+The Nodes in a Sea of Nodes are coded directly as Java objects in the classic
+object-oriented programming style. These Nodes are also nodes in a *graph*
+(hence the name "nodes"), and there are *edges* between the nodes; these edges
+are direct pointers to nodes (there is no separate edge structure, as is
+somewhat common in other graph representations).
-We construct the intermediate Sea of Nodes (SoN) representation directly as we parse the language. There is no Abstract Syntax Tree representation. The reason for this is to demonstrate a key benefit of the SoN IR:
-a number of pessimistic peephole optimizations can be performed while parsing a language. This aspect is more fully explored from [Chapter 2](../chapter02/README.md) onwards.
## Data Structures
@@ -59,31 +74,39 @@ Our data structures are based upon the descriptions provided in following papers
* [EasySSA](https://www.dropbox.com/scl/fi/0ww4sgl3ynep9hhe3i4xn/EasySSA.pdf?rlkey=2cp78hzxke62flkmyneiebzoz&dl=0)
* [SeaOfNodes](https://www.dropbox.com/scl/fi/cxykfvlzsmlcatyg6rlbt/SeaOfNodes.pdf?rlkey=z6o7y3rwr6atrejilcze6r8x0&e=1&dl=0)
-Following the lead from above, we represent our intermediate representation using an object oriented data model. Details of the
-representation follow.
### Intermediate Representation as a Graph of Nodes
-The intermediate representation is a graph of Node objects. The `Node` class is the base type for objects in the IR graph.
-The `Node` class provides common capabilities that are inherited by all subtypes.
-Each subtype implements semantics relevant to that subtype.
+The intermediate representation is a graph of Node objects. The `Node` class is
+the base type for objects in the IR graph. The `Node` class provides common
+capabilities that are inherited by all subtypes. Each subtype implements
+semantics relevant to that subtype.
+
+Each `Node` represents an instruction or opcode as it may appear in traditional IRs.
-Each `Node` represents an "instruction" as it may appear in traditional IRs.
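+
+A sketch of what that object layout can look like follows. The `_inputs` name
+matches this chapter's figures; `_nid` and `_outputs` are assumed names, and
+Simple's real class carries more than this:
+
+```java
+import java.util.ArrayList;
+
+// Sketch of the Node base class layout; edges are plain Java pointers.
+public abstract class Node {
+    private static int UNIQUE_ID = 1;
+    public final int _nid = UNIQUE_ID++;  // unique dense integer node ID
+
+    // Inputs: the defs this node uses. Each entry represents a def-use
+    // edge (from the def to this use), and input order carries meaning.
+    public final ArrayList<Node> _inputs = new ArrayList<>();
+
+    // Outputs: the nodes that use this node (the reverse, use-def links).
+    public final ArrayList<Node> _outputs = new ArrayList<>();
+}
+```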
### Nodes are in a Graph
-The key idea of the Sea of Nodes IR is that each Node is linked to other Nodes by def-use dependencies.
-As this is such an important and fundamental aspect of the IR, it is important to understand how we implement this, and depict in graph visuals.
+The key idea of the Sea of Nodes IR is that each Node is linked to other Nodes
+by def-use dependencies. As this is such an important and fundamental aspect
+of the IR, it is important to understand how we implement this and depict it
+in graph visuals.
-The base `Node` class maintains a list of Nodes that are inputs to it. An input is an edge from a "def" to a "use". What this means is that if `B` is definition, and `A` uses `B`,
-then there is a def-use edge from `B` to `A`.
+The base `Node` class maintains a list of Nodes that are inputs to it. An
+input is an edge from a definition to a use, hence def-use. What this means is
+that if `B` is a definition, and `A` uses `B`, then there is a def-use edge from
+`B` to `A`.
+
+The inputs of a node are ordered, and the order has semantic meaning.
+The outputs are unordered, at least until the program is scheduled.
+The scheduling algorithm will be introduced in [Chapter 11](../chapter11/README.md).
Visually we show an arrow from the "use" to the "def". Here is an example:
![Use Def](./docs/01-use-def.svg)
-From an implementation point of view, our `Node` type also maintains a reverse link.
-This means that in the above scenario:
+From an implementation point of view, our `Node` type also maintains a reverse
+link called a use-def edge. This means that in the above scenario:
* Since `A` is a "use" of `B`, then `B` will appear in `A`'s list of inputs.
* Conversely, `B` maintains a list of outputs, and `A` will appear in this list.
@@ -131,23 +154,49 @@ The following control and data nodes appear in this chapter.
| Return | Control | Represents the termination of a function | Predecessor control node, Data node value | Return value of the function |
| Constant | Data | Represents constants such as integer literals | None, however Start node is set as input to enable graph walking | Value of the constant |
-Within a traditional basic block, instructions are executed in sequence. In the Sea of Nodes model, the correct sequence of instructions is determined by a scheduling
-algorithm that depends only on dependencies between nodes (including control dependencies) that are explicit as edges in the graph. This enables a number of optimizations
-at very little cost (nearly always small constant time) because all dependencies are always available.
+
+Within a traditional basic block, instructions are executed in sequence. In
+the Sea of Nodes model, the correct sequence of instructions is determined by a
+scheduling algorithm that depends only on dependencies between nodes (including
+control dependencies) that are explicit as edges in the graph. This enables a
+number of optimizations at very little cost (nearly always small constant time)
+because all dependencies are always available.
+
+
+### No distinction between Control and Data
+
+Yes, obviously control nodes and data nodes are distinct - but from an IR
+manipulation point of view, there is no distinction. This is very different
+from traditional compilers, where the functions to manipulate the CFG with BB
+nodes are very different from the functions to manipulate BBs with
+instructions.
+
+The same graph manipulation primitives work on both; e.g. `node.addDef(newDef)`
+or `node.setDef(idx,newDef)`. Graph edges happen between nodes of all types,
+and there is no difference between def-use edges carrying control or data.
+Graph walker algorithms walk the same. Data flow, type flow, add, remove,
+[fold, spindle, or mutilate](https://idioms.thefreedictionary.com/fold%2c+spindle%2c+or+mutilate)
+graph manipulations all work the same.
+
+This single-level design **greatly** simplifies the compiler, and makes it much
+easier to do anything and everything.
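+
+To make those two primitives concrete, here is a sketch continuing the
+hypothetical `Node` layout above. The method names come from the text, while
+the bodies are assumptions rather than Simple's actual code:
+
+```java
+import java.util.ArrayList;
+
+// Continuing the layout sketch: every def-use edge is mirrored by a
+// use-def edge, so _inputs and _outputs must always be updated together.
+public abstract class Node {
+    public final ArrayList<Node> _inputs  = new ArrayList<>();
+    public final ArrayList<Node> _outputs = new ArrayList<>();
+
+    public void addDef(Node newDef) {
+        _inputs.add(newDef);                  // new ordered input slot
+        if (newDef != null)
+            newDef._outputs.add(this);        // mirror: we now use newDef
+    }
+
+    public Node setDef(int idx, Node newDef) {
+        Node oldDef = _inputs.get(idx);
+        if (oldDef == newDef) return newDef;  // no change
+        if (newDef != null)
+            newDef._outputs.add(this);        // hook the new mirror edge first
+        if (oldDef != null)
+            oldDef._outputs.remove(this);     // then unhook the old one
+        _inputs.set(idx, newDef);
+        return newDef;
+    }
+}
+```
+
+Keeping both edge directions in sync in one place is what leaves the rest of
+the compiler free to edit the graph without bookkeeping mistakes.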
+
### Unique Node ID
-Each node is assigned a unique dense integer Node ID when created. This ID is
+Each node is assigned a unique dense integer Node ID when created. This ID is
useful for debugging, efficiently computing equality and e.g. as an index into
a bit vector, which in turn is used to efficiently visit a (possibly cyclic)
graph. We discuss Node equality in [Chapter 9](../chapter09/README.md).
+
### Start Node
The Start node represents the start of the function. For now, we do not have
any inputs to Start because our function does not yet accept parameters. When
we add parameters the value of Start will be a tuple, and will require
Projections to extract the values. We discuss this in detail in
[Chapter 4](../chapter04/README.md).
+
### Constant Node
A Constant node represents a constant value. At present, the only constants
@@ -183,5 +232,5 @@ return 1;
* Control nodes appear as square boxes with yellow background
* Control edges are in bold red
-* The edges from Constants to Start are shown in dotted lines as these are not true control edges
-* We label each edge with its position in the `_inputs` array, thus `0` means the edge is `_inputs[0]`.
+* The edges from Constants to Start are shown in dotted lines as these are not true control edges (no semantic meaning)
+* We label each edge with its position in the `_inputs` array, thus `0` means the edge is `_inputs[0]`
diff --git a/chapter02/README.md b/chapter02/README.md
index 8799863f9..935463d34 100644
@@ -51,7 +51,7 @@ equality. This is by far the most common case. In [Chapter
In both cases the choice of value-vs-reference equality is intentional: it is
*never* correct to "just pick one or the other kind of equality". When in
doubt check the context: only *Global Value Numbering* uses value equality;
-everywhere we mean reference equality.
+everywhere else we mean reference equality.
## Peephole Optimizations
@@ -73,19 +73,23 @@ parse an `Add(1,2)`, the peephole rule for constant math replaces the Add with
a constant `3`. At this point, we also *kill* the unused `Add`, which
recursively may *kill* the unused constants `1` and `2`.
+Here, figuring out that the addition becomes a constant `3` is called *constant
+folding*, and replacing the `Add` node with the constant `3` is called *constant
+propagation*. In general, *folding* computes a constant from known inputs, and
+*propagation* moves it to the uses, which can enable more folding.
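+
+A sketch of such a folding peephole for `Add` follows. Everything here
+(`AddNode`, `ConstantNode`, the slot layout) is an assumption for
+illustration, not Simple's exact API; slot 0 is assumed reserved for the
+Start/control input:
+
+```java
+import java.util.ArrayList;
+
+// Self-contained sketch; none of these names are Simple's exact API.
+abstract class Node {
+    final ArrayList<Node> _inputs = new ArrayList<>();
+    Node in(int i) { return _inputs.get(i); }
+}
+
+final class ConstantNode extends Node {
+    final long _con;
+    ConstantNode(long con) { _con = con; }
+}
+
+final class AddNode extends Node {
+    // Peephole for constant math: Add(Con(1),Con(2)) folds to Con(3).
+    // The addends are assumed to live in slots 1 and 2.
+    Node peephole() {
+        if (in(1) instanceof ConstantNode lhs && in(2) instanceof ConstantNode rhs)
+            return new ConstantNode(lhs._con + rhs._con);  // constant folding
+        return null;  // not constant; no local rewrite applies
+    }
+}
+```
+
+Replacing the `Add` in the graph with the returned constant (and killing the
+now-unused inputs) is the *propagation* half described above.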
-## Constant Folding and Constant Propagation
+## Constants, Values, Types
In this chapter and next we focus on a particular peephole optimization:
-constant folding and constant propagation. Since we do not have non-constant values
-until [Chapter 4](../chapter04/README.md), the main feature we demonstrate now is constant folding.
-However, we introduce some additional ideas into the compiler at this stage, to
-set the scene for Chapter 4.
+constant folding and constant propagation. Since we do not have non-constant
+values until [Chapter 4](../chapter04/README.md), the main feature we
+demonstrate now is constant folding. However, we introduce some additional
+ideas into the compiler at this stage to set the scene for Chapter 4.
It is useful for the compiler to know at various points of the program whether
-a node's value is a constant. The compiler can use this knowledge to perform various
-optimizations such as:
+a node's value is a constant. The compiler can use this knowledge to perform
+various optimizations such as:
* Evaluate expressions at compile time and replace an expression with a
constant. This idea can be extended in a number of ways and is called
@@ -129,8 +133,10 @@ Our lattice elements can be one of three types:
* The lowest is "bottom", denoted by ⊥; assigning ⊥ means that we know that
the Node's value is **not** a compile time constant.
+`top` and `bottom` are often referred to as the *base cases* or *simple types* of the lattice.
+
An invariant of peephole optimizations is that the type of a Node always moves
-*up* the lattice (towards "top"); peepholes are *pessmistic* assuming the worst
+*up* the lattice (towards "top"); peepholes are *pessimistic*, assuming the worst
until they can prove better. A later *optimistic* optimization will start all
Nodes at *top* and move Types *down* the lattice as eager assumptions are
proven wrong.
@@ -139,7 +145,7 @@ In later chapters we will explore extending this lattice, as it frequently
forms the heart of core optimizations we want our compiler to do.
We add a `_type` field to every Node, to store its current computed best
-`Type`. We need a field to keep the optimizer runtime linear, and later when
+`Type`. We need the field to keep the optimizer runtime linear (`TBD`), and we will need it again when
doing an optimistic version of constant propagation (called [Sparse
Conditional Constant
Propagation](https://en.wikipedia.org/wiki/Sparse_conditional_constant_propagation)).
@@ -149,9 +155,12 @@ Both nodes are equally peepholed and optimized, and this will be covered
starting in [Chapter 4](../chapter04/README.md) and [Chapter
5](../chapter05/README.md).
-
There are other important properties of the Lattice that we discuss in [Chapter
-4](../chapter04/README.md) and [Chapter 10](../chapter10/README.md), such as the "meet" and "join" operators and their rules.
+4](../chapter04/README.md) and [Chapter 10](../chapter10/README.md), such as
+the "meet" and "join" operators and their rules.
+
+Note: some lattice presentations are drawn reversed from the direction we are
+using; this is generally obvious from context.
## Nodes Pre Peephole Optimization
diff --git a/chapter04/README.md b/chapter04/README.md
index a7b223b4c..cbcb87de9 100644
--- a/chapter04/README.md
+++ b/chapter04/README.md
@@ -226,6 +226,9 @@ define the resulting type when we combine integer values.
In the lattice diagram you can start from the two elements being `meet` and
follow the two arrows down the graph to the first point they meet.
+In Simple the `meet` is the union of possible values, as opposed to the intersection (see the sketch after the table below).
+
+
| | IntBot | Con1 | Con2 | IntTop |
|--------|--------|--------|--------|--------|
| IntBot | IntBot | IntBot | IntBot | IntBot |
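+
+A sketch of `meet`-as-union for integer range types like `[lo..hi]` follows;
+`TypeInteger` and its members here are assumptions for illustration, not
+Simple's actual class:
+
+```java
+// Sketch: "meet" as union for integer range types [lo..hi]. Strictly it
+// returns the smallest range covering both inputs, since our types are
+// ranges rather than arbitrary sets. Names are assumptions, not Simple's.
+record TypeInteger(long lo, long hi) {
+    boolean isConstant() { return lo == hi; }   // e.g. [3] is the range 3..3
+
+    TypeInteger meet(TypeInteger that) {
+        return new TypeInteger(Math.min(lo, that.lo), Math.max(hi, that.hi));
+    }
+}
+```
+
+Under this scheme the meet of two different constants, say `[1]` and `[2]`,
+is the wider non-constant range `[1..2]`; the table above collapses that case
+all the way to `IntBot`, the coarsest "any integer" type.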