8 changes: 4 additions & 4 deletions ASimpleReply.md
@@ -45,15 +45,15 @@ In any case, this problem (and its solution) seem... trivial to me. Doesn't car
## Visitation order

#### "Finding a good visitation order is hard"
Yes, so don't bother. Use a worklist! Hard guaranteed linear time to completion, fast and simple. Back-edges are needed and they are well worth the cost. I started C2 without them, couldn't get closure with a few passes, added backedges and everything got lots *faster*. Edge maintenance code deals with the work, so I don't have to think about it; it's all cache-resident L1 hits. Fast, fast, fast.
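Here is a minimal sketch of that worklist loop in Java (the language Simple is written in). The `Node` class, its `inputs`/`outputs` lists, and the single constant-folding rule are hypothetical stand-ins for illustration, not C2's or Simple's real classes. The key line is the one that pushes a changed node's users, found through the back-edges, back onto the worklist; that is what lets a bad starting order still converge.

```java
import java.util.*;

public class WorklistDemo {
    // Hypothetical minimal node: an opcode, an optional constant, and edges in both directions.
    static final class Node {
        String op;                                      // "Con" or "Add"
        long value;                                     // only meaningful when op is "Con"
        final List<Node> inputs = new ArrayList<>();    // use-def edges
        final List<Node> outputs = new ArrayList<>();   // def-use (back) edges
        Node(String op, long value, Node... ins) {
            this.op = op;
            this.value = value;
            for (Node in : ins) { inputs.add(in); in.outputs.add(this); }
        }
    }

    // Toy peephole: Add(Con a, Con b) becomes Con(a+b) in place. Returns true if the node changed.
    static boolean peephole(Node n) {
        if (n.op.equals("Add") && n.inputs.get(0).op.equals("Con") && n.inputs.get(1).op.equals("Con")) {
            long sum = n.inputs.get(0).value + n.inputs.get(1).value;
            for (Node in : n.inputs) in.outputs.remove(n);  // drop the back-edges to the old inputs
            n.inputs.clear();
            n.op = "Con";
            n.value = sum;
            return true;
        }
        return false;
    }

    // Run peepholes until no more apply: pop a node, and if it changed, revisit its users.
    static void iterate(Collection<Node> all) {
        ArrayDeque<Node> work = new ArrayDeque<>(all);
        Set<Node> onList = new HashSet<>(all);          // avoids queueing a node twice
        while (!work.isEmpty()) {
            Node n = work.pop();
            onList.remove(n);
            if (peephole(n))
                for (Node use : n.outputs)
                    if (onList.add(use)) work.push(use);
        }
    }

    public static void main(String[] args) {
        Node one = new Node("Con", 1), two = new Node("Con", 2), three = new Node("Con", 3);
        Node a = new Node("Add", 0, one, two);     // 1+2
        Node b = new Node("Add", 0, a, three);     // (1+2)+3
        iterate(List.of(b, a, one, two, three));   // deliberately "bad" order: b before a
        System.out.println(b.op + " " + b.value);  // prints "Con 6"
    }
}
```

The initial order puts the outer `Add` first; it fails its first peephole, gets re-queued when its input folds, and the loop still reaches `Con 6` without any global ordering pass.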

## Other things?

#### "Dead code elimination"
Free with backedges; basically when the backedge/ref-count drops to zero, recursively delete nodes. There's maybe 10 lines dedicated to this in Simple.
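A minimal sketch of that recursive delete, assuming a hypothetical `Node` whose `outputs` list doubles as its ref-count (again, not Simple's actual classes). It assumes the dead region is acyclic: a dead cycle, such as one running through a loop Phi, never drops to zero users in this sketch and would need separate handling.

```java
import java.util.*;

public class DeadCodeDemo {
    // Hypothetical node with use-def (inputs) and def-use (outputs) edges.
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> outputs = new ArrayList<>();
        boolean dead;
        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) { inputs.add(in); in.outputs.add(this); }
        }
    }

    // Kill a node that has no users; recursively kill any input that loses its last user.
    static void kill(Node n) {
        if (!n.outputs.isEmpty()) throw new IllegalStateException(n.name + " still has users");
        n.dead = true;
        for (Node in : n.inputs) {
            in.outputs.remove(n);                 // drop one back-edge (one per use)
            if (in.outputs.isEmpty()) kill(in);   // "ref-count" hit zero
        }
        n.inputs.clear();
    }

    public static void main(String[] args) {
        Node x = new Node("x");
        Node y = new Node("y");
        Node sum = new Node("sum", x, y);
        Node keep = new Node("keep", y);          // y has another user, so it survives
        kill(sum);
        System.out.println(x.dead + " " + y.dead + " " + keep.dead); // prints "true false false"
    }
}
```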

#### "Hard to introduce new control flow"
Hello compiler writers? Sorry, can't hardly believe this one; editing a CFG is morally the same as editing the SoN, what's the issue? In the specific `min` case, the goal is to pick a place in the CFG to expand... which means a place needs to be picked, which for SoN usually means after Global Code Motion, when "places" (e.g. a classic CFG) are available again.

#### "Hard to figure out what's inside a loop"
Again, Hello compiler writers? C2 certainly does lots of aggressive loop optimizations - which start by running a standard SCC, finding loop headers, walking the graph area constrained by the loop, producing a loop body for further manipulations... yada yada. Basically, I build loops and a loop tree via the Olde Fashioned method of running SCC.
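For reference, a compact sketch of Tarjan's SCC algorithm over a plain integer adjacency list (not C2's or Simple's graph classes). Every component with more than one node, or a node with an edge to itself, is a loop candidate; picking loop headers and building the loop tree are further steps not shown here.

```java
import java.util.*;

public class TarjanScc {
    private final List<List<Integer>> succs;   // succs.get(v) = successors of node v
    private final int[] index, low;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> sccs = new ArrayList<>();
    private int counter = 0;

    TarjanScc(List<List<Integer>> succs) {
        this.succs = succs;
        int n = succs.size();
        index = new int[n];
        low = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1);                 // -1 means "not yet visited"
    }

    List<List<Integer>> run() {
        for (int v = 0; v < succs.size(); v++)
            if (index[v] == -1) visit(v);
        return sccs;
    }

    // Recursive DFS; fine for a sketch, a production version would avoid deep recursion.
    private void visit(int v) {
        index[v] = low[v] = counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : succs.get(v)) {
            if (index[w] == -1) { visit(w); low[v] = Math.min(low[v], low[w]); }
            else if (onStack[w]) low[v] = Math.min(low[v], index[w]);
        }
        if (low[v] == index[v]) {               // v is the root of an SCC: pop the whole component
            List<Integer> scc = new ArrayList<>();
            int w;
            do { w = stack.pop(); onStack[w] = false; scc.add(w); } while (w != v);
            sccs.add(scc);
        }
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 2 -> 1 (a loop on {1,2}), and 2 -> 3 exits the loop
        List<List<Integer>> g = List.of(List.of(1), List.of(2), List.of(1, 3), List.of());
        System.out.println(new TarjanScc(g).run()); // prints [[3], [2, 1], [0]]
    }
}
```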
@@ -62,7 +62,7 @@ Again, Hello compiler writers? C2 certainly does lots of aggressive loop optimi
Compiling is fast. He-said-She-said. Neen-neener. Data? Perf discussions with facts? Apples to apples comparisons? No? So how about "compilation speed vs code quality didn't meet our goals"?

#### "Cache-unfriendly"
Same (lack of) argument. Certainly when I did C2 I was highly focused on cache-friendliness, and based on the vast compile speedup I obtained over all the competition (at that time every AOT compiler, GCC, LLVM) I was largely successful. C2 compiles take far less footprint (both cache and memory) than my "competitors". Mostly C2 compiles, despite aggressive inlining, fit easily in L2, nearly always in L1. Getting this right is important, so I suspect other things happened to make it miss for V8. But also... no numbers from me either, so neen-neener he-said-she-said again.

## postlog
I'll point out most of the above discussion covers things that surely made V8's life harder - and slower - and are things I never would have done. OTOH I've not done a JS compiler, so I have no facts here either.
1 change: 1 addition & 0 deletions README.md
@@ -37,6 +37,7 @@ intermediate representation.

The Simple language is styled after a subset of C or Java.

* [Chapter 0](chapter00/README.md): Motivation
* [Chapter 1](chapter01/README.md): Script that returns an integer literal, i.e., an empty function that takes no arguments and returns a single integer value. The `return` statement.
* [Chapter 2](chapter02/README.md): Simple binary arithmetic such as addition, subtraction, multiplication, division
with constants. Peephole optimization / simple constant folding.
188 changes: 188 additions & 0 deletions chapter00/README.md
@@ -0,0 +1,188 @@
# Chapter 0: Prologue and Motivation

This repo is intended to demonstrate the Sea-of-Nodes compiler IR, and contains
a fully fledged stand-alone compiler.

It is intended to be *readable* and *debuggable*, fast to learn and modify. It
is **not** designed to be a super fast compiler, although it is pretty quick.
It could be made much quicker with modest effort, and that may happen in some
later chapter.

As a demonstration compiler, it is **not** intended to be a production-ready
compiler (although it has a fairly large test suite).

This Ahead-of-Time compiler is not intended (yet) to be a Just-In-Time
compiler, although that may happen in a later chapter.

## Target Audience

My target audience is both traditional compiler writers and medium-skill
programmers curious about compilers. I do expect some basic knowledge of how
compilers work; things like source-code-in and binaries-out. It will
definitely help to have some jargon words sorted out (see end of this document)
and be comfortable with undergraduate-level graph algorithms.


## Why Java

Simple is written in Java. Why? Because Java has shown itself to be fast to
learn, write, and *debug*. It is certainly fast enough for large-scale batch jobs
(see [The One Billion Row Challenge](https://github.com/gunnarmorling/1brc)).

A Just-In-Time variant of Simple may revisit the implementation language
decision.


## Jargon

Each of these terms is common in compilers, and there is a wealth of literature
available online for each of them. Wikipedia is a great starting point for
learning more.


* IR: Intermediate Representation - source code is hard to directly manipulate,
and machines only understand Machine Code. The IR bridges this gap; source
code is translated into an IR, the IR is manipulated (e.g. type-checked and
optimized), and finally the IR is converted to machine code (binary format).

The notion of an IR generally covers a high-level CFG view, a mid-level BB
view, and a low-level instruction or opcode view.


* BB: Basic Block - a collection of IR instructions or opcodes. These are all
expected to execute from start to finish without any changes in control flow.
At different points in time, the opcodes might represent some high-level
language concept (e.g. allocation, or a function call), or might represent a
direct hardware instruction (`add r1,r2,r3` or `call malloc`).

In a traditional compiler, all opcodes are kept inside BBs, and some care has to
be taken to move opcodes from one BB to another.

In the Sea of Nodes compiler, this restriction is dropped until right before
code generation.


* CFG: Control Flow Graph - a *graph*, where the *nodes* are BBs and the
*edges* represent changes in program execution flow. In the Sea of Nodes
compiler, the BB notion is mostly dropped and some normal-ish opcodes are the
CFG nodes.


* Nodes - nodes in a graph. For a traditional compiler there are two kinds of
nodes: nodes in a CFG are BBs, and nodes in a BB are instructions. For Sea of
Nodes, this distinction is blurred. The same kinds of Nodes and Edges are used
for both control flow and data.


* DU: Def-Use edges - a graph edge from a Defining node to a Using node. These
edges track the flow of values through a program and are required to give the
program its meaning. The reverse, Use-Def edges, are very useful for program
optimizations. Some early program IRs don't start with these, or periodically
build them from program variable names and then throw them away.

Sea of Nodes starts with these def-use edges (and use-def) and maintains them
throughout the compiler's lifetime.

The def-use edges can be thought of as a mapping between a node and its outputs.
Similarly, use-def edges can be thought of as a mapping between a node and its inputs.

When drawing the SoN graph we show the `use-def` edges, but nodes also
have `def-use` edges; it is just a matter of which direction we draw.


* SSA: Static Single Assignment - a program shape where all program values are
statically assigned exactly once. Example before SSA:

```
if( rand ) x=2; // A first assignment to x
else x=3; // A second assignment to x
print(x);
```

Here `x` is assigned twice. SSA form will rename the `x` variables and add a
`Phi` function to make a program with the same semantics but all (renamed)
variables are assigned exactly once (`x_0`, `x_1`, `x_2` are all assigned once):

```
if( rand ) x_0=2;
else x_1=3;
x_2 = phi( x_0, x_1 );
print(x_2);
```

Most source languages do not start this way. Most modern compilers move to SSA
at some point, because it allows for fast and simple optimizations (e.g. SCCP).


* Phi, PhiNode - Variously "funny", "fake" or the Greek `Φ` character. A pure
mathematical "function" which picks between its arguments based on control
flow; a key part of SSA form. Scare quotes on "function" because normal
functions do not pick their arguments based on control flow. Phi functions
are generally implemented with zero cost in machine code, by carefully
arranging machine registers.


* AST: Abstract Syntax Tree - another IR variant, where the syntactic structure
of the program has been converted to a tree. After some tree-based work, the
AST is generally converted to a graph-based IR, described above, because trees
are hard to optimize with. This step is very common in compilers; Sea of
Nodes skips it.


* SCC: Strongly Connected Components - found with e.g. Tarjan's algorithm, and
used for finding loops in a graph. Loops carry most of the work in a program so
it's important to optimize them more heavily. Acronym does not end in a "P", see SCCP.


* SCCP: Sparse Conditional Constant Propagation - A particularly fast and
simple way to analyze an IR. Used to optimize programs, generally by
replacing computations (which require some work) with constants (which require
almost no work). Acronym ends in a "P", see SCC.


* "Peep", or Peephole Optimization - a transformation which relies on only
local information as-if viewing the program "through a peephole". Something
like replacing `add x+0` with `x`, which removes an `add` instruction. This
transformation is correct without regard to the rest of the program. All
compilers do some kind of peephole optimization; Sea of Nodes makes extensive
use of these.


* Fixed Point - If, in a series of possible transformations, we keep applying
transforms until no more apply - and we hit the same state no matter what order
we apply those transforms, we have hit a *fixed point*. This is a key
mathematical term, and you can find plenty of math jargon online about it. The
SCCP optimization will stop once it hits a *fixed point*.

For Sea of Nodes, we use this concept for Peeps as well, applying peephole
transformations iteratively from a worklist until no more apply. By careful
design we will hit a *fixed point* - our program graph IR will be the same
shape, regardless of transformation ordering. This lets us run the
peepholes from a simple worklist algorithm.


* Types - the set of values that a particular program point or Node can take on at
runtime. In the program semantics literature, types are sometimes described
by the allowed operations (`int` types can `add`, and `pointer` types can
`dereference`), and sometimes described as a set of values.

For Sea of Nodes, we use the "set of values" type concept, and we don't
actually have any use for a separate value implementation. The sets of
values can get fancy; integer types are represented with a range like
`[0..9]` and integer constants by a short range e.g. `[3]`, the maximum
integer type is `[MIN_INT..MAX_INT]`. Types exist for integers, floats,
structs/classes/records, function pointers, memory and control flow.

Types are an important class in Sea of Nodes and, like nodes and edges, they
are manipulated constantly.


* Lattice - a mathematical concept representing relationships between members
of a set. See [Lattice](https://en.wikipedia.org/wiki/Lattice_(order)).

For Sea of Nodes, we use a Lattice over a set of Types (which themselves are
sets of values). Our lattice has a number of important mathematical
properties (symmetric, complete, bounded (ranked)), and these allow our
worklist algorithms, run in any order, to hit a *fixed point* in fast linear
time. For actual day-to-day work in the compiler, the lattice is in the
background and is hardly ever seen.
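To make the "set of values" view concrete, here is a small sketch of an integer range type with a meet operation. The `IntRange` name, the use of Java `long` bounds for MIN_INT/MAX_INT, and the convention that meet moves *down* toward `[MIN_INT..MAX_INT]` by taking the hull of both ranges are assumptions for this example, not Simple's actual type classes. An empty range plays the role of the top element, so the lattice is bounded at both ends.

```java
// Hypothetical integer range type: the set of longs in [lo..hi]; lo > hi means "no values".
public record IntRange(long lo, long hi) {
    static final IntRange TOP = new IntRange(1, 0);                              // empty set
    static final IntRange BOTTOM = new IntRange(Long.MIN_VALUE, Long.MAX_VALUE); // any integer

    static IntRange constant(long c) { return new IntRange(c, c); }

    boolean isTop() { return lo > hi; }
    boolean isConstant() { return lo == hi; }

    // Meet: the smallest range holding every value of both inputs (moves toward BOTTOM).
    IntRange meet(IntRange that) {
        if (this.isTop()) return that;
        if (that.isTop()) return this;
        return new IntRange(Math.min(lo, that.lo), Math.max(hi, that.hi));
    }

    @Override public String toString() {
        if (isTop()) return "TOP";
        return isConstant() ? "[" + lo + "]" : "[" + lo + ".." + hi + "]";
    }

    public static void main(String[] args) {
        IntRange three = constant(3), nine = constant(9);
        System.out.println(three.meet(nine));                 // [3..9]
        System.out.println(TOP.meet(three));                  // [3]
        System.out.println(three.meet(BOTTOM).equals(BOTTOM)); // true
    }
}
```

Merging two constants, as might happen at a Phi in this sketch, gives the smallest range holding both; meeting anything with the bottom range gives bottom, and meeting with the empty top range changes nothing.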