ControlFlow
Bytecode instructions can be grouped into blocks about which we can reason. Internally, we have a sequence of blocks, and each block has a sequence of instructions. Only the last instruction of a block may be a jump (GOTO) or conditional jump (IFEQ etc.). Without such a jump, blocks are executed sequentially. Return statements are special: they are rewritten such that the method's only return statement is the last instruction of its last block. Exceptions complicate things, creating smaller blocks and additional edges; they are discussed elsewhere.
We distinguish between the next block, that is the block that sequentially follows
in the bytecode, and a block's successors, which might be one or two blocks anywhere
in the method. The next block might be a successor, but does not have to be.
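The distinction between the sequentially next block and a block's successors can be sketched as a small data structure. This is an illustrative sketch only, not the actual implementation; all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the block structure described above.
@dataclass
class Block:
    instructions: List[str]            # only the last may be GOTO/IFEQ/...
    next: Optional["Block"] = None     # block that follows sequentially in bytecode
    successors: List["Block"] = field(default_factory=list)  # jump targets
    condition: object = False          # context condition, initially False

# The next block may also be a successor, but does not have to be:
b2 = Block(["ireturn"])
b1 = Block(["iload_1", "ifeq ..."], next=b2, successors=[b2])
```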
Every block has a context condition that is stored in a distinct variable
associated with this block. This condition is initially False for every
block.
When executing the method, the first block receives the context of the method (provided as additional method parameter) as condition. (If the variable is not used elsewhere, we can just use the method's context parameter as condition variable for the first block.)
When executing a block with a contradictory condition, one can jump to the next block (not necessarily a successor).
After executing a block, this block's condition is
set to False and the successor blocks' conditions are
updated:
- if there is an unconditional jump, the successor block's condition is `<successor.condition> or <thisblock.condition>`
- if there is a condition that evaluates to `true` under condition `<A>` (evaluated using a method like `V.when`), the then-successor's condition is `<then-successor.condition> or (<thisblock.condition> and <A>)` and the else-successor's condition is `<else-successor.condition> or (<thisblock.condition> and Not <A>)`
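The two update rules above can be made concrete in a small model. Here a condition is represented as the set of variable assignments ("worlds") under which it holds, so `or` is union, `and` is intersection, and `Not` is complement; the function names are illustrative, not part of the implementation.

```python
from itertools import product

# All worlds over a single variable A: True is the full set, False is empty.
ALL = frozenset(product([False, True], repeat=1))
NONE = frozenset()

def OR(c1, c2):  return c1 | c2
def AND(c1, c2): return c1 & c2
def NOT(c):      return ALL - c

A = frozenset(w for w in ALL if w[0])  # worlds where A holds

# Unconditional jump from this block to succ:
def update_goto(succ_cond, this_cond):
    return OR(succ_cond, this_cond)

# Conditional jump whose condition evaluates to true under A:
def update_if(then_cond, else_cond, this_cond, a):
    return (OR(then_cond, AND(this_cond, a)),
            OR(else_cond, AND(this_cond, NOT(a))))

then_c, else_c = update_if(NONE, NONE, ALL, A)
# The successors' conditions are mutually exclusive and together
# cover exactly this block's condition:
assert AND(then_c, else_c) == NONE
assert OR(then_c, else_c) == ALL
```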
If either successor is before the current block and has a satisfiable condition, we jump back to the successor that is further back. If neither successor is before the current block, we continue execution with the next block (not necessarily a successor). Note that at most one successor can be before the current block due to the way bytecode instructions work.
The intuition behind this approach is as follows:
- During a method execution, we can generally execute every block multiple times, with the same or with different conditions. Two blocks with mutually exclusive conditions can be executed in any order. Our splitting and joining of contexts at control-flow decisions ensures that at every time all blocks have mutually exclusive conditions. Therefore, we can nondeterministically execute any block with a satisfiable context at any time, iterating until all blocks have unsatisfiable contexts; the order in which satisfiable blocks are executed technically does not matter.
- The `or` in the update of conditions on control-flow instructions ensures that we always merge executions if possible. When both branches of an if statement jump to the same block, the block is executed under a condition created from both branches.
- The current design always jumps back to the earliest block that has a satisfiable condition. As each block's condition turns `False` after execution, we can just move forward until we make a block further back satisfiable -- that's where we jump to. The simple invariant is that all blocks behind the current execution point have unsatisfiable conditions.
- This insight allows us to explore alternative schemes and potential optimizations (potentially requiring empirical exploration to identify what is most efficient):
  - We could fork the execution on every decision and execute the function to the end, jumping back from there to the last decision and exploring the next path under a missing condition (similar to SPLat), but this would forgo any joining.
  - We could also just go sequentially through the method multiple times until all blocks have a contradictory condition, but we might forgo sharing opportunities that we currently enforce by executing later blocks only when all previous blocks have been executed.
  - We can also define a different execution order for blocks, while still preserving the idea of only jumping backward within that order. This might be worth considering if we identify a heuristic for identifying likely join points to maximize sharing.
  - There might also be a more efficient mechanism in which we can choose at runtime which satisfiable block to execute next, for example executing those blocks that are least merged yet, jumping back and forth until all blocks have contradictory conditions.
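The overall scheme can be simulated end to end on the common diamond shape: block 0 branches on a condition A to blocks 1 and 2, and both jump to block 3. This is a runnable sketch under the set-of-worlds condition model; the block layout and all names are illustrative.

```python
from itertools import product

# Conditions as sets of assignments over one variable A.
ALL = frozenset(product([False, True], repeat=1))
NONE = frozenset()
A = frozenset(w for w in ALL if w[0])

# Initially only the first block carries the method context (here: True).
conds = [ALL, NONE, NONE, NONE]
trace = []
i = 0
while i < len(conds):
    c, conds[i] = conds[i], NONE          # execute, then reset to False
    if c != NONE:
        trace.append(i)
        if i == 0:                        # conditional jump on A
            conds[1] |= c & A             # then-successor
            conds[2] |= c & (ALL - A)     # else-successor
        elif i in (1, 2):                 # unconditional jump to block 3
            conds[3] |= c
    # Jump back to the earliest satisfiable block behind us, else move on.
    back = [j for j in range(i) if conds[j] != NONE]
    i = back[0] if back else i + 1

# Block 3 runs exactly once, under the merged condition of both branches.
assert trace == [0, 1, 2, 3]
assert all(c == NONE for c in conds)
```

Note how the `or`-updates make block 3's condition grow back to the full method context before it runs, so the two branches are joined rather than executed separately.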
-
Our scheme does not allow multiple return instructions, as we cannot terminate the execution prematurely and may need to return different values under different conditions. We therefore rewrite the bytecode: We insert a new variable for the return result and rewrite every return instruction as a (variational) write to that variable followed by a jump to a new block at the very end of the method. That new block reads the new variable and returns it. That block may not be executed before every other block in the method has an unsatisfiable condition, which our jump-backward heuristic conveniently ensures. Technically this last block does not need its own condition; it is executed with the method's context condition and does not perform any side effects.
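A source-level analog of this return rewrite (in Python, for illustration only; the actual transformation happens on bytecode) looks like this:

```python
# Before the rewrite: two return instructions, so execution could
# terminate prematurely under one condition.
def original(x):
    if x > 0:
        return "pos"
    return "nonpos"

# After the rewrite: every return becomes a write to a fresh result
# variable plus a jump to a single exit block at the end.
def rewritten(x):
    result = None
    if x > 0:
        result = "pos"       # variational write under condition <x > 0>
        # GOTO exit
    else:
        result = "nonpos"    # variational write under condition Not <x > 0>
        # GOTO exit
    # exit block: runs only once all other blocks are exhausted
    return result

assert original(3) == rewritten(3)
assert original(-1) == rewritten(-1)
```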
A block may take a value from the stack left there by a previous block. If we jump around in different orders or execute alternative blocks that may both leave a variational value on the nonvariational stack, this scheme may break down.
A basic workaround is to ensure that all blocks are balanced with regard to the stack: they assume an empty stack when entering the block and leave an empty stack at the end. To ensure this, all extra stack values are written into local variables and read back onto the stack in the next block. The variational ASTORE mechanism ensures that writes to variables are performed in the correct context only, thus merging results when multiple blocks would leave a value on the stack.
Performing this rewrite requires that we identify which block consumes extra stack values and which other block produces them, so that we can map the extra stack values to the correct variables (a nontrivial analysis).
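The effect of the contextual store can be sketched with a toy model of a variational variable: instead of blocks B and C each leaving a value on the shared stack for block D, both perform a store under their own context into a fresh local, and D reloads it. The representation (a dict from worlds to values) and all names are illustrative assumptions.

```python
def vstore(var, ctx, value):
    """Store value into choice-variable var only under context ctx
    (a toy stand-in for the variational ASTORE mechanism)."""
    for world in ctx:
        var[world] = value
    return var

ALL = {(False,), (True,)}
A = {w for w in ALL if w[0]}
NOT_A = ALL - A

tmp = {}                      # fresh local introduced by the rewrite
vstore(tmp, A, 1)             # block B ran under A, would have pushed 1
vstore(tmp, NOT_A, 2)         # block C ran under Not A, would have pushed 2

# Block D reloads tmp from the local instead of the stack and sees
# a single merged choice, A ? 1 : 2.
assert tmp == {(True,): 1, (False,): 2}
```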
Potential optimization: To avoid a large number of extra variables and conditional store instructions, we might recognize common control-flow patterns and handle them as a single special block. For example, the following pattern of four blocks A, B, C, D is quite common: Block A has an if instruction with two successors, B and C. Both B and C jump to D. B and C each leave a value on the stack, which D consumes. Instead of creating an extra variable for the stack value, we could let both B and C each push their value, after A has pushed a copy of the condition on the stack, and then create a choice from both of them. However, it is unclear whether the additional analysis to identify whether both branches have been executed actually saves enough work to make this optimization worthwhile.
Each block should either have more than one successor or more than one predecessor (including exception edges). Otherwise, blocks can be merged to reduce satisfiability checks and the number of extra variables needed.