PVM (Portable Virtual Machine) is an ultra-lightweight virtual machine designed to run on microcontrollers (MCUs) for small automation tasks. It occupies only 1.5 kB of ROM on a typical 32-bit ARM-based MCU. It executes bytecode generated by the MPC compiler, enabling efficient and portable code execution across different MCU architectures.
- Lightweight: Optimized for resource-constrained environments, including low-end MCUs. Needs only 1.5 kB of ROM and a few bytes of RAM.
- Portable: Designed to run on various MCU architectures. Needs no heap, no other dynamic memory management routines, and no standard library.
- Efficient: Minimizes memory usage with an 8-bit opcode size and compact constant loading. Loading a literal from 0 to 127 takes only one byte, and loading a 32-bit integer with the MSB set takes 2 to 6 bytes, depending on constant optimization.
- Extensible: Supports built-in functions to extend functionality and communicate with peripherals.
- Safe: Includes error handling for stack underflow/overflow, invalid function indices, and program counter overrun.
The library provides a solid opcode executor that accepts only a pointer to a virtual machine instance.
Supplementary routines that perform a basic check of the binary executable structure and reset a virtual machine instance are also provided.
PVM allows executing binary scripts compiled by the MPC compiler, which have a defined structure. The format of a PVM executable is presented in the table below:
| Field | Size in bytes | Description |
|---|---|---|
| vm_version | 1 | The version code of the PVM required to run this executable. |
| size | 2 | The total size of the executable in bytes excluding fixed size fields. |
| functions_count | 1 | The number of functions defined in the executable. |
| constants_count | 1 | The number of constants defined in the executable. |
| main_variables_count | 1 | The number of main variables used by the executable. |
| functions | Variable | The array of function descriptors. Each function is described by a structure. |
| constants | Variable | The array of constants used by the executable. |
| code | Variable | The bytecode of the executable, containing the instructions to be executed. |
Field sizes can easily be extended in future versions of the executable format, raising the limits on the number of corresponding elements.
The vm_version field indicates the minimum version of the PVM required to execute this binary. It ensures that the executable is compatible with the version of the PVM running on the MCU. If the PVM version is lower than the specified minimum version, the executable will not be loaded.
The size field specifies the total size in bytes of all variable-sized fields in the executable, excluding the fixed-size fields. It covers the function descriptor table, the constants, and the code section. This value is also used to verify the integrity of the executable during loading.
The functions_count field specifies the number of functions defined in the executable. Each function is described by a structure that includes its address, argument size, variable size, return size, and flags indicating whether the function is variadic or a system library function. The compiler sorts the functions by usage and removes unused user and system built-in functions from this table.
The constants_count field specifies the number of constants defined in the executable. Constants store fixed values that do not change during the execution of the program. The compiler collects them while walking through the compiled module, then sorts them by frequency of use so that the most used ones are accessed efficiently. They are stored in an array following the functions section.
The main_variables_count field specifies the number of main function variables used by the executable. Main variables are not global variables and are not accessible through the global keyword. They are initialized to 0 at the start of the program and persist until the program terminates.
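As a rough illustration of the fixed-size fields described above, the sketch below reads them from a raw byte buffer and applies the version and size checks. The names `demo_header_t` and `demo_read_header`, the little-endian byte order of `size`, and the absence of padding between fields are assumptions made for this example; the authoritative definitions are the ones in pvm.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the six fixed-size header bytes. */
typedef struct {
    uint8_t  vm_version;
    uint16_t size;                  /* bytes in functions + constants + code */
    uint8_t  functions_count;
    uint8_t  constants_count;
    uint8_t  main_variables_count;
} demo_header_t;

enum { DEMO_HEADER_BYTES = 6 };     /* 1 + 2 + 1 + 1 + 1 */

/* Returns 1 if the buffer looks like a loadable executable, 0 otherwise. */
int demo_read_header(const uint8_t *buf, size_t len,
                     uint8_t supported_version, demo_header_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DEMO_HEADER_BYTES) {
        return 0;
    }
    out->vm_version = buf[0];
    out->size = (uint16_t)(buf[1] | (buf[2] << 8)); /* assumed little-endian */
    out->functions_count = buf[3];
    out->constants_count = buf[4];
    out->main_variables_count = buf[5];

    /* The executable needs a newer VM than the one running here. */
    if (out->vm_version > supported_version) {
        return 0;
    }
    /* Integrity check: the variable-sized part must match the size field. */
    if ((size_t)out->size + DEMO_HEADER_BYTES != len) {
        return 0;
    }
    return 1;
}
```

A loader could call such a routine before handing the executable to the VM and refuse to start when it returns 0.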
The functions field is a variable-sized array of function definitions. Each function is described by a function descriptor structure. The format of a function descriptor is presented in the table below:

| Field | Size | Description |
|---|---|---|
| address | 2 bytes | The address of the function in the code section. This address points to the starting bytecode of the function. |
| arguments_count | 1 byte | The number of arguments passed to the function, which determines the data stack space required to store them. |
| variables_count | 1 byte | The number of local variables used by the function, which determines the data stack space required to store them. |
| returns_count | 6 bits | The number of return values of the function, which determines the data stack space required to store them. |
| is_variadic | 1 bit | A flag indicating if the function accepts a variable number of arguments. If set, the function can accept additional arguments beyond the specified arguments_count. |
| is_built_in | 1 bit | A flag indicating if the function is a system library function. System library functions are built-in functions provided by the PVM and are not part of the user-defined code. |
The constants field is a variable-sized array of constants used by the executable. Constants are fixed values that do not change during the execution of the program. They are stored in an array following the functions section and are accessed using the LDC instruction.
The code field is a variable-sized field that contains the bytecode of the executable. The bytecode consists of a series of 8-bit instructions that are executed by the PVM. The instructions are designed to be compact and efficient, with a focus on minimizing resource usage. The code section follows the constants section in the executable.
The PVM instance maintains the state of the virtual machine, including several critical components that ensure the proper execution and management of the bytecode. Below is a detailed explanation of each field within the PVM instance:
The timer field holds the current time in milliseconds. It is used to track the elapsed time for sleep instructions. The timer is set using the now_ms function, which returns the current time.
The timeout field specifies the duration for which the PVM should sleep. When a sleep instruction (SLP) is executed, the timeout value is set, and the PVM enters a sleep state until the specified duration has elapsed. The combination of the timer and timeout fields allows the PVM to handle delays accurately.
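The interplay of the two fields can be sketched as a single comparison. This example assumes 32-bit unsigned millisecond counters and a hypothetical helper name `demo_sleep_elapsed`; the actual field widths in pvm.h may differ. Unsigned subtraction keeps the comparison correct even when the millisecond counter wraps around.

```c
#include <stdint.h>

/* Returns 1 once at least `timeout` ms have passed since `timer`. */
int demo_sleep_elapsed(uint32_t now, uint32_t timer, uint32_t timeout)
{
    /* now - timer is computed modulo 2^32, so a wrapped counter still works. */
    return (uint32_t)(now - timer) >= timeout;
}
```

For example, with `timer` at 0xFFFFFF00 and `now` at 0x00000100 the elapsed time evaluates to 512 ms, although `now` is numerically smaller than `timer`.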
The data stack is a crucial component of the PVM instance, used for storing temporary data during the execution of
instructions. It is implemented as an array of pvm_data_t with a fixed size defined by PVM_DATA_STACK_SIZE during
compile time.
The call stack is used to manage function calls and returns. It is implemented as an array of structures, each
containing the return address, the start of the variables in the data stack, the size of the arguments, and the
function index. The call stack size is defined by PVM_CALL_STACK_SIZE during compile time.
The call stack allows the PVM to keep track of the execution context for each function call, enabling nested function calls and proper return handling. The stack top pointer keeps track of the current position in the call stack.
The program counter (PC) is a register that points to the address of the next instruction to be executed. It is incremented automatically as instructions are fetched and executed.
The data stack top pointer keeps track of the current position in the data stack. It is incremented when data is pushed onto the stack and decremented when data is popped from the stack.
The call stack top pointer keeps track of the current position in the call stack. It is incremented when a function is called and decremented when a function returns.
The persistent data section of the PVM instance includes two fields: binding and exe.
The binding field stores binding-specific data that persists across resets of the PVM instance. It is a user-defined field that can hold context-specific information, such as the ID of the dedicated MCU output this virtual machine is tied to.
The exe field is a pointer to the PVM executable structure described above.
The persistent data section ensures that the PVM instance can be reset without losing the executable and binding information, allowing for consistent execution across resets.
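One way such a reset could work is sketched below: save the persistent part, clear the whole instance, and restore the saved part. The struct layout, the `int` type of `binding`, and the name `demo_reset` are assumptions for this example; only the `persist`, `binding`, and `exe` names come from the text above, and the real pvm_reset may be implemented differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical persistent section: survives a reset. */
typedef struct {
    int         binding;   /* user-defined context, e.g. an output ID */
    const void *exe;       /* pointer to the loaded executable */
} demo_persist_t;

/* Hypothetical VM instance with transient state plus the persistent part. */
typedef struct {
    uint32_t       timer;
    uint32_t       timeout;
    int32_t        data_stack[8];
    uint16_t       pc;
    uint8_t        data_top;
    uint8_t        call_top;
    demo_persist_t persist;
} demo_vm_t;

/* Clears all transient state while keeping binding and exe intact. */
void demo_reset(demo_vm_t *vm)
{
    assert(vm != NULL);
    demo_persist_t keep = vm->persist;
    memset(vm, 0, sizeof *vm);
    vm->persist = keep;
}
```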
PVM supports built-in functions to extend its functionality. These functions are implemented in C and can be called directly by the VM. See the Usage section for more information.
PVM provides comprehensive error handling to ensure robust operation. See the Usage section for more information.
- An MCU with a C compiler.
- The MPC compiler for compiling source code into PVM bytecode.
- Clone the Repository:

      git clone https://github.com/your-repo/pvm.git
      cd pvm

  Otherwise, add it as a submodule.
- Include PVM in Your Project:
  - Add the PVM header file (pvm.h) to your project or include the CMakeLists.txt file.
  - Implement the necessary built-in functions and the now_ms function.
- Compile Your Script:
  - Compile your script(s) with the MPC compiler to generate the PVM bytecode.
  - Include the generated bytecode in your MCU firmware or implement a mechanism for dynamic uploading and storage.

Initialize the PVM instance and load the executable:

    #include "pvm.h"

    pvm_t vm;
    const pvm_exe_t *exe = ...; // Load your executable here

    void init_pvm(void) {
        vm.persist.exe = exe;
        pvm_reset(&vm);
    }

Execute the instructions in the PVM:

    pvm_errno_t err = pvm_op(&vm);
    if (err != PVM_NO_ERROR) {
        // Handle error
    }

PVM includes a comprehensive error handling mechanism: runtime errors are reported as values of the pvm_errno enum. PVM functions return these codes to indicate different types of errors, and they are used for error handling and debugging within the PVM. Each enum value is explained below:
- PVM_NO_ERROR: Indicates that no error has occurred. This is the default success code.
- PVM_MAIN_RETURN: Indicates that the main function has returned. This is typically used to signal the end of the main program execution.
- PVM_CALL_STACK_UNDERFLOW (PVM_MAIN_RETURN): Indicates that the call stack has underflowed. This means that an attempt was made to pop from an empty call stack. This error code is aliased to PVM_MAIN_RETURN.
- PVM_CALL_STACK_OVERFLOW: Indicates that the call stack has overflowed. This means that an attempt was made to push more frames onto the call stack than it can hold.
- PVM_DATA_STACK_UNDERFLOW: Indicates that the data stack has underflowed. This means that an attempt was made to pop from an empty data stack.
- PVM_DATA_STACK_OVERFLOW: Indicates that the data stack has overflowed. This means that an attempt was made to push more values onto the data stack than it can hold.
- PVM_ARG_OUT_OF_STACK: Indicates that there is not enough stack space to hold the arguments of a function call. This means that an attempt was made to call a function and push more arguments onto the stack than it can hold.
- PVM_VAR_OUT_OF_STACK: Indicates that there is not enough stack space to hold the variables of a function call. This means that an attempt was made to call a function and push more variables onto the stack than it can hold.
- PVM_RETURN_OUT_OF_STACK: Indicates that there is not enough stack space to hold the return values of a function call. This means that an attempt was made to call a function and push more return values onto the stack than it can hold.
- PVM_DATA_STACK_SMASHED: Indicates that the data stack has been corrupted. This means that the stack has been overwritten or otherwise tampered with, leading to an inconsistent state.
- PVM_PC_OVERRUN: Indicates that the program counter (PC) has overrun. This means that the PC has exceeded the bounds of the executable code, typically due to an invalid jump or call instruction.
- PVM_EXE_NO_FUNCTION: Indicates that a function index is out of bounds. This means that an attempt was made to call a function that does not exist in the executable's function table.
- PVM_BUILTIN_NO_FUNCTION: Indicates that a built-in function index is out of bounds. This means that an attempt was made to call a built-in function that does not exist in the built-in function table.
- PVM_NO_VARIABLE: Indicates that a variable index is out of bounds. This means that an attempt was made to access a variable that does not exist in the current scope.
- PVM_NO_CONSTANT: Indicates that a constant index is out of bounds. This means that an attempt was made to access a constant that does not exist in the executable's constant table.
- PVM_VARIADIC_SIZE: Indicates that the size of variadic arguments is incorrect. This means that the number of variadic arguments passed to a function does not match the expected number.
These error codes help in identifying and handling various runtime errors that can occur during the execution of the PVM. They provide a way to diagnose issues and ensure the robust operation of the virtual machine.
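A host application typically has to decide what to do with each returned code. The sketch below groups a few codes into three reactions: keep stepping, stop cleanly, or reset the VM. The `demo_errno_t` enum is a small stand-in that mirrors only some of the names listed above with made-up numeric values, and `demo_classify` is a hypothetical helper, so treat this as a pattern rather than the library's API.

```c
/* Stand-in for a subset of the PVM error codes; values are illustrative. */
typedef enum {
    DEMO_NO_ERROR,
    DEMO_MAIN_RETURN,
    DEMO_DATA_STACK_OVERFLOW,
    DEMO_PC_OVERRUN
} demo_errno_t;

typedef enum {
    DEMO_ACTION_CONTINUE,   /* execute the next instruction */
    DEMO_ACTION_FINISHED,   /* the script ended normally */
    DEMO_ACTION_RESET       /* a runtime fault occurred; restart the VM */
} demo_action_t;

/* Maps an error code to the reaction of the host loop. */
demo_action_t demo_classify(demo_errno_t err)
{
    switch (err) {
    case DEMO_NO_ERROR:
        return DEMO_ACTION_CONTINUE;
    case DEMO_MAIN_RETURN:
        return DEMO_ACTION_FINISHED;
    default:
        /* Every other code signals a fault in the script or the executable. */
        return DEMO_ACTION_RESET;
    }
}
```

In a real firmware loop, the reset branch could also log the code before restarting the script.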
Implement the built-in functions required by your application:

    void my_builtin_function(pvm_t *vm, pvm_data_t arguments[], pvm_data_stack_t args_size) {
        // Implement your built-in function here
    }

    const pvm_builtins_t pvm_builtins[] = {
        { my_builtin_function },
        // Add more built-in functions here
    };
    const size_t pvm_builtins_size = sizeof(pvm_builtins) / sizeof(pvm_builtins[0]);

Arguments are passed to a built-in function as an array pointer; the number of arguments is passed in the args_size parameter.
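The table-plus-size pattern above lends itself to a bounds-checked dispatch, sketched here without depending on pvm.h. The types `demo_data_t` and `demo_builtin_t`, the two sample functions, and `demo_call_builtin` are all hypothetical; the simplified function signature omits the VM pointer, and results are written back into the argument array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t demo_data_t;  /* stand-in for pvm_data_t */
typedef void (*demo_builtin_t)(demo_data_t args[], uint8_t args_size);

/* Replaces the first argument with the sum of all arguments. */
void demo_sum(demo_data_t args[], uint8_t args_size)
{
    demo_data_t total = 0;
    for (uint8_t i = 0; i < args_size; i++) {
        total += args[i];
    }
    if (args_size > 0) {
        args[0] = total;
    }
}

/* Negates its first argument in place. */
void demo_negate(demo_data_t args[], uint8_t args_size)
{
    if (args_size > 0) {
        args[0] = -args[0];
    }
}

const demo_builtin_t demo_builtins[] = { demo_sum, demo_negate };
const size_t demo_builtins_size = sizeof(demo_builtins) / sizeof(demo_builtins[0]);

/* Returns 0 on success, -1 when the index is outside the table. */
int demo_call_builtin(size_t index, demo_data_t args[], uint8_t args_size)
{
    assert(args != NULL);
    if (index >= demo_builtins_size) {
        return -1;
    }
    demo_builtins[index](args, args_size);
    return 0;
}
```

The out-of-range branch corresponds in spirit to the PVM_BUILTIN_NO_FUNCTION error described earlier.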
Variadic functions in PVM can accept a variable number of arguments. The number of variadic arguments is specified by pushing a value onto the data stack before calling the function; the MPC compiler does this automatically. This value is then added to the fixed number of arguments to determine the total number of arguments passed to the function.
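The argument count arithmetic can be sketched as follows: pop the variadic count from the top of the stack, add the fixed count, and check that the stack really holds that many values. The function name `demo_total_arguments`, the 32-bit element type, and the `-1` error convention are assumptions for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the total number of arguments for a call, or -1 when the stack
 * does not hold enough values. `top` is the index of the next free slot
 * and is decremented when the variadic count is popped.
 */
int demo_total_arguments(const int32_t *stack, size_t *top,
                         uint8_t fixed_count, int is_variadic)
{
    assert(stack != NULL && top != NULL);
    size_t total = fixed_count;
    if (is_variadic) {
        if (*top == 0) {
            return -1;              /* no count value on the stack */
        }
        int32_t extra = stack[--*top];
        if (extra < 0) {
            return -1;              /* a negative count makes no sense */
        }
        total += (size_t)extra;
    }
    if (total > *top) {
        return -1;                  /* fewer values on the stack than claimed */
    }
    return (int)total;
}
```

With the values 2025, 4, 4 on the stack followed by a pushed count of 3, a variadic function with no fixed arguments receives three arguments.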
Consider a variadic function print that can accept any number of arguments:

    #include <stdio.h>

    void print(pvm_t *vm, pvm_data_t arguments[], pvm_data_stack_t args_size) {
        (void)vm; // unused
        for (pvm_data_stack_t i = 0; i < args_size; i++) {
            // Print each argument; the cast keeps the format specifier portable
            printf("%ld ", (long)arguments[i]);
        }
        printf("\n");
    }

If a built-in function returns a single value or a tuple of values, they should be put into the same arguments array upon return.
Consider a function get_date that returns a tuple of values:

    #include <time.h>

    void get_date(pvm_t *vm, pvm_data_t arguments[], pvm_data_stack_t args_size) {
        (void)vm;        // unused
        (void)args_size; // unused
        time_t t = time(NULL);
        const struct tm *tm = localtime(&t);
        if (tm == NULL) {
            return; // leave the arguments untouched if the time is unavailable
        }
        arguments[0] = tm->tm_year + 1900; // year
        arguments[1] = tm->tm_mon + 1;     // month
        arguments[2] = tm->tm_mday;        // day of the month
    }

You can trace each opcode by defining a custom header file with static functions (or macros) that print debugging information, then passing this header to CMake at configure time through the PVM_DEBUG definition. For example, using the header included in the samples folder:

    cmake -DPVM_DEBUG='samples/debug.h' ..

This will produce the necessary code to output information about the PC, the instruction, and a stack dump, like this for the example code below:

PC:0 PSH 0 → {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
PC:1 STV [2] 0 ← {0, 0, 0, 0, 0, 0, 0, 0, 0}
PC:2 PSH 0 → {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
PC:3 LDC [0] 18000 → {18000, 0, 0, 0, 0, 0, 0, 0, 0, 0}
PC:4 CAL <*5> (1) = {4, 4, 2025, 0, 0, 0, 0, 0, 0, 0, 0, 0}
PC:5 STV [5] 4 ← {4, 2025, 0, 0, 0, 4, 0, 0, 0, 0, 0}
PC:6 STV [4] 4 ← {2025, 0, 0, 0, 4, 4, 0, 0, 0, 0}
PC:7 STV [3] 2025 ← {0, 0, 0, 4, 4, 2025, 0, 0, 0}
...
A simple usage example can be found in the samples folder of the project.
The bytecode format consists of a series of instructions, each represented by a single byte. The instructions are designed to be compact and efficient, with a focus on minimizing resource usage.
- PSH: Push a constant literal value onto the data stack.
- PSC: Push a constant complement value onto the data stack, appending 5 bits to the existing value.
- LDC: Load a constant from the constant array using index.
- LDV: Load a variable from the data stack using variable index.
- STV: Store a value in a variable on the data stack using variable index.
- CAL: Call a function by index.
- RET: Return from a function.
- JMP: Jump to a specific offset. This instruction can hold an offset of up to 18 instructions forward.
- JMB: Jump back to a specific offset.
- SLP: Sleep for a specified duration.
- ADD, SUB, MUL, DIV: Arithmetic operations.
- AND, IOR, XOR: Logical operations.
- NEG, INV, INC, DEC: Unary operations.
- BZE, BNZ, BEQ, BNE, BGT, BLT, BGE, BLE: Branching operations.
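To illustrate how PSH and PSC can combine into larger constants, the sketch below decodes a run of such bytes. The opcode ranges (PSH as 0x00 to 0x7F, PSC as 0x80 to 0x9F) and the rule that PSC shifts the existing value left by 5 bits before appending its own 5 bits are inferred from the sample listing in this document, where PSH 31 followed by PSC 8 yields 1000; they are assumptions, not a quotation of the PVM source. The function name `demo_decode_constant` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one PSH byte followed by any number of PSC bytes.
 * Stores the number of bytes used in *consumed (0 if code[0] is not a PSH).
 */
uint32_t demo_decode_constant(const uint8_t *code, size_t len, size_t *consumed)
{
    assert(code != NULL && consumed != NULL);
    size_t i = 0;
    uint32_t value = 0;

    if (len == 0 || code[0] > 0x7F) {
        *consumed = 0;
        return 0;
    }
    value = code[i++];                            /* PSH: 7-bit literal */
    while (i < len && (code[i] & 0xE0) == 0x80) { /* PSC: 100x xxxx */
        value = (value << 5) | (code[i] & 0x1F);  /* append 5 more bits */
        i++;
    }
    *consumed = i;
    return value;
}
```

Under these assumptions one PSH plus five PSC bytes carry 7 + 5 × 5 = 32 bits, which agrees with the six-byte upper bound quoted for 32-bit integers with the MSB set.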
Below is the compiled listing of an example Python program that demonstrates the use of bytecode, constants, and control flow:
FUNCTIONS: 5, CONSTANTS: 1
FUNCTIONS DESCRIPTORS (5)
ADDRESS: None; ARGUMENTS: 1; VARIABLES: 0; RETURNS: 0; func output(action) <0>; 2 usage(s)
ADDRESS: None; ARGUMENTS: 0; VARIABLES: 0; RETURNS: 0; func print() <1>; 1 usage(s)
ADDRESS: None; ARGUMENTS: 1; VARIABLES: 0; RETURNS: 3; func get_realtime(timezone_offset) <2>; 1 usage(s)
ADDRESS: None; ARGUMENTS: 1; VARIABLES: 0; RETURNS: 3; func get_date(timezone_offset) <3>; 1 usage(s)
ADDRESS: None; ARGUMENTS: 1; VARIABLES: 0; RETURNS: 1; func get_weekday(timezone_offset) <4>; 1 usage(s)
CONSTANTS (1)
VALUE: 18000; int = 18000 <0>; 3 usage(s)
CODE:
func main()
00000 00 PSH 0 ; default_state = Action.ACTION_OFF
00001 F2 STV var default_state <2> ; default_state = Action.ACTION_OFF
label_1: ; while True:
00002 00 PSH const TIMEZONE_OFFSET = 18000 <0> ; year, month, date = get_date(TIMEZONE_OFFSET)
00003 B6 LDC ; year, month, date = get_date(TIMEZONE_OFFSET)
00004 D3 CAL func get_date(timezone_offset) <3> ; year, month, date = get_date(TIMEZONE_OFFSET)
00005 F5 STV var date <5> ; year, month, date = get_date(TIMEZONE_OFFSET)
00006 F4 STV var month <4> ; year, month, date = get_date(TIMEZONE_OFFSET)
00007 F3 STV var year <3> ; year, month, date = get_date(TIMEZONE_OFFSET)
00008 00 PSH const TIMEZONE_OFFSET = 18000 <0> ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00009 B6 LDC ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00010 D2 CAL func get_realtime(timezone_offset) <2> ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00011 F8 STV var second <8> ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00012 F7 STV var minute <7> ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00013 F6 STV var hour <6> ; hour, minute, second = get_realtime(TIMEZONE_OFFSET)
00014 00 PSH const TIMEZONE_OFFSET = 18000 <0> ; weekday = get_weekday(TIMEZONE_OFFSET)
00015 B6 LDC ; weekday = get_weekday(TIMEZONE_OFFSET)
00016 D4 CAL func get_weekday(timezone_offset) <4> ; weekday = get_weekday(TIMEZONE_OFFSET)
00017 F0 STV var weekday <0> ; weekday = get_weekday(TIMEZONE_OFFSET)
00018 E3 LDV var year <3> ; print(year, month, date, hour, minute, second, weekday)
00019 E4 LDV var month <4> ; print(year, month, date, hour, minute, second, weekday)
00020 E5 LDV var date <5> ; print(year, month, date, hour, minute, second, weekday)
00021 E6 LDV var hour <6> ; print(year, month, date, hour, minute, second, weekday)
00022 E7 LDV var minute <7> ; print(year, month, date, hour, minute, second, weekday)
00023 E8 LDV var second <8> ; print(year, month, date, hour, minute, second, weekday)
00024 E0 LDV var weekday <0> ; print(year, month, date, hour, minute, second, weekday)
00025 07 PSH 7 ; <variadic args count>
00026 D1 CAL func print() <1> ; print(year, month, date, hour, minute, second, weekday)
00027 00 PSH 0 ; state = Action.ACTION_OFF
00028 F1 STV var state <1> ; state = Action.ACTION_OFF
00029 01 PSH 1 ; if weekday >= 1:
00030 E0 LDV var weekday <0> ; if weekday >= 1:
00031 05 PSH 5 ; if weekday >= 1:
00032 A5 BLT label_2 ; if weekday >= 1:
00033 05 PSH 5 ; if weekday <= 5:
00034 E0 LDV var weekday <0> ; if weekday <= 5:
00035 01 PSH 1 ; if weekday <= 5:
00036 A4 BGT label_2 ; if weekday <= 5:
00037 01 PSH 1 ; output(Action.ACTION_ON)
00038 D0 CAL func output(action) <0> ; output(Action.ACTION_ON)
label_2:
00039 E1 LDV var state <1> ; if default_state != state:
00040 E2 LDV var default_state <2> ; if default_state != state:
00041 03 PSH 3 ; if default_state != state:
00042 A2 BEQ label_3 ; if default_state != state:
00043 E1 LDV var state <1> ; default_state = state
00044 F2 STV var default_state <2> ; default_state = state
00045 E1 LDV var state <1> ; output(state)
00046 D0 CAL func output(action) <0> ; output(state)
label_3:
00047 1F PSH 31 ; sleep(1000)
00048 88 PSC 8 ; sleep(1000)
00049 B4 SLP ; sleep(1000)
00050 31 PSH 49 ; while True:
00051 B7 JMB label_1 ; while True:

This Portable Virtual Machine for Microcontrollers (pvm) is licensed under the LGPL License. See the LICENSE file for more details.
Contributions are welcome! Please open an issue or submit a pull request.
ma5ter