GPIO round-trip at 480ns
Python, in hardware. 480ns GPIO. No interpreter. No C. Just RunPyXL.
TL;DR
- Read the FAQ
- RunPyXL runs Python semantics directly in hardware: no VM, no OS, no JIT.
- A GPIO roundtrip takes 480ns on RunPyXL vs. ~15,000ns on PyBoard (MicroPython).
- In this test, RunPyXL is 30x faster than MicroPython, or 50x when normalized for clock speed.
- The video demo shows both systems in action on real hardware.
- This isn't a C trick; it's actual Python executed in silicon.
- Deterministic timing, real-time behavior, and sub-microsecond precision, all in Python.
- More at runpyxl.com (contact link at the bottom).
What is RunPyXL?
RunPyXL is a proof of concept of a custom hardware processor that executes Python directly: no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon.
A custom toolchain compiles a .py file into CPython bytecode, translates it to a custom assembly, and produces a binary that runs on a pipelined processor built from scratch.
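To get a feel for the first stage of that pipeline, here is a minimal sketch that uses the standard library's dis module to list the CPython bytecode for a small function. It only shows what stock CPython produces on a development machine; the later stages (translation to RunPyXL assembly and linking) are not public in this post and are not reproduced here. The function name roundtrip_ns is made up for illustration, and the exact opcode names vary between CPython versions.

```python
import dis


def roundtrip_ns(c1, c2):
    # Same arithmetic as the last line of the GPIO test: cycles -> nanoseconds
    return (c2 - c1) * 10


# dis.get_instructions yields one Instruction per CPython bytecode operation
opnames = [ins.opname for ins in dis.get_instructions(roundtrip_ns)]

for name in opnames:
    print(name)
```

Each printed name is one operation of CPython's stack-based VM, which is the representation the post says the RunPyXL toolchain starts from.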
What RunPyXL is not
- Not native C or an inlined loop
- Not MicroPython or a JIT
- Not running Linux or any OS
It's a real processor for Python, built for determinism and speed.
Where does it run?
RunPyXL runs on a Zynq-7000 FPGA (Arty-Z7-20 dev board). The RunPyXL core runs at 100MHz. The ARM CPU on the board handles setup and memory, but the Python code itself is executed entirely in hardware.
The toolchain is written in Python and runs on a standard development machine using unmodified CPython.
Wait, what's a GPIO?
GPIO stands for General Purpose Input/Output. It's a simple hardware pin that software can read from or write to, a way to control the outside world: LEDs, buttons, sensors, motors, and more.
In MicroPython (like on the PyBoard), your Python code interacts with C functions that handle hardware registers underneath. It's reasonably fast, but still goes through a Python VM and a software stack before reaching the pin.
RunPyXL skips all of that. The Python bytecode is executed directly in hardware, and GPIO access is physically wired to the processor: no interpreter, no function call, just native hardware execution.
Now for the GPIO test. What was in the video?
I connected two pins on the Arty board with a jumper cable.
Then I wrote a Python program that measures the time from when GPIO pin 1 is set to 1 until a 1 is read on the other pin connected to it.
The video shows a comparison between RunPyXL and a PyBoard running the MicroPython VM.
Let's focus on how RunPyXL does its thing.
The program
```python
from compiler.intrinsics import *

def main():
    pyxl_write_gpio_pin1(0)              # Reset output pin
    c1 = pyxl_get_cycle_counter()        # Cycle counter (100 MHz)
    pyxl_write_gpio_pin1(1)              # Set output pin
    while pyxl_read_gpio_pin2() == 0:    # Wait until input pin is set to 1
        continue
    c2 = pyxl_get_cycle_counter()        # Cycle counter (100 MHz)
    return (c2 - c1) * 10                # Return result in nanoseconds (each cycle is 10 ns)
```
As you can see, this is a regular Python program, but it also has some unfamiliar function calls.
These functions come from the compiler.intrinsics module.
pyxl_get_cycle_counter()
Gets the current cycle counter from the RunPyXL CPU. This counter advances by 1 on every clock tick.
pyxl_write_gpio_pin1()
Writes a value (0/1) to a GPIO pin. These are low-level intrinsics exposed by the compiler. They are currently hardcoded for this test, but will evolve into a more general pyxl_gpio_write(pin, value) API.
pyxl_read_gpio_pin2()
Reads the value from pin 2. The same API note applies here as well.
Wait, why isn't there a call to the main function?
The main function is only defined, never invoked. Why?
At the current stage, RunPyXL calls the main function automatically when it runs a program.
This is just a convenience feature for development and will change in the future.
So how does it work?
As described above, the program is compiled to CPython bytecode and then compiled again to RunPyXL assembly. It is then linked, and a binary is generated.
This binary is sent over the network to the Arty board, where an ARM CPU receives the application, copies it to memory shared with the RunPyXL hardware, and starts running it.
A typical Python runtime (CPython, MicroPython on the PyBoard, or embedded Python in general) carries a large overhead from running the bytecode on a software-based VM. In RunPyXL there's no VM; the hardware does everything.
As for reading and writing the GPIO: the GPIO headers are directly mapped to FPGA pins and physically wired into the top-level module of RunPyXL's core. Think of that module as the main function of the hardware.
In this test, all code and data reside in predictable, low-latency memory, which ensures deterministic (real-time) behavior. This means that for the same input, the program takes exactly the same time to run.
So how do these platforms compare?
GPIO Roundtrip Latency (ns). Lower is better.
As you can see, RunPyXL is about 30x faster than the PyBoard.
Also, remember that RunPyXL's clock speed is lower than the PyBoard's.
RunPyXL runs at a lower clock because it is prototyped on an FPGA, while the PyBoard's microcontroller is an ASIC. This is not a limitation of the RunPyXL design, and higher clocks can be achieved.
Since a higher clock is achievable, an apples-to-apples comparison needs to normalize the clock frequencies.
That brings RunPyXL's normalized advantage to ~50x over the PyBoard.
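The normalization can be spelled out as plain arithmetic. The sketch below assumes the PyBoard runs at 168 MHz, the default clock of the PyBoard v1.1's STM32F405; this post does not state that number, so treat it as an assumption. The latency figures are the ones quoted in this post.

```python
RUNPYXL_CLOCK_MHZ = 100       # stated in this post
PYBOARD_CLOCK_MHZ = 168       # ASSUMPTION: PyBoard v1.1 default clock, not stated in this post
RUNPYXL_LATENCY_NS = 480      # measured round trip on RunPyXL
PYBOARD_LATENCY_NS = 15_000   # approximate warmed-up round trip on the PyBoard

# 480 ns at 10 ns per cycle
runpyxl_cycles = RUNPYXL_LATENCY_NS * RUNPYXL_CLOCK_MHZ // 1000

# Wall-clock speedup, ignoring the clock difference
raw_speedup = PYBOARD_LATENCY_NS / RUNPYXL_LATENCY_NS

# Scale by the clock ratio to compare as if both ran at the same frequency
normalized_speedup = raw_speedup * PYBOARD_CLOCK_MHZ / RUNPYXL_CLOCK_MHZ

print(runpyxl_cycles)               # 48
print(f"{raw_speedup:.2f}")         # 31.25
print(f"{normalized_speedup:.1f}")  # 52.5
```

Under that assumption the rounded figures match the ~30x and ~50x quoted above. A different PyBoard clock setting would shift the normalized number proportionally.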
Why don't both tests run the exact same code?
The keen-eyed among you may have noticed in the video that the PyBoard code and the RunPyXL code aren't the same.
Both are Python, obviously, but there are two main differences:
1. API calls for measuring time and reading/writing GPIO pins. This is not CPython running on a host; these are systems that are aware of the underlying hardware and bring their own runtime environment with them.
Each platform has its own hardware access API calls, but regular Python code is still portable between the platforms (as long as they support whatever Python feature you want to use).
2. The PyBoard runs the test in a tight loop to compensate for jitter and a cold cache.
MicroPython running on the PyBoard has runtime jitter: results ranged between 14 and 25 microseconds in my test. So I compared against the PyBoard after significant warm-up, to show how much better RunPyXL is even in that case.
RunPyXL, by contrast, is fully deterministic. As long as the jumper is connected, RunPyXL returns a consistent 480ns every time.
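To see what run-to-run jitter looks like in a software runtime, here is a rough sketch you can run under ordinary CPython on a desktop. It is not the PyBoard test and its numbers are not comparable to the ones above; it just times the same small loop repeatedly and reports the spread. The helper names busy_loop and measure are made up for this illustration.

```python
import time


def busy_loop(n=1000):
    # Fixed amount of work, so any timing variation comes from the runtime and OS
    total = 0
    for i in range(n):
        total += i
    return total


def measure(runs=200):
    # Time the identical workload many times and collect each duration in ns
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        busy_loop()
        samples.append(time.perf_counter_ns() - start)
    return samples


samples = measure()
print("min ns:   ", min(samples))
print("max ns:   ", max(samples))
print("spread ns:", max(samples) - min(samples))
```

On a typical interpreter the spread is rarely zero even though the work is identical each time, which is the kind of variation the warm-up loop on the PyBoard was meant to reduce.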
This makes RunPyXL suitable for real-time use cases.
Big deal, who cares about making a signal go a bit faster?
This isn't just a performance boost; it's an unlock. RunPyXL brings a level of responsiveness and determinism that Python has never had in embedded or real-time contexts.
Python VMs, even those designed for microcontrollers, are still built around software interpreters. That introduces overhead and complexity between your code and the hardware.
RunPyXL removes this barrier. Your Python code is executed directly in hardware. GPIO access is physical. Control flow is predictable. Execution is tight and consistent by design.
With this unlock, RunPyXL can be further developed and adapted to these use cases:
- Real-time control systems in pure Python
- ML inference + sensor response loops with hard timing budgets
- Robotics tasks like motor feedback and sensor fusion with cycle-level precision
- Embedded industrial systems where timing and reliability matter
With RunPyXL, you can write performance-critical code once, in Python, and ship it as-is.
Sounds interesting? Let's talk.
Whether you want to stay updated, share ideas, or discuss potential projects, I'd love to hear from you.
Click here to get in touch, or just email me at: runpyxl@proton.me