draft version

requirements:
- simple (minimal) code
- no bloat
- optimized where it matters (trade the last bits of performance for simplicity), both for:
  - speed (multi-GPU, bubbling, etc.)
  - footprint (no astronomical RAM usage or rootfs writes)

target machine specs (for reference only, as a demonstration; you shouldn't need that much CPU/RAM):
- 4x H100
- 1 TB RAM, 102 cores