Yuan Si | Research Assistant

About

I am a research assistant at the University of Waterloo working on AI for software engineering. I combine large language models, multimodal signals (video, execution traces, code), and formal methods to debug, tutor, evaluate, and verify block-based programs. I have 9 first-authored papers, including VisionScratch (published at FSE 2026) and ScratchEval (accepted at ISSTA 2026). My recent work covers behavioral equivalence, schedule robustness, certificate-carrying transformation, and the robustness of programming by example. On the side, I work on analytic number theory and probability (4 preprints). My paper on the Collatz affine random model is under review at Forum of Mathematics, Sigma. The first paper in a three-paper series on rational-distance problems, on Guy's four-corner problem (D19), is under review at the Journal of Number Theory. I previously worked at Microsoft Azure & AI Research.

Research Interests

LLM-based Program Repair · Multimodal Debugging · AI for Computing Education · Evaluation of LLMs · Program Analysis & Verification · Analytic Number Theory · Probability Theory

Publications

Conference Papers

VisionScratch: LLM-Based Automated Feedback Generation using Code-Produced Videos for Scratch Programs

Yuan Si, Daming Li, Hanyuan Shi, Jialu Zhang

Proceedings of the ACM on Software Engineering, Vol. 3 (FSE 2026), pp. 3534–3557 · Published

ACM DL arXiv

ScratchEval: A Multimodal Evaluation Framework for LLMs in Block-Based Programming

Yuan Si, Simeng Han, Daming Li, Hanyuan Shi, Jialu Zhang

ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) 2026 · Accepted

arXiv

Journal Submissions

A Microcanonical Phase Transition for the Collatz Affine Random Model

Yuan Si

Submitted to Forum of Mathematics, Sigma, 2026 · Under Review. Sharp resonant phase transition at the entropy line for Tao's Syracuse affine model, reducing hard-frequency mixing to a primitive ternary Bernoulli-bridge transform.

Zenodo

Mixed Parity, Diagonal Denominator, and the Pell-Chord Genus-Five Obstruction for the Four-Corner Rational Distance Problem

Yuan Si

Submitted to Journal of Number Theory, 2026 · Under Review. Unconditional necessary conditions on Guy's problem D19 (Pillai's unit-square four-distance problem), and a reduction of the residual obstruction to a non-fixed Pell-chord involution on a family of arithmetic-genus-five curves.

Zenodo Code

Preprints

Fixed-Set Robustness in Programming by Example: Example Corruption and Semantic Partition Recovery

Yuan Si, Jialu Zhang

Preprint, 2026

arXiv

Checked Program Recovery from Execution Video: A Sound Oracle for Untrusted Generators

Yuan Si, Jialu Zhang

Preprint, 2026

arXiv

SchedCheck: Schedule-Robustness Analysis for Event-Driven Block Programs

Yuan Si, Jialu Zhang

Preprint, 2026

arXiv

Certificate-Carrying Transformation of Event-Driven Block Programs

Yuan Si, Jialu Zhang

Preprint, 2026 — soundness proof mechanized in Lean; validated on 300 real Scratch projects

arXiv

ScratchLens: Lens-Parametric Behavioral Equivalence for Scratch Programs

Yuan Si, Jialu Zhang

Preprint, 2026

arXiv

EcoScratch: Cost-Effective Multimodal Repair for Scratch Using Execution Feedback

Yuan Si, Ming Wang, Daming Li, Hanyuan Shi, Jialu Zhang

Preprint, 2026 · Under Review

arXiv

Stitch: Step-by-step LLM Guided Tutoring for Scratch

Yuan Si, Kyle Qi, Daming Li, Hanyuan Shi, Jialu Zhang

Preprint, 2025 · Under Review

arXiv

Elliptic Decomposition of the Pell-Chord Genus-Five Obstruction for the Four-Corner Rational Distance Problem

Yuan Si

Preprint, 2026 — structural follow-up to Paper I: decomposes the residual genus-five curve into full-2-torsion elliptic pieces via 2-isogeny and Kani–Rosen Jacobian factorization; reduces the four-corner problem to a two-variable Pythagorean-slope exclusion

Zenodo

An Elementary Projection Obstruction for Rational Distances to Regular Polygons

Yuan Si

Preprint, 2026 — an elementary proof that a regular n-gon (n ≥ 5, n ≠ 6) with rational side length admits no point at rational distance from all vertices, via projection identities, the vertex zero-sum relation, and Niven's theorem

Zenodo

Research Reports & Other

Multiplayer Rock-Paper-Scissors: Nash Equilibria via Linear Programming

Yuan Si

Research Report, 2025

PDF

Tesla Charging Station Optimization via Independent Dominating Sets

Yuan Si

Research Report, 2025

PDF

Public Goods Game: Cooperation Dynamics and Intervention Analysis

Yuan Si

Research Report, 2025

PDF

Textbook

A Gentle Introduction to Optimization

Yuan Si

2024 — Adopted as required reading in 3 university courses

PDF

Experience

University Researcher — AI in Multidimensional Input

July 2025 – Present

University of Waterloo

Led 9 first-authored papers on LLM-driven debugging, tutoring, evaluation, and formal verification for Scratch (VisionScratch published at FSE 2026, ScratchEval accepted at ISSTA 2026).
Built multimodal repair systems fusing gameplay video, execution feedback, and project JSON to localize bugs and synthesize fixes via LLM-guided loops.
Developed an interactive tutoring system (Stitch) and a 100-project executable benchmark (ScratchEval) for evaluating LLM repair quality on block-based code.

University Researcher — Game Theory & Graph Theory

Dec. 2024 – Mar. 2025

University of Waterloo, Combinatorics & Optimization Department

Derived Nash equilibria for multiplayer Rock-Paper-Scissors via linear programming. [report]
Modeled optimal Tesla charging station placement using independent dominating sets. [report]
Analyzed cooperation dynamics in Public Goods Games. [report]
Investigated strategic dynamics in graph-based games (Cops and Robbers).

Microsoft Researcher — AI Trends

May 2024 – Aug. 2024

Microsoft — Azure & AI, Research

Investigated AI scribe technologies for healthcare; synthesized findings into research recommendations for physician workflow automation.
Designed and evaluated improvements to an insurance chatbot through systematic analysis of user interaction data.

Projects

ResearchOS

Local-first research decision and execution system. Turns a research brief into runnable experiments, evidence-checked claims, and a packaged manuscript draft. Features provider-agnostic LLM adapters (OpenAI / Anthropic / mock), a dual-agent code worker, human-in-the-loop (HITL) approval gates, and an encrypted secret store.

Python · FastAPI · React · TypeScript · LLM · AGPL-3.0

WizardingWorld

A Harry Potter Hogwarts experience mod for Terraria, built on tModLoader. 590 C# files, 12 multi-phase bosses, a Pensieve memory replay framework, and three-language localization. Built with Claude Code + Codex in a dual-agent development workflow.

C# · .NET 8 · tModLoader · MIT

DevToolkit

43 zero-dependency, single-file Python CLI tools across 9 categories, including web, data, process, MCP, security, and scaffolding. Copy any file and run it.

Python · CLI · MCP