Victor May

Machine Learning Engineer & Researcher

About Me

Hello, Internet.

I’m a Staff ML Engineer at Google Cloud by day and a researcher by night, working to close the gap between the two. My recent focus has been on AI agents for software engineering: from training to evals to getting them to work in production.

Prior to joining Google, I led a team at Chegg fine-tuning vision-language models (VLMs) for multimodal question answering. Before that, I worked on recommender systems and multilingual NLP at Taboola.

I hold an M.Sc. in Applied Mathematics from Tel Aviv University and a B.Sc. in Computer Science and Mathematics from Bar-Ilan University.

I contribute to open-source AI projects. Most recently, I collaborated with Ontocord on Aurora-M, a multilingual large language model. I also participated in the OpenAssistant initiative by LAION.

Google Scholar | LinkedIn | Resume | X (Twitter)

News

March 2026: Our paper MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources was accepted to the Data-FM workshop at ICLR 2026.

October 2025: Our paper FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration was accepted to the International Conference on Software Engineering (ICSE) 2026.

September 2025: Our papers FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration and GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities were accepted to the NeurIPS 2025 Deep Learning for Code Workshop.

Publications

Blogging

I write about machine learning and related topics on Medium.

Kaggle Competitions