Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
Spandan Garg, Benjamin Steenhoek, Yufan Huang
ArXiv | October 2025, Vol. abs/2510.08996
Monoshi Kumar Roy, Simin Chen, Benjamin Steenhoek, Jinjun Peng, Gail E. Kaiser, Baishakhi Ray, Wei Le
ICLR 2026 | May 2025
Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
DeepTest (ICSE Workshop) | April 2025
Benjamin Steenhoek, Siva Sivaraman, Renata Saldivar, Yevhen Mohylevskyy, Roshanak Zilouchian Moghaddam, Wei Le
2025 International Conference on Software Engineering | April 2025