SafeAgents
A unified framework for building and evaluating safe multi-agent systems. SafeAgents provides a simple, framework-agnostic API for creating multi-agent systems with built-in safety evaluation, attack detection, and support for multiple agentic frameworks (Autogen, LangGraph, OpenAI…
BusyBox
BusyBox is a physical 3D-printable device for benchmarking affordance generalization in robot foundation models. Please check out our website for more details. To fully build an instrumented BusyBox capable…
TestExplora
This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation”. It supports baseline evaluation with the prompts described in the paper. TestExplora…