Aurelius: Relation Aware Text-to-Audio Generation At Scale
We present Aurelius, a new framework that enables relation-aware text-to-audio (TTA) generation research at scale. Given the lack of essential audio event and relation corpora, Aurelius contributes a large-scale audio event corpus, AudioEventSet, and a large-scale relation corpus, AudioRelSet. Comprising 110 event categories, AudioEventSet maximally covers commonly heard audio events, and each event is unique, realistic, and of high quality. AudioRelSet consists of 100 relations, comprehensively covering the relations that are present in the physical world or can be neatly described by text. As the two corpora provide audio events and relations independently, they can be combined to create massive