A Large-Scale Multimodal Benchmark for Recipe Image and Video Generation
RecipeGen is the first real-world benchmark dataset designed specifically for recipe image and video generation. It addresses the limitations of existing datasets by pairing detailed step-level annotations with the corresponding visual content.
RecipeGen provides a foundation for evaluating Text-to-Image (T2I), Image-to-Video (I2V), and Text-to-Video (T2V) generation models, and its recipes cover diverse culinary traditions.