Towards Synthetic Data Augmentation for Lecture Slide Understanding
SynSlideGen is an LLM-powered synthetic data generation pipeline that creates realistic, annotated lecture slides. Designed to support tasks such as slide element detection and retrieval, it leverages structured text to generate diverse layouts and semantically coherent content. SynSlideGen addresses the scarcity of annotated educational data through scalable automation.
Work under submission. Paper and code will be released soon!
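Since the code is not yet public, the snippet below is only a minimal, illustrative sketch of what a single LLM-guided generation step could look like: prompt an LLM for a structured slide specification, then derive element-level annotations from it. The OpenAI-style chat API, the prompt, the JSON schema, and the fixed layout are all assumptions for illustration, not the actual SynSlideGen implementation.

```python
# Illustrative sketch only: prompt an LLM for a slide spec and derive annotations.
# All names, prompts, and the layout below are hypothetical, not the released pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    'Generate one lecture slide on the topic "{topic}" as JSON with keys: '
    '"title" (string) and "bullets" (list of 3-5 short strings).'
)

def generate_slide_spec(topic: str) -> dict:
    """Ask the LLM for a structured slide specification."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(topic=topic)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def annotate(spec: dict) -> list[dict]:
    """Derive element-level annotations from the spec using a fixed, hypothetical layout."""
    boxes = [{"label": "title", "text": spec["title"], "bbox": [40, 30, 880, 90]}]
    y = 150
    for bullet in spec["bullets"]:
        boxes.append({"label": "bullet", "text": bullet, "bbox": [60, y, 840, 40]})
        y += 60
    return boxes

if __name__ == "__main__":
    spec = generate_slide_spec("Gradient descent")
    print(json.dumps(annotate(spec), indent=2))
```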
Abstract
Lecture slide element detection and retrieval, key tasks in lecture slide understanding, have gained significant attention in the multi-modal research community. However, annotating large volumes of lecture slides for supervised training is labor-intensive and domain-specific. To address this, we propose SynLecSlideGen, a large language model (LLM)-guided synthetic lecture slide generation pipeline that produces high-quality, coherent slides closely resembling real lecture slides; we refer to the resulting dataset as SynSlide. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real slides curated from lecture presentation decks. To evaluate the effectiveness of the SynSlide dataset, we perform few-shot transfer learning on real slides using models pre-trained on our synthetically generated slides. Experimental results show that few-shot transfer learning outperforms training only on the real dataset, especially in low-resource settings, demonstrating that synthetic slides can be a valuable pre-training resource in real-world scenarios where labeled data is scarce.
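As a rough illustration of the few-shot transfer recipe described above (pre-train a detector on SynSlide, then fine-tune on a handful of annotated RealSlide examples), the sketch below uses a torchvision Faster R-CNN as the detector. The detector choice, hyperparameters, `RealSlideDataset`, and checkpoint name are assumptions for illustration; the actual models and training setup will ship with the code release.

```python
# Sketch of few-shot transfer: start from weights pre-trained on synthetic slides,
# then fine-tune on only k_shot annotated real slides. Illustrative, not released code.
import torch
from torch.utils.data import DataLoader, Subset
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def finetune_few_shot(model, real_dataset, k_shot=16, epochs=10, lr=1e-4, device="cuda"):
    """Fine-tune a detector pre-trained on synthetic slides using k_shot real slides."""
    few_shot_set = Subset(real_dataset, range(k_shot))  # assumes the dataset is pre-shuffled
    loader = DataLoader(few_shot_set, batch_size=2, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss_dict = model(images, targets)  # detection losses in training mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# Usage (hypothetical checkpoint and dataset names):
# model = fasterrcnn_resnet50_fpn(num_classes=NUM_SLIDE_ELEMENT_CLASSES)
# model.load_state_dict(torch.load("synslide_pretrained.pth"))
# model = finetune_few_shot(model, RealSlideDataset("RealSlide/"), k_shot=16)
```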