MINERVA Online Journal Club #2: VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
July 7 @ 16:00 – 17:00
We’ll return with the second installment of the online journal club. In this meeting, we will look together at (VL-)JEPA [1], a model from Yann LeCun’s research agenda around Joint Embedding Predictive Architectures. Our team from the Tübingen AI Center will host and lead the discussion. The session is open to all and free of charge. We encourage you to read the paper before joining, but it is not a strict requirement. If you have specific questions you would like discussed during the session, send them to minerva@tuebingen.ai.
(VL-)JEPA [1] is an approach to improving vision-language models with a predictive architecture. Instead of generating text tokens directly, the model predicts continuous embeddings that represent abstract semantic concepts. The intuition is that this makes the model focus on task-relevant meaning and abstract away surface-level linguistic variation, leading to more efficient and semantically richer representations.
We look forward to seeing many of you online and diving deep into these research trends.
[1] https://openreview.net/forum?id=tjimrqc2BU
