Welcome to “The First Multimodal AI Community Forum”, an online AI UK Fringe event. This event is scheduled from 13:00 to 17:00 (GMT) on Monday, 25th March 2024. The deadline for registration is Thursday, 14th March 2024, 23:59 (GMT).
Multimodal AI, which integrates multiple data modalities such as text, images, and sound, is swiftly revolutionising how we interact with technology and data. At our recent Turing Interest Group event, "The First Multimodal AI Research Sprint" (22nd November), we explored the state of research and methodologies in Multimodal AI across six areas and began writing a perspective paper on multimodal AI. Building on these activities, this online forum aims to bring community members together, from researchers to practitioners, to share their latest interdisciplinary perspectives and pioneering work in Multimodal AI. Our goal is to facilitate the exchange of fresh insights and to foster connections and research progress within the Multimodal AI community.
We welcome researchers, practitioners, and students from anywhere in the world who work on or are interested in Multimodal AI to join us online. We also encourage local community gatherings to watch and discuss the forum together.
The video recording of the Keynote Presentation by Chunyuan Li is given below.
The video recording of the Open Discussion session of the First Multimodal AI Community Forum is given below.
Monday, 25th March 2024
| Time (GMT) | Session | Speaker and Topic |
| --- | --- | --- |
| 13:00 - 13:10 | Opening | Haiping Lu - Welcome and Introduction to the Multimodal AI Community |
| 13:10 - 15:40 | Pitches: Open Source Software | Florence Townend - Fusilli: a Python library for comparing deep fusion models |
| | | Ana Lawry Aguila - multi-view-AE: a Python library of multi-view autoencoder methods |
| | Healthcare and Medicine | Jinge Wu - Introduction to Multimodal AI for Healthcare and Medicine |
| | | Peter Charlton - Using Multimodal AI to Identify Determinants of Health and Wellbeing |
| | | Halimat Afolabi - Med-Image Bind |
| | | Dheeraj Giri - AI in Healthcare Needs Multimodal AI for Precision |
| | | Farzana Patel - Using Multimodal AI Models to Support Clinical Diagnosis and Assessments |
| | | Jinge Wu - Vision-Language Model in Radiology |
| | | Jiachen Luo - Multimodal Emotion Recognition, Healthcare |
| | | Kerf Tan - Multimodal Consolidation of Health Data, Breaking the Silos, Fast |
| | | Anthony Hughes - Causal Graph Discovery Using Multimodal Models |
| | Q&A and Break | |
| | Social Science and Humanities | Cyndie Demeocq - Introduction to Multimodal AI for Social Science and Humanities |
| | | Shashank Shekhar - Unlock Relationships in Multi-Modal Data Using Knowledge Graphs |
| | | Shitong Sun - Compositional Multimodal Learning |
| | | Masoumeh Chapariniya - Computational Analysis of Personal Identity in Interaction, Recognition and Ethics |
| | | Teodora Vukovic - VIAN-DH: Software for Multimodal Corpora Created by LiRI in Zurich |
| | | Ann Van De Velde - Scientific Illustration and Banned Words in Midjourney and DALL-E |
| | | Lucia Cipolina-Kun - The Reading of the Herculaneum Papyri Using AI |
| | | Shan Wikoon - Agency in Building a Near-Human Multimodal AI Teacher |
| | Q&A and Break | |
| | Engineering | Xianyuan Liu - Introduction to Multimodal AI for Engineering |
| | | Ehsan Nowroozi - Validating the Robustness of Cybersecurity Models |
| | | Chao Zhang - Multimodal Learning in Embodied Applications |
| | | Yao Zhang - AI in Control for Maritime Engineering |
| | Science | Yuhan Wang - Introduction to Multimodal AI for Science |
| | | Ashwath Shetty - Alt-Text Generation for Images Using Multimodal LLM |
| | | Sogol Haghighat - Empowering AI Solutions: HPC Access, Research, and Consulting for Multimodal Models |
| | Q&A and Break | |
| | Environment and Sustainability | Nataliya Tkachenko - Introduction to Multimodal AI for Environment and Sustainability |
| | | Sachin Gaur - Approaches to Understand and Reduce Search Space for Video Data |
| | | Natalia Efremova - AI for Agriculture |
| | Finance and Economics | Arunav Das - Introduction to Multimodal AI for Finance and Economics |
| | | Yan Ge - Multimodal Multi-Task Asset Pricing with Token-Level Numeral Learning |
| | Q&A and Break | |
| 15:40 - 16:20 | Keynote Presentation | Chunyuan Li - LLaVA: A Vision-Language Approach to Computer Vision in the Wild (abstract below) |
| 16:20 - 17:00 | Open Discussion and Conclusions | |

Keynote abstract: The future of AI lies in creating systems, such as foundation models, that are pre-trained once and can handle countless downstream tasks directly (zero-shot) or adapt to new tasks quickly (few-shot). In this talk, I will discuss our vision-language approach to achieving "Computer Vision in the Wild" (CVinW): building a transferable computer vision (CV) system that can effortlessly generalize to a wide range of visual recognition tasks in the wild. I will dive into the Large Language-and-Vision Assistant (LLaVA) and its series, the first open-source project to exhibit GPT-4V-level capabilities in image understanding and reasoning, and demonstrate a promising path to building customizable large multimodal models that follow human intent at an affordable cost.
This event is brought to you by the Turing Interest Group on Meta-Learning for Multimodal Data (you are welcome to sign up) and the Multimodal AI Community (you are welcome to subscribe to our Google Group), with support from the Centre for Machine Intelligence at the University of Sheffield.