Welcome to “The First Multimodal AI Community Forum”, an online AI UK Fringe event. This event is scheduled from 13:00 to 17:00 (GMT) on Monday, 25th March 2024. The deadline for registration is Thursday, 14th March 2024, 23:59 (GMT).
Multimodal AI, which integrates multiple data modalities such as text, images, and sound, is swiftly revolutionising how we interact with technology and data. At our recent Turing Interest Group event, "The First Multimodal AI Research Sprint" (22nd November), we explored the state of research and methodologies in Multimodal AI across six areas and began writing a perspective paper on multimodal AI. Building on these activities, this online forum aims to bring community members together, from researchers to practitioners, to share their latest interdisciplinary perspectives and pioneering work in Multimodal AI. Our goal is to facilitate the exchange of fresh insights and to foster connections and research progress within the Multimodal AI community.
We welcome researchers, practitioners, and students from anywhere in the world who work on or are interested in Multimodal AI to join us online. We also encourage local community gatherings to watch and discuss the forum together.
The video recording of the Keynote Presentation by Chunyuan Li is given below.
The video recording of the Open Discussion session of the First Multimodal AI Community Forum is given below.
Monday, 25th March 2024
| Time (GMT) | Session | Speaker and Topic |
| --- | --- | --- |
| 13:00 - 13:10 | Opening | Haiping Lu - Welcome and Introduction to the Multimodal AI Community |
| 13:10 - 15:40 | Pitches: Open Source Software | Florence Townend - Fusilli: a Python library for comparing deep fusion models |
| | | Ana Lawry Aguila - multi-view-AE: a Python library of multi-view autoencoder methods |
| | Healthcare and Medicine | Jinge Wu - Introduction to Multimodal AI for Healthcare and Medicine |
| | | Peter Charlton - Using Multimodal AI to Identify Determinants of Health and Wellbeing |
| | | Halimat Afolabi - Med-Image Bind |
| | | Dheeraj Giri - AI in Healthcare Needs Multimodal AI for Precision |
| | | Farzana Patel - Using Multimodal AI Models to Support Clinical Diagnosis and Assessments |
| | | Jinge Wu - Vision-Language Model in Radiology |
| | | Jiachen Luo - Multimodal Emotion Recognition, Healthcare |
| | | Kerf Tan - Multimodal Consolidation of Health Data, Breaking the Silos, Fast |
| | | Anthony Hughes - Causal Graph Discovery Using Multimodal Models |
| | Q&A and Break | |
| | Social Science and Humanities | Cyndie Demeocq - Introduction to Multimodal AI for Social Science and Humanities |
| | | Shashank Shekhar - Unlock Relationships in Multi-Modal Data Using Knowledge Graphs |
| | | Shitong Sun - Compositional Multimodal Learning |
| | | Masoumeh Chapariniya - Computational Analysis of Personal Identity in Interaction, Recognition and Ethics |
| | | Teodora Vukovic - VIAN-DH: Software for Multimodal Corpora Created by LiRI in Zurich |
| | | Ann Van De Velde - Scientific Illustration and Banned Words in Midjourney and DALL-E |
| | | Lucia Cipolina-Kun - The Reading of the Herculaneum Papyri Using AI |
| | | Shan Wikoon - Agency in Building a Near-Human Multimodal AI Teacher |
| | Q&A and Break | |
| | Engineering | Xianyuan Liu - Introduction to Multimodal AI for Engineering |
| | | Ehsan Nowroozi - Validating the Robustness of Cybersecurity Models |
| | | Chao Zhang - Multimodal Learning in Embodied Applications |
| | | Yao Zhang - AI in Control for Maritime Engineering |
| | Science | Yuhan Wang - Introduction to Multimodal AI for Science |
| | | Ashwath Shetty - Alt-Text Generation for Images Using Multimodal LLM |
| | | Sogol Haghighat - Empowering AI Solutions: HPC Access, Research, and Consulting for Multimodal Models |
| | Q&A and Break | |
| | Environment and Sustainability | Nataliya Tkachenko - Introduction to Multimodal AI for Environment and Sustainability |
| | | Sachin Gaur - Approaches to Understand and Reduce Search Space for Video Data |
| | | Natalia Efremova - AI for Agriculture |
| | Finance and Economics | Arunav Das - Introduction to Multimodal AI for Finance and Economics |
| | | Yan Ge - Multimodal Multi-Task Asset Pricing with Token-Level Numeral Learning |
| | Q&A and Break | |
| 15:40 - 16:20 | Keynote Presentation | Chunyuan Li - LLaVA: A Vision-Language Approach to Computer Vision in the Wild (abstract below) |
| 16:20 - 17:00 | Open Discussion and Conclusions | |

Keynote abstract: The future of AI lies in creating systems, such as foundation models, that are pre-trained once and can handle countless downstream tasks directly (zero-shot) or adapt to new tasks quickly (few-shot). In this talk, I will discuss our vision-language approach to achieving "Computer Vision in the Wild" (CVinW): building a transferable computer vision (CV) system that can effortlessly generalize to a wide range of visual recognition tasks in the wild. I will dive into the Large Language-and-Vision Assistant (LLaVA) and its series, the first open-source project to exhibit GPT-4V-level capabilities in image understanding and reasoning, and demonstrate a promising path to building customizable large multimodal models that follow human intent at an affordable cost.
This event is brought to you by the Turing Interest Group on Meta-Learning for Multimodal Data (you are welcome to sign up) and the Multimodal AI Community (you are welcome to subscribe to our Google Group), with support from the Centre for Machine Intelligence at the University of Sheffield.