First Multimodal AI Community Forum

Welcome to “The First Multimodal AI Community Forum”, an online AI UK Fringe event. This event is scheduled from 13:00 to 17:00 (GMT) on Monday, 25th March 2024. The deadline for registration is Thursday, 14th March 2024, 23:59 (GMT).


Multimodal AI, which integrates data modalities such as text, images, sound, and others, is rapidly transforming how we interact with technology and data. At our recent Turing Interest Group event (22nd November), "The First Multimodal AI Research Sprint", we explored the state of research and the methodologies of Multimodal AI across six areas and began writing a perspective paper on the field. Building on these activities, this online forum aims to bring together community members, from researchers to practitioners, to share their latest interdisciplinary perspectives and pioneering work in Multimodal AI. Our goal is to facilitate the exchange of fresh insights and to foster connections and research progress within the Multimodal AI community.

We welcome researchers, practitioners, and students engaged in or interested in Multimodal AI from anywhere in the world to join us online. We also encourage the organisation of local community gatherings to watch and discuss the forum together.

The video recording of the Keynote Presentation by Chunyuan Li is given below.

The video recording of the Open Discussion session of the First Multimodal AI Community Forum is given below.

Keynote Speaker


Chunyuan Li

Research Lead at ByteDance/TikTok, (Co-)Lead Developer of LLaVA, Former Principal Researcher at Microsoft Research, Redmond

Final Programme

Monday, 25th March 2024

Time (GMT)    Event
13:00 - 13:10    Opening
    Haiping Lu - Welcome and Introduction to the Multimodal AI Community
13:10 - 15:40    Pitches
    Open Source Software
        Florence Townend - Fusilli: a Python library for comparing deep fusion models
        Ana Lawry Aguila - multi-view-AE: a Python library of multi-view autoencoder methods
    Healthcare and Medicine
        Jinge Wu - Introduction to Multimodal AI for Healthcare and Medicine
        Peter Charlton - Using Multimodal AI to Identify Determinants of Health and Wellbeing
        Halimat Afolabi - Med-Image Bind
        Dheeraj Giri - AI in Healthcare Needs Multimodal AI for Precision
        Farzana Patel - Using Multimodal AI Models to Support Clinical Diagnosis and Assessments
        Jinge Wu - Vision-Language Model in Radiology
        Jiachen Luo - Multimodal Emotion Recognition, Healthcare
        Kerf Tan - Multimodal Consolidation of Health Data, Breaking the Silos, Fast
        Anthony Hughes - Causal Graph Discovery Using Multimodal Models
    Q&A and Break
    Social Science and Humanities
        Cyndie Demeocq - Introduction to Multimodal AI for Social Science and Humanities
        Shashank Shekhar - Unlock Relationships in Multi-Modal Data Using Knowledge Graphs
        Shitong Sun - Compositional Multimodal Learning
        Masoumeh Chapariniya - Computational Analysis of Personal Identity in Interaction, Recognition and Ethics
        Teodora Vukovic - VIAN-DH - Software for Multimodal Corpora Created by LiRI in Zurich
        Ann Van De Velde - Scientific Illustration and Banned Words in Midjourney and DALL·E
        Lucia Cipolina-Kun - The Reading of the Herculaneum Papyri Using AI
        Shan Wikoon - Agency in Building Near-Human Multimodal AI Teacher
    Q&A and Break
    Engineering
        Xianyuan Liu - Introduction to Multimodal AI for Engineering
        Ehsan Nowroozi - Validating the Robustness of Cybersecurity Models
        Chao Zhang - Multimodal Learning in Embodied Applications
        Yao Zhang - AI in Control for Maritime Engineering
    Science
        Yuhan Wang - Introduction to Multimodal AI for Science
        Ashwath Shetty - Alt-Text Generation for Images Using Multimodal LLMs
        Sogol Haghighat - Empowering AI Solutions: HPC Access, Research, and Consulting for Multimodal Models
    Q&A and Break
    Environment and Sustainability
        Nataliya Tkachenko - Introduction to Multimodal AI for Environment and Sustainability
        Sachin Gaur - Approaches to Understand and Reduce Search Space for Video Data
        Natalia Efremova - AI for Agriculture
    Finance and Economics
        Arunav Das - Introduction to Multimodal AI for Finance and Economics
        Yan Ge - Multimodal Multi-Task Asset Pricing with Token-Level Numeral Learning
    Q&A and Break
15:40 - 16:20    Keynote Presentation by Chunyuan Li
    Title: LLaVA: A Vision-Language Approach to Computer Vision in the Wild
    Abstract: The future of AI lies in creating systems, such as foundation models, that are pre-trained once and can handle countless downstream tasks directly (zero-shot) or adapt to new tasks quickly (few-shot). In this talk, I will discuss our vision-language approach to achieving “Computer Vision in the Wild (CVinW)”: building a transferable system in computer vision (CV) that can effortlessly generalize to a wide range of visual recognition tasks in the wild. I will dive into the Large Language-and-Vision Assistant (LLaVA) and its series, the first open-source project to exhibit GPT-4V-level capabilities in image understanding and reasoning, and demonstrate a promising path towards building customizable large multimodal models that follow human intent at an affordable cost.
16:20 - 17:00    Open Discussion and Conclusions
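
For attendees who would like to experiment with the LLaVA models featured in the keynote, the sketch below shows one possible way to query a publicly released LLaVA 1.5 checkpoint through the Hugging Face transformers library. It is a minimal illustration only: the checkpoint name, example image URL, prompt format, and generation settings are assumptions for demonstration and are not drawn from the talk.

    # Minimal, hedged sketch: zero-shot image question answering with an open
    # LLaVA 1.5 checkpoint via Hugging Face transformers (assumed setup, not
    # the speaker's own code).
    import requests
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"  # assumed public checkpoint from the LLaVA series
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    # Any test image will do; this URL is only a placeholder.
    url = "https://example.com/sample.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # LLaVA 1.5 expects the image token inside a USER/ASSISTANT-style prompt.
    prompt = "USER: <image>\nDescribe this image and anything unusual in it. ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

    # Generate a free-form answer grounded in the image (zero-shot, no fine-tuning).
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(output_ids[0], skip_special_tokens=True))
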

Acknowledgement

This event is brought to you by the Turing Interest Group on Meta-Learning for Multimodal Data (you are welcome to sign up) and the Multimodal AI Community (you are welcome to subscribe to our Google Group), supported by the Centre for Machine Intelligence at the University of Sheffield.
