First Workshop on Multimodal AI

MultimodalAI'23 photos are now available to view and download!

Join the Multimodal AI Community mailing list by subscribing to our Google Group.

Multimodal AI combines multiple types of data (e.g., image, text, audio) via machine learning models and algorithms to achieve better performance. It is key to AI research and to applications including healthcare, net zero, finance, robotics, and manufacturing, yet multimodal AI in these areas is challenging due to the inherent complexity of data integration and the limited availability of labelled data. Meanwhile, unimodal AI, which works with a single type of data input, is maturing at an accelerating pace, creating vast opportunities for tackling multimodal AI challenges.

MultimodalAI’23 brings together researchers and practitioners from AI, data science, and various scientific and application domains to discuss problems and challenges, share experiences and solutions, explore collaborations and future directions, and build networks and a vibrant community on multimodal AI. We have three keynote speakers covering academic research, industrial research, and industrial applications: Professor Mirella Lapata (University of Edinburgh, UKRI Turing AI World-Leading Researcher Fellow), Dr Yutian Chen (Google DeepMind, AlphaGo Developer), and Dr Chew-Yean Yam (Microsoft, Principal Data and Applied Scientist).

We offer participants opportunities to give 3-minute pitches and present posters, with four prizes (£150 each) in total for the best pitches and best posters. You may submit a proposal for a pitch and/or a poster when you register. We will confirm accepted pitches and posters in the week ending June 17th.

If you require accessibility assistance for this event, have any other special requirements, or would like to discuss your needs with the organising team, please contact us. We will do our best to meet your needs so that you can fully participate in this event.

Join this interdisciplinary event to help create a diverse community that shapes and builds the future of multimodal AI research and development.

You are welcome to share the workshop flyer (PDF) with your network.

Keynote Speakers

Yutian Chen

Staff Research Scientist, Google DeepMind, AlphaGo Developer

Mirella Lapata

Professor of Natural Language Processing, University of Edinburgh & UKRI Turing AI World-Leading Researcher Fellow

Chew-Yean Yam

Principal Data and Applied Scientist, Microsoft

Final Programme (In-Person Only)

Tuesday, 27th June 2023

Time    Event
09:30 - 10:00    Registration and Morning Refreshments ☕
10:00 - 10:10    Welcome and Introduction: Haiping Lu, The University of Sheffield 🎤 (YouTube Video)
10:10 - 10:50    Keynote 1: Yutian Chen, Google DeepMind (YouTube Video)
    Title: Learning generalizable models on large scale multi-modal data
Abstract: The abundant spectrum of multi-modal data provides a significant opportunity for augmenting the training of foundational models beyond mere text. In this talk, I will introduce two lines of work that leverage large-scale models, trained on Internet-scale multi-modal datasets, to achieve good generalization performance. The first work trains an audio-visual model on YouTube datasets of videos and enables automatic video translation and dubbing. The model is able to learn the correspondence between audio and visual features, and use this knowledge to translate videos from one language to another. The second work trains a multi-modal, multi-task, multi-embodiment generalist policy on a massive collection of simulated control tasks, vision, language, and robotics. The model is able to learn to perform a variety of tasks, including controlling a robot arm, playing a game, and translating text. Both lines of work exhibit the potential future trajectory of foundational models, highlighting the transformative power of integrating multi-modal inputs and outputs.
10:50 - 11:30    Morning 3-Minute Pitches 📣
11:30 - 12:00    Break and Poster Session 1 ☕ 🖼️
12:00 - 12:40    Keynote 2: Mirella Lapata, The University of Edinburgh (YouTube Video)
    Title: Hierarchical3D Adapters for Long Video-to-text Summarization
Abstract: In this talk I will focus on video-to-text summarization and discuss how to best utilize multimodal information for summarizing long inputs (e.g., an hour-long TV show) into long outputs (e.g., a multi-sentence summary). We extend SummScreen (Chen et al., 2021), a dialogue summarization dataset consisting of transcripts of TV episodes with reference summaries, and create a multimodal variant by collecting corresponding full-length videos. We incorporate multimodal information into a pretrained textual summarizer efficiently using adapter modules augmented with a hierarchical structure while tuning only 3.8% of model parameters. Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
12:40 - 12:45    Group Photos
12:45 - 14:00    Lunch & Poster Session 2 🍽️ 🖼️
14:00 - 14:40    Keynote 3: Chew-Yean Yam, Microsoft (YouTube Video)
    Title: Us and AI: Redefining our relationships with AI
Abstract: The rapid advancement of AI has transformed how we interact with intelligent machines. Unravel the dynamic shifts in human-AI relations across diverse roles that we play in our society. Spark your imagination and seize the power to sculpt this new relationship that is meaningful to you.
14:40 - 15:20    Afternoon 3-Minute Pitches 📣
15:20 - 15:25    Final Voting for Best Pitch/Poster and Best Student Pitch/Poster Prizes
15:25 - 16:00    Panel Discussion 💬
  • What breakthroughs in multimodal AI do you foresee having the most significant impact in the next five years?
  • How can we navigate and mitigate the ethical concerns associated with advancing multimodal AI technologies?
  • Given the interdisciplinary nature of multimodal AI, how can we better integrate different fields of expertise to accelerate innovation in this area?
16:00 - 16:10    Prize Winner Announcement and Closing Remarks: Haiping Lu 🏆
16:10 - 17:00    Tea/Coffee and Networking Forums 🍵 ☕
  • Envisioning MultimodalAI'24
  • Boosting Engagement and Active Participation in Multimodal AI
  • Cross-Disciplinary Collaboration and Resource Sharing in Multimodal AI
  • Open-Source Software Development for Multimodal AI
  • Ethical and Responsible Practices in Multimodal AI

Morning Pitch Session

Name    Title
Xingchi Liu    A Gaussian Process Method for Ground Vehicle Classification using Acoustic Data
Adam Wynn    BETTER: An automatic feedBack systEm for supporTing emoTional spEech tRaining (slides)
Yizhi Li    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Rui Zhu    KnowWhereGraph: A Geospatial Knowledge Graph to Support Cross-Domain Knowledge Discovery
Imene Tarakli    Robots as Schoolmates for Enhanced Adaptive Learning
Sina Tabakhi    From Multimodal Learning to Graph Neural Networks (slides)
Nitisha Jain    Semantic Interpretations of Multimodal Embeddings towards Explainable AI

Afternoon Pitch Session

Name    Title
Lucia Cipolina-Kun    Diffusion Models for the Restoration of Cultural Heritage (slides)
Bohua Peng    Recent Findings of Foundational Models on Multimodal NLU
Thao Do    Social Media Mining and Machine Learning for Understanding Illegal Wildlife Trade in Vietnam (slides)
Sam Barnett    The Genomics England Multimodal Research Environment for Medical Research
Luigi A. Moretti    Can AI be effectively implemented to help treat Anxiety Disorders (ADs)? (slides)
Mohammod Suvon    The Multimodal Lens: Understanding of Radiologists' Visual Search Behavior Patterns (slides)
Ning Ma    Obstructive Sleep Apnoea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach (slides)
Felix Krones    Multimodal Cardiomegaly Classification with Image-Derived Digital Biomarkers (slides)

Posters

Name    Title
Abdulsalam Alsunaidi    Predicting Actions in Images using Distributed Lexical Representations
Bohua Peng    Recent Findings of Foundational Models on Multimodal NLU
Chenghao Xiao    Adversarial Length Attack to Vision-Language Models
Chenyang Wang    A Novel Multimodal AI Model for Generating Hospital Discharge Instruction
Christoforos Galazis    High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network
Douglas Amoke    Multimodal Data and AI for Downstream Tasks
Jayani Bhatwadiya    Multimodal AI for Cancer Detection and Diagnosis: A Study on the Cancer Imaging Archive (TCIA) Dataset
Jiachen Luo    Cross-Modal Fusion Techniques for Emotion Recognition from Text and Speech
Jingkun Chen    Semi-Supervised Unpaired Medical Image Segmentation Through Task-Affinity Consistency
Lucia Cipolina-Kun    Diffusion Models for the Restoration of Cultural Heritage (poster)
Luigi A. Moretti    Can AI be effectively implemented to help treat Anxiety Disorders (ADs)? (poster)
Nitisha Jain    Semantic Interpretations of Multimodal Embeddings towards Explainable AI
Prasun Tripathi    Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI
Raja Omman Zafar    Digital Twin for Homecare
Sokratia Georgaka    CellPie: A Fast Spatial Transcriptomics Topic Discovery Method via Joint Factorization of Gene Expression and Imaging Data
Wei Xing    Multi-Fidelity Fusion (poster)
Yichen He    AI in Evolution and Ecology (poster)
Yixuan Zhu    Potential Multimodal AI for Electroencephalogram (EEG) Analysis (poster)
Yizhi Li    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training (poster)
Yu Hon On    Automatic Aortic Valve Disease Detection from MRI with Spatio-Temporal Attention Maps

Organising Committee

Haiping Lu

Head of AI Research Engineering, Professor of Machine Learning, and Turing Academic Lead

David Clifton

Professor of Clinical Machine Learning, Department of Engineering Science, University of Oxford & Turing Fellow, Alan Turing Institute

Shuo Zhou

Deputy Head of AI Research Engineering, and Academic Fellow in Machine Learning

Tingting Zhu

Associate Professor in AI for Digital Health, University of Oxford

Peter Charlton

British Heart Foundation Research Fellow, University of Cambridge

Tingyan (Tina) Wang

Postdoctoral Scientist, University of Oxford

Lei Lu

Postdoctoral Research Assistant, University of Oxford

Zhixiang Chen

Lecturer in Machine Learning, Department of Computer Science

Emma Barker

Centre Manager, Centre for Machine Intelligence

Sina Tabakhi

PhD Student, Department of Computer Science

Technical Support

Jayani Bhatwadiya

Data Analyst, Nuffield Department of Population Health, University of Oxford

Mohammod Suvon

AI Research Engineer

Sponsors and Partners

This workshop is jointly organised by the University of Sheffield and the University of Oxford under Turing Network funding from the Alan Turing Institute, with support from the University of Sheffield's Centre for Machine Intelligence and the Alan Turing Institute's Interest Group on Meta-learning for Multimodal Data (you are welcome to sign up and join).


Disclaimer: This event is supported by The Alan Turing Institute. The Turing is not involved in the agenda or content planning.

Contact Us