First Workshop on Multimodal AI

MultimodalAI'23 photos are now available to view and download!

Join the Multimodal AI Community mailing list by subscribing to our Google Group.

Multimodal AI combines multiple types of data (e.g., image, text, audio) via machine learning models and algorithms to achieve better performance. It is key to AI research and to applications including healthcare, net zero, finance, robotics, and manufacturing, yet multimodal AI in these areas is challenging due to the inherent complexity of data integration and the limited availability of labelled data. Meanwhile, unimodal AI, which works with a single type of data input, is maturing at an accelerating pace, creating vast opportunities for tackling multimodal AI challenges.

MultimodalAI’23 brings together researchers and practitioners from AI, data science, and various scientific and application domains to discuss problems and challenges, share experiences and solutions, explore collaborations and future directions, and build networks and a vibrant community on multimodal AI. We have three keynote speakers covering academic research, industrial research, and industrial applications: Professor Mirella Lapata (University of Edinburgh, UKRI Turing AI World-Leading Researcher Fellow), Dr Yutian Chen (Google DeepMind, AlphaGo Developer), and Dr Chew-Yean Yam (Microsoft, Principal Data and Applied Scientist).

We offer participants opportunities to give 3-minute pitches and present posters, with four prizes (£150 each) in total for the best pitches and best posters. You may submit a proposal for a pitch and/or a poster when you register. We will confirm accepted pitches and posters in the week ending June 17th.

If you require accessibility assistance for this event, have any other special requirements, or would like to discuss your needs with the organising team, please contact us. We will do our best to meet your needs so that you can fully participate in this event.

Join this interdisciplinary event to help create a diverse community that shapes and builds the future of multimodal AI research and development.

You are welcome to share the workshop flyer (PDF) with your network.

Keynote Speakers

Yutian Chen

Staff Research Scientist, Google DeepMind, AlphaGo Developer

Mirella Lapata

Professor of Natural Language Processing, University of Edinburgh & UKRI Turing AI World-Leading Researcher Fellow

Chew-Yean Yam

Principal Data and Applied Scientist, Microsoft

Final Programme (In-Person Only)

Tuesday, 27th June 2023

Time    Event
09:30 - 10:00    Registration and Morning Refreshments ☕
10:00 - 10:10    Welcome and Introduction: Haiping Lu, The University of Sheffield 🎤 (YouTube Video)
10:10 - 10:50    Keynote 1: Yutian Chen, Google DeepMind (YouTube Video)
    Title: Learning generalizable models on large scale multi-modal data
Abstract: The abundant spectrum of multi-modal data provides a significant opportunity for augmenting the training of foundational models beyond mere text. In this talk, I will introduce two lines of work that leverage large-scale models, trained on Internet-scale multi-modal datasets, to achieve good generalization performance. The first work trains an audio-visual model on YouTube datasets of videos and enables automatic video translation and dubbing. The model is able to learn the correspondence between audio and visual features, and use this knowledge to translate videos from one language to another. The second work trains a multi-modal, multi-task, multi-embodiment generalist policy on a massive collection of simulated control tasks, vision, language, and robotics. The model is able to learn to perform a variety of tasks, including controlling a robot arm, playing a game, and translating text. Both lines of work exhibit the potential future trajectory of foundational models, highlighting the transformative power of integrating multi-modal inputs and outputs.
10:50 - 11:30    Morning 3-Minute Pitches 📣
11:30 - 12:00    Break and Poster Session 1 ☕ 🖼️
12:00 - 12:40    Keynote 2: Mirella Lapata, The University of Edinburgh (YouTube Video)
    Title: Hierarchical3D Adapters for Long Video-to-text Summarization
Abstract: In this talk I will focus on video-to-text summarization and discuss how to best utilize multimodal information for summarizing long inputs (e.g., an hour-long TV show) into long outputs (e.g., a multi-sentence summary). We extend SummScreen (Chen et al., 2021), a dialogue summarization dataset consisting of transcripts of TV episodes with reference summaries, and create a multimodal variant by collecting corresponding full-length videos. We incorporate multimodal information into a pretrained textual summarizer efficiently using adapter modules augmented with a hierarchical structure while tuning only 3.8% of model parameters. Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
12:40 - 12:45    Group Photos
12:45 - 14:00    Lunch & Poster Session 2 🍽️ 🖼️
14:00 - 14:40    Keynote 3: Chew-Yean Yam, Microsoft (YouTube Video)
    Title: Us and AI: Redefining our relationships with AI
Abstract: The rapid advancement of AI has transformed how we interact with intelligent machines. Unravel the dynamic shifts in human-AI relations across diverse roles that we play in our society. Spark your imagination and seize the power to sculpt this new relationship that is meaningful to you.
14:40 - 15:20    Afternoon 3-Minute Pitches 📣
15:20 - 15:25    Final Voting for Best Pitch/Poster and Best Student Pitch/Poster Prizes
15:25 - 16:00    Panel Discussion 💬
  • What breakthroughs in multimodal AI do you foresee having the most significant impact in the next five years?
  • How can we navigate and mitigate the ethical concerns associated with advancing multimodal AI technologies?
  • Given the interdisciplinary nature of multimodal AI, how can we better integrate different fields of expertise to accelerate innovation in this area?
16:00 - 16:10    Prize Winner Announcement and Closing Remarks: Haiping Lu 🏆
16:10 - 17:00    Tea/Coffee and Networking Forums 🍵 ☕
  • Envisioning MultimodalAI'24
  • Boosting Engagement and Active Participation in Multimodal AI
  • Cross-Disciplinary Collaboration and Resource Sharing in Multimodal AI
  • Open-Source Software Development for Multimodal AI
  • Ethical and Responsible Practices in Multimodal AI

Morning Pitch Session

Name    Title
Xingchi Liu    A Gaussian Process Method for Ground Vehicle Classification using Acoustic Data
Adam Wynn    BETTER: An automatic feedBack systEm for supporTing emoTional spEech tRaining (slides)
Yizhi Li    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Rui Zhu    KnowWhereGraph: A Geospatial Knowledge Graph to Support Cross-Domain Knowledge Discovery
Imene Tarakli    Robots as Schoolmates for Enhanced Adaptive Learning
Sina Tabakhi    From Multimodal Learning to Graph Neural Networks (slides)
Nitisha Jain    Semantic Interpretations of Multimodal Embeddings towards Explainable AI

Afternoon Pitch Session

Name    Title
Lucia Cipolina-Kun    Diffusion Models for the Restoration of Cultural Heritage (slides)
Bohua Peng    Recent Findings of Foundational Models on Multimodal NLU
Thao Do    Social Media Mining and Machine Learning for Understanding Illegal Wildlife Trade in Vietnam (slides)
Sam Barnett    The Genomics England Multimodal Research Environment for Medical Research
Luigi A. Moretti    Can AI be effectively implemented to help treat Anxiety Disorders (ADs)? (slides)
Mohammod Suvon    The Multimodal Lens: Understanding of Radiologists' Visual Search Behavior Patterns (slides)
Ning Ma    Obstructive Sleep Apnoea Screening with Breathing Sounds and Respiratory Effort: A Multimodal Deep Learning Approach (slides)
Felix Krones    Multimodal Cardiomegaly Classification with Image-Derived Digital Biomarkers (slides)

Posters

Name    Title
Abdulsalam Alsunaidi    Predicting Actions in Images using Distributed Lexical Representations
Bohua Peng    Recent Findings of Foundational Models on Multimodal NLU
Chenghao Xiao    Adversarial Length Attack to Vision-Language Models
Chenyang Wang    A Novel Multimodal AI Model for Generating Hospital Discharge Instruction
Christoforos Galazis    High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network
Douglas Amoke    Multimodal Data and AI for Downstream Tasks
Jayani Bhatwadiya    Multimodal AI for Cancer Detection and Diagnosis: A Study on the Cancer Imaging Archive (TCIA) Dataset
Jiachen Luo    Cross-Modal Fusion Techniques for Emotion Recognition from Text and Speech
Jingkun Chen    Semi-Supervised Unpaired Medical Image Segmentation Through Task-Affinity Consistency
Lucia Cipolina-Kun    Diffusion Models for the Restoration of Cultural Heritage (poster)
Luigi A. Moretti    Can AI be effectively implemented to help treat Anxiety Disorders (ADs)? (poster)
Nitisha Jain    Semantic Interpretations of Multimodal Embeddings towards Explainable AI
Prasun Tripathi    Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI
Raja Omman Zafar    Digital Twin for Homecare
Sokratia Georgaka    CellPie: A Fast Spatial Transcriptomics Topic Discovery Method via Joint Factorization of Gene Expression and Imaging Data
Wei Xing    Multi-Fidelity Fusion (poster)
Yichen He    AI in Evolution and Ecology (poster)
Yixuan Zhu    Potential Multimodal AI for Electroencephalogram (EEG) Analysis (poster)
Yizhi Li    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training (poster)
Yu Hon On    Automatic Aortic Valve Disease Detection from MRI with Spatio-Temporal Attention Maps

Organising Committee

Haiping Lu

Head of AI Research Engineering, Professor of Machine Learning, and Turing Academic Lead

David Clifton

Professor of Clinical Machine Learning, Department of Engineering Science, University of Oxford & Turing Fellow, Alan Turing Institute

Shuo Zhou

Deputy Head of AI Research Engineering, and Academic Fellow in Machine Learning

Tingting Zhu

Associate Professor in AI for Digital Health, University of Oxford

Peter Charlton

British Heart Foundation Research Fellow, University of Cambridge

Tingyan (Tina) Wang

Postdoctoral Scientist, University of Oxford

Lei Lu

Postdoctoral Research Assistant, University of Oxford

Zhixiang Chen

Lecturer in Machine Learning, Department of Computer Science

Emma Barker

Centre Manager, Centre for Machine Intelligence

Sina Tabakhi

PhD Student, Department of Computer Science

Technical Support

Jayani Bhatwadiya

Data Analyst, Nuffield Department of Population Health, University of Oxford

Mohammod Suvon

AI Research Engineer

Sponsors and Partners

This workshop is jointly organised by the University of Sheffield and the University of Oxford under Turing Network funding from the Alan Turing Institute, with support from the University of Sheffield's Centre for Machine Intelligence and the Alan Turing Institute's Interest Group on Meta-learning for Multimodal Data (you are welcome to sign up and join).


Disclaimer: This event is supported by The Alan Turing Institute. The Turing is not involved in the agenda or content planning.

Contact Us