Congratulations to all the winners: Lucas Farndale (Best Talk), Ruizhe Li (Best Poster), Nee Ling Wong (Best Student Talk), and Luigi Moretti (Best Student Poster)!

Prize Winners

MultimodalAI'24 Group Photo

Keynote Speakers

Daniel Zügner

Senior Researcher, Microsoft Research AI4Science

Maria Liakata

Professor of Natural Language Processing, Queen Mary University of London & Turing AI Fellow

Nataliya Tkachenko

Generative AI Ethics & Assurance Lead, Lloyds Banking Group

Adam Steventon

Director of Data Platforms, Our Future Health

Multimodal AI, which integrates diverse data modalities such as text, images, and sound, is rapidly transforming how we interact with technology and data. MultimodalAI'24 brings together researchers and practitioners from AI, data science, and a range of scientific and application domains to discuss problems and challenges, share experiences and solutions, explore collaborations and future directions, and build networks and a vibrant community around multimodal AI.

This workshop features four keynote speakers from academia and industry: Maria Liakata (Professor of NLP, Queen Mary University of London, Turing AI Fellow), Daniel Zügner (Senior Researcher, Microsoft Research AI4Science), Nataliya Tkachenko (Generative AI Ethics & Assurance Lead, Lloyds Banking Group), and Adam Steventon (Director of Data Platforms, Our Future Health). We offer participants opportunities to present 5-minute talks and posters, with four prizes (£150 each) in total for the best talks and posters. We also have funds to support travel costs.

Pre-workshop Events

24th June 2024

Time    Event
14:00 - 17:00    Domain-specific meetings
17:00 - 19:00    Networking reception

Final Programme (In-Person Only)

25th June 2024

Time    Event
09:00 - 09:30    Registration, morning refreshments, and poster session 1
09:30 - 09:35    Welcome - Guy Brown, Deputy Director of the Centre for Machine Intelligence, University of Sheffield (YouTube Video)
09:35 - 10:00    Introduction: exploring multimodal AI beyond vision and language, Haiping Lu (YouTube Video)
10:00 - 10:40    Keynote 1: Daniel Zügner, Microsoft Research AI4Science (YouTube Video)
    Title: MatterGen: a generative model for inorganic materials design
Abstract: The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Traditionally, materials design is achieved by screening a large database of known materials and filtering down candidates based on the application. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. In this talk, we present MatterGen, a generative model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. To enable this, we introduce a new diffusion-based generative process that produces crystalline structures by gradually refining atom types, coordinates, and the periodic lattice. We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset. Compared to prior generative models, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, novel materials with desired chemistry, symmetry, as well as mechanical, electronic and magnetic properties. Finally, we demonstrate multi-property materials design capabilities by proposing structures that have both high magnetic density and a chemical composition with low supply-chain risk. We believe that the quality of generated materials and the breadth of MatterGen’s capabilities represent a major advancement towards creating a universal generative model for materials design.
10:40 - 11:20    Community talks
11:20 - 11:50    Break and poster session 2
11:50 - 12:30    Keynote 2: Maria Liakata, Queen Mary University of London (YouTube Video)
    Title: Longitudinal language processing for dementia
Abstract: While the advent of Large Language Models (LLMs) has brought great promise to the field of AI, many challenges remain unresolved, especially around appropriate generation, temporal robustness, temporal and other reasoning, and privacy, particularly when working with sensitive content such as mental health data. The programme of work I have been leading consists of three core research directions: (1) data representation and generation; (2) methods for personalised longitudinal models and temporal understanding; and (3) evaluation in real-world settings, with a focus on mental health. I will give an overview of work within my group on these topics and focus on longitudinal monitoring for dementia.
12:30 - 12:40    Group Photos
12:40 - 14:00    Lunch and poster session 3
14:00 - 14:40    Keynote 3: Nataliya Tkachenko, Lloyds Banking Group (YouTube Video)
    Title: Ethical challenges for multimodal conversational banking & parametric insurance
Abstract: Ever since the mass propagation of generative AI models, multimodal data has been receiving increased attention from customer-focused industries. Multimodal chatbots, which can process and respond to customer queries using enriched context such as text, voice, and even visual data, offer significant advantages in customer banking and parametric insurance by enhancing user interaction, speed, and overall service efficiency. Customers can now choose their preferred mode of communication, whether typing, speaking, or even using gestures. By analysing customer data from various sources, chatbots can offer personalised financial advice and investment recommendations, and alert customers to unusual activities. They can even help with immediate payouts by promptly verifying predefined parameters, such as weather data for crop insurance. However, with enriched context also come multi-dimensional ethical considerations, such as bias, fairness, transparency, and confabulations. In this presentation, I will cover how these risks emerge and mutually diffuse in highly automated interfaces.
14:40 - 15:20    Community talks
15:20 - 15:30    Break
15:30 - 16:10    Keynote 4: Adam Steventon, Our Future Health
    Title: An incredibly detailed picture of human health: the exciting potential of Our Future Health to prevent, detect and treat diseases
Abstract: In this presentation, I will detail the groundbreaking efforts of Our Future Health to construct a multimodal dataset encompassing 5 million individuals, representative of the UK’s diverse population. I will explore the transformative potential of this dataset to enhance our capabilities in predicting, detecting, and treating major diseases. Additionally, I will discuss the roles of artificial intelligence in this context, focusing on the opportunities and challenges it presents. This exploration will underscore the potential of AI and large-scale data in shaping the future of healthcare.
16:10 - 16:40    Panel discussion
  • What are the major barriers to deploying multimodal AI systems in real-world applications?
  • How can we best identify and utilise diverse data sources to advance multimodal AI research and applications?
16:40 - 17:00    Best talk/poster prize winner announcement and closing
17:00 - 17:30    Tea/coffee and networking

Morning Talk Session (10:40 - 11:20)

Name    Title
Yao Zhang    AI in Maritime Engineering Control System (slides)
Douglas Amoke    Geo-located Multimodal Data for Maritime Downstream Tasks
Yan Ge    Multimodal Multi-task Asset Pricing with Numeral Learning (slides)
Ruyi Wang    Multimodal Affective Computing for Mental Health Support
Luigi Moretti    Integrating Affective Computing and Smart Sensing into Treatment Pathways for Anxiety Disorders (slides)
Jiawei Zheng    Process-aware Human Activity Recognition
Lucas Farndale    Super Vision without Supervision: Self-supervised Learning from Multimodal Data for Enhanced Biomedical Imaging
Ruizhe Li    It’s Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition (slides)

Afternoon Talk Session (14:40 - 15:20)

Name    Title
Valentin Danchev    Data Governance, Ethics, and Safety of Multimodal vs Unimodal AI Models: A Review of Evidence and Challenges
Salah (Sam) Hammouche    Beyond Regulatory Compliance: The RCR College Report
Nee Ling Wong    How Would AI Work with Us in Healthcare
Jiayang Zhang    Interdisciplinary Multimodal AI Research (slides)
Martin Callaghan    Multimodal AI for Enhanced Information Extraction from Complex HPC Documentation (slides)
Madhurananda Pahar    CognoSpeak: An Automatic, Remote Assessment of Early Cognitive Decline in Real-world Conversational Speech (slides)
Hubin Zhao    Wearable Intelligent Multimodal Neuroimaging for Health
Peter Charlton    Understanding Determinants of Health: Leveraging Routinely Collected Data

Posters

Name    Title
Douglas Amoke    Geo-located Multimodal Data for Maritime Downstream Tasks
Sedat Dogan    Enhanced Multimodal Learning for Meme Virality Prediction
Wenrui Fan    MeDSLIP: Medical Dual-Stream Language-Image Pre-training for Fine-grained Alignment
Lucas Farndale    Super Vision without Supervision: Self-Supervised Learning from Multimodal Data for Enhanced Biomedical Imaging
Yan Ge    Multimodal Multi-Task Asset Pricing with Numeral Learning
Ruizhe Li    Large Language Models are Efficient Learners of Noise-robust Speech Recognition
Xianyuan Liu    Exploring Multimodal AI beyond Vision and Language
Sabrina McCallum    Learning Generalisable Representations for Embodied Tasks with Multimodal Feedback
Luigi Moretti    Integrating Affective Computing and Smart Sensing into Treatment Pathways for Anxiety Disorders
Madhurananda Pahar    CognoSpeak: An Automatic, Remote Assessment of Early Cognitive Decline in Real-world Conversational Speech
Mohammod Suvon    Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection
Ruyi Wang    Multimodal Affects Spreading and Development
Jiawei Zheng    Process-aware Human Activity Recognition

Organising Committee

Haiping Lu

Head of AI Research Engineering, Professor of Machine Learning, and Turing Academic Lead

David Clifton

Professor of Clinical Machine Learning, Department of Engineering Science, University of Oxford & Turing Fellow, Alan Turing Institute

Shuo Zhou

Deputy Head of AI Research Engineering, and Academic Fellow in Machine Learning

Tingting Zhu

Associate Professor in AI for Digital Health, University of Oxford

Donghwan Shin

Lecturer in Testing, School of Computer Science

Peter Charlton

British Heart Foundation Research Fellow, University of Cambridge

Xianyuan Liu

Assistant Head of AI Research Engineering, and Senior AI Research Engineer

Chen Chen

Lecturer in Computer Vision, School of Computer Science

Zhixiang Chen

Lecturer in Machine Learning, School of Computer Science

Emma Barker

Centre Manager of the Centre for Machine Intelligence

Xi Wang

Lecturer in Natural Language Processing, School of Computer Science

Sina Tabakhi

PhD Student, School of Computer Science

Administrative and Technical Support

Kate Jones

Administrative Assistant for the Centre for Machine Intelligence

Mohammod Suvon

AI Research Engineer

Partners

This workshop is brought to you by the Turing Interest Group on Meta-Learning for Multimodal Data (you are welcome to sign up) and the Multimodal AI Community (you are welcome to subscribe to our Google Group), supported by the Centre for Machine Intelligence at the University of Sheffield.


Disclaimer: This event is supported by The Alan Turing Institute. The Turing is not involved in the agenda or content planning.

Our key sponsors

Contact Us
