Nature Machine Intelligence · 2025

Multimodal AI Landscape Explorer

This page presents data visualisations based on the Nature Machine Intelligence Perspective paper "Towards deployment-centric multimodal AI beyond vision and language". A preprint version is available at arXiv:2504.03603.

The analysis covers arXiv preprints from 2019 to 2025 and highlights emerging research trends in multimodal AI, using data derived from the Kaggle arXiv Dataset.
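The paper's exact selection pipeline is not reproduced on this page, but the corpus construction can be illustrated with a minimal Python sketch. The file name below matches the JSON-lines snapshot shipped with the Kaggle arXiv Dataset; the keyword lists and the two-modality threshold are placeholder assumptions for illustration, not the study's actual tagging rules.

```python
import json

# Illustrative keyword lists; the paper's actual tagging rules are not
# reproduced here, so treat these as placeholder assumptions.
MODALITY_KEYWORDS = {
    "Vision":   ["image", "video", "visual"],
    "Language": ["text", "language", "nlp"],
    "Audio":    ["audio", "speech", "acoustic"],
    "Sensor":   ["sensor", "wearable", "imu"],
    "Graph":    ["graph", "network structure"],
    "Tabular":  ["tabular", "table"],
}

def tag_modalities(abstract: str) -> set[str]:
    """Return the set of modalities whose keywords appear in an abstract."""
    text = abstract.lower()
    return {m for m, kws in MODALITY_KEYWORDS.items()
            if any(kw in text for kw in kws)}

# The Kaggle arXiv Dataset ships as one JSON-lines file.
with open("arxiv-metadata-oai-snapshot.json") as f:
    for line in f:
        paper = json.loads(line)
        mods = tag_modalities(paper["abstract"])
        if len(mods) >= 2:  # treat "at least two modalities" as multimodal
            print(paper["id"], sorted(mods))
```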

The data is updated annually, with the latest update in January 2026. The reported numbers should be interpreted as indicators of overall trends rather than exact totals.

Fig. 1 · Yearly growth of multimodal AI preprints

Bars (left axis) show total multimodal AI preprint volume per year. The dashed line (right axis) shows multimodal AI as a proportion of all AI preprints identified in the dataset.
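Both series reduce to simple yearly aggregates. Below is a minimal pandas sketch, assuming a hypothetical per-paper table with a year column and an is_multimodal flag (e.g. the output of a tagging step like the one above); the inline rows are toy data.

```python
import pandas as pd

# Toy data: one row per AI preprint in the dataset.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020, 2021],
    "is_multimodal": [False, True, True, False, True, True],
})

yearly = df.groupby("year").agg(
    total_ai=("is_multimodal", "size"),   # all AI preprints that year
    multimodal=("is_multimodal", "sum"),  # bar heights (left axis)
)
# Dashed line (right axis): multimodal share of all AI preprints.
yearly["share"] = yearly["multimodal"] / yearly["total_ai"]
print(yearly)
```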

Fig. 2 · Multimodal AI preprints by modality

Vision and Language consistently dominate. Audio, Sensor, Graph, and Tabular modalities show emerging — and accelerating — growth from 2022 onwards.

Fig. 3 · Preprints by number of combined modalities

Pairwise combinations remain the most common. Triple-modality papers are growing fastest proportionally, reflecting a trend towards richer and more complex multimodal systems.
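The grouping in Fig. 3 is simply the size of each paper's modality set. A short sketch, reusing the hypothetical tagged output from above on toy data:

```python
from collections import Counter

# Toy data: modality sets per preprint, as produced by a tagging step.
papers = [
    {"Vision", "Language"},
    {"Vision", "Language", "Audio"},
    {"Audio", "Sensor"},
    {"Vision", "Language"},
]

# Distribution over pairwise, triple, quadruple, ... combinations.
by_arity = Counter(len(mods) for mods in papers if len(mods) >= 2)
for n, count in sorted(by_arity.items()):
    print(f"{n} modalities: {count} preprints")
```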

Fig. 4 · Pairwise, triple, quadruple, and quintuple modality combinations

Across all combination types, Vision & Language pairings dominate. The "Others" category captures novel pairings that do not involve Vision or Language as a primary modality — an area of growing research interest.
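Counting the combinations themselves is straightforward with itertools. The sketch below tallies unordered pairs on toy data and separates an "Others" bucket; the bucketing rule used here (neither Vision nor Language in the pair) is a simplified reading of the caption, not the paper's exact definition.

```python
from collections import Counter
from itertools import combinations

# Toy data: modality sets per preprint.
papers = [
    {"Vision", "Language", "Audio"},
    {"Vision", "Language"},
    {"Graph", "Tabular"},
]

pair_counts = Counter()
for mods in papers:
    # Every unordered pair occurring in a paper counts once, so a
    # triple-modality paper contributes three pairs.
    pair_counts.update(combinations(sorted(mods), 2))

# Simplified "Others" bucket: pairs involving neither Vision nor Language.
others = {p: c for p, c in pair_counts.items()
          if "Vision" not in p and "Language" not in p}
print(pair_counts.most_common(3))
print(others)
```

The same counter generalises to triples and beyond by changing the combination size, and filtering pair_counts by a chosen modality gives the per-modality view described under Fig. 5.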

Fig. 5 · Modality pairs

Vision & Language is by far the most common pairing. Filtering by a specific modality reveals which pairings are most or least explored relative to it.

Fig. 6 · Underexplored modality combinations

These underexplored combinations often involve non-standard modalities such as Sensor, Graph, Tabular, or Spatial data. Some show rapid recent growth, indicating emerging research directions.
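One way to surface such combinations is to rank low-volume pairings by their recent growth. The counts and thresholds below are illustrative only, not values taken from the paper.

```python
# Toy yearly counts per modality combination; real values would come from
# the tagged dataset.
yearly_counts = {
    ("Sensor", "Tabular"):  {2022: 2, 2023: 5, 2024: 12},
    ("Vision", "Language"): {2022: 900, 2023: 1100, 2024: 1300},
}

LOW_VOLUME = 50  # illustrative threshold for "underexplored"

for combo, counts in yearly_counts.items():
    total = sum(counts.values())
    growth = counts[2024] / max(counts[2022], 1)
    # Flag combinations that are still small but growing fast.
    if total < LOW_VOLUME and growth > 2:
        print(combo, f"total={total}, 2022->2024 growth x{growth:.1f}")
```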