Student Research Projects
During the Summer School, students will have the exciting opportunity to collaborate on group projects, with a maximum group size of 8. The projects will use high-performance computing resources to train and evaluate state-of-the-art machine learning approaches on challenging datasets from the Earth and climate sciences.
The projects are deliberately challenging: students are tasked with proposing and implementing modifications to known algorithms and solutions on a provided dataset, in collaboration with their challenge leader. The top three projects, selected by a jury, will receive a best-project award, accompanied by a formal certificate and recognition from the ELLIS Summer School.
Projects will come from one of four areas:
A) Hybrid Modeling
B) Robustness (incl. uncertainty, cross-validation, probabilistic modeling)
C) Interpretability
D) Large-scale deep learning
OUR TUTORS
Bernhard Ahrens
Lazaro Alonso
George Athanasiou
Nuno Carvalhais
Sujan Koirala
Manuel Álvarez Chaves
Kai-Hendrik Cohrs
Gregory Duveiller
Fabian Gans
Miguel-Ángel Fernández-Torres
Maximilian Gelbrecht
Philipp Hess
Chloe Hopling
Yuming Jin
Ioannis Prapas
Christian Reimers
Alexander Winkler
Gideon Stein
Lily-belle Sweet
Yinglin Tian
Qi Yang
Xin Yu
Title:
EasyHybrid.jl Unlocked: Bridging Neural Networks and Process-Based Models
Abstract:
Scientific modeling has long faced a trade-off: choose between opaque "black box" neural networks with limited interpretability or rigid process-based models that often struggle to capture complex, real-world phenomena. Hybrid modeling bridges this gap by integrating machine learning and scientific knowledge. This workshop introduces EasyHybrid.jl, a user-friendly Julia package designed to make hybrid modeling accessible to researchers across disciplines. Participants will learn to build models that combine neural networks with process-based models. This approach allows for AI predictions rooted in process understanding while maintaining flexibility and performance. Through hands-on coding of a case study, participants will explore the temperature sensitivity of ecosystem respiration across an array of FLUXNET sites. The carbon balance of these ecosystems was measured with the eddy covariance technique. Participants will explore how to partition the carbon balance into its components, photosynthesis and ecosystem respiration, while simultaneously estimating the temperature sensitivity of ecosystem respiration. They will also study whether the temperature sensitivity varies between ecosystems or converges to a global value. No prior experience with Julia or advanced machine learning is required. By merging domain knowledge with modern AI techniques, participants will walk away with tools to create models that are not only accurate and adaptable but also scientifically meaningful and transparent.
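The partitioning task above can be sketched with the classic Q10 respiration model, in which a neural network would supply the basal respiration and GPP while the temperature sensitivity Q10 remains an interpretable parameter learned jointly. The following is a minimal numpy sketch of the process-based part, not EasyHybrid.jl itself; all values are illustrative:

```python
import numpy as np

def q10_respiration(temp_c, rb, q10, t_ref=15.0):
    """Process-based part: respiration scales exponentially with temperature."""
    return rb * q10 ** ((temp_c - t_ref) / 10.0)

# In a hybrid model, rb (basal respiration) and gpp would be predicted by a
# neural network from meteorological drivers, while q10 stays a learnable,
# interpretable scalar; here both are fixed to illustrative values.
temp = np.array([5.0, 15.0, 25.0])      # air temperature in deg C
reco = q10_respiration(temp, rb=2.0, q10=1.5)
gpp = np.array([0.0, 6.0, 10.0])        # hypothetical uptake (0 at night)
nee = reco - gpp                        # NEE = RECO - GPP (positive = release)
```

During training, the network weights and Q10 would be optimized together against measured NEE, which is what makes the estimated temperature sensitivity comparable across sites.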
Bio – Lazaro:
Lazaro is a physicist currently working at the Max Planck Institute for Biogeochemistry in Jena in the Modelling Interactions in Soil Systems project group and in the Model Data Integration Group. He is interested in hybrid model-based approaches to climate sciences as well as scientific visualization. His other interests include complex networks, graph neural networks, and time series analysis. He is a coauthor of the Julia Data Science book (https://juliadatascience.io) and a main contributor to https://beautiful.makie.org/ and https://github.com/LuxDL/DocumenterVitepress.jl, among other open-source projects.
Bio – Bernhard:
Bernhard is Project Group Leader of Modelling Interactions in Soil Systems within the Department of Biogeochemical Integration at the Max Planck Institute for Biogeochemistry in Jena, where he is part of the Soil Biogeochemistry research group. His work focuses on advancing the understanding of soil processes and their role in the Earth’s biogeochemical cycles, with an emphasis on the complex interactions between physical, chemical, and biological factors in soils. Bernhard applies and develops process-based and data-driven models to investigate soil carbon and nutrient dynamics across scales, integrating field observations, laboratory experiments, and remote sensing data. He is part of the USMILE project, which aims to improve predictions of soil–climate feedbacks under changing environmental conditions. Through his research, he contributes to bridging fundamental soil science with large-scale Earth system modelling to better inform climate projections and sustainable land management strategies.
Title:
Kuro Siwo Challenge: Robust Flood Mapping Under Environmental Shifts
Abstract:
This challenge focuses on developing models that are robust to distribution shifts induced by diverse environmental conditions across the globe. Using the Kuro Siwo dataset—a globally distributed collection of annotated and unlabeled flood events represented as multi-temporal SAR image triplets (two pre-event and one post-event, in both GRD and SLC formats)—participants are tasked with training models that generalize across unseen geographic regions and climate zones. The core objective is to simulate real-world scenarios where large volumes of unlabeled SAR data are available, but labeled examples are scarce. To achieve this, participants are encouraged to explore self-supervised and semi-supervised learning strategies that leverage temporal context to extract resilient, transferable representations. The downstream task is semantic segmentation of flooded areas, permanent water, and non-water regions, with fine-tuning limited to a small subset of labeled events and testing conducted exclusively on novel environments. This setup places emphasis on learning spatiotemporal dynamics that can withstand shifts in environmental distributions, making it a realistic benchmark for geospatial domain adaptation. By focusing on temporal structure and minimal supervision, the challenge seeks approaches that push the boundaries of time-series modeling, efficient fine-tuning, and generalization under environmental variability. The unique design of the Kuro Siwo dataset offers a rich testbed to evaluate foundational learning techniques and build models that are robust, adaptive, and ready for global-scale deployment in disaster monitoring and environmental change detection.
Bio:
George is a postdoctoral researcher at the National Technical University of Athens (NTUA) within the School of Rural, Surveying and Geoinformatics Engineering. He holds degrees in Mechanical & Aeronautical Engineering (University of Patras), Computational Engineering (Technical University of Munich), and Analytic Philosophy (University of Barcelona), as well as a PhD in Computer Science with a focus on deep learning from the IIIA-CSIC in Spain. His research centers on deep learning and artificial intelligence, with a strong interest in socially relevant applications such as extreme weather events, environmental monitoring, and healthcare. Currently, he focuses on integrating causality theory into deep learning models to enhance interpretability and robustness. George has conducted research across Germany, Spain, and Denmark, and contributed to AI-driven innovation in both academia and industry. He is an alumnus of the DAAD and Marie Skłodowska-Curie programs and is passionate about using AI to address pressing global challenges.
Title:
Learning spatial controls of ecosystem functional properties
Abstract:
Land carbon and water fluxes shape the feedback between terrestrial ecosystems and climate, yet traditional land models remain hampered by structural error and equifinality. Hybrid models, which embed machine-learning (ML) modules inside mechanistic frameworks, address several of these gaps by combining physical consistency with data-driven flexibility. Pioneering work linking process knowledge and ML has already demonstrated superior realism across scales, while underlining the need for richer observations to resolve coupled C–H₂O dynamics. This is exemplified by the difficulty of learning the spatial and temporal controls of the parameters that modulate the responses of ecosystems to weather and climate variability. The challenge lies in the need for intensive and long-term observations that underpin robust and comprehensive representations of ecosystem functioning. Although hundreds of locations with such observations exist worldwide, we still observe significant limitations in parameter generalization, consequently limiting our ability to predict ecosystem function. The challenge here is to overcome these limitations in generalization when predicting carbon and water fluxes using a hybrid modelling approach. Based on a global open dataset and the SINDBAD hybrid modelling framework, the project will be open to a wide range of approaches towards generalization, from different ML architectures to the ingestion of foundation models.
Bio – Nuno:
Nuno is the group leader of the Model Data Integration group at the Max Planck Institute for Biogeochemistry in Jena. His research focuses on improving the understanding of the carbon cycle using satellite Earth observation, ecosystem models, and data-driven methods such as model-data fusion and hybrid machine learning.
Bio – Sujan:
Sujan is a Senior Scientist at the Max Planck Institute for Biogeochemistry in Jena. His research expertise includes diagnostic (data/machine learning-based) and prognostic (physics-based) data-driven modelling and model-data integration of the terrestrial carbon and water cycles. His research experience encompasses models of different complexities as well as the domain of applications from local to global scales.
Title:
Development and diagnostic evaluation of hybrid hydrological models
Abstract:
In the context of rainfall-runoff modeling, hybrid models have been proposed as an approach to combine the predictive capacity of data-driven methods with the interpretability of conceptual physics-based hydrological models. Recent work has shown that while this idea sounds promising, the effectiveness of hybrid models in this context often stems from the data-driven component overcompensating for the shortcomings of its physics-based counterpart rather than achieving true synergy between these two approaches. In this project, we will conduct a model intercomparison experiment in which we will develop candidate hybrid hydrological models and compare them against a purely data-driven baseline (LSTM) for predicting streamflow in a subset of the CAMELS-GB dataset. The candidate models will be evaluated on their predictive capacity, interpretability, and behavior, along with a quantitative metric that assesses the contribution of the data-driven component to the model’s overall performance. By comparing these relative contributions, we can evaluate how effectively different hybrid architectures take advantage of their prescribed physics-based components. Participants will learn a typical machine learning workflow to train and evaluate data-driven methods (LSTM) and hybrid hydrological models. Moreover, they will develop critical thinking skills for evaluating claims about the benefits of integrating these two approaches for rainfall-runoff modeling and hydrology.
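As one concrete instance of the predictive-capacity evaluation mentioned above, streamflow simulations are conventionally scored with the Nash–Sutcliffe efficiency (NSE), the standard skill metric in rainfall-runoff modeling. A minimal sketch (the project's actual evaluation suite may use additional metrics):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect simulation, 0 means the
    model is no better than predicting the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
```

Comparing NSE for the LSTM baseline against each hybrid candidate on held-out CAMELS-GB catchments is one way to quantify how much the data-driven component contributes.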
Bios:
Eduardo and Manuel are civil engineers from Costa Rica, currently working as researchers at the Karlsruhe Institute of Technology and the University of Stuttgart, respectively. Their professional experience includes projects in hydrological and hydrodynamic modeling, as well as infrastructure design. Their doctoral research focuses on investigating advanced machine learning techniques for rainfall-runoff modeling, with particular emphasis on evaluating the validity and practical benefits of the hybrid approach in this type of modeling.
Title:
Uncertainty-Aware Deep Learning for Carbon Flux Partitioning: A Hybrid Modeling Challenge
Abstract:
Carbon flux partitioning is the task of separating measured net ecosystem exchange (NEE) into gross primary production (GPP), i.e., the CO₂ that the ecosystem takes up, and ecosystem respiration (RECO), i.e., the CO₂ that the ecosystem emits. It is a cornerstone for understanding the terrestrial carbon cycle, as these separated fluxes form the basis of a wide range of downstream analyses, from climate feedback studies to the calibration of land models and ecosystem management. With deep learning (DL) methods increasingly used to estimate GPP and RECO from eddy covariance measurements, the pressing question becomes: when can we trust these predictions? Addressing this requires robust approaches to uncertainty quantification (UQ) that go beyond point estimates to provide interpretable and actionable confidence measures. In this research challenge, we explore and evaluate state-of-the-art UQ techniques for DL in the context of carbon flux partitioning. We will experiment with approaches such as Bayesian methods, density networks, and conformal prediction, embedding them into hybrid modeling workflows that combine data-driven and process-based insights. Using a benchmark dataset that integrates synthetic data with known ground truth (spanning diverse biomes, environmental conditions, and extreme events) and realistic, noisy, gappy observational records, we will assess how well these UQ techniques characterize epistemic and aleatoric uncertainties. Key questions to be addressed include:
- How well calibrated are the model’s predicted uncertainties?
- Do data scarcity and observational gaps appropriately increase uncertainty estimates?
- How do uncertainty estimates behave during extreme meteorological and ecological events?
- Can Bayesian priors embed ecological knowledge to improve model robustness and interpretability?
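Of the UQ techniques listed above, split conformal prediction is perhaps the simplest to sketch: calibrate a residual quantile on held-out data, then widen each point prediction into an interval with finite-sample coverage guarantees. A minimal numpy sketch, illustrative rather than the challenge's actual pipeline:

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction: the (1 - alpha)-coverage interval is the
    point prediction +/- the calibrated quantile of absolute residuals."""
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample-corrected rank
    q = np.sort(scores)[min(k, n) - 1]
    return test_pred - q, test_pred + q
```

Here the calibration residuals would come from DL predictions of GPP or RECO on a held-out split; the resulting interval width gives one of the calibration diagnostics the key questions ask about.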
Bio:
Kai is a doctoral researcher at the University of Valencia and a guest researcher at the Max Planck Institute for Biogeochemistry in Jena. His work sits at the intersection of machine learning and Earth system science, with a focus on integrating scientific knowledge into AI models. He develops robust, interpretable techniques, particularly hybrid models that combine machine learning with physical principles, where he addresses challenges such as uncertainty quantification and equifinality. Through this, he aims to deepen our understanding of ecosystem processes and carbon dynamics, making machine learning a more reliable and insightful tool for environmental research.
Beyond his core projects, Kai’s research interests span Bayesian inference, equation discovery, causality, computer vision and foundation models. He is especially motivated by applications in climate modeling, the study of extreme weather events and understanding their broader societal impacts.
Title:
Characterization and Tracking of Cloud Movements using Geostationary Satellite Image Time Series
Abstract:
Clouds represent one of the most uncertain components of the climate system. On the terrestrial side, the effects of land cover on cloud formation through land-atmosphere interactions are still understudied. A key challenge is to robustly identify and isolate which near-surface clouds were influenced by the land surface during their formation. This task focuses on the analysis of cloud formation and movement using high-quality observations from geostationary satellites, which capture images every 15 minutes. The observed signals are complex, comprising multiple cloud layers. This challenge aims to disentangle these local contributions while adhering to physical constraints, thereby enhancing our understanding of cloud dynamics and their interaction with the Earth's surface.
Bio Fabian:
Fabian is a Project Group leader at the Max Planck Institute for Biogeochemistry in Jena with a background in physics. He strives to improve our understanding of biogeochemical cycles by developing and improving state-of-the-art data analysis tools that bring together large datasets from diverse sources such as remote sensing, climate models, and in-situ observations. He is the author and maintainer of a number of commonly used Julia libraries for large-scale geospatial data analysis.
Bio Greg:
Greg is a Project Group leader at the Max Planck Institute for Biogeochemistry in Jena with a background in forestry and remote sensing. His main research aims at improving our understanding of the role of terrestrial ecosystems in the Earth System by using data-driven yet process-based thinking applied to satellite Earth Observation data. A key focus is on exploring the complexity and diversity of terrestrial ecosystems, and how their specific functional properties affect land-atmosphere interactions. He has published over 75 peer-reviewed papers in multiple subfields of Earth system science and Earth observation.
Title:
Explainable Artificial Intelligence for Multimodal Natural Disaster Impact Assessment
Abstract:
This challenge focuses on using deep learning and explainable AI (XAI) to assess natural disaster impacts by fusing high-resolution satellite imagery with climate variables such as precipitation, temperature, soil moisture, and vegetation indices. Participants will develop models that are both accurate and interpretable, aiming to uncover how environmental conditions contribute to damage from events like floods, wildfires, hurricanes, and volcanic eruptions. The task involves designing deep learning architectures that integrate multimodal data to detect and characterize the extent and severity of disasters. Participants are encouraged to explore both ante-hoc (e.g., attention mechanisms) and post-hoc (e.g., saliency maps, surrogate models) explanation techniques to identify critical predictive factors. The challenge emphasizes temporal modeling to capture changes before, during, and after events. Datasets include real-world pre- and post-disaster satellite image pairs and corresponding climate data. The computational environment will be based on Python and PyTorch, with access to tools supporting deep learning and XAI workflows. Basic Python skills and familiarity with neural networks are recommended. The challenge fosters hands-on, collaborative research advancing transparency in AI for disaster response.
Bio:
Miguel-Ángel received his B.Sc. in Audiovisual Systems Engineering (2013), M.Sc. in Multimedia and Communications (2014), and Ph.D. in Multimedia and Communications (2019) from Universidad Carlos III de Madrid, Spain. He is currently an Assistant Professor in the Signal Theory and Communications Department at the same university, after a postdoctoral stay at the Image and Signal Processing Group at Universitat de València (2020-2024). Miguel-Ángel's research focuses on Machine Learning for Earth and Climate Sciences, particularly on the development of deep generative models and attention mechanisms for anomaly and extreme event detection. He is also interested in explainable AI applied to environmental monitoring. Previously, he contributed to projects in Computer Vision, including spatio-temporal visual attention modeling, image and video analysis, and medical image classification. Miguel-Ángel has gained international experience through academic and research stays at Technische Universität Wien (Austria, 2013), the Visual Perception Laboratory at Purdue University (USA, 2016), the International Future Lab AI4EO at Technische Universität München (Germany, 2022), and the Department of Artificial Intelligence at the Fraunhofer Institute for Telecommunications (Germany, 2025).
Title:
Learning subgrid-scale parametrizations in an atmospheric toy model
Abstract:
The parametrization of subgrid-scale physics is a major source of uncertainty in atmospheric and ocean global circulation models (AO-GCMs). Machine learning has shown tremendous potential for learning these subgrid-scale parametrizations to improve the representation of unresolved processes in AO-GCMs. This is achieved by training an ML model on high-resolution data to learn the effect the unresolved scales have on the coarse target resolution. Once trained, a hybrid model can be set up that combines a coarse-resolution physical model with the learned ML subgrid-scale parametrization. In this project we will learn how to set up, train, and evaluate such an ML subgrid-scale parametrization on the Lorenz96 atmospheric toy model.
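The setup above can be illustrated on the single-scale Lorenz96 equations, where the learned parametrization simply adds a correction term to the coarse tendency. A minimal numpy sketch; the polynomial closure here is a hypothetical stand-in for the ML model the project would actually train on two-scale data:

```python
import numpy as np

def lorenz96_rhs(x, forcing=8.0):
    """Tendency of the single-scale (coarse) Lorenz96 model."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def hybrid_rhs(x, closure, forcing=8.0):
    """Coarse physics plus a learned subgrid-scale closure U(x).
    `closure` would be the trained ML model; any callable works here."""
    return lorenz96_rhs(x, forcing) + closure(x)

# Hypothetical polynomial closure, as if fit offline to pairs of coarse state
# and subgrid tendency diagnosed from a high-resolution two-scale run.
poly_closure = lambda x: -0.3 * x + 0.05 * x ** 2
x = np.full(8, 8.0)                     # uniform state at the forcing value
dxdt = hybrid_rhs(x, poly_closure)
```

Time-stepping `hybrid_rhs` with any ODE integrator then gives the hybrid model whose climate can be compared against the two-scale truth.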
Bio:
Maximilian is a researcher at the interface of Earth system modelling, nonlinear dynamics, and machine learning. After studying physics at the Humboldt University of Berlin, he worked at the Humboldt University of Berlin, the University of São Paulo, the Free University of Berlin, and the Technical University of Berlin, and was a guest scientist at the Potsdam Institute for Climate Impact Research. After starting to explore the combination of nonlinear dynamics, complex systems, and machine learning in his PhD, he now applies ML to Earth system modelling, mostly in the form of differentiable programming. Maximilian is passionate about open-source development, e.g. as one of the core developers of SpeedyWeather.jl, an open atmospheric model, with the hope of making climate models easier to use, accessible, and open to data-driven methods and machine learning.
Title:
Physically consistent reconstruction of spatiotemporal dynamics with generative diffusion models
Abstract:
Inverse problems, such as reconstructing spatiotemporal dynamics from incomplete, sparse observations, are highly relevant in many scientific disciplines, for example, in fluid dynamics, astrophysics, or meteorology. Generative machine learning methods, and in particular video diffusion models, have emerged as powerful tools for such tasks due to their flexibility to include additional information during the sample generation. This research challenge explores how spatiotemporal dynamics can be reconstructed from sparse measurements while adhering to physical principles of the system. We will start exploring this in an idealized setting with a diffusion model that is pretrained on 2D Navier-Stokes turbulence, with possible extensions to other systems.
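One standard way diffusion models incorporate sparse observations during sampling is to overwrite the observed locations at each reverse-diffusion step, in the style of RePaint-like inpainting. A toy sketch of that guidance step, with an identity function standing in for the pretrained denoiser (names and shapes are illustrative, not the challenge's actual code):

```python
import numpy as np

def guided_step(x_t, obs, mask, denoise_fn):
    """One guided reverse-diffusion step: take the model's proposal for the
    next (less noisy) state, then re-impose the sparse measurements where
    mask == 1 so generated samples stay consistent with the observations."""
    x_prop = denoise_fn(x_t)
    return mask * obs + (1 - mask) * x_prop

# Identity denoiser stands in for the pretrained video diffusion model.
x = np.array([1.0, 2.0, 3.0, 4.0])      # current noisy state
obs = np.array([9.0, 0.0, 9.0, 0.0])    # sparse measurements (zeros unused)
mask = np.array([1.0, 0.0, 1.0, 0.0])   # 1 where we have an observation
x_next = guided_step(x, obs, mask, denoise_fn=lambda z: z)
```

Physical principles (e.g. divergence-free flow for 2D Navier-Stokes) can be imposed analogously, as an extra projection or penalty applied to each proposal.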
Bio:
Philipp is a postdoctoral researcher at the Potsdam Institute for Climate Impact Research and the Technical University of Munich. His work focuses on applying machine learning to improve weather and climate predictions, with a particular interest in adapting generative methods from image and video processing. Philipp develops and tailors these approaches for applications such as emulating dynamics, data reconstruction, bias correction and downscaling, and subgrid-scale parameterizations.
Title:
Using machine learning methods to forecast drought in East Africa with multi-modal remote sensing data
Abstract:
Droughts are a recurring global climate hazard that incurs human, economic, and environmental costs. In Eastern Africa, pastoralist communities whose livelihoods depend on the availability of pasturelands are particularly vulnerable to the impacts of drought. While in-situ observations of vegetation condition in East Africa are sparse, the wealth of Earth observation data available provides an opportunity to monitor and forecast droughts on a large scale. This challenge explores developing a machine learning model that forecasts vegetation condition, with the view that, equipped with reliable forecasts, decision-makers can take anticipatory action and reduce the impacts felt. The model input will be a multimodal Earth observation dataset including vegetation condition indicators, meteorological variables (e.g. rainfall, temperature), land cover classification, and topographical features.
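A widely used vegetation condition indicator of the kind mentioned above is the Vegetation Condition Index (VCI), which rescales the current NDVI between its historical extremes for a given pixel and time of year. A minimal sketch (whether this particular index is the challenge's target variable is an assumption):

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index in [0, 1]: 0 means the historically worst
    observed vegetation state for this pixel and season, 1 the best."""
    return (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
```

Forecasting such a normalized index, rather than raw NDVI, makes drought severity comparable across land cover types and seasons.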
Bio:
Chloe is a Postdoctoral researcher at the University of Sussex, United Kingdom. Her research focuses on applying machine learning techniques combined with remote sensing data to improve forecasting of drought onset in pastoralist regions of Eastern Africa. By developing advanced predictive models, Chloe aims to provide early warning systems that can help mitigate the impacts of drought on vulnerable communities and livestock-dependent livelihoods. Her work bridges environmental science, data science, and sustainable development, contributing to climate resilience in drought-prone areas. Chloe’s interdisciplinary approach combines expertise in machine learning algorithms, satellite data analysis, and field knowledge of socio-ecological systems. Through collaborations with international research groups and stakeholders, she seeks to enhance the accuracy and usability of drought forecasts, supporting adaptation strategies and policy-making. Chloe’s research ultimately strives to foster more sustainable management of natural resources under changing climatic conditions.
Title:
Simulating atmospheric CO2 forward transport with deep learning
Abstract:
Understanding the transport of atmospheric CO₂ is fundamental to tracking anthropogenic emissions, assessing biosphere–atmosphere exchanges, and supporting international climate policy frameworks such as the Paris Agreement. Atmospheric transport models are needed to link CO₂ observations to emissions and natural sinks. Traditional models, however, are computationally intensive, especially at high spatial resolution and over long temporal horizons, limiting their usability in operational Monitoring, Reporting, and Verification (MRV) systems. Recent advances in deep learning have demonstrated remarkable success in emulating complex physical systems, particularly in weather forecasting. Yet, applying such models to simulate CO₂ transport presents unique challenges, including maintaining physical constraints such as mass conservation and ensuring stable predictions over extended time periods. Inspired by Benson et al. (2025), we propose a group project to explore the use of the SwinTransformer architecture for forward simulation of atmospheric CO₂ transport, leveraging the CarbonBench benchmark dataset. Our objectives are threefold: (1) to implement and train a SwinTransformer emulator for offline CO₂ tracer transport; (2) to assess its fidelity in reproducing the spatiotemporal evolution of atmospheric CO₂ compared to traditional numerical models; and (3) to evaluate its long-term stability and physical realism. To interpret model behavior, we will conduct sensitivity tests including ablations of physics-informed constraints (e.g., mass fixers, spectral regularization), input perturbations, and resolution changes. These tests will help elucidate how model performance depends on architectural and physical design choices. Through this project, we aim to deepen our understanding of how machine learning models can emulate tracer transport, and contribute to ongoing efforts in building efficient, high-resolution tools for greenhouse gas monitoring.
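As an illustration of the mass-fixer constraint listed among the ablations, a simple multiplicative fixer rescales the emulated tracer field so that its weighted global total matches the conserved reference mass. A minimal sketch (area weighting is an assumption here; the actual CarbonBench setup may apply a different fixer):

```python
import numpy as np

def fix_global_mass(pred, weights, target_mass):
    """Multiplicative mass fixer: rescale the predicted tracer field so that
    its area-weighted total equals the conserved reference mass."""
    current = np.sum(pred * weights)
    return pred * (target_mass / current)

field = np.array([1.0, 2.0, 3.0])       # toy tracer values per grid cell
w = np.array([1.0, 1.0, 1.0])           # toy grid-cell weights
fixed = fix_global_mass(field, w, target_mass=12.0)
```

Applying such a fixer after each autoregressive step is one way to keep total CO₂ mass from drifting over long rollouts, and ablating it shows how much stability depends on the constraint.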
Bio:
Yuming is an ASP Postdoctoral Fellow at the NSF National Center for Atmospheric Research (NCAR) in the Earth Observing Laboratory. He earned his PhD from the Scripps Institution of Oceanography at the University of California San Diego in December 2023. Yuming's research centers on understanding the global carbon cycle through long-term measurements of CO₂ and O₂. He integrates diverse observational approaches, ranging from aircraft campaigns to in-situ monitoring, with climate models and idealized modeling frameworks to uncover the mechanisms driving climate and ecosystem change.
Title:
Generative AI to communicate risk
Abstract:
Even with actionable forecasts of risks, impact mitigation can fail if the risk is not communicated properly. For example, in Germany (2021) and Spain (2024), the devastating effects of floods, with more than 200 fatalities each, were related to ineffective communication of warnings [1].
Generative AI can help personalize messages and answer user questions based on data provided by early warning systems. Such examples have been shown in prototypes such as FloodBrain [2], ClimateSight [3], and ChatClimate [4].
In this challenge, the goal is to provide textual descriptions of fire danger. The fire danger maps will come from an AI-based system currently deployed in Greece, which predicts danger better than existing operational tools [5]. Providing textual descriptions aims to enhance the reliability of the tool and ease communication with fire responders and decision-makers.
Bio:
Ioannis is the founder and CEO of Manteo AI, a chat-map assistant that simplifies access to geospatial insights. He is completing his PhD at the Orion Lab (NTUA) and the University of Valencia, where he has developed large-scale datasets and deep learning models for wildfire forecasting at different spatio-temporal scales. Ioannis has led the research in projects funded by ESA and NASA and has presented work at top-tier AI conferences (NeurIPS, ICCV) and in geoscience journals (AGU GRL, Nature Communications). His background includes an MEng in Electrical and Computer Engineering at NTUA, and a joint MSc in Big Data across Brussels, Barcelona, and Berlin. Additionally, he has worked for 2 years as a Software Engineer at CERN and several years as a freelance Data Scientist.
Title:
Infuse Machine-Learned Land Phenology in a Global Climate Model
Abstract:
In Earth system science, machine learning is a promising tool, especially where traditional process-based methodologies have encountered significant obstacles. Two of these obstacles are processes for which we lack a clear understanding of the numerical structure of the relationships, and a scale mismatch between what we aim to simulate and what can be experimentally investigated in a laboratory setting. One example is plant phenology, which refers to the timing of recurring events in the seasonal cycle of plants. In this context, we are particularly interested in vegetation greening and browning, as measured by the leaf area index (LAI). The LAI quantifies the leaf area that is photosynthetically active and absorbs CO₂ from the atmosphere. Predicting the seasonal development of LAI in response to various meteorological drivers embodies the two obstacles described earlier: First, our knowledge of the relationship between meteorological factors and the LAI response remains limited, which causes a key uncertainty in global land-climate modeling. Second, while we can expose individual plants to different meteorological conditions in a laboratory and record their LAI, it remains unclear how to scale these findings to an ecosystem level. We tackle these problems in two connected projects, both of which aim at an integration of data-driven phenology models into a state-of-the-art climate model. Overall, the goal is to predict LAI from a remote sensing product as a function of meteorological conditions taken from reanalysis products, and to integrate the resulting numerical predictions with the Max Planck Earth system model (ICON-XPP).
The first group will conduct a symbolic regression, a method that provides a mathematical formula for the relationship between meteorology and LAI, which can then be directly implemented in the Fortran code of the climate model, whereas the second group will train a neural network and connect it to the climate model using a recently developed bridge based on the HTTP protocol. In the end, both teams will obtain global climate simulations with customized data-driven phenology models!
Bio Alex:
Alex is an Earth system scientist specializing in the intricate interplay of the atmosphere and the biosphere, and associated climate feedbacks. He completed his PhD in Geoscience with an emphasis on Earth system modeling at the Max Planck Institute for Meteorology and the University of Hamburg in 2019. Subsequently, he undertook a postdoctoral role within the "Climate, Climatic Change, and Society" (CLICCS) Cluster of Excellence. In 2020, he joined the Max Planck Institute for Biogeochemistry in Jena, as part of the ERC Synergy Grant "Understanding and Modelling the Earth System with Machine Learning" (USMILE). In 2021, Alex initiated the establishment of the Research Group "Atmosphere-Biosphere Coupling, Climate, and Causality" at the Max Planck Institute for Biogeochemistry. He received the Feodor Lynen Fellowship from the Alexander von Humboldt Foundation, facilitating collaborative research at the Scripps Institution of Oceanography, University of California, San Diego. His research revolves around discerning feedback loops and causal connections governing the exchange of CO₂, water, and energy between the land biosphere and atmosphere, particularly in the context of rising atmospheric CO₂ levels. His research employs an array of models, ranging from basic conceptual frameworks to complex Earth system models. Methodologically, Alex emphasizes the application of machine learning to Earth system research, prioritizing interpretability for enhanced process understanding by harnessing a diverse spectrum of Earth observation data. Specifically, he pioneers hybrid models that fuse data-driven and mechanistic approaches in Earth system modeling.
Bio Christian:
Christian is a postdoctoral researcher at the Max Planck Institute for Biogeochemistry in Jena, Germany. Since 2024, he has headed the project group "Adapting Machine Learning to Earth System Science". This group adapts neural network architectures developed for computer vision or natural language modeling to the unique challenges of Earth system science. After completing his master's degree in mathematics at the University of Göttingen in 2017, he earned his PhD in computer science from the Friedrich Schiller University Jena in 2023. During his PhD, he developed methods based on causal inference to determine whether a high-level feature is used by a neural network, and a method to debias neural networks based on this approach.
Title:
Decoding Cause and Effect in Earth’s Complex Systems
Abstract:
Ready to take a dive into causality? Current causal discovery methods, i.e. methods that try to uncover causal relationships between variables from observational data, often struggle with the noisy and dynamic nature of real-world data sources. Your mission, should you choose this project, is to investigate the field of causal discovery and push the state of the art! How? Our target will be to adapt, combine, or even invent causal discovery techniques to outperform established causal discovery approaches specifically for real-world data scenarios. To do this, we will work with the largest causal discovery benchmark for time series (concerned with river discharge) and attempt to improve on the current leaderboard. Here, topics like increasing robustness to noisy data and domain adaptation might be particularly beneficial. However, other creative approaches are highly encouraged. Finally, by choosing this project, you will not only have the opportunity to wrestle with a challenging unsolved research benchmark but also acquire an understanding of how to tackle complex problems in even more complex data scenarios.
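To make the task concrete, here is a minimal Granger-style sketch on synthetic data: a lagged-regression score that asks whether adding a candidate driver's past improves one-step prediction of the target. The benchmark itself involves far richer methods and real river-discharge series; all names and data below are illustrative.

```python
import numpy as np

# Granger-style causal score on a synthetic "rain -> discharge" system:
# compare one-step prediction error with and without the candidate
# driver's lag. A clearly positive score suggests the driver matters.
rng = np.random.default_rng(1)
n = 500
rain = rng.normal(0, 1, n)                     # iid synthetic rainfall
discharge = np.zeros(n)
for t in range(1, n):
    discharge[t] = 0.6 * discharge[t - 1] + 0.8 * rain[t - 1] + rng.normal(0, 0.3)

def one_step_mse(target, drivers):
    # regress target[t] on its own lag plus the drivers' lags
    X = np.column_stack([target[:-1]] + [d[:-1] for d in drivers])
    y = target[1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2)

score = 1 - one_step_mse(discharge, [rain]) / one_step_mse(discharge, [])
score_rev = 1 - one_step_mse(rain, [discharge]) / one_step_mse(rain, [])
print(f"rain -> discharge: {score:.2f}, discharge -> rain: {score_rev:.2f}")
```

On this toy system the score is large in the true causal direction and near zero in the reverse one; the project's challenge is that real data (noise, hidden drivers, nonstationarity) breaks such clean separations.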
Bio:
Gideon is a PhD candidate in the Computer Vision Group at FSU Jena, where he investigates how to quantify causal relationships in real-world data sources. His research, conducted in cooperation with the German Center for Integrative Biodiversity Research (iDiv), focuses on the deployment and development of causal discovery algorithms to answer complex ecological questions. He holds a Master’s degree in Machine Learning from ITMO University with a specialization in Reinforcement Learning and NLP. Additionally, Gideon has experience in remote sensing applications, including large-scale infrastructure monitoring, and maintains a broad interest in Deep Learning and related fields.
Title:
Making robust projections of agricultural yields under climate change
Abstract:
Current projections of global crop yields under climate change are highly uncertain due to the complex relationships influencing plant growth. Machine learning (ML) models have excelled in yield forecasting and crop mapping, and the agricultural modelling community is keen to explore their use in other applications. However, over decadal timespans, climate change induces a distribution shift which can severely degrade model performance, and without ground truth, projections cannot be validated. In this project, by using simulated data from a mechanistic crop model, we will explore the strengths and weaknesses of ML approaches to capture a similar data-generating process and generalise to future decades in a worst-case climate change scenario. Participants will use the FutureCrop benchmark dataset, which provides the data used to force a mechanistic crop model (daily multivariate climate variables, soil characteristics, nitrogen fertilisation rate and ambient CO2 concentration) and the resulting global gridded maize and wheat yields. Models are trained on data covering 1980-2020 and are tested on their ability to predict annual yields per gridcell over 2021-2100. Model performance is evaluated using multiple metrics at different spatial and temporal scales, reflecting a range of research questions and stakeholder concerns that climate impact projections can be used to address. Despite over 300 submissions to the challenge in 2024, the task is far from solved. Model performance is consistently observed to strongly degrade over time, and, in many metrics of interest, no submitted model has achieved an acceptable score. However, there has so far been limited effort to explore state-of-the-art robust or causal methods. In this project, participants can apply tools learned during the summer school or seek out and test other approaches according to their interests. 
Alternatively, the dataset can be used as a testbed for questions related to transfer learning, uncertainty analysis or spatiotemporal ML. To get the group off to a flying start, notebooks for processing the data and implementing baseline models are already available. Participants are encouraged to present their findings to the wider AgML community, and, if interested, there are opportunities to extend the project beyond the summer school.
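To illustrate the core difficulty, here is a minimal synthetic sketch of temporal extrapolation under warming: a model that fits the historical period well degrades sharply on the shifted future distribution. The numbers and the toy yield response are invented; FutureCrop uses real crop-model simulations.

```python
import numpy as np

# Synthetic temporal-extrapolation experiment: yields respond
# nonlinearly to temperature (optimum at 22 degC); a linear model fit
# on the historical period degrades under stronger future warming.
rng = np.random.default_rng(2)

def simulate(n_years, base_temp, warming):
    t = np.arange(n_years)
    temp = base_temp + warming * t + rng.normal(0, 1, n_years)
    yld = 10 - 0.15 * (temp - 22) ** 2 + rng.normal(0, 0.3, n_years)
    return temp, yld

t_hist, y_hist = simulate(41, 20.0, 0.02)   # "1980-2020"
t_fut, y_fut = simulate(80, 21.0, 0.06)     # "2021-2100", faster warming

coef = np.polyfit(t_hist, y_hist, 1)        # linear temperature response
rmse_hist = np.sqrt(np.mean((np.polyval(coef, t_hist) - y_hist) ** 2))
rmse_fut = np.sqrt(np.mean((np.polyval(coef, t_fut) - y_fut) ** 2))
print(f"historical RMSE {rmse_hist:.2f}, future RMSE {rmse_fut:.2f}")
```

The in-sample fit hides the problem: the linear model never saw temperatures past the yield optimum, so it extrapolates rising yields where the true response declines, which is exactly the degradation-over-time pattern seen on the benchmark.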
Bio:
Lily-belle is a doctoral researcher at the Helmholtz Centre for Environmental Research (UFZ), where she develops interpretable machine learning methods to identify compound meteorological drivers of agricultural yield failure. Her work focuses on understanding how combinations of weather events, beyond just extreme heat or drought, can lead to significant crop losses. Lily-belle is part of the Compound Weather and Climate Events group led by Dr. Jakob Zscheischler and contributes to the COMPOUNDX project, which explores the use of AI to uncover complex, nonlinear relationships between climate variables and their societal impacts. Before starting her PhD in 2021, Lily-belle worked as a data scientist.
Title:
When and How to Use Explainable AI?
Abstract:
Machine learning is now widely used in Earth and climate science. Explainable AI (XAI) methods, such as SHAP, are often applied to interpret model results and link them to physical processes. But how reliable are these explanations? In this project, we will explore how different factors, such as model performance, feature dependencies, and nonlinear effects, affect the stability and usefulness of XAI outputs. Through controlled experiments with synthetic datasets, we aim to build a clearer understanding of when these tools can be trusted. This project is for students who want to better understand how to use machine learning tools responsibly in climate and Earth science research, and how to assess the trustworthiness of model-based explanations.
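One such controlled experiment could look like this minimal sketch: across repeated refits on synthetic data, feature attributions stay stable when features are independent but become erratic once two features are strongly correlated. A cheap permutation-importance proxy stands in for SHAP here to keep the sketch dependency-free; all data and settings are illustrative.

```python
import numpy as np

# Repeated-refit experiment: how stable is a feature-importance
# estimate when two inputs are correlated? Uses permutation importance
# on a linear model as a stand-in for SHAP-like attribution.
rng = np.random.default_rng(3)

def fit_importance(rho):
    n = 300
    x1 = rng.normal(0, 1, n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)
    y = x1 + x2 + rng.normal(0, 1.0, n)
    X = np.column_stack([x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    base = np.mean((X @ coef - y) ** 2)
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # break feature j
        imps.append(np.mean((Xp @ coef - y) ** 2) - base)
    return imps

def spread(rho, reps=30):
    # std of feature-1 importance across independent refits
    return np.std([fit_importance(rho)[0] for _ in range(reps)])

print("importance spread, rho=0.0 :", round(spread(0.0), 2))
print("importance spread, rho=0.99:", round(spread(0.99), 2))
```

The correlated case splits credit between the two features almost arbitrarily from refit to refit, so any physical interpretation built on a single fit's attributions would be fragile, which is precisely the kind of failure mode the project will probe.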
Bio:
Yinglin is a postdoctoral researcher at the Potsdam Institute for Climate Impact Research (PIK), Germany. She received her PhD in Hydraulic Engineering from Tsinghua University in Beijing. Her research focuses on the physical mechanisms and socio-ecological impacts of extreme weather events, particularly heatwaves and droughts. During her PhD, she spent 12 months as a visiting researcher at the Max Planck Institute for Biogeochemistry in Jena and 6 months at the Helmholtz Centre for Environmental Research (UFZ). Currently, she is interested in combining explainable machine learning with physical analyses of surface energy and water balance to better understand complex extremes, including record-breaking, spatially compounding, and rapidly changing (flash) events.
Title:
Enhancing Carbon and Water Flux Upscaling with Knowledge-Guided Machine Learning
Abstract:
Data-driven upscaling of biogenic carbon and water fluxes from eddy covariance (EC) sites to the global scale offers a valuable complement to process-based models for producing global flux estimates. However, significant uncertainties arise during extrapolation due to sparse and unevenly distributed EC sites. The goal of this summer school session is to deepen our understanding of how to integrate domain knowledge — including ecological principles, physical laws, process-based models, or other forms of expert knowledge — with machine learning to create synergistic models for improved carbon and water flux estimation. Several possible strategies for combining knowledge with machine learning are: 1) architecture design, e.g., hardcoding knowledge into the neural network; 2) an extra loss term, e.g., adding a penalty for violating mass or energy balance; 3) pretraining, e.g., pretraining the machine learning model on external or synthetic data. Participants are encouraged to propose their own implementation pathways, with further details refined through group discussions. The GPP and ET estimates produced by these knowledge-guided machine learning models will be compared against a baseline pure machine learning model and validated using independent datasets. Finally, we will collectively summarize our findings and identify the most effective strategies for integrating domain knowledge into carbon and water flux upscaling frameworks.
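As a minimal illustration of strategy 2), here is a sketch that adds a physics penalty to a data loss: a toy linear ET model trained by gradient descent, with an extra term punishing unphysical negative predictions. The constraint, variable names, and data are all illustrative and much simpler than the actual upscaling setup.

```python
import numpy as np

# Strategy 2) in miniature: data loss (MSE to observed ET) plus a
# physics loss that penalises negative predicted ET. Gradient descent
# on a linear model; everything here is synthetic.
rng = np.random.default_rng(4)
n = 300
rad = rng.uniform(0, 1, n)              # net radiation (normalised)
temp = rng.uniform(-5, 30, n) / 30.0    # air temperature (normalised)
et_obs = 2.0 * rad * np.maximum(temp, 0) + rng.normal(0, 0.05, n)
X = np.column_stack([np.ones(n), rad, temp])

def train(lam, steps=4000, lr=0.1):
    w = np.zeros(3)
    for _ in range(steps):
        pred = X @ w
        g = X.T @ (pred - et_obs) * (2 / n)              # MSE gradient
        g += lam * X.T @ (2 * np.minimum(pred, 0)) / n   # physics gradient
        w -= lr * g
    return w

plain, guided = train(lam=0.0), train(lam=5.0)
neg_frac = lambda w: np.mean((X @ w) < 0)
print(f"negative-ET fraction: plain {neg_frac(plain):.2f}, "
      f"guided {neg_frac(guided):.2f}")
```

The same pattern scales to neural networks: the physics term is just one more differentiable loss, weighted against the data loss, and the session's question is which such terms actually help flux upscaling.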
Bio:
Qi is a postdoctoral researcher at the Max Planck Institute for Biogeochemistry. He earned his PhD in 2021 and works at the intersection of plant–environment–human interactions, focusing on water–carbon dynamics and their representation in ecosystem and agroecosystem carbon cycle models. Qi's approach combines knowledge-guided machine learning, model–data fusion, and hybrid modeling techniques. Currently, he is investigating uncertainty in carbon flux upscaling and its interannual variability across different plant functional types.
Title:
Extrapolating the Globe: Challenges of Upscaling Sparse Data
Abstract:
Understanding environmental processes across space and time is key to studying Earth system dynamics. While in situ measurements provide high-quality data, their limited and uneven distribution hampers global-scale modeling. To address this, machine learning is often used to upscale local observations with satellite and meteorological data. However, models trained on regionally biased datasets struggle in underrepresented areas due to extrapolation errors—often caused by gaps in meteorological conditions or ecosystem types in the training data. This project will simulate extrapolation challenges using synthetic datasets that mimic underrepresented environmental scenarios. You will evaluate how conventional upscaling methods perform in these settings and identify the drivers of extrapolation errors. To address these issues, you will integrate synthetic global-scale data—designed to represent spatially continuous proxies—into the modeling process. Using techniques like transfer learning, you will test whether these additional inputs can reduce extrapolation errors and improve prediction reliability in data-sparse regions.
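The transfer-learning idea can be sketched in a few lines on synthetic data: pretrain on abundant, spatially continuous proxy data, adapt to a handful of in-situ samples from a narrow regime, and compare against fitting those sparse samples alone. The basis model, the offset-only adaptation, and all data below are illustrative stand-ins.

```python
import numpy as np

# Pretrain on plentiful "proxy" data spanning the full predictor range,
# then adapt to 15 sparse in-situ samples from a narrow regime; compare
# with fitting those 15 samples directly. Everything here is synthetic.
rng = np.random.default_rng(5)

def make_data(n, lo, hi, bias):
    x = rng.uniform(lo, hi, n)
    return x, np.sin(x) + bias + rng.normal(0, 0.05, n)

def design(x):  # small basis expansion standing in for an ML model
    return np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])

x_src, y_src = make_data(2000, 0, 6, 0.0)        # global proxy data
w_src, *_ = np.linalg.lstsq(design(x_src), y_src, rcond=None)

x_tgt, y_tgt = make_data(15, 0, 2, 0.3)          # sparse, biased sites

# cold start: fit the 15 points directly (extrapolates poorly)
w_cold, *_ = np.linalg.lstsq(design(x_tgt), y_tgt, rcond=None)

# transfer: keep pretrained weights, refit only an intercept offset
w_warm = w_src.copy()
w_warm[0] += np.mean(y_tgt - design(x_tgt) @ w_src)

x_test, y_test = make_data(500, 0, 6, 0.3)       # full-range evaluation
rmse = lambda w: np.sqrt(np.mean((design(x_test) @ w - y_test) ** 2))
print(f"cold-start RMSE {rmse(w_cold):.2f}, transfer RMSE {rmse(w_warm):.2f}")
```

The pretrained model carries the shape of the response across the full predictor range, so the sparse sites only need to supply a local correction; the project will test whether the same logic holds for real upscaling models and synthetic global proxies.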
Bio:
Xin is a postdoctoral researcher in the Department of Biogeochemical Integration at the Max Planck Institute for Biogeochemistry in Jena, where he also completed his PhD. His research focuses on land–atmosphere interactions, the impacts of climate extremes, and the application of machine and deep learning techniques in Earth system science. During his PhD, Xin investigated drought legacy effects on terrestrial ecosystem carbon cycling using eddy covariance measurements and machine learning. Currently, he is exploring how satellite-based solar-induced chlorophyll fluorescence can improve the spatial generalization of gross primary productivity through transfer learning.