BINGO Work package 2
Climate predictions and downscaling to extreme weather

DECO
–
A plug-in for data extraction and conversion
developed within and for
BINGO
Henning W. Rust, Andy Richling, Edmund Meredith,
Madlen Fischer, Christos Vagenas,
Christopher Kadow and Uwe Ulbrich
Version from July 29, 2016
Abstract
DECO is a plug-in to the Freie Universität Berlin Evaluation Framework for Earth System Science (FreVa) and aims at extracting and converting COSMO-CLM regional climate model simulations from a central data storage on demand via a web-based platform. The data are to be used as meteorological driving data for hydrological models at the six BINGO Research Sites (BINGO deliverable D2.1). Currently, the spatial resolution of the regional model is 12 km and data will be made available as daily values. As the climatology of the simulated meteorological parameters, i.e. their seasonally varying mean (and higher moments), differs from the observed one, a bias correction can optionally be applied before the data and file format are converted to the particular needs of the individual modeling groups at the Research Sites. This conversion comprises a transfer from the native COSMO-CLM grid to station locations or to a different grid, a change of units, and writing the data to the desired file format. This on-demand post-processing and conversion approach allows for efficient data storage, maximal reproducibility and transparency, as well as transferability to new data sets. The application can be accessed via a web platform or a command-line interface. Development is subject to strict version control to ensure reproducibility.
Contents
1 DECO – Aims and strategy
2 Description of the climate simulations
2.1 Model Description
2.2 Description of Simulations
2.2.1 Reanalysis-forced evaluation runs
2.2.2 High-resolution (test) simulations of extremal episodes
2.2.3 MiKlip forced decadal predictions
3 Bias correction
3.1 Reference data
3.2 Seasonal Generalized Linear Model method
3.2.1 Underlying principle and modeling approach
3.2.2 At BINGO Research Sites
3.3 Cumulative Distribution Function Transform method
3.3.1 Quantile-Mapping Method
3.3.2 CDF-Transform Method
3.3.3 Application of CDF-t at BINGO research sites
4 DECO – A BINGO plug-in for FreVa
4.1 FreVa – Freie Universität Berlin evaluation system
4.2 Documentation of DECO
4.2.1 Introduction
4.2.2 Preprocessing
4.2.3 Input parameters
4.2.4 Output
5 Summary
Bibliography
Chapter 1
DECO – Aims and strategy
The BINGO Work Package WP2 “Climate predictions and downscaling to extreme weather” aims at providing high-resolution meteorological driving data for various hydrological models. For most models, this includes precipitation, temperature, pressure, wind speed, incoming solar radiation and other variables. These data are generated at a regional level (for this deliverable, at the European level) by dynamically downscaling coarse-resolution global data, see also the red box in Fig. 1.1.
To be usable by hydrological models, these data need to be appropriately post-processed and converted to the needs of the 15 or more individual models used at the six BINGO Research Sites.
For transparency and reproducibility within BINGO, we must ensure that the very same driving data are available throughout the project and beyond. Thus, the data must either be stored in the various file formats needed by the individual models or, more efficiently in terms of storage, a single data set is stored in a common standardized format for climate models together with data conversion algorithms specifically tailored to each of the individual hydrological models. Besides the efficient use of storage, the latter procedure has further advantages: I) the conversion algorithms can be reused for the data to come with the following deliverables associated with WP2, II) the bias correction can be exchanged for a more sophisticated one if available, and III) many other data sets available in this standardized format can be used in addition to the data produced specifically for BINGO.
Given these advantages, the WP2 leaders decided to develop a hybrid application (command-line interface and web-based) to extract the data needed for the Research Sites and to post-process and convert them to the needs of all individual modeling groups. These conversion routines can also be adapted to incorporate new models or the changing needs of existing models. Furthermore, users can return to data generated earlier and extract them again using, e.g., a new bias correction method or a slightly changed conversion routine. In addition to archiving climate data efficiently in a central place, the plug-in developers in WP2 use git to ensure proper versioning of the bias correction and conversion algorithms.
During the development of the plug-in, comments were requested directly from the modeling groups at the Research Sites. Hydrological modelers were also asked to test the output of the plug-in with their models and to review the plug-in during development. As with any piece of software, the development of DECO is not finished but open to continuation: conversion routines for other models could be integrated, as well as new bias correction schemes.
This document describes the COSMO-CLM simulations (Sect. 2), the bias correction used (Sect. 3), and the extraction and conversion algorithms for the individual models at the Research Sites (Sect. 4).
Chapter 2
Description of the climate simulations
All simulations described in this chapter have been carried out using the COSMO-CLM regional climate model (Rockel et al., 2008), for a domain centred over Europe. In this chapter, a description of the COSMO-CLM regional model is provided, followed by details of the simulations performed.
2.1 Model Description
The COSMO-CLM (CCLM) is a state-of-the-art nonhydrostatic regional climate model and the climate version of the COSMO numerical weather prediction model used by the German Weather Service (DWD). CCLM is developed for climate purposes by the CLM-Community (http://www.clm-community.eu/) and features a software architecture allowing for computational parallelism and system extensibility. It is suitable for a broad spectrum of applications across scales ranging from hundreds of metres to thousands of kilometres. The model components include an atmospheric model directly coupled to a land-surface/soil model, and an aerosol model. Sea surface temperatures must be prescribed as a boundary condition.
2.2 Description of Simulations
2.2.1 Reanalysis-forced evaluation runs
To create a climatology and continuous time series of key hydrological variables, downscaling simulations have been carried out with the CCLM over the EURO-CORDEX domain (Fig. 2.1) with a horizontal resolution of 0.11° (about 12 km), spanning the entire period covered by the ERA-Interim reanalysis (1979-2015). At the lateral boundaries of the domain, the model is constrained by ERA-Interim (Dee et al., 2011), the latest generation of reanalyses from the European Centre for Medium-Range Weather Forecasts, which provides updated lateral boundary conditions every 6 hours. Initial conditions, sea surface temperature and sea-ice cover also come from the ERA-Interim reanalysis.
Computational limitations prohibit simulations at the kilometre scale over such a large domain and time period. A horizontal resolution of 0.11° has thus been chosen. This represents somewhat of a “breakthrough” resolution: it has been shown to add significant value to coarser global model output for the simulation of (non-convective) precipitation extremes (Heikkilä et al., 2011), particularly in mountainous regions, and can modulate the climate change signal of coarse-resolution global models (Torma et al., 2015).
The simulations have been carried out as three separate model runs, each of which is continuous during the time period indicated in Tab. 2.1 (middle column). Runs 1 and 3 were carried out at the Freie Universität Berlin; run 2 had previously been carried out by the CLM-Community, and we use it to keep computational expense to a minimum. Within each model run, sufficient time must be allowed after initialisation for the spin-up of soil moisture and soil-related processes within the higher-resolution CCLM. In each run, a minimum of two months is allowed for spin-up. Model output from this period is thus not included in the final data, i.e. the data to be used for research purposes. The final data thus cover the period March 1979 to July 2015, as summarised in Tab. 2.1 (right column). The model output can be extended beyond July 2015 if project partners agree and if reanalysis data for that period are available.
For most key BINGO variables, 0.11° CCLM output is available at 3-hourly frequency; the exception is precipitation, which is also available at hourly frequency. The available BINGO variables and their temporal frequencies are summarised in Table 2.2.
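Table 2.1: Model runs, simulated periods and periods for which data are available (dates as DD.MM.YYYY).
Model Run | Period of Run | Data Available |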
1 | 01.01.1979 - 01.04.1989 | 01.03.1979 - 01.03.1989 |
2 | 01.01.1989 - 01.01.2009 | 01.03.1989 - 01.12.2009 |
3 | 01.09.2008 - 01.08.2015 | 01.12.2009 - 01.08.2015 |
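Table 2.2: BINGO variables available from the 0.11° CCLM simulations and their temporal frequencies.
Variable Description | Variable ID | Frequency | Time Method* | Min/Max† |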
Total Cloud Fraction | clt | 3-hr, day | Instantaneous | Daily |
Near-Surface Relative Humidity | hurs | 3-hr, day | Instantaneous | Daily |
Near-Surface Specific Humidity | huss | 3-hr, day | Instantaneous | Daily |
Precipitation | pr | 1-hr, 3-hr, day | Mean | Daily |
Surface Air Pressure | ps | 3-hr, day | Instantaneous | Daily |
Sea Level Pressure | psl | 3-hr, day | Instantaneous | Daily |
Surface Downwelling Longwave Radiation | rlds | 3-hr, day | Mean | Daily |
Surface Downwelling Shortwave Radiation | rsds | 3-hr, day | Mean | Daily |
Near-Surface Wind Speed | sfcWind | 3-hr, day | Instantaneous | Daily |
Near-Surface Air Temperature | tas | 3-hr, day | Instantaneous | Daily |
Near-Surface Dew Point Temperature | tdps | 3-hr, day | Instantaneous | Daily |
Eastward Near-Surface Wind | uas | 3-hr, day | Instantaneous | Daily |
Northward Near-Surface Wind | vas | 3-hr, day | Instantaneous | Daily |
A “day” is defined as the 24 hours starting at 00:00 UTC. All time stamps in the output data are in UTC.
Note on the interpretation of the time stamps: For daily data, all time stamps are of the form “YYYY-MM-DD 00:00:00” and precipitation values are for that entire day. For sub-daily data, time stamps are of the form “YYYY-MM-DD HH:00:00” and precipitation values are for the interval up to this time. For example, for the 3-hourly data a time stamp of “2015-07-21 09:00:00” would mean precipitation in the 3 hour period 06:00-09:00 UTC on that day.
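The time-stamp convention above can be made explicit with a small helper that returns the UTC interval a precipitation value refers to. This is a minimal sketch, not part of DECO: the function name `accumulation_interval` is hypothetical, and it assumes time stamps are read as strings in exactly the format quoted above.

```python
from datetime import datetime, timedelta


def accumulation_interval(stamp: str, hours: int) -> tuple[datetime, datetime]:
    """Return the (start, end) UTC interval a precipitation value refers to.

    stamp: time stamp as written in the output, "YYYY-MM-DD HH:MM:SS".
    hours: output frequency in hours (1, 3 or 24).
    """
    t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    if hours == 24:
        # Daily data: the stamp marks the start of the day the value covers.
        return t, t + timedelta(days=1)
    # Sub-daily data: the stamp marks the end of the accumulation period.
    return t - timedelta(hours=hours), t
```

For the 3-hourly example in the text, `accumulation_interval("2015-07-21 09:00:00", 3)` yields the interval from 06:00 to 09:00 UTC on 21 July 2015.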
2.2.2 High-resolution (test) simulations of extremal episodes
Theory and Methods
Extreme weather patterns and individual events of hydrological significance shall be identified from the 0.11° CCLM simulations for each of the six Research Sites and subjected to more detailed analysis. In the case of extreme precipitation events, which are a phenomenon with high spatial variability, this involves further dynamical downscaling of the identified events to a convection-permitting resolution of 0.02° (2.2 km). This provides a better representation of the dynamics driving any changes in hydrological extremes and hence a more detailed input for the hydrological models. In convection-permitting simulations, deep convective processes can be explicitly simulated by the model, whereas they have to be parametrized in lower-resolution simulations such as our 0.11° CCLM integration. The further downscaling to convection-permitting resolution is a key step, as recent studies have shown that convection-permitting resolution is essential to accurately capture the response of convective precipitation extremes to climatic changes (Kendon et al., 2014; Ban et al., 2015), which can be highly nonlinear (Meredith et al., 2015).
The issue of spatial spin-up, i.e. the distance from the lateral boundaries required before fine-scale features develop, is an important consideration when designing regional downscaling experiments. For consistency, we intend to use a common domain for all high-resolution simulations at each Research Site. As the large-scale forcing behind individual extremes can enter from any side of the domain, we centre our high-resolution domains over each Research Site and use a 201 x 201 grid with 50 vertical levels. This allows at least 100 grid lengths between the lateral boundaries and the centre of each Research Site. Brisson et al. (2015) investigated the impact of domain size on the simulation of precipitation in convection-permitting models using a horizontal grid spacing of 3 km and concluded that a spatial spin-up of at least 40 grid cells is necessary for a realistic simulation of precipitation patterns. For a more detailed discussion of convection-permitting modelling, the reader is referred to Prein et al. (2015).
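The spin-up margin of the chosen domain can be checked with a short calculation. The sketch below assumes the nominal 2.2 km spacing quoted above for 0.02° and treats the centre cell of the 201 x 201 grid as the centre of the Research Site; the helper names are hypothetical.

```python
def grid_lengths_to_boundary(n_cells: int) -> int:
    """Grid lengths between the centre cell and the boundary of an n x n grid (n odd)."""
    return (n_cells - 1) // 2


def distance_km(n_lengths: int, dx_km: float) -> float:
    """Distance spanned by n_lengths grid lengths of spacing dx_km."""
    return n_lengths * dx_km


n_half = grid_lengths_to_boundary(201)   # 100 grid lengths
bingo_km = distance_km(n_half, 2.2)      # about 220 km at 0.02 deg (2.2 km) spacing
brisson_km = distance_km(40, 3.0)        # 40 cells at 3 km spacing: 120 km (Brisson et al., 2015)
```

Measured either in grid cells (100 vs. 40) or in distance (about 220 km vs. 120 km), the margin of the BINGO domains thus exceeds the minimum spatial spin-up reported by Brisson et al. (2015).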
The Simulations
With the aid of the questionnaire responses from each Research Site, one extreme precipitation event has been identified from the 0.11° CCLM simulations for each site (excluding Cyprus) and has been further downscaled with the CCLM to 0.02° resolution (2.2 km). The events for these test simulations were identified subjectively from the 0.11° model output, based on the questionnaire descriptions of past extremes at each site and on the 0.11° modelled precipitation.
The output variables are available at hourly frequency for the same parameters as shown in Table 2.2, although no daily minima or maxima are provided. All data are available through the Freva DECO plug-in and are best accessed by selecting “test-events” in the experiment field and then the appropriate Research Site (i.e. Badalona, Bergen, Tagus, Veluwe, Wupper) in the product field. As of 10 June 2016, the test events for the Tagus Research Site are not yet available through the DECO plug-in. We hope to remedy this as soon as possible, after clarifying the input requirements of the particular hydrological model used at that site.
Project partners are asked to download the high-resolution test simulations for their respective Research Sites and to test the data with their hydrological models. Feedback should be provided as soon as possible: the earlier it is received, the more likely it is that any concerns raised can be addressed satisfactorily. Feedback on the test simulations is best sent to Edmund Meredith (edmund.meredith@met.fu-berlin.de).
2.2.3 MiKlip forced decadal predictions
The relatively new field of decadal climate prediction (e.g. Smith et al., 2007) aims to simulate both the climate response to future anthropogenic forcing and the future evolution (from the present) of the climate due to internal climate variability (Marotzke et al., 2016). This differs from the approach taken in climate projections, e.g. in the CMIP5 project (Taylor et al., 2012), where the focus is on the response of the climate to anthropogenic forcing and the impacts of internal climate variability are (supposed to be) nullified via multi-decadal climate model integrations. The Earth system models (ESMs) used in decadal prediction systems are initialized with an observed state of the climate system, e.g. of the ocean, atmosphere, soil and ice. Skill in predicting internal climate variability on decadal scales is derived from the long-term memory (i.e. sensitivity to the initial state) of certain components of the climate system, predominantly the ocean. Decadal predictions (unlike climate projections) therefore rely on a high-quality initialization of those ESM components which exhibit long-term memory.
The MiKlip project (http://www.fona-miklip.de) is funded by the German Federal Ministry of Education and Research with the aim of developing a world-class decadal prediction system. The MiKlip decadal prediction system is based on the Max Planck Institute Earth system model, MPI-ESM, with an atmospheric horizontal resolution of T63 (1.875°). The first phase of the MiKlip project showed significant skill of the MiKlip system (e.g. Mueller et al., 2012; Pohlmann et al., 2013), based on the evaluation of decadal hindcast simulations initialized yearly from 1960 to 2010. In addition, the MiKlip system was used for a future decadal prediction with 10 realizations, initialized in 2015 and running up to 2024. Module C of the first phase of the MiKlip project was devoted to the regionalisation of the MiKlip global model output via dynamical downscaling. This was carried out over a European domain for the entire MiKlip period (1960-2024) using the CCLM at 0.44° resolution. For BINGO, the FUB has further dynamically downscaled four realizations of the future decadal predictions (2015-2024) from 0.44° to 0.11° using the CCLM.
To reduce computational expense, this has been carried out over two sub-domains
(Fig. 2.2):
(1) NW-EUR-11, containing the Research Sites Bergen, Veluwe and Wupper.
(2) IBERIA-11, containing the Research Sites Tagus and Badalona.
The same variables and frequencies as listed in Table 2.2 are available and are downloadable
via the online DECO plugin.
Chapter 3
Bias correction
Climate model simulations typically differ systematically from observed data, most prominently by a shift in the mean value. Climate model simulations are thus typically post-processed using a bias correction. This chapter outlines this kind of post-processing and presents two different bias correction methods: the Seasonal Generalized Linear Model method and the Cumulative Distribution Function Transform method.
Section 3.1 covers the reference datasets used. The Seasonal Generalized Linear Model method is described in Section 3.2, followed by a description of the Cumulative Distribution Function Transform method in Section 3.3.
3.1 Reference data
Depending on the Research Site and hydrological model, the reference data for bias correction vary. If gauge-based driving data are requested and gauge-based reference data are available, these are used for bias correction. For gridded products, we use the WATCH Forcing Data methodology applied to ERA-Interim (WFDEI, Weedon et al., 2014) as reference, unless another gridded reference product was provided. As bias correction of gridded products based on gauge-based reference data is much less straightforward, it is not included in this deliverable. The following list gives an overview of the reference data used at the different Research Sites.
- RS1 Bergen
- a gridded data product is requested and thus WFDEI dataset is used as reference.
- RS2 Veluwe
- a gridded data product is requested and thus WFDEI dataset is used as reference.
- RS3 Wupper
- a gridded data product is requested and thus WFDEI dataset is used as reference.
- RS4 Badalona
- a gridded data product is requested and thus WFDEI dataset is used as reference.
- RS5 Tagus-River (Portugal)
  - a gauge-based data product at daily resolution is to be provided for the variables precipitation (mm/day), daily maximum and minimum near-surface air temperature (°C), surface downwelling shortwave flux in air (W/m²), wind speed (m/s) and surface air pressure (kPa).
Gauge locations have been provided, but not all gauges record all the requested variables. Consequently, bias correction can only be applied to those quantities for which reference data have been provided, namely:
- Maximum and minimum temperature, wind speed (monthly) for Tapada da Ajuda, Salvaterra de Magos, Dois Portos, Santarem, Alvega, and
- Precipitation (daily) for Vila Nogueira, Moinhola, Canha, Barragem de Magos, Barragem de Montargil, Ota, Marianos, Santarem ESA, Tojeiras de Cima, Bemposta, and Pernes.
- RS6 Troodos-Mountains
- no data products requested.
For most of the Research Sites, a seasonally resolved climatology computed from WFDEI is the reference for the climate simulations. This forcing data set has frequently been used in the context of hydrological modeling (e.g., Gudmundsson et al., 2011; Koch et al., 2013; Prudhomme et al., 2014). However, Rust et al. (2015) found that, due to the way the ERA-Interim reanalysis is merged with a gridded observation-based data product from CRU, implausible differences in daily temperatures across the boundaries of calendar months might arise in this data set for some regions. For Europe, however, these differences are insignificant.
For bias correction, the following WFDEI variables are available: mean/min/max temperature, total precipitation, surface air pressure, near-surface wind speed, long-/shortwave incident radiation and near-surface specific humidity. A bias correction for mean/min/max relative humidity will be done at a later stage. WFDEI provides neither wind vector components nor wind direction; thus, wind direction cannot be corrected. This is, however, not a problem for most of the cases relevant to BINGO. WFDEI is available at a coarser resolution (0.5°) than the COSMO-CLM simulation used for D2.1; it has thus been interpolated to the COSMO-CLM grid.
3.2 Seasonal Generalized Linear Model method
In this section, the Seasonal Generalized Linear Model bias correction method is described. The underlying idea of the approach is presented first (Sect. 3.2.1), followed by its concrete application at the BINGO Research Sites in Sect. 3.2.2.
3.2.1 Underlying principle and modeling approach
The bias correction applied here rests on the assumption that the climatological seasonal cycles of both simulations and observations are smooth functions of the day of the year. If the simulated seasonal cycle does not match the observed one, it needs to be adjusted. The smooth functions are modeled using a generalized linear model (GLM, McCullagh and Nelder, 1989) with harmonic functions (sine and cosine) of the day of the year as predictors. Periodic functions, such as the seasonal cycle, can always be described by a series of harmonic functions (e.g., Priestley, 1992); the more features the cycle has, the higher the order of the series expansion must be. Unlike standard linear regression, generalized linear models are not restricted to Gaussian residuals: residuals can follow any distribution from the exponential family (McCullagh and Nelder, 1989), e.g. Exponential, Gamma, Binomial or Poisson. Of particular interest here are distributions with positive support, which are suited to modeling precipitation and other non-negative quantities.
Once the two seasonal cycles have been obtained (modeling step), their difference (or their ratio, in the case of precipitation and other positive variables) is computed and used to adjust the simulated data (adjustment step).
The model
As a simultaneous treatment of all data is advantageous over a separate treatment of the data in different months, the seasonal variations of the variables are captured using harmonic functions, as mentioned above. For the generalized linear model, such a description is given in Eq. (3.1) for the expectation value $\mu(t)$, related to the linear predictor via the link function $g$ of the chosen distribution:
$$ g\big(\mu(t)\big) = \beta_0 + \sum_{k=1}^{K} \big[ a_k \cos(k\omega t) + b_k \sin(k\omega t) \big], \qquad \omega = \frac{2\pi}{365}, \tag{3.1}$$
with $K$ the order of the harmonic series and $t$ the time variable running over all days of the year. For parameter estimation, $t$ is centered on the months of the year; the seasonal cycle can, however, be described at daily resolution, thus $t \in \{1, \ldots, 365\}$. While the choice of distribution for the residuals (anomalies) varies with the meteorological parameter considered, the model for the expectation given in Eq. (3.1) remains basically the same.
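As an illustration of Eq. (3.1) for a positive variable, the following minimal sketch builds the harmonic design matrix and fits a Gamma GLM with a log link by iteratively reweighted least squares (IRLS), using NumPy only. It is not the implementation used in DECO: the function names are hypothetical, and the 365-day period and starting values are assumptions.

```python
import numpy as np


def harmonic_design(t, order, period=365.0):
    """Columns 1, cos(k*w*t), sin(k*w*t) for k = 1..order, with w = 2*pi/period."""
    t = np.asarray(t, dtype=float)
    w = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, order + 1):
        cols += [np.cos(k * w * t), np.sin(k * w * t)]
    return np.column_stack(cols)


def fit_gamma_log(X, y, n_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link, log(mu) = X @ beta, by IRLS.

    For the Gamma family with log link the IRLS working weights are all equal,
    so each step is an ordinary least-squares fit of the working response
    z = eta + (y - mu) / mu on X.
    """
    y = np.asarray(y, dtype=float)
    eta = np.log(y)                      # starting values: log of the data
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        if converged:
            break
    return beta
```

Fitting on monthly values and then evaluating `np.exp(harmonic_design(np.arange(1, 366), K) @ beta)` gives the seasonal cycle at daily resolution, as described above.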
Distributional assumptions
Precipitation
Precipitation is a somewhat special quantity: its probability distribution is continuous for strictly positive values but has a discontinuity at zero. For a statistical description, this is typically captured with a compound model consisting of a Binomial variable describing dry and wet days and a strictly positive variable (Gamma or Exponential) for the precipitation amount on wet days. Here, we adjust only the precipitation amount on wet days and not the distribution of dry and wet days. This avoids inconsistencies in the post-processed model simulations, such as precipitation on days without clouds.
Other variables
Table 3.1 gives an overview of all variables and the associated model distributions. The order of the harmonic series expansion (model selection) was chosen based on the reference datasets: harmonic series up to 5th order were considered, and the order was selected with the Bayesian Information Criterion (BIC). The model thus obtained was used to describe both the climate model simulations and the reference data.
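The order selection described above can be sketched as follows for a Gaussian variable: harmonic series of order 0 to 5 are fitted by least squares, and the order with the lowest BIC is kept. This is a minimal NumPy sketch assuming Gaussian residuals and a 365-day period; the function name is hypothetical, and the BIC is computed only up to an additive constant, which does not affect the ranking.

```python
import numpy as np


def select_harmonic_order(t, y, max_order=5, period=365.0):
    """Return the harmonic order (0..max_order) with the lowest Gaussian BIC,
    together with the BIC values of all candidate orders."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n, w = y.size, 2.0 * np.pi / period
    bic = {}
    for order in range(max_order + 1):
        # Intercept plus cos/sin pairs up to the given order
        cols = [np.ones_like(t)]
        cols += [f(k * w * t) for k in range(1, order + 1) for f in (np.cos, np.sin)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        n_par = X.shape[1] + 1           # coefficients plus residual variance
        bic[order] = n * np.log(rss / n) + n_par * np.log(n)
    return min(bic, key=bic.get), bic
```

Here `t` would hold the mid-month day of year of each monthly mean and `y` the corresponding monthly values over all years of the reference period.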
Table 3.1: Variables and associated model distributions.
variable | long name | distribution |
Tmax − Tmin | difference between Tmax and Tmin | log-Gaussian |
Tmax + Tmin | sum of Tmax and Tmin | Gaussian |
tas | mean surface temperature | Gaussian |
pr | precipitation | gamma |
sfcWind | near-surface wind speed | log-Gaussian |
ps | surface air pressure | Gaussian |
rlds | longwave incident radiation | log-Gaussian |
rsds | shortwave incident radiation | log-Gaussian |
huss | near-surface specific humidity | log-Gaussian |
Minimum and maximum temperature
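Particular care needs to be taken when correcting minimum and maximum temperature to avoid inconsistencies such as $T_{\min} > T_{\max}$. Here, the variable transformation given in Eq. (3.2) ensures physical consistency:
$$ D = T_{\max} - T_{\min}, \qquad S = T_{\max} + T_{\min}. \tag{3.2}$$
The difference $D$ is strictly positive and modeled as log-Gaussian, the sum $S$ as Gaussian (Table 3.1). After correcting $D$ and $S$ based on the reference, the corrected values of $T_{\min}$ and $T_{\max}$ are derived by back-transforming the variables:
$$ T_{\max} = \frac{S + D}{2}, \qquad T_{\min} = \frac{S - D}{2}. \tag{3.3}$$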
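A minimal sketch of the transformation to the difference and sum of Tmax and Tmin (cf. Table 3.1) and its inverse follows. The correction factors in the example (a ratio of 0.8 for the difference, a shift of +1.0 for the sum) are hypothetical placeholders for the fitted seasonal adjustments; the function names are also hypothetical.

```python
import numpy as np


def to_diff_sum(tmin, tmax):
    """Transform (Tmin, Tmax) to D = Tmax - Tmin (> 0) and S = Tmax + Tmin."""
    tmin, tmax = np.asarray(tmin, dtype=float), np.asarray(tmax, dtype=float)
    return tmax - tmin, tmax + tmin


def from_diff_sum(d, s):
    """Back-transform (D, S) to (Tmin, Tmax)."""
    return (s - d) / 2.0, (s + d) / 2.0


# Hypothetical correction: D is positive (log-Gaussian), so it is scaled by a
# positive ratio; S is Gaussian, so it is shifted by a difference.
d, s = to_diff_sum([2.0, 5.0], [10.0, 9.0])
tmin_c, tmax_c = from_diff_sum(d * 0.8, s + 1.0)
```

Because a positive ratio keeps D positive, the back-transformed values always satisfy Tmin < Tmax, which is the point of the transformation.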
3.2.2 At BINGO Research Sites
For reasons of data availability and for a robust fit, we use monthly means to estimate the model parameters (coefficients of the harmonic functions). In some cases, a vector generalized linear model (VGLM, Yee, 2015) has been used. As an example, Fig. 3.1 shows the monthly mean precipitation for the reference station Vila Nogueira de Azeitao, located at the Research Site Tagus-River (Portugal).
In the modeling step, the seasonal cycles of the model output and of the respective reference data are derived using the approach described above. Figure 3.2 shows an example of the two seasonal cycles for precipitation at the station Vila Nogueira de Azeitao at the Research Site Tagus-River (Portugal).
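The adjustment step depends on the variable considered: for Gaussian variables, the difference (case 1), and for positive variables (e.g. Gamma or log-Gaussian distributed), the quotient (case 2) of the estimated seasonal cycles of the reference data set and of the simulated data set is computed, see Eq. (3.4), where $\mu_{\mathrm{ref}}(t)$ and $\mu_{\mathrm{sim}}(t)$ denote the seasonal cycles estimated for the reference and the simulated data, respectively:
$$ \Delta(t) = \begin{cases} \mu_{\mathrm{ref}}(t) - \mu_{\mathrm{sim}}(t) & \text{(case 1)} \\ \mu_{\mathrm{ref}}(t) / \mu_{\mathrm{sim}}(t) & \text{(case 2)} \end{cases} \tag{3.4}$$
Finally, the adjusted (bias-corrected) values $\tilde{x}(t)$ are calculated by adding $\Delta(t)$ to (case 1) or multiplying it with (case 2) the simulated values $x(t)$, see Eq. (3.5):
$$ \tilde{x}(t) = \begin{cases} x(t) + \Delta(t) & \text{(case 1)} \\ x(t) \cdot \Delta(t) & \text{(case 2)} \end{cases} \tag{3.5}$$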
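The adjustment step amounts to an element-wise operation once both seasonal cycles have been evaluated at the day of year of each simulated value. The sketch below assumes exactly that alignment has already been done; the function name `adjust` is hypothetical.

```python
import numpy as np


def adjust(sim, mu_ref, mu_sim, multiplicative):
    """Apply the seasonal-cycle correction to simulated values.

    sim, mu_ref, mu_sim: arrays aligned in time; mu_ref and mu_sim are the fitted
    seasonal cycles evaluated at the day of year of each simulated value.
    multiplicative: False for Gaussian variables (add the difference, case 1),
    True for positive variables (multiply by the ratio, case 2).
    """
    sim, mu_ref, mu_sim = (np.asarray(a, dtype=float) for a in (sim, mu_ref, mu_sim))
    if multiplicative:
        return sim * (mu_ref / mu_sim)
    return sim + (mu_ref - mu_sim)
```

Since the multiplicative correction maps zero to zero, dry days remain dry, consistent with adjusting only the wet-day precipitation amounts.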
Figures 3.3 and 3.4 suggest that the precipitation data at the station Vila Nogueira de Azeitao at the Research Site Tagus-River (Portugal) were generally corrected towards lower values.
3.3 Cumulative Distribution Function Transform method
We present here a method, namely the CDF-Transform, which can be perceived as an extension of the classical quantile-mapping approach (Panofsky and Brier, 1968). This method was developed by Michelangeli et al. (2009) and has been applied in many climate-related studies (e.g. Colette et al., 2012; Tisseuil et al., 2012; Vigaud et al., 2013; Vrac and Friederichs, 2015). In the following, we first recap the quantile-mapping method (Sect. 3.3.1), followed by a description of the CDF-Transform method (Sect. 3.3.2) and some concrete applications of the approach at the BINGO Research Sites (Sect. 3.3.3).
3.3.1 Quantile-Mapping Method
Let $F_{oh}$ denote the CDF (cumulative distribution function) of a climate random variable (temperature, precipitation, wind, etc.) observed at a given weather station during the historical time period, and $F_{mh}$ the CDF of the same variable from the model for the same time period. The idea of quantile mapping is to correct the distribution function of the modelled climate variable so that it agrees with the observed distribution function, i.e. a modelled value $x$ is replaced by the value $\tilde{x}$ with the same non-exceedance probability under $F_{oh}$:
$$ F_{oh}(\tilde{x}) = F_{mh}(x). \tag{3.6}$$
The corrected value $\tilde{x}$ can thus be obtained empirically (see Figure 3.5) as
$$ \tilde{x} = F_{oh}^{-1}\big(F_{mh}(x)\big), \tag{3.7}$$
where $F_{oh}^{-1}$, defined on $[0,1]$, is the inverse function of $F_{oh}$.
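A minimal empirical version of Eqs. (3.6) and (3.7) follows. It assumes that both CDFs are estimated empirically, a step function for the historical model output and NumPy's default linearly interpolated quantiles for the observations; the function name `quantile_map` is hypothetical.

```python
import numpy as np


def quantile_map(x, obs_hist, mod_hist):
    """Empirical quantile mapping of model values x onto the observed distribution."""
    mod_sorted = np.sort(np.asarray(mod_hist, dtype=float))
    # Empirical CDF of the historical model output evaluated at x
    p = np.searchsorted(mod_sorted, x, side="right") / mod_sorted.size
    # Empirical quantile function (inverse CDF) of the historical observations
    return np.quantile(np.asarray(obs_hist, dtype=float), p)
```

For a model that is uniformly 10 units too warm, a modelled value is mapped back by roughly 10 units, up to the resolution of the empirical CDFs.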
The quantile-mapping method is only suitable when observations are available for the same time period as the model output (Vrac and Friederichs, 2015). However, it is often necessary to correct model output covering a longer time period than the observations. Classical quantile mapping is also not suited to correcting future model predictions, for which observations are obviously not available, as it does not take into account information on the distribution of the future modelled data. The CDF-Transform method has been proposed to overcome this issue.
3.3.2 CDF-Transform Method
The CDF-Transform approach (hereafter "CDF-t") can be perceived as an extension of quantile mapping that directly deals with and provides CDFs (Michelangeli et al., 2009). Let $F_{oh}$ denote the CDF of a climate random variable observed at a given weather station during the historical time period (training period) and $F_{mh}$ the CDF of the same variable from the model output during the same period. $F_{of}$ (unknown) and $F_{mf}$ are the corresponding CDFs for a future (or simply different) time period (see Table 3.2). The main goal of CDF-t is to approximate the CDF of the observations in the future period ($F_{of}$) from historical information and then to apply quantile mapping between $F_{mf}$ and $F_{of}$.
Table 3.2: Notation of the CDFs used in CDF-t.
CDF | Historical period | Future period |
Observation | $F_{oh}$ | $F_{of}$ (unknown) |
Model | $F_{mh}$ | $F_{mf}$ |
Assume that we know $F_{mf}$ (which can be modelled from future model output) and that there exists a transformation $T: [0,1] \to [0,1]$ such that
$$ T\big(F_{mh}(x)\big) = F_{oh}(x). \tag{3.8}$$
The CDF-t method is based on the assumption, made by most statistical bias correction approaches, that this transformation remains valid in the future period:
$$ T\big(F_{mf}(x)\big) = F_{of}(x). \tag{3.9}$$
Under this assumption, we can approximate $F_{of}$ by applying $T$ to $F_{mf}$.
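The first step is to model $T$; a simple way to do so is to replace $x$ by $F_{mh}^{-1}(u)$ in (3.8), where $u$ is any probability in $[0,1]$. We then obtain
$$ T(u) = F_{oh}\big(F_{mh}^{-1}(u)\big), \tag{3.10}$$
corresponding to a simple definition of $T$. Inserting (3.10) into (3.9) leads to a model of $F_{of}$:
$$ F_{of}(x) = F_{oh}\Big(F_{mh}^{-1}\big(F_{mf}(x)\big)\Big). \tag{3.11}$$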
From a technical/algorithmic point of view, the CDF-t approach consists of three steps (Michelangeli et al., 2009):
- The estimates $\hat{F}_{oh}$, $\hat{F}_{mh}$ and $\hat{F}_{mf}$ of $F_{oh}$, $F_{mh}$ and $F_{mf}$ are modelled empirically from the historical observations and from the historical and future model output, respectively.
- Combining them according to Eq. (3.11) yields $\hat{F}_{of}$, an estimate of $F_{of}$. Note that it is also possible to use parametric CDFs.
- Once $\hat{F}_{of}$ and $\hat{F}_{mf}$ are estimated, quantile mapping is applied as in Section 3.3.1, i.e. $\tilde{x} = \hat{F}_{of}^{-1}\big(\hat{F}_{mf}(x)\big)$.
Note that in Eq. (3.11), $\hat{F}_{of}$ is only defined for $x \in [m, M]$, where $m$ and $M$ are the minimum and the maximum of the model output in the historical period, respectively. Outside $[m, M]$, $\hat{F}_{of}$ gives the same constant value. As in Déqué (2007) or Michelangeli et al. (2009), a "constant correction" method is applied whenever $x$ is outside $[m, M]$: e.g. if the maximum value of $x$ in $[m, M]$ is corrected by $\Delta$, all $x$ with $x > M$ are corrected by $\Delta$. Note that the portion of data to which the "constant correction" method is applied is very small.
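The three steps above, including the constant correction, can be sketched with empirical CDFs as follows. This is a minimal NumPy sketch under stated assumptions: all CDFs are purely empirical and evaluated on an evenly spaced grid, the inverse of the estimated future observed CDF is taken as the first grid point where it reaches the required probability, and values outside the range of the historical model output receive the correction of the nearest value inside that range. It is neither the implementation of Michelangeli et al. (2009) nor the one used in DECO, and the function names are hypothetical.

```python
import numpy as np


def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size


def cdft(obs_hist, mod_hist, mod_fut, n_grid=1000):
    """CDF-t correction of future model output (oh/mh: observed/model historical,
    mf: model future; F_of is the estimated CDF of future observations)."""
    obs_hist, mod_hist, mod_fut = (np.asarray(a, dtype=float)
                                   for a in (obs_hist, mod_hist, mod_fut))
    lo = min(obs_hist.min(), mod_hist.min(), mod_fut.min())
    hi = max(obs_hist.max(), mod_hist.max(), mod_fut.max())
    grid = np.linspace(lo, hi, n_grid)

    # F_of(x) = F_oh(F_mh^{-1}(F_mf(x))), evaluated on the grid
    f_of = ecdf(obs_hist, np.quantile(mod_hist, ecdf(mod_fut, grid)))

    # Quantile mapping between F_mf and F_of: x_corr = F_of^{-1}(F_mf(x)),
    # with the inverse taken as the first grid point where F_of >= p
    p = ecdf(mod_fut, mod_fut)
    idx = np.minimum(np.searchsorted(f_of, p, side="left"), n_grid - 1)
    corr = grid[idx] - mod_fut

    # "Constant correction" outside [m, M], the range of the historical model output
    m, M = mod_hist.min(), mod_hist.max()
    inside = (mod_fut >= m) & (mod_fut <= M)
    if inside.any():
        x_in, c_in = mod_fut[inside], corr[inside]
        corr[mod_fut > M] = c_in[np.argmax(x_in)]
        corr[mod_fut < m] = c_in[np.argmin(x_in)]
    return mod_fut + corr
```

For a model with a constant bias of +2 and a future shift of +1, the corrected future values come out close to the future model values minus 2, i.e. the observations shifted by +1, apart from the lowest few values where the empirical CDFs are too coarse.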
3.3.3 Application of CDF-t at BINGO research sites
WATCH forcing data ERA-Interim (WFDEI) are used as reference data for all Research Sites. The WFDEI variables available for bias correction are: temperature, total precipitation, surface air pressure, near-surface wind speed, long-/shortwave incident radiation and near-surface specific humidity. To account for seasonality, CDF-t is applied separately for each calendar month. The calibration period is 1980-2013.
Special attention is given to precipitation, since its distribution is continuous for strictly
positive values but has a point mass at zero (dry days). CDF-t is therefore applied only to
daily precipitation amounts above a fixed threshold. For WFDEI the threshold is 0.1 mm; for
the model, the threshold is adjusted so that the frequency of wet days matches that of
WFDEI.
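One way to adjust the model threshold is to take the model value that leaves the same fraction of days strictly above it as in WFDEI. This quantile-based choice and the function name are assumptions of this sketch, not necessarily how the threshold was computed for BINGO.

```python
from typing import Sequence


def model_wet_threshold(
    ref_precip: Sequence[float],
    model_precip: Sequence[float],
    ref_threshold: float = 0.1,
) -> float:
    """Model threshold giving the same wet-day frequency as the reference.

    A day counts as wet if its amount (mm/day) is strictly above the
    threshold. Ties in the model series are ignored in this sketch.
    """
    wet_freq = sum(p > ref_threshold for p in ref_precip) / len(ref_precip)
    data = sorted(model_precip)
    n = len(data)
    n_wet = round(wet_freq * n)  # number of model days that should be wet
    if n_wet >= n:
        return float("-inf")  # every model day counts as wet
    # absent ties, exactly the n_wet largest values lie above data[n - n_wet - 1]
    return data[n - n_wet - 1]
```

In a drizzle-prone toy model series where 9 of 10 days exceed 0.1 mm but only 5 of 10 reference days do, the threshold rises to the sixth-largest model value, so that again 5 model days count as wet.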
Figures 3.7 and 3.8 show results for precipitation at the grid point located at longitude 7.46°
and latitude 51.11°. As expected, Figure 3.7 shows good agreement between the quantiles of the
reference data and those of the model after bias correction. Although CDF-t operates on
cumulative distribution functions, it also corrects the mean and the variance: Figure 3.8
compares the monthly mean (left) and standard deviation (right) of daily precipitation, and the
reference and the model after CDF-t bias correction agree well.
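The comparison in Figure 3.8 reduces to per-month statistics of daily values. A minimal sketch, assuming (date, value) pairs as input and the sample standard deviation (the text does not state which estimator was used); the function names are hypothetical.

```python
import datetime as dt
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Stats = Dict[int, Tuple[float, float]]  # month -> (mean, standard deviation)


def monthly_mean_std(series: Sequence[Tuple[dt.date, float]]) -> Stats:
    """Mean and sample standard deviation of daily values per calendar month."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for day, value in series:
        by_month[day.month].append(value)
    # statistics.stdev needs at least two values per month
    return {
        month: (statistics.mean(vals), statistics.stdev(vals))
        for month, vals in sorted(by_month.items())
    }


def max_abs_difference(a: Stats, b: Stats) -> Tuple[float, float]:
    """Largest absolute difference in monthly mean and in standard deviation."""
    months = a.keys() & b.keys()
    return (
        max(abs(a[m][0] - b[m][0]) for m in months),
        max(abs(a[m][1] - b[m][1]) for m in months),
    )
```

Calling `max_abs_difference(monthly_mean_std(reference), monthly_mean_std(corrected_model))` then summarizes in two numbers how closely the corrected model reproduces the reference climatology.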



Chapter 4
DECO – A BINGO plug-in for FreVa
4.1 FreVa – Freie Universität Berlin evaluation system
FreVa is the Freie Universität Berlin Evaluation Framework for Earth System Science. This fully operational hybrid system offers both HPC shell access and a user-friendly web interface. It combines a variety of verification tools and validation data from different projects inside and outside FUB in one common system. The evaluation system is installed at FUB, at the German Weather Service (DWD) and at the German Climate Computing Centre (DKRZ); the DKRZ installation in particular has direct access to the bulk of the DKRZ ESGF node, which holds millions of climate model data sets, e.g. from CMIP5 and CORDEX. The database is organized according to the international CMOR standard, using the meta information of the self-describing model, reanalysis and observational data sets. Apache Solr is used to index the different data projects into one common search environment. This metadata system, with its advanced but easy-to-use search tool, helps users, developers and their tools retrieve the required information.

A generic application programming interface (API) allows scientific developers to connect their analysis tools to the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface without any need to understand the different scripting languages. Making tools and climate data easier to provide and use increases the number of scientists working with the data sets and hence the chance of identifying discrepancies. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a MySQL database. Configurations and results of the tools can be shared among scientists via the shell or the web system, so plugged-in tools automatically gain transparency and reproducibility. Furthermore, when a tool is started with a configuration that matches an earlier run, the system suggests reusing the results already produced by other users, saving CPU time, I/O and disk space.
Website: freva.met.fu-berlin.de (visitor login: click on "Guest"). A detailed description is currently being prepared for Geoscientific Model Development (Kadow et al., in preparation).
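The reuse of results for matching configurations can be pictured as a lookup keyed by a canonical form of the configuration. The sketch below is hypothetical: it uses the standard-library `sqlite3` module instead of MySQL, and the hashing scheme, table layout and names are not FreVa's.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, Optional


def config_key(tool: str, config: Dict[str, Any]) -> str:
    """Hash of a canonical JSON form, so the order of options does not matter."""
    canonical = json.dumps(
        {"tool": tool, "config": config}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class History:
    """Toy history store: one row per analysis, keyed by its configuration."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(config_hash TEXT PRIMARY KEY, tool TEXT, config TEXT, result_path TEXT)"
        )

    def record(self, tool: str, config: Dict[str, Any], result_path: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO history VALUES (?, ?, ?, ?)",
            (config_key(tool, config), tool, json.dumps(config, sort_keys=True), result_path),
        )
        self.db.commit()

    def lookup(self, tool: str, config: Dict[str, Any]) -> Optional[str]:
        """Return the stored result of a matching earlier run, if any."""
        row = self.db.execute(
            "SELECT result_path FROM history WHERE config_hash = ?",
            (config_key(tool, config),),
        ).fetchone()
        return row[0] if row else None
```

Before starting a tool, a front end would call `lookup` and, on a hit, offer the stored result instead of recomputing it.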
4.2 Documentation of DECO
4.2.1 Introduction
The BINGO-DECO FreVa plug-in produces meteorological and climatological input data for hydrological models within the BINGO project. For a list of models, see Tab. 4.1 in Sect. 4.2.2. The plug-in is part of a general evaluation system (Sect. 4.1); a sketch of this system with the BINGO-DECO plug-in highlighted is given in Fig. 4.1.
FreVa is a framework hosting several plug-ins (applications) which all have access to a common data pool. Data in this pool are standardized, so that plug-ins can immediately access new data added to the pool. The climate data produced for BINGO will be part of this pool. To use the framework, users must register via the FreVa web page https://freva.met.fu-berlin.de/; access is granted by the FreVa team. The team currently holds a list of BINGO hydrological modelers provided by WP3.
Regardless of the meteorological data set chosen for processing, the plug-in prepares a downloadable standard output specific to the selected hydrological model and Research Site (Tab. 4.3). Besides preparing the original meteorological data for the hydrological models, the plug-in can optionally bias-correct the data beforehand, using one of several methods (see Chap. 3).
This section is structured as follows: Section 4.2.2 describes the preprocessing, and Sections 4.2.3 and 4.2.4 give an overview of the input and the output of the BINGO-DECO plug-in, respectively.
4.2.2 Preprocessing
The preprocessing consists of a spatial selection of the region or stations, a temporal selection of dates, and a conversion of variables and their units. To avoid unnecessary grid remapping, the first version of the plug-in applies the spatial selection on the native grid of the chosen meteorological input data. For station data, a nearest-neighbor remapping (see cdo -remapnn in the CDO User’s Guide (Schulzweida, 2015)) is applied to obtain the grid cell nearest to a given longitude/latitude location. For a region, a box is selected (cdo -sellonlatbox) according to a defined longitude/latitude rectangle. The rectangle is defined based on the replies to the questionnaires on the hydrological models, sent around by BINGO WP3; the results of these questionnaires are given in Tab. 4.1. Note that, because the native grid is used, the boundary of the selected grid cells is larger than the defined rectangle and does not match it exactly. Nevertheless, all centers of the selected grid cells lie inside the rectangle defined by the users. With finer grids in later project stages, this effect becomes increasingly negligible.
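In DECO these selections are done with CDO. The pure-Python sketch below only mirrors the selection logic described above on a curvilinear grid given by 2-D arrays of cell-center coordinates; it is not CDO's algorithm, and the use of great-circle (haversine) distance and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # 2-D array of cell-center coordinates (degrees)


def great_circle_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Haversine distance on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_cell(lons: Grid, lats: Grid, lon: float, lat: float) -> Tuple[int, int]:
    """Index (i, j) of the cell center closest to a station location."""
    candidates = (
        (great_circle_km(lon, lat, lons[i][j], lats[i][j]), i, j)
        for i in range(len(lons))
        for j in range(len(lons[i]))
    )
    _, i, j = min(candidates)
    return i, j


def cells_in_box(
    lons: Grid, lats: Grid, lon1: float, lon2: float, lat1: float, lat2: float
) -> List[Tuple[int, int]]:
    """Indices of all cells whose centers lie inside the rectangle."""
    return [
        (i, j)
        for i in range(len(lons))
        for j in range(len(lons[i]))
        if lon1 <= lons[i][j] <= lon2 and lat1 <= lats[i][j] <= lat2
    ]
```

On a toy grid with centers at 7.0°, 7.5° and 8.0° E and 51.0° and 51.5° N, a station at 7.46° E, 51.11° N maps to the center at 7.5° E, 51.0° N, and the box 7.2-8.1° E, 50.9-51.2° N keeps the centers at 7.5° and 8.0° E. If these cells are 0.5° wide, the eastern one reaches 8.25° E, beyond the box edge, which is the boundary effect noted above.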
4.2.3 Input parameters
The first option of the plug-in (Research site and hydrological model) lets the user choose among the combinations of Research Sites and associated hydrological models. The option Date range specifies the time period to be processed. The parameter Bias correction optionally applies a bias correction scheme (Chap. 3) to the simulated data; choose None to skip the bias correction and output the native, uncorrected data instead. The bias correction uses station data as reference if these data were made accessible to us, and the WATCH Forcing Data ERA-Interim (WFDEI) (Weedon et al., 2014) otherwise.
The meteorological input data for a hydrological model are uniquely addressed by seven parameters. These parameters follow from the standardized storage of climate model data according to the CMOR convention, which is also used in phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al., 2012). For deliverable D2.1, they are set by default. For completeness, these options are Dataproject, Dataproduct, Institute and Model, which identify the source of the input data, followed by the Experiment, Ensemble member and Time frequency of the input data. Many of these parameters will only become relevant at a later stage of BINGO.
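Because the seven parameters jointly identify one data set, they can be thought of as an immutable record. The class, its field names and the slash-separated key below are hypothetical illustrations, not FreVa's directory layout, and the values used in the example are placeholders.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InputSelection:
    """The seven CMOR-style parameters that address one input data set."""

    project: str
    product: str
    institute: str
    model: str
    experiment: str
    ensemble: str
    time_frequency: str

    def key(self) -> str:
        # Field order is fixed by the dataclass definition above.
        return "/".join(asdict(self).values())
```

For example, `InputSelection("bingo", "output", "institute-x", "model-y", "experiment-z", "r1i1p1", "day").key()` returns `"bingo/output/institute-x/model-y/experiment-z/r1i1p1/day"`. Since the class is frozen, instances are hashable and can serve as dictionary keys, e.g. to cache processed data per selection.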
The following parameters set various technical options. A specific output directory (Outputdir) and cache directory (Cachedir) can be defined, as well as the Output type: with Basic, only one compressed zip file containing the data files formatted for the hydrological model is produced, while Additional also produces a NetCDF file containing all variables, time steps and locations. The subsequent option Cacheclear determines whether the cache directory is cleared, i.e. whether the intermediate data used for processing are deleted or kept, e.g. if one wants to retain these data. Ntask sets the number of parallel processing tasks, to optimize the use of computing resources. Setting Dryrun to True only lists the selected meteorological input data without processing them. Finally, a caption can be associated with the results of this run. The last option, Unique output id, prevents an existing result from being overwritten. A full list of all available parameters and their descriptions is given in Tab. 4.2.
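A hypothetical container for these options, with the checks their description implies, might look as follows. Attribute names mirror the option names above, but the defaults, the validation and the way a unique output id becomes a directory are assumptions, not the plug-in's actual behavior.

```python
import posixpath
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecoOptions:
    """Technical options of one run; the defaults are illustrative only."""

    outputdir: str
    cachedir: str
    output_type: str = "Basic"  # "Basic" (ZIP only) or "Additional" (ZIP + NetCDF)
    cacheclear: bool = True  # delete intermediate data in cachedir afterwards
    ntask: int = 1  # number of parallel processing tasks
    dryrun: bool = False  # only list the selected input data
    caption: Optional[str] = None
    unique_output_id: bool = True

    def __post_init__(self) -> None:
        if self.output_type not in ("Basic", "Additional"):
            raise ValueError(f"unknown output type: {self.output_type!r}")
        if self.ntask < 1:
            raise ValueError("ntask must be at least 1")

    def products(self) -> List[str]:
        """Output sub-directories that a run would fill."""
        if self.dryrun:
            return []
        return ["ZIP"] if self.output_type == "Basic" else ["ZIP", "NETCDF"]

    def target_dir(self, run_id: int) -> str:
        # Hypothetical: the run id becomes a sub-directory, so a new run
        # cannot overwrite an earlier result.
        if self.unique_output_id:
            return posixpath.join(self.outputdir, str(run_id))
        return self.outputdir
```

Validating in `__post_init__` rejects an inconsistent configuration before any data are read, which matters when a run is expensive.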
4.2.4 Output
The resulting data files are stored in the chosen Outputdir and can be accessed either via web download from the FreVa system or by secure copy (scp) or secure shell (ssh). The downloadable compressed zip file, which contains the data files formatted as specified for the hydrological model, is stored in the Outputdir/ZIP directory. The NetCDF files produced additionally (if Output type is set to Additional) are located in the Outputdir/NETCDF directory. Note that, depending on the chosen Research Site and hydrological model, the files in the two directories may be the same. Some information about the output is given in Tab. 4.3. The specific output content and format are based on the questionnaire sent out by WP3. Where no feedback was received, a standard output (NetCDF file) is provided.
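The ZIP packaging step can be sketched with Python's `zipfile` module. The archive name, the file names and the text content are hypothetical; the actual file formats follow the WP3 questionnaire.

```python
import os
import zipfile
from typing import Dict


def package_output(outputdir: str, files: Dict[str, str], archive: str = "deco_output.zip") -> str:
    """Write model-formatted files into one archive under Outputdir/ZIP."""
    zip_dir = os.path.join(outputdir, "ZIP")
    os.makedirs(zip_dir, exist_ok=True)
    path = os.path.join(zip_dir, archive)
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)  # one archive entry per formatted data file
    return path
```

Bundling all files into a single compressed archive keeps the web download to one file per run, regardless of how many stations or variables the hydrological model needs.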
Chapter 5
Summary
This document describes the development of a web-based application for the extraction and conversion of climate model simulations. Climate model data are extracted from a central data pool, post-processed (bias-corrected, regridded) and converted into a set of meteorological driving data directly usable by hydrological models. The application has been realized as a plug-in to the Freie Universität Berlin Evaluation Framework for Earth System Science (FreVa), which can host various evaluation workflows with access to an indexed data pool. Workflows can be accessed via a web platform or the command line interface. The BINGO-DECO plug-in has been designed specifically for the hydrological models at the six BINGO Research Sites but can in principle be extended to other models. A major advantage is the on-demand post-processing and conversion of the driving data from a standardized climate model data source: instead of storing the very same meteorological driving information in different formats, the system holds the conversion routines and generates the data on demand. This is storage-efficient and ensures reproducible and transparent results.
Bibliography
Nikolina Ban, Juerg Schmidli, and Christoph Schär. Heavy precipitation in a changing climate: Does short-term summer precipitation increase faster? Geophysical Research Letters, 42(4):1165–1172, 2015.
Erwan Brisson, Matthias Demuzere, and Nicole PM van Lipzig. Modelling strategies for performing convection-permitting climate simulations. Meteorologische Zeitschrift, 2015.
Augustin Colette, Robert Vautard, and M Vrac. Regional climate downscaling with prior statistical correction of the global climate forcing. Geophysical Research Letters, 39(13), 2012.
D. P. Dee et al. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. R. Meteor. Soc., 137(656):553–597, 2011. ISSN 1477-870X. doi: 10.1002/qj.828.
Michel Déqué. Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: model results and statistical correction according to observed values. Global and Planetary Change, 57(1):16–26, 2007.
L. Gudmundsson, L. M. Tallaksen, K. Stahl, and A. K. Fleig. Low-frequency variability of European runoff. Hydrol. Earth System Sci., 15(9):2853–2869, 2011.
U. Heikkilä, A. Sandvik, and A. Sorteberg. Dynamical downscaling of ERA-40 in complex terrain using the WRF regional climate model. Clim. Dyn., 37(7-8):1551–1564, 2011.
C. Kadow, S. Illing, O. Kunst, T. Schartner, I. Kirchner, H.W. Rust, U. Cubasch, and U. Ulbrich. Freva - Freie Univ Evaluation Framework for scientific infrastructures in earth system modeling. Geoscientif. Model Develop., in preparation.
Elizabeth J Kendon, Nigel M Roberts, Hayley J Fowler, Malcolm J Roberts, Steven C Chan, and Catherine A Senior. Heavier summer downpours with climate change revealed by weather forecast resolution model. Nature Climate Change, 4(7): 570–576, 2014.
H. Koch, S. Liersch, and F. Hattermann. Integrating water resources management in eco-hydrological modelling. Integrative Water Resource Management in a Changing World: Lessons Learnt and Innovative Perspectives, page 13, 2013.
Jochem Marotzke, Wolfgang A Müller, Freja SE Vamborg, Paul Becker, Ulrich Cubasch, Hendrik Feldmann, Frank Kaspar, Christoph Kottmeier, Camille Marini, Iuliia Polkova, et al. MiKlip - a national research project on decadal climate prediction. Bulletin of the American Meteorological Society, 2016.
P. McCullagh and J. Nelder. Generalized Linear Models. CRC Press, Boca Raton, Fla, 2 edition, 1989.
Edmund P Meredith, Vladimir A Semenov, Douglas Maraun, Wonsun Park, and Alexander V Chernokulsky. Crucial role of Black Sea warming in amplifying the 2012 Krymsk precipitation extreme. Nature Geoscience, 2015.
P-A Michelangeli, Matthieu Vrac, and H Loukos. Probabilistic downscaling approaches: Application to wind cumulative distribution functions. Geophysical Research Letters, 36(11), 2009.
Wolfgang A Mueller, Johanna Baehr, Helmuth Haak, Johann H Jungclaus, Jürgen Kröger, Daniela Matei, Dirk Notz, Holger Pohlmann, J-S von Storch, and Jochem Marotzke. Forecast skill of multi-year seasonal means in the decadal prediction system of the Max Planck Institute for Meteorology. Geophysical Research Letters, 39(22), 2012.
Hans A Panofsky and Glenn Wilson Brier. Some applications of statistics to meteorology. University Park: Penn. State University, College of Earth and Mineral Sciences, 1968.
Holger Pohlmann, Wolfgang A Mueller, K Kulkarni, M Kameswarrao, Daniela Matei, FSE Vamborg, C Kadow, S Illing, and Jochem Marotzke. Improved forecast skill in the tropics in the new MiKlip decadal climate predictions. Geophysical Research Letters, 40(21):5798–5802, 2013.
Andreas F Prein, Wolfgang Langhans, Giorgia Fosser, Andrew Ferrone, Nikolina Ban, Klaus Goergen, Michael Keller, Merja Tölle, Oliver Gutjahr, Frauke Feser, et al. A review on regional convection-permitting climate modeling: Demonstrations, prospects, and challenges. Reviews of Geophysics, 53(2):323–361, 2015.
M. B. Priestley. Spectral Analysis and Time Series. Academic Press, London, 1992.
C. Prudhomme, I. Giuntoli, E. L. Robinson, D. B. Clark, N. W. Arnell, R. Dankers, B. M. Fekete, W. Franssen, D. Gerten, S. N. Gosling, et al. Hydrological droughts in the 21st century, hotspots and uncertainties from a global multimodel ensemble experiment. Proc. Nat. Acad. Sci., 111(9):3262–3267, 2014.
B. Rockel, A. Will, and A. Hense. The regional climate model COSMO-CLM (CCLM). Meteorol. Z., 17(4):347–348, 2008.
H. W. Rust, T. Kruschke, A. Dobler, M. Fischer, and U. Ulbrich. Discontinuous daily temperatures in the WATCH forcing datasets. J. Hydrometeor., 16(1):465–472, 2015.
U. Schulzweida. Climate Data Operators: CDO User’s Guide. MPI for Meteorology, October 2015. URL https://code.zmaw.de/projects/cdo/embedded/cdo.pdf. Version 1.7.0.
Doug M Smith, Stephen Cusack, Andrew W Colman, Chris K Folland, Glen R Harris, and James M Murphy. Improved surface temperature prediction for the coming decade from a global climate model. Science, 317(5839):796–799, 2007.
K. E. Taylor, R. J. Stouffer, and G. A. Meehl. An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93(4):485–498, 2012.
Clément Tisseuil, M Vrac, G Grenouillet, AJ Wade, M Gevrey, Thierry Oberdorff, J-B Grodwohl, and S Lek. Strengthening the link between climate, hydrological and species distribution modeling to assess the impacts of climate change on freshwater biodiversity. Science of the Total Environment, 424:193–201, 2012.
C. Torma, F. Giorgi, and E. Coppola. Added value of regional climate modeling over areas characterized by complex terrain: precipitation over the Alps. J. Geophys. Res.: Atmospheres, 120(9):3957–3972, 2015. 2014JD022781.
N Vigaud, M Vrac, and Y Caballero. Probabilistic downscaling of GCM scenarios over southern India. International Journal of Climatology, 33(5):1248–1263, 2013.
Mathieu Vrac and Petra Friederichs. Multivariate-intervariable, spatial, and temporal-bias correction. Journal of Climate, 28(1):218–237, 2015.
G. P. Weedon, G. Balsamo, N. Bellouin, S. Gomes, M. J. Best, and P. Viterbo. The WFDEI meteorological forcing data set: WATCH Forcing Data methodology applied to ERA-Interim reanalysis data. Water Resour. Res., 50(9):7505–7514, 2014.
T. W. Yee. Vector generalized linear and additive models. Springer, 2015.