RelDiag — Attributes Diagram

Igor Kröner*, Henning W. Rust†, Tim Kruschke, Madlen Fischer, Uwe Ulbrich

Institut für Meteorologie, Freie Universität Berlin

Andreas Dobler |

Potsdam-Institut für Klimafolgenforschung |

Version from April 9, 2015

### 1 Introduction

A reliability diagram condenses the evaluation of probabilistic predictions of a dichotomous event into a single plot. It shows the full joint distribution of the probability forecasts together with the relative frequency of observation of the binary predictand (e.g. dry vs. rain, or below vs. above a threshold). Compared to scalar quantities such as the Brier Skill Score (BSS), the reliability diagram allows the diagnosis of particular strengths and weaknesses of a verification data set.

$$BS=\frac{1}{n}\sum_{k=1}^{n}\left(y_{k}-o_{k}\right)^{2}$$ | (1) |

The most common scalar accuracy measure used in the verification of probabilistic forecasts is the Brier Score (Equation 1). The BS is 0 for perfect forecasts and approaches 1 for poor forecasts. To make a statement about reliability, the BS can be decomposed into the three terms shown in Equation 2
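As a minimal illustration of Equation 1, the following Python sketch computes the Brier score for a handful of invented forecast/observation pairs (the plugin itself builds on R; Python is used here only for illustration):

```python
import numpy as np

# Invented forecast probabilities y_k and binary observations o_k
y = np.array([0.9, 0.7, 0.2, 0.1, 0.8, 0.3])
o = np.array([1,   1,   0,   0,   0,   1])

# Brier score (Equation 1): mean squared difference between
# forecast probability and binary outcome
bs = np.mean((y - o) ** 2)
```

Note that a single badly miscalibrated forecast (here the 0.8 issued before a non-event) dominates the score, since the differences enter quadratically.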

$$BS=\frac{1}{n}\sum_{i=1}^{I}N_{i}\left(y_{i}-\bar{o}_{i}\right)^{2}-\frac{1}{n}\sum_{i=1}^{I}N_{i}\left(\bar{o}_{i}-\bar{o}\right)^{2}+\bar{o}\left(1-\bar{o}\right)$$ | (2) |

with $n$ events, $I$ unique forecasts (or bins), ${\bar{o}}_{i}=p\left(o\mid y_{i}\right)$ the observed frequency conditional on the forecast probability, and $\bar{o}=\frac{1}{n}\sum_{k=1}^{n}o_{k}$ the climatological occurrence rate of the event.

The first term of Equation 2 is the “reliability” of the forecast. It is calculated as a weighted average of the squared differences between the forecast probabilities ${y}_{i}$ and the relative frequencies of the observed event ${\bar{o}}_{i}$. It can be interpreted as a measure of how close the forecast probabilities are to the corresponding observed frequencies; since it quantifies the violation of reliability, lower values are better. A low reliability term is only of value, however, if the second term, the “resolution”, is high. This term measures how much the observed frequencies, conditional on the forecast probabilities, differ from the climatological occurrence rate of the event. The third term, the “uncertainty”, measures the inherent uncertainty of the event. It depends only on the observed climatology and is therefore maximal when the event occurs 50% of the time and zero if the event never or always occurs.
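The decomposition can be reproduced in a few lines. The following sketch (Python, invented data; forecasts are grouped by their unique values, i.e. no binning) computes the three terms of Equation 2 and shows that they recombine exactly into the Brier score of Equation 1:

```python
import numpy as np

# Invented forecasts and binary observations
y = np.array([0.2, 0.2, 0.5, 0.5, 0.8, 0.8, 0.8, 0.2])
o = np.array([0,   0,   1,   0,   1,   1,   0,   1])
n = len(y)
obar = o.mean()                       # climatological occurrence rate

rel = res = 0.0
for yi in np.unique(y):               # each unique forecast value is one "bin"
    mask = y == yi
    Ni = mask.sum()                   # number of forecasts in this bin
    oi = o[mask].mean()               # conditional observed frequency
    rel += Ni * (yi - oi) ** 2        # reliability contribution
    res += Ni * (oi - obar) ** 2      # resolution contribution
rel /= n
res /= n
unc = obar * (1 - obar)               # uncertainty term

bs = np.mean((y - o) ** 2)            # Equation 1, for comparison
# Equation 2: BS = reliability - resolution + uncertainty (exact here,
# because every forecast falls exactly on its bin value)
```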

### 2 The diagram

RelDiag takes advantage of the R package “verification”, developed at NCAR, to plot attributes diagrams. Attributes diagrams (?) are reliability diagrams extended with useful information such as the no-resolution and no-skill lines. In the following, each element of the diagram is briefly described and discussed.


1:1 line The diagonal line, or 1:1 line, indicates perfect reliability, as the observed frequencies of the event equal the forecast probabilities.

”no skill” line The no-skill line is defined through the Brier Skill Score (BSS) with the climatological prediction as reference. The single point of the climatological prediction is located at the intersection of the 1:1 line, the no-resolution line and the no-skill line. Since the forecast and the observed relative frequency are both equal to the climatological probability, the climatological forecast has perfect reliability but zero resolution (?). Thus the Brier score (Equation 2) of the climatological forecast equals the uncertainty. As the BSS is defined as $BSS=1-BS/BS_{ref}$, the BSS with the climatological forecast as reference results in

$$BSS=\frac{\text{Resolution}-\text{Reliability}}{\text{Uncertainty}}.$$ | (3) |
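A small numerical check of Equation 3 (Python sketch with invented data): the BSS computed directly as $1-BS/BS_{ref}$, with the constant climatological forecast as reference, matches the decomposition-based form, because $BS_{ref}$ equals the uncertainty term:

```python
import numpy as np

# Invented forecasts and binary observations
y = np.array([0.2, 0.2, 0.5, 0.5, 0.8, 0.8, 0.8, 0.2])
o = np.array([0,   0,   1,   0,   1,   1,   0,   1])
obar = o.mean()                        # climatological occurrence rate

bs      = np.mean((y - o) ** 2)        # Brier score of the forecast
bs_clim = np.mean((obar - o) ** 2)     # constant climatological forecast
bss     = 1 - bs / bs_clim             # equals (Resolution - Reliability)/Uncertainty
```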

grey shaded area Points in the grey shaded area, bounded by the no-skill line and the horizontal climatology line, contribute positively to the Brier Skill Score with a climatological forecast as reference. To enlarge this area, a bias correction suggested by ? has been applied. The plotted hyperbola is estimated from the bias-corrected decomposition of the BS, i.e. from the bias-corrected reliability and resolution terms. The hyperbola function is

$${o}_{k}=\frac{{y}_{k}^{2}-\alpha}{2{y}_{k}-\beta}$$ | (4) |

refinement diagram The refinement diagram in the bottom right corner delivers important information about the distribution of predictions, expressing how frequently each prediction is issued. From the refinement, a statement about the sharpness of the forecast can be made.
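The refinement distribution is simply the relative frequency of each issued forecast value, as in this Python sketch with invented forecasts:

```python
import numpy as np

# Invented forecast probabilities
y = np.array([0.2, 0.2, 0.5, 0.5, 0.8, 0.8, 0.8, 0.2])

# Refinement: how often each forecast value was issued
probs, counts = np.unique(y, return_counts=True)
freq = counts / counts.sum()   # relative frequency per forecast value
```

A sharp forecast system issues mostly probabilities near 0 and 1; a system that mostly issues the climatological probability has little sharpness even if it is perfectly reliable.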

uncertainty Even though the uncertainty is not explicitly plotted, it can be determined directly from the attributes diagram. Its magnitude is given by the area of the rectangle in the top left, as well as in the bottom right, enclosed by the horizontal (no-resolution) line and the vertical climatology line.

the slope The blue line in Figure 1 represents a weighted linear regression. Its slope can be used as a key indicator of the usefulness of the probabilistic prediction, combining reliability and resolution.
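As an illustration of such a fit (a Python sketch; the exact weighting used by RelDiag may differ, here the bin populations $N_i$ are assumed as weights), the slope of observed frequency against forecast probability can be computed directly:

```python
import numpy as np

# Invented bin summary: forecast probabilities, conditional observed
# frequencies, and number of forecasts per bin
yi = np.array([0.2, 0.5, 0.8])
oi = np.array([1/3, 0.5, 2/3])
Ni = np.array([3,   2,   3])

# Weighted least-squares slope of oi on yi, weights proportional to Ni
w = Ni / Ni.sum()
ybar = np.sum(w * yi)
obar = np.sum(w * oi)
slope = np.sum(w * (yi - ybar) * (oi - obar)) / np.sum(w * (yi - ybar) ** 2)
```

A slope of 1 corresponds to the 1:1 line (perfect reliability); a slope near 0 means the observed frequency hardly depends on the forecast, i.e. no resolution.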

### 3 RelDiag parameters

The RelDiag plug-in offers the possibility to plot an attributes diagram, an extended reliability diagram which compares the forecast probability to the observed relative frequency.

There are two possibilities to run the tool: first, with your own prepared prediction and observation data; second, with the “leadtimeselector” extension.

#### 3.1 data

RelDiag works with the user’s own data as well as with data retrieved from the MiKlip system using the “leadtimeselector”. To use the leadtimeselector, it needs to be set to “TRUE”.

- LEADTIMESELECTOR (mandatory, True or False)

##### 3.1.1 own data

- mandatory if leadtimeselector=False
- FILEHEAD
- REFHEAD

Here LEADTIMESELECTOR needs to be set to ’False’. As input, one NetCDF file per ensemble member is needed, declared with a prefix (FILEHEAD) and containing the variable as a time series for the lead time and time period of interest. The same holds for the reference data (REFHEAD).

##### 3.1.2 leadtimeselector

See documentation “leadtimeselector”

- mandatory if leadtimeselector=True
- VARIABLE
- MODEL
- PROJECT
- INSTITUTE
- PRODUCT
- ENSEMBLES
- LEADTIMES
- TIME FREQUENCY
- DECADALS
- OBSERVATION

If LEADTIMESELECTOR is set to ’True’, a connection to the same-named CES plug-in is established. Access to the MiKlip database is then enabled by setting the parameters VARIABLE, MODEL, PROJECT, INSTITUTE, PRODUCT, ENSEMBLES, LEADTIMES, TIME FREQUENCY, DECADALS and OBSERVATION.

To guarantee the stability of RelDiag, this is not a direct link but a checked-out version of the leadtimeselector tool. This does not influence the operation of RelDiag, but it does affect its further development: with ongoing progress of the leadtimeselector, further capabilities in RelDiag are expected.

#### 3.2 lonlatbox

- LONLATBOX (mandatory, -180,180,-90,90 (default)) For each ensemble member and the reference, a field mean is calculated over the given region.

#### 3.3 threshold

- THRESHOLD (mandatory)

The threshold is calculated with a CDO-based [?] syntax, separately for model and reference. A CDO command for a percentile threshold could be `runpctl,50,TSTEPS IFILE` for the median, where TSTEPS indicates the total number of time steps in the file used for the calculation. If the string IFILE is used, the model threshold is estimated with respect to all ensemble members of the prediction. Setting the threshold to a constant value is done by `const,VALUE,IFILE`, with VALUE a user-defined value.
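The logic of the percentile-based threshold can be sketched in Python (the plugin itself delegates this step to CDO; the time series below is invented):

```python
import numpy as np

# Invented time series of the variable of interest
series = np.array([1.0, 3.0, 0.5, 4.0, 2.0, 6.0, 5.0, 2.5])

# Percentile-based threshold, here the median (analogous to runpctl,50)
threshold = np.percentile(series, 50)

# Dichotomous event via the "greater than" comparison
event = series > threshold
```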

The dichotomous event is defined by a “greater than” comparison with this threshold.

#### 3.4 binning

- BINS (mandatory, integer, default $=0$)

Should probabilities be binned or treated as unique predictions? The larger the ensemble size, the more important binning becomes. No binning is done if this is set to 0 (default).
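Binning maps the raw ensemble-derived probabilities onto a small number of equally wide probability bins. The following Python sketch (invented probabilities; the plugin’s exact bin placement may differ) replaces each forecast with the centre of its bin for BINS = 5:

```python
import numpy as np

# Invented ensemble-derived forecast probabilities
y = np.array([0.03, 0.12, 0.48, 0.52, 0.97, 0.61])

bins = 5
edges = np.linspace(0, 1, bins + 1)            # bin boundaries on [0, 1]
idx = np.clip(np.digitize(y, edges) - 1, 0, bins - 1)
centres = (edges[:-1] + edges[1:]) / 2         # one representative value per bin
y_binned = centres[idx]                        # each forecast mapped to its bin centre
```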

#### 3.5 refinement

- REFINEMENT (mandatory; histogram, pointsize, both (default) or numbers)

This option defines how the refinement distribution is displayed graphically: as a histogram, via the point size, both, or as numbers.

### 4 Workflow

### 5 Things to consider

- Up to now only three-dimensional fields (lon x lat x time) are supported. Some kind of level selector will be integrated soon.
- There is no explicit bias correction included! The bias is removed implicitly via separately calculated thresholds for model and reference. For thresholds derived from averaging (timmean, etc.), normally distributed data is needed to remove the bias completely. For this reason it is advised to use percentile-based thresholds.
- Model drifts are neglected at the moment.
- At the moment there are two further development stages: one which takes spatial pooling into account to increase the sample size, and one which aims to analyse the reliability on a grid based on the slope of reliability in the attributes diagram (?).