Visualise Mortality Improvements
Click here to download the R code (an R Markdown notebook with text and code, to be opened in RStudio), and here to download the corresponding rendered HTML document with formatted text and all results from the code (including the interactive figures). Note that you will have to download the file (use the download button in the top right corner) and open it in a browser (Chrome, Firefox, Edge, Internet Explorer, ...), as the Google Drive preview will show the file as a text document. Be aware that your firewall may block the download of HTML/Rmd files.
For more information on R Markdown, see https://rmarkdown.rstudio.com/lesson-1.html.
1. Problem Statement
The visualisation of mortality rates and mortality improvements can be tricky due to the fairly high number of dimensions of interest (age, time, gender, country, etc.). Moreover, traditional visualisations usually illustrate either mortality rates or mortality improvements, but rarely both together. We propose some approaches for reflecting the various dimensions appropriately in plots of mortality rates/improvements, and for combining rates and improvements in a visually appealing way.
2. Suggested Approach
We suggest the use of different types of visualisations to deliver a full picture of mortality rates and their development over time. Figs. 2 and 4 are interactive; we suggest viewing them in the attached HTML file.
- Heatmaps (Fig. 1): These are often used to depict mortality improvement rates, and provide an excellent overview of the age- and time-dependence of improvements. Alternatively, a 3D surface plot can be used (Fig. 2).
- “Trajectory plots” (Figs. 3 & 4): These are very common, e.g. in physics, but rather unconventional in the actuarial field. The idea is to plot the development of a variable over time as a trajectory, with the current value of the variable on the x-axis and its rate of change on the y-axis (in mechanics: position on the x-axis and velocity or momentum on the y-axis).
- Combined line/scatter/bar plots (Fig. 5) for a more conventional illustration. In combination with faceting and a well-designed plot arrangement, a lot of information can be packed into these plots.
Fig. 1: Heatmap with annual mortality improvement values.
Fig. 2: (Interactive) two-dimensional surface representation of mortality improvements.
Fig. 3: Trajectories with mortality rates and improvements for females of four selected countries.
Fig. 4: (Interactive) trajectories for 10 developed countries (UK, Japan, France, Italy, Spain, Sweden, Switzerland, Austria, Norway, USA) and both genders. The grey band covers approx. 80% of the data.
Fig. 5: Mortality rates, improvements, and population exposure for Japan.
3. Rationale and Commentary
Heatmaps
These are powerful visualisations and particularly handy in the context of mortality improvements as a function of time and age (gender and country fixed). Quite often, diagonal structures appear in these heatmaps, which can point to cohort effects or data errors (e.g. exposure miscounts). Both observations (cohorts / data issues) are highly relevant for actuarial applications.
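To make the diagonal (cohort) structures concrete, here is a minimal Python sketch. It assumes that mortality rates are stored in an array with ages as rows and calendar years as columns, and that the annual improvement is defined as 1 - m(x,t)/m(x,t-1), a common convention that the post does not state explicitly. Cells lying on the same diagonal of the heatmap share a year of birth (year minus age). The helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the post's R code.


def improvement_matrix(rates):
    """Annual improvements 1 - m[x, t] / m[x, t-1] for an (ages x years) array.

    The result has one column fewer than `rates`: column j holds the
    improvement from year j to year j + 1 of the input.
    """
    rates = np.asarray(rates, dtype=float)
    return 1.0 - rates[:, 1:] / rates[:, :-1]


def cohort_values(matrix, ages, years, birth_year):
    """(year, age, value) triples on the diagonal year - age == birth_year."""
    return [
        (year, age, matrix[i, j])
        for i, age in enumerate(ages)
        for j, year in enumerate(years)
        if year - age == birth_year
    ]


# Toy example: rates fall by 2% per year at every age
ages = np.array([60, 61, 62])
years = np.array([2000, 2001, 2002, 2003])
rates = 0.01 * np.exp(0.09 * (ages[:, None] - 60)) * 0.98 ** (years[None, :] - 2000)
imp = improvement_matrix(rates)  # improvements for the years 2001..2003
diag = cohort_values(imp, ages, years[1:], birth_year=1940)
```

A cohort effect would show up as values along such a diagonal that differ systematically from their neighbours on adjacent diagonals.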
The mortality (improvement) surface in time-age space typically forms the basis for projecting mortality rates into the future. In particular, the heatmaps are used to set the starting (i.e. current) improvement rates for these projections. For example, if improvements in the latest year appear too high or too low (which is often due to boundary effects in the underlying modelling), one may go back one or two years and start the projection at these older improvement rates.
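The choice of starting improvements can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the improvements chosen as starting values are held constant over the projection horizon (real projection models usually let them converge towards a long-term rate), and the parameter lag selects how many years to step back from the latest year. Function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; constant improvements are a simplifying assumption.


def starting_improvements(imp, lag=0):
    """Pick the improvement column `lag` years before the latest one.

    `imp` is an (ages x years) array of annual improvements; lag=0 uses the
    latest year, lag=1 or 2 steps back to avoid boundary effects.
    """
    return imp[:, -1 - lag]


def project_rates(latest_rates, start_imp, n_years):
    """Apply the starting improvements every year to the latest rates.

    Returns an (ages x n_years) array; column k is the projection k + 1 years ahead.
    """
    factors = (1.0 - start_imp)[:, None] ** np.arange(1, n_years + 1)[None, :]
    return latest_rates[:, None] * factors


# Toy improvements for two ages; the latest year looks implausibly high
imp = np.array([[0.020, 0.025, 0.040],
                [0.010, 0.015, 0.030]])
start = starting_improvements(imp, lag=1)  # step back one year
projected = project_rates(np.array([0.010, 0.020]), start, n_years=2)
```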
Trajectories
Visualising mortality rates or mortality improvements alone without combining the two leaves various interesting questions open. For example:
- Country X currently shows much higher improvements than country Y. Is this a “catch-up” effect, i.e. does country X still have higher current mortality rates, leaving a lot of room for improvement? Or is it a genuinely different development pattern for two countries with a similar current mortality profile?
- Can we identify a common trend across all (developed) countries, and infer a long-term improvement rate?
Trajectories help to address such questions and, moreover, make a strong visual impression. For example, it becomes evident how the 1970s and (even more so) the 1990s saw detrimental changes to mortality rates and life expectancy in Russia. In the visualisation, the negative improvements make the trajectories go in the “wrong” x-direction, and one can easily see how many years of positive improvements it takes to recover from such a crisis. Note that in our visualisation the x-axis is reversed: this makes it more natural to follow the development of a country (movement from left to right, with decreasing mortality rates).
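A minimal sketch of how trajectory coordinates could be derived from one series of (smoothed) mortality rates, and of how one might count the years needed to recover from a crisis. The toy series, the improvement definition 1 - m(t)/m(t-1) and the function names are illustrative assumptions, not the code behind the figures.

```python
import numpy as np

# Hypothetical helpers; the rate series below is a toy example.


def trajectory(years, rates):
    """Trajectory points (year, rate, improvement) for one series.

    The rate is the x-coordinate (plotted on a reversed, typically
    logarithmic axis); the annual improvement is the y-coordinate.
    """
    rates = np.asarray(rates, dtype=float)
    imp = 1.0 - rates[1:] / rates[:-1]
    return list(zip(years[1:], rates[1:], imp))


def recovery_years(rates, crisis_start):
    """Years from the peak rate (at or after index `crisis_start`) until the
    rate is back at or below its level at `crisis_start`; None if never."""
    rates = np.asarray(rates, dtype=float)
    baseline = rates[crisis_start]
    peak = crisis_start + int(np.argmax(rates[crisis_start:]))
    for k in range(peak + 1, len(rates)):
        if rates[k] <= baseline:
            return k - peak
    return None


years = np.arange(1988, 1995)
rates = [0.0100, 0.0110, 0.0120, 0.0114, 0.0108, 0.0102, 0.0098]
traj = trajectory(years, rates)  # first two improvements are negative
```

Here the rates rise for two years (negative improvements, i.e. movement in the “wrong” x-direction), and recovery_years(rates, 0) counts the four years of positive improvements needed to return below the 1988 level.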
Plotting a multitude of countries (and genders) together may help in dealing with the second question, namely whether we should assume a long-term improvement rate and, if so, what value it should take. The plot above, with a multitude of countries and both genders, suggests an overall downward trend in improvements for those countries that are already well advanced on their improvement journey. Of course, we do not know what will drive mortality improvements in the future, and the next 30 years may be entirely different from the past.
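Fig. 4 shows a grey band covering approx. 80% of the data. The post does not say how this band was computed; one plausible construction, sketched here purely as an assumption, is to pool the trajectory points of all countries and genders, bin them by log mortality rate, and take the 10th and 90th percentiles of the improvements within each bin. The function name and the minimum bin size are hypothetical choices.

```python
import numpy as np

# Hypothetical band construction; not necessarily how Fig. 4 was produced.


def quantile_band(log_rates, improvements, n_bins=10, lower=0.10, upper=0.90):
    """Pooled lower/upper quantiles of improvements per log-rate bin.

    Returns bin centres and the quantiles; bins with fewer than 5 points
    are dropped (an arbitrary threshold).
    """
    log_rates = np.asarray(log_rates, dtype=float)
    improvements = np.asarray(improvements, dtype=float)
    edges = np.linspace(log_rates.min(), log_rates.max(), n_bins + 1)
    # np.digitize puts the maximum into bin n_bins + 1; clip it into the last bin
    idx = np.clip(np.digitize(log_rates, edges), 1, n_bins)
    centres, lo, hi = [], [], []
    for b in range(1, n_bins + 1):
        sel = improvements[idx == b]
        if sel.size < 5:
            continue
        centres.append(0.5 * (edges[b - 1] + edges[b]))
        lo.append(np.quantile(sel, lower))
        hi.append(np.quantile(sel, upper))
    return np.array(centres), np.array(lo), np.array(hi)


# Synthetic pooled points: improvements unrelated to the mortality level
rng = np.random.default_rng(0)
x = rng.uniform(-6.0, -3.0, size=10_000)  # log mortality rates
y = rng.uniform(0.0, 0.04, size=10_000)   # annual improvements
centres, lo, hi = quantile_band(x, y)
```

With real data, the upper and lower curves would be drawn as a ribbon behind the individual trajectories.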
Combined line/scatter/bar plots
These more conventional illustrations, when designed well, can convey a lot of information. One nice feature is shown in the third row: the visualisation of a multitude of densities on top of each other (here x- and y-axes are flipped, hence the curves are arranged horizontally). Such plots are sometimes called ridgeline plots.
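A ridgeline plot is essentially a stack of curves, each rescaled and shifted by a constant offset. The sketch below assumes the population exposure is available as an (ages x years) array and computes the curves for a few selected years; scaling each profile to a peak of 1 is an assumption (one could instead normalise to unit area to show true densities), and overlap is a hypothetical tuning parameter. Drawing is left to any plotting library.

```python
import numpy as np

# Hypothetical helper; computes ridge geometry only, no plotting.


def ridgeline_curves(exposure, year_index, overlap=1.5):
    """Baselines and upper curves for a ridgeline plot.

    exposure:   (ages x years) array of population exposure
    year_index: columns (years) to draw, one ridge each
    overlap:    peak height in units of the ridge spacing (>1 means overlap)

    Returns (offset, upper) pairs; ridge k is filled between the constant
    baseline `offset` and the curve `upper` (one value per age).
    """
    exposure = np.asarray(exposure, dtype=float)
    curves = []
    for k, j in enumerate(year_index):
        profile = exposure[:, j] / exposure[:, j].max()  # peak scaled to 1
        offset = float(k)  # one unit between consecutive ridges
        curves.append((offset, offset + overlap * profile))
    return curves


exposure = np.array([[100, 80],
                     [200, 160],
                     [50, 40]])  # toy exposure: 3 ages, 2 years
curves = ridgeline_curves(exposure, [0, 1])
```

With the axes flipped as in Fig. 5, the offsets run along the x-axis and the age values along the y-axis.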
For the example presented here, arranging mortality rates and improvements vertically helps in linking the two (a steep downward slope in the top row corresponds to high improvements in the second row). In the top row, plotting the raw data as crosses together with the fitted values as lines adds a lot of transparency with respect to the fitting process. As a third row, we add the population exposure (how many people are alive in a certain year at a given age), which gives some additional information about the shift in the population structure.
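The link between the two rows can be made precise. Assuming improvements are defined as 1 - m(t)/m(t-1) and the top row shows rates on a logarithmic scale, the improvement equals 1 - exp(Δ log m), so a steeper downward slope of the log-rate curve translates directly into a higher improvement. A small numerical check with toy rates:

```python
import numpy as np

# Toy rates for four consecutive years (illustrative values only)
rates = np.array([0.0200, 0.0190, 0.0175, 0.0170])

slope = np.diff(np.log(rates))           # year-on-year change in log rate
imp_from_slope = 1.0 - np.exp(slope)     # improvement implied by the slope
imp_direct = 1.0 - rates[1:] / rates[:-1]

# Both agree; the steepest drop (second step) has the largest improvement
steepest = int(np.argmax(imp_direct))
```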
Comments on the data and pre-processing
We use deaths and exposure data from the publicly available Human Mortality Database (https://www.mortality.org/). A key pre-processing step is smoothing: without it, almost nothing of interest would be visible in the plots above. We use a two-dimensional (year and age) spline-smoothing approach in the form of a generalised additive model (GAM) with “thin-plate” regression splines, relying on the easy-to-use GAM implementation in the R package “mgcv” (function mgcv::gam). Note that the results above can be quite sensitive to the assumptions (ages and years used, number of degrees of freedom for the splines, …).
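To convey the idea of two-dimensional thin-plate smoothing, here is a minimal NumPy sketch. It is explicitly not mgcv::gam: it fits a plain least-squares smoothing thin-plate spline (kernel r² log r plus a linear polynomial) to noisy log death rates, whereas mgcv uses a low-rank thin-plate regression spline basis, selects the smoothing parameter automatically and, for death counts, would typically be fitted with a Poisson likelihood and log exposure as an offset. The toy data, the fixed smoothing parameter lam and the function names are all assumptions.

```python
import numpy as np

# Stand-in for a GAM smoother: plain smoothing thin-plate spline in 2D.


def _tps_kernel(r):
    """Thin-plate radial basis r^2 log r, with its limit 0 at r = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r**2 * np.log(r)
    return np.where(r > 0, k, 0.0)


def fit_tps(points, values, lam):
    """Solve the smoothing TPS system; larger `lam` gives a smoother surface."""
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    P = np.column_stack([np.ones(n), points])  # linear polynomial part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = _tps_kernel(r) + lam * np.eye(n)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.concatenate([values, np.zeros(3)])
    coef = np.linalg.solve(A, b)
    return coef[:n], coef[n:]


def predict_tps(points, w, a, new_points):
    """Evaluate the fitted surface at `new_points`."""
    r = np.linalg.norm(new_points[:, None, :] - points[None, :, :], axis=-1)
    return _tps_kernel(r) @ w + a[0] + new_points @ a[1:]


# Toy Gompertz-like log death rates on an age x year grid, plus noise
rng = np.random.default_rng(1)
age_grid, year_grid = np.meshgrid(np.arange(60, 80), np.arange(2000, 2020), indexing="ij")
ages, years = age_grid.ravel(), year_grid.ravel()
# Rescale both axes to [0, 1] so that one isotropic smoothing parameter suits both
pts = np.column_stack([(ages - 60) / 19, (years - 2000) / 19])
truth = -10.0 + 0.09 * ages - 0.02 * (years - 2000)
noisy = truth + rng.normal(0.0, 0.1, size=truth.size)
w, a = fit_tps(pts, noisy, lam=1.0)
smoothed = predict_tps(pts, w, a, pts)
```

Varying lam (and the age/year ranges) changes the smoothed surface, and therefore the derived improvements, which mirrors the sensitivity to assumptions mentioned above.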
4. Applicability and Alternatives
The plots presented above can also be used in generic contexts.
Heatmaps are commonly used whenever there is a scalar function of interest which depends on two variables (or more, but the rest of them are kept constant). As an alternative (or in addition) to colour coding, the z-coordinate can be represented by contour lines. Classical application areas are bivariate probability distributions, or fitness landscapes in optimisation problems.
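As a generic illustration of a scalar field suited to a heatmap or contour plot, the sketch below evaluates a bivariate normal density on a regular grid. The means, standard deviations and correlation are arbitrary example values, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical example field; parameters are arbitrary.


def bivariate_normal_grid(x, y, mu=(0.0, 0.0), sigma=(1.0, 1.0), rho=0.5):
    """Bivariate normal density on the grid spanned by 1D arrays x and y.

    Returns an array of shape (len(y), len(x)), i.e. rows follow y, which is
    the layout matplotlib's imshow/contour expect for 1D x and y.
    """
    X, Y = np.meshgrid(x, y)
    zx = (X - mu[0]) / sigma[0]
    zy = (Y - mu[1]) / sigma[1]
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    norm = 2 * np.pi * sigma[0] * sigma[1] * np.sqrt(1 - rho**2)
    return np.exp(-0.5 * q) / norm


x = np.linspace(-5, 5, 201)
y = np.linspace(-5, 5, 201)
z = bivariate_normal_grid(x, y)
```

The same array can be colour-coded as a heatmap, overlaid with contour lines, or both.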
Trajectories are common whenever one studies dynamical systems (where a rate of change is interesting). One can also relax the assumption of a rate of change on the y-axis and use an independent variable as y-coordinate; these visualisations can be used as alternatives to plots with secondary axes (see https://blog.datawrapper.de/dualaxis/, Solution 4).
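To illustrate the dynamical-systems origin of trajectory plots, the sketch below integrates a damped harmonic oscillator (an arbitrary textbook example, unrelated to mortality) and returns its phase-space trajectory: position for the x-axis and velocity for the y-axis, the same layout idea as in the mortality trajectories. Parameter values and the function name are illustrative assumptions.

```python
import numpy as np

# Textbook example system; parameters are arbitrary.


def damped_oscillator(x0=1.0, v0=0.0, omega=2.0, gamma=0.3, dt=0.01, n_steps=2000):
    """Phase-space trajectory (position, velocity) of x'' = -omega^2 x - 2 gamma x'.

    Uses the semi-implicit (symplectic) Euler scheme: the velocity is updated
    first, then the position with the new velocity.
    """
    xs = np.empty(n_steps + 1)
    vs = np.empty(n_steps + 1)
    xs[0], vs[0] = x0, v0
    for k in range(n_steps):
        acc = -omega**2 * xs[k] - 2.0 * gamma * vs[k]
        vs[k + 1] = vs[k] + dt * acc
        xs[k + 1] = xs[k] + dt * vs[k + 1]
    return xs, vs


xs, vs = damped_oscillator()
# Plotting vs against xs gives a spiral winding in towards the origin.
```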
Scatter plots, line charts or bars form the backbone of data visualisation, and can be used in almost any context. Quite often it is worth thinking about particularly well-suited arrangements and combinations of such plots (vertical/horizontal arrangements, or using insets).
5. Implementation
See the attached R-Markdown file for the full code for this blog post, and the HTML-file for a rendered version including the interactive visualisations.
We recommend R and the package ggplot2, in particular the following functions:
- geom_raster/geom_tile for heatmaps
- geom_point, geom_path, geom_line for trajectories and line/scatter plots
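ggplot2's geom_raster and geom_tile expect data in long (“tidy”) format: one row per cell, with columns for the x, y and fill variables. If the improvement surface is held as an (ages x years) matrix, it has to be reshaped first; in R this is commonly done with e.g. tidyr::pivot_longer. The equivalent reshaping is sketched here in Python/NumPy, with a hypothetical function name and column order.

```python
import numpy as np

# Hypothetical helper: matrix -> long format (year, age, value).


def to_long(matrix, ages, years):
    """Reshape an (ages x years) matrix into rows of (year, age, value)."""
    age_grid, year_grid = np.meshgrid(ages, years, indexing="ij")
    return np.column_stack([year_grid.ravel(), age_grid.ravel(), np.asarray(matrix).ravel()])


ages = np.array([60, 61])
years = np.array([2018, 2019, 2020])
imp = np.array([[0.010, 0.012, 0.015],
                [0.011, 0.013, 0.016]])
long_rows = to_long(imp, ages, years)
# Each row maps to aes(x = year, y = age, fill = value) in ggplot2 terms
```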
For the interactive visualisations, we have used the package plotly (which is also available for Python, among other languages). Plotly has quite a different syntax from ggplot2, and unfortunately the two packages do not (yet) work well together.
6. Context
7. Tags
Trajectories
Trajectories illustrate the evolution of a variable and its rate of change, here mortality rates and annual improvement values.
Heatmap
2-dimensional representations, here used for mortality improvements as a function of age and year.
Interactive
Interactive visualisations are used to facilitate the visualisation of a larger amount of data.