pyam
an open-source Python package
for IAM scenario analysis

Daniel Huppmann, IIASA, huppmann@iiasa.ac.at

The pyam package is available at github.com/IAMconsortium/pyam

The package was developed by Matthew Gidden and Daniel Huppmann.
It is released under the Apache 2.0 open-source license.

This presentation is based on a talk given by Matthew Gidden at IAMC 2017 in Recife, Brazil, and on the tutorial notebooks of the pyam package.

This presentation is licensed under
a Creative Commons Attribution 4.0 International License.

Diagnostics, analysis and visualization tools
for Integrated Assessment timeseries data

First steps with the pyam package

The pyam package provides a range of diagnostic tools and functions
for analyzing and working with IAMC-style timeseries data.

The package can be used with data that follows the data template convention of the Integrated Assessment Modeling Consortium (IAMC). An illustrative example is shown below; see data.ene.iiasa.ac.at/database for more information.

model        scenario      region  variable        unit  2005   2010   2015
MESSAGE V.4  AMPERE3-Base  World   Primary Energy  EJ/y  454.5  479.6  ...
...          ...           ...     ...             ...   ...    ...    ...
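
As a rough illustration of the template above, the following sketch reshapes an IAMC-style wide row (one column per year) into long records with a `year` and a `value` field. It uses only the Python standard library and the example row shown above; the helper name `wide_to_long` is hypothetical and this is not how pyam itself reads data.

```python
# Illustrative sketch only: plain-Python reshaping, not pyam's internal code.
ID_COLUMNS = ("model", "scenario", "region", "variable", "unit")


def wide_to_long(header, rows):
    """Turn IAMC wide-format rows (one column per year) into long records."""
    year_columns = header[len(ID_COLUMNS):]
    records = []
    for row in rows:
        ids = dict(zip(ID_COLUMNS, row[:len(ID_COLUMNS)]))
        for year, value in zip(year_columns, row[len(ID_COLUMNS):]):
            records.append({**ids, "year": int(year), "value": float(value)})
    return records


# The example row from the table above (the elided 2015 value is left out).
header = ["model", "scenario", "region", "variable", "unit", "2005", "2010"]
rows = [["MESSAGE V.4", "AMPERE3-Base", "World", "Primary Energy", "EJ/y",
         "454.5", "479.6"]]

long_records = wide_to_long(header, rows)
for record in long_records:
    print(record["variable"], record["year"], record["value"])
```

Each long record carries the five identifier columns plus one year and one value, which is convenient for filtering by year or variable.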

Features of the pyam package

Validation, diagnostics and sanity checks of the data

Visualization and plotting tools

Categorization of scenarios and creation of metadata indicators

Source of tutorial data

The timeseries data used in this tutorial is a partial snapshot of the scenario database compiled for the IPCC's Fifth Assessment Report (AR5):

Krey V., O. Masera, G. Blanford, T. Bruckner, R. Cooke, K. Fisher-Vanden, H. Haberl, E. Hertwich, E. Kriegler, D. Mueller, S. Paltsev, L. Price, S. Schlömer, D. Ürge-Vorsatz, D. van Vuuren, and T. Zwickel, 2014: Annex II: Metrics & Methodology.
In: Climate Change 2014: Mitigation of Climate Change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Edenhofer, O., R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Seyboth, A. Adler, I. Baum, S. Brunner, P. Eickemeier, B. Kriemann, J. Savolainen, S. Schlömer, C. von Stechow, T. Zwickel and J.C. Minx (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.

The complete AR5 scenario database is publicly available at tntcat.iiasa.ac.at/AR5DB/.

Scientific references for selected tutorial data

The data snapshot used for this tutorial consists of selected data from two model intercomparison projects:

  • Energy Modeling Forum Round 27 (EMF27), see the Special Issue in Climatic Change 123(3-4), 2014.
  • EU FP7 project AMPERE, see the following scientific publications:
    • Riahi, K., et al. (2015). "Locked into Copenhagen pledges — Implications of short-term emission targets for the cost and feasibility of long-term climate goals." Technological Forecasting and Social Change 90(Part A): 8-23.
      DOI: 10.1016/j.techfore.2013.09.016
    • Kriegler, E., et al. (2015). "Making or breaking climate targets: The AMPERE study on staged accession scenarios for climate policy." Technological Forecasting and Social Change 90(Part A): 24-44.
      DOI: 10.1016/j.techfore.2013.09.021

The data used in this tutorial is ONLY a partial snapshot
of the IPCC AR5 scenario database!
This tutorial is intended only as an illustration of the pyam package.

Import package and load data from the AR5 tutorial csv snapshot file

First, we import the pyam package and load the timeseries data snapshot from the file tutorial_AR5_data.csv in the pyam/tutorial folder.

In [1]:
%matplotlib inline
import pyam
In [2]:
data = '../../pyam/tutorial/tutorial_AR5_data.csv'
df = pyam.IamDataFrame(data=data)
INFO:root:Reading `../../pyam/tutorial/tutorial_AR5_data.csv`

What's in our dataset?

As a first step, we use a number of functions to find out what is included in the snapshot.

In [3]:
df.models()
Out[3]:
0    AIM-Enduse 12.1
1           GCAM 3.0
2          IMAGE 2.4
3        MERGE_EMF27
4        MESSAGE V.4
5         REMIND 1.5
6        WITCH_EMF27
Name: model, dtype: object
In [4]:
df.scenarios()
Out[4]:
0             AMPERE3-450
1         AMPERE3-450P-CE
2         AMPERE3-450P-EU
3             AMPERE3-550
4         AMPERE3-550P-EU
5     AMPERE3-Base-EUback
6       AMPERE3-CF450P-EU
7          AMPERE3-RefPol
8          EMF27-450-Conv
9         EMF27-450-NoCCS
10       EMF27-550-LimBio
11    EMF27-Base-FullTech
12          EMF27-G8-EERE
Name: scenario, dtype: object
In [5]:
df.regions()
Out[5]:
0      ASIA
1       LAM
2       MAF
3    OECD90
4       REF
5     World
Name: region, dtype: object
In [6]:
df.variables(include_units=True)
Out[6]:
variable unit
0 Emissions|CO2 Mt CO2/yr
1 Emissions|CO2|Fossil Fuels and Industry Mt CO2/yr
2 Emissions|CO2|Fossil Fuels and Industry|Energy... Mt CO2/yr
3 Emissions|CO2|Fossil Fuels and Industry|Energy... Mt CO2/yr
4 Price|Carbon US$2005/t CO2
5 Primary Energy EJ/yr
6 Primary Energy|Coal EJ/yr
7 Primary Energy|Fossil|w/ CCS EJ/yr
8 Temperature|Global Mean|MAGICC6|MED °C

A first look at the data

We use the temperature outcome as the first variable of interest in our data snapshot.

In [7]:
v = 'Temperature|Global Mean|MAGICC6|MED'
df.filter({'region': 'World', 'variable': v}).line_plot(legend=False)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x111b3ba90>

Categorization of scenarios

We use the temperature outcome as a first criterion for the categorization of scenarios.

The function categorize() assigns all scenarios that fulfill a set of criteria to a specific category. The function metadata() assigns an initial value (here 'uncategorized') to all scenarios.

In [8]:
df.metadata(meta='uncategorized', name='temperature')
In [9]:
df.categorize(
    'temperature', 'Below 1.6C',
    criteria={v: {'up': 1.6, 'year': 2100}},
    color='cornflowerblue'
)
INFO:root:4 scenarios categorized as `temperature: Below 1.6C`
In [10]:
df.categorize(
    'temperature', 'Below 2.0C',
    criteria={v: {'up': 2.0, 'lo': 1.6, 'year': 2100}},
    color='forestgreen'
)
INFO:root:8 scenarios categorized as `temperature: Below 2.0C`
In [11]:
df.categorize(
    'temperature', 'Below 2.5C',
    criteria={v: {'up': 2.5, 'lo': 2.0, 'year': 2100}},
    color='gold'
)
INFO:root:16 scenarios categorized as `temperature: Below 2.5C`
In [12]:
df.categorize(
    'temperature', 'Below 3.5C',
    criteria={v: {'up': 3.5, 'lo': 2.5, 'year': 2100}},
    color='firebrick'
)
INFO:root:3 scenarios categorized as `temperature: Below 3.5C`
In [13]:
df.categorize(
    'temperature', 'Above 3.5C',
    criteria={v: {'lo': 3.5, 'year': 2100}},
    color='magenta'
)
INFO:root:9 scenarios categorized as `temperature: Above 3.5C`
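
The bound logic behind the cells above can be sketched in plain Python. This is a minimal illustration, not pyam's implementation: the helper names, the made-up temperature values, and the assumption that the `up` and `lo` bounds are inclusive are all hypothetical.

```python
# Illustrative sketch only; bounds are ASSUMED to be inclusive here.
def satisfies(value, up=None, lo=None):
    """Return True if value lies within the optional upper/lower bounds."""
    if up is not None and value > up:
        return False
    if lo is not None and value < lo:
        return False
    return True


def categorize(meta, name, label, values_in_year, up=None, lo=None):
    """Set meta[key][name] = label for every scenario meeting the bounds."""
    count = 0
    for key, value in values_in_year.items():
        if satisfies(value, up=up, lo=lo):
            meta[key][name] = label
            count += 1
    return count


# Hypothetical 2100 temperatures in °C, made up for illustration.
temperature_2100 = {
    ("model_a", "scen_1"): 1.5,
    ("model_a", "scen_2"): 1.9,
    ("model_b", "scen_1"): 3.8,
}

# Start with a default value for all scenarios, then overwrite by category.
meta = {key: {"temperature": "uncategorized"} for key in temperature_2100}
n_low = categorize(meta, "temperature", "Below 1.6C", temperature_2100, up=1.6)
n_mid = categorize(meta, "temperature", "Below 2.0C", temperature_2100,
                   up=2.0, lo=1.6)
n_high = categorize(meta, "temperature", "Above 3.5C", temperature_2100, lo=3.5)
print(meta)
```

With inclusive bounds, a value that sits exactly on a shared boundary (for example 1.6) would match two categories and end up with whichever label was assigned last, so the order of the calls matters in this sketch.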

Checking for uncategorized scenarios

In [14]:
df.filter({'temperature': 'uncategorized'})[['model', 'scenario']]\
    .drop_duplicates()
Out[14]:
model scenario
0 AIM-Enduse 12.1 EMF27-450-Conv
23 AIM-Enduse 12.1 EMF27-450-NoCCS
46 AIM-Enduse 12.1 EMF27-550-LimBio
69 AIM-Enduse 12.1 EMF27-Base-FullTech
92 AIM-Enduse 12.1 EMF27-G8-EERE
590 WITCH_EMF27 EMF27-450-Conv
613 WITCH_EMF27 EMF27-550-LimBio
636 WITCH_EMF27 EMF27-Base-FullTech

The pyam package includes the function require_variable() to check a priori whether a variable exists. The option exclude=True marks the scenarios that lack the variable as `exclude: True` in the metadata, so that they can easily be removed from further analysis.

In [15]:
df.require_variable(variable=v, exclude=True)
INFO:root:8 scenarios do not include required variable `Temperature|Global Mean|MAGICC6|MED`, marked as `exclude: True` in metadata
Out[15]:
model scenario
0 AIM-Enduse 12.1 EMF27-450-Conv
1 AIM-Enduse 12.1 EMF27-450-NoCCS
2 AIM-Enduse 12.1 EMF27-550-LimBio
3 AIM-Enduse 12.1 EMF27-Base-FullTech
4 AIM-Enduse 12.1 EMF27-G8-EERE
5 WITCH_EMF27 EMF27-450-Conv
6 WITCH_EMF27 EMF27-550-LimBio
7 WITCH_EMF27 EMF27-Base-FullTech
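
Conceptually, the check above is a set difference between all model-scenario pairs and those pairs that report the required variable. The following sketch shows that idea with a handful of records whose model and scenario names are taken from the outputs in this tutorial; the function name `missing_variable` is hypothetical and the code is not pyam's implementation.

```python
# Illustrative sketch only: a set difference over (model, scenario) pairs.
def missing_variable(records, variable):
    """Return sorted (model, scenario) pairs that never report `variable`."""
    all_scenarios = {(r["model"], r["scenario"]) for r in records}
    with_variable = {(r["model"], r["scenario"])
                     for r in records if r["variable"] == variable}
    return sorted(all_scenarios - with_variable)


v = "Temperature|Global Mean|MAGICC6|MED"
records = [
    {"model": "AIM-Enduse 12.1", "scenario": "EMF27-450-Conv",
     "variable": "Primary Energy"},
    {"model": "MESSAGE V.4", "scenario": "EMF27-550-LimBio",
     "variable": "Primary Energy"},
    {"model": "MESSAGE V.4", "scenario": "EMF27-550-LimBio",
     "variable": v},
]

missing = missing_variable(records, v)
# Mark the scenarios without the variable so they can be filtered out later.
exclude = {key: key in missing
           for key in {(r["model"], r["scenario"]) for r in records}}
print(missing)
```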

Plotting the temperature outcome again using the categorization

We repeat the plot, this time excluding the uncategorized scenarios and using the 'temperature' metadata column to assign colors. The colors of the individual categories were defined in the function categorize() above.

In [16]:
df.filter({'variable': v, 'exclude': False})\
    .line_plot(color='temperature')
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x111d7add8>

Using the categorization to analyse other variables

We now plot the timeseries data of the 'Primary Energy' variable, using the color-coding of the Temperature categorization to analyse the correlation between energy consumption and warming.

In [17]:
df.filter({'variable': 'Primary Energy', 'exclude': False})\
    .line_plot(color='temperature')
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x1124030f0>

Filtering scenarios by Primary Energy in the base year

To get a clearer understanding of the relationship between Primary Energy and warming, we focus only on those scenarios that have similar levels of Primary Energy in the base year (2010).

We first use the function validate() to check that values lie within a given range. It returns the data points that do not satisfy the criteria.

In [18]:
df.validate(criteria={'Primary Energy': {'lo': 400, 'year': 2010}}).head()
INFO:root:104 of 6622 data points to not satisfy the criteria
Out[18]:
model scenario region variable unit year value
672 AIM-Enduse 12.1 EMF27-450-Conv REF Primary Energy EJ/yr 2010 52.61
666 AIM-Enduse 12.1 EMF27-450-Conv MAF Primary Energy EJ/yr 2010 50.12
660 AIM-Enduse 12.1 EMF27-450-Conv ASIA Primary Energy EJ/yr 2010 168.75
669 AIM-Enduse 12.1 EMF27-450-Conv OECD90 Primary Energy EJ/yr 2010 202.29
663 AIM-Enduse 12.1 EMF27-450-Conv LAM Primary Energy EJ/yr 2010 31.42
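
The output above lists regional values because the criterion is checked against every data point of the variable in that year, not only the 'World' total. A minimal plain-Python sketch of this behavior follows. It reuses the five regional values shown above and adds one hypothetical 'World' value; the function name and the inclusive treatment of bounds are assumptions, not pyam's implementation.

```python
# Illustrative sketch only: collect the data points that violate the bounds.
def failing_points(records, variable, year, lo=None, up=None):
    """Return the records for `variable` in `year` outside [lo, up]."""
    failing = []
    for r in records:
        if r["variable"] != variable or r["year"] != year:
            continue
        too_low = lo is not None and r["value"] < lo
        too_high = up is not None and r["value"] > up
        if too_low or too_high:
            failing.append(r)
    return failing


def point(region, value):
    return {"region": region, "variable": "Primary Energy",
            "year": 2010, "value": value}


records = [
    point("REF", 52.61), point("MAF", 50.12), point("ASIA", 168.75),
    point("OECD90", 202.29), point("LAM", 31.42),
    point("World", 500.0),  # hypothetical value, not from the snapshot
]

failing = failing_points(records, "Primary Energy", 2010, lo=400)
print(len(failing), "of", len(records), "data points do not satisfy the criteria")
```

Restricting the records to a single region before the check would limit the validation to that region.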

Assigning valid scenarios to a new category

We assign those scenarios that have a Primary Energy level above 400 EJ/yr in 2010 to a new category, and then re-display the previous figure including only these scenarios.

In [19]:
df.metadata(meta='uncategorized', name='PE')
In [20]:
df.categorize(name='PE', value='high',
              criteria={'Primary Energy': {'lo': 400, 'year': 2010}})
INFO:root:24 scenarios categorized as `PE: high`
In [21]:
df.filter(
    {'variable': 'Primary Energy', 'exclude': False, 'PE': 'high'})\
    .line_plot(color='temperature')
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x11259f048>

Highlighting particular models and scenarios

Next, we want to check how one particular model behaves within an ensemble of scenarios.

In [22]:
df.metadata(meta='uncategorized', name='model_family')
In [23]:
pyam.categorize(
    df, filters={'model': 'MESSAGE*'}, name='model_family', value='MESSAGE',
    criteria={'Primary Energy': {'lo': 400, 'year': 2010}},
    marker='o')
INFO:root:4 scenarios categorized as `model_family: MESSAGE`
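
The filter `'MESSAGE*'` above uses `*` as a wildcard. The idea can be sketched with shell-style matching from the standard library, applied to the model names listed earlier in this tutorial. This is only an approximation for illustration; pyam's own pattern matching may follow different rules, and the helper `filter_models` is hypothetical.

```python
# Illustrative sketch only: shell-style wildcard matching on model names.
from fnmatch import fnmatchcase


def filter_models(models, pattern):
    """Return the model names matching a case-sensitive wildcard pattern."""
    return [m for m in models if fnmatchcase(m, pattern)]


models = ["AIM-Enduse 12.1", "GCAM 3.0", "IMAGE 2.4", "MERGE_EMF27",
          "MESSAGE V.4", "REMIND 1.5", "WITCH_EMF27"]

message_models = filter_models(models, "MESSAGE*")
emf27_models = filter_models(models, "*_EMF27")
print(message_models)
print(emf27_models)
```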
In [24]:
from pyam.plotting import run_control
rc = run_control()
rc.update({'marker': {'model_family': {'uncategorized': None}}})
In [25]:
df.filter(
    {'variable': 'Primary Energy', 'exclude': False, 'PE': 'high'})\
    .line_plot(color='temperature', marker='model_family')
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x112809c50>

And just for the fun of it, let's add scenario linestyles, too...

In [26]:
df.filter(
    {'variable': 'Primary Energy', 'exclude': False, 'PE': 'high'})\
    .line_plot(color='temperature', marker='model_family',
               linestyle='scenario', legend=True)
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1125cd4a8>

Further analysis using metadata

Rather than plotting the development over time, it is often useful to extract and visualize key indicators. In this example, we determine the year of peak warming and plot this indicator against the cumulative CO2 emissions from 2010 until that year.
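
The two indicators described above can be sketched in plain Python before turning to the pyam cells. The sketch assumes that cumulative emissions are approximated by trapezoidal integration between reported years and that both the first and the last year are present in the series; pyam.cumulative() may use a different convention. All numbers are made up for illustration.

```python
# Illustrative sketch only; data and the trapezoidal rule are assumptions.
def peak(series):
    """Return (year, value) of the maximum; the earliest year wins ties."""
    highest = max(series.values())
    year = min(y for y, v in series.items() if v == highest)
    return year, highest


def cumulative(series, first_year, last_year):
    """Trapezoidal integral over the reported years in [first_year, last_year]."""
    years = sorted(y for y in series if first_year <= y <= last_year)
    total = 0.0
    for start, end in zip(years, years[1:]):
        total += 0.5 * (series[start] + series[end]) * (end - start)
    return total


# Hypothetical temperature (°C) and emissions (Gt CO2/yr) for one scenario.
temperature = {2010: 0.9, 2050: 1.8, 2100: 1.6}
emissions = {2010: 30.0, 2030: 40.0, 2050: 20.0, 2100: 0.0}

peak_year, peak_value = peak(temperature)
budget = cumulative(emissions, first_year=2010, last_year=peak_year)
print(peak_year, peak_value, budget)
```

Each scenario then contributes one point to a scatter plot: cumulative emissions until the peak on the x-axis and the peak temperature on the y-axis.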

In [27]:
def peak_warming(x, peak_year=False):
    peak = x[x == x.max()]
    if peak_year:
        return peak.index[0]
    else:
        return float(max(peak))
In [28]:
mean_temperature = df.filter(filters={'variable': v}).timeseries()
In [29]:
df.metadata(
    mean_temperature.apply(peak_warming, raw=False, axis=1),
    'median warming at peak')
In [30]:
df.metadata(
    mean_temperature.apply(peak_warming, peak_year=True, raw=False, axis=1),
    'year of peak warming')
In [31]:
co2 = df.filter({'region': 'World', 'variable': 'Emissions|CO2'})\
    .timeseries() / 1000
In [32]:
df.metadata(
    co2.apply(lambda x:
              pyam.cumulative(x, first_year=2010,
                              last_year=df.meta.loc[x.name[0:2],
                                                    'year of peak warming']),
              raw=False, axis=1),
    'cumulative CO2 emissions (2010 to peak warming)')
In [33]:
df.filter({'exclude': False}).\
    scatter(x='cumulative CO2 emissions (2010 to peak warming)',
            y='median warming at peak')
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x112c1d4a8>
In [34]:
df.filter({'exclude': False}).\
    scatter(x='cumulative CO2 emissions (2010 to peak warming)',
            y='median warming at peak',
            color='temperature')
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1138b6a20>
In [35]:
df.filter({'exclude': False}).\
    scatter(x='cumulative CO2 emissions (2010 to peak warming)',
            y='median warming at peak',
            color='temperature', marker='model_family')
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x1128e6ac8>

We had previously defined the marker for the scenarios not categorized by model family as None, so no marker is shown in the previous scatterplot.

We can easily reset that marker as illustrated below.

In [36]:
rc.update({'marker': {'model_family': {'uncategorized': '*'}}})
In [37]:
df.filter({'exclude': False}).\
    scatter(x='cumulative CO2 emissions (2010 to peak warming)',
            y='median warming at peak',
            color='temperature', marker='model_family')
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x113b62c18>

Looking at regional disaggregation for a particular scenario

We use the 'EMF27-550-LimBio' scenario from the MESSAGE model to look more closely at the regional breakdown of CO2 emissions.

In [38]:
df_ = df.filter({'model': 'MESSAGE*', 'scenario': 'EMF27-550-LimBio',
                 'variable': 'Emissions|CO2'})
In [39]:
df_.filter({'region': 'World'}, keep=False).bar_plot(bars='region')
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x113d75320>
In [40]:
df_.filter({'region': 'World'}, keep=False).\
    filter({'year': 2010}).pie_plot(category='region')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x113f132e8>
In [41]:
df_.filter({'region': 'World'}, keep=False).\
    filter({'year': 2050}).pie_plot(category='region')
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x1140d0e10>
In [42]:
df_.filter({'region': 'World'}, keep=False)\
    .filter({'year': 2100}).pie_plot(category='region')
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x11411cef0>

And finally - who doesn't like maps?

This feature is a work in progress. The following figure is based on the unit test and the CEDS Harmonization work by Matt Gidden.

In [45]:
import matplotlib.pyplot as plt
import cartopy
fig, ax = plt.subplots(
    subplot_kw={'projection': cartopy.crs.PlateCarree()}, figsize=(10, 7))
df.map_regions('iso').region_plot(ax=ax, cbar=False)
Out[45]:
<cartopy.mpl.geoaxes.GeoAxesSubplot at 0x1143bca58>

Summary, conclusions, outlook

The pyam package ...

1. is a toolbox for AUTOMATED sanity checks and diagnostics of scenarios

2. allows efficient analysis of scenarios in model comparison exercises

3. provides a number of 'out-of-the-box' visualization tools

We hope that the package will develop into a valuable resource
for the energy modeling and integrated assessment community!

Please send suggestions or contribute to the package development on GitHub!

Find out more on github.com/IAMconsortium/pyam