An Age Pyramid in Altair

Charting an Age Pyramid in Altair

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.

import altair as alt
from vega_datasets import data
from altair.expr import datum, if_

If you are running this code in the classic Jupyter Notebook (as opposed to JupyterLab), uncomment the next cell and run it to enable rendering in the notebook session.

# alt.renderers.enable('notebook')

If you are using the classic notebook and do not run this cell, the following message is displayed instead of the chart:

<VegaLite 2 object> If you see this message, it means the renderer has not been properly enabled for the frontend that you are using. For more information, see https://altair-viz.github.io/user_guide/troubleshooting.html

Perhaps you simply forgot to run it?

Even then, you may still run into trouble, as I did when I switched to the Jupyter notebook. Running the cell may produce this other message:

ValueError: To use the ‘notebook’ renderer, you must install the vega package and the associated Jupyter extension. See https://altair-viz.github.io/getting_started/installation.html for more information.

Since I had installed Altair for Jupyter only, I needed to install the missing components in my local environment:

conda install -c conda-forge notebook vega


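The code below is a correction of the Altair Gallery example, which renders the age pyramid upside down. Here the age axis is sorted in descending order, so the oldest ages sit at the top and the youngest at the base.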

pop = data.population()

# Take the slider's min and max from the range of years in the dataset:
slider = alt.binding_range(min=pop.year.min(), max=pop.year.max(), step=10)

# If name is None or not given, the slider gets a default title of "selector<nnn>".
# Note 1: The <nnn> portion changes with the number of times the chart has been refreshed.
# Note 2: The name (or default string) is automatically joined to the field name,
#             apparently with "_" in between (e.g. "Select_year").
# Note 3: To my knowledge, the slider does not take an initial value, which could default to min.
#             Instead, the initial position seems to be near, but not exactly at, the middle of the range.
#             Also, the initial position is not labeled.

select_year = alt.selection_single(name='Select', fields=['year'], bind=slider)

base = ( alt.Chart(pop).add_selection(select_year)
                       .transform_filter(select_year)
                       .transform_calculate(gender=if_(datum.sex == 1, 'Male', 'Female')) )
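To make the transforms in `base` concrete, here is a rough pure-Python equivalent of what the chart computes for one slider position: keep only the rows for the selected year (like `transform_filter(select_year)`), map the numeric `sex` code to a label (1 becomes 'Male', anything else 'Female', mirroring the `if_` expression), then sum `people` per age and gender, as `sum(people)` does in the encodings below. This is a sketch for intuition, not what Altair runs internally; the rows are made-up sample values in the dataset's column layout (`year`, `age`, `sex`, `people`), and the function name `pyramid_totals` is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the population dataset
# (year, age, sex, people); the numbers are invented for illustration.
rows = [
    {"year": 2000, "age": 0, "sex": 1, "people": 100},
    {"year": 2000, "age": 0, "sex": 2, "people": 95},
    {"year": 2000, "age": 5, "sex": 1, "people": 90},
    {"year": 2000, "age": 5, "sex": 2, "people": 92},
    {"year": 1990, "age": 0, "sex": 1, "people": 80},
]


def pyramid_totals(rows, year):
    """Mimic filter(year) -> calculate(gender) -> sum(people) by age and gender."""
    totals = defaultdict(int)
    for row in rows:
        if row["year"] != year:  # like transform_filter(select_year)
            continue
        # like if_(datum.sex == 1, 'Male', 'Female')
        gender = "Male" if row["sex"] == 1 else "Female"
        totals[(row["age"], gender)] += row["people"]  # like sum(people)
    return dict(totals)


print(pyramid_totals(rows, 2000))
```

Each (age, gender) pair becomes one bar: the 'Female' totals feed the left chart and the 'Male' totals feed the right one.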

title = alt.Axis(title='population')
color_scale = alt.Scale(domain=['Male', 'Female'], range=["steelblue", "salmon"])


# Try this: replace alt.Y with alt.X (or vice versa) and keep everything else the same: the chart should not change.
# My guess is that these classes only describe how a field is encoded, so which one you use does not really matter:
# it's the lowercase x= and y= keywords they are assigned to that decide the axis.

# Female half: sort='descending' on the quantitative x axis reverses it,
# so these bars grow toward the left, mirroring the Male half.
left = ( base.transform_filter(datum.gender == 'Female')
             .encode(y=alt.Y('age:O', axis=None, sort='descending'),
                     x=alt.X('sum(people):Q', axis=title,
                             sort=alt.SortOrder('descending')),
                     color=alt.Color('gender:N', scale=color_scale, legend=None))
             .mark_bar().properties(title='Female') )

# Middle column: the age labels as text marks, sorted the same way as the bars.
middle = ( base.encode(y=alt.Y('age:O', axis=None, sort='descending'),
                       text=alt.Text('age:Q'))
               .mark_text().properties(width=20, title='Age') )

right = ( base.transform_filter(datum.gender == 'Male')
              .encode(y=alt.Y('age:O', axis=None, sort='descending'),
                      x=alt.X('sum(people):Q', axis=title),
                      color=alt.Color('gender:N', scale=color_scale, legend=None))
              .mark_bar().properties(title='Male') )

# Concatenate the three charts horizontally, same as using alt.hconcat(left, middle, right):
left | middle | right

Age Pyramid
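For readers who want to see the layout logic without a browser, here is a rough text-mode analogy of the three concatenated charts; it is not how Altair renders anything. Ages run in descending order so the oldest sit on top (the `sort='descending'` on `age:O`), the Female bars are right-aligned so they grow leftward (the reversed x axis of `left`), the age labels sit in the middle, and the Male bars grow rightward. The `text_pyramid` helper, its `scale` parameter, and the sample totals are all hypothetical.

```python
def text_pyramid(totals, scale=10):
    """Render {(age, gender): people} as a text pyramid (hypothetical helper).

    Rows are ordered by age descending (oldest on top), like sort='descending'
    on the shared age:O axis. Female bars are right-aligned so they grow toward
    the left, like the reversed x axis of the left chart. Assumes totals is non-empty.
    """
    ages = sorted({age for age, _ in totals}, reverse=True)
    width = max(totals.values()) // scale  # widest possible Female bar
    lines = []
    for age in ages:
        female = "#" * (totals.get((age, "Female"), 0) // scale)
        male = "#" * (totals.get((age, "Male"), 0) // scale)
        # Female bar padded on the left, age label in the middle, Male bar on the right
        lines.append(f"{female:>{width}} {age:>3} {male}")
    return "\n".join(lines)


# Made-up totals per (age, gender), e.g. for a single slider year
totals = {(0, "Male"): 100, (0, "Female"): 95, (5, "Male"): 90, (5, "Female"): 92}
print(text_pyramid(totals))
```

With these numbers the age 5 row prints above the age 0 row, which is the "right way up" ordering that the corrected chart produces.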