Find data: Guardian data

The Guardian publishes many of its datasets here:

There is a great deal of data there that you could use for visualisations in Silk. I’ve chosen this one as a demonstration of what can be done:

British railway station usage, or, How busy is your station?

Go to this link:

Export as an Excel file, and save it to your desktop.

Filter data

Create a new data story in Silk and drag the Excel file into Silk, following the steps outlined in previous modules.

You’ll be told there are several files/pages and you must choose one. Choose the most recent period we have data for.

Because this data has many more columns than the material we looked at in Project 1, there are many more ways to sift and visualise it.

You can compare data from two time periods, and look at the percentage change in usage between them.

You’ll be asked how you want to sort the data.

Choose by station name.

Run down the list of items and weed out any that are not relevant. To delete one, click on the gear wheel alongside the line and select Ignore. The item will be removed.

Now click to import. This is a large dataset and will be turned into around 3,000 datacards.

Once the data has loaded, you’ll see several columns that list entries, exits and other sub-divisions of the total usage. You can play with these columns, either keeping them all or removing some.

Delete any that are irrelevant to the visualisations you plan to make. Consider what the essential information is that you want to convey, and cut anything non-essential.
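If you prefer to trim the spreadsheet before it goes into Silk, the same idea can be sketched in a few lines of Python. This is only an illustration: the column names and the single sample row below are hypothetical placeholders, not the real headers or figures from the Guardian file.

```python
# Hypothetical column names; the real spreadsheet headers may differ.
ESSENTIAL = ["Station", "Total 2010-11", "Total 2011-12"]


def keep_columns(rows, wanted):
    """Return a copy of each row containing only the wanted columns."""
    return [{name: row[name] for name in wanted if name in row} for row in rows]


# One made-up row, purely to show the shape of the data.
rows = [
    {
        "Station": "Station A",
        "Entries": 60,
        "Exits": 40,
        "Total 2010-11": 90,
        "Total 2011-12": 100,
    },
]

trimmed = keep_columns(rows, ESSENTIAL)
print(trimmed)
```

The entries and exits columns are dropped, leaving only the station name and the two totals.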

Note: Set ‘Maximum items’ high enough to include every station; otherwise you’ll find some towards the end of the alphabet left out. You’ll find ‘Maximum items’ under the ‘More options’ tab.
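To see why a low limit drops stations from the end of the alphabet, here is a small Python sketch. It assumes the list is sorted alphabetically and then cut off at the limit, which matches what you see on screen but is my assumption about how Silk behaves internally. The station names are placeholders.

```python
def visible_items(names, maximum_items):
    """Sort names alphabetically and keep only the first maximum_items."""
    return sorted(names)[:maximum_items]


# Placeholder names standing in for the real station list.
stations = ["Station C", "Station A", "Station D", "Station B"]

too_low = visible_items(stations, 2)
high_enough = visible_items(stations, len(stations))

print(too_low)
print(high_enough)
```

With the limit set to 2, only the first two names alphabetically survive; set it to the full length of the list and every station appears.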

You could, for instance, focus on total usage of stations in 2010-11 and 2011-12 and the percentage change between them. You do that by clicking the ‘Add column’ dropdown and then selecting the filter you want to apply to that column.

Be sure to add columns in the order you want them to appear.

So, with this example, I have added the 2010-11 traffic, then the 2011-12 traffic, then the percentage change between those two time periods.
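The spreadsheet already supplies the percentage change, but it helps to know how that figure is arrived at. A minimal sketch in Python, using made-up numbers rather than real station totals:

```python
def percentage_change(earlier, later):
    """Percentage change from the earlier period to the later one.

    Returns None when the earlier figure is zero, because the change
    cannot be expressed as a percentage of nothing.
    """
    if earlier == 0:
        return None
    return (later - earlier) / earlier * 100


# Illustrative figures only: 200 journeys rising to 250.
change = percentage_change(200, 250)
print(change)  # 25.0
```

A station that opened during the period would have no earlier figure to compare against, which is why the function returns None in that case rather than a number.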

Sometimes, it’s only when you play with data like this that you can see important patterns within it.
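Sorting by the percentage change column is one quick way to surface those patterns. Silk does this for you when you click a column heading, but the logic can be sketched in Python. The three stations and their figures below are invented for illustration.

```python
def rank_by_change(stations):
    """Sort (name, earlier, later) tuples by percentage change, largest first.

    Assumes every earlier figure is greater than zero.
    """
    def change(record):
        _, earlier, later = record
        return (later - earlier) / earlier * 100

    return sorted(stations, key=change, reverse=True)


# Invented figures, not taken from the Guardian dataset.
sample = [
    ("Station A", 100, 110),  # up 10%
    ("Station B", 50, 150),   # up 200%
    ("Station C", 400, 380),  # down 5%
]

ranked = rank_by_change(sample)
top_name = ranked[0][0]
print(top_name)  # Station B
```

The station with the largest rise comes first, which is how an outlier jumps out of a list of thousands.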

How come Armadale has seen such an increase in traffic?


Google it and you get the probable answer:

Visualise data

I’ve made all those changes under the table style of visualisation. If I click on another of the categories within Explore I can create a different set of filters.

You can also add inline filters. If you choose to have the stations mapped, you’ll be able to create geographically based filters.

It’s worth looking at Silk’s explanation of which visualisation works best for which type of data. Their tutorials are here

They say:

  • Use lists and tables to show simple ranking of data;
  • Use maps for location-specific data;
  • Use scatter plots to show the relationship between two numbers;
  • Use donut and pie charts for showing proportions and distributions;
  • Use vertical column charts to compare a few items;
  • Use horizontal bar charts to compare many items;
  • Use stack charts to show relative components as part of a whole;
  • Use grids and mosaics to show images; and
  • Use groups to organize items sharing a characteristic or property.

Look at the map visualisation and you’ll see a pin for each station, and when you click on it you get information on how many people use it.


Click on the ‘Colour by’ dialogue and you can add a visual indication of how busy that station is.
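Colouring by a number amounts to sorting each station into a band according to its usage. A rough Python sketch of the idea follows; the thresholds and band names are my own hypothetical choices, not the ones Silk uses.

```python
def colour_band(usage, thresholds=(100_000, 1_000_000)):
    """Place a usage figure into one of three bands.

    The default thresholds are arbitrary values chosen for illustration.
    """
    low, high = thresholds
    if usage < low:
        return "quiet"
    if usage < high:
        return "moderate"
    return "busy"


print(colour_band(5_000))      # quiet
print(colour_band(250_000))    # moderate
print(colour_band(3_000_000))  # busy
```

Each band would then be given its own pin colour on the map.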

Here’s that information published as a silk:


Next: Project 4: Guardian data and Silk visualisation – World population by country