Introduction
In this masterclass we offer a new range of data journalism projects,
with step-by-step instructions on completing them.
The first is designed to replace the project found in module 13B4 of the
print and ebook versions of Multimedia Journalism.
A replacement has become necessary because Many Eyes, the platform we
used there to turn our data into a range of visualisations, has now been
withdrawn by its makers, IBM.
The three steps of data journalism: gather, process, visualise
Here’s a reprise of the approach to data journalism we
take in Chapter 13 of MMJ.
The easiest way to grasp how to do data journalism is
to think of what you do in more traditional practices of the journalism craft.
You gather information, you sift it, pick out the
significant and interesting bits, and present them to the audience in as
interesting a way as you can.
In short, you:
  • gather raw information,
  • process or filter it, and
  • shape or visualise it.
You do exactly the same when data is your source
material, rather than a collection of quotes, documents and events.
So in this and all other practical demonstrations of
doing data journalism we will follow three steps:
  • Gather or find data
  • Process or filter data
  • Visualise data
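To make the three steps concrete away from any particular platform, here is a minimal Python sketch. It is an illustration only: the sample rows, the column names (Location, Litres per capita) and the function names are all invented for the example, not taken from any real data set.

```python
import csv
import io

# Invented sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """Location,Litres per capita
Country A,12.5
Country B,
Country C,0.4
Country D,7.0
"""

def gather(text):
    """Step 1: read the raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def process(rows):
    """Step 2: drop rows with no figure and convert the rest to numbers."""
    cleaned = []
    for row in rows:
        value = row["Litres per capita"].strip()
        if value:
            cleaned.append((row["Location"], float(value)))
    return cleaned

def visualise(data):
    """Step 3: build a crude text bar chart, highest figure first.

    Each '#' stands for one whole litre.
    """
    lines = []
    for name, litres in sorted(data, key=lambda item: item[1], reverse=True):
        lines.append(f"{name:<10} {'#' * int(litres)} {litres}")
    return "\n".join(lines)

chart = visualise(process(gather(SAMPLE_CSV)))
print(chart)
```

Each function hands its result to the next, which mirrors the way the projects below move from finding data, to filtering it, to presenting it.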
DATA JOURNALISM PROJECT 1: Using Socrata data in a Silk visualisation
Gather or find data 
Go to Open Data by Socrata, https://opendata.socrata.com/
I’ve taken as an example a data set I found there on alcohol consumption
per country from the World Health Organization (WHO). It offers a breakdown of
per capita alcohol consumption among adults over 15 across 193 countries.
You can find that dataset here (you’ll need to open an account at
Socrata if you haven’t already got one): https://opendata.socrata.com/Government/Alcohol-Consumption-Per-Country/hj43-2bpj
You’ll see this is a very simple data set with only two columns of
information, which makes it ideal as a first data journalism project.
Process or filter data
Filtering data involves two tasks:
  • Cleaning it up by removing any information that we do not want in our visualisation
  • Sorting the information, for example by dividing it into subsections of the overall data.
We aren’t going to do either with this piece of data, but if you want an
idea of how you might filter data in Socrata, they have a useful video
demonstration here: https://opendata.socrata.com/videos/popup#basic-filtering
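If you ever need to do this kind of filtering outside Socrata, the two tasks could look something like the Python sketch below. The rows, the column names and the 7-litre threshold are assumptions made up for the example; your own export may well use different headings.

```python
import csv
import io

# Invented rows for illustration; the real export covers 193 countries.
SAMPLE_CSV = """Location,Litres per capita
Country A,12.5
Country B,n/a
Country C,0.4
Country D,7.0
"""

def clean(rows):
    """Task 1: remove rows whose figure is missing or is not a number."""
    kept = []
    for row in rows:
        try:
            litres = float(row["Litres per capita"])
        except (ValueError, TypeError):
            continue  # skip entries such as "n/a" or an empty cell
        kept.append({"Location": row["Location"], "Litres per capita": litres})
    return kept

def subsection(rows, minimum):
    """Task 2: pick out a subsection, here countries at or above a threshold."""
    return [row for row in rows if row["Litres per capita"] >= minimum]

rows = clean(csv.DictReader(io.StringIO(SAMPLE_CSV)))
heavy = subsection(rows, minimum=7.0)
print([row["Location"] for row in heavy])
```

Cleaning comes first so that the second step can compare numbers safely.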
Visualise data
We are going to use a platform called Silk https://www.silk.co to visualise this data.
So we need to export it from Socrata and upload it to Silk.
Under the Export options in Socrata, choose ‘Export as a CSV
for Excel’ and, for ease, save it to your desktop.
Open Silk and click on Create a new Silk.
Name it.
You’ll be invited to view a three-minute video of how Silk works. It’s worth
pausing to take a look as it explains how Silk is organised.
Here’s what it says in summary:
When you upload a spreadsheet to Silk, each row of your spreadsheet is a
unit of data. With the alcohol consumption example we are working on, each line
has the name of the country and the alcohol consumption in litres per individual
over the age of 15 in that country.
Silk turns each of those lines into what it calls data cards.
So, with this example, when we upload it, a data card will be created for
each of the countries covered.
Your spreadsheet needs a row at the top which has the titles or headings
that enable the software to organise your data.
This one has just two: location, and alcohol consumption per capita. If
that line were missing for any reason, Silk or any other visualisation tool
could not make sense of the data, so you’d need to add appropriate headings.
That’s something you’d do at the Filtering data stage.
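If you did have to add the headings yourself, a spreadsheet program would do the job, but here is a small Python sketch of the same idea. The heading names are my assumption about what suits this data set, and the test for a missing heading row (does the second cell of the first line parse as a number?) is a simple rule of thumb, not anything Silk itself does.

```python
import csv
import io

HEADINGS = ["Location", "Litres per capita"]  # assumed heading names

def ensure_headings(text, headings=HEADINGS):
    """Return CSV text that starts with a heading row.

    If the second cell of the first row parses as a number, that row is
    treated as data and the headings are written above it. Otherwise the
    text is returned unchanged.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    try:
        float(rows[0][1])
        has_headings = False
    except (ValueError, IndexError):
        has_headings = True  # text (or a missing cell) where a figure would be
    if has_headings:
        return text
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headings)
    writer.writerows(rows)
    return out.getvalue()

print(ensure_headings("Country A,12.5\nCountry B,3.0\n"))
```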
I could also have a column that grouped individual countries into the continents they
are a part of. If I did, then Silk would organise these data cards into groups, which
would mean the information could be filtered and presented continent by continent.
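Adding such a column is a filtering-stage job. As a rough sketch of how it might be done in Python before uploading: the lookup table below is hypothetical and covers only three countries, the figures are placeholders rather than WHO values, and the column names are assumed.

```python
import csv
import io

# Hypothetical lookup table. A real one would need an entry for every country.
CONTINENTS = {"Luxembourg": "Europe", "Ireland": "Europe", "Japan": "Asia"}

# Placeholder figures only (all 0.0); these are not the WHO values.
SAMPLE_CSV = """Location,Litres per capita
Luxembourg,0.0
Ireland,0.0
Japan,0.0
Atlantis,0.0
"""

def add_continent(text, lookup=CONTINENTS):
    """Return the CSV text with an extra 'Continent' column appended."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = list(reader.fieldnames) + ["Continent"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Countries missing from the lookup are flagged, not silently dropped.
        row["Continent"] = lookup.get(row["Location"], "Unknown")
        writer.writerow(row)
    return out.getvalue()

print(add_continent(SAMPLE_CSV))
```

Marking unmatched names as "Unknown" makes it easy to spot spelling differences between your data and your lookup table before you upload.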
Your data is also converted by Silk into what it calls pages. You can
add elements to those pages, each of which is given its own unique URL.
So if you are writing an article and want to embed pages of data –
subsets of the overall dataset – at particular points, you can do so.
Let’s go ahead and upload our data into Silk.
Click to ‘Proceed’ and choose the ‘Upload spreadsheet’ option. Then click
and drag the spreadsheet you have saved to your desktop into this area of Silk.
Click ‘Start import’.
The data on your spreadsheet is being turned into data cards. When that
has happened, you can click to ‘Explore’ data cards.
‘Explore’ is where you create visualisations in Silk.
Try them out. Think, in each case, how easy that particular
visualisation makes the data to ‘read’.
Ideally, we’d like the type of visualisation we finally choose to enable
readers to see at a glance some salient facts. For example, which countries
have the highest per capita consumption,
and which the lowest.
Some types of visualisation aren’t much use. ‘List’ gives each country
in alphabetical order, with the consumption. ‘Grid’ and ‘Mosaic’ don’t add
anything.
Groups is useful.
If you click on Group and then, under the ‘Group by’ option that appears,
use ‘Litres per capita’, you get individual countries grouped under levels of
consumption, which means you can quickly see geographic and cultural patterns in
the data.
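To see what grouping by level of consumption amounts to, here is a short Python sketch. It is my own illustration, not a description of how Silk works internally: it assumes the levels are whole litres, and the country names and figures are invented.

```python
from collections import defaultdict

def group_by_level(data):
    """Group (country, litres) pairs under whole-litre levels, highest first."""
    groups = defaultdict(list)
    for name, litres in data:
        groups[int(litres)].append(name)  # 12.5 and 12.1 both fall under 12
    return dict(sorted(groups.items(), reverse=True))

# Invented figures for illustration.
sample = [("Country A", 12.5), ("Country B", 12.1), ("Country C", 0.4)]
for level, countries in group_by_level(sample).items():
    print(level, countries)
```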

 

Visualise on a map
Map sounds promising but you’ll find you are prompted to add categories
via dialogue boxes to organise the data. Silk gives suggestions.
Experiment with them.
You should find a set of markers added to the map, and when you click on
them you get information on alcohol consumption, as in this example:

But that takes a lot of work by the reader, who has to click on a country pin to find out what the alcohol consumption there is, so it doesn’t help them all that much.

Use the ‘Colour by’ dialogue and you get a more useful picture.
Now, a coloured disc appears on each country, with a number on it. The
number represents the consumption, rounded to the nearest litre, and the discs
vary in size depending on the level of consumption in that country.
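The idea behind those discs can be sketched in a few lines of Python. This is an assumption about the general technique, not Silk's actual method: I have guessed at a linear scale between a smallest and largest radius, and the radius values and function name are invented.

```python
def disc(litres, max_litres, min_radius=4.0, max_radius=20.0):
    """Return (label, radius) for one country's disc.

    label:  consumption rounded to the nearest litre (halves round up,
            which is fine for non-negative figures).
    radius: scaled linearly so the highest consumer gets max_radius.
    """
    label = int(litres + 0.5)
    share = litres / max_litres if max_litres else 0.0
    radius = min_radius + share * (max_radius - min_radius)
    return label, radius

# Invented figures for illustration.
print(disc(6.25, max_litres=12.5))
```

Keeping a minimum radius means that even a country with very low consumption still gets a visible disc.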

So, now we are beginning to get a visual representation of the data that helps readers interpret it at a glance, rather than by wading through lists of figures.

Publish your visualisations
At any point I can publish the visualisation Silk has created for me, and share it in various ways, including via a link:

Here’s that link: http://goo.gl/Mxm04p

Click on it and my visualisation will open up in a nicely presented
Google map, with the map element presented effectively in context. This is one
of the individual pages Silk has created
for me, so I could use it as part of the article I am writing.
From there I can pick up code to enable me to embed it in the article I
am writing, if I’d like to:

If you do click to publish you’ll then have to click on Explore again to get back into the data.

Run through the rest of the options.
Bars and columns give you an immediate comparison.
With columns you need to scroll across to reveal which country is
represented in each stack.
See anything immediately interesting or surprising?
Which country would you have guessed, before looking at this data, had
the highest alcohol intake?
Russia?
In fact, according to this WHO data, it is little Luxembourg.

Second comes Ireland.

In terms of giving an instant indication of what the data shows, this is
an effective visualisation.
As you’d expect, Muslim-majority countries show the lowest readings.
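If you want to check what the chart is telling you, the same ranking is easy to produce yourself. Here is a minimal Python sketch that picks out the highest and lowest entries; the country names and figures are invented, and the function name is my own.

```python
def extremes(data, n=2):
    """Return the n highest and n lowest (country, litres) pairs.

    Both lists run from higher to lower consumption. If n is larger than
    half the data, the two lists will overlap.
    """
    ranked = sorted(data, key=lambda item: item[1], reverse=True)
    return ranked[:n], ranked[-n:]

# Invented figures for illustration.
sample = [("Country A", 12.5), ("Country B", 3.0),
          ("Country C", 0.4), ("Country D", 7.0)]
highest, lowest = extremes(sample)
print("Highest:", highest)
print("Lowest:", lowest)
```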
Note: Silk limits you to 3,000 rows of data, and will reject any data set that exceeds
that total when you try to upload it.
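With a larger data set it is worth counting the rows before you try to upload. A quick Python sketch follows; I am assuming here that the heading row does not count towards the 3,000 limit, which the note above does not confirm, and the function names are my own.

```python
import csv
import io

SILK_ROW_LIMIT = 3000  # the limit given in the note above

def count_data_rows(text):
    """Count the rows of CSV text, ignoring the heading row and blank lines."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the heading row (assumed not to count)
    return sum(1 for row in reader if row)

def fits_in_silk(text, limit=SILK_ROW_LIMIT):
    """True if the data has no more rows than the limit."""
    return count_data_rows(text) <= limit

# Invented two-row example.
example = "Location,Litres per capita\nCountry A,12.5\nCountry B,3.0\n"
print(count_data_rows(example), fits_in_silk(example))
```

If a data set is too big, the filtering stage is the place to cut it down to the subsection you actually need.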
Further tuition on Silk and Socrata Open Data
Silk has a guide, How to use Silk for journalism, here: https://www.silk.co/help/tutorials/how-to-use-silk-for-journalism
Silk’s YouTube channel is here: https://www.youtube.com/user/silkapp
Socrata’s guide to its Open Data initiative is here: http://www.socrata.com/open-data-field-guide/