A while ago, I mentioned that I wanted to get some school pupils together with some data analysts to see if we could organise a ‘data day’.
The good news is that (with the kind help of Deloitte and the London Borough of Barnet) we did it a few months ago (it was picked up by The Evening Standard here).
Over the next few weeks, I’d like to share everything that we learned doing this, but today, I’m going to start with the transcript of an interview that I did (partly by email) with Adrian Tan – one of Deloitte’s team. I think this captures what we set out to do quite well, and I hope it’s worth reading.
Data gathering for Barnet Schools Data Project
Paul Evans: Adrian, we’ve agreed that this exercise is a starting point for involving school pupils in the London Borough of Barnet in the use of open data. We want to show them what is available, how it needs to be formatted so that it can be used and how it can be juxtaposed and visualised to draw some conclusions.
Personally, I’m interested in the democratic and participative aspects of this. I want to…
- Increase awareness among the pupils of the availability and potential uses of public data
- Consequently, increase awareness of this opportunity more widely within the borough
- Illustrate ways that this method of working can tell us new things that we didn’t know about the borough that we live in, thereby contributing to local policy development
- Highlight problems that we encounter in working towards these aims (so that others can think about how these obstacles can be removed)
We’d like to look at what we can do with the public data that the government is pulling together from various sources into a one-stop shop at data.gov.uk, along with anything else that we can get our hands on.
Adrian Tan: From a policymaking point of view, the potential use of public data is interesting.
- One could look at the data and identify clusters and patterns of occurrence. This is an evidence-based approach which tells us how events actually took place, rather than simply how we think they might have happened.
- Armed with hindsight, we can then take open data at the right level of granularity and mash it up with other sources to gain insights into the possible contributing factors. Open data enables valuable lessons to be learnt and yields precious insights. What we need next is foresight.
- Open data needs to be combined with the right tools, such as predictive analytics, to proactively identify and address issues.
Fundamentally we are interested in how things will happen before they do, and not just simply why after they have.
PE: So what are we going to achieve by introducing school pupils to this idea?
AT: The reason it’s worth involving school pupils is that, while the Open Data initiative is undoubtedly commendable, not everyone is capable of appreciating data.
A common drawback is the vast quantities of data one has to go through to identify patterns and insights. Manipulating and making sense of data requires experience and knowledge – it’s not a skill that can be acquired in a day. For a layperson, it’s hard to know where to begin with raw data. It’s not always straightforward even for data professionals.
We’ve been working on ways to ensure that the data isn’t going to be overwhelming. Properly presented, it should engage people in an intellectually stimulating manner.
We will have an introductory day with the team at Deloitte, trying out some new and existing tools and we’ll be writing this up once the day is complete.
PE: What can we do in one day with school pupils?
AT: We’ve been testing out new visualisation tools and we’d like to show some of these to the pupils. People who have the experience and statistical grasp can look at vast quantities of data quickly and become aware of the important details about the place that we live in, or alternatively we can put things that we broadly know under the microscope and see if we can learn something useful.
You need the right tools of course. Dull data is hard to engage with. But if it’s presented in an intellectually stimulating way, it makes you think. We’ve always presented numbers as graphs, but today’s software really allows you to get the numbers into a place where anyone – not just a statistician – can learn something interesting from them.
Visualisation provides a bird’s eye view of an ocean of data, and it also allows the user to put a magnifying glass on the areas that matter most. Above all, visualisation allows people to step back from the details, bring a fresh eye and possibly come up with novel ideas to address issues.
A good place to start with school pupils is to look at how they can use data to support what they’re already learning in schools. Different groups of people will be interested in different kinds of data. If we can find the right questions, we can encourage school pupils to add to the sum of human knowledge – to explain their surroundings to the rest of their school. They could even explain their surroundings to their local authority and precipitate real change at a local level.
PE: So what useful data have we found so far?
AT: Well, let’s start with traffic data. We can go here (insert link) to find out all sorts of information about local traffic accidents. We have the following information about each accident:
- Age Band of Casualty
- Age Band of Driver
- Car Passenger
- Carriageway Hazards
- Casualty Severity
- Casualty Type
- Control
- Junction Detail
- Junction Location
- Light Conditions
- Pedestrian Crossing – Human Control
- Pedestrian Crossing – Physical Facilities
- Pedestrian Location
- Road Surface Conditions
- Road Type
- Sex of Casualty
- Sex of Driver
- Special Conditions at Site
- Speed Limit
- Vehicle Manoeuvre
- Weather Conditions
There is more information than this, but I had to do some cleaning up, and I had to exclude some information that was simply not in a usable format. The 21 streams listed here are the ones I could actually use.
So are there combinations of ‘road type’ and ‘speed limit’ that are particularly likely to result in accidents, or in more severe accidents? How far do weather conditions affect how dangerous a particular type of road is?
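To make the ‘road type’ and ‘speed limit’ question concrete, here is a minimal sketch of a simple cross-tabulation: count the accidents for each combination, and the share of them that were serious or fatal. The field names (`road_type`, `speed_limit`, `severity`) and the records are hypothetical stand-ins, not the real column names or contents of the dataset.

```python
from collections import defaultdict

# Hypothetical accident records; field names and values are invented for illustration.
accidents = [
    {"road_type": "Single carriageway", "speed_limit": 30, "severity": "Slight"},
    {"road_type": "Single carriageway", "speed_limit": 30, "severity": "Serious"},
    {"road_type": "Dual carriageway", "speed_limit": 70, "severity": "Fatal"},
    {"road_type": "Dual carriageway", "speed_limit": 70, "severity": "Slight"},
    {"road_type": "Roundabout", "speed_limit": 30, "severity": "Slight"},
]

def crosstab(records):
    """Count all accidents and serious-or-fatal accidents per (road type, speed limit)."""
    counts = defaultdict(lambda: {"total": 0, "serious_or_fatal": 0})
    for r in records:
        key = (r["road_type"], r["speed_limit"])
        counts[key]["total"] += 1
        if r["severity"] in ("Serious", "Fatal"):
            counts[key]["serious_or_fatal"] += 1
    return dict(counts)

# Print one line per combination, with the share that were serious or fatal.
for (road, limit), c in sorted(crosstab(accidents).items()):
    share = c["serious_or_fatal"] / c["total"]
    print(f"{road:20} {limit:>3} mph  accidents={c['total']}  serious/fatal={share:.0%}")
```

Raw counts like these only say where accidents happened, not how risky a road is: busier roads will naturally see more accidents, which is why the next step has to bring in some measure of exposure.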
PE: So we can plot these on a map and offer predictions about where accidents are likely to happen? Can we say ‘when it’s rainy and dark, you should particularly avoid this stretch of road’? Will school pupils be able to print off maps that they can show to their school, providing useful pedestrian warnings?
AT: As the data came with longitude and latitude figures, we were able to plot these on a map.
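One simple way to get from points on a map to ‘where do accidents cluster?’ is to snap each latitude/longitude pair to a grid square and count how many accidents fall into each square. This is a sketch under stated assumptions (the coordinates are invented and the 0.005-degree cell size is arbitrary), not a description of how the maps on the day were produced.

```python
from collections import Counter

# Hypothetical (latitude, longitude) accident locations; not real Barnet data.
locations = [
    (51.6101, -0.2005), (51.6103, -0.2009), (51.6104, -0.2001),
    (51.6502, -0.1999), (51.5903, -0.2302),
]

def grid_cell(lat, lon, cell_deg=0.005):
    """Map a point to the integer index of the cell_deg x cell_deg grid square containing it."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def hotspots(points, cell_deg=0.005, top=3):
    """Return the grid squares with the most accidents, busiest first, as (cell, count) pairs."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    return counts.most_common(top)

for cell, n in hotspots(locations):
    print(cell, n)
```

At London’s latitude a 0.005-degree cell is roughly 550 m from north to south but only about 350 m from east to west, so the squares are not square on the ground; a projected coordinate system would be the tidier choice for anything beyond a first look.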
As for predicting where accidents are more likely to happen, we have to look at this with a statistician’s eye. If most of the accidents happen in fine weather, we can’t necessarily conclude that bad weather has nothing to do with accidents, for all of the obvious reasons (there are usually far more fine days than bad ones), and especially if there are no bad-weather days at all in the period we’re sampling. And so on. We have to weight figures accordingly and not succumb to any of the usual cognitive biases.
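The weather point is really about exposure: raw counts need dividing by how often each condition actually occurred. A minimal sketch, assuming we know how many days of each weather type fell within the sampling period (all figures here are invented):

```python
# Invented figures: accidents by weather, and days of each weather type in the same period.
accidents_by_weather = {"Fine": 80, "Raining": 30}
days_by_weather = {"Fine": 300, "Raining": 65}

def accidents_per_day(accidents, days):
    """Turn raw counts into a rate per day of each weather type.

    Weather types with zero or missing days are skipped: with no exposure,
    the data says nothing about them either way.
    """
    return {w: accidents[w] / days[w] for w in accidents if days.get(w)}

for weather, rate in accidents_per_day(accidents_by_weather, days_by_weather).items():
    print(f"{weather}: {rate:.2f} accidents per day")
```

Here fine weather accounts for most of the accidents, yet the rainy-day rate is higher. Days are a crude measure of exposure; traffic volume in each condition would be better, if it could be obtained.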
This still leaves us with a lot to play with, though. The interesting point about this set of data is the level of granularity. For each accident, there is a record of whether it was slight, serious or fatal. This piece of information is crucial to a predictive approach. Within each vehicle type, we could look at the percentage of accidents that resulted in a fatality and compare it with the other vehicle types. So we weren’t surprised when we found that accidents involving motorcycles of over 500 cc were twice as likely to result in a fatality as those involving other vehicle types. Although this finding is intuitive and nothing new, this ‘granular insights’ approach could be applied to other data fields to predict fatalities as well as accidents. Obviously, this is useful information for any policy-maker. Using this methodology, we hope school pupils can print off maps on a number of variables that they can show to their school, providing useful driver and pedestrian warnings.
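The ‘granular insights’ calculation can be sketched as a fatality rate for each vehicle type, compared with the pooled rate for every other type. The records and vehicle labels below are invented for illustration; they are not the Barnet figures.

```python
from collections import Counter

# Invented (vehicle_type, severity) records for illustration only.
records = (
    [("Motorcycle over 500cc", "Fatal")] * 2
    + [("Motorcycle over 500cc", "Slight")] * 48
    + [("Car", "Fatal")] * 5
    + [("Car", "Slight")] * 495
)

def fatality_rates(records):
    """Share of accidents that ended in a fatality, for each vehicle type."""
    totals = Counter(v for v, _ in records)
    fatal = Counter(v for v, sev in records if sev == "Fatal")
    return {v: fatal[v] / totals[v] for v in totals}

def relative_risk(records, vehicle_type):
    """Fatality rate for one vehicle type divided by the pooled rate for all other types."""
    own = [sev for v, sev in records if v == vehicle_type]
    rest = [sev for v, sev in records if v != vehicle_type]
    own_rate = own.count("Fatal") / len(own)
    rest_rate = rest.count("Fatal") / len(rest)
    return own_rate / rest_rate

print(fatality_rates(records))
print(relative_risk(records, "Motorcycle over 500cc"))
```

With only a handful of fatalities in a borough-sized sample, a ratio like this can swing a long way on one or two accidents, which is another reason to read it with a statistician’s eye.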
PE: What else is interesting about the Accidents data?
AT: If we can draw data from several sources, we can provide a more useful picture. We can find things we hadn’t even thought of looking for. However, this isn’t always that easy. Matching up data may not always work. Different sources can granularise their data in different ways, so we aren’t always comparing apples with apples. Different time frames (data collected over different periods of time) mean we are unable to draw conclusive inferences about what is actually driving the behaviour or outcome that we are interested in. The Accidents data is an exception: weather, road condition, junction type, accident severity and so on are all recorded at the same time. But once we juxtapose it with other information, it becomes trickier.
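One partial remedy for mismatched granularity and time frames is to roll the finer-grained source up to the coarser one and compare only the periods that both sources cover. A minimal sketch, assuming a hypothetical second source that is only published monthly (`monthly_other`); all dates and values are invented:

```python
from collections import Counter
from datetime import date

# Invented inputs: accident dates (daily granularity) and a hypothetical
# second source published monthly over a different period.
accident_dates = [date(2010, 1, 5), date(2010, 1, 20), date(2010, 2, 3), date(2010, 3, 14)]
monthly_other = {(2010, 2): 410, (2010, 3): 395, (2010, 4): 402}

def to_month(d):
    """Key a date by (year, month)."""
    return (d.year, d.month)

def align_monthly(dates, monthly):
    """Roll daily events up to months, then keep only the months present in both sources."""
    per_month = Counter(to_month(d) for d in dates)
    shared = sorted(set(per_month) & set(monthly))
    return [(m, per_month[m], monthly[m]) for m in shared]

for month, n_accidents, other_value in align_monthly(accident_dates, monthly_other):
    print(month, n_accidents, other_value)
```

This lines the sources up, but it does not fix the deeper problem: a match between two monthly series still says nothing on its own about what is driving the outcome.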
PE: OK. I know we’ll be looking at other bits of information around crime and health, but let’s stick with the Accidents data for now. Is there anything we could do to take it further than just a one-day event?
AT: We think it will be useful to upload the data used on the day onto a public server, where the pupils can continue to play with it and share it with their family and friends. Having data on a public server means people can share a view. Sharing a visualisation helps to create a shared view of a situation, draws members of the community together and aligns thinking on how we can improve the things that matter most to us.
PE: And can councils and councillors get anything from this?
AT: In an ideal world, the pupils would get this data together and come out with recommendations that could be taken to a council meeting. In reality, that probably won’t happen yet. What we’re doing here, for now, is stimulating thinking. It should be of interest to councillors that school pupils can draw interesting conclusions from data. It shows councils how they can raise their game. It also illustrates the need to collect, store and share better data.
It will also result in an instructive process where we all understand statistics a little better. The dialogue between people crunching data and people who have to make decisions on it will always be interesting. We all have to learn various golden rules, such as the fact that correlation doesn’t imply causation, or how common biases skew findings and mislead us all.
But we need to make a start – hopefully we’ll know a lot more once we’ve spent a day mashing data around with a group of school pupils.