NetIKX meeting about Open Data (14th May 2015)

Open Data

Write-up by Conrad Taylor (rapporteur)

The May 2015 meeting of the Network for Information and Knowledge
Exchange took Open Data as its subject. There was a smaller crowd than
usual, about 15 people in total, and rather than splitting into table
discussion groups halfway, as NetIKX meetings usually do, we kept
together for one extended conversation of about two and a half hours,
with Q&A sprinkled throughout.

The meeting had been organised by Steve Dale, who also chaired. As
first speaker he had invited Mark Braggins, whom he had met while
working on the Local Government Association’s Knowledge Hub project.
Mark was on the advisory board for that project, having previously
played a role in setting up the Hampshire Hub, which he later
described. Mark’s co-speaker was Steven Flower, an independent
consultant in Open Data applications with experience of the technical
implementations of Open Data.

The slide sets for the meeting can be found HERE and HERE

Introducing Open Data

Mark started with an introduction to and definition of Open Data:
briefly put, it is data that has been made publicly available in a
suitable form, and with permissions, so that anyone can freely use it
and re-use it, for any purpose, at no cost; the only constraint is
that usually the licence requires the user to acknowledge and
attribute the source of the data.

Tim Berners-Lee, the inventor of the Web, has suggested a five-star
categorisation scheme for Open Data, which is explained at the Web
site http://5stardata.info. The lowest level, deserving one star, is
when you make your data available on the Web in any format, under an
open licence; it could be in a PDF document for example, or even a Web
graphic, and it needs a human to interpret it. You can claim two stars
if the data is put online in a structured form, for example in an
Excel spreadsheet, and three stars if the data is structured but also
in a non-proprietary format, such as a CSV file.

If your data is structured as a bunch of entities, each of which has a
Uniform Resource Identifier (URI) so it can be independently referenced
over the Internet, you deserve a fourth star, and if you then link
your data to other data sources, creating Open Linked Data, you can
claim that fifth star.
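
As a rough illustration (not something shown at the meeting), the five-star ladder can be expressed as a cumulative checklist, where each star is earned only if all the earlier ones have been. The `Dataset` class and its field names below are hypothetical, invented purely for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative names)."""
    open_licence: bool          # on the Web under an open licence
    structured: bool            # machine-readable structure, e.g. a spreadsheet
    non_proprietary: bool       # open format such as CSV
    uses_uris: bool             # entities are identified by URIs
    linked_to_other_data: bool  # links out to other data sources


def star_rating(ds: Dataset) -> int:
    """Return 0 to 5 stars; each star requires all the previous ones."""
    criteria = [
        ds.open_licence,
        ds.structured,
        ds.non_proprietary,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


pdf_report = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(pdf_report), star_rating(csv_file))
```

Under these assumptions a PDF under an open licence scores one star and an openly licensed CSV file scores three, matching the levels described above.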

Steven Flower came in at this point and explained that his experience
is that most organisations who are going down the Open Data road are
at the three-star level. Moving further along requires some technical
knowledge, expertise and resource.

Mark displayed for us a list of organisations which publish Open Data,
and some of them are really large. The World Bank is a leader in this
(see http://data.worldbank.org). The UK Government’s datastore is at
http://data.gov.uk, and that references data published by a wide
variety of organisations. Some local government authorities publish
open data: London has the London Data Store, Bristol has Open Data
Bristol, Manchester has Open GM, Birmingham has its Data Factory,
Hampshire has its Hub and the London Borough of Lambeth has Lambeth in
Numbers. A good ‘five star’ example is the Open Data Communities
resource of the Department for Communities and Local Government
(DCLG).

Asked how the notion of Open Data relates to copyright, Mark explained
that essential to data having Open status is that it must be made
available with a licence which explicitly bestows those re-use rights.
In the UK, for government data, the relevant licence is usually the
Open Government Licence, OGL. Other organisations often use a Creative
Commons licence.

A key driver for national government departments, and local government
too, to get involved in the world of Open Data is that these bodies
now operate under an obligation to be transparent; for example, local
government bodies are obliged to publish data about each item of
expenditure over £500 and about contracts they have awarded. Similar
requirements are driving Open Data also in the USA.
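
To give a feel for what such a spending dataset allows, here is a minimal sketch that picks out items over £500 from a CSV file. The sample rows and column names are made up for illustration and do not follow any real council's schema; treating "over £500" as strictly greater than 500 is also an assumption.

```python
import csv
import io
from decimal import Decimal

THRESHOLD = Decimal("500")

# Made-up sample rows; the column names are assumptions, not a real schema.
SAMPLE_CSV = """supplier,amount_gbp,purpose
Acme Paving Ltd,1250.00,Road repairs
Corner Stationers,42.10,Office supplies
Bright Sparks Electrical,500.01,Street lighting
"""


def items_over_threshold(csv_text, threshold=THRESHOLD):
    """Return the rows whose amount is strictly greater than the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if Decimal(row["amount_gbp"]) > threshold]


reportable = items_over_threshold(SAMPLE_CSV)
for row in reportable:
    print(row["supplier"], row["amount_gbp"])
```

Using `Decimal` rather than floating point avoids rounding surprises when comparing money amounts close to the threshold.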

Mark recommended that anyone with an interest in Open Data must take a
look at the Web site of Owen Boswarva (http://www.owenboswarva.com/)
which has a number of demo examples of maps driven by Open Data.

I asked if the speakers knew of examples of Open Data in the sciences.
Steven said that in the Open Data Manchester meetings which he
organises, they had had an interesting talk from a Manchester
academic, Prof Carole Goble, about experimental data being made
available ‘openly’ in support of a transparent scientific process and
reproducibility. (Open Data is one of the ‘legs’ of a wider project
dubbed Open Science, in which one publishes not only conclusions but
also experimental data.)

Some Examples

Mark then illustrated how Open Data works and what benefits it can
bring with a number of examples. His first example was a map
visualisation based on a variety of data sources, which was
constructed for the Mayor of Trafford to discover where it would make
most sense to locate some extra defibrillators in the community. A mix
of data sources, some open and some closed, were tapped to inform the
decision making: population densities, venue footfall, age profiles,
venues with first-aiders on the staff, ambulance call-out data.
Happily, there is now real proof that the defibrillators which were
distributed by means of this evidence have been used and have
undoubtedly saved lives.

Mark’s second example was from Hampshire. A company called Nquiring
Minds has taken Open Data from a variety of sources, and used a
utility to construct an interactive map illustrating predictions of
the pressure from users on GP surgeries, looking forward year by year
to the year 2020. You can see this map and interact with it by going
to http://gpmap.ubiapps.com. This kind of application can inform
public policy debates and planning. (Incidentally, that project was
funded by Innovate UK, the former Technology Strategy Board.)

Steven described another health-and-location-data project which
gathered together statistics about the prescribing of statin
medication by GPs, in particular looking at where big-name-pharma
branded statins were being prescribed, at some considerable expense to
the NHS and public purse, compared to surgeries where cheaper generic
statins were being prescribed.

Another example was about a database system whereby universities can
share specialised scientific equipment with each other. (Take a look
at this at http://equipment.data.ac.uk.) The project was funded by
EPSRC. Of course it was important that the participating institutions
agreed a standard for which data should be shared (what fields, what
units of measure etc); and establishing this standard was perhaps the
most time-consuming part of getting the scheme going.

A very attractive example shown was the global Wind Map created by
Cameron Beccario. Using publicly available earth observation and
meteorological data from a variety of sources, and using a JavaScript
library called D3 (http://d3js.org), he has constructed a Web site
that in its initial appearance shows an animated display of wind
direction and velocity for everywhere in the world.

The Wind Map site is at http://earth.nullschool.net and it is worth
looking at. You can spin the globe around and if you sample a point on
the surface you will get a readout of latitude, longitude, wind speed
and direction. In fact it’s more than a wind map because you can also
switch to view other forms of data such as ocean currents, temperature
and humidity for either atmosphere or ocean, and you can change from
the default rotatable globe to a variety of map projections. While you
are there you can see what data source is being accessed for each
display. (Thanks to a minimalist interface design this is not at all
obvious but if you click on the word ‘earth’ a control menu will
reveal itself.)

About the Hampshire Hub

Mark then turned to the example closest to his heart, the Hampshire
Hub (http://www.hampshirehub.net). This is a collaboration between 22
partnering organisations, including the County Council, three unitary
authorities and 11 district councils, the Ordnance Survey, the police
and fire services, the Department for Communities and Local
Government, the British Army and two national parks. A lot of the data
is ‘five-star’ quality. That’s not true of all the sources, for
example Portsmouth health data is posted as Excel spreadsheets, but
there is an ongoing process over time to try to turn as much as is
practical into Linked Open Data.

Together with these data components, the Hub site also hosts articles,
news, tweets etc. and other kinds of ‘softer’ and contextual
information.

The site gives access to Area Profiles, which are automatically
generated from the data, with charts from which one can drill down to
the original datasets.

Building new projects around the Hampshire Hub

Around these original data resources, new initiatives are popping up
to both build on the datasets and contribute back to the ecosystem
with an open licence. An example which Mark displayed is the Flood
Event Model, which combines historical data with current environmental
conditions to attempt predictions of which places might be most at
risk from a severe weather event. It has taken quite a bit of effort
to use historical data to elicit hypotheses about chains of causation,
then re-test those against other more recent datasets, and as new
datasets are linked in, such as those from the Met Office and the
Environment Agency, a really useful tool can emerge. And this could
have very practical benefits for advising the public and planning
emergency response.

Another example project is Landscape Watch Hampshire, which is sharing
a rather complex type of data: aerial photography from 2005 and 2013.
To compare these with the aim of documenting changes to the landscape
and its use really requires humans, so the plan of Hampshire’s
collaboration with the University of Portsmouth and Remote Sensing
Applications Consultants Ltd is to crowdsource analytical input from
the public. This is explained in more detail at
http://www.hampshirehub.net/initiatives/crowdsourcing-landscape-change.

Another initiative around the Hampshire Hub is to link open data
around planning applications. There is a major problem in keeping tabs
on planning applications because they are lodged with many different
planning authorities and in ways that don’t inter-operate, and if your
area is on the border between planning jurisdictions, it’s all the
more problematic. And this despite the fact that for example the
building of a hypermarket or the siting of a bypass has effects which
radiate for dozens of miles around.

Hampshire has taken the lead together with neighbour Surrey but also
with Manchester to determine what might be an appropriate standard
form in which planning data could be exported to be shared as Open
Data. The Open Data User Group (a ministerial advisory body to the
Cabinet Office) is building on this work.

Mark finished his part of the presentation by mentioning the Open
Data Camp event in Winchester, 21–22 February 2015 (including Open
Data Day). Over 100 people attended this first UK conference on Open
Data which he described as ‘super fun’. See http://www.odcamp.org.uk
where there are many stories and blogposts about the event and the
varied interests of participants in Open Data. Similar ‘camps’ coming
up are BlueLightCamp (Birmingham, 6–7 June), LocalGovCamp, UKGovCamp
and GovCamp Cymru.

The Open Data Ecosystem

Steven Flower spoke for a shorter time, and structured his talk around
the past, the present and our future relationship to Data; slightly
separately, the business of being Open; and the whole ecosystem that
Open Data works within. Steven is an independent consultant helping
organisations with implementing Open Data, many of them
non-governmental aid organisations, and indeed that morning he had had
a meeting with WaterAid.

When Steven has conversations with groups in organisations about their
data and how and why they want to make it ‘open’, often various kinds
of diagram are resorted to in order to articulate their ideas. Mostly
these are ‘network diagrams’, boxes and arrows, blobs joined by lines,
entities and their interrelationships. Sometimes the connected blobs
are machines, sometimes they are data entities, sometimes they are
people or departments.

In New York City their diagrams (which are quite pretty) show
‘islands’ or more properly floating clusters of datasets, mostly
tightly gathered around city functions such as transport, education,
the police; with other datasets being tethered ‘offshore’ and some
tied up to two or three such clusters. Some diagramming focuses more
on the drivers behind Open Data projects, and the kinds of people they
are meant to serve.

Past and present attitudes to data

Steven took us back with a few slides to the days when there were few
computers, and later when there were just 25 million Web pages. Now
the Internet has mushroomed almost beyond comprehension of its scope.
Perhaps when it comes to Open Data, we are still in ‘the early days’.
And data is something we are just not comfortable with. We struggle to
manage it, to exercise governance and security over it. We can’t
easily get our heads around it, it seems so abstract.

It seems that big companies get data, and they get Big Data. Tesco
aggregates, segregates and analyses its customers’ shopping
preferences through its Clubcard scheme. Spotify has been buying
companies such as The Echo Nest, whose software can help them analyse
what customers listen to, and might want to listen to next.

More and more people carry a ‘smartphone’, and these are not only the
means to access data: we continuously leak data whether deliberately
in the form of tweets, or unconsciously as GPS notes where we are.
People increasingly wear devices such as the FitBit which monitor our
personal health data. Sensors in people’s homes, smoke alarms and
security cameras for example, are being hooked up to the Internet
(‘the Internet of Things’) so they can communicate with us when we
are out.

People and organisations also worry about the ‘Open’ bit. Does it mean
we lose control of our personal data profile? If there is an
uncomfortableness about Open Data, why might we want to do this?

As an object to think about Steven offered us Mount Everest. It is
8,850 metres high (that is data). Digging deeper, Everest has a rich
set of attributes. Under the surface there is a complex geology, and
it is being pushed up by tectonic plate movements. Recently, one of
those movements resulted in an earthquake and thousands of deaths. To
help in this situation have come a number of volunteers from the
OpenStreetMap community, who are working collaboratively to fill in the
gaps in mapping, something which can greatly benefit interventions in
disasters (the same community helped out during the Haiti earthquake).
Given a defined framework for location data, the task can be split up
into jobs, farmed out to volunteer teams and collected in a database.
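
The idea of splitting a mapping task into jobs can be sketched as cutting a bounding box into a grid of tiles and handing each tile to a team. This is only an illustration of the principle: the coordinates, team names and round-robin allocation are invented, and it is not a description of how the OpenStreetMap volunteers actually organise their work.

```python
def split_into_tiles(west, south, east, north, columns, rows):
    """Split a bounding box into columns x rows equal tiles.

    Each tile is returned as a (west, south, east, north) tuple.
    """
    width = (east - west) / columns
    height = (north - south) / rows
    tiles = []
    for row in range(rows):
        for col in range(columns):
            tiles.append((
                west + col * width,
                south + row * height,
                west + (col + 1) * width,
                south + (row + 1) * height,
            ))
    return tiles


# Illustrative bounding box in degrees, not a real mapping task area.
tiles = split_into_tiles(84.0, 27.0, 86.0, 28.0, columns=4, rows=2)

# Hypothetical round-robin allocation of tiles to three volunteer teams.
assignments = {index: f"team-{index % 3 + 1}" for index, _ in enumerate(tiles)}
print(len(tiles), assignments)
```

Because every tile has a well-defined extent, completed work from each team can be collected back into one database without overlaps.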

Turning to his personal experience of Open Data projects, Steven gave
us a view of ‘past, present and future’. Around 2007, he was involved
in setting up a service called ‘Plings’ (places to go, things to do)
which merged data from various sources into a system so that young
people could find activities.

Moving to the present, Steven touched on a number of Open Data
projects with which he is involved. WaterAid is one of a number of
charities which believes strongly in transparency about how it works,
how it spends funds and so on. They are involved in a project called
the International Aid Transparency Initiative, which is building a
standard for sharing data about aid programmes. He showed us a
screenful of data about a WaterAid project in Nepal, structured in
XML.
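
For readers who have not seen this kind of data, the sketch below reads a tiny activity record with Python's standard XML library. The fragment is invented and heavily simplified: it borrows a few element names from the IATI activity standard, but the identifier, title and overall content are made up and are not WaterAid's data.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified fragment: a real IATI file carries many more
# elements and attributes, and the identifier and title here are made up.
SAMPLE_XML = """
<iati-activities>
  <iati-activity>
    <iati-identifier>EXAMPLE-ORG-NP-001</iati-identifier>
    <title><narrative>Example rural water supply project</narrative></title>
    <recipient-country code="NP"/>
  </iati-activity>
</iati-activities>
"""


def summarise_activities(xml_text):
    """Return (identifier, title, country code) for each activity element."""
    root = ET.fromstring(xml_text.strip())
    summaries = []
    for activity in root.findall("iati-activity"):
        identifier = activity.findtext("iati-identifier", default="")
        title = activity.findtext("title/narrative", default="")
        country = activity.find("recipient-country")
        code = country.get("code") if country is not None else None
        summaries.append((identifier, title, code))
    return summaries


print(summarise_activities(SAMPLE_XML))
```

The point of a shared standard is that the same few lines would work on any publisher's file that follows the same structure.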

Published as Open Data, this information can then be accessed and used
in a number of ways. Some of this is internal to the organisation so
they can make better sense of what they are doing, but because it is
Open Data it can also be visualised externally, and Steven showed us
some screens from the Development Portal (http://d-portal.org),
displaying IATI data sourced from WaterAid.

As data on situations requiring overseas aid and disaster
interventions is shared more transparently, it should become more
possible to use data as the basis of better aid delivery decisions.

The future generation, the future view

And what of the future? Well, in Steven’s immediate future, just a few
days ahead, he said he would be working with a group of schoolchildren
in Stockport, who have ‘Smart Citizen’ miniature computers in their
school, based on an Arduino board with an ‘Urban Shield’ board
attached. The sensors on this apparatus harvest data about
temperature, humidity, nitrous oxide and carbon monoxide emissions,
light and noise, and their data is uploaded and shared on the Smart
Citizen Web site (https://smartcitizen.me/). Thus the school is
joining a worldwide project with over a thousand deployed devices.

I personally know of a very similar project in South Lambeth, where a
school is collaborating with a school science project called Loop Labs
founded by Nicky Gavron, a member of the London Assembly. The Lambeth
project differs in a number of respects: it is focused only on the air
quality data, but rather than having a single sensor in the classroom
they have a number of ‘air quality egg’ sensors deployed to the
children’s homes to get local comparison data. Also, the Loop Labs
project is strongly linked to environmental health issues.

Where the Salford schools project looks like being really
adventurously pioneering is in the ways they aim to use the data
harvest locally. Spreadsheets are one option, but how cool are they?
Steven hopes that the data can be sucked into the Minecraft environment
for building virtual items assembled from blocks, and maybe data
patterns can be made audible by transforming the values captured into
the code that programs the Sonic Pi music synthesiser software for
Raspberry Pi, Mac OS X and Windows (Google that, download the code and
play with it — great fun!).

The schoolkids have also used 3D printing to create a small fleet of
Cannybots — or rather, as I understand it, what you 3D print is a kind
of Airfix kit for the casing and then you install a small circuit
board with an ARM processor on it, plus Bluetooth communication for
control (see http://cannybots.com/overview.html). The next step will
be to use the data from their Citizen Science sensor to drive patterns
of behaviour in the mini robots. There is no guarantee that this will
work, but that is the nature of the future!

What does an Ecosystem need?

To be healthy, any ecosystem for Open Data absolutely requires
standards, and the IATI standard is an example of that in action.
Steven is also working on a standard for a project called 360 Giving
(http://threesixtygiving.org/), which is a movement that encourages UK
grant makers and philanthropists to publish their grant information
online. It is a small beginning but currently 14 grant makers are
participating.

Steven’s also working on the international Open Contracting data
standard (http://standard.open-contracting.org/), where the Open Data
publishing pioneers are the Colombian and Mexican governments. With an
estimated US$9.5 trillion being spent every year on government
contracts, transparency in this area could help to hem in and expose
corruption and mismanagement.

A healthy Open Data ecosystem also requires feedback loops, and for
Open Data these should be super-responsive. The Citizens Advice Bureau
in the UK does this well; Steven showed us one of their Web pages
which in near real time shows which are the main topics people are
searching on from the CAB.

Finally, a healthy Open Data ecosystem needs the engagement of people.
At the moment, the Open Data system does tend to look like this: young
men, at a weekend, gathered around laptops and with pizza. What the
scene needs is greater diversity. That is one reason that Steven is
involved in the Manchester CoderDojo project, which monthly gathers
large numbers of young people to engage with coding and data. You can
find out more about that here: http://mcrcoderdojo.org.uk/ Five years
ago, the average participant was age 15; now, it is age 10!

In discussion: where does the data come from?

One of our number (Stuart) said that in the presentations, almost all
of the ‘supply’ side had come from the public sector (or NGOs). What
was the private sector contributing? Mark said that this is generally
true; but the UK Department for International Development (DfID) now
require all of their international contractors to publish IATI open
data about their contract work: the contracting firm Capita is a case
in point.

The Open Contracting programme is also interesting here, and the
biggest driver in that is the World Bank, plus very large contracting
organisations such as IBM who think they will benefit from
transparency.

In discussion: Open and Big Data

Graham Robertson wondered what intersect there was between Open Data
and Big Data. Steven said that personally he works with organisations
publishing small datasets, but through their openness they do connect
into something larger. Steve Dale thought that ‘Big Data’ has a
meaning that is hard to pin down; one organisation’s Big might be
another’s Small.

Graham wondered if the huge volumes of data now processed in weather
forecasting provide an example of Big Data. I think they do, and even
larger volumes are processed in the supercomputers of the Hadley
Centre for Climate Change which is co-located in Exeter. Developing,
testing and running climate change prediction models also benefits
increasingly from Open Data, as nearly 90 countries and as many
international organisations now share sensor data and observations
from platforms from the ocean bed to outer space, through GEOSS (the
Global Earth Observation System of Systems).

In discussion: privacy issues

We did along the way have a discussion about the interrelation between
the openness of data, and the privacy of individuals on whom that data
is based: health data being a particularly sensitive case. To what
extent can data really be anonymised, when location information is
also implicated, or anything else which might identify a data subject?
I described some recent voluntary work for Tollgate Lodge Health
Centre in Stoke Newington, in which I examined demographics data from
the Census about family size and age profiles of the population in
Hackney. At a Borough level, age data can be obtained at single-year
granularity; at Ward level, the population is aggregated into age
bands; and the bands are even coarser and more aggregated if
geographically you go down to Super Output Areas.
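
The coarsening described here can be sketched as summing single-year counts into wider and wider bands. The figures below are invented and the band widths are arbitrary; they are not the bands the Census actually uses at Ward or Super Output Area level.

```python
from collections import Counter

# Invented single-year counts: age -> number of residents.
SINGLE_YEAR_COUNTS = {0: 12, 3: 9, 7: 15, 16: 8, 34: 20, 67: 5, 91: 1}


def aggregate_into_bands(counts, band_width):
    """Sum single-year counts into bands, keyed by the band's starting age."""
    bands = Counter()
    for age, people in counts.items():
        band_start = (age // band_width) * band_width
        bands[band_start] += people
    return dict(sorted(bands.items()))


five_year = aggregate_into_bands(SINGLE_YEAR_COUNTS, 5)
twenty_year = aggregate_into_bands(SINGLE_YEAR_COUNTS, 20)
print(five_year)
print(twenty_year)
```

In this made-up data the single 91-year-old stands out at single-year granularity but is folded into a wider band once the ages are aggregated, which is the privacy rationale for publishing coarser bands for smaller areas.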

Steve came up with an example of how Open Data about taxi journeys in
New York had been collated by someone with tweets by celebrities about
their taxi journeys, leading to the possibility of figuring out who had
made what journeys. It’s worth remembering that potentially we are
shedding flakes of data all the time, like dandruff.

In discussion: a skill gap?

From his experience of the data sector in Manchester, Steven said that
business complains that they can’t find people with the relevant
skills, and why can’t the schools address this better? But on the other
hand, industry is also unclear about what skills it will need in five
years’ time. Perhaps we would do well to think about ‘data literacy’,
and he feels pretty sure that the kids with whom he was about to start
working might not know how to interpret a graph or a data map.

Steve Dale referred back to the previous NetIKX meeting (see blog post
here: https://netikx.wordpress.com/2015/03/23/seek-and-you-will-find-wednesday-18th-march-2015/)
and observations made by the speaker Tony Hirst. Tony has said that
people these days tend to trust too much in algorithms without asking
what they do or who made them; from that it follows that excessive trust
is placed in the output of those algorithms. The pre-election
predictions of the pollster algorithms certainly went wrong!

Stuart Ward (Forward Consulting) thought that driving Open Data, and
gaining organisational advantage from it, requires information
officers to be encouraged to be pro-active in contacting their peers
in other organisations, actively looking for opportunities to
collaborate (as indeed does seem to have been happening in local
government).

Steven reported on an interesting collaboration between WaterAid and
JPMorgan. The latter has an undergraduate programme to find the best
talent and recruit them, and they set these people to work on
WaterAid’s open data, e.g. to produce visualisations; thus they could
spot the best people to employ in their business.

In discussion: miscellaneous thoughts

As for me, I wondered if too great a concentration on the value of
data in research might have the unfortunate effect of further
sidelining qualitative forms of enquiry in the social sciences and in
knowledge management. Both forms of enquiry have their value, and it
is interesting to note approaches which link narrative enquiry to
big-data scale, as is possible using Cognitive Edge’s SenseMaker
software and methods (see the explanation by Dave Snowden at
http://www.sensemaker-suite.com/).

How well is the UK placed in a ‘league table’ of countries doing Open
Data? Our speakers thought pretty much at the top, followed by the
USA; Steve Dale thought France had claimed the lead. The 2014 index
maintained by the Open Knowledge Foundation, for what it is worth,
ranks the UK first at ‘97% open’, France third at 80%, and the USA in
eighth place at 70% open (see http://index.okfn.org/place/).

In Britain there has been an interesting history of struggle over
barriers that haven’t been present in the USA: for example, their
postcode data has always been open-access, whereas it took a bit of a
battle to get Royal Mail to open it up. People who want to know more
about the history and status of address data in the UK would do well
to read the report and listen to the MP3 podcasts from the recent
‘Address Day’ conference organised by the BCS Location Information
Specialist Group.

Twitter

A certain amount of tweeting was done through the afternoon — the
hashtag was #netikx73
