Event report: From data and information to knowledge: the Web of tomorrow – a talk by Dr Serge Abiteboul

Some notes taken at the Milner Award Lecture by Dr Serge Abiteboul for the Royal Society on 12th November, From data and information to knowledge: the Web of tomorrow. Dr Abiteboul was awarded the 2013 Milner Award, given annually for outstanding achievement in computer science by a European researcher.

Serge Abiteboul

Dr Abiteboul’s research work focuses mainly on data, information and knowledge management, particularly on the Web. Like NetIKX members, he is interested in the transition from data to knowledge. Among many prestigious projects, he has worked on Apple’s Siri interface and Active XML, a declarative framework that harnesses web services for data integration.

In a charming French accent, he explained to us that he was going to talk about networks – networks of machines (the Internet), of content (the Web) and of people (social media).

Nowadays information is everywhere, worldwide. Everything is big and getting bigger – the size of the digital world is estimated to be doubling every 18 months. A web search engine is now a cluster of machines – maybe a million machines. In the past, getting ten machines to work together was a big challenge! Engineering achievements have enabled hundreds of thousands of computers to work together.

Dr Abiteboul’s assumptions

1. The size will continue to grow
2. The information will continue to be managed by many systems (rather than a company like Facebook taking over all the world’s information).
3. These systems will be intelligent – in the sense that they produce/consume knowledge and not simply raw data.

The 4 + 1 V’s of Big Data…

Volume, Velocity, Variety and Veracity are the four difficulties of big data. There is a huge mass of data – more than can ever be retrieved – and it is changing fast, particularly data sets like the stock market. Furthermore, information on the web is uncertain, full of imprecision and contradiction. Search engines must contend with lies and opinions, not just facts.

Dr Abiteboul’s +1 is Value – the bottom line is, what value comes from all this data? How does a computer decide what is important to present?

Data analysis is a technical challenge as old as computer science. We know how to do it with a small amount of data; the next challenge is to do it with a huge amount. Complex algorithms will have to be designed. These will need to rely on low-level statistical analysis, because computing perfect statistics would take too long. Maths, informatics, engineering and hardware are all needed.

But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die. (Genesis 2.17)

People often prefer being given one answer rather than a multitude of options to sort through. When we ask another person a question, they don’t reply by giving us twenty pages to read through, so why should we interact with machines (search engines) like that? (Note – should information professionals be more selective with the information we put forward to customers? Would they prefer a reading list of five books rather than twenty?)

Machines prefer formatted knowledge, logical statements. Machines can be programmed to find patterns – e.g. Woody Allen ‘is married to’ Soon-Yi Previn. But people write that two people are married in many different ways. How does a search engine cope with all the false statements and contradictions, e.g. ‘Elvis Presley died on 16 August 1977’ and ‘The King is alive’!
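As a toy illustration of the kind of pattern-finding described above (not Dr Abiteboul’s own method – the sentences and the regular expression here are invented for this sketch), a programme can turn free text into formatted ‘X is married to Y’ statements:

```python
import re

# Hypothetical sketch: extract "X is married to Y" facts from free text.
# Real systems must also handle the many other phrasings people use,
# and the false statements and contradictions mentioned above.
pattern = re.compile(
    r"(?P<a>[A-Z][\w-]+(?: [A-Z][\w-]+)*) is married to "
    r"(?P<b>[A-Z][\w-]+(?: [A-Z][\w-]+)*)"
)

sentences = [
    "Woody Allen is married to Soon-Yi Previn.",
    "It rained all day in Paris.",
]

facts = []
for s in sentences:
    m = pattern.search(s)
    if m:
        # Store the fact as a formatted, machine-friendly triple.
        facts.append((m.group("a"), "is married to", m.group("b")))

print(facts)  # [('Woody Allen', 'is married to', 'Soon-Yi Previn')]
```

Only the first sentence yields a triple; the second contains no matching pattern, which is exactly the filtering a fact-extraction system performs.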

The real problem with the accuracy of Wikipedia is not incorrect amateurs but paid professionals with their own agenda, paid by companies to take a particular viewpoint.

The difficulty is knowing when to stop searching – when we have found just enough right answers. Precision, the fraction of retrieved results that are correct, must be balanced against recall, the fraction of correct results that are actually retrieved. There is a trade-off between finding more knowledge and finding the correct knowledge. Machines will have to be programmed to separate the wheat from the chaff. Knowing the good sources, the trustable sources, is a huge advantage for this.
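The precision/recall trade-off can be made concrete with a small sketch (the document sets here are invented for illustration):

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are actually correct."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of the correct results that were actually found."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

relevant = {"doc1", "doc2", "doc3", "doc4"}

# A cautious search stops early: everything it returns is right, but it misses half.
cautious = {"doc1", "doc2"}
print(precision(cautious, relevant), recall(cautious, relevant))  # 1.0 0.5

# A greedy search keeps going: it finds everything, but drags in wrong answers too.
greedy = {"doc1", "doc2", "doc3", "doc4", "junk1", "junk2", "junk3", "junk4"}
print(precision(greedy, relevant), recall(greedy, relevant))  # 0.5 1.0
```

Stopping early maximises precision at the cost of recall; searching exhaustively does the reverse – which is why knowing the trustable sources matters so much.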

Serendipity

Next, Dr Abiteboul mentioned librarians! He praised the way that a librarian may suggest you read an article that transforms your research. Or you may hear by chance a song that totally obsesses you. Computers lack this serendipity – they’re square. Information professionals take heart: there is value in chance, in browsing shelves, in the ability of your brain to make suggestions computers wouldn’t.

Hyperamnesia

We cannot archive all the data we produce – there’s a lack of storage resources. How do we choose what we keep? The British Library is tackling this question through its UK Web Archive project, which involves archiving 4.8 million UK websites and one billion web pages.

The BL Web Archiving page says: “We are developing a new crowd sourcing application that will use Twitter to support an automated selection process. We envisage that in the future, automated selection of this sort will complement manual selection by subject experts, resulting in a more representative and well-rounded set of collections.” So perhaps the web of the future will need both expert people and smart computing systems.

The decisions of machines

Decisions are increasingly made by machines – for instance, by automated transport systems like the Docklands Light Railway, or by automated trading on the stock market. How far do we go with this, asked Dr Abiteboul. Would a machine be allowed to decide that someone is a terrorist and kill them, and if so at what level of certainty? At 90% sure? At 95% sure?

Soon machines will acquire knowledge for us, remember knowledge for us, reason for us. We should get prepared by learning informatics, so that we understand them.

There were so many ideas flying about that I was unable to note them all down! Luckily the whole lecture is freely available to watch at www.youtube.com/watch?v=to9_Xc9f96E.

Blog post by Emily Heath.

Event report: Knowledge organisation – past, present and future

Our latest NetIKX event on 26th November was all about information and knowledge management within organisations. We took a look at how IKM has evolved and where it’s likely to go next. Our speakers were Dr David Skyrme, Analyst and Management Consultant at David Skyrme Associates, and Danny Budzak, Senior Information Manager at the London Legacy Development Corporation.

Dr David Skyrme
In his presentation ‘The 7 Ages of IKM in Organizations’ David talked us through the development of information and knowledge management over the past few decades.

Dr David J. Skyrme

David sees capturing the most important information as being a vital part of knowledge management. Communities are essential for developing tacit knowledge, through people talking to other people and sharing their knowledge. Work organisations are really social places, about human relationships and people.

In the beginning knowledge was passed down verbally – the Icelandic sagas are a good example. Storytelling has come back into popularity as a tool for knowledge managers to bring knowledge management to life.

1995-1997
David compared the development of IKM to Shakespeare’s seven ages of man. He dates the formal emergence of knowledge management as a topic discussed in boardrooms by senior managers to 1995-1997, when Nonaka & Takeuchi’s seminal book ‘The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation’ was published. This highlighted many advantages of knowledge management for organisations, particularly from a research and development point of view.

  • 1997-1998 – IKM still adolescent, growing up, coming of age
  • 1998-2002 – Segmentation/consolidation
  • 2003-2005 – Re-evaluation and re-definition. Reaching middle age, maturity
  • 2005-2012 – Social media emerges as ‘grass-roots’ IKM. Wikis are increasingly used as a good way of harvesting information.
  • 2013 – Big Data & Analytics. What do you do with it all? Are we overlooking the human element, getting carried away with IT?

“The true success of knowledge management is when it disappears” – that’s when it becomes part and parcel of working life.

The American Productivity & Quality Center’s website APQC.org has lots of survey data on knowledge management through the years. The challenges between 1997 and 2013 have consistently been achieving knowledge capture and reuse, but there are new challenges now – social media, visualisation, gamification, co-creation with customers. Still, David feels we shouldn’t necessarily prefer the snazzy new over the proven old; there are a lot of solid knowledge management techniques already out there.

Danny Budzak
Next Danny talked us through how he is developing data, information and knowledge management at the London Legacy Development Corporation (LLDC), in his role as Senior Information Manager.

Danny Budzak

The LLDC has a big job to do in its role to regenerate the Olympic Park and surrounding area. There are lots of policies at local and national level to comply with, and complex financial data that must be published in an annual report.

Danny feels that information professionals can benefit from linking good data quality to risk management. For example, LLDC health and safety files need to be well maintained to avoid fines and keep employees safe.

At a conference he attended, Danny heard an analogy that chief executives are like nursery school children – they like simple things and primary colours! Use bright graphics, try and capture your organisation’s knowledge pictorially. Big paper maps on the wall can be a good way of capturing and displaying information in an easy way to see, while mind maps can be a fast and effective way of taking notes at a conference.

Email encapsulates a lot of knowledge – but unfortunately accounts are set up individually, so the information they contain is hard for others to access. To try to overcome this, LLDC has set up a collaborative environment for employees to communicate within. The organisation will now not fall over if key people leave.

To make document management more efficient, Danny has introduced document control templates and version control so that documents are numbered properly – e.g. v0.1 for a first draft, v0.2 for a second draft. There have been issues with some people renaming documents in their own way, but most people are using the new system.

In Danny’s opinion, information professionals are too timid. Nothing should be too complicated or complex for us – it should all be knowable. We should get very involved in our organisations. Go to meetings you’re not invited to, offer training sessions without being asked.

Once others trust you, they will share dark secrets and opinions, like ‘If I buy my own laptop, it won’t be subject to freedom of information‘.

Make data visible – say how many files you have in total and how many of these are duplicates. People understand concrete numbers and will appreciate how much it’s costing them.

Some really good ideas to take in here. Finally, I think it was Danny who mentioned this delightful Dilbert strip, Knowledge worker!

Further reading: Our Storify collection of tweets from this event.

Blog post by Emily Heath. Many thanks to both our speakers.