November 2019 Seminar: Useful evidence: getting research evidence valued and used.

Summary

At this meeting Jonathan Breckon of the Alliance for Useful Evidence explained the role of his organisation. It is a network hosted by the UK’s innovation charity Nesta, which designs, tests and scales new solutions to social problems. Both organisations champion the smarter use of evidence in social policy and practice, through advocacy, convening events, sharing ideas and resources, and supporting individuals and organisations with advice and training. The work is promoted through an open-access network of more than 4,300 individuals from across government, universities, charities, businesses and local authorities in the UK and internationally, which anyone can join for free. He then described the COM-B framework for encouraging research use, developed by researchers at UCL and adopted by a range of foundations and government agencies in the UK and overseas.

COM-B is a ‘behaviour system’ involving three essential conditions – Capability, Opportunity and Motivation – and it forms the hub of a ‘behaviour change wheel’ (BCW). Around the hub are positioned nine intervention functions aimed at addressing deficits in one or more of these conditions; around these are placed seven categories of policy that could enable those interventions to occur. The framework has been used in a number of applications, mainly health-related. We looked at ways to improve hearing-aid use in adult auditory rehabilitation. He also presented the factors that influence medication-taking behaviour, a tobacco control strategy, and the NICE Obesity Guidelines.

We learned that research evidence is just one factor that can influence decision-making at a policy and practice level. Various interventions have been developed to enhance and support the use of research evidence by decision-makers. Jonathan explained that, in order to decide which interventions are most effective, there had been a systematic review of research into research use, leading to these new methods for characterising and designing behaviour change interventions. He provided us with some practical tips on the best mechanisms for getting research evidence valued and used more generally. This then led into small group discussions, where we were given time to consider what was relevant to our own organisations and how we could make use of these insights within our own work programmes.

Speakers

Jonathan Breckon has been the Director of The Alliance for Useful Evidence since it was created at Nesta in 2012. Formerly Director of Policy and Public Affairs at the Arts and Humanities Research Council, he has had policy roles at the Royal Geographical Society, the British Academy, and Universities UK. He is a member of the Cabinet Office What Works Council and a Director of the Department for Education’s What Works for Children’s Social Care. He is a Visiting Professor at Strathclyde University and a Visiting Senior Research Fellow at King’s College London’s Policy Institute. His research and professional interests cover politics and psychology, the relationship between evidence and policy-making, and the role of professional bodies, such as medical Royal Colleges, in applying evidence-based practice. When not at work, or being ordered around by two young children, he wastes time protecting his family’s veg garden from slugs or finding excuses to go sailing in the Solent.

Time and Venue

2pm on 21st November 2019, The British Dental Association, 64 Wimpole Street, London W1G 8YS

Slides

No slides available for this presentation

Tweets

#netikx100

Blog

There is no blog report for this seminar.

Study Suggestions

Nesta Website: https://www.nesta.org.uk/

January 2019 Seminar: Making and sharing knowledge in communities: can wikis and related tools help?

Summary

At this meeting Andy Mabbett, a hugely experienced Wikipedia editor, gave an introduction to the background of Wikipedia and discussed many of the issues that it raises.
Accumulating, organising and sharing knowledge is never easy; this is the problem Knowledge Management sought to address. Today we hope networked electronic platforms can facilitate the process. They are never enough in themselves, because the issues are essentially human, to do with attitudes, social dynamics and work culture — but good tools certainly help.

In past seminars, NetIKX has looked at Microsoft SharePoint, but that is proprietary and commercial, and it does not serve wider communities of practice and interest well. In this seminar, we looked at a range of alternatives, some of them free of charge and/or open source, together with the social dynamics that make them succeed or fail.
First we looked at the wiki model. The case study was Wikipedia — famous, but poorly understood. Andy Mabbett presented this. Andy is a hugely experienced Wikipedia editor, who inspires respect and affection around the world for his ability to explain how Wikipedia works and for training novices in contributing content – including as a ‘Wikipedian in Residence’, encouraging scientific and cultural organisations to contribute their knowledge to Wikipedia.

A few stats: Wikipedia, the free online encyclopedia that anyone can in theory edit, has now survived 18 years, existing on donations and volunteering. It has accumulated over 40 million articles in 301 languages, and about 500 million visitors a month. The English edition has nearly 5.8 million articles. There are about 300,000 active contributors, of whom 4,000 make over a hundred edits annually.

Under the wider banner of ‘Wikimedia’, there are sister projects such as Wiktionary, Wikiversity, which hosts free learning materials, Wikidata, which is developing a large knowledge base, and the Wikimedia Commons, which holds copyright-free photos, audio and other multimedia resources.

And yet, as the Wikipedia article on Wikipedia admits, “Wikipedia has been criticized for exhibiting systemic bias, for presenting a mixture of ‘truths, half truths, and some falsehoods’, and for being subject to manipulation and spin in controversial topics.” This isn’t so surprising, because humans are involved. It’s a community that has had to struggle with issues of authority and quality control, partiality and sundry other pathologies. Andy provided insight into these problems, and explained how the Wikipedia community organises itself to define, defend and implement its values.

No NetIKX seminar would be complete without syndicate sessions, conducted in parallel table groups. For the second half of the afternoon, each group was presented in turn with tales from two further case studies of knowledge sharing using different platforms and operating under different rules. These endeavours might have used email lists, Google Docs, another kind of wiki software, or some other kind of groupware. There were tales of triumph, but of tribulation too.

At the end of the afternoon, polling everyone’s thoughts helped to identify key factors that may point the way towards better ways of sharing knowledge.

Speakers

Andy Mabbett has been a Wikipedia editor (as User:Pigsonthewing) since 2003 and involved with Wikidata since its inception in 2012. He has given presentations about Wikimedia projects on five continents, and has a great deal of experience working with organisations that wish to engage with Wikipedia and its sister projects. With a background in programming and managing websites for local government, Andy has been ‘Wikimedian in Residence’ at ORCID; TED; the Royal Society of Chemistry; The Physiological Society; the History of Modern Biomedicine Research Group; and various museums, galleries and archives. He is also the author of three books on the rock band Pink Floyd.

Our case-study witnesses

Sara Culpin is currently Head of Information & Knowledge at CRU International, where she has implemented a successful information and knowledge strategy on a shoestring budget. Since graduating from Loughborough University, she has spent over 25 years in information and knowledge roles at Aon, AT Kearney, PwC, and Deloitte. She is passionate about getting colleagues to share their knowledge across their organisations, while ensuring that their senior managers see the business value. https://www.linkedin.com/in/sara-culpin-2a1b051

Dr Richard Millwood has a background in school maths education, with a history of applying computers to education, and is Director of Core Education UK. As a researcher in the School of Computer Science & Statistics, Trinity College Dublin, he is developing a community of practice for computer science teachers in Ireland and creating workshops for families to develop creative use of computers together. In the 1990s Richard worked with Professor Stephen Heppell to create Ultralab, the learning technology research centre at Anglia Polytechnic University, acting as head 2005–2007. He researched innovation in online higher education in the Institute for Educational Cybernetics at the University of Bolton until 2013, gaining a PhD by Practice in ‘The Design of Learner-centred, Technology-enhanced Education’.

Time and Venue

2pm on 24th January 2019, The British Dental Association, 64 Wimpole Street, London W1G 8YS

Pre Event Information

None

Slides

No slides available for this presentation

Tweets

#netikx96

Blog

See our blog report: Wikipedia & knowledge sharing

Study Suggestions

Andy Mabbett, experienced Wikipedian

Andy Mabbett is an experienced editor of Wikipedia with more than a million edits to his name. Here’s a link to a ‘Wikijabber’ audio interview with Andy by Sebastian Wallroth (Sept 2017)
https://wikijabber.com/wikijabber-0005-with-pigsonthewing/

Blog for the July 2018 seminar: Machines and Morality: Can AI be Ethical?

In discussions of AI, one issue that is often raised is that of the ‘black box’ problem, where we cannot know how a machine system comes to its decisions and recommendations. That is particularly true of the class of self-training ‘deep machine learning’ systems which have been making the headlines in recent medical research.

Dr Tamara Ansons has a background in Cognitive Psychology and works for Ipsos MORI, applying academic research, principally from psychology, to various client-serving projects. In her PhD work, she looked at memory and how it influences decision-making; in the course of that, she investigated neural networks, as a form of representation for how memory stores and uses information.

At our NetIKX seminar for July 2018, she observed that ‘Artificial Intelligence’ is being used across a range of purposes that affect our lives, from mundane to highly significant. Recently, she thinks, the technology has been developing so fast that we have not been stepping back enough to think about the implications properly.

Tamara displayed an amusing image, an array of small photos of round light-brown objects, each one marked with three dark patches. Some were photos of chihuahua puppies, and the others were muffins with three raisins on top! People can easily distinguish between a dog and a muffin, a raisin and an eye or doggy nose. But for a computing system, such tasks are fairly difficult. Given the discrepancy in capability, how confident should we feel about handing over decisions with moral consequences to these machines?

Tamara stated that the ideas behind neural networks emerged from cognitive psychology, from the view that we learn and understand information through a network of interconnected concepts. She illustrated this with diagrams in which one concept, ‘dog’, was connected to others such as ‘tail’, ‘has fur’ and ‘barks’ [but note, there are dogs without fur and dogs that don’t bark]. From a ‘connectionist’ view, our understanding of what a dog is rests on these features of identity and how they are represented in our cognitive system. In cognitive psychology, there is a debate between this view and a ‘symbolist’ interpretation, which says that we don’t necessarily abstract from finer feature details, but process information more as a whole.
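
To make the connectionist picture concrete, here is a toy sketch in Python (my illustration, not something Tamara showed): each concept is represented only by the features it is linked to, and the overlap between two feature sets gives a crude measure of how related the concepts are. The concepts and features are invented.

    # Toy 'connectionist' sketch: a concept is just a bundle of linked features.
    concepts = {
        "dog":     {"has fur", "barks", "has tail", "four legs"},
        "cat":     {"has fur", "miaows", "has tail", "four legs"},
        "sparrow": {"has feathers", "sings", "has tail", "two legs"},
    }

    def similarity(a, b):
        """Feature overlap between two concepts (Jaccard index)."""
        fa, fb = concepts[a], concepts[b]
        return len(fa & fb) / len(fa | fb)

    print(similarity("dog", "cat"))      # high overlap: closely related concepts
    print(similarity("dog", "sparrow"))  # lower overlap: more distant concepts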

This connectionist model of mental activity, said Tamara, can be useful in approaching some specialist tasks. Suppose you are developing skill at a task that presents itself to you frequently – putting a tyre on a wheel, gutting fish, sewing a hem, planing wood. We can think of the cognitive system as having component elements that, with practice and through reinforcement, become more strongly associated with each other, such that one becomes better at doing that task.

Humans tend to have fairly good task-specific abilities. We learn new tasks well, and our performance improves with practice. But does this encapsulate what it means to be intelligent? Human intelligence is not just characterised by the ability to do certain tasks well. Tamara argued that what makes humans unique is our adaptability: the ability to take learnings from one context and apply them imaginatively to another. And humans don’t have to learn something over many, many trials. We can learn from a single significant event.

An algorithm is a set of rules which specify how certain bits of information are combined in a stepwise process. As an example, Tamara suggested a recipe for baking a cake.

Many algorithms can be represented with a kind of node-link diagram that on one side specifies the inputs and on the other side the outputs, with intermediate steps in between to move from input to output. The output is a weighted aggregate of the information that went into the algorithm.
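
As a minimal sketch of that idea (an illustration, not any specific system discussed at the seminar), the function below combines its inputs into a single output as a weighted aggregate; the inputs and weights are invented.

    # Output formed as a weighted aggregate of the inputs.
    def weighted_aggregate(inputs, weights, bias=0.0):
        return sum(x * w for x, w in zip(inputs, weights)) + bias

    # A toy 'score' built from three normalised input signals.
    inputs  = [0.6, 0.9, 0.2]
    weights = [0.5, 0.3, 0.2]   # how heavily each input counts towards the output
    print(weighted_aggregate(inputs, weights))   # 0.6*0.5 + 0.9*0.3 + 0.2*0.2 = 0.61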

When we talk about ‘learning’ in the context of such a system – ‘machine learning’ is a common phrase – a feedback or evaluation loop assesses how successful the algorithms are at matching input to acceptable decision; and the system must be able to modify its algorithms to achieve better matches.
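
A minimal sketch of that feedback loop, assuming a perceptron-style error-correction rule (my illustration, not the systems Tamara described): the system compares its output with the acceptable answer and nudges its weights to reduce the error.

    # Training data: (inputs, desired output) pairs -- invented for illustration.
    training_data = [
        ([1.0, 0.0], 1.0),
        ([0.0, 1.0], 0.0),
        ([1.0, 1.0], 1.0),
    ]
    weights = [0.0, 0.0]
    learning_rate = 0.1

    for _ in range(20):                      # repeated passes over the data
        for inputs, target in training_data:
            output = sum(x * w for x, w in zip(inputs, weights))
            error = target - output          # the feedback: how wrong was the output?
            weights = [w + learning_rate * error * x   # adjust to get a better match
                       for w, x in zip(weights, inputs)]

    print(weights)   # the weights have shifted so outputs better match the targets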

Tamara suggests that at a basic level, we must recognise that humans are the ones feeding training data to the neural network system – texts, images, audio etc. The implication is that the accuracy of machine learning is only as good as the data you give it. If all the ‘dog’ pictures we give it are of Jack Russell terriers, it’s going to struggle at identifying a Labrador as a dog. We should also think about the people who develop these systems – they are hardly a model of diversity, and women and ethnic minorities are under-represented. The cognitive biases of the developer community can influence how machine learning systems are trained, what classifications they are asked to apply, and therefore how they work.

If the system is doing something fairly trivial, such as guessing what word you meant to type when you make a keyboarding mistake, there isn’t much to worry about. But what if the system is deciding whether and on what terms to give us insurance, or a bank loan or mortgage? It is critically important that we know how these systems have been developed, and by whom, to ensure that there are no unfair biases at work.

Tamara said that an ‘AI’ system develops its understanding of the world from the explicit input with which it is fed. She suggested that in contrast, humans make decisions, and act, on the basis of myriad influences of which we are not always aware, and often can’t formulate or quantify. Therefore it is unrealistic, she suggests, to expect an AI to achieve human subtlety and balance in its decisions.

However, there have been some very promising results using AI in certain decision-making contexts, for example, in detecting certain kinds of disease. In some of these applications, it can be argued that the AI system can sidestep the biases, especially the attentional biases, of humans. But there are also cases where companies have allowed algorithms to act in highly inappropriate and insensitive ways towards individuals.

But perhaps the really big issue is that we really don’t understand what is happening inside these networks – certainly the ‘deep learning’ networks, where the hidden inner layers develop a degree of complexity that is beyond our powers to comprehend. This is an aspect which Stephanie would address.

Stephanie Mathieson is the policy manager at ‘Sense About Science’ (SAS), a small independent campaigning charity based in London. SAS was set up in 2002, at a time when the media was struggling to cope with science-based topics such as genetic modification in farming, and the alleged link between the MMR vaccine and autism.

SAS works with researchers to help them to communicate better with the public, and has published a number of accessible topic guides, such as ‘Making Sense of Nuclear’, ‘Making Sense of Allergies’ and other titles on forensic genetics, chemical stories in the press, radiation, drug safety etc. They also run a campaign called ‘Ask For Evidence’, equipping people to ask questions about ‘scientific’ claims, perhaps by a politician asking for your vote, or a company for your custom.

But Stephanie’s main focus is their Evidence in Policy work, examining the role of scientific evidence in government policy formation. A recent SAS report surveyed how transparent twelve government departments are about their use of evidence. The focus is not on the quality of the evidence, nor the appropriateness of the policies, but on being clear about what evidence was taken into account in making those decisions, and how. In talking about the use of Artificial Intelligence in decision support, ‘meaningful transparency’ would be the main concern she would raise.

Sense About Science’s work on algorithms started a couple of years ago, following a lecture by Cory Doctorow, the author of the blog Boing Boing, which raised the question of ‘black box’ decision making in people’s lives. Around the same time, similar concerns were being raised by the independent investigative newsroom ‘ProPublica’, and by Cathy O’Neil’s book ‘Weapons of Math Destruction’. The director of Sense About Science urged Stephanie to read that book, and she heartily recommends it.

There are many parliamentary committees which scrutinise the work of government. The House of Commons Science and Technology Committee has an unusually broad remit. They put out an open call to the public, asking for suggestions for enquiry topics, and Stephanie wrote to suggest the role of algorithms in decision-making. Together with seven or eight others, Stephanie was invited to come and give a presentation, and she persuaded the Committee to launch an enquiry on the issue.

The SciTech Committee’s work was disrupted by the 2017 snap general election, but they pursued the topic, and reported in May 2018. (See https://www.parliament.uk/business/committees/committees-a-z/commons-select/science-and-technology-committee/news-parliament-2017/algorithms-in-decision-making-report-published-17-19-/)

Stephanie then treated us to a version of the ‘pitch’ which she gave to the Committee.

An algorithm is really no more than a set of steps carried out sequentially to give a desired outcome. A cooking recipe, or directions for how to get to a place, are everyday examples. Algorithms are everywhere, many implemented by machines, whether controlling the operation of a cash machine or placing your phone call. Algorithms are also behind the analysis of huge amounts of data, carrying out tasks that would be beyond the capacity of humans, efficiently and cheaply, and bringing a great deal of benefit to us. They are generally considered to be objective and impartial.

But in reality, there are troubling issues with algorithms. Quite rapidly, and without debate, they have been engaged to make important decisions about our lives. Such decisions would in the past have been made by a human, and though that person might be following a formulaic procedure, at least you can ask a person to explain what they are doing. What is different about computer algorithms is their potential complexity and their ability to be applied at scale, which means that if there are biases ingrained in the algorithm, or in the data selected for it to process, those shortcomings will also be applied at scale, blindly and inscrutably.

  • In education, algorithms have been used to rank teachers, and in some cases, to summarily sack the ‘lower-performing’ ones.
  • Algorithms generate sentencing guidelines in the criminal justice system, where analysis has found that they are stacked against black people.
  • Algorithms are used to determine credit scores, which in turn determine whether you get a loan, a mortgage, a credit card, even a job.
  • There are companies offering to create a credit score for people who don’t have a credit history, by using ‘proxy data’. They do deep data mining, investigating how people use social media, how they buy things online, and other evidence.
  • The adverts you get to see on Google and Facebook are determined through a huge algorithmic trading market.
  • For people working for Uber or Deliveroo, their bosses essentially are algorithms.
  • Algorithms help the Government Digital Service to decide what pages to display on the gov.uk website. The significance is that this site is the government’s interface with the public, especially now that individual departments have lost their own websites.
  • A recent Government Office for Science report suggests that government is very keen to increase its use of algorithms and Big Data – it calls them ‘data science techniques’ – in deploying resources for health, social care and the emergency services. Algorithms are being used in the fire service to determine which fire stations might be closed.

In China, the government is developing a comprehensive ‘social credit’ system – in truth, a kind of state-run reputation ranking system – where citizens will get merits or demerits for various behaviours. Living in a modestly-sized apartment might add points to your score; paying bills late or posting negative comments online would be penalised. Your score would then determine what resources you have access to. For example, anyone defaulting on a court-ordered fine will not be allowed to buy first-class rail tickets, travel by air, or take a package holiday. The scheme is already being piloted, and is supposed to be fully rolled out as early as 2020.

(See Wikipedia article at https://en.wikipedia.org/wiki/Social_Credit_System and Wired article at https://www.wired.co.uk/article/china-social-credit.)

Stephanie suggested a closer look at the use of algorithms to rank teacher performance. Surely, one might think, it is better to do this using an unbiased algorithm? This is what happened in the Washington, D.C. school district in the USA – an example described in some depth in Cathy O’Neil’s book. At the end of the 2009–2010 school year, all teachers were ranked, largely on the basis of a comparison of their pupils’ test scores between one year and the next. On the basis of this assessment, 2% of teachers were summarily dismissed and a further 5% lost their jobs the following year. But what if the algorithm was misconceived, and the teachers thus victimised were not bad teachers?

In this particular case, one of the fired teachers was rated very highly by her pupils and their parents. There was no way that she could work out the basis of the decision; later it emerged that it turned on this consecutive-year test score proxy, which had not taken into account the baseline performance from which those pupils came into her class.

It cannot be a good thing to have such decisions taken by an opaque process not open to scrutiny and criticism. Cathy O’Neil’s examples have been drawn from the USA, but Stephanie is pleased to note that since the Parliamentary Committee started looking at the effects of algorithms, more British examples have been emerging.

Summary:

  • Algorithms are often totally opaque, which makes them unchallengeable. If we don’t know how they are made, how do we know whether they are weighted correctly? How do we know whether they are fair?
  • Frequently, the decisions turned out by algorithms are not understood by the people who deliver those decisions. This may be because a ‘machine learning’ system was involved, such that the intermediate steps between input and output are undiscoverable. Or it may be that the service was bought from a third party. This is what banks do with credit scores – they can tell you Yes or No, and they can tell you what your credit score is, but they can’t explain how it was arrived at, or whether the data that went into it was correct.
  • There are things that just can’t be measured with numbers. Consider again the example of teacher rankings: the algorithm cannot weigh issues such as how a teacher deals with the difficulties that pupils bring from their home lives; it sees only the test results.
  • Systems sometimes cannot learn when they are wrong, if there is no mechanism for feedback and course correction.
  • Blind faith in technology can lead to the humans who implement those algorithmically-made decisions failing to take responsibility.
  • The perception that algorithms are unbiased can be unfounded – as Tamara had already explained. When it comes to ‘training’ the system, which data do you include, which do you exclude, and is the data set appropriate? If it was originally collected for another purpose, it may not fit the current one.
  • ‘Success’ can be claimed even when people are being harmed. In the public sector, managers may have a sense of problems being ‘fixed’ when teachers are fired. If the objective is to make or save money, and teachers are being fired, resources are freed to be redeployed elsewhere, or profits are being made, it can seem as though the model is working: because the objective defined at the start has been met, the model appears to justify itself. And if we can’t scrutinise or challenge it, agree or disagree, we are stuck in that loop.
  • Bias can exist within the data itself. A good example is university admissions, where historical and outdated social norms which we don’t want to see persist still lurk in the data. Using historical admissions data as a training set can entrench that bias.
  • Then there is the principle of ‘fairness’. Algorithms consider a slew of statistics and come out with a probability that someone might be a risky hire, or a bad borrower, or a bad teacher. But is it fair to treat people on the basis of a probability? We have been pooling risk for decades when it comes to insurance cover – as a society we seem happy with that, though we might get annoyed when the premium is decided by our age rather than our skill in driving. But when sending people to prison, are we happy to tolerate the same level of uncertainty in the data? Is past behaviour really a good predictor of future behaviour? And would we as individuals be happy to be treated on the basis of profiling statistics?
  • Because algorithms are opaque, there is a lot of scope for ‘hokum’. Businesses are employing algorithms; government and its agencies, are buying their services; but if we don’t understand how the decisions are made, there is scope for agencies to be sold these services by snake oil salesmen.

What next?

In the first place, we need to know where algorithms are being used to support decision-making, so we know how to challenge the decision.

When the SciTech Committee published its report at the end of May, Stephanie was delighted that it took up her suggestion to ask government to publish a list of all public-sector uses of algorithms, current or planned, where they affect significant decisions. The Committee also wants government to identify a minister to provide government-wide oversight of such algorithms where they are used by the public sector, to co-ordinate departments’ approaches to the development and deployment of algorithms and to partnerships with the private sector. It also recommended ‘transparency by default’ where algorithms affect the public.

Secondly, we need to ask for the evidence. If we don’t know how these decisions are being made, we don’t know how to challenge them. Whether teacher performance is being ranked, criminals sentenced or services cut, we need to know how those decisions are being made. Organisations should apply standards to their own use of algorithms, and government should be setting the right example. If decision-support algorithms are being used in the public sector, it is vital that people are treated fairly, that someone can be held accountable, that decisions are transparent, and that hidden prejudice is avoided.

The public sector, because it holds significant datasets, actually holds a lot of power that it doesn’t seem to appreciate. In a couple of cases recently, it’s given data away without demanding transparency in return. A notorious example was the 2016 deal between the Royal Free Hospital and Google DeepMind, to develop algorithms to predict kidney failure, which led to the inappropriate transfer of personal sensitive data.

In the Budget of November 2017, the government announced a new Centre for Data Ethics and Innovation, but has not yet said much about its remit. It is consulting on this until September 2018, so maybe by the end of the year we will know more. The SciTech Committee report had lots of strong recommendations for what the Centre’s remit should be, including evaluation of accountability tools and examining biases.

The Royal Statistical Society also has a council on data ethics, and the Nuffield Foundation has set up a new commission, now the Convention on Data Ethics. Stephanie’s concern is that we now have several different bodies paying attention; they should all set out their remits to avoid duplication of work, so we know whose reports to read and whose recommendations to follow. There needs to be some joined-up thinking, but currently it seems none of them are listening to each other.

Who might create a clear standard framework for data ethics? Chi Onwurah, the Labour Shadow Minister for Business, Energy and Industrial Strategy, recently said that the role of government is not to regulate every detail, but to set out a vision for the type of society we want, and the principles underlying that. She has also said that we need to debate those principles; once they are clarified, it makes it easier (but not necessarily easy) to have discussions about the standards we need, and how to define them and meet them practically.

Stephanie looks forward to seeing the Government’s response to the Science and Technology Committee’s report – a response which is required by law.

A suggested Code of Conduct came out in late 2016, with five principles for algorithms and their use: Responsibility – someone in authority to deal with anything that goes wrong, and in a timely fashion; Explainability – the new GDPR includes a clause giving a right to explanation about decisions that have been made about you by algorithms (although this is now law, much will depend on how it is interpreted in the courts); and then Accuracy, Auditability and Fairness.

So basically, we need to ask questions about the protection of people, and there have to be these points of challenge. Organisations need to ensure mechanisms of recourse if anything does go wrong, and they should also consider liability. At a recent speaking engagement on this topic, Stephanie addressed a roomful of lawyers; she told them they should not see this as a way to shirk liability, but should think ahead about what will happen when something does go wrong.

This conversation is at the moment being driven by the autonomous car industry, who are worried about insurance and insurability. When something goes wrong with an algorithm, whose fault might it be? Is it the person who asked for it to be created, and deployed it? The person who designed it? Might something have gone wrong in the Cloud that day, such that a perfectly good algorithm just didn’t work as it was supposed to? ‘People need to get to grips with these liability issues now, otherwise it will be too late, and some individual or group of individuals will get screwed over,’ said Stephanie, ‘while companies try to say that it wasn’t their fault.’

Regulation might not turn out to be the answer. If you do regulate, what do you regulate? The algorithms themselves, similar to the manner in which medicines are scrutinised by the medicines regulator? Or the use of the algorithms? Or the outcomes? Or something else entirely?

Companies like Google, Facebook, Amazon and Microsoft – have they lost the ability to regulate themselves? How are companies regulating themselves? Should companies regulate themselves? Stephanie doesn’t think we can rely on that. Those are some of the questions she put to the audience.

Tamara took back the baton. She noted that we interact extensively with AI through many aspects of our lives. Many jobs that have been thought of as a human preserve – thinking jobs – may become more automated, handled by a computer or neural network. Jobs as we know them now may not be the jobs of the future. Does that mean unemployment, or just a change in the nature of work? It’s likely that in future we will be working side by side with AI on a regular basis. Already, decisions about bank loans, insurance, parole and employment increasingly rely on AI.

As humans, we are used to interacting with each other. How will we interact with non-humans? Specifically, with AI entities? Tamara referenced the famous ‘ELIZA’ experiment conducted 1964–68 by Joseph Weizenbaum, in which a computer program was written to simulate a practitioner of person-centred psychotherapy, communicating with a user via text dialogue. In response to text typed in by the user, the ELIZA program responded with a question, as if trying sympathetically to elicit further explanation or information from the user. This illustrates how we tend to project human qualities onto these non-human systems. (A wealth of other examples are given in Sherry Turkle’s 1984 book, ‘The Second Self’.)

However, sometimes machine/human interactions don’t happen so smoothly. Robotics professor Masahiro Mori studied this in the 1970s, examining people’s reactions to robots made to appear human. Many people responded to such robots with greater warmth as they were made to appear more human, but at a certain point along that transition there was an experience of unease and revulsion, which he dubbed the ‘Uncanny Valley’. This is the point when something jarring about the appearance, behaviour or mode of conversation of the artificial human makes you feel uncomfortable and shatters the illusion.

‘Uncanny Valley’ research has been continued since Mori’s original work. It has significance for computer-generated on-screen avatars, and CGI characters in movies. A useful discussion of this phenomenon can be found in the Wikipedia article at https://en.wikipedia.org/wiki/Uncanny_valley

There is a Virtual Personal Assistant service for iOS devices, called ‘Fin’, which Tamara referenced (see https://www.fin.com). Combining an iOS app with a cloud-based computation service, ‘Fin’ avoids some of the risk of Uncanny Valley by interacting purely through voice command and on-screen text response. Is that how people might feel comfortable interacting with an AI? Or would people prefer something that attempts to represent a human presence?

Clare Parry remarked that she had been at an event about care robots, where you don’t get an Uncanny Valley effect because despite a broadly humanoid form, they are obviously robots. Clare also thought that although robots (including autonomous cars) might do bad things, they aren’t going to do the kind of bad things that humans do, and machines do some things better than people do. An autonomous car doesn’t get drunk or suffer from road-rage…

Tamara concluded by observing that our interactions with these systems shape how we behave. This is not a new thing – we have always been shaped by the systems and tools that we create. The printing press moved us from an oral, social method of sharing stories to a more individual experience, which arguably has made us more individualistic as a society. Perhaps our interactions with AI will shape us similarly, and we should stop and think about the implications for society. Will a partnership with AI bring out the best of our humanity, or make us more machine-like?

Tamara would prefer us not to think of Artificial Intelligence as a reified machine system, but of Intelligence Augmented, shifting the focus of discussion onto how these systems can help us flourish. And who are the people that need that help the most? Can we use these systems to deal with the big problems we face, such as poverty, climate change, disease and others? How can we integrate these computational assistances to help us make the best of what makes us human?

There was so much food for thought in the lectures that everyone was happy to talk together in the final discussion and the chat over refreshments that followed.  We could campaign to say, ‘We’ve got to understand the algorithms, we’ve got to have them documented’, but perhaps there are certain kinds of AI practice (such as those involved in medical diagnosis from imaging input) where it is just not going to be possible.

From a blog by Conrad Taylor, June 2018

Some suggested reading

June 2018 Seminar: Organising Medical and Health-related Information

Summary

At this meeting, held in Leeds, Ewan Davis, one of the first developers of a GP computerised health record system, discussed Electronic Health Records. This was a joint meeting with ISKO UK.

Speakers

Ewan Davis was one of the first developers of a GP computerised health record system. His background is solidly in Health Informatics, and more recently he has been championing two things: the use of apps on handheld devices to support medical staff, patients and carers, and the use of open (non-proprietary) standards and information exchange formats in health informatics. Indeed, he had recently returned from the launch in Plymouth, by the local NHS trust, of an integration system based on the openEHR standard – see https://en.wikipedia.org/wiki/OpenEHR. A second speaker on other aspects of EHR was also hoped for.

Time and Venue

2pm on 7th June 2018, The British Dental Association, 64 Wimpole Street, London W1G 8YS

Pre Event Information

This meeting, NetIKX’s first outside London for several years, will focus on health-related information. The main speaker will be Ewan Davis, who has pioneered Electronic Health Records (EHR). He will focus, in particular, on the relationship between clinical EHR (prepared by medical professionals), Personal Health Records (PHR), which are managed by individuals themselves, and Co-produced PHR, a proposed hybrid between these two types of record.

Slides

No slides available for this presentation

Tweets

Due to a power cut there were no tweets from this event

Blog

See our blog report: Organising Medical and Health Related Information

Study Suggestions

Our partner organisation for this meeting was ISKO UK.


May 2018 Seminar: Trust and integrity in information

Summary

The question of how we identify trustworthy sources of information formed the basis of this varied and thought-provoking seminar. Hanna Chalmers, Senior Director at Ipsos MORI, detailed the initial results of a recent poll on trust in the media. Events such as the Cambridge Analytica scandal have resulted in a general sense that trust in the media is in a state of crisis. Hanna suggested that it is more accurate to talk of trust in the media as being in flux, rather than in crisis. Dr Brennan Jacoby from Philosophy at Work approached the topic of trust from a different angle – what do we mean by trust? The element of vulnerability is what distinguishes trust from mere reliance: when we trust, we risk being betrayed. This resulted in a fascinating discussion with practical audience suggestions.

Speakers

Hanna Chalmers is a media research expert, having worked client-side at the BBC and Universal Music before moving agency-side with a stint at IPG agency Initiative, and joining Ipsos MORI as a senior director in the quality team just under three years ago. Hanna works across a broad range of media and tech clients, exploring areas that are often high up the political agenda. Hanna has been looking at trust in media over the last year and is delighted to be showcasing some of the most recent findings of a global survey looking at our relationship with trust and the media around the world.

Dr Brennan Jacoby is the founder of Philosophy at Work. A philosophy PhD, Brennan has spent the last six years helping businesses address their most important issues. While he specialises in bringing a thoughtful approach to a range of topics, from resilience and communication to innovation and leadership, his PhD analysed trust, and he has written, presented and trained widely on trustworthiness and how to build trust. Recent organisations he has worked with include Deloitte, Media Arts Lab and Viacom. Website: https://philosophyatwork.co.uk/dr-brennan-jacoby/

Time and Venue

2pm on 24th May 2018, The British Dental Association, 64 Wimpole Street, London W1G 8YS

Pre Event Information

In this new media age, the flow of information is often much faster than our ability to absorb and criticise it. This poses a whole set of problems for us individually, and in our organisations and social groupings, especially as important decisions with practical consequences are often made on the basis of our possibly ill-informed judgements. There is currently huge interest in ‘fake news’, ‘alternative facts’ and other ‘post-truth’ information disorders circulating in the traditional and social media, and it is important for us as Knowledge and Information Professionals to be able to operate successfully in this increasingly difficult environment, and to provide expertise in information literacy and fact-checking in our workplaces.

Slides

No slides available for this presentation

Tweets

#netikx91

Blog

See our blog report: Trust and Integrity in Information

Study Suggestions

Article on Global Trust in Media: https://digiday.com/media/global-state-trust-media-5-charts/

Making true connections in a complex world – Graph database technology and Linked Open Data – 25th January 2018

Conrad Taylor writes:

The first NetIKX meeting of 2018, on 25 January, looked at new technologies and approaches to managing data and information, escaping the limitations of flat-file and relational databases. Dion Lindsay introduced the concepts behind ‘graph databases’, and David Clarke illustrated the benefits of the Linked Data approach with case studies, where the power of a graph database had been enhanced by linking to publicly available resources. The two presentations were followed by a lively discussion, which I also report here.

The New Graph Technology of Information – Dion Lindsay

Dion is an independent consultant well known to NetIKX members. He offered us a simple introduction to graph database technology, though he avers he is no expert in the subject. He’d been feeling unclear about the differences between managing data and information, and thought one way to explore that could be to study a ‘fashionable’ topic with a bit of depth to it. He finds graph database technology exciting, and thinks data- and information-managers should be excited about it too!

Flat-file and relational database models

In the last 40 years, the management of data with computers has been dominated by the Relational Database model devised in 1970 by Edgar F Codd, an IBM employee at their San José Research Center.

FLAT FILE DATABASES. Until then (and also for some time after), the model for storing data in a computer system was the ‘Flat File Database’ — analogous to a spreadsheet with many rows and columns. Dion presented a made-up example in which each record was a row, with the attributes or values being stored in fields, which were separated by a delimiter character (he used the | sign, which is #124 in most text encoding systems such as ASCII).

Example: Lname, Fname, Age, Salary | Smith, John, 35, £280 | Doe, Jane, 28, £325 | Lindsay, Dion, 58, £350 …

In older flat-file systems, each individual record was typically input via a manually-prepared 80-column punched card, and the ingested data was ‘tabulated’ (made into a table); but there were no explicit relationships between the separate records. The data would then be stored on magnetic tape drives, and searching through those for a specific record was a slow process.

To search such a database with any degree of speed required loading the whole assembled table into RAM, then scanning sequentially for records that matched the terms of the query; but in those early days the limited size of RAM memory meant that doing anything clever with really large databases was not possible. They were, however, effective for sequential data processing applications, such as payroll, or issuing utility bills.
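
The sketch below (my illustration in Python, using Dion’s made-up record layout) shows why such queries were slow: with no index, the only option is to read the whole table and test every record in turn.

    import io

    # A tiny pipe-delimited 'flat file', standing in for a tape or card deck.
    flat_file = io.StringIO(
        "Smith, John, 35, £280|Doe, Jane, 28, £325|Lindsay, Dion, 58, £350"
    )
    FIELDS = ["Lname", "Fname", "Age", "Salary"]

    def scan(stream, **criteria):
        """Sequential scan: read everything, test each record against the query."""
        for raw in stream.read().split("|"):
            record = dict(zip(FIELDS, (value.strip() for value in raw.split(","))))
            if all(record.get(field) == wanted for field, wanted in criteria.items()):
                yield record

    print(list(scan(flat_file, Lname="Doe")))   # finds Jane Doe's record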

The IBM 2311 (debut 1964) was an early hard disk drive unit with 7.25 MB of storage. (Photo from Wikimedia Commons user ‘I, Deep Silence’.)

HARD DISKS and RELATIONAL DATABASES. Implementing Codd’s relational database management model (RDBM) was made possible by a fast-access technology for indexed file storage, the hard disk drive, which we might call ‘pseudo-RAM’. Hard drives had been around since the late fifties (the first was a component of the IBM RAMAC mainframe, storing 3.75 MB on nearly a ton of hardware), but it always takes time for the paradigm to shift…

By 1970, mainframe computers were routinely being equipped with hard disk packs of around 100 MB (example: IBM 3330). In 1979 Oracle beat IBM to market with the first Relational Database Management System (RDBMS). Oracle still has nearly half the global market share, with competition from IBM’s DB2, Microsoft SQL Server, and a variety of open source products such as MySQL and PostgreSQL.

As Dion pointed out, it was now possible to access, retrieve and process records from a huge enterprise-level database without having to read the whole thing into RAM or even know where it was stored on the disk; the RDBMS software and the look-up tables did the job of grabbing the relevant entities from all of the tables in the system.

TABLES, ATTRIBUTES, KEYS: In Codd’s relational model, which all these RDBMS applications follow, data is stored in multiple tables, each representing a list of instances of an ‘entity type’. For example, ‘customer’ is an entity type and ‘Jane Smith’ is an instance of that; ‘product’ is an entity type and ‘litre bottle of semi-skimmed milk’ is an instance of that. In a table of customer-entities, each row represents a different customer, and columns associate that customer with attributes such as her address or loyalty-card number.

One of the attribute columns is used as the Primary Key to quickly access that row of the table; in a classroom, the child’s name could be used as a ‘natural’ primary key, but most often a unique and never re-used or altered artificial numerical ID code is generated (which gets around the problem of having two Jane Smiths).

Possible/permitted relationships can then be stated between all the different entity types; a list of ‘Transactions’ brings a ‘Customer’ into relationship with a particular ‘Product’, which has an ‘EAN’ code retrieved at the point of sale by scanning the barcode, and this retrieves the ‘Price’. The RDBMS can create temporary and supplementary tables to mediate these relationships efficiently.
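
A compact sketch of these ideas using SQLite (my own table and column names, standing in for the Customer, Product and Transactions entities above): each entity type gets its own table with a primary key, and a Sale table relates a Customer to a Product via those keys, so a join can answer questions that span the tables.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Product  (ean TEXT PRIMARY KEY, description TEXT, price_pence INTEGER);
        CREATE TABLE Sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES Customer(customer_id),
            ean         TEXT    REFERENCES Product(ean)
        );
    """)
    db.execute("INSERT INTO Customer VALUES (1, 'Jane Smith')")
    db.execute("INSERT INTO Product VALUES ('5012345678900', '1 litre semi-skimmed milk', 95)")
    db.execute("INSERT INTO Sale VALUES (1, 1, '5012345678900')")

    # Join the tables: what did Jane Smith buy, and at what price?
    for row in db.execute("""
            SELECT c.name, p.description, p.price_pence
            FROM Sale s
            JOIN Customer c ON c.customer_id = s.customer_id
            JOIN Product  p ON p.ean = s.ean"""):
        print(row)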

Limitations of RDBMs, benefits of graphs

However, there are some kinds of data which RDBMSs are not good at representing, said Dion. And many of these are the sorts of thing that currently interest those who want to make good use of the ‘big data’ in their organisations. Dion noted:

  • situations in which changes in one piece of data mean that another piece of data has changed as well;
  • representation of activities and flows.

Suppose, said Dion, we take the example of money transfers between companies. Company A transfers a sum of money to Company B on a particular date; Company B later transfers parts of that money to other companies on a variety of dates. And later, Company A may transfer monies to all these entities, and some of them may later transfer funds in the other direction… (or to somewhere in the British Virgin Islands?)

Graph databases represent these dynamics with circles for entities and lines between them to represent connections between the entities. Sometimes the lines are drawn with arrows to indicate directionality, sometimes there is none. (This use of the word ‘graph’ is not to be confused with the diagrams we drew at school with x and y axes, e.g. to represent value changes over time.)

This money-transfer example goes some way towards describing why companies have been prepared to spend money on graph data technologies since about 2006 – it’s about money laundering and compliance with (or evasion of?) regulation. And it is easier to represent and explore such transfers and flows in graph technology.
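
A small Python sketch of that money-transfer picture (all figures invented): companies are nodes, transfers are directed edges, and a simple traversal reveals where funds could have flowed on from Company A, directly or indirectly.

    # Directed edges: (from, to, amount, date) -- invented for illustration.
    transfers = [
        ("Company A", "Company B", 100_000, "2018-01-10"),
        ("Company B", "Company C",  40_000, "2018-02-02"),
        ("Company B", "Company D",  25_000, "2018-02-15"),
        ("Company D", "Company A",  10_000, "2018-03-01"),
    ]

    def downstream(start):
        """Every company reachable from `start` by following transfers onwards."""
        seen, frontier = set(), [start]
        while frontier:
            node = frontier.pop()
            for source, target, _amount, _date in transfers:
                if source == node and target not in seen:
                    seen.add(target)
                    frontier.append(target)
        return seen

    # Includes Company A itself, because some money flows back round the cycle.
    print(downstream("Company A"))

A relational database can store the same rows, but answering ‘where did the money go, however many hops it took?’ needs awkward recursive joins; in a graph store this traversal is the natural operation.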

Dion had recently watched a YouTube video in which an expert on such situations said that it is technically possible to represent such relationships within an RDBMS, but it is cumbersome.


Most NetIKX meetings incorporate one or two table-group sessions to help people make sense of what they have learned. Here, people are drawing graph data diagrams to Dion Lindsay’s suggestions.

Exercise

To get people used to thinking along graph database lines, Dion distributed a sheet of flip chart paper to each table, and big pens were found, and he asked each table group to start by drawing one circle for each person around the table, and label them.

The next part of the exercise was to create a circle for NetIKX, to which we all have a relationship (as a paid-up member or paying visitor), and also circles representing entities to which only some have a relation (such as employers or other organisations). People should then draw lines to link their own circle-entity to these others.

Dion’s previous examples had been about money-flows, and now he was asking us to draw lines to represent money-flows (i.e. if you paid to be here yourself, draw a line from you to NetIKX; but if your organisation paid, that line should go from your organisation-entity to NetIKX). I noted that this aspect of the exercise engendered some confusion about the breadth of meaning that lines can carry in such a graph diagram. In fact they can represent any kind of relationship, so long as you have defined it that way, as Dion later clarified.

Dion had further possible tasks up his sleeve for us, but as time was short he drew out some interim conclusions. In graph databases, he summarised, you have connections instead of tables. These systems can manage many more complexities of relationship than either an RDBMS could cope with, or than we could cope with cognitively (and you can keep on adding complexity!). The graph database system can then show you what comes out of those complexities of relationship, which you had not been able to intuit for yourself, and this makes it a valuable discovery tool.

HOMEWORK: Dion suggested that as ‘homework’ we should take a look at an online tool and downloadable app which BP have produced to explore statistics of world energy use. The back end of this tool, Dion said, is based on a graph database.

https://www.bp.com/en/global/corporate/energy-economics/energy-charting-tool.html


Building Rich Search and Discovery: User Experiences with Linked Open Data – David Clarke


DAVE CLARKE is the co-founder, with Trish Yancey, of Synaptica LLC, which since 1995 has developed enterprise-level software for building and maintaining many different types of knowledge organisation systems. Dave announced that he would talk about Linked Data applications, with some very practical illustrations of what can be done with this approach.

The first thing to say is that Linked Data is based on an ‘RDF Graph’ — that is, a tightly-defined data structure, following norms set out in the Resource Description Framework (RDF) standards described by the World Wide Web Consortium (W3C).

In RDF, statements are made about resources, in expressions that take the form: subject – predicate – object. For example: ‘daffodil’ – ‘has the colour’ – ‘yellow’. (Also, ‘daffodil’ – ‘is a member of’ – ‘genus Narcissus’; and ‘Narcissus pseudonarcissus’ – ‘is a type of’ – ‘daffodil’.)

Such three-part statements are called ‘RDF triples’ and so the kind of database that manages them is often called an ‘RDF triple store’. The triples can also be represented graphically, in the manner that Dion had introduced us to, and can build up into a rich mass of entities and concepts linked up to each other.
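
As a minimal sketch of a triple store holding the daffodil statements above, here is some Python using the rdflib library (the library choice and the example.org namespace are my assumptions; any RDF toolkit would do):

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")   # an invented namespace for the example
    g = Graph()
    g.bind("ex", EX)

    # subject - predicate - object
    g.add((EX.daffodil, EX.hasColour, Literal("yellow")))
    g.add((EX.daffodil, EX.isMemberOf, EX.genus_Narcissus))
    g.add((EX.Narcissus_pseudonarcissus, EX.isTypeOf, EX.daffodil))

    print(g.serialize(format="turtle"))     # the triples written out in Turtle notation

    # Ask the store a question: what colour is a daffodil?
    for colour in g.objects(EX.daffodil, EX.hasColour):
        print(colour)                       # 'yellow'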

Describing Linked Data and Linked Open Data

Dion had got us to do an exercise at our tables, but each table’s graph didn’t communicate with any other’s, like separate fortresses. This is the old database model, in which systems are designed not to share data. There are exceptions of course, such as when a pathology lab sends your blood test results to your GP, but those acts of sharing follow strict protocols.

Linked Data, and the resolve to be Open, are tearing down those walls. Each entity, as represented by the circles on our graphs, now gets its own ‘HTTP URI’ – its own unique Uniform Resource Identifier, expressed with the methods of the Web’s Hypertext Transfer Protocol. In effect, it gets a ‘Web address’ and becomes discoverable on the Internet, which in turn means that connections between entities are both possible and technically fairly easy and fast to implement.
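
What having an HTTP URI means in practice is that the identifier itself can be fetched over the Web to obtain machine-readable data about the entity. The sketch below is a hedged illustration of that pattern (it assumes the Python requests library, and that DBpedia, one well-known Linked Data publisher, serves Turtle via content negotiation for this resource URI):

    import requests

    # The URI identifies the concept; dereferencing it returns RDF describing it.
    uri = "http://dbpedia.org/resource/Narcissus_(plant)"
    response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)

    print(response.status_code)     # expect 200 after any redirects
    print(response.text[:400])      # a snippet of Turtle describing the entity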

And there are readily accessible collections of these URIs; DBpedia, described below, is one example.

We are all familiar with clickable hyperlinks on Web pages – those links are what weaves the ‘classic’ Web. However, they are simple pointers from one page to another; they are one-way, and they carry no meaning other than ‘take me there!’

In contrast, Linked Data links are semantic (expressive of meaning) and they express directionality too. As noted above, the links are known in RDF-speak as ‘predicates’, and they assert factual statements about why and how two entities are related. Furthermore, the links themselves have ‘thinginess’ – they are entities too, and those are also given their own URIs, and are thus also discoverable.

People often confuse Open Data and Linked Data, but they are not the same thing. Data can be described as being Open if it is available to everyone via the Web, and has been published under a liberal open licence that allows people to re-use it. For example, if you are trying to write an article about wind power in the UK, there is text and there are tables about that on Wikipedia, and the publishing licence allows you to re-use those facts.

Stairway through the stars

Tim Berners-Lee, who invented the Web, has more recently become an advocate of the Semantic Web, writing about the idea in detail in 2005, and has argued for how it can be implemented through Linked Data. He proposes a ‘5-star’ deployment scheme for Open Data, with Linked Open Data being the starriest and best of all. Dave in his slide-set showed a graphic shaped like a five-step staircase, often used to explain this five-star system:


The ‘five-step staircase’ diagram often used to explain the hierarchy of Open Data types

  • One Star: this is when you publish your data to the Web under open license conditions, in whatever format (hopefully one like PDF or HTML for which there is free of charge reading software). It’s publishable with minimal effort, and the reader can look at it, print it, download and store it, and share it with others. Example: a data table that has been published as PDF.
  • Two stars: this is where the data is structured and published in a format that the reader can process with software that accesses and works with those structures. The example given was a Microsoft Excel spreadsheet. If you have Excel you can perform calculations on the data and export it to other structured formats. Other two-star examples could be distributing a presentation slide set as PowerPoint, or a document as Word (though when it comes to presentational forms, there are font and other dependencies that can trip us up).
  • Three stars: this is where the structure of a data document has been preserved, but in a non-proprietary format. The example given was of an Excel spreadsheet exported as a CSV file (comma-separated values format, a text file where certain characters are given the role of indicating field boundaries, as in Dion’s example above). [Perhaps the edges of this category have been abraded by software suites such as OpenOffice and LibreOffice, which themselves use non-proprietary formats, but can open Microsoft-format files.]
  • Four stars: this is perhaps the most difficult step to explain, and is when you put the data online in a graph database format, using open standards such as Resource Description Framework (RDF), as described above. For the publisher, this is no longer such a simple process and requires thinking about structures, and new conversion and authoring processes. The advantage to the users is that the links between the entities can now be explored as a kind of extended web of facts, with semantic relationships constructed between them.
  • Five stars: this is when Linked Data graph databases, structured to RDF standards, ‘open up’ beyond the enterprise, and establish semantic links to other such open databases, of which there are increasingly many. This is Linked Open Data! (Note that a Linked Data collection held by an enterprise could be part-open and part-closed. There are often good commercial and security reasons for not going fully open.)

This hierarchy is explained in greater detail at http://5stardata.info/en/
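
To make the jump from three stars to four a little more concrete, here is a small sketch of my own (again using the Python library rdflib): it reads a tiny ‘three-star’ CSV table and re-expresses each row as RDF triples. The wind-farm data, property names and URIs are purely illustrative.

    import csv
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS, XSD

    EX = Namespace("http://example.org/windfarm/")
    g = Graph()

    # windfarms.csv (a 'three-star' table) is assumed to have the columns: name,capacity_mw
    with open("windfarms.csv", newline="") as f:
        for row in csv.DictReader(f):
            farm = URIRef(EX + row["name"].replace(" ", "_"))   # mint a URI per row
            g.add((farm, RDF.type, EX.WindFarm))
            g.add((farm, RDFS.label, Literal(row["name"])))
            g.add((farm, EX.capacityMW, Literal(row["capacity_mw"], datatype=XSD.decimal)))

    # The same facts, now as a 'four-star' RDF graph ready to be linked outwards
    print(g.serialize(format="turtle"))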

Dave suggested that if we want to understand how many organisations currently participate in the ‘Linked Open Data Cloud’, and how they are linked, we might visit http://lod-cloud.net, where there is an interactive and zoomable SVG graphic version showing several hundred linked databases. The circles that represent them are grouped and coloured to indicate their themes and, if you hover your cursor over one circle, you will see an information box, and be able to identify the incoming and outgoing links as they flash into view. (Try it!)

The largest and most densely interlinked ‘galaxy’ in the LOD Cloud is in the Life Sciences; other substantial ones are in publishing and librarianship, linguistics, and government. One of the most central and most widely linked is DBpedia, which extracts structured data created in the process of authoring and maintaining Wikipedia articles (e.g. the structured data in the ‘infoboxes’). DBpedia is big: it stores nine and a half billion RDF triples!


Screen shot taken while zooming into the heart of the Linked Open Data Cloud (interactive version). I have positioned the cursor over ‘datos.bne.es’ for this demonstration. This brings up an information box, and lines which show links to other LOD sites: red links are ‘incoming’ and green links are ‘outgoing’.

The first case study Dave presented was an experiment conducted by his company Synaptica to enhance discovery of people in the news, and stories about them. A ready-made LOD resource they were able to use was DBpedia’s named graph of people. (Note: the Named Graphs data model is a variant of the RDF data model: it allows RDF triples to talk about RDF graphs. This creates a level of metadata that assists searches within a graph database using the SPARQL query language.)
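
The named-graph idea can be hard to picture, so here is a small self-contained sketch (my own, using rdflib’s Dataset class) in which triples about people live in one named graph, and a SPARQL query scopes its pattern to that graph with the GRAPH keyword. The graph IRI and the person are invented; this is not DBpedia’s actual graph identifier.

    from rdflib import Dataset, URIRef, Literal
    from rdflib.namespace import RDF, FOAF

    PEOPLE_GRAPH = URIRef("http://example.org/graphs/people")   # hypothetical graph IRI

    ds = Dataset()
    people = ds.graph(PEOPLE_GRAPH)           # a named graph inside the dataset
    ada = URIRef("http://example.org/person/ada")
    people.add((ada, RDF.type, FOAF.Person))
    people.add((ada, FOAF.name, Literal("Ada Lovelace")))

    # The GRAPH keyword restricts the pattern to that one named graph
    q = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE {
      GRAPH <http://example.org/graphs/people> { ?p a foaf:Person ; foaf:name ?name }
    }
    """
    for (name,) in ds.query(q):
        print(name)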

Many search and retrieval solutions focus on indexing a collection of data and documents within an enterprise – ‘in a box’ if you like – and providing tools to rummage through that index and deliver documents that may meet the user’s needs. But what if we could also search outside the box, connecting the information inside the enterprise with sources of external knowledge?

The second goal of this Synaptica project was about what it could deliver for the user: they wanted search to answer questions, not just return a bunch of relevant electronic documents. Now, if you are setting out to answer a question, the search system has to be able to understand the question…

For the experiment, which preceded the 2016 US presidential elections, they used a reference database of about a million news articles, a subset of a much larger database made available to researchers by Signal Media (https://signalmedia.co). Associated Press loaned Synaptica their taxonomy collection, which contains more than 200,000 concepts covering names, geospatial entities, news topics and so on – a typical and rather good taxonomy scheme.

The Linked Data part was this: Synaptica linked entities in the Associated Press taxonomy out to DBpedia. If a person is famous, DBpedia will have hundreds of data points about that person. Synaptica could then build on that connection to external data.
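
In Linked Data terms, a ‘link out’ of this kind is typically just another triple, asserting that an in-house concept and an external resource denote the same thing. The sketch below shows the general pattern with skos:exactMatch and owl:sameAs; the Associated Press-style identifier is invented, since AP’s actual URIs are not public.

    from rdflib import Graph, URIRef
    from rdflib.namespace import SKOS, OWL

    g = Graph()
    g.bind("skos", SKOS)
    g.bind("owl", OWL)

    # Hypothetical URI for a concept in an enterprise taxonomy
    ap_concept = URIRef("http://example.org/ap-taxonomy/people/vladimir-putin")
    dbpedia_resource = URIRef("http://dbpedia.org/resource/Vladimir_Putin")

    # A single triple is enough to bridge the in-house vocabulary and the LOD cloud
    g.add((ap_concept, SKOS.exactMatch, dbpedia_resource))
    g.add((ap_concept, OWL.sameAs, dbpedia_resource))

    print(g.serialize(format="turtle"))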

SHOWING HOW IT WORKS. Dave went online to show a search system built with the news article database, the AP taxonomy, and a link out to the LOD cloud, specifically DBpedia’s ‘persons’ named graph. In the search box he typed ‘Obama meets Russian President’. The results displayed noted the possibility that Barack or Michelle might match ‘Obama’, but unhesitatingly identified the Russian President as ‘Vladimir Putin’ – not from a fact in the AP resource, but by checking with DBpedia.

As a second demo, he launched a query for ‘US tennis players’, then added some selection criteria (‘born in Michigan’). That is a set which includes news stories about Serena Williams, even though the news articles about Serena don’t mention Michigan or her birth-place. Again, the link was made from the LOD external resource. And Dave then narrowed the field by adding the criterion ‘after 1980’, and Serena stood alone.

It may be, noted Dave, that a knowledgeable person searching a knowledgebase, be it on the Web or not, will bring to the task much personal knowledge that they have and that others don’t. What’s exciting here is using a machine connected to the world’s published knowledge to do the same kind of connecting and filtering as a knowledgeable person can do – and across a broad range of fields of knowledge.

NATURAL LANGUAGE UNDERSTANDING. How does this actually work behind the scenes? Dave again focused on the search expressed in text as ‘US tennis players born in Michigan after 1980’. The first stage is to use Natural Language Understanding (NLU), a relative of Natural Language Processing and long considered one of the harder problem areas in Artificial Intelligence.

The Synaptica project uses NLU methods to parse extended phrases like this, and break them down into parts of speech and concept clusters (‘tennis players’, ‘after 1980’). Some of the semantics are conceptually inferred: in ‘US tennis players’, ‘US’ is inferred contextually to indicate nationality.
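
Synaptica’s own NLU pipeline was not shown in code, but the general idea of breaking such a phrase into named entities and concept clusters can be sketched with the open-source spaCy library. This is a toy illustration of mine, not the production system, and it assumes the small English model has been downloaded.

    # pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("US tennis players born in Michigan after 1980")

    # Named entities: typically 'US' and 'Michigan' as places, '1980' as a date
    for ent in doc.ents:
        print(ent.text, ent.label_)

    # Concept clusters, roughly the noun phrases ('US tennis players', 'Michigan')
    for chunk in doc.noun_chunks:
        print(chunk.text)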

On the basis of these machine understandings, the system can then launch specific sub-queries into the graph database, and into the LOD databases out there, before combining the results to derive an answer. For example, the DBpedia ontology has specific properties for birth date, birthplace, date of death and place of death. These targeted sub-queries can bring back lists of qualifying entities and, via the AP taxonomy, find them in the news content database.
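
A sub-query of that kind, aimed at DBpedia’s public SPARQL endpoint, might look roughly like the sketch below (using the SPARQLWrapper library). The class and property names are taken from the DBpedia ontology, but birth places in DBpedia are often recorded at city rather than state level, so treat this as an illustration of the pattern rather than a query guaranteed to return Serena Williams.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?player ?born WHERE {
      ?player a dbo:TennisPlayer ;
              dbo:birthPlace dbr:Michigan ;
              dbo:birthDate ?born .
      FILTER (?born > "1980-01-01"^^xsd:date)
    }
    LIMIT 20
    """)
    sparql.setReturnFormat(JSON)

    for result in sparql.query().convert()["results"]["bindings"]:
        print(result["player"]["value"], result["born"]["value"])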

Use case: understanding symbolism inside art images

Dave’s second case study concerned helping art history students make searches inside images with the aid of a Linked Open Data resource, the Getty Art and Architecture Thesaurus.

A seminal work in Art History is Erwin Panofsky’s Studies in Iconology (1939), and Dave had re-read it in preparation for building this application, which draws on Panofskyan methods. Panofsky describes three levels of analysis of iconographic art images:

  • Natural analysis gives a description of the visual evidence. It operates at the level of methods of representation, and its product is an annotation of the image (as a whole, and its parts).
  • Conventional analysis (Dave prefers the term ‘conceptual analysis’) interprets the conventional meanings of visual components: the symbolism, allusions and ideas that lie behind them. This can result in semantic indexing of the image and its parts.
  • Intrinsic analysis explores the wider cultural and historical context. This can result in the production of ‘knowledge graphs’.

 


Detail from the left panel of Hieronymus Bosch’s painting ‘The Garden of Earthly Delights’, which is riddled with symbolic iconography.

THE ‘LINKED CANVAS’ APPLICATION.

The educational application which Synaptica built is called Linked Canvas (see http://www.linkedcanvas.org/). Their first step was to ingest the art images at high resolution. The second step was to ingest linked data ontologies such as DBpedia, Europeana, Wikidata, Getty AAT, Library of Congress Subject Headings and so on.

The software system then allows users to delineate Points of Interest (POIs), and annotate them at the natural level; the next step is the semantic indexing, which draws on the knowledge of experts and controlled vocabularies. Finally, users get to benefit from tools for search and exploration of the annotated images.

With time running tight, Dave skipped straight to some live demos of examples, starting with the fiendishly complex 15th century triptych painting The Garden of Earthly Delights. At Panofsky’s level of ‘natural analysis’, we can decompose the triptych space into the left, centre and right panels. Within each panel, we can identify ‘scenes’, and analyse further into details, in a hierarchical spatial array, almost the equivalent of a detailed table of contents for a book. For example, near the bottom of the left panel there is a scene in which God introduces Eve to Adam. And within that we can identify other spatial frames and describe what they look like (for example, God’s right-hand gesture of blessing).

To explain semantic indexing, Dave selected an image painted 40 years after the Bosch — Hans Holbein the Younger’s The Ambassadors, which is in the National Gallery in London. This too is full of symbolism, much of it carried by the various objects which litter the scene, such as a lute with a broken string, a hymnal in a translation by Martin Luther, a globe, etc. To this day, the meanings carried in the painting are hotly debated amongst scholars.

If you zoom in and browse around this image in Linked Canvas, as you traverse the various artefacts that have been identified, the word-cloud on the left of the display changes contextually, and what this reveals is how the symbolic and contextual meanings of those objects and visual details have been captured in the semantic annotations.

An odd feature of this painting is the prominent inclusion in the lower foreground of an anamorphically rendered (highly distorted) skull. (It has been suggested that the painting was designed to be hung on the wall of a staircase, so that someone climbing the stairs would see the skull first of all.) The skull is a symbolic device, a reminder of death or memento mori, a common visual trope of the time. That concept of memento mori is an element within the Getty AAT thesaurus, and the concept has its own URI, which makes it connectable to the outside world.

Dave then turned to Titian’s allegorical painting Bacchus and Ariadne, also from the same period and also from the National Gallery collection, and based on a story from Ovid’s Metamorphoses. In this story, Ariadne, who had helped Theseus find his way in and out of the labyrinth where he slew the Minotaur, and who had become his lover, has been abandoned by Theseus on the island of Naxos (and in the background if you look carefully, you can see his ship sneakily making off). And then along comes the God of Wine, Bacchus, at the head of a procession of revellers and, falling in love with Ariadne at first glance, he leaps from the chariot to rescue and defend her.

Following the semantic links (via the LOD database on Iconography) can take us to other images about the tale of Ariadne on Naxos, such as a fresco from Pompeii, which shows Theseus ascending the gang-plank of his ship while Ariadne sleeps. As Dave remarked, we generate knowledge when we connect different data sets.

Another layer built on top of the Linked Canvas application was the ability to create ‘guided tours’ that walk the viewer around an image, with audio commentary. The example Dave played for us was a commentary on the art within a classical Greek drinking-bowl, explaining the conventions of the symposium (Greek drinking party). Indeed, an image can host multiple such audio commentaries, letting a visitor experience multiple interpretations.

In building this image resource, Synaptica made use of a relatively recent standard called the International Image Interoperability Framework (IIIF). This is a set of standardised application programming interfaces (APIs) for websites that aim to do clever things with images and collections of images. For example, it can be used to load images at appropriate resolutions and croppings, which is useful if you want to start with a fast-loading overview image and then zoom in. The IIIF Search API is used for searching the annotation content of images.
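
The Image API part of IIIF works by encoding the region, size, rotation, quality and format you want directly into the URL, which is what makes fast overviews and deep zooming straightforward. The helper below sketches that URL pattern; the server address and image identifier are hypothetical.

    def iiif_image_url(base, identifier, region="full", size="!800,600",
                       rotation="0", quality="default", fmt="jpg"):
        """Build a IIIF Image API URL: {base}/{id}/{region}/{size}/{rotation}/{quality}.{format}"""
        return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

    BASE = "https://images.example.org/iiif"   # hypothetical IIIF server

    # A fast-loading overview, scaled to fit within 800 x 600 pixels
    print(iiif_image_url(BASE, "garden-of-earthly-delights"))

    # Zooming in: request a cropped region (x,y,width,height in pixels) at full resolution
    print(iiif_image_url(BASE, "garden-of-earthly-delights",
                         region="2048,3072,1024,1024", size="full"))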

Searching within Linked Canvas is what Dave described as ‘Level Three Panofsky’. You might search on an abstract concept such as ‘love’, and be presented with a range of details from a range of images, plus links to scholarly articles connected to them.

Post-Truth Forum

As a final example, Dave showed us http://www.posttruthforum.org, which is an ontology of concepts around the ideas of ‘fake news’ and the ‘post-truth’ phenomenon, with thematically organised links out to resources on the Web, in books and in journals. Built by Dave using Synaptica Graphite software, it is Dave’s private project born out of a concern about what information professionals can do as a community to stem the appalling degradation of the quality of information in the news media and social media.

For NetIKX members (and for readers of this post), going to Dave’s Post Truth Forum site is also an opportunity to experience a public Linked Open Data application. People may also want to explore Dave’s thoughts as set out on his blog, www.davidclarke.blog.

Taxonomies vs Graphs

In closing, Dave wanted to show a few examples that might feed our traditional post-refreshment round-table discussions. How can we characterise the difference between a taxonomy and a data graph (or ontology)? His first image was an organisation chart, literally a regimented and hierarchical taxonomy (the US Department of Defense and armed forces).

His second image was the ‘tree of life’ diagram, the phylogenetic tree that illustrates how life forms are related to each other, and to common ancestor species. This is also a taxonomy, but with a twist. Here, every intermediate node in the tree not only inherits characteristics from higher up, but also adds new ones. So, mammals have shared characteristics (including suckling young), placental mammals add a few more, and canids such as wolves, jackals and dogs have other extra shared characteristics. (This can get confusing if you rely too much on appearances: hyenas look dog-like, but are actually more closely related to the big cats.)

So the Tree of Life captures systematic differentiation, which a taxonomy typically cannot. However, said Dave, an ontology can. In making an ontology we specify all the classes we need, and can specify the property sets as we go. And, referring back to Dion’s presentation, Dave remarked that while ontologies do not work easily in a relational database structure, they work really well in a graph database. In a graph database you can handle processes as well as things and specify the characteristics of both processes and things.

Dave’s third and final image was of the latest version of the London Underground route diagram. This is a graph, specifically a network diagram, that is characterised not by hierarchy, but by connections. Could this be described in a taxonomy? You’d have to get rid of the Circle line, because taxonomies can’t end up where they started from. With a graph, as with the Underground, you can enter from any direction, and there are all sorts of ways to make connections.
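
Dave’s point about the Circle line can also be made mechanically: a taxonomy is a tree, and a tree cannot contain a cycle. A small sketch of mine using the networkx library (the node names are illustrative):

    import networkx as nx

    # A taxonomy is a tree: strictly hierarchical, with no route back to the start
    taxonomy = nx.DiGraph()
    taxonomy.add_edges_from([
        ("Transport", "Rail"),
        ("Rail", "Underground"),
        ("Underground", "Victoria line"),
    ])

    # The Circle line loops back on itself, so it can never be a tree
    circle = nx.cycle_graph(["Paddington", "Baker Street", "King's Cross",
                             "Liverpool Street", "Tower Hill", "Victoria"])

    print(nx.is_tree(taxonomy))   # True
    print(nx.is_tree(circle))     # False: the cycle rules it out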

We shouldn’t think of ditching taxonomies; they are excellent for some information management jobs. Ontologies are superior in some applications, but not all; the ideal is to get them working together. It would be a good thought experiment for the table groups to consider which parts of our lives and jobs are better suited to taxonomic approaches, and which would be better served by graphs and ontologies. And we should think about the vast amounts of data out there in the public domain, and whether our enterprises might benefit from harnessing those resources.


Discussion

Following NetIKX tradition, after a break for refreshments, people again settled down into small table groups. We asked participants to discuss what they had heard and to identify either issues they thought worth raising, or things that they would like to know more about.

I was chairing the session, and I pointed out that even if we didn’t have time in subsequent discussion to feed everyone’s curiosity, I would do my best to research supplementary information to add to this account which you are reading.

I ran the audio recorder during the plenary discussion, so even though I was not party to what the table groups had discussed internally, I can report with some accuracy what came out of the session. Because the contributions jumped about a bit from topic to topic, I have resequenced them to make them easier for the reader to follow.

AI vs Linked Data and ontologies?

Steve Dale wondered whether these efforts to compile graph databases and ontologies were worth it, as he believed Artificial Intelligence is reaching the point where a computer can be thrown all sorts of data – structured and unstructured – and left to figure it out for itself through machine learning algorithms. Later, Stuart Ward expressed a similar opinion: speaking as a business person, not a software wizard, he wondered whether there was anything he actually needed to design.

Conrad, in fielding this question, mentioned that the table he had been on (which included Dave Clarke) had looked further at the use of Natural Language Understanding in Dave’s examples; that is itself a kind of AI component. But they had also discussed the example of the Hieronymus Bosch painting. Dave himself undertook the background research for this and had to swot up by reading a score of scholarly books. In Conrad’s opinion, we would have to wait another millennium before we had an AI able to trace the symbolism in Bosch’s visual world. Someone else wondered how one strikes the right balance between the contributions of AI and human effort.

Later, Dave Clarke returned to the question; in his opinion, AI is heavily hyped – though if you want investment, it’s a good buzz-word to throw about! So-called Artificial Intelligence works very well in certain domains, such as pattern recognition, and even with images (example: face recognition in many cameras). But AI is appalling at semantics. At Synaptica, they believe that if you want to create applications using machine intelligence, you must structure your data. Metadata and ontologies are the enablers for smart applications.

Dion responded to Stuart’s question by saying that it would be logical at least to define what your entities are – or rather, what counts as an entity – so that software can identify entities and distinguish them from relationships. Conrad said that the ‘predicates’ (relationships) also need defining, and in the Linked Data model this can be assisted if you link out to publicly available schemas.

Dave added that, these days, in the Linked Data world, it has become pretty easy to adapt your database structures as you go along. Compared to the pain and disruption of trying to modify a relational database, it is easy to add new types of data and new types of query to a Linked Data model, making the initial design process less traumatic and protracted.

Graph databases vs Linked Open Data?

Conrad asked Dave to clarify a remark he had made at table level about the capabilities of a graph database product like Neo4j, compared with Linked Open Data implementations.

Dave explained that Neo4j is indeed a graph database system, but it is not an RDF database or a Linked Data database. When Synaptica started to move from their prior focus on relational databases towards graphical databases, Dave became excited about Neo4j (at first). They got it in, and found it was a wonderfully easy system to develop with. However, because its method of data modelling is not based on RDF, Neo4j was not going to be a solution for working with Linked Data; and so fervently did Dave believe that the future is about sharing knowledge, he pulled the plug on their Neo4j development.

He added that he has no particular axe to grind about which RDF database they should use, but it has to be RDF-conforming. There are both proprietary systems (from Oracle, IBM DB2, OntoText GraphDB, MarkLogic) and open-source systems (3store, ARC2, Apache Jena, RDFLib). He has found that the open-source systems can get you so far, but for large-scale implementations one generally has to dip into the coffers and buy a licence for something heavyweight.

Even if your organisation has no intention to publish data, designing and building as Linked Data lets you support smart data and machine reasoning, and benefit from data imported from Linked Open Data external resources.

Conrad asked Dion to say more about his experiences with graph databases. He said that he had approached Tableau, who had provided him with sample software and sample datasets. He hadn’t yet had a chance to engage with them, but would be very happy to report back on what he learned.

Privacy and data protection

Clare Parry raised issues of privacy and data protection. You may have information in your own dataset that does not give much information about people, and you may be compliant with all the data protection legislation. However, if you pull in data from other datasets, and combine them, you could end up inferring quite a lot more information about an individual.

(I suppose the answer here is to do with controlling which kinds of datasets are allowed to be open. We are on all manner of databases, sometimes without suspecting it. A motor car’s registration details are held by DVLA, and Transport for London; the police and TfL use ANPR technology to tie vehicles to locations; our banks have details of our debit card transactions and, if we use those cards to pay for bus journeys, that also geolocates us. These are examples of datasets that by ‘triangulation’ could identify more about us than we would like.)

URI, URL, URN

Graham Robertson reported that on his table they discussed what the difference is between URLs and URIs…

(If I may attempt an explanation: the wider term is URI, Uniform Resource Identifier. It is ‘uniform’ because everybody is supposed to use it the same way, and it is supposed uniquely and unambiguously to identify anything which might be called a ‘resource’. The Uniform Resource Locator (URL) is the most common sub-type of URI, which says where a resource can be found on the Web.

But there can be other kinds of resource identifiers: the URN (Uniform Resource Name) identifies a resource that can be referenced within a controlled namespace. Wikipedia gives as an example ISBN 0-486-27557-4, which refers to a specific edition of Shakespeare’s Romeo and Juliet. In the MeSH schema of medical subject headings, the code D004617 refers to ‘embolism’.)

Trustworthiness

Some people had discussed the issue of the trustworthiness of external data sources to which one might link – Wikipedia (and Wikidata and DBpedia) among them – and Conrad later asked Mandy to say more about this. She wondered about the wisdom of relying on data which you can’t verify, and which may have been crowdsourced. But Dave pointed out that you might have alternative authorities that you can point to. Conrad thought that for some serious applications one would want to consult experts, which is how the Getty AAT has been built up. Knowing provenance, added David Penfold, is very important.

The librarians ask: ontologies vs taxonomies?

Rob Rosset’s table was awash with librarians, who tend to have a clear understanding of what a taxonomy is and what an ontology is. How, he asked, did Dave Clarke see the distinction?

Dave referred back to his closing three slides. The organisational chart he had shown is a strict hierarchy, and that is how taxonomies are structured. The diagram of the Tree of Life is an interesting hybrid, because it is both taxonomic and ontological in nature. There are things that mammals have in common, related characteristics, which are different from what other groupings such as reptiles would have.

But we shouldn’t think about abandoning taxonomy in favour of ontology. There will be times where you want to explore things top-down (taxonomically), and other cases where you might want to explore things from different directions.

What is nice about Linked Data is that it is built on standards that support these things. In the W3C world, there is the SKOS standard, Simple Knowledge Organization Systems, very light and simple, and there to help you build a taxonomy. And then there is OWL, the Web Ontology Language, which will help you ascend to another level of specificity. And in fact, SKOS itself is an ontology.
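
To make the contrast tangible, a minimal SKOS fragment really is just a handful of triples. The sketch below (rdflib again; the concept-scheme URI is invented) encodes the mammals and canids example from Dave’s Tree of Life slide as a simple broader/narrower hierarchy, which is about as far as SKOS goes; expressing the inherited characteristics themselves is where OWL takes over.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.org/tree-of-life/")   # hypothetical concept scheme

    g = Graph()
    g.bind("skos", SKOS)

    for concept, label in [(EX.mammals, "Mammals"),
                           (EX.canids, "Canids"),
                           (EX.dogs, "Dogs")]:
        g.add((concept, RDF.type, SKOS.Concept))
        g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))

    # SKOS captures the hierarchy, but not the shared characteristics themselves
    g.add((EX.canids, SKOS.broader, EX.mammals))
    g.add((EX.dogs, SKOS.broader, EX.canids))

    print(g.serialize(format="turtle"))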

Closing thoughts and resources

This afternoon was a useful and lively introduction to the overlapping concepts of Graph Databases and Linked Data, and I hope that the above account helps refresh the memories of those who attended, and engage the minds of those who didn’t. Please note that in writing this I have ‘smuggled in’ additionally-researched explanations and examples, to help clarify matters.

Later in the year, NetIKX is planning a meeting all about Ontologies, which will be a way to look at these information and knowledge management approaches from a different direction. Readers may also like to read my illustrated account of a lecture on Ontologies and the Semantic Web, which was given by Professor Ian Horrocks to a British Computer Society audience in 2005. That is still available as a PDF from http://www.conradiator.com/resources/pdf/Horrocks_needham2005.pdf

Ontologies, taxonomies and knowledge organisation systems are meat and drink to the UK Chapter of the International Society for Knowledge Organization (ISKO UK), and in September 2010 ISKO UK held a full day conference on Linked Data: the future of knowledge organization on the Web. There were nine speakers and a closing panel session, and the audio recordings are all available on the ISKO UK Web site, at http://www.iskouk.org/content/linked-data-future-knowledge-organization-web

Recently, the Neo4j team produced a book by Ian Robinson, Jim Webber and Emil Eifrem called ‘Graph Databases’, which is available for free (PDF, Kindle etc.) from https://neo4j.com/graph-databases-book/ or in dead-tree form from O’Reilly. See https://www.amazon.co.uk/Graph-Databases-Ian-Robinson/dp/1449356265

Hillsborough : Information Work and Helping People – July 21 2015

Jan Parry, CILIP’s President, gave a talk to NetIKX at the British Dental Association on the Hillsborough Disaster of 1989 and her role with the Hillsborough Independent Panel, set up in 2009 to oversee the release of documents arising from the tragedy, in which 96 people lost their lives at an FA Cup semi-final between Liverpool and Nottingham Forest held at Hillsborough, the home ground of Sheffield Wednesday FC. It was a very thought-provoking talk, which was received in near silence. Jan began by outlining the earlier signals of potential disaster that had occurred in the 1980s, when serious ‘crushing’ incidents took place in the ‘pens’, the standing areas in front of the West Stand accessed by gates on Leppings Lane. She then talked about the day in question: Liverpool fans arrived late after being delayed by roadworks on the M62; traffic flowed along Leppings Lane until 38 minutes before kick-off; and there were no managed queues at the turnstiles. The pens were full 10 minutes before the match started, and there was a lack of signs and stewarding to direct fans to other standing areas. At 3:00pm crowds were still outside the turnstiles, and the police Chief Superintendent in charge, who had been appointed to oversee policing on the day only a little before the semi-final itself, gave an order to open the gates. There was a rush of fans towards the pens; people at the front were pushed forward, and crushing and fatalities took place quickly. At 3:06pm the game was stopped. A Police Control Box meeting followed at 3:15pm. The gymnasium became a temporary mortuary and witness statements started to be taken.

Official investigations began: Lord Justice Taylor (1990); West Midlands Police investigated South Yorkshire Police (1990); the Inquest (1990); the Lord Justice Stuart-Smith Scrutiny (1998). At the 20th anniversary memorial, Andy Burnham (then a government minister) called for the early release of all documents, and the Hillsborough Independent Panel was set up. Jan’s role was to undertake research and family disclosure: overseeing document discovery, managing information and consulting the families. It began with finding family information; there were three established groups of families, as well as all the other families.

There were lots of issues. Significantly, the tragedy had had a big impact on the mental health of the families involved. As for the documents, real persuasion was needed to obtain them. Once obtained, they had to be scanned, digitised, catalogued and redacted on a secure system, which called for researchers with medical knowledge too. What came out of this great exercise?

In essence: the last valid safety certificate for the football stadium had been issued in 1979; the code word for a “major incident” was never used; there was poor communication between ALL agencies; there was minimal medical treatment at the ground; witness statements had been changed; and information on “The Sun’s” notorious leading article was obtained. Having achieved so much, a disclosure day was put in the calendar: 12th September 2012. Again, the families were put first, and they were informed that 41 victims could have lived.

On Disclosure Day itself PM David Cameron publicly apologised for the tragedy. The report was put on the website; note that this website is a permanent archive for the documents: http://hillsborough.independent.gov.uk Disclosure had quite an impact: Sir Norman Bettison (a South Yorkshire Police officer at the time of the tragedy, and later a Chief Constable) resigned, and the original inquests were quashed. Now there are new inquests and inquiries. Lord Justice Goldring started a new inquest in March 2014. There is an IPCC investigation and a police investigation into misconduct or criminal behaviour by police officers after the tragedy. The Coroners Rules 1984 have been tightened up regarding consistency of classes of documents. Crucially, for the families and for information professionals, records discovery and information management delivered the truth.

Jan showed a couple of video clips during her talk; these are available from the Report pages online, but you need to scroll down to the bottom of the page:

http://hillsborough.independent.gov.uk/report/main-section/part-1/page-4/

http://hillsborough.independent.gov.uk/report/main-section/part-1/page-7/

Rob Rosset

Seek and you will find? Wednesday 18th March 2015

We had two excellent speakers for our seminar on 18th March, entitled “Seek and you will find?”: Karen Blakeman and Tony Hirst. The question mark in the title was deliberate, since the underlying message was that search and discovery might sometimes throw up the unexpected.

Learning objectives for the day were:

  • To understand the commercial, social and regulatory influences that have shaped (or will shape) Google search engine results.
  • To be able to apply new search behaviours that will improve accuracy and relevance of search results.
  • To appreciate data mining and data discovery techniques and the risks involved in using them, as well as the education and skills required for their disciplined and ethical use.

Karen Blakeman delivered an informative and thought-provoking talk about our possibly misplaced reliance on Google search results. She discussed how Google is undergoing major changes in the way it analyses our searches and presents results, which are influenced by what we’ve searched for previously and information pulled from our social media circles. She also covered how EU regulations are dictating what the likes of Google can and cannot display in their results.

Amongst the many examples Karen gave of imperfect search results, one about Henry VIII’s wives stood out: note the image for Jane Seymour, where Google has sourced a picture of the actress Jane Seymour rather than the Tudor queen.

Screenshot of Google’s search results for Henry VIII’s wives, showing an image of the actress Jane Seymour

This is an obvious and easily spotted error; others are far subtler, and probably go unnoticed by the vast majority of search users. The problem, as Karen explained, is that Google does not always provide attribution for where it sources its results, and where attribution is provided, the user must (or should) decide whether it is a reliable or authoritative source. Users beware if searching for medical or allergy symptoms: the sources can be arbitrary and not necessarily authoritative medical websites. It would appear that Google’s algorithms decide what is scientific fact and what is aggregated opinion!

The clear message was to use Google as a filter to point us to likely answers to our queries, but to apply more detailed analysis of the search results before assuming the information is correct.

Karen’s slides are available at:  http://www.rba.co.uk/as/

Tony Hirst gave us an introduction to the world of data analytics and data visualisation, and the challenges of abstracting meaning from large datasets. Techniques such as data mining and knowledge discovery in databases (KDD) use machine learning and powerful statistics to help us discover new insights from ever-larger datasets. Tony gave us an insight into some of the analytical techniques and the risks associated with using them. In particular, if we leave decision-making up to machines and the algorithms inside them, are we introducing new forms of bias that human decision-makers might avoid? What do we, as practitioners, need to know in order to use these tools in a responsible way?

As Tony explained, the most effective data analysis comes down to discovering relationships and patterns that would otherwise be missed by looking at just one dataset in isolation, or analysing data in ranked lists.  Multifaceted data analysis, using – for example – datasets applied to maps, can give unique visualisations and more insightful sense making.

Amongst many other techniques, Tony discussed Concordance Correlation, Lexical Dispersion, Partial (Fuzzy) String Matching and Anscombe’s Quartet.

Tony’s slides will be available at: http://www.slideshare.net/psychemedia

Following the keynote presentations from Karen and Tony, the following questions were put to the delegates:

  • How can organisations ensure their staff are using (external) search engines effectively?
  • How do you determine the value of search in terms of accuracy, time, and cost?
  • If I wanted to know how to use data visualisation and data analysis tools, where do I go? Who do I ask?

 

The delegates moved into three groups to discuss and respond to these questions (one group per question). The plenary feedback was as follows:

Group 1 – How can organisations ensure their staff are using (external) search engines effectively?

  • Ban them from using Google
  • More training
  • Employ specialists to do research
  • Use subscription services
  • Change the education system.

Group 2 – How do you determine the value of search in terms of accuracy, time, and cost?

  • Cost and Time are variable
  • Accuracy is the most important criterion
  • Differentiate between “value” and “cost”

Group 3 – If I wanted to know how to use data visualisation and data analysis tools, where do I go? Who do I ask?

Lastly, we’d like to thank our speakers and the delegates for making this such an interesting, educational and engaging seminar.

Karen Blakeman (@karenblakeman) is an independent consultant providing a wide range of organisations with training, help and advice on how to search more effectively, how to use social and collaborative tools for research, and how to assess and manage information. Prior to setting up her own company Karen worked in the pharmaceutical and healthcare industry, and for the international management consultancy group Strategic Planning Associates. Her website is at www.rba.co.uk and her blog at www.rba.co.uk/wordpress/.

Tony Hirst (@psychemedia) is a lecturer in the Department of Computing and Communications at the Open University, where he has authored course material on Artificial Intelligence and Robotics, Information Skills, and Data Analysis and Visualisation, and is a Data Storyteller with the Open Knowledge School of Data. An open data advocate and Formula One data junkie, he blogs regularly on matters relating to social network analysis, data visualisation, open education and open data policy at blog.ouseful.info

Steve Dale
20/03/15

Business Information Review is seeking a new editor

Business Information Review is seeking a new editor (or editors) to replace Val Skelton and Sandra Ward from the end of March or early April next year. By then they will have completed five years of editing, and they think it is time to hand over what is a fun, exciting and challenging role. Val and Sandra have job-shared the editorship, and Sage Publications would now like to find their replacement(s). Details of the post, which is remunerated, and how to apply for it can be found at: http://bir.sagepub.com/site/includefiles/BIR%20Call%20for%20Editor%28s%29.pdf

Val and Sandra are happy to answer queries about the post. Contact :

 

Communities of Practice for the Post Recession Environment Tuesday 16th September 2014

35 people attended this event at the British Dental Association in Wimpole Street. Our speaker was Dion Lindsay of Dion Lindsay Consulting: http://www.linkedin.com/pub/dion-lindsay/3/832/920. Dion tackled big questions in his presentation: are the principles established for successful Communities of Practice (CoPs) in the 1990s and earlier still sound today? And what new principles and good practices are emerging as social media and other channels of communication become part of the operational infrastructure that we all inhabit? Dion started off with a couple of definitions and explained the characteristics of CoPs. In essence it begins with ‘practice’: practitioners who discuss and post about practical problems, who suggest solutions and who develop practice. These solutions are at the practical level, and hence competence at individual and corporate level is increased. It continues with collaboration, the development of competence in an environment short of money! He instanced the Motor Neurone Disease Association (MNDA), where he had developed an electronic discussion board in the 1990s. In 1998 this discussion board was taken over by University College London (UCL) and became an electronic discussion forum, which accumulated 40,000 posts. An analysis showed that the forum’s posts split 80% moral support and 20% problem solving.

How about Communities of Interest (CoIs)? These are all about people who share an identity: they have a shared voice and conduct a shared activity, so ‘identity’ is a critical characteristic. There is also an ongoing discussion about interests, an ongoing organisation of events, and an interest in problems and solutions. This can take place in the workplace or in the public arena. Now to differentiate CoPs from CoIs: CoPs get most attention in the workplace, whereas CoIs do their most serious work detached from the workplace. There is a dearth of literature on this.

Dion then set out success factors for CoPs:

  • A successful CoP must be a physical community.
  • A successful CoP must not have management setting the agenda.
  • To be successful, CoPs must have recognisable outcomes.
  • Treat CoP discussions as conversations.

Taking just the recognisable-outcomes aspect, it is necessary to emphasise that ‘the knowledge as it is created must be communicated’. In around 2005, Shell and the MNDA reported similar findings in creating a knowledge base from CoP outcomes: cost 20% (30%) and value 85% (90%), compared with standard knowledge-base figures of cost 80% (70%) and value 15% (10%). These figures speak for themselves. So we can sum up the reasons for a revival of interest in CoPs as follows: cost pressure on training and formal means of development in the workplace; collaboration and social media are accustoming organisations to non-structured working; the need to find ways of keeping employees engaged; and the technology for discussion forums is less of a challenge than it used to be.

Dion concluded his talk by saying that ‘you really have to want to do it’ to run a successful CoP. There is a benefit in commencing; there must be proper facilitation; and there must be adherence to best management practice. A CoP is, in reality, a ‘Community of Commitment’, and it fits in very well indeed with project management.

Graham Robertson – a NetIKX ManCom member – then gave a brief history of NetIKX, going back many years to when it started up at Aslib. Lissi Corfield – another NetIKX ManCom member – spoke about our current ideas at NetIKX for taking things forward, as people are not coming along to meetings as frequently as they used to. She talked about building Information Management and Knowledge Management resources on the website, and about publicising and, indeed, interacting with our group on LinkedIn. Both Graham and Lissi are practitioners in Knowledge Management.

Under Lissi’s supervision we then broke up into syndicate sessions, at the close of which each syndicate reported back to the meeting. The main points are highlighted below.

Syndicate 1 : How to gain management support for CoPs – the fears and successes.

  • Fear may be seen as presenting formal advice.
  • Encourage openness with no anonymity.
  • Resource of sharing policy together.
  • Each table is its own CoP.

Syndicate 2 : How do you become involved in existing CoPs? Should you bother?

  • Senior actors are already connected.
  • Impose / grow organically.
  • Cross organisation / grows out of a need.
  • Can we learn from Quality Circles?

Syndicate 3 : What is a good moderator?

  • Challenging
  • Active/passive
  • Online/in person
  • CoP/CoI
  • Ground rules
  • FAQs/steering friendly discussion
  • Energy
  • LinkedIn

Syndicate 4 : Developing IM and KM resources for the NetIKX website

Valuable contributions were made by David Penfold, Martin Newman and Conrad Taylor.

Robert Rosset added to the wiki page of the website suggestions of individuals and organisations from whom NetIKX had learned. Rather like potter’s clay, this needs to be worked into shape. An ounce of practice is worth a ton of theory.

Rob Rosset 22/09/15