This is a three-part appetizer to introduce linked library data. The appetizers are based on two presentations that I gave at the University of Waterloo: the first to public services staff and the second to university (not library) IT staff.
- Part 1 – Introducing Linked Data (Without All the Techie Stuff!)
- Part 2 – Introducing Resource Description Framework (RDF)
- Part 3 – The Potential for Linked Data in Libraries, Archives and Museums
Why should you read this appetizer? Mentions of linked data (linked library data, linked open data, etc.) are increasing at conferences and in webinars, tweets, blogs and articles, and you may want to know what they refer to!
In part 1, I will introduce you to the topic, but stick to concepts rather than the technical side of how it is modelled and accomplished. I will be using (most of) the slides from “Introducing Linked Data.”
Defining Linked Data
One definition of linked data can be found on the linked data Wikipedia page.
There are a few points to highlight in this definition:
- data is being published so that it is available on the web
- this data is structured so that it is easier to use
- it is published using standard web technologies so that it is easier to use
- there is a difference between how humans consume information on the web and how computers consume information on the web
Human-readable vs. machine-actionable
Rather than “machine-readable,” library consultant Karen Coyle often uses the term “machine-actionable” data. I find this terminology more helpful, although it is the software that is acting rather than the machine. To think about the difference between human-readable and machine-actionable, start by looking at the Wikipedia page for Margaret Atwood. Ask yourself: what do I know by reading the information on this page?
Some of the things that an English language reader can quickly know about Margaret Atwood are:
- Margaret Atwood is a person: she looks like a person in her picture, she has a birth date and birth place, and she is an author; these are types of information we associate with being a person
- We know where and when she was born, what she does, what she has written, what her father does and that she is a voracious reader
- We assume that if we click on the “Arthur C. Clarke Award” link text it will take us to a page about that award
We can easily digest these sentences and extract meaning or information from them. The page is easy to use and understand for a person who reads English.
But what about machines?
The Classic Web
(This slide is based on the classic web diagram by Eric Miller)
In the classic web, a machine would have a hard time acting on the information on a web page in any meaningful way. It can follow links from one page to another, but it has no information about how those pages relate to each other or what information they hold about a resource. A user simply clicks on the link text and the machine takes her to a new location or resource. The two resources are joined by a generic, meaningless hyperlink. The user may assume that clicking on a link called “homepage” will take her to Margaret Atwood’s homepage, but the machine is simply going from one location to another without any semantic information.
A Linked Data Web
(This slide was inspired by a semantic web slide by Eric Miller)
In the linked data web, we give machines more information about the relationships between things. Sometimes there is no more information to give: I may just have a generic hyperlink from my page about Margaret Atwood to her homepage. However, DBpedia, the structured dataset extracted from Wikipedia, might state that Margaret Atwood is a person, that a URL linked from her Wikipedia page is the URL of her homepage, and that another linked page has Margaret Atwood as its subject.
These statements allow the machine to follow its nose: what other attributes does DBpedia know about the person Margaret Atwood? What other pages are about Margaret Atwood? What other persons in Wikipedia have a relationship to Margaret Atwood? The machine can act on the structured data.
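To make the contrast concrete, here is a minimal Python sketch, assuming a toy in-memory list of (subject, predicate, object) statements. The prefixed names (`dbpedia:`, `foaf:`, `dbo:`, `dcterms:`) stand in for full URIs, the example.org addresses are placeholders, and the helper functions are hypothetical illustrations rather than any library's API. A classic link is only a (from, to) pair; typed links let software answer "what is her homepage?" and follow its nose outward from Margaret Atwood, in both directions.

```python
from collections import deque

# Classic web: a link is just (from, to); nothing says what the target is.
classic_link = ("http://example.org/my-atwood-page", "http://example.org/atwood-homepage")

# Linked data: every link is a typed statement (subject, predicate, object).
# Prefixed names are shorthand for full URIs; example.org URLs are placeholders.
triples = [
    ("dbpedia:Margaret_Atwood", "rdf:type", "foaf:Person"),
    ("dbpedia:Margaret_Atwood", "foaf:homepage", "http://example.org/atwood-homepage"),
    ("dbpedia:Margaret_Atwood", "dbo:birthPlace", "dbpedia:Ottawa"),
    ("http://example.org/an-article", "dcterms:subject", "dbpedia:Margaret_Atwood"),
]


def objects(triples, subject, predicate):
    """Answer a question such as 'what is her homepage?'."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def neighbours(triples, node):
    """Nodes one typed link away, following links forwards and backwards."""
    for s, p, o in triples:
        if s == node:
            yield o
        elif o == node:
            yield s


def follow_your_nose(triples, start, max_hops=2):
    """Breadth-first walk; returns each reachable node with its distance in hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for nxt in neighbours(triples, node):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


print(objects(triples, "dbpedia:Margaret_Atwood", "foaf:homepage"))
print(follow_your_nose(triples, "dbpedia:Margaret_Atwood"))
```

Nothing comparable is possible with `classic_link`: it has no predicate, so there is no question to ask of it beyond "where does it go?".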
Use Structured Data
I don’t like to assume that we all know what structured data is. The way I like to show structured data is by using something that we are all familiar with to some degree – spreadsheets! Librarians love spreadsheets, right?
If you type something like 4-30-2011 into Excel, it recognizes the format and automatically changes it to a date. This is because you have used a standard, well-defined format. You can also go one step further and format a cell or row of cells yourself, telling Excel that these pieces of data are all dates. Excel then knows the rules for what it can do with dates. So you have one piece of information in a cell, you have told the machine what type of information it is, and because of that type the machine knows what can be done with it, for example how to sort it or how to calculate the number of days between two dates. It knows to treat dates differently from currency, and differently again from textual data. Textual data itself can be totally unstructured, or it might be a value in a field, for example the title field of a book record in a database.
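The same idea can be sketched outside a spreadsheet. This small Python example, assuming the month-day-year form used above, keeps the same three dates once as plain text and once as typed `date` values. As text they sort character by character, so December 2010 lands between January and April 2011; as dates they sort chronologically and support arithmetic such as counting days.

```python
from datetime import date

# The same dates recorded as plain text, in month-day-year form
as_text = ["4-30-2011", "12-1-2010", "1-15-2011"]


def parse_mdy(value: str) -> date:
    """Turn a month-day-year string such as '4-30-2011' into a typed date."""
    month, day, year = (int(part) for part in value.split("-"))
    return date(year, month, day)


as_dates = [parse_mdy(v) for v in as_text]

# Text sorts character by character: '12-1-2010' (December 2010) ends up
# after '1-15-2011' (January 2011).
print(sorted(as_text))  # → ['1-15-2011', '12-1-2010', '4-30-2011']

# Typed dates sort chronologically.
print([d.isoformat() for d in sorted(as_dates)])  # → ['2010-12-01', '2011-01-15', '2011-04-30']

# And the software knows how to calculate with them.
print((as_dates[0] - as_dates[1]).days)  # → 150
```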
Those of us who create catalogue records are used to thinking about data elements such as author, title, publisher and date. The question is: have they been recorded in such a way that the machine knows which piece of information is the title, which piece is the author, and which resource these pieces of information refer to?
Identify Your Data
To structure our data, we should identify our data elements and, when applicable, use standard formats such as standard date formats. For Margaret Atwood, for example, we need to identify the type of resource, her name, her occupation and so on as separate pieces of information rather than burying that information in long paragraphs.
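As an illustration only, here is a Python sketch of the difference between burying facts in prose and identifying them as separate elements. The field names are hypothetical, not any cataloguing standard, and the birth date is written in the ISO 8601 form (YYYY-MM-DD) so that software can read it as a date.

```python
from datetime import date

# Unstructured: the facts are buried in a sentence that only a reader can parse.
prose = ("Margaret Atwood is a Canadian novelist, born in Ottawa in 1939, "
         "who wrote The Handmaid's Tale.")

# Structured: each data element is identified separately (field names are
# illustrative, not a standard), and the date follows ISO 8601 (YYYY-MM-DD).
record = {
    "type": "Person",
    "name": "Margaret Atwood",
    "birthDate": "1939-11-18",
    "birthPlace": "Ottawa",
    "occupation": "novelist",
    "authorOf": ["The Handmaid's Tale"],
}

# Software can now answer questions directly instead of guessing from prose.
print(record["occupation"])                          # → novelist
print(date.fromisoformat(record["birthDate"]).year)  # → 1939
```

Finding her occupation in `prose` would require guessing which word plays that role; in `record` it is a lookup.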
Publish Your Data on the Web
Once our data is identified, we should make it available on the web so that it can be used with other data. For example, other resources about Margaret Atwood could retrieve variant forms of her name, and a list of selected titles she has written, from the Virtual International Authority File (VIAF). This is possible because VIAF has established a permanent identifier for the resource Margaret Atwood and provides the associated data in multiple web-friendly formats.
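One common linked data technique for offering "multiple web-friendly formats" from a single identifier is HTTP content negotiation: the client asks for the same URI but states its preferred format in the `Accept` header. The Python sketch below only builds the requests with the standard library and sends nothing. The VIAF number is a hypothetical placeholder (look up the real one yourself), and which media types VIAF actually honors is an assumption to check against its current documentation.

```python
from urllib.request import Request

# Hypothetical placeholder: substitute the real VIAF number for the person.
VIAF_ID = "00000000"
uri = f"https://viaf.org/viaf/{VIAF_ID}"

# One identifier, several representations: the client says which format it
# prefers via the HTTP Accept header. (Whether VIAF serves each one is assumed.)
FORMATS = {
    "web page for people": "text/html",
    "RDF/XML for software": "application/rdf+xml",
    "JSON for software": "application/json",
}

prepared = {label: Request(uri, headers={"Accept": mime}) for label, mime in FORMATS.items()}

for label, req in prepared.items():
    print(f"{label}: GET {req.full_url} (Accept: {req.get_header('Accept')})")
```

Passing one of these `Request` objects to `urllib.request.urlopen` would actually send it; the design point is that the identifier never changes, only the requested representation does.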
When publishing our data on the web, it is especially useful if we can build connections between our data and other datasets. A useful dataset to link to is DBpedia, the structured dataset extracted from Wikipedia, because many others also link there; you will have linked your data to a much larger universe with one mapping!
If we look at the DBpedia page for Margaret Atwood, we can see that it includes the VIAF identifier. Search the page for “VIAF” to find it.
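Why does one mapping go so far? Because DBpedia states that its Margaret Atwood is the same as the VIAF one, anything keyed by either identifier can be gathered together. This Python sketch assumes two tiny hand-made datasets: the VIAF number is a placeholder, `ex:heading` is a hypothetical property name, and the merging logic is an illustration of following `owl:sameAs` links, not how DBpedia or VIAF process data internally.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ATWOOD_DBPEDIA = "http://dbpedia.org/resource/Margaret_Atwood"
ATWOOD_VIAF = "https://viaf.org/viaf/00000000"  # hypothetical placeholder ID

# What our library catalogue knows, keyed by the VIAF identifier
# ('ex:heading' is a hypothetical property name).
library = [(ATWOOD_VIAF, "ex:heading", "Atwood, Margaret, 1939-")]

# What DBpedia says, including its own link to VIAF
dbpedia = [
    (ATWOOD_DBPEDIA, OWL_SAME_AS, ATWOOD_VIAF),
    (ATWOOD_DBPEDIA, "dbo:birthPlace", "http://dbpedia.org/resource/Ottawa"),
]


def same_as_closure(triples, start):
    """All identifiers connected to start through owl:sameAs, in either direction."""
    ids, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            for this, other in ((s, o), (o, s)):
                if this == current and other not in ids:
                    ids.add(other)
                    frontier.append(other)
    return ids


def everything_about(triples, start):
    """Statements about start or any identifier declared the same as it."""
    ids = same_as_closure(triples, start)
    return [(p, o) for s, p, o in triples if s in ids and p != OWL_SAME_AS]


# Starting from our own VIAF-keyed data, we also pick up DBpedia's statements.
for p, o in everything_about(library + dbpedia, ATWOOD_VIAF):
    print(p, o)
```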
The Famous Linked Data Cloud
How do I know that we should link to DBpedia in order to link to other things? By looking at the linked data cloud! This visualization shows the large number of connections coming in and out of DBpedia. The library-related datasets are over on the right.
Connect Your Data
So we don’t simply expose our data to the web; we make connections in various ways that will lead to other connections! For example, if we want to state that Margaret Atwood is a person, we may want to use the Person class from the Friend of a Friend (FOAF) ontology; if her place of birth is Ottawa, then instead of the text string “Ottawa” we may want to link to the Ottawa entry in the GeoNames database; if her occupation is novelist, we may want to link to the corresponding heading in the Library of Congress Subject Headings (LCSH) linked data service; and if she is the author of “The Handmaid’s Tale,” we may want to assert that this is the same resource as the one in Open Library.
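Here is a hedged Python sketch of those four connections written out as statements whose values are links rather than text strings, printed in the N-Triples line format (`<subject> <predicate> <object> .`). The RDF, OWL, FOAF and DBpedia ontology namespaces are real; the example.org identifiers, the occupation property, and the GeoNames, LCSH and Open Library numbers are hypothetical placeholders that follow each service's URI pattern.

```python
# Real vocabulary URIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DBO = "http://dbpedia.org/ontology/"

# Hypothetical local identifiers and placeholder IDs in the external datasets
ATWOOD = "http://example.org/library/person/atwood"
HANDMAIDS_TALE = "http://example.org/library/work/handmaids-tale"
OCCUPATION = "http://example.org/vocab/occupation"                 # hypothetical property
GEONAMES_OTTAWA = "http://sws.geonames.org/0000000/"               # placeholder ID
LCSH_NOVELISTS = "http://id.loc.gov/authorities/subjects/sh00000000"  # placeholder ID
OPEN_LIBRARY_WORK = "https://openlibrary.org/works/OL0000000W"     # placeholder ID

triples = [
    (ATWOOD, RDF_TYPE, FOAF_PERSON),                 # she is a foaf:Person
    (ATWOOD, DBO + "birthPlace", GEONAMES_OTTAWA),   # a link, not the string "Ottawa"
    (ATWOOD, OCCUPATION, LCSH_NOVELISTS),            # a subject heading, not free text
    (HANDMAIDS_TALE, DBO + "author", ATWOOD),
    (HANDMAIDS_TALE, OWL_SAME_AS, OPEN_LIBRARY_WORK),  # same work as in Open Library
]


def to_ntriples(triples):
    """Serialize triples whose parts are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Because every value is a URI, each line is an invitation for software to follow a link into FOAF, GeoNames, LCSH or Open Library; part 2 covers this model and its serializations properly.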
Some Technical Stuff
In part 2 of this appetizer I will give you some of the technical information around linked data, for example the use of identifiers, the RDF model and some common serializations.
Library Use Cases
Part 3 of this appetizer will go into more detail about the uses of linked data for libraries, archives and museums, but here are a few ideas to get you thinking. We could use open linked data to enrich our bibliographic and authority data, for example by pulling in biographical information to help our users disambiguate authors; we may be able to align subject vocabularies; and we can share our many unique library collections with other communities on the web.
While you are waiting you may want to start your own exploration. Here are just a few linked data resources:
- Coyle, Karen. Understanding the semantic web: bibliographic data and metadata. Chicago: American Library Association, 2010. (Library Technology Reports; v. 46, no. 1) (subscription required)
- Harper, Corey. Library linked data: tuning library metadata for the semantic web. An ALCTS webcast, March 16, 2011. (open access)
- Berners-Lee, Tim. The next web. A TED talk, February 2009. (open access)
- Heath, Tom and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. 1st ed. Morgan & Claypool, 2011. (Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1) (open access)
- Koster, Lukas. Brief Introduction to Linked Data. 2011. (open access)
Hope to see you for part 2!
I’d like to thank Corey Harper and MJ Suhonos for being patient and helpful when I have linked data questions. Also thanks to Nick Ruest and Lukas Koster who did a quick double-check of this post for me.
This content is published under the Attribution-Noncommercial-Share Alike 3.0 Unported license.