This is the second part of a three-part Appetizer on linked library data. If you are new to linked data, please take a couple of minutes to read part 1, which introduced some of the concepts behind linked data. In part 2 we will look at some of the web standards that make linked data work:
- Model data using RDF
- Identify everything using a uniform resource identifier (URI)
- Use a common web format such as XML, JSON etc.
Model data using RDF
In part 1 we talked about using structured data so that machines can make use of it. We identified several pieces of information about Margaret Atwood that I can turn into statements:
- Margaret Atwood is a person
- Margaret Atwood’s name is “Margaret Atwood”
- Margaret Atwood was born 19391118
- Margaret Atwood was born in Ottawa, Ontario
- Margaret Atwood is a novelist
- Margaret Atwood wrote “The Handmaid’s Tale”
If you look closely at the statements I have created about Margaret Atwood you can break them up into 3 parts:
- Margaret Atwood (the resource we are interested in)
- Was born in (the relationship between the resource and something else)
- Ottawa, Ontario (the thing that describes the resource)
In RDF, the model behind linked data, this statement is called a triple. Each triple, or statement, is made up of 3 pieces: subject, predicate and object. The subject is the resource we are focused on. The object is what we are saying about the resource, or our description of it. The predicate explains the relationship between the resource and what we want to say about it.
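As a minimal sketch, the statements above can be written down as three-part tuples. The names `Triple` and `statements` are only illustrative, and plain strings stand in for the identifiers discussed later in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# A few of the statements about Margaret Atwood, split into their 3 parts.
statements = [
    Triple("Margaret Atwood", "is a", "person"),
    Triple("Margaret Atwood", "was born in", "Ottawa, Ontario"),
    Triple("Margaret Atwood", "wrote", "The Handmaid's Tale"),
]

for t in statements:
    print(f"{t.subject} | {t.predicate} | {t.obj}")
```

Every statement has the same shape, which is what lets a program treat them uniformly.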
RDF Data as a Web of Data
The statements about a particular subject, Margaret Atwood, can be visualized as a web of information about that resource. The statements all relate through the subject Margaret Atwood but are otherwise independent of each other. This is unlike a database, where several pieces of information about a resource are commonly collected together in a record, for example in a MARC authority record.
The neat thing about this web of information is that an object in the Margaret Atwood web, for example Ottawa, can be the subject of another web in your data set.
For example, we may know that:
- Ottawa is the capital of Canada
- Ottawa has a population of 883,391
- Ottawa is the birthplace of Bruce Cockburn
By joining the object Ottawa from the data around Margaret Atwood to the resource Ottawa as a subject in its own right we can follow different paths to learn more about a resource. For example, we now know that Margaret Atwood was born in the capital of Canada and we gain some information about other artists who were born in the same place.
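A small sketch of that pivot, assuming the triples are held in an ordinary Python set. The predicate strings and the helper names (`objects`, `born_in_capital_of`, `others_born_in_same_place`) are made up for illustration.

```python
# Statements from the Margaret Atwood web and the Ottawa web, in one set.
triples = {
    ("Margaret Atwood", "born in", "Ottawa"),
    ("Margaret Atwood", "wrote", "The Handmaid's Tale"),
    ("Ottawa", "capital of", "Canada"),
    ("Ottawa", "birthplace of", "Bruce Cockburn"),
}


def objects(subject, predicate):
    """All objects of statements with this subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def born_in_capital_of(person):
    """Follow person -> birthplace -> country the birthplace is capital of."""
    countries = set()
    for place in objects(person, "born in"):
        countries |= objects(place, "capital of")
    return countries


def others_born_in_same_place(person):
    """Follow person -> birthplace -> other people born there."""
    others = set()
    for place in objects(person, "born in"):
        others |= objects(place, "birthplace of")
    return others - {person}


print(born_in_capital_of("Margaret Atwood"))
print(others_born_in_same_place("Margaret Atwood"))
```

The join works only because the object "Ottawa" in one statement is exactly the same value as the subject "Ottawa" in the others, which is the problem identifiers solve in the next section.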
This is nice within your own data set; you can pivot on your information from different perspectives. However, the point of linked data is to connect, or allow the potential for connection, with other data sets. For example, DBpedia makes connections to data sets such as GeoNames and the New York Times linked data pages, which allows mash-ups of descriptions about resources. How does a machine make these connections?
Identify Everything with a URI
A linked data web is not just a web of documents; it is a web of things. That is, we don’t just have a URL for a page about Margaret Atwood, we also need to identify Margaret Atwood herself as a resource. We identify resources by giving them a uniform resource identifier (URI), which is generally in the form of an HTTP URL so that a machine can go to the location and find out more about the resource, such as the resource type and relationships to other resources. Ideally the URL returns information in RDF. Using the HTTP standard and a standard RDF model allows a program to find connections to other related data without needing to know the specifics of many different application programming interfaces (APIs).
In the statement “Margaret Atwood has birth place Ottawa” we can represent each part of the statement with a URI in DBpedia (in this example I have used the human-readable URLs):
- http://dbpedia.org/page/Margaret_Atwood for Margaret Atwood
- http://dbpedia.org/ontology/birthPlace for birth place
- http://dbpedia.org/page/Ottawa for Ottawa
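To make the difference between the page and the thing concrete, here is a hedged sketch. It assumes DBpedia's convention that /page/ addresses are the human-readable pages while /resource/ addresses identify the things themselves (the Turtle sample later in this post uses the /resource/ form). The helper names are hypothetical, the code only builds a request without sending it, and whether a given server honours the Accept header is up to that server.

```python
from urllib.request import Request


def thing_uri(page_url: str) -> str:
    """Turn a DBpedia human-readable page URL into the URI for the thing."""
    return page_url.replace(
        "http://dbpedia.org/page/", "http://dbpedia.org/resource/", 1
    )


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP request that asks for RDF, not HTML."""
    return Request(uri, headers={"Accept": media_type})


req = rdf_request(thing_uri("http://dbpedia.org/page/Margaret_Atwood"))
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually fetch the description; that step is left out so the sketch runs without a network connection.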
In some cases it doesn’t make sense to create an identifier for a piece of information, for example a birth date or a title. These values are written as plain text (literals), but you can still add information that tells the program the date format or the language of the title.
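One way to picture such a value is as a string with an optional datatype or language tag attached. This is a sketch rather than any library's API: the `Literal` class and its `to_turtle` method are invented here, and the output follows the quoting style used in the Turtle sample later in this post.

```python
from dataclasses import dataclass
from typing import Optional

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass(frozen=True)
class Literal:
    """A plain value with an optional datatype URI or language tag."""
    value: str
    datatype: Optional[str] = None
    language: Optional[str] = None

    def __post_init__(self):
        # A value carries a datatype or a language tag, not both.
        if self.datatype and self.language:
            raise ValueError("use either datatype or language, not both")

    def to_turtle(self) -> str:
        """Write the literal the way it appears in a Turtle file."""
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        quoted = f'"{escaped}"'
        if self.language:
            return f"{quoted}@{self.language}"
        if self.datatype:
            return f"{quoted}^^<{self.datatype}>"
        return quoted


birth_date = Literal("1939-11-18", datatype=XSD_DATE)
name = Literal("Margaret Atwood", language="en")

print(birth_date.to_turtle())
print(name.to_turtle())
```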
While there are some formal differences between schemas, vocabularies and ontologies, I’m going to use the term vocabulary for simplicity. In the example above we stayed within the DBpedia vocabulary, but in RDF you can mix and match vocabularies in one statement. In fact, using common vocabularies is encouraged. For example, when linking the DBpedia resource “Ottawa” to the GeoNames resource “Ottawa”, DBpedia uses a property from the Web Ontology Language (OWL): owl:sameAs. This is a very commonly used relationship for linking data sets together:
Ottawa (DBpedia) is the same as Ottawa (GeoNames)
http://dbpedia.org/page/Ottawa — http://www.w3.org/2002/07/owl#sameAs — http://sws.geonames.org/6094817/
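The sketch below shows why such a link is useful: a program can follow owl:sameAs to pool what two data sets say about the same thing. Only the owl:sameAs property and the GeoNames URI come from the statement above. The DBpedia URI is written in its /resource/ form, and the two descriptive statements use made-up example.org properties and reuse facts from earlier in this post; they are not copied from either data set.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_OTTAWA = "http://dbpedia.org/resource/Ottawa"
GEONAMES_OTTAWA = "http://sws.geonames.org/6094817/"

triples = [
    (DBPEDIA_OTTAWA, SAME_AS, GEONAMES_OTTAWA),
    # Placeholder statements, one attached to each identifier.
    (DBPEDIA_OTTAWA, "http://example.org/capitalOf", "Canada"),
    (GEONAMES_OTTAWA, "http://example.org/population", "883391"),
]


def equivalents(uri):
    """Return uri plus every URI linked to it by owl:sameAs, in either direction."""
    seen = {uri}
    todo = [uri]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen


def describe(uri):
    """Collect (predicate, object) pairs for uri and all of its equivalents."""
    same = equivalents(uri)
    return {(p, o) for s, p, o in triples if s in same and p != SAME_AS}


print(sorted(describe(DBPEDIA_OTTAWA)))
```

Asking about either identifier returns the same pooled description, which is the practical meaning of "is the same as".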
Similarly, rather than defining its own property for “homepage”, DBpedia simply uses the Friend of a Friend (FOAF) vocabulary, which already has a property for homepage.
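Short names such as owl:sameAs are abbreviations: each prefix stands for a vocabulary's base URI. A rough sketch of the expansion follows, where the `PREFIXES` table and the `expand` helper are illustrative names and the base URIs are the published ones for OWL, FOAF and the DBpedia ontology.

```python
# Prefix -> base URI of the vocabulary it abbreviates.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'foaf:homepage' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return PREFIXES[prefix] + local


# Properties from three different vocabularies can sit side by side.
for short in ("dbpedia-owl:birthPlace", "owl:sameAs", "foaf:homepage"):
    print(short, "->", expand(short))
```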
Using a Standard Format
Finally, you need to publish these RDF statements in a format that is commonly used on the web. In order for programs to use these statements about things, they need to be serialized in a way that a program can understand. There are many different ways to serialize RDF, including RDF/XML, Turtle, JSON, and JSON-LD. Any of these formats should lead to the same triples. If you are on a linked data site, such as a DBpedia page or the New York Times linked data pages, look for a link to different serializations or try adding .rdf to the end of the URL.
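The claim that different formats lead to the same triples can be sketched with two toy serializers. The line format follows the N-Triples convention of one `<subject> <predicate> <object> .` statement per line, restricted here to URI-only triples. The JSON layout is an ad hoc one invented for this example (it is not JSON-LD), and all function names are illustrative.

```python
import json

triples = {
    ("http://dbpedia.org/resource/Margaret_Atwood",
     "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Ottawa"),
    ("http://dbpedia.org/resource/Ottawa",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://sws.geonames.org/6094817/"),
}


def to_ntriples(ts):
    """One '<s> <p> <o> .' line per triple (URIs only in this sketch)."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(ts))


def from_ntriples(text):
    """Read the lines written by to_ntriples back into a set of triples."""
    result = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        s, p, o, _dot = line.split()
        result.add((s[1:-1], p[1:-1], o[1:-1]))  # strip the angle brackets
    return result


def to_json(ts):
    """Ad hoc JSON layout: a list of {"s", "p", "o"} objects."""
    return json.dumps([{"s": s, "p": p, "o": o} for s, p, o in sorted(ts)])


def from_json(text):
    """Read the ad hoc JSON layout back into a set of triples."""
    return {(d["s"], d["p"], d["o"]) for d in json.loads(text)}


# Two very different-looking files, one and the same set of statements.
print(from_ntriples(to_ntriples(triples)) == from_json(to_json(triples)))
```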
Here is an abridged version of the N3/Turtle serialization of the DBpedia resource Margaret Atwood. Two statements are worth a closer look: the owl:sameAs statement and the rdf:type statement.
# Prefixes: each short name stands for a vocabulary's base URI.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix ns5: <http://data.nytimes.com/> .
@prefix ns16: <http://en.wikipedia.org/wiki/> .
# Statements in which Margaret Atwood is the object.
ns5:N10507958644473712303 owl:sameAs dbpedia:Margaret_Atwood .
ns16:Margaret_Atwood foaf:primaryTopic dbpedia:Margaret_Atwood .
# Statements in which Margaret Atwood is the subject (abridged).
dbpedia:Margaret_Atwood rdf:type dbpedia-owl:Person ;
    rdfs:label "Margaret Atwood"@en ;
    dbpedia-owl:birthDate "1939-11-18"^^xsd:date ;
    dbpedia-owl:birthPlace dbpedia:Ottawa ;
    dbpprop:placeOfBirth "Ottawa, Ontario, Canada"@en ;
    dbpprop:website <http://margaretatwood.ca/> ;
    foaf:surname "Atwood"@en .
In the owl:sameAs and rdf:type statements you can see that the New York Times resource “N10507958644473712303” is the same as the DBpedia resource Margaret_Atwood and that the DBpedia resource Margaret_Atwood has the type “Person”.
The web is a messy place, and not all data can be nicely formatted and interconnected. In the next appetizer on linked data I will talk about some of the ways linked data can be used by libraries and why libraries may want to publish their own data to the web.
Tim Berners-Lee (2009) Design Issues. Linked Data (open access)
Tim Berners-Lee (2009) The next web. A TED talk, February 2009. (open access)
Karen Coyle. Understanding the semantic web: bibliographic data and metadata. Chicago: American Library Association, 2010 (Library Technology reports ; v. 46, no. 1) (subscription required)
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. (open access)
LinkedDataTools.com (2009) Introducing Linked Data and the Semantic Web (open access)
W3C Working Group (2014) RDF 1.1 Primer.
Slide images are from a presentation I did for a University of Waterloo IST Friday morning seminar in December 2013.
Thank you to Dan Scott, Corey Harper and MJ Suhonos who all answered questions for me related to this post! I appreciate everyone’s patience. Thanks also to Dan for help with proofreading.
This content is published under the Attribution-Noncommercial-Share Alike 3.0 Unported license.