Preserving the Web of Linked Data

Miel Vander Sande, Ghent University - imec

MEPDaW @ ESWC, 3 June 2018

Preserving a
Web of Linked Data

Lessons and challenges from a fading Web

There are many sides
to preservation.

Web of Linked Data?

We are losing thousands of Alexandria libraries each day

We have lost so much of the early Web history, just as we have lost so much of early Human history.

Kalev H. Leetaru - University of Illinois

The forces of decay

Digital Preservation Business Case Toolkit http://wiki.dpconline.org/

Link Rot

Illustration by the Project Twins

Content Drift

Significant change in content within a 3-Month Period
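Content drift can be made measurable by comparing two snapshots of the same URI. The following is a minimal sketch, assuming that word-level similarity is an acceptable proxy for "significant change"; the function names and the 0.5 threshold are illustrative assumptions, not the method used in the study behind this slide.

```python
from difflib import SequenceMatcher


def similarity(old_text: str, new_text: str) -> float:
    """Return a word-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, old_text.split(), new_text.split()).ratio()


def has_drifted(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Flag drift when similarity drops below the (hypothetical) threshold."""
    return similarity(old_text, new_text) < threshold
```

A link that still resolves but now fails this check has not rotted, yet no longer says what the citing page relied on.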

Preserving a Web of Linked Data

  1. Yesterday: Web archiving strategies
  2. Today: Tools for a Web of Linked Data
  3. Tomorrow: Things to keep in mind

Strategies

Snapshot

Web archive

See: OpenWayback

Versioning systems

See: MediaWiki

Transactional

See: SiteStory Apache plugin

If a representation
changes and nobody is
around to see it,
should it be archived?
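The idea behind transactional archiving, storing a representation only when it is actually served, can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical and do not reflect how SiteStory is implemented.

```python
import hashlib
from datetime import datetime, timezone


class TransactionalArchive:
    """Keeps a new version only when a served representation has changed."""

    def __init__(self):
        # uri -> list of (served_at, digest, body), oldest first
        self.versions = {}

    def record(self, uri: str, body: bytes, served_at: datetime) -> bool:
        """Call on every response; returns True if a new version was stored."""
        history = self.versions.setdefault(uri, [])
        digest = hashlib.sha256(body).hexdigest()
        if history and history[-1][1] == digest:
            return False  # same representation as last time it was served
        history.append((served_at, digest, body))
        return True
```

Under this policy, a change that nobody requests is never recorded, which is exactly the trade-off the question above raises.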

Notification-based

Memento: travelling to the Web of the Past

https://tools.ietf.org/html/rfc7089
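In Memento, a client asks a TimeGate for the version of a resource around a given moment by sending an Accept-Datetime header. The sketch below builds that header and shows one possible selection policy (closest in time); the helper names are illustrative, and RFC 7089 leaves the exact selection policy to the server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(moment: datetime) -> dict:
    """Build the Accept-Datetime request header (expects an aware datetime)."""
    in_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(in_utc, usegmt=True)}


def select_memento(mementos, moment: datetime):
    """Pick the (datetime, uri) pair closest to `moment`.

    Closest-in-time is an assumed policy for illustration only.
    """
    return min(mementos, key=lambda m: abs((m[0] - moment).total_seconds()))
```

For example, requesting the state of a resource on the day of this talk yields the header value "Sun, 03 Jun 2018 12:00:00 GMT" for noon UTC.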

Preserving a Web of Linked Data

  1. Yesterday: Web archiving strategies
  2. Today: Tools for a Web of Linked Data
  3. Tomorrow: Things to keep in mind

Archive or Archiving?

Linked Data archiving as the product

Technical
(Increasingly) Popular research tracks.

Linked Data archiving as the process

Technical, as well as Infrastructural & Societal.
Rather unknown territory (but there are technologies).

What assumptions are there about data evolution?

Decay becomes more complex

Study these issues within Linked Data

Archiving for the
Reproducibility of Query results

Federated querying is highly affected

How to shape a decentralized Quality of Service?

The Hyperlink is the simplest form of decentralization,
which we are already failing to preserve.

Persistent Identification

Figure by Herbert Van de Sompel

Persistent Identification

Possibly replacing one potential link rot problem with another

Who are you to tell me my URI is not persistent?

ISWC Resources track:

Consensus on and trust in persistence in a decentralized Web:
community-driven? standardization? blockchain? ...

Robust Links


	<a href="B"
	data-versionurl="URL of snapshot of B"
	data-versiondate="datetime of snapshot of B">link text</a>

http://robustlinks.mementoweb.org/spec/
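A consumer can recover the fallback information from such links with an ordinary HTML parser. The sketch below collects the data-versionurl and data-versiondate attributes shown above; the class name and the shape of the resulting dictionaries are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobustLinkParser(HTMLParser):
    """Collects <a> elements that carry Robust Links attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "data-versionurl" in attributes or "data-versiondate" in attributes:
            self.links.append({
                "href": attributes.get("href"),
                "version_url": attributes.get("data-versionurl"),
                "version_date": attributes.get("data-versiondate"),
            })
```

When the original href no longer resolves, a client could fall back to version_url, or use version_date to look up a Memento of the original URI.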

Robust Links

Open Annotation
& Memento vocab

Can be linked
to PROV

Figure by Herbert Van de Sompel

Open challenges with Memento

Real-time data
HTTP datetime format has only one-second granularity
Parallel truths
No solution for accessing Versioned Data
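The granularity issue is easy to demonstrate: two versions created within the same second map to the same HTTP date, so datetime negotiation cannot tell them apart. A minimal sketch, with illustrative helper names:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format an aware UTC datetime as an HTTP date (sub-second part is lost)."""
    return format_datetime(moment, usegmt=True)


def indistinguishable(first: datetime, second: datetime) -> bool:
    """True if two distinct moments collapse to the same HTTP date."""
    return first != second and http_date(first) == http_date(second)
```

For real-time data such as sensor streams, several versions per second are common, so this collision is the normal case rather than an edge case.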

Who will be responsible for archiving?

Snapshot

Versioning systems

Web
MediaWiki
RDF
Storage: Dydra, Virtuoso, ...
Memento-supported publishing: DBpedia Wayback Machine, Linked Data Fragments Server

Hybrid: Snapshot + Versioning

Discrete snapshots + index for continuous versions

Linked Data pages
Tailr, ...
Triple Patterns
OSTRICH (offset-enabled), ...
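The hybrid strategy can be sketched as a store that keeps a full snapshot every few versions and only the added and removed triples in between. This is a toy, in-memory illustration under stated assumptions; the class name, the fixed snapshot interval, and the delta layout are hypothetical and do not describe how Tailr or OSTRICH are implemented.

```python
class HybridArchive:
    """Full snapshot every `snapshot_interval` versions, deltas in between."""

    def __init__(self, snapshot_interval: int = 3):
        self.snapshot_interval = snapshot_interval
        self.snapshots = {}  # version -> frozenset of triples
        self.deltas = {}     # version -> (added, removed) vs. previous version
        self.latest = -1
        self._current = frozenset()

    def commit(self, triples) -> int:
        """Store a new version and return its version number."""
        triples = frozenset(triples)
        self.latest += 1
        if self.latest % self.snapshot_interval == 0:
            self.snapshots[self.latest] = triples
        else:
            self.deltas[self.latest] = (triples - self._current,
                                        self._current - triples)
        self._current = triples
        return self.latest

    def materialize(self, version: int) -> frozenset:
        """Rebuild a version from the nearest earlier snapshot plus deltas."""
        if not 0 <= version <= self.latest:
            raise KeyError(version)
        base = version - version % self.snapshot_interval
        triples = set(self.snapshots[base])
        for v in range(base + 1, version + 1):
            added, removed = self.deltas[v]
            triples -= removed
            triples |= added
        return frozenset(triples)
```

The interval trades storage for lookup cost: a smaller interval stores more full copies, while a larger one means more deltas to replay per request.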

Web archive

Transactional

Notification-based

Preserving a Web of Linked Data

  1. Yesterday: Web archiving strategies
  2. Today: Tools for a Web of Linked Data
  3. Tomorrow: Things to keep in mind

Data archiving interests more than just curators & activists

For instance, data-driven journalism.

Scholarly communication, cultural heritage, legal publications, community databases (Wikipedia & Wikidata)

Archivability of Linked Data

Linked Data is in essence easier to archive.

Accessibility of content to stimulate archiving.

The content in HTML+RDFa that dokieli produces is accessible (readable) without requiring any CSS or JavaScript, ie. text-browser safe. Breaking this "rule" in future development should be considered an anti-pattern (or a bug) in dokieli.

dokieli documentation, Sarven Capadisli

Choices in the Linked Data interface
increase or decrease archivability.

Intelligent Server
High resource granularity
Intelligent Client
Data not as accessible
Need to participate in archiving process

Avoid repeating past mistakes in standardization

Preserving a Web of Linked Data

  1. Yesterday: Web archiving strategies
  2. Today: Tools for a Web of Linked Data
  3. Tomorrow: Things to keep in mind

There are many sides
to preservation.

We don't start from scratch:
many technologies are already there.

Start covering the uncovered sides.

Add archiving to the discussion.
