Memento Guide: |
| This document describes how an arbitrary resource can be characterized as being a Memento, a TimeGate or an Original Resource.
It does so by listing responses to an HTTP HEAD issued against a resource and by indicating how those responses can be interpreted
to determine the resource's type.
Use this document alongside the Introduction to Memento. Feedback is welcome on the Memento Development Group list at memento-dev@googlegroups.com. |
Is this resource a Memento, a TimeGate, or an Original Resource? |
| The datetime negotiation component of the Memento framework introduces
three resource types: Original Resource (URI-R),
Memento (URI-M), and TimeGate (URI-G). However, when stumbling upon a resource (URI-Q) out there on the Web, how
does one determine which type it is? Answering that question is important, for example, when developing Memento client applications such as browser plug-ins.
|
| This document details how the resource type can be determined by means of the HTTP headers
returned in response to an HTTP HEAD issued against URI-Q. It shows that determining the type is straightforward when resources comply with
the Memento specification, but may involve some heuristics when they don't. |
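As a starting point, the HEAD request itself can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the names `NoRedirect`, `build_head_request` and `head` are hypothetical, and redirects are deliberately not followed so that the headers of each hop can be inspected on their own.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses automatically."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def build_head_request(uri_q):
    """Build (but do not send) an HTTP HEAD request for URI-Q."""
    return urllib.request.Request(uri_q, method="HEAD")


def head(uri_q, timeout=10.0):
    """Issue a HEAD against URI-Q and return (status, headers)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_head_request(uri_q), timeout=timeout) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx, 4xx and 5xx responses arrive here;
        # their headers are still needed to determine the resource type.
        return err.code, err.headers
```

The returned headers object supports case-insensitive lookups such as `headers.get("Memento-Datetime")`.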
The following conventions are used in the remainder of this document:
|
[1] |
The resource URI-Q is a Memento for the Original Resource URI-R. |
|
|
|
The resource (URI-Q) can be unambiguously characterized as a Memento because both a Memento-Datetime
header and an HTTP Link header with a Relation Type of "original" are included
in its HTTP response headers.
The value of the Memento-Datetime header indicates the archival datetime of the resource, i.e. the datetime from which the resource has been stable and will remain stable. The "original" Link points at the Original Resource (URI-R) for which the current resource (URI-Q) is a Memento. Further HTTP Link relationships may be included in the response header from URI-Q, specifically "timegate", "timemap" and "memento" Links. |
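The test described above can be sketched in Python. This is an illustrative sketch, not a reference implementation: the helper names `link_targets` and `is_memento` are hypothetical, the Link header parsing is deliberately simplified, and the response headers are assumed to be available as a plain name/value mapping.

```python
import re


def link_targets(link_header, rel):
    """Return the URIs in a Link header value that carry the given relation type."""
    targets = []
    # Each link looks like: <uri>; rel="original"; other="params"
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        m = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.I)
        # A rel value may hold several space-separated relation types.
        if m and rel in (m.group(1) or m.group(2) or "").lower().split():
            targets.append(uri)
    return targets


def is_memento(headers):
    """True when both Memento-Datetime and an "original" Link are present."""
    h = {name.lower(): value for name, value in headers.items()}
    return "memento-datetime" in h and bool(link_targets(h.get("link", ""), "original"))
```

When `is_memento` returns true, `link_targets(link_value, "original")` yields the URI-R of the Original Resource.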
|
For example, the resource http://mementoarchive.lanl.gov/ta/20100320000003/http://lanlsource.lanl.gov/hello is a Memento for the Original Resource
http://lanlsource.lanl.gov/hello. This is the HTTP response header that allows the resource to be recognized as a Memento: |
[2] |
The resource URI-Q is both a Memento and an Original Resource. |
|
|
|
As per [1] above, the resource (URI-Q) can unambiguously be characterized as a Memento because it has the necessary HTTP response headers.
In this particular case, however, both an "original" and a "memento" link point at URI-Q itself, indicating that URI-Q is both a Memento and an Original Resource. This happens in cases where a resource is archived/stabilized at its original Web location and may not necessarily also be available at another Web location for archival purposes. The presence of the Memento-Datetime header entails a promise by the custodian of the resource that the resource will not undergo any changes beyond the datetime expressed as the header's value. |
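A hedged Python sketch of this special case follows. The names `link_targets` and `is_memento_and_original` are hypothetical, the Link parsing is simplified, and URIs are compared as exact strings, which assumes that the server writes URI-Q in the same form the client used.

```python
import re


def link_targets(link_header, rel):
    """Return the URIs in a Link header value that carry the given relation type."""
    targets = []
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        m = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.I)
        if m and rel in (m.group(1) or m.group(2) or "").lower().split():
            targets.append(uri)
    return targets


def is_memento_and_original(uri_q, headers):
    """True when URI-Q is a Memento whose "original" and "memento" links point at itself."""
    h = {name.lower(): value for name, value in headers.items()}
    link = h.get("link", "")
    return (
        "memento-datetime" in h
        and uri_q in link_targets(link, "original")
        and uri_q in link_targets(link, "memento")
    )
```

A client that needs to tolerate differences such as a missing trailing slash would have to normalize both URIs before comparing them.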
|
For example, the entry page for the 2008 JCDL conference at http://jcdl2008.org/ has not changed since June 24th 2008, and will not change after that date.
Hence, the resource is both an Original Resource and a Memento, and could return the HTTP response header below. Note that the response indicates that
this resource is also aware of another archived version
of itself that exists at http://jcdl.org/archived-conf-sites/jcdl2008/, and provides a "memento" link to it. |
[3] |
The resource URI-Q is a TimeGate for Original Resource URI-R. |
|
|
|
The resource (URI-Q) can be unambiguously characterized as a TimeGate because its response includes a Vary header whose values include both
"negotiate" and
"accept-datetime", which indicate that the resource
supports HTTP content negotiation in the datetime dimension.
The response header will typically also include HTTP Link headers, specifically "original", "timemap", and "memento" Links. Also, additional values may be provided in the Vary header, for example, "accept" to indicate that HTTP content negotiation is available in the media type dimension too. |
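The Vary test can be sketched in a few lines of Python. The function name `is_timegate` is hypothetical, and the headers are assumed to be available as a plain name/value mapping.

```python
def is_timegate(headers):
    """True when the Vary header lists both "negotiate" and "accept-datetime"."""
    h = {name.lower(): value for name, value in headers.items()}
    # Vary is a comma-separated list; compare its values case-insensitively.
    vary = {token.strip().lower() for token in h.get("vary", "").split(",")}
    return {"negotiate", "accept-datetime"} <= vary
```

Additional values such as "accept" do not affect the result, since only the presence of the two datetime negotiation values is tested.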
|
For example, the resource http://mementoarchive.lanl.gov/ta/timegate/http://lanlsource.lanl.gov/hello is a TimeGate for the Original Resource
http://lanlsource.lanl.gov/hello. Below is the HTTP response header that allows the resource to be recognized as a TimeGate. Note that the Location header
points at a Memento for the Original Resource, and that the Link header contains links to the Original Resource, to several Mementos, and to a TimeBundle. |
[4] |
The resource URI-Q is an intermediate resource that sits on the path from a TimeGate to a Memento for Original Resource URI-R. |
|
|
|
Sometimes, intermediate redirects are involved on the path from a TimeGate to a Memento for an Original Resource. This happens, for example, in Wayback archives
that have a set of URIs of Mementos (URI-Mi, i=1..n) for a given Original Resource, all of which have the same representation.
In that case, accessing a URI-Mi (i > 1) from that set will result
in a redirect to the URI-M1 of the temporally first Memento with this shared representation, as it is the only one the archive stores.
In cases like this, the redirecting response from URI-Mi (i > 1) to URI-M1 includes an HTTP Link header that contains a link with an "original" relation type pointing at the Original Resource (URI-R). It contains neither a Memento-Datetime header (as no Memento is served yet) nor a Vary header (as URI-Mi is not a TimeGate). In this scenario, a client can determine that it is still on a successful path towards a Memento, and can follow the redirect to the URI that is provided as the value of the Location header. |
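The decision a client makes at such an intermediate hop can be sketched as follows. This is a sketch under stated assumptions: `link_targets` and `next_hop` are hypothetical names, the Link parsing is simplified, and the status code and headers are assumed to come from a HEAD request that did not follow redirects.

```python
import re


def link_targets(link_header, rel):
    """Return the URIs in a Link header value that carry the given relation type."""
    targets = []
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        m = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.I)
        if m and rel in (m.group(1) or m.group(2) or "").lower().split():
            targets.append(uri)
    return targets


def next_hop(status, headers):
    """Return the Location to follow if this is an intermediate redirect, else None."""
    h = {name.lower(): value for name, value in headers.items()}
    vary = {token.strip().lower() for token in h.get("vary", "").split(",")}
    on_path = (
        300 <= status < 400
        and "memento-datetime" not in h          # no Memento is served yet
        and "accept-datetime" not in vary        # not a TimeGate
        and bool(link_targets(h.get("link", ""), "original"))
    )
    return h.get("location") if on_path else None
```

One design choice here is an assumption rather than something the text prescribes: instead of requiring that no Vary header is present at all, the sketch only requires that Vary does not list "accept-datetime", because servers commonly send Vary for unrelated reasons such as content encoding.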
[5] |
URI-Q is a resource that sits on the path from a TimeGate to a Memento for Original Resource URI-R, but accessing URI-Q is unsuccessful. |
|
|
|
Sometimes, error conditions occur on the path from a TimeGate to a Memento for an Original Resource. This happens, for example, when Web Archives
experience significant load and cannot respond in a timely manner to an incoming request for a Memento (URI-M).
In cases like this, the response (typically with an HTTP 504 status code) includes an HTTP Link header that contains a link with an "original" relation type pointing at the Original Resource (URI-R). It contains neither a Memento-Datetime header (as no Memento is served yet) nor a Vary header (as URI-Q is not a TimeGate). In this scenario, a client can decide to make no further attempts to access URI-Q, but rather start a new Memento cycle using the URI-R of the Original Resource that is provided in the "original" link. |
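A client-side recovery step along these lines might look like the Python sketch below. The names `link_targets` and `restart_uri` are hypothetical, the Link parsing is simplified, and treating every status code of 400 or higher as an error is an assumption; the text above only mentions 504 as the typical case.

```python
import re


def link_targets(link_header, rel):
    """Return the URIs in a Link header value that carry the given relation type."""
    targets = []
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        m = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.I)
        if m and rel in (m.group(1) or m.group(2) or "").lower().split():
            targets.append(uri)
    return targets


def restart_uri(status, headers):
    """Return the URI-R with which to start a new Memento cycle, or None."""
    h = {name.lower(): value for name, value in headers.items()}
    vary = {token.strip().lower() for token in h.get("vary", "").split(",")}
    # Only error responses that are neither a Memento nor a TimeGate qualify.
    if status < 400 or "memento-datetime" in h or "accept-datetime" in vary:
        return None
    originals = link_targets(h.get("link", ""), "original")
    return originals[0] if originals else None
```

If `restart_uri` returns a URI, the client can abandon URI-Q and begin again from that Original Resource.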
[6] |
The resource URI-Q is an Original Resource or a Memento. |
|
|
|
In a perfectly Memento-ized Web, the absence of both a Memento-Datetime header
and an "original" HTTP Link header
would be sufficient to decide that URI-Q is an Original Resource.
Unfortunately, until further notice, the Web is not perfect that way. Many Web Archives and Content Management Systems do not yet include these crucial headers for their archived/stable resources. As a result, trying to determine whether a resource that at first glance looks like an Original Resource might actually be a Memento necessarily entails imperfect heuristics. One way to approach the problem is to introduce a reference list of URI syntaxes used for Mementos by major Web Archives and Content Management Systems. This is rather imperfect and definitely not very scalable. But all of this sorrow goes away if Web Archives and Content Management Systems implement the Memento-Datetime and the "original" HTTP Link header. What are you waiting for? Ping that archive or CMS near you and get them going! |
|
For example, Web Archives that use the Wayback software use fine-grained datetime indicators in the URIs of their Mementos
(e.g. /20010911203610/), and MediaWiki installations consistently use the same query parameters in URIs of old resource versions
(e.g. oldid and title).
Such information, in combination with the base URI of these systems, can be included in a reference list. Then, it becomes possible to test
whether URI-Q matches one of the URI syntaxes in the reference list. If it does, URI-Q can be categorized as a Memento, and the value of its
Last-Modified header (if available) can be interpreted as the Memento's datetime,
which, in a perfect world, would be expressed in the
Memento-Datetime header.
|
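The reference list heuristic can be sketched in Python as follows. Everything specific in this sketch is an assumption made for illustration: the hosts `archive.example.org` and `wiki.example.org` are placeholders for real base URIs, the two pattern tests are simplified readings of the Wayback and MediaWiki conventions mentioned above, and the function names are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qs, urlparse


def wayback_like(parts):
    """Path contains a 14-digit datetime segment such as /20010911203610/."""
    return re.search(r"/\d{14}/", parts.path) is not None


def mediawiki_like(parts):
    """Query string carries both the oldid and title parameters."""
    query = parse_qs(parts.query)
    return "oldid" in query and "title" in query


# Hypothetical reference list: (base host, test applied to the parsed URI).
REFERENCE_LIST = [
    ("archive.example.org", wayback_like),
    ("wiki.example.org", mediawiki_like),
]


def looks_like_memento(uri_q):
    """True when URI-Q matches a URI syntax in the reference list."""
    parts = urlparse(uri_q)
    return any(parts.netloc == host and test(parts) for host, test in REFERENCE_LIST)


def heuristic_datetime(uri_q, headers):
    """Interpret Last-Modified as the Memento's datetime, if URI-Q looks like a Memento."""
    h = {name.lower(): value for name, value in headers.items()}
    if not looks_like_memento(uri_q) or "last-modified" not in h:
        return None
    return parsedate_to_datetime(h["last-modified"])
```

The result is only a guess: a match says that URI-Q resembles a known Memento URI syntax, not that the server has made any commitment about the resource's stability.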