For those of you interested in deploying RDF on the Web, I'd
like to draw your attention to three new proposed standards from
the IETF, "Web Linking", "Defining Well-Known URIs", and "Web
Host Metadata", that create new follow-your-nose tricks that
could be used by semantic web clients to obtain RDF connected to a
URI - RDF that presumably defines what the URI 'means' and/or
describes the thing that the URI is supposed to refer to.
Most semantic web application developers are probably familiar
with three ways to nose-follow from a URI:
1. For # URIs - for X#F, the document X tells you about <X#F>
2. When the response to GET X is a 303 - the redirect target tells
you about <X>
3. When the response to GET X is a 200 - the content may tell you
about <X>
In case 3, X refers to what I'll call a "web page" (a more
technical term is used in the TAG's
httpRange-14 resolution). One of the new RFCs extends case 3 to
situations where the RDF can't be embedded in the content, either
because the content-type doesn't provide a place to put it (e.g.
text/plain) or because for administrative reasons the content can't
be modified to include it (e.g. a web archive that has to deliver
the original bytes faithfully). The others cover this case as well
as offering improved performance in case 2.
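The three familiar cases can be sketched as a small dispatch function. This is only an illustration: the function name `describing_document` and its parameters are my own, and it takes the status code and Location value as arguments rather than performing the GET itself.

```python
from urllib.parse import urldefrag


def describing_document(uri, status=None, location=None):
    """Return (case, document_uri) for the three nose-following cases.

    `status` and `location` stand in for the result of GET on the URI;
    they are ignored for # URIs, where no request on X#F itself is needed.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Case 1: for X#F, the document X tells you about <X#F>.
        return 1, base
    if status == 303 and location:
        # Case 2: the redirect target tells you about <X>.
        return 2, location
    if status == 200:
        # Case 3: the content at X itself may tell you about <X>.
        return 3, uri
    # Anything else: none of the three cases applies.
    return None, None
```

For example, `describing_document("http://example.com/doc#me")` points at `http://example.com/doc`, while a 200 response for `http://example.com/page` points back at the page itself.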
Web pages as RDF subjects
Before getting into the new nose-following protocols, I'll
amplify case 3 above by listing a few applications of RDF in which
a web page occurs as a subject. I'll rather imprecisely call such
RDF "metadata".
- Bibliographic metadata - tools such as Zotero might be
interested in obtaining Dublin Core, BIBO, or other citation data
for the web page.
- Stability metadata - for annotation and archiving purposes it
may be useful to know whether the page's content is committed to be
stable over time (e.g. this has changing content versus this has
unchanging content). See TimBL's Generic Resources note.
- Historical and archival metadata - it is useful to have links
to other versions of a document - including future versions.
All sorts of other statements can be made about a web page, such
as a type (wiki page, blog post, etc.), SKOS concepts, links to
comments and reviews, duration of a recording, how to edit, who
controls it administratively, etc. Anything you might want to say
about a web page can be said in RDF.
Embedded metadata is easy to deploy and to access, and should be
used when possible. But while embedded metadata has the advantage
of traveling around with the content, a protocol that allows the
server responsible for the URI to provide metadata over a separate
"channel" has two advantages over embedded metadata: First, the
metadata doesn't have to be put into the content; and second, it
doesn't have to be parsed out of the content. And it's not
either/or: There is no reason not to provide metadata through both
channels when possible.
Link: header
The 'Web Linking' proposed standard defines the HTTP Link: header,
which provides
a way to communicate links rooted at the requested resource. These
links can either encode interesting information directly in the
HTTP response, or provide a link to a document that packages
metadata relevant to the resource.
In the former case, one might have:
Link: <http://xmlns.com/foaf/0.1/Document>;
rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
meaning that the request URI refers to something of type
foaf:Document. In the latter case one might have:
Link: <http://example.com/about/foo.rdf>;
rel="describedby"; type=application/rdf+xml
meaning that metadata can be found in
<http://example.com/about/foo.rdf>, and
hinting that the latter resource might have a
'representation' with media type application/rdf+xml.
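A client might pull these links out of a response along the following lines. This is a deliberately simplified sketch, not a full parser for the Link: grammar: it assumes no commas or semicolons inside quoted parameter values and no escaped quotes, and the names `parse_link_header` and `described_by` are made up for illustration.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
# One parameter: ;name="quoted value" or ;name=bare-token
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target_uri, params_dict) pairs from a Link: value."""
    links = []
    for target, params_text in LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(params_text):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def described_by(value):
    """Return the targets of all links whose rel includes 'describedby'."""
    return [
        target
        for target, params in parse_link_header(value)
        if "describedby" in params.get("rel", "").split()
    ]
```

Applied to the second header above, `described_by` would return the single URI `http://example.com/about/foo.rdf`; the first header yields a link whose rel is the rdf:type URI, which a client could read as a triple with the request URI as subject.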
Host-wide nose-following rules
The motivation for the "well-known URIs" RFC is to collect all
"well-known URIs" (analogous to "robots.txt") in a single place, a
root-level ".well-known" directory, and create a registry of them
to avoid collisions. The most pressing need comes from protocols
such as webfinger and OpenID; see
Eran Hammer-Lahav's blog post for the whole story.
For linked data, .well-known provides an opportunity for
providing metadata for web pages, as well as improving the efficiency
of obtaining RDF associated with other "slash URIs", which is
currently done using 303 responses.
Ever since the TAG's httpRange-14 decision in 2005, there have
been concerns that it takes two round trips to collect RDF associated
with a slash URI. While some might question why those complaining
aren't using hash URIs, in any case the "well-known URIs" mechanism
gives a way to reduce the number of round trips in many cases,
eliminating many GET/303 exchanges.
The trick is to obtain, for each host, a generic rule that will
transform the URI at that host that you want RDF for into the URI
of a document that carries that RDF. This generic rule is stored in
a file residing in the .well-known space at a path that is fixed
across all hosts. That is: to find RDF for http://example.com/foo,
follow these steps:
- obtain the host name, "example.com"
- form the URI with that host name and path
"/.well-known/host-meta", i.e.
"http://example.com/.well-known/host-meta" (see here)
- if not already cached, fetch the document at that URI
- in that document find a rule generically transforming
original-URI -> about-URI
- apply the rule to "http://example.com/foo" obtaining (say)
"http://example.com/about/foo"
- find RDF about "http://example.com/foo" in document
"http://example.com/about/foo"
The form of the about-URI is chosen by the particular host, e.g.
"http://example.com/foo,about" or "http://about.example.com/foo" or
whatever works best.
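The steps above can be sketched as follows. Two things here are assumptions made purely for illustration: the rule is represented as a template string with a `{path}` placeholder (the actual host-meta document has its own syntax for rules), and `fetch_rule` is a hypothetical callback standing in for fetching and parsing the host-meta document.

```python
from urllib.parse import urlsplit

HOST_META_PATH = "/.well-known/host-meta"


def host_meta_uri(uri):
    """Build the host-meta URI for the host that `uri` lives on."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}{HOST_META_PATH}"


class NoseFollower:
    def __init__(self, fetch_rule):
        # fetch_rule(host_meta_uri) -> template string (hypothetical callback).
        self.fetch_rule = fetch_rule
        self.rules = {}  # host-meta URI -> cached template

    def about_uri(self, uri):
        """Map an original URI to the URI of the document carrying its RDF."""
        key = host_meta_uri(uri)
        if key not in self.rules:
            # Fetch the rule only if it is not already cached for this host.
            self.rules[key] = self.fetch_rule(key)
        # Illustrative rule syntax: substitute the original URI's path.
        return self.rules[key].replace("{path}", urlsplit(uri).path)
```

With a rule of `http://example.com/about{path}`, `about_uri("http://example.com/foo")` gives `http://example.com/about/foo`; a host preferring the other styles mentioned above could publish `http://example.com{path},about` or `http://about.example.com{path}` instead.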
Why is this fewer round trips than using 303? Because you can
fetch and cache the generic rule once per site. The first use of
the rule still costs an extra round trip, but subsequent URIs for a
given site can be nose-followed without any extra web accesses.
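To make the arithmetic concrete, here is a rough count of HTTP requests under each approach. It assumes a cold cache, that every URI's RDF document is actually fetched, and that nothing else is cached; the function names are invented for this sketch.

```python
from urllib.parse import urlsplit


def requests_with_303(uris):
    # Each URI costs one GET that returns a 303, plus one GET of the target.
    return 2 * len(uris)


def requests_with_host_meta(uris):
    # One host-meta fetch per distinct host, then one GET per about-URI.
    hosts = {urlsplit(u).netloc for u in uris}
    return len(hosts) + len(uris)
```

For a single URI the two approaches cost the same (two requests each); for ten URIs on one host the count is 20 with 303 versus 11 with a cached host-meta rule.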
A worked example can be found here.
Next steps
As with any new protocol, figuring out exactly how to apply the
new proposed standards will require coordination and
consensus-building. For example, the choice of the "describedby"
link relation and "host-meta" well-known URI needs to be confirmed
for linked data, and agreement needs to be reached on whether multiple
Link: headers are in good taste or poor taste. (Link: and .well-known put
interesting content in a peculiarly obscure place and it might be a
good idea to limit their use.) Consideration should be given to
Larry Masinter's suggestion to use multiple relations reflecting
different attitudes the server might have regarding the various
metadata sources: For example the server may choose to announce
that it wants the Link: metadata to override any embedded metadata,
or vice versa. Agreement should be reached on the use of Link: and
host-meta with redirects (302 and so on) - personally I think it
would be a great thing as you could then use a value-added
forwarding service to provide metadata that the target host doesn't
or can't provide.
This is not a particularly heavy coordination burden; the design
odds-and-ends and implementations are all simple. The impetus might
come from inside W3C (e.g. via SWIG) or bottom-up. All we really
need to get this going are a bit of community discussion, a server,
and a cooperating client, and if the protocols actually fill a
need, they will take off.
For past TAG work on this topic, please see TAG issue 62 and the
"Uniform Access to Metadata" memo.