TOWARDS A MORE REGULAR RDF PATH MODEL Sergei Egorov A path expression is constructed from binary relations defined below; every path is, by construction, a binary relation. Path expressions can be used to check if the described relation holds between two resources, or produce "solution sets" (a solution is a pair of resources for which the relation holds). In this mode, "relaxed" relations that do not bind their left or right argument return "generic" solutions in a form of blanks, interpreted as existential assertions (cf. Prolog's unbound variables). Alternatively, path expressions may be treated as functions, mapping a set of resources to a set of resources (left-to-right or in reverse). SYNTAX CONVENTIONS u ranges over uri constants (e.g. ) b ranges over blank node constants (e.g. _:g123) l ranges over literal constants (e.g. "Foo", "Paris"@en, "45"^^xsd:int) n ranges over numerical/boolean constants (e.g. -5, 3.1415, true) c ranges over all of the above (any non-qname constants) q ranges over qname constants (e.g. foaf:Person) x, y, z, t range over resources f, g, h range over path expressions (binary relations between resources) is an RDF triple with subject x, predicate y, and object z PATH EXPRESSIONS .=q (x, y) <=> x == y == resource denoted by q (expanded to full uri) .=b (x, y) <=> x == y == resource denoted by b (blank node ID) .=u (x, y) <=> x == y == resource denoted by u (full uri) .=l (x, y) <=> x == y == literal denoted by l .=n (x, y) <=> x == y == literal denoted by n (e.g. +5 denotes "5"^^xsd:integer) s2o(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) o2s(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) s2p(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) p2s(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) o2p(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) p2o(f) (x, y) <=> there are z, t such that there is a triple and f(z, t) . (x, y) <=> x == y ^ (x, y) <=> true $ (x, y) <=> false ..f (x, y) <=> f(y, x) f/g (x, y) <=> there is t such that f(x, t) and g(t, y) f|g (x, y) <=> f(x, y) or g(x, y) f&g (x, y) <=> f(x, y) and g(x, y) [g] (x, y) <=> x == y and there is t such that g(y, t) SOME ALGEBRAIC PROPERTIES OF PATH EXPRESSIONS / is associative (f/g)/h = f/(g/h) . is neutral element for / f/. = ./f = f $ is universal element for / f/$ = $/f = $ ^ absorption ^/^ = ^ | is commutative f|g = g|f | is associative (f|g)|h = f|(g|h) | is idempotent f|f = f ^ is universal for | ^|f = ^ $ is neutral for | $|f = f & is commutative f&g = g&f & is associative (f&g)&h = f&(g&h) & is idempotent f&f = f $ is universal for & $&f = f ^ is neutral for & ^&f = f | distributes over & f|(g&h) = (f|g)&(f|h) & distributes over | f&(g|h) = (f&g)|(f&h) absorption of | over & f|(f&g) = f absorption of & over | f&(f|g) = f . is reverse-neutral ..(.) = . ^ is reverse-neutral ..(^) = ^ $ is reverse-neutral ..($) = $ .. cancellation ..(..(f)) = f .. distributes over & ..(f&g) = (..f)&(..g) .. distributes over | ..(f|g) = (..f)|(..g) .. over / ..(f/g) = (..g)/(..f) [] cancellation [[f]] = [f] [], / commutativity [f]/[g] = [g]/[f] [], / absorption [g]/[g] = [g] absorption of .. in [], / f/[..f] = f definition for . [^] = . $ is universal for [] [$] = $ .=q is reverse-neutral .=q = ..(.=q) .=c is reverse-neutral .=c = ..(.=c) o2s is s2o, reversed o2s(f) = ..s2o(f) p2s is s2p, reversed p2s(f) = ..s2p(f) o2p is p2o, reversed o2p(f) = ..p2o(f) ABBREVIATIONS AND OPERATOR PRECEDENCE There are six syntactic abbreviations, simplifying common use: q is an abbreviation for s2o(.=q) -- qname moves along s2o c is an abbreviation for .=c -- other constants filter f=q is an abbreviation for f/.=q -- postfix filter f=c is an abbreviation for f/.=c -- postfix filter * is an abbreviation for s2o(.) -- moves along s2o f[g] is an abbreviation for f/[g] -- xpath-style postfix filter These abbreviations don't introduce any ambiguities in the grammar and can be parsed by a precedence parser as if they were regular expression forms. All infix operators are associative, so they can be grouped left or right. Operator precedence is (from lowest to highest): | (infix) & (infix) / (infix) [] =q =c (posfix, abbreviation) .. (prefix) The expressions in each of the following groups produce the same parse trees (after abbreviation substitutions): foo:bar/*/..* s2o(.=foo:bar)/s2o(.)/..s2o(.) foo:bar[baz:quux]="xyzzy" foo:bar/[baz:quux]/"xyzzy" s2o(.=foo:bar)/[s2o(.=baz:quux)]/.="xyzzy" Parentneses () may be used for grouping in the usual manner. EXAMPLES We assume the following namespace definitions: @prefix rdf: . @prefix rdfs: . @prefix foaf: . "Finding" with path expressions means producing a set of resource pairs satisfying the expression (interpreted as a binary relation between two resources) 0) Find nothing $ (the $ relation is false for every pair of arguments so the solution set is empty) 1) Find everything ^ (the ^ relation is true for every pair of arguments, so, theoretically, it should start producing all possible pairs of whatever, which may not be even constrained by the contents of the RDF graph. In fact, ^'s definition makes no reference to the RDF graph! In reality, the system will return a single "generic" solution, the one that leaves both arguments unbound. This solution "represents" every possibility in a very compact form) 2) Find all subjects, objects (i.e. return pairs) * (this is an abbreviation for s2o(.) which is the relation we need) 3) Find all subjects, objects related via foaf:knows predicate foaf:knows (this is an abbreviation for s2o(.=foaf:knows)) 4) Find all objects, subjects related via foaf:knows predicate ..foaf:knows (this is an abbreviation for ..s2o(.=foaf:knows), which is equivalent to o2s(.=foaf:knows); returned solutions will be pairs) 5) Find names of all acquaintances of Dave Beckett which have a homepage "Dave Beckett"/..foaf:name/foaf:knows[foaf:homepage]/foaf:name (we start with a literal, move "backwards" to a node for which it is the value of the foaf:name property, then go "forward" along s2o axis etc.; returned solutions will have the <"Dave Beckett", other_name> form) 6) Find names, homepages of all acquaintances of Tim Berners-Lee ..foaf:name[(foaf:knows|..foaf:knows)/foaf:name="Tim Berners-Lee"]/foaf:homepage (we start with all foaf:name objects, go along o2s axis to subjects, select only those which either know or are known by someone named Tim Berners-Lee, and go along the s2o axis to fetch their home pages. The returned solutions will have the form) 7) Find all persons with their nicks [rdf:type=foaf:Person]/foaf:nick (we start with some resource having the right type, then go along the s2o axis to fetch his/her nick. The returned solutions will have the form; persons with no nick will not be included into the solution set) 8) Find names of all persons with their nicks if they have ones ..foaf:name/[rdf:type=foaf:Person]/(foaf:nick|^) (we used ^ as an alternative to s2o(.=foaf:nick) step -- in absence of nick, the corresponding solution will have its right variable left unbound. If |* were used in place of |^, the existence of some subject along s2o axis from the person would be required, and a separate solution would be generated for each such subject) TRANSLATION INTO SPARQL (?) Any path expression can be translated into an equivalent SPARQL clause by picking up two fresh names and applying the translation function T[[]](,) to the path expression and these names. the resulting clause can be turned into a complete SPARQL query by adding suitable namespace declarations, prologue, 'SELECT ?x ?y WHERE' prefix with the names chosen on the translation step, and a suitable SPARQL epilogue. The translation function T is defined by the following rules: T[[ .=q ]](?x, ?y) = FILTER { ?x = ?y && ?x = q } T[[ .=b ]](?x, ?y) = FILTER { ?x = ?y && ?x = b } T[[ .=u ]](?x, ?y) = FILTER { ?x = ?y && ?x = u } T[[ .=l ]](?x, ?y) = FILTER { ?x = ?y && ?x = l } T[[ .=n ]](?x, ?y) = FILTER { ?x = ?y && ?x = n } T[[ s2o(f) ]](?x, ?y) = { ?x ?z ?y. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ o2s(f) ]](?x, ?y) = { ?y ?z ?x. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ s2p(f) ]](?x, ?y) = { ?x ?y ?z. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ p2s(f) ]](?x, ?y) = { ?y ?x ?z. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ o2p(f) ]](?x, ?y) = { ?z ?y ?x. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ p2o(f) ]](?x, ?y) = { ?z ?x ?y. T[[ f ]](?z, ?t) } with fresh names for ?z, ?t T[[ . ]](?x, ?y) = FILTER { ?x = ?y } T[[ ^ ]](?x, ?y) = FILTER { true } T[[ $ ]](?x, ?y) = FILTER { false } T[[ ..f ]](?x, ?y) = T[[ f ]](?y, ?x) T[[ f/g ]](?x, ?y) = T[[ f ]](?x, ?t) T[[ g ]](?t, ?y) with fresh name for ?t T[[ f|g ]](?x, ?y) = { { T[[ f ]](?x, ?y) } UNION { T[[ g ]](?x, ?y) } } T[[ f&g ]](?x, ?y) = { T[[ f ]](?x, ?y) T[[ g ]](?x, ?y) } T[[ [g] ]](?x, ?y) = { FILTER { ?x = ?y } T[[ g ]](?y, ?t) } w/ fresh name for ?t FORMAL GRAMMAR path ::= orpath orpath ::= andpath ('|' andpath)* andpath ::= segpath ('&' segpath)* segpath ::= abrpath ('/' abrpath)* abrpath ::= revpath | revpath '[' path ']' | revpath '=' constant revpath ::= parpath | '..' parpath parpath ::= minpath | '(' path ')' | '[' path ']' minpath ::= '.=' constant | '.' | '^' | '$' | funcall funcall ::= name '(' arglist? ')' arglist ::= path (',' path)* constant ::= resource | blank | literal resource ::= uriref | qname blank ::= nodeid literal ::= string | string '@' language | string '^^' resource | simple simple ::= numeric | boolean uriref ::= '<' reluri '>' qname ::= prefix? ':' name nodeid ::= '_:' name reluri ::= urichar* prefix ::= (ncchar1 - '_') ncchar* name ::= ncchar1 ncchar* string ::= '"' schar* '"' numeric ::= [+-]? (integer | double | decimal) integer ::= [0-9]+ double ::= [0-9]+ '.' [0-9]* exp | '.' [0-9]+ exp | [0-9]+ exp exp ::= [eE] [+-]? [0-9]+ decimal ::= [0-9]+ '.' [0-9]* | '.' [0-9]+ boolean ::= 'true' | 'false' language ::= [a-z]+ ('-' [a-z0-9]+)* urichar ::= ([^<>'{}|^`]-[#x00-#x20]) ncchar1 ::= [A-Za-z_] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | uchar ncchar ::= ncchar1 | [0-9-] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040] schar ::= ([^#x22#x5C#xA#xD]) | echar | uchar echar ::= '\' [tbnrf\"'] uchar ::= '\' ('u' hex hex hex hex | 'U' hex hex hex hex hex hex hex hex) hex ::= [0-9A-Fa-f] IMPLEMENTATION A testbed implementation for RDF path expressions is available at http://www.lsrn.org/semweb/rdfpath.scm (runs on any R5RS Scheme). REFERENCES RDF Templating Language http://www.semanticplanet.com/2003/08/rdft/spec Versa (XPath-motivated RDF query language) http://en.wikipedia.org/wiki/Versa http://www.xml.com/pub/a/2005/07/20/versa.html RDF Path http://infomesh.net/2003/rdfpath/ N3 RDF Path http://infomesh.net/2002/notation3/#rdfpath Rio Path http://www.xulplanet.com/ndeakin/arts/reopath/rpath-templates.php Tree Hugger http://rdfweb.org/people/damian/treehugger/introduction.html RxPath http://rx4rdf.liminalzone.org/RxPath