TOWARDS A MORE REGULAR RDF PATH MODEL
Sergei Egorov
A path expression is constructed from binary relations defined below; every
path is, by construction, a binary relation. Path expressions can be used to check
if the described relation holds between two resources, or produce "solution sets"
(a solution is a pair of resources for which the relation holds). In this mode,
"relaxed" relations that do not bind their left or right argument return "generic"
solutions in a form of blanks, interpreted as existential assertions (cf. Prolog's
unbound variables). Alternatively, path expressions may be treated as functions,
mapping a set of resources to a set of resources (left-to-right or in reverse).
SYNTAX CONVENTIONS
u ranges over uri constants (e.g. )
b ranges over blank node constants (e.g. _:g123)
l ranges over literal constants (e.g. "Foo", "Paris"@en, "45"^^xsd:int)
n ranges over numerical/boolean constants (e.g. -5, 3.1415, true)
c ranges over all of the above (any non-qname constants)
q ranges over qname constants (e.g. foaf:Person)
x, y, z, t range over resources
f, g, h range over path expressions (binary relations between resources)
is an RDF triple with subject x, predicate y, and object z
PATH EXPRESSIONS
.=q (x, y) <=> x == y == resource denoted by q (expanded to full uri)
.=b (x, y) <=> x == y == resource denoted by b (blank node ID)
.=u (x, y) <=> x == y == resource denoted by u (full uri)
.=l (x, y) <=> x == y == literal denoted by l
.=n (x, y) <=> x == y == literal denoted by n (e.g. +5 denotes "5"^^xsd:integer)
s2o(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
o2s(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
s2p(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
p2s(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
o2p(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
p2o(f) (x, y) <=> there are z, t such that there is a triple and f(z, t)
. (x, y) <=> x == y
^ (x, y) <=> true
$ (x, y) <=> false
..f (x, y) <=> f(y, x)
f/g (x, y) <=> there is t such that f(x, t) and g(t, y)
f|g (x, y) <=> f(x, y) or g(x, y)
f&g (x, y) <=> f(x, y) and g(x, y)
[g] (x, y) <=> x == y and there is t such that g(y, t)
SOME ALGEBRAIC PROPERTIES OF PATH EXPRESSIONS
/ is associative (f/g)/h = f/(g/h)
. is neutral element for / f/. = ./f = f
$ is universal element for / f/$ = $/f = $
^ absorption ^/^ = ^
| is commutative f|g = g|f
| is associative (f|g)|h = f|(g|h)
| is idempotent f|f = f
^ is universal for | ^|f = ^
$ is neutral for | $|f = f
& is commutative f&g = g&f
& is associative (f&g)&h = f&(g&h)
& is idempotent f&f = f
$ is universal for & $&f = f
^ is neutral for & ^&f = f
| distributes over & f|(g&h) = (f|g)&(f|h)
& distributes over | f&(g|h) = (f&g)|(f&h)
absorption of | over & f|(f&g) = f
absorption of & over | f&(f|g) = f
. is reverse-neutral ..(.) = .
^ is reverse-neutral ..(^) = ^
$ is reverse-neutral ..($) = $
.. cancellation ..(..(f)) = f
.. distributes over & ..(f&g) = (..f)&(..g)
.. distributes over | ..(f|g) = (..f)|(..g)
.. over / ..(f/g) = (..g)/(..f)
[] cancellation [[f]] = [f]
[], / commutativity [f]/[g] = [g]/[f]
[], / absorption [g]/[g] = [g]
absorption of .. in [], / f/[..f] = f
definition for . [^] = .
$ is universal for [] [$] = $
.=q is reverse-neutral .=q = ..(.=q)
.=c is reverse-neutral .=c = ..(.=c)
o2s is s2o, reversed o2s(f) = ..s2o(f)
p2s is s2p, reversed p2s(f) = ..s2p(f)
o2p is p2o, reversed o2p(f) = ..p2o(f)
ABBREVIATIONS AND OPERATOR PRECEDENCE
There are six syntactic abbreviations, simplifying common use:
q is an abbreviation for s2o(.=q) -- qname moves along s2o
c is an abbreviation for .=c -- other constants filter
f=q is an abbreviation for f/.=q -- postfix filter
f=c is an abbreviation for f/.=c -- postfix filter
* is an abbreviation for s2o(.) -- moves along s2o
f[g] is an abbreviation for f/[g] -- xpath-style postfix filter
These abbreviations don't introduce any ambiguities in the
grammar and can be parsed by a precedence parser as if they were
regular expression forms. All infix operators are associative, so
they can be grouped left or right. Operator precedence is
(from lowest to highest):
| (infix)
& (infix)
/ (infix)
[] =q =c (posfix, abbreviation)
.. (prefix)
The expressions in each of the following groups produce the same
parse trees (after abbreviation substitutions):
foo:bar/*/..*
s2o(.=foo:bar)/s2o(.)/..s2o(.)
foo:bar[baz:quux]="xyzzy"
foo:bar/[baz:quux]/"xyzzy"
s2o(.=foo:bar)/[s2o(.=baz:quux)]/.="xyzzy"
Parentneses () may be used for grouping in the usual manner.
EXAMPLES
We assume the following namespace definitions:
@prefix rdf: .
@prefix rdfs: .
@prefix foaf: .
"Finding" with path expressions means producing a set of resource pairs satisfying
the expression (interpreted as a binary relation between two resources)
0) Find nothing
$
(the $ relation is false for every pair of arguments so the solution
set is empty)
1) Find everything
^
(the ^ relation is true for every pair of arguments, so, theoretically, it
should start producing all possible pairs of whatever, which may not be
even constrained by the contents of the RDF graph. In fact, ^'s definition
makes no reference to the RDF graph! In reality, the system will return a
single "generic" solution, the one that leaves both arguments unbound.
This solution "represents" every possibility in a very compact form)
2) Find all subjects, objects (i.e. return pairs)
*
(this is an abbreviation for s2o(.) which is the relation we need)
3) Find all subjects, objects related via foaf:knows predicate
foaf:knows
(this is an abbreviation for s2o(.=foaf:knows))
4) Find all objects, subjects related via foaf:knows predicate
..foaf:knows
(this is an abbreviation for ..s2o(.=foaf:knows), which is equivalent to
o2s(.=foaf:knows); returned solutions will be