SPARQL Query To Get Only Results With The Most Recent Date

I am learning the basics of SPARQL with a recent RDF database released by the Finnish Ministry of Justice. It contains Finnish law data.

There are statutes, which have versions, which have a date and topics. I want to get the most recent versions that have a "gun" topic. So, I wrote this:

PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv 
WHERE { 
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .

 } ORDER BY DESC(?dv)

This returns four rows covering three statutes, one of which appears twice. That statute has two versions, an older one and the current one. The other two statutes have only one version each.

How do I get rid of the older version so that I get each statute only with its most recent version? I tried using something like (MAX(?dv) AS ?ndv) and grouping by ?stat and ?vers, but this doesn't work: the four versions are all distinct, so each one ends up in its own group.

EDIT: Let me add a mock example of what happens.

The result of the original query looks like this:

stat | vers | dv
 a   | abc  |  x
 a   | cde  |  y(<x)
 b   | foo  |  z
 c   | fot  |  u

We see that statute "a" has two versions, "abc" and "cde", and the dv of version "abc" is later than the dv of version "cde". The other two statutes, "b" and "c", have only one version each, with dvs of "z" and "u".

The property of having topic "gun" is a property of vers. All the versions returned have that topic.

What I want to get is this:

stat | vers | dv
 a   | abc  |  x
 b   | foo  |  z
 c   | fot  |  u

In other words, I wish to get, for each statute, only the version with the highest or latest dv value.

PS. You are welcome to test this at http://yasgui.org/. Just type in the query and you get the result.

Answer

You can do this using a subselect, as in scotthenninger's answer, but you could also just use a filter to make sure that no result has another possible result that is more recent. In your query, that would just mean adding:

filter not exists {
  ?stat sfl:hasVersion/eli:version_date ?dv2
  filter (?dv2 > ?dv)
}

The idea is just to keep only those result rows which have a version such that there is not another version of the same statute with a more recent date. This approach is a bit more flexible in that it doesn't require a "single max-value" that you can retrieve via a subselect; it will let you keep results based on arbitrary criteria, as long as you can express them in SPARQL.
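For comparison, here is a rough sketch of what the subselect route could look like. This is my own reconstruction, not a copy of the other answer, and I have not run it against the Finlex endpoint. The helper variable ?d is a name I made up. The inner query finds the latest version date per statute, and the outer pattern then picks the version that carries that date and has the topic.

```sparql
PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv
WHERE {
   # Inner query: one row per statute, carrying its latest version date.
   # ?d is a hypothetical helper variable, visible only inside the subselect.
   {
      SELECT ?stat (MAX(?d) AS ?dv)
      WHERE {
         ?stat rdf:type sfl:Statute .
         ?stat sfl:hasVersion/eli:version_date ?d .
      }
      GROUP BY ?stat
   }
   # Outer pattern: the version that has exactly that date and the topic.
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .
}
ORDER BY DESC(?dv)
```

Note that the grouping is by ?stat only. Grouping by ?stat and ?vers, as tried in the question, puts every version in its own group, so MAX has nothing to choose between.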

I used a property path in ?stat sfl:hasVersion/eli:version_date ?dv2 instead of the longer ?stat sfl:hasVersion ?vers2 . ?vers2 eli:version_date ?dv2 because it's a bit shorter and we don't really care about the value of ?vers2 here. Here's what the query as a whole now looks like:

PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv 
WHERE { 
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .
   filter not exists {
      ?stat sfl:hasVersion/eli:version_date ?dv2
      filter (?dv2 > ?dv)
   }
 } ORDER BY DESC(?dv)
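One thing to be aware of: the inner pattern compares against every version of the statute, whether or not that version has the topic. If a statute's newest version no longer had the "Ase" topic, the statute would drop out of the results entirely. If what you want instead is the most recent version among those that do have the topic, a variant along these lines might work. It is an untested sketch, and ?vers2 is a variable I introduced for it:

```sparql
PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv
WHERE {
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .
   filter not exists {
      # Only a later version with the same topic rules out the current row.
      # ?top is already bound by the outer pattern at this point.
      ?stat sfl:hasVersion ?vers2 .
      ?vers2 eli:version_date ?dv2 .
      ?vers2 eli:is_about ?top .
      filter (?dv2 > ?dv)
   }
}
ORDER BY DESC(?dv)
```

The property path is not usable here, because the date and the topic both have to belong to the same ?vers2.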

source: stackoverflow.com