Ivan Čukić

Nepomuk: Don’t misuse

Nepomuk is a very nice shared data repository. It is an easy way to make the data from your application available to others.

But it is important to know that Nepomuk is not a general-purpose database – everything is peachy until you start using it as one, and especially once you start treating it as a relational database.

There are a few things to keep in mind when developing Nepomuk-based programs.

Working on graphs

Nepomuk is a graph database – this means that the data is not organized into tables like you’re used to, but as nodes and connections between them.

So, any query you make is not a restriction on a relation (a table) but rather a multi-join of a single subject-verb-object table (this is a somewhat simplified view).

As you probably know, a join is not a cheap operation, however well optimized it is (and Virtuoso is one of the fastest graph databases available).
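
To make the "multi-join of a single table" idea a bit more concrete, here is a minimal sketch in Python. It is not how Nepomuk or Virtuoso work internally; the data, the verb names and the helper functions are all made up for illustration. Every fact is one row in a single list, and every extra pattern in a query means another pass over that list, joined on the subject.

```python
# Toy triple store: every fact is one (subject, verb, object) row.
# All names here are hypothetical; this is not the Nepomuk API.
TRIPLES = [
    ("file1", "type", "Document"),
    ("file1", "title", "Report"),
    ("file1", "author", "alice"),
    ("file2", "type", "Document"),
    ("file2", "title", "Notes"),
    ("file2", "author", "bob"),
    ("img1", "type", "Image"),
    ("img1", "author", "alice"),
]


def subjects_with(verb, obj):
    """Scan the whole table for rows matching one (verb, object) pattern."""
    return {s for (s, v, o) in TRIPLES if v == verb and o == obj}


def match_all(patterns):
    """Join the per-pattern results on the subject, one table scan per pattern."""
    result = None
    for verb, obj in patterns:
        found = subjects_with(verb, obj)
        result = found if result is None else result & found
    return result if result is not None else set()


# "Documents written by alice" needs two patterns, so two scans and one join.
documents_by_alice = match_all([("type", "Document"), ("author", "alice")])
```

A real store uses indexes instead of full scans, but the shape of the work stays the same: the more patterns in the query, the more joins.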

D-Bus connections

The second thing is that while Nepomuk-internal connections are done via local sockets, your connection to Nepomuk goes through D-Bus, which is not the fastest kid on the block. The more requests you make, the more time it will take.

Some hints

There are some things you can do to make these issues less relevant to your applications.

Wide-table queries

One of the common ways people write queries is the following:

    select ?r where {
        ?r a something .
        ?r something else . 
        ...
    }

And then process the results one by one by doing stuff like:

    resource.getProperty(something);
    resource.getProperty(something else);
    ...

Which means your program creates a lot of queries – one main, and a couple more for each result.

Instead of making a lot of queries, it is advisable (although initially not that intuitive) to create one big query like this:

    select ?r, ?prop1, ?prop2 where {
        ?r a ?prop1 .
        ?r something ?prop2 . 
        ...
    }

It does transfer a lot of data at once, but at least it does so in a single request-response connection, and it doesn’t repeat the same query (parsing, optimizing, evaluating) multiple times for different parameters.
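
The difference in the number of requests is easy to see with a small simulation. This is only a sketch in Python: `FakeStore` and its methods are hypothetical stand-ins that count round trips, not the Nepomuk API, and the sample data is invented.

```python
class FakeStore:
    """Hypothetical stand-in for a remote store; it only counts round trips."""

    def __init__(self, rows):
        self.rows = rows      # {resource: {property: value}}
        self.requests = 0     # number of request-response round trips so far

    def list_resources(self):
        self.requests += 1
        return sorted(self.rows)

    def get_property(self, resource, prop):
        self.requests += 1
        return self.rows[resource][prop]

    def select_wide(self, props):
        # One request that returns every resource together with its properties.
        self.requests += 1
        return [(r, *(self.rows[r][p] for p in props)) for r in sorted(self.rows)]


DATA = {
    "res1": {"title": "Report", "author": "alice"},
    "res2": {"title": "Notes", "author": "bob"},
    "res3": {"title": "Draft", "author": "carol"},
}
PROPS = ["title", "author"]

# One main query, then one extra request per property of every result.
naive_store = FakeStore(DATA)
naive_rows = [
    (r, *(naive_store.get_property(r, p) for p in PROPS))
    for r in naive_store.list_resources()
]

# One wide query that fetches everything at once.
wide_store = FakeStore(DATA)
wide_rows = wide_store.select_wide(PROPS)
```

Both approaches return the same rows, but with three results and two properties the first one makes 1 + 3 × 2 = 7 requests while the wide query makes exactly one. The gap grows with the size of the result set.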

Consider storing some data locally

If you have data that doesn’t necessarily need to be shared, consider storing it in config files, an embedded database like sqlite3, or something similar.

This way, apart from skipping D-Bus, you can have faster queries in cases like these (which are not at all rare):

    select ?r where {
        ?r a someType .
        ?r property1 "value1" .
        ?r property2 "value2" .
        ...
    }

This query does a number of joins, whereas its equivalent in SQL does only filtering:

    select * from someType where
        property1 = 'value1' and
        property2 = 'value2' and
        ...
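
The same contrast can be tried out with Python's built-in `sqlite3` module. This is a rough sketch under stated assumptions: the `triples` and `some_type` tables, their columns and the sample rows are invented for the example, and a real graph store is organized quite differently from a plain SQL table of triples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, verb TEXT, object TEXT);
    CREATE TABLE some_type (id TEXT PRIMARY KEY, property1 TEXT, property2 TEXT);
""")

# The same three records, stored once as a wide table and once as triples.
rows = [("r1", "value1", "value2"), ("r2", "value1", "other"), ("r3", "other", "value2")]
conn.executemany("INSERT INTO some_type VALUES (?, ?, ?)", rows)
for rid, p1, p2 in rows:
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [(rid, "type", "someType"), (rid, "property1", p1), (rid, "property2", p2)],
    )

# Graph style: every pattern is one more self-join of the triples table.
graph_query = """
    SELECT t.subject
    FROM triples AS t
    JOIN triples AS a ON a.subject = t.subject
    JOIN triples AS b ON b.subject = t.subject
    WHERE t.verb = 'type' AND t.object = 'someType'
      AND a.verb = 'property1' AND a.object = 'value1'
      AND b.verb = 'property2' AND b.object = 'value2'
"""

# Relational style: a plain filter over the rows of one table.
relational_query = """
    SELECT id FROM some_type
    WHERE property1 = 'value1' AND property2 = 'value2'
"""

graph_result = [row[0] for row in conn.execute(graph_query)]
relational_result = [row[0] for row in conn.execute(relational_query)]
```

Both queries find the same record, but the first one needs two joins to get there, and it would need one more for every additional property in the condition.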
    

Summary

So, whatever you do (be it Nepomuk or something else), first ask, experiment, test and learn the system before using it.
