rdf_persistency.pl -- RDF persistency plugin

This module provides persistency for rdf_db.pl, using the rdf_monitor/2 predicate to track changes to the repository. Where previous versions used autosave of the whole database in the quick-load format of rdf_db, this version uses one quick-load file per source (4th argument of rdf/4) plus journalling for edit operations.

The result is safe, avoids frequent small changes to large files (which make synchronisation and backup expensive) and avoids long disruptions of the server while it performs the autosave. Only loading large files disrupts service for some time.

The persistent backup of the database is realised in a directory, using a lock file to avoid corruption due to concurrent access. Each source is represented by two files, the latest snapshot and a journal. The state is restored by loading the snapshot and replaying the journal. The predicate rdf_flush_journals/1 can be used to create fresh snapshots and delete the journals.

See also
- rdf_edit.pl
To be done
- If there is a complete `.new' snapshot and no journal, we should move the .new to the plain snapshot name as a means of recovery.
- Backup of each graph using one or two files is very costly if there are many graphs. Although the currently used subdirectories avoid hitting OS limits early, this is still not ideal. Probably we should collect (small, older?) files and combine them into a single quick load file. We could call this (similar to GIT) a `pack'.
rdf_attach_db(+Directory, +Options) is det
Start persistent operations using Directory as place to store files. There are several cases:
  • Empty DB, existing directory: Load the DB from the existing directory.
  • Full DB, empty directory: Create snapshots for all sources in directory.

Options:

access(+AccessMode)
One of auto (default), read_write or read_only. Read-only access implies that the RDF store is not locked. It is read at startup and all modifications to the data are temporary. The default auto mode is read_write if the directory is writeable and the lock can be acquired. Otherwise it reverts to read_only.
concurrency(+Jobs)
Number of threads to use for loading the initial database. If not provided, it is the number of CPUs as obtained from the flag cpu_count.
max_open_journals(+Count)
Maximum number of journals kept open. If not provided, the default is 10. See limit_fd_pool/0.
directory_levels(+Count)
Number of levels of intermediate directories for storing the graph files. Default is 2.
silent(+BoolOrBrief)
If true (default false), do not print informational messages. If brief, print minimal feedback.
log_nested_transactions(+Boolean)
If true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.
Errors
- existence_error(source_sink, Directory)
- permission_error(write, directory, Directory)
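A minimal sketch of attaching a store while handling the errors listed above. The predicate name attach_store/1 and the chosen option values are illustrative assumptions, not part of the library.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

% attach_store(+Dir) is semidet.
% Hypothetical helper: attach Dir as the persistent store. On an
% error (for example the existence or permission errors documented
% above) print a warning and fail instead of propagating it.
attach_store(Dir) :-
    catch(rdf_attach_db(Dir,
                        [ access(auto),      % documented default
                          silent(brief)      % minimal feedback
                        ]),
          error(Formal, _),
          ( print_message(warning,
                          format("Cannot attach ~w: ~p", [Dir, Formal])),
            fail
          )).
```

A call such as attach_store('RDF-store') would then load existing snapshots and journals from that directory, or start writing them there if the in-memory database is already populated.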
rdf_persistency_property(?Property) is nondet
True if Property is a property of the current persistent database. Currently makes the options passed to rdf_attach_db/2 available. Notably, rdf_persistency_property(access(read_only)) is true if the database is mounted in read-only mode. Other properties:
directory(Dir)
Directory in which the database resides.
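As an illustration, the properties can be used to check whether changes will actually be persisted. The helper names store_writable/0 and report_store/0 are assumptions made for this sketch.

```prolog
:- use_module(library(semweb/rdf_persistency)).

% store_writable is semidet.
% Hypothetical helper: true if a store is attached and it was not
% mounted in read-only mode.
store_writable :-
    rdf_current_db(_),
    \+ rdf_persistency_property(access(read_only)).

% report_store is det.
% Print the directory of the attached store, if any.
report_store :-
    (   rdf_persistency_property(directory(Dir))
    ->  format("Persistent store in ~w~n", [Dir])
    ;   format("No persistent store attached~n")
    ).
```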
rdf_detach_db is det
Detach from the current database. Succeeds silently if no database is attached. Normally called at the end of the program through at_halt/1.
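For short-lived scripts it may be convenient to pair attach and detach explicitly rather than rely on halt. The following is a sketch only; with_store/2 is a hypothetical wrapper, and it assumes no other store is attached when it is called.

```prolog
:- use_module(library(semweb/rdf_persistency)).

:- meta_predicate with_store(+, 0).

% with_store(+Dir, :Goal) is semidet.
% Hypothetical wrapper: attach Dir, run Goal once, and always
% detach afterwards, even if Goal fails or raises an exception.
with_store(Dir, Goal) :-
    setup_call_cleanup(
        rdf_attach_db(Dir, [silent(true)]),
        once(Goal),
        rdf_detach_db).
```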
rdf_current_db(?Dir)
True if Dir is the current RDF persistent database.
rdf_flush_journals(+Options)
Flush dirty journals. Options:
min_size(+KB)
Only flush if journal is over KB in size.
graph(+Graph)
Only flush the journal of Graph
To be done
- Provide a default for min_size?
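The two options might be used as follows. The 1024 KB threshold and the helper names are arbitrary choices for this sketch.

```prolog
:- use_module(library(semweb/rdf_persistency)).

% flush_large_journals is det.
% Replace journals larger than roughly 1 MB by fresh snapshots.
% The threshold is an arbitrary example value.
flush_large_journals :-
    rdf_flush_journals([min_size(1024)]).

% flush_graph(+Graph) is det.
% Flush only the journal that belongs to Graph.
flush_graph(Graph) :-
    rdf_flush_journals([graph(Graph)]).
```

Running flush_large_journals/0 periodically, for example from a maintenance job, keeps the journal replay at startup short.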
rdf_persistency(+DB, Bool)
Specify whether a database is persistent. Switching to false kills the persistent state. Switching to true creates it.
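A possible use is keeping a scratch graph out of the persistent store. This sketch assumes that DB names a graph (a source in the sense of the 4th argument of rdf/4), as the rest of this page suggests; the helper name and the example.org IRIs are placeholders.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

% make_scratch_graph(+Graph) is det.
% Hypothetical helper: mark Graph as non-persistent and add a demo
% triple to it. The triple uses placeholder IRIs.
make_scratch_graph(Graph) :-
    rdf_persistency(Graph, false),
    rdf_assert('http://example.org/s',
               'http://example.org/p',
               literal(demo),
               Graph).
```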
rdf_db:property_of_graph(?Property, +Graph) is nondet [multifile]
Extend rdf_graph_property/2 with new properties.
rdf_journal_file(+Graph, -File) is semidet
rdf_journal_file(-Graph, -File) is nondet
True if File is the name of the existing journal file for Graph.
rdf_snapshot_file(+Graph, -File) is semidet
rdf_snapshot_file(-Graph, -File) is nondet
True if File is the name of the existing snapshot file for Graph.
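Because both predicates enumerate on backtracking when Graph is unbound, they can be used to inspect what is currently on disk. The helper list_graph_files/0 below is a hypothetical example.

```prolog
:- use_module(library(semweb/rdf_persistency)).

% list_graph_files is det.
% Print every existing snapshot and journal file together with the
% graph it belongs to.
list_graph_files :-
    forall(rdf_snapshot_file(SGraph, SFile),
           format("snapshot ~w -> ~w~n", [SGraph, SFile])),
    forall(rdf_journal_file(JGraph, JFile),
           format("journal  ~w -> ~w~n", [JGraph, JFile])).
```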
rdf_db_to_file(+DB, -File) is det
rdf_db_to_file(-DB, +File) is det
Translate between database encoding (often a file or URL) and the name we store in the directory. We keep a cache for two reasons: speed and, much more importantly, the fact that the raw --> encoded mapping provided by www_form_encode/2 is not guaranteed to be unique by the W3C standards.