The library(rdf_db) module provides several hooks for 
extending its functionality. Database updates can be monitored and acted 
upon through the features described in section 
3.4. The predicate rdf_load/2 
can be hooked to deal with different formats such as Turtle, 
different input sources (e.g. HTTP) and different strategies for caching 
results.
The hooks below are used to add new RDF file formats and sources from which to load data. They are used by the modules described below, which are distributed with the package. Please examine their source code if you want to add new formats or locations.
  - library(semweb/turtle): reading and writing the Turtle format
  - library(semweb/rdf_zlib_plugin): loading gzip-compressed files
  - library(semweb/rdf_http_plugin): loading RDF from HTTP servers
  - library(http/http_ssl_plugin): combined with library(semweb/rdf_http_plugin), 
    to load RDF from HTTPS servers
  - library(semweb/rdf_persistency): persistent storage of the RDF database
  - library(semweb/rdf_cache): caching of parsed RDF files

rdf_db:rdf_open_hook(+Input, -Stream, -Format)
    Open an input. Input is one of file(+Name), 
    stream(+Stream) or url(Protocol, URL). If this 
    hook succeeds, the RDF will be read from Stream using rdf_load_stream/3. 
    Otherwise the default open functionality for files and streams is used.

rdf_db:rdf_file_type(+Extension, -Format)
    True if Format is the format of files with the given Extension 
    (e.g., the extensions xml and owl map to the format 
    xml). Format is either a built-in format (xml 
    or triples) or a format understood by the rdf_load_stream/3 
    hook.
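As a sketch of how these hooks fit together, the following hypothetical plugin registers a new format myfmt for files with extension .my. The format name and the parser my_read_triples/2 are illustrative assumptions, not part of the library:

```prolog
:- use_module(library(semweb/rdf_db)).

:- multifile
    rdf_db:rdf_file_type/2,
    rdf_db:rdf_load_stream/3.

% Files with extension .my contain the (hypothetical) format myfmt.
rdf_db:rdf_file_type(my, myfmt).

% Read triples from Stream and add them to the graph given by the
% graph(Graph) option (defaulting to user).
rdf_db:rdf_load_stream(myfmt, Stream, Options) :-
    my_read_triples(Stream, Triples),          % assumed parser
    option(graph(Graph), Options, user),
    forall(member(rdf(S,P,O), Triples),
           rdf_assert(S, P, O, Graph)).
```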
This 
module uses the library(zlib) library to load compressed 
files on the fly. The extension of the file must be .gz. 
The file format is deduced from the extension after stripping the .gz 
extension, e.g. rdf_load('file.rdf.gz').
This module allows for rdf_load('http://...'). 
It exploits the library library(http/http_open). The 
format of the URL is determined from the mime-type returned by the 
server if this is one of
text/rdf+xml, application/x-turtle or
application/turtle. As RDF mime-types are not yet widely 
supported, the plugin uses the extension of the URL if the claimed 
mime-type is not one of the above. In addition, it recognises
text/html and application/xhtml+xml, scanning 
the XML content for embedded RDF.
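For example, after loading the plugin a remote document can be loaded directly by URL (example.org is a placeholder):

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_http_plugin)).

% Load RDF over HTTP; the format is derived from the reply's
% mime-type or, failing that, from the URL's file extension.
?- rdf_load('http://example.org/data.rdf').
```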
The library library(semweb/rdf_cache) defines the 
caching strategy for triple sources. When using large RDF sources, 
caching triples greatly speeds up loading RDF documents. The cache library 
implements two caching strategies that are controlled by rdf_set_cache_options/1.
Local caching
    This approach applies to files only. Triples are 
    cached in a sub-directory of the directory holding the source. This 
    directory is called .cache (_cache on 
    Windows). If the cache option create_local_directory is true, 
    a cache directory is created if possible.

Global caching
    This approach applies to all sources except 
    unnamed streams. Triples are cached in the directory defined by the 
    cache option global_directory.
When loading an RDF file, the system scans the configured cache files 
unless cache(false) is specified as an option to rdf_load/2 
or caching is disabled. If caching is enabled but no cache exists, the 
system will try to create a cache file. First it will try to do this 
locally. On failure it will try the configured global cache.
enabled(Boolean)
    If true, caching is enabled.
local_directory(Name)
    Plain name of the local directory. 
    Default .cache (_cache on Windows).
create_local_directory(Bool)
    If true, try to create local cache directories.
global_directory(Dir)
    Writeable directory for storing cached parsed files.
create_global_directory(Bool)
    If true, try to create the global cache directory.

rdf_cache_file(+URL, +ReadWrite, -File)
    File is the cache file for URL. If ReadWrite is 
    read, it returns the name of an existing file. If write, 
    it returns where a new cache file can be overwritten or created.
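Putting the options together, a typical initialisation enabling a global cache might look like the sketch below; the directory path is illustrative:

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_cache)).

% Enable caching in a shared directory, creating it if needed.
:- rdf_set_cache_options([ enabled(true),
                           global_directory('/var/cache/rdf'),
                           create_global_directory(true)
                         ]).

% Subsequent loads create and reuse cache files; pass cache(false)
% to rdf_load/2 to bypass the cache for a particular document.
```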
The library library(semweb/rdf_litindex) exploits the 
primitives of section 4.5.1 and the 
NLP package to provide indexing on words inside literal constants. It 
also allows for fuzzy matching using stemming and 'sounds-like' based 
on the double metaphone algorithm of the NLP package.
Expansions are terms of the form sounds(Like, 
Words), stem(Like, Words) or prefix(Prefix, 
Words). On compound expressions, only combinations that provide 
literals are returned. Below is an example after loading the ULAN (Unified 
List of Artist Names from the Getty Foundation) database, 
showing all words that sound like 'rembrandt' and appear 
together in a literal with the word 'Rijn'. Finding this 
result from the 228,710 literals contained in ULAN requires 0.54 
milliseconds (AMD 1600+).
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt',
                        'Rembrand', 'Rembrandt', 'Rembrandtsz',
                        'Rembrant', 'Rembrants', 'Rijmbrand'])]
Here is another example, illustrating handling of diacritics:

?- rdf_token_expansions(case(cafe), L).
L = [case(cafe, [cafe, caf\'e])]
Tokenization first calls the hook 
rdf_litindex:tokenization(Literal, -Tokens). On failure it 
calls tokenize_atom/2 
from the NLP package and deletes the following: atoms of length 1, 
floats, integers that are out of range and the English words and, an, or, of,
on, in, this and the. 
Deletion first calls the hook rdf_litindex:exclude_from_index(token, 
X). This hook is called as follows:
no_index_token(X) :-
        exclude_from_index(token, X), !.
no_index_token(X) :-
        ...
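An application can therefore extend the exclusion list by defining the hook itself. The sketch below drops the arbitrarily chosen stop word 'via' from the index:

```prolog
:- multifile rdf_litindex:exclude_from_index/2.

% Never index the (arbitrarily chosen) stop word 'via'.
rdf_litindex:exclude_from_index(token, via).
```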
'Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 4.5. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
Typically, this module is used together with rdf_monitor/2 
on the channels new_literal and old_literal to 
maintain an index of words that appear in a literal. Further abstraction 
using Porter stemming or Metaphone can be used to create additional 
search indices. These can map either directly to the literal values, or 
indirectly to the plain word map. The SWI-Prolog NLP package provides 
complementary building blocks, such as a tokenizer, Porter stemmer and 
Double Metaphone.
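A minimal sketch of such a word index, assuming tokenize_atom/2 from library(porter_stem) in the NLP package; keeping the map in a global variable named word_map is purely illustrative:

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(porter_stem)).    % tokenize_atom/2 (NLP package)

% Create a literal map and keep it in sync with the literal store
% via the new_literal and old_literal monitor channels.
make_word_index :-
    rdf_new_literal_map(Map),
    nb_setval(word_map, Map),
    rdf_monitor(word_index_event, [-all, +new_literal, +old_literal]).

word_index_event(new_literal(Lit)) :-
    nb_getval(word_map, Map),
    tokenize_atom(Lit, Tokens),
    forall(member(T, Tokens), rdf_insert_literal_map(Map, T, Lit)).
word_index_event(old_literal(Lit)) :-
    nb_getval(word_map, Map),
    tokenize_atom(Lit, Tokens),
    forall(member(T, Tokens), rdf_delete_literal_map(Map, T, Lit)).
```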
Keys may be negated by wrapping them in not(Key). If not-terms 
are provided, there must be at least one positive keyword. The 
negations are tested after establishing the positive matches.
The library(semweb/rdf_persistency) 
provides reliable persistent storage for the RDF data. The store uses a 
directory with files for each source (see rdf_source/1) 
present in the database. Each source is represented by two files, one in 
binary format (see rdf_save_db/2) 
representing the base state and one represented as Prolog terms 
representing the changes made since the base state. The latter is called 
the journal.
concurrency(+Jobs)
    Number of threads used to load the initial database. Default is 
    the number of physical CPUs as reported by the Prolog flag cpu_count, 
    or 1 (one) on systems where this number is unknown. See also concurrent/3.
silent(+Bool)
    If true, suppress loading messages from rdf_attach_db/2.
log_nested_transactions(+Bool)
    If true, nested log transactions are added to the 
    journal information. By default (false), no log-term is 
    added for nested transactions.
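For example (the directory name is illustrative):

```prolog
:- use_module(library(semweb/rdf_persistency)).

% Attach the store quietly, restoring the base states with two
% concurrent threads.
:- rdf_attach_db('rdf-store', [ concurrency(2), silent(true) ]).
```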
The database is locked against concurrent access using a file
lock in Directory. An attempt to attach to a 
locked database raises a permission_error exception. The 
error context contains a term rdf_locked(Args), where Args 
is a list containing time(Stamp) and pid(PID). 
The error can be caught by the application. Otherwise it prints:
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
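A sketch of catching this error instead of aborting; since only the rdf_locked(Args) part of the context is specified, the context is matched loosely with sub_term/2:

```prolog
:- use_module(library(semweb/rdf_persistency)).
:- use_module(library(occurs)).         % sub_term/2

% Attach Dir, reporting who holds the lock if attaching fails.
safe_attach(Dir) :-
    catch(rdf_attach_db(Dir, []),
          error(permission_error(_, _, _), Context),
          report_locked(Context)).

report_locked(Context) :-
    sub_term(rdf_locked(Args), Context),
    memberchk(pid(PID), Args),
    memberchk(time(Stamp), Args),
    format('Database locked since ~w by process ~w~n', [Stamp, PID]).
```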
If Bool is false, the 
journal and snapshot for the database are deleted and further changes to 
triples associated with DB are not recorded. If Bool 
is true, a snapshot is created for the current state and 
further modifications are monitored. Switching persistency does not 
affect the triples in the in-memory RDF database.

Journals are merged with the base state by rdf_flush_journals(+Options). 
With the option min_size(KB), only 
journals larger than KB Kbytes are merged with the base 
state. Flushing a journal takes the following steps, ensuring a stable 
state can be recovered at any moment:
1. Save the current database in a new file using the extension .new.
2. On success, delete the journal.
3. On success, atomically move the .new file over the base state.

Note that journals are not merged automatically for two reasons. First, some applications may decide never to merge, as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
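Merging is therefore triggered explicitly with rdf_flush_journals/1, e.g. from a maintenance task (the threshold is illustrative):

```prolog
% Merge only journals that have grown beyond 256 Kbytes.
?- rdf_flush_journals([min_size(256)]).
```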
The above predicates suffice for most applications. The predicates in 
this section provide access to the journal files and the base state 
files and are intended to provide additional services, such as reasoning 
about the journals, loaded files, etc. A 
library, library(rdf_history), is under development that 
exploits these features to support wiki-style editing of RDF.
Using rdf_transaction(Goal, log(Message)), we can add 
additional records to enrich the journal of affected databases with Message 
and some additional bookkeeping information. Such a transaction adds a 
term
begin(Id, Nest, Time, Message) before the change operations 
on each affected database and end(Id, Nest, Affected) after 
the change operations. Here is an example call and content of the 
journal file mydb.jrn. A full explanation of the terms that 
appear in the journal is in the description of rdf_journal_file/2.
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).
Using rdf_transaction(Goal, log(Message, DB)), where DB 
is an atom denoting a (possibly empty) named graph, the system 
guarantees that a non-empty transaction will leave a possibly empty 
transaction record in DB. This feature assumes named graphs are named 
after the user making the changes. If a user action does not affect the 
user's graph, such as deleting a triple from another graph, we still 
find a record of all actions performed by some user in the journal of that 
user.
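For example, a change by user jan to another graph can still be recorded in jan's own journal (graph and term names are illustrative):

```prolog
% The transaction record ends up in the journal of graph 'jan',
% even though the change affects graph 'otherdb'.
?- rdf_transaction(rdf_retractall(s, p, o, otherdb),
                   log(by(jan), jan)).
```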
start(Attributes)
    The journal was opened. Attributes contains time(Stamp).
end(Attributes)
    The journal was closed. Attributes contains time(Stamp).
begin(Id, Nest, Time, Message)
    Start of a transaction executed with log(Message). Id is an 
    integer counting the logged transactions to this database. Numbers are 
    increasing and designed for binary search within the journal file. 
    Nest is the nesting level, where '0' is a toplevel 
    transaction. 
    Time is a time-stamp, currently using float notation with two 
    fractional digits. Message is the term provided by the user 
    as argument of the log(Message) transaction.
end(Id, Nest, Others)
    End of a transaction executed with log(Message). Id and Nest 
    match the begin-term. Others gives a list of other databases 
    affected by this transaction and the Id of these records. The 
    terms in this list have the format DB:Id.

The files use the extension .trp for the base state and .jrn for the 
journal.
journal.