This library provides high-performance C-based primitives for
manipulating URIs. We chose a C-based implementation because it performs
much better on raw character manipulation. Notably, URI handling
primitives are used in time-critical parts of RDF processing. This
implementation is based on RFC 3986:
http://labs.apache.org/webarch/uri/rfc/rfc3986.html
The URI processing in this library is rather liberal. That is, we
split URIs according to the rules, but we do not check that the
components themselves are valid. Percent-decoding for IRIs is also
liberal: it first tries UTF-8, then ISO-Latin-1, and finally accepts
%-characters verbatim.
Earlier experience has shown that strict enforcement of the URI
syntax rejects many URIs that other web-document processing tools
accept.
- [det] uri_components(+URI, -Components)
- [det] uri_components(-URI, +Components)
- Break a URI into its 5 basic components according to the 
RFC-3986 regular expression:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9 
Components is a term uri_components(Scheme, Authority, Path, Search, Fragment).
If a URI is parsed, i.e., using mode (+,-), components
that are not found are left uninstantiated (variable). See uri_data/3
for accessing this structure.
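The two modes can be sketched as follows. The helper names split_uri/6 and join_uri/1 and the example.org URIs are illustrative assumptions, not part of the library.

```prolog
:- use_module(library(uri)).

% split_uri(+URI, -Scheme, -Authority, -Path, -Search, -Fragment)
% Hypothetical helper: parse URI in mode (+,-). Fields that do not occur
% in URI are left unbound.
split_uri(URI, Scheme, Authority, Path, Search, Fragment) :-
    uri_components(URI,
                   uri_components(Scheme, Authority, Path, Search, Fragment)).

% join_uri(-URI)
% Hypothetical helper: build a URI in mode (-,+). The unbound Search
% field is simply omitted from the result.
join_uri(URI) :-
    uri_components(URI,
                   uri_components(https, 'example.org', '/doc', _, top)).
```

For instance, split_uri('http://www.example.org/a/b?q=1#sec', S, A, P, Q, F) binds S = http, A = 'www.example.org', P = '/a/b', Q = 'q=1' and F = sec, while join_uri(U) yields U = 'https://example.org/doc#top'.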
 
- [semidet] uri_data(?Field, +Components, ?Data)
- Provide access to the uri_components structure. Defined field names are: scheme, authority, path, search and fragment.
- [semidet] uri_data(+Field, +Components, +Data, -NewComponents)
- NewComponents is the same as Components with Field set to Data.
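A minimal sketch of reading and replacing a single field, so that calling code does not depend on the argument order of the structure. The helpers scheme_of/2 and set_uri_fragment/3 are hypothetical names introduced for illustration.

```prolog
:- use_module(library(uri)).

% scheme_of(+URI, -Scheme)
% Hypothetical helper: Scheme is the scheme of URI. Fails if URI has none.
scheme_of(URI, Scheme) :-
    uri_components(URI, Components),
    uri_data(scheme, Components, Scheme),
    nonvar(Scheme).

% set_uri_fragment(+URI0, +Fragment, -URI)
% Hypothetical helper: URI is URI0 with its fragment replaced by Fragment.
set_uri_fragment(URI0, Fragment, URI) :-
    uri_components(URI0, Components0),
    uri_data(fragment, Components0, Fragment, Components),
    uri_components(URI, Components).
```

With these definitions, set_uri_fragment('http://example.org/a#old', new, U) gives U = 'http://example.org/a#new'.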
- [det] uri_normalized(+URI, -NormalizedURI)
- NormalizedURI is the normalized form of URI.
Normalization is syntactic and involves the following steps (section numbers refer to RFC 3986):
 
- 6.2.2.1. Case Normalization
- 6.2.2.2. Percent-Encoding Normalization
- 6.2.2.3. Path Segment Normalization
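One typical use of syntactic normalization is comparing URIs that differ only in spelling. The sketch below assumes a hypothetical helper same_resource/2. It relies only on the case and path segment steps listed above.

```prolog
:- use_module(library(uri)).

% same_resource(+URI1, +URI2)
% Hypothetical helper: true if both URIs have the same normalized form.
same_resource(URI1, URI2) :-
    uri_normalized(URI1, Normalized),
    uri_normalized(URI2, Normalized).
```

For example, same_resource('HTTP://example.org/a/./b/../c', 'http://example.org/a/c') succeeds: the scheme is lowercased and the dot segments are removed.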
 
- [det] iri_normalized(+IRI, -NormalizedIRI)
- NormalizedIRI is the normalized form of IRI.
Normalization is syntactic and involves the following steps (section numbers refer to RFC 3986):
 
- 6.2.2.1. Case Normalization
- 6.2.2.3. Path Segment Normalization
 
- See also
- This is similar to uri_normalized/2, 
but does not do normalization of %-escapes.
 
- [det] uri_normalized_iri(+URI, -NormalizedIRI)
- As uri_normalized/2, but
percent-encoding is translated into IRI Unicode characters. The
translation is liberal: valid UTF-8 sequences of %-encoded bytes are
mapped to the Unicode character. Other %XX-sequences are mapped to the
corresponding ISO-Latin-1 character and lone % characters are left
untouched.
- See also
- uri_iri/2.
 
- [semidet] uri_is_global(+URI)
- True if URI has a scheme. The semantics is the same as the 
code below, but the implementation is more efficient as it does not need 
to parse the other components, nor needs to bind the scheme. The 
condition to demand a scheme of more than one character is added to 
avoid confusion with DOS path names.
uri_is_global(URI) :-
        uri_components(URI, Components),
        uri_data(scheme, Components, Scheme),
        nonvar(Scheme),
        atom_length(Scheme, Len),
        Len > 1.
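The effect of the scheme-length condition can be seen in a small sketch. The helper classify_reference/2 is a hypothetical name, and the sample references are made up for illustration.

```prolog
:- use_module(library(uri)).

% classify_reference(+Ref, -Kind)
% Hypothetical helper: Kind is global if Ref has a scheme of more than
% one character and local otherwise.
classify_reference(Ref, Kind) :-
    (   uri_is_global(Ref)
    ->  Kind = global
    ;   Kind = local
    ).
```

Here 'http://example.org/' is classified as global, whereas 'images/logo.png' and the DOS-style path 'c:/temp/x.txt' are both classified as local.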
- [det] uri_resolve(+URI, +Base, -GlobalURI)
- Resolve a possibly local URI relative to Base.
This implements
http://labs.apache.org/webarch/uri/rfc/rfc3986.html#relative-transform
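As an illustration, the sketch below resolves a list of references against one base. The helper names resolve_all/3 and resolve_one/3 are assumptions. The base URI and references are taken from the reference resolution examples in RFC 3986, section 5.4.1.

```prolog
:- use_module(library(uri)).
:- use_module(library(apply)).

% resolve_all(+Base, +Refs, -Globals)
% Hypothetical helper: resolve each reference in Refs relative to Base.
resolve_all(Base, Refs, Globals) :-
    maplist(resolve_one(Base), Refs, Globals).

% resolve_one(+Base, +Ref, -Global): argument order adapted for maplist/3.
resolve_one(Base, Ref, Global) :-
    uri_resolve(Ref, Base, Global).
```

Following the RFC examples, resolve_all('http://a/b/c/d;p?q', [g, '../g', '/g'], L) is expected to give L = ['http://a/b/c/g', 'http://a/b/g', 'http://a/g'].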
- [det] uri_normalized(+URI, +Base, -NormalizedGlobalURI)
- NormalizedGlobalURI is the normalized global version of URI. 
Behaves as if defined by:
uri_normalized(URI, Base, NormalizedGlobalURI) :-
        uri_resolve(URI, Base, GlobalURI),
        uri_normalized(GlobalURI, NormalizedGlobalURI).
- [det] iri_normalized(+IRI, +Base, -NormalizedGlobalIRI)
- NormalizedGlobalIRI is the normalized global version of IRI. 
This is similar to uri_normalized/3, 
but does not do %-escape normalization.
- [det] uri_normalized_iri(+URI, +Base, -NormalizedGlobalIRI)
- NormalizedGlobalIRI is the normalized global IRI of URI.
Behaves as if defined by:
uri_normalized_iri(URI, Base, NormalizedGlobalIRI) :-
        uri_resolve(URI, Base, GlobalURI),
        uri_normalized_iri(GlobalURI, NormalizedGlobalIRI).
- [det] uri_query_components(+String, -Query)
- [det] uri_query_components(-String, +Query)
- Perform encoding and decoding of a URI query string. Query
is a list of fully decoded (Unicode) Name=Value pairs. In mode (-,+), 
query elements of the forms Name(Value) and Name-Value are also accepted 
to enhance interoperability with the option and pairs libraries. E.g.
?- uri_query_components(QS, [a=b, c('d+w'), n-'VU Amsterdam']).
QS = 'a=b&c=d%2Bw&n=VU%20Amsterdam'.
?- uri_query_components('a=b&c=d%2Bw&n=VU%20Amsterdam', Q).
Q = [a=b, c='d+w', n='VU Amsterdam'].
- [det] uri_authority_components(+Authority, -Components)
- [det] uri_authority_components(-Authority, +Components)
- Break down the authority component of a URI. The fields of the structure Components
can be accessed using uri_authority_data/3.
This predicate deals with IPv6 addresses written as [ip],
returning the ip as host, without the enclosing [].
When constructing an authority string and the host contains a colon (:),
the host is enclosed in []. If [] is not used
correctly, the behavior should be considered poorly defined. If there is
no balancing ] or the host part does not end with ], these
characters are considered normal characters and part of the (invalid)
host name.
- [semidet] uri_authority_data(+Field, ?Components, ?Data)
- Provide access to the uri_authority structure. Defined field names are: user, password, host and port.
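Combining uri_authority_components/2 with uri_authority_data/3 might look like the sketch below. The helper host_port/3 is a hypothetical name, and the example assumes that a numeric port is returned as an integer.

```prolog
:- use_module(library(uri)).

% host_port(+URI, -Host, -Port)
% Hypothetical helper: Host and Port of the authority of URI. Fails if
% URI has no authority. Port stays unbound if URI does not specify one.
host_port(URI, Host, Port) :-
    uri_components(URI, Components),
    uri_data(authority, Components, Authority),
    nonvar(Authority),
    uri_authority_components(Authority, AuthComponents),
    uri_authority_data(host, AuthComponents, Host),
    uri_authority_data(port, AuthComponents, Port).
```

For example, host_port('http://user@example.org:8080/x', H, P) binds H = 'example.org' and P = 8080, while for 'http://example.org/x' the port remains a variable.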
- [det] uri_encoded(+Component, +Value, -Encoded)
- [det] uri_encoded(+Component, -Value, +Encoded)
- Encoded is the URI encoding for Value. When
encoding (Value->Encoded), Component
specifies the URI component where the value is used. It is one of query_value, fragment, path or segment. Besides alphanumerical characters, the following
characters are passed verbatim (the set is split in logical groups
according to RFC 3986).
- query_value, fragment
- "-._~" | "!$'()*,;" | "@" | "/?"
- path
- "-._~" | "!$&'()*,;=" | "@" | "/"
- segment
- "-._~" | "!$&'()*,;=" | "@"
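The difference between the component types can be made concrete by encoding one value in three ways. The helpers encode_in/4 and decode_query_value/2 are hypothetical names, and the expected results follow from the character sets listed above.

```prolog
:- use_module(library(uri)).

% encode_in(+Value, -AsQueryValue, -AsPath, -AsSegment)
% Hypothetical helper: encode the same Value for three URI components.
encode_in(Value, AsQueryValue, AsPath, AsSegment) :-
    uri_encoded(query_value, Value, AsQueryValue),
    uri_encoded(path, Value, AsPath),
    uri_encoded(segment, Value, AsSegment).

% decode_query_value(+Encoded, -Value)
% Hypothetical helper: decode using mode (+,-,+) of uri_encoded/3.
decode_query_value(Encoded, Value) :-
    uri_encoded(query_value, Value, Encoded).
```

Given the value 'a b/c&d', the query value encoding is 'a%20b/c%26d' (the & is escaped, the / is kept), the path encoding is 'a%20b/c&d' (both are kept) and the segment encoding is 'a%20b%2Fc&d' (the / is escaped).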
 
- [det] uri_iri(+URI, -IRI)
- [det] uri_iri(-URI, +IRI)
- Convert between a URI, encoded in US-ASCII and an IRI. 
An IRI is a fully expanded Unicode string. Unicode strings 
are first encoded into UTF-8, after which %-encoding takes place.
- Errors
- syntax_error(Culprit) in mode (+,-) if URI is
not a legally percent-encoded UTF-8 string.
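A small sketch of converting in both directions. The helper names display_form/2 and wire_form/2 are assumptions introduced here. The non-ASCII character is written with a \u escape so the example does not depend on the encoding of the source file.

```prolog
:- use_module(library(uri)).

% display_form(+URI, -IRI)
% Hypothetical helper: IRI is the Unicode form of URI for presentation.
% The documented syntax_error is not caught here.
display_form(URI, IRI) :-
    uri_iri(URI, IRI).

% wire_form(+IRI, -URI)
% Hypothetical helper: URI is the US-ASCII form of IRI (UTF-8 bytes,
% then %-encoding).
wire_form(IRI, URI) :-
    uri_iri(URI, IRI).
```

For example, display_form('http://example.org/caf%C3%A9', I) gives the IRI ending in the single character é, and wire_form('http://example.org/caf\u00E9', U) gives U = 'http://example.org/caf%C3%A9'.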
 
- [semidet] uri_file_name(+URI, -FileName)
- [det] uri_file_name(-URI, +FileName)
- Convert between a URI and a local file name. This protocol is
covered by RFC 1738. Please note that file URIs use absolute
paths. The mode (-,+) translates a possibly relative path into an
absolute one.
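The sketch below shows both modes on a POSIX-style path. The helper names file_uri/2 and local_file/2 are hypothetical, and results on Windows will differ.

```prolog
:- use_module(library(uri)).

% file_uri(+FileName, -URI)
% Hypothetical helper: URI is the file URI for FileName, mode (-,+).
file_uri(FileName, URI) :-
    uri_file_name(URI, FileName).

% local_file(+URI, -FileName)
% Hypothetical helper: FileName is the local file that URI denotes.
% Fails if URI is not a file URI, mode (+,-).
local_file(URI, FileName) :-
    uri_file_name(URI, FileName).
```

On a POSIX system, file_uri('/tmp/data.txt', U) gives U = 'file:///tmp/data.txt' and local_file('file:///tmp/data.txt', F) gives F = '/tmp/data.txt', while local_file('http://example.org/x', F) fails.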
- [det] uri_edit(+Actions, +URI0, -URI)
- Modify a URI according to Actions. Actions 
is either a single action or a (nested) list of actions. Defined 
primitive actions are:
- scheme(+Scheme)
- Set the Scheme of the URI (typically http, https, etc.)
- user(+User)
- Add/set the user of the authority component.
- password(+Password)
- Add/set the password of the authority component.
- host(+Host)
- Add/set the host (or ip address) of the authority component.
- port(+Port)
- Add/set the port of the authority component.
- path(+Path)
- Set/extend the path component. If Path is not
absolute it is taken relative to the path of URI0.
- search(+KeyValues)
- Extend the Key=Value pairs of the current search (query)
component. New values replace existing values. If KeyValues
is written as =(KeyValues) the current search component is
ignored. KeyValues is a list, whose elements are one of Key=Value, Key-Value or Key(Value).
- fragment(+Fragment)
- Set the Fragment of the URI.
 
Components can be removed by using a variable as value, except
for path, which can be reset using path(/), and
query, which can be dropped using query(=([])).

URI0 is either a valid URI or a variable to
start fresh.
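A minimal sketch of combining two primitive actions in one call, assuming a library version that provides uri_edit/3. The helper secure_anchor/3 is a hypothetical name.

```prolog
:- use_module(library(uri)).

% secure_anchor(+URI0, +Anchor, -URI)
% Hypothetical helper: URI is URI0 with the scheme set to https and the
% fragment set to Anchor. All other components are left as they are.
secure_anchor(URI0, Anchor, URI) :-
    uri_edit([scheme(https), fragment(Anchor)], URI0, URI).
```

Calling secure_anchor('http://example.org/doc', intro, U) should produce a URI whose scheme is https and whose fragment is intro, which can be checked with uri_components/2 and uri_data/3.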