API documentation for SSAX-SXML

On the index page, the primary high-level API functions are presented:


Obtaining the SXML document from XML (HTML) by URI
f: sxml:document
High-level functions called by `sxml:document'
f: ssax:xml->sxml
f: html->sxml
f: open-input-resource

SXPath: XPath for SXML
f: txpath
f: sxpath

SXML Transformations
STX: Scheme-enabled XSLT Processor
f: stx:make-stx-stylesheet
f: stx:transform-dynamic
Pre-post-order transformations
f: pre-post-order

XPathLink: a Query Language with XLink support
f: xlink:documents
f: sxpath/c

SXML modifications
f: sxml:modify
f: sxml:modify!

DDO SXPath: the Polynomial-Time XPath Implementation
f: ddo:sxpath

Lazy SXML processing
f: lazy:xml->sxml
f: lazy:sxpath
Lower-level infrastructure for lazy SXML processing
f: lazy:result->list
f: lazy:node->sxml

SXML Serialization
f: srl:sxml->xml
f: srl:sxml->xml-noindent
f: srl:sxml->html
f: srl:sxml->html-noindent
Parameterizing the SXML serializer
f: srl:parameterizable

Obtaining the SXML document from XML (HTML) by URI

sxml:document

(define (sxml:document req-uri . namespace-prefix-assig)
... Full Code ... )
 procedure sxml:document :: REQ-URI [NAMESPACE-PREFIX-ASSIG] ->
                             -> SXML-TREE

 Obtains a (possibly remote) document by its URI.
 Supported URI formats:  local file and HTTP scheme
 Supported document formats:  XML and HTML

 REQ-URI - a string that contains the URI of the requested document
 NAMESPACE-PREFIX-ASSIG - is passed as-is to the SSAX parser: there it is
  used for assigning certain user prefixes to certain namespaces.
  NAMESPACE-PREFIX-ASSIG is an optional argument and has an effect for an
  XML resource only. For a requested HTML resource, NAMESPACE-PREFIX-ASSIG
  is silently ignored.

 Result: the SXML representation for the requested document
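
 A minimal usage sketch, assuming the SSAX-SXML library is already loaded
 in your Scheme implementation. The file name, the URL, the prefix `ex' and
 its namespace URI are hypothetical placeholders, not taken from the library.

```scheme
; Hypothetical local XML file.
(define local-doc (sxml:document "catalog.xml"))

; Hypothetical remote XML resource. The optional argument assigns the
; user prefix `ex' to a (hypothetical) namespace URI; for an HTML
; resource this argument would be silently ignored.
(define remote-doc
  (sxml:document "http://example.org/catalog.xml"
                 '((ex . "http://example.org/ns"))))
```

 Both calls depend on external resources, so no result is shown here.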



High-level functions called by `sxml:document'

ssax:xml->sxml

(define (ssax:xml->sxml port namespace-prefix-assig)
... Full Code ... )
 procedure: ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG

 This is an instance of an SSAX parser that returns an SXML
 representation of the XML document to be read from PORT.
 NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING)
 pairs that assigns USER-PREFIXes to certain namespaces identified by
 particular URI-STRINGs. It may be an empty list.
 The procedure returns an SXML tree. On return, the port is positioned
 at the first character after the root element.
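
 As an illustration, the following sketch parses a small in-memory document.
 It assumes the SSAX-SXML library is loaded and that `open-input-string'
 (SRFI 6 / R7RS) is available; the sample XML is made up for this example.

```scheme
; A string port stands in for a file or network port.
(define xml-port
  (open-input-string "<doc><item id=\"1\">first</item></doc>"))

; The empty list means that no user prefixes are assigned.
(define parsed-tree (ssax:xml->sxml xml-port '()))

(write parsed-tree)
(newline)
; expected: (*TOP* (doc (item (@ (id "1")) "first")))
```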


html->sxml

(define html->sxml
... Full Code ... )


open-input-resource

(define (open-input-resource req-uri)
... Full Code ... )
 Opens an input port for a resource
  REQ-URI - a string representing a URI of the resource
 An input port is returned if there were no errors. In case of an error,
 the function returns #f and displays an error message as a side effect.
 Doesn't raise any exceptions.
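
 Because failure is reported by returning #f rather than by raising an
 exception, callers are expected to test the result. A sketch, assuming the
 library is loaded; "catalog.xml" is a hypothetical file name:

```scheme
(define resource-port (open-input-resource "catalog.xml"))

(define resource-tree
  (if resource-port
      (let ((result (ssax:xml->sxml resource-port '())))
        (close-input-port resource-port)
        result)
      ; open-input-resource has already displayed an error message
      #f))
```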



SXPath: XPath for SXML

txpath

(define txpath
... Full Code ... )
  xpath-string - an XPath location path (a string)
  ns-binding - declared namespace prefixes (an optional argument)
  ns-binding = (list  (prefix . uri)
                      (prefix . uri)
                      ...)
  prefix - a symbol
  uri - a string

 The returned result:   (lambda (node . var-binding) ...)  
                   or   #f
  #f - signals a parse error (an error message is printed as a side effect
 during parsing)
  (lambda (node . var-binding) ...)  - an SXPath function
  node - a node (or a node-set) of the SXML document
  var-binding - XPath variable bindings (an optional argument)
  var-binding = (list  (var-name . value)
                       (var-name . value)
                       ...)
  var-name - (a symbol) a name of a variable
  value - its value. The value can have one of the following types: boolean,
 number, string, nodeset. NOTE: a node must be represented as a singleton
 nodeset
 
 Administrative SXPath variables:
  *root* - if present in the 'var-binding', its value (a node or a nodeset)
 specifies the root of the SXML document
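
 A usage sketch under the conventions described above, assuming the library
 is loaded. The sample tree and the variable name `wanted' are made up for
 illustration. Note the check for #f before the compiled query is applied.

```scheme
(define txpath-tree
  '(*TOP* (doc (item (@ (id "1")) "first")
               (item (@ (id "2")) "second"))))

; Compile the location path once; the result is a procedure,
; or #f if the path could not be parsed.
(define select-second (txpath "/doc/item[@id='2']"))

(define second-items
  (if select-second
      (select-second txpath-tree)
      '()))

; The same query with an XPath variable; the bindings are passed as
; one optional argument, a list of (var-name . value) pairs.
(define select-by-id (txpath "/doc/item[@id=$wanted]"))

(define wanted-items
  (if select-by-id
      (select-by-id txpath-tree '((wanted . "2")))
      '()))
```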


sxpath

(define (sxpath path . ns-binding)
... Full Code ... )
 Evaluate an abbreviated SXPath
	sxpath:: AbbrPath -> Converter, or
	sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
 AbbrPath is a list. It is translated to the full SXPath according
 to the following rewriting rules
 (sxpath '()) -> (node-join)
 (sxpath '(path-component ...)) ->
		(node-join (sxpath1 path-component) (sxpath '(...)))
 (sxpath1 '//) -> (sxml:descendant-or-self sxml:node?)
 (sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
 (sxpath1 '(eq? x))    -> (select-kids (node-eq? x))
 (sxpath1 '(*or* ...))  -> (select-kids (ntype-names??
                                          (cdr '(*or* ...))))
 (sxpath1 '(*not* ...)) -> (select-kids (sxml:complement 
                                         (ntype-names??
                                          (cdr '(*not* ...)))))
 (sxpath1 '(ns-id:* x)) -> (select-kids 
                                      (ntype-namespace-id?? x))
 (sxpath1 ?symbol)     -> (select-kids (ntype?? ?symbol))
 (sxpath1 ?string)     -> (txpath ?string)
 (sxpath1 procedure)   -> procedure
 (sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
 (sxpath1 '(path reducer ...)) ->
		(node-reduce (sxpath path) (sxpathr reducer) ...)
 (sxpathr number)      -> (node-pos number)
 (sxpathr path-filter) -> (filter (sxpath path-filter))
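
 A few abbreviated paths, worked through the rewriting rules above. This is
 a sketch that assumes the library is loaded; the sample tree is made up.

```scheme
(define sxpath-tree
  '(*TOP* (doc (item (@ (id "1")) "first")
               (item (@ (id "2")) "second"))))

; Two child steps: every <item> child of <doc>.
(define all-items ((sxpath '(doc item)) sxpath-tree))
; => ((item (@ (id "1")) "first") (item (@ (id "2")) "second"))

; '// is the descendant-or-self step; @ selects attribute lists.
(define all-ids ((sxpath '(// item @ id)) sxpath-tree))
; => ((id "1") (id "2"))

; A number after a path component is a positional reducer (node-pos).
(define second-item ((sxpath '(doc (item 2))) sxpath-tree))
; => ((item (@ (id "2")) "second"))
```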



SXML Transformations


STX: Scheme-enabled XSLT Processor

stx:make-stx-stylesheet

(define (stx:make-stx-stylesheet stx-tree)
... Full Code ... )
 Generates an stx:stylesheet from a stylesheet represented as <stx-tree>
 in SXML format


stx:transform-dynamic

(define (stx:transform-dynamic doc sst-sxml)
... Full Code ... )
 Transforms the given SXML document <doc> using the stylesheet <sst-sxml>,
 which is given in SXML format



Pre-post-order transformations

pre-post-order

(define (pre-post-order tree bindings)
... Full Code ... )
 procedure: pre-post-order TREE BINDINGS

	          Traversal of an SXML tree or a grove:
			a <Node> or a <Nodelist>

 A <Node> and a <Nodelist> are mutually-recursive datatypes that
 underlie the SXML tree:
	<Node> ::= (name . <Nodelist>) | "text string"
 An (ordered) set of nodes is just a list of the constituent nodes:
 	<Nodelist> ::= (<Node> ...)
 Nodelists, and Nodes other than text strings are both lists. A
 <Nodelist> however is either an empty list, or a list whose head is
 not a symbol (an atom in general). A symbol at the head of a node is
 either an XML name (in which case it's a tag of an XML element), or
 an administrative name such as '@'.
 See SXPath.scm and SSAX.scm for more information on SXML.
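
 The distinction between a <Node> and a <Nodelist> can be checked with a
 small predicate. The helper below is a hypothetical, self-contained sketch
 written for this illustration; it is not presented as part of the library.

```scheme
; A <Nodelist> is the empty list, or a list whose head is not a symbol.
(define (nodelist? x)
  (or (null? x)
      (and (pair? x) (not (symbol? (car x))))))

(define nodelist-checks
  (list (nodelist? '())                       ; empty nodelist
        (nodelist? '((item "a") (item "b")))  ; head is a list
        (nodelist? '(item "a"))               ; head is a symbol: an element
        (nodelist? "text")))                  ; a text node
; nodelist-checks => (#t #t #f #f)
```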


 Pre-Post-order traversal of a tree and creation of a new tree:
	pre-post-order:: <tree> x <bindings> -> <new-tree>
 where
 <bindings> ::= (<binding> ...)
 <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
               (<trigger-symbol> *macro* . <handler>) |
		(<trigger-symbol> <new-bindings> . <handler>) |
		(<trigger-symbol> . <handler>)
 <trigger-symbol> ::= XMLname | *text* | *default*
 <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>

 The pre-post-order function visits the nodes and nodelists
 pre-post-order (depth-first).  For each <Node> of the form (name
 <Node> ...) it looks up an association with the given 'name' among
 its <bindings>. If the lookup fails, pre-post-order tries to locate a
 *default* binding. It's an error if the latter attempt fails as
 well.  Having found a binding, the pre-post-order function first
 checks to see if the binding is of the form
	(<trigger-symbol> *preorder* . <handler>)
 If it is, the handler is 'applied' to the current node. Otherwise,
 the pre-post-order function first calls itself recursively for each
 child of the current node, with <new-bindings> prepended to the
 <bindings> in effect. The result of these calls is passed to the
 <handler> (along with the head of the current <Node>). To be more
 precise, the handler is _applied_ to the head of the current node
 and its processed children. The result of the handler, which should
 also be a <tree>, replaces the current <Node>. If the current <Node>
 is a text string or other atom, a special binding with a symbol
 *text* is looked up.

 A binding can also be of the form
	(<trigger-symbol> *macro* . <handler>)
 This is equivalent to *preorder* described above. However, the result
 is processed again with the current stylesheet.
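
 The following sketch contrasts an ordinary binding with a *preorder* one.
 It assumes the library is loaded; the element names and the bracket
 decoration of text nodes are made up for this example.

```scheme
(define ppo-result
  (pre-post-order
    '(doc (item "a") (raw "b"))
    `(; Ordinary bindings: the handler receives the tag and the
      ; already processed children.
      (doc  . ,(lambda (tag . kids) `(ul ,@kids)))
      (item . ,(lambda (tag . kids) `(li ,@kids)))
      ; *preorder*: the handler is applied to the node as it is,
      ; so the children of <raw> are not processed.
      (raw *preorder* . ,(lambda (tag . kids) `(pre ,@kids)))
      ; Text nodes reach the *text* handler together with the trigger.
      (*text* . ,(lambda (trigger str) (string-append "[" str "]"))))))
; ppo-result => (ul (li "[a]") (pre "b"))
```

 The text under <item> is decorated because it passes through the *text*
 handler, while the text under <raw> is left untouched.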



XPathLink: a Query Language with XLink support

xlink:documents

(define xlink:documents
... Full Code ... )
 procedure xlink:documents :: {REQ-URI}+  -> (listof SXML-TREE)
 procedure xlink:documents-embed :: {REQ-URI}+  -> (listof SXML-TREE)

 Both `xlink:documents' and `xlink:documents-embed' accept one or more
 strings as their arguments. Each string supplied denotes the URI of the
 requested document to be loaded. The requested document(s) are loaded
 and are represented in SXML. All XLink links declared in these document(s)
 are represented as a set of SXLink arcs. If any XLink links refer to XLink
 linkbases [XLink, http://www.w3.org/TR/xlink/#xlg],
 these linkbases are additionally loaded, for additional SXLink arcs
 declared there.

 The starting resource for each SXLink arc is determined:
 1. For each SXML document loaded, the function `xlink:documents' adds all
    SXLink arcs whose starting resource is located within this document to
    the auxiliary list of its document node (*TOP*).
 2. The function `xlink:documents-embed' embeds each SXLink arc into its
    starting resource-node, via the auxiliary list of that node. For text
    nodes serving as starting resources, their SXLink arcs are stored in the
    auxiliary list of the document node (*TOP*), since SXML text nodes do
    not support their own auxiliary lists.

 Supported URI formats:
  + local file
  + http:// scheme

 Supported document formats: XML and HTML. In the case of HTML,
 <A> hyperlinks are considered as XLink simple links.

 Result: (listof SXML-TREE)
 A particular SXML document can be located in this list using the
 function `xlink:find-doc'.


sxpath/c

(define sxpath/c
... Full Code ... )
 xpath-string - an XPath location path (a string)
 ns+na - may contain 'ns-binding', 'num-ancestors', both, or neither
 ns-binding - declared namespace prefixes (an optional argument)
  ns-binding ::= (listof (prefix . uri))
  prefix - a symbol
  uri - a string
 num-ancestors - number of ancestors required for the resulting nodeset. Can
  generally be omitted and then defaults to 0, which denotes a _usual_
  nodeset. A negative number signals that all ancestors should be
  remembered in the context

 Returns: (lambda (nodeset position+size var-binding) ...)
 position+size - the same as what was called 'context' in TXPath-1
 var-binding - XPath variable bindings (an optional argument)
  var-binding = (listof (var-name . value))
  var-name - (a symbol) a name of a variable
  value - its value. The value can have one of the following types: boolean,
  number, string, nodeset. NOTE: a node must be represented as a singleton
  nodeset



SXML modifications

sxml:modify

(define (sxml:modify . update-specifiers)
... Full Code ... )
 update-specifiers ::= (listof  update-specifier)
 update-specifier ::= (list  xpath-location-path  action  [action-parameters])
 xpath-location-path - addresses the node(s) to be transformed, in the form of
  XPath location path. If the location path is absolute, it addresses the
  node(s) with respect to the root of the document being transformed. If the
  location path is relative, it addresses the node(s) with respect to the
  node selected by the previous update-specifier. The location path in the
  first update-specifier always addresses the node(s) with respect to the
  root of the document. Below, the node with respect to which the location
  path is evaluated is called the base-node for this location path.
 action - specifies the modification to be made over each of the node(s)
  addressed by the location path. Possible actions are described below.
 action-parameters - additional parameters supplied for the action. The number
  of parameters and their semantics depend on the particular action.

 action ::= 'delete | 'delete-undeep |
            'insert-into | 'insert-following | 'insert-preceding |
            'replace |
            'rename |
            'move-into | 'move-following | 'move-preceding |
            handler
 'delete - deletes the node. Expects no action-parameters
 'delete-undeep - deletes the node, but keeps all its content (which thus
   moves one level up in the document tree). Expects no
   action-parameters
 'insert-into - inserts the new node(s) as the last children of the given
   node. The new node(s) are specified in SXML as action-parameters
 'insert-following, 'insert-preceding - inserts the new node(s) after (before)
   the given node. Action-parameters are the same as for 'insert-into
 'replace - replaces the given node with the new node(s). Action-parameters
   are the same as for 'insert-into
 'rename - renames the given node. The node to be renamed must be a pair (i.e.
   not a text node). A single action-parameter is expected, which is to be
   a Scheme symbol to specify the new name of the given node
 'move-into - moves the given node to a new location. The single
   action-parameter is the location path, which addresses the new location
   with respect to the given node as the base node. The given node becomes
   the last child of the node selected by the parameter location path.
 'move-following, 'move-preceding - the given node is moved to the location
   respectively after (before) the node selected by the parameter location
   path
 handler ::= (lambda (node context base-node) ...)
 handler - specifies the required transformation. It is an arbitrary lambda
  that consumes the node, its context (which can be used for addressing
  other nodes of the source document relative to the given node) and the
  base-node. The handler can return one of the following 2 things: a node or
  a nodeset.
   1. If a node is returned, then it replaces the source node in the result
  document
   2. If a nodeset is returned, then the source node is replaced by (multiple)
  nodes from this nodeset, in the same order in which they appear in the
  nodeset. In particular, if the empty nodeset is returned by the handler, the
  source node is removed from the result document and nothing is inserted
  instead.

  Returns either (lambda (doc) ...) or #f
  The latter signals an error, and the error message is printed to stderr
  as a side effect. In the former case, the lambda can be applied to an SXML
  document and produces the new SXML document being the result of the
  modification specified.
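
 A sketch of three update-specifiers built from the grammar above, assuming
 the library is loaded. The sample tree, the element names and the helper
 `apply-update' are made up for this illustration.

```scheme
(define modify-tree
  '(*TOP* (doc (item "a") (item "b"))))

; A predefined action without action-parameters.
(define delete-first (sxml:modify '("/doc/item[1]" delete)))

; An action whose action-parameter is the new node, written in SXML.
(define append-item (sxml:modify '("/doc" insert-into (item "c"))))

; A handler: the node it returns replaces the source node.
(define wrap-items
  (sxml:modify
    (list "/doc/item"
          (lambda (node context base-node)
            (list 'entry node)))))

; sxml:modify returns #f on error, so test before applying.
; apply-update is a hypothetical helper for this example only.
(define (apply-update update doc)
  (if update (update doc) doc))

(define modified-tree (apply-update delete-first modify-tree))
```

 Each specifier is compiled into its own modifier here, so the examples do
 not depend on how several update-specifiers interact in a single call.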


sxml:modify!

(define (sxml:modify! . update-specifiers)
... Full Code ... )
 A highest-level function



DDO SXPath: the Polynomial-Time XPath Implementation

ddo:sxpath

(define ddo:sxpath
... Full Code ... )
 procedure ddo:sxpath :: query [ns-binding] [num-ancestors] ->
                          -> node-or-nodeset [var-binding] -> nodeset
 procedure ddo:txpath :: location-path [ns-binding] [num-ancestors] ->
                          -> node-or-nodeset [var-binding] -> nodeset

 Polynomial-time XPath implementation with distinct document order support.

 The API is identical to the API of a context-based SXPath (here we even use
 API helpers from "xpath-context.scm"). For convenience, below we repeat
 comments for the API (borrowed from "xpath-context.scm").

 query - a query in SXPath native syntax
 location-path - XPath location path represented as a string
 ns-binding - declared namespace prefixes (an optional argument)
  ns-binding ::= (listof (prefix . uri))
  prefix - a symbol
  uri - a string
 num-ancestors - number of ancestors required for the resulting nodeset. Can
  generally be omitted and then defaults to 0, which denotes a
  _conventional_  nodeset. A negative number signals that all
  ancestors should be remembered in the context.

 Returns: (lambda (node-or-nodeset . var-binding) ...)
 var-binding - XPath variable bindings (an optional argument)
  var-binding = (listof (var-name . value))
  var-name - (a symbol) a name of a variable
  value - its value. The value can have one of the following types: boolean,
  number, string, nodeset. NOTE: a node must be represented as a singleton
  nodeset.

 The result of applying the latter lambda to an SXML node or nodeset is the
 result of evaluating the query / location-path for that node / nodeset.
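
 Since the calling convention mirrors that of `sxpath' and `txpath', usage
 can be sketched as follows, assuming the library is loaded. The sample
 tree is made up, and the optional arguments are omitted.

```scheme
(define ddo-tree
  '(*TOP* (doc (item "a") (item "b"))))

; Query in SXPath native syntax.
(define ddo-items ((ddo:sxpath '(doc item)) ddo-tree))

; The same kind of query as a textual XPath location path.
(define ddo-second ((ddo:txpath "/doc/item[2]") ddo-tree))
```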



Lazy SXML processing

lazy:xml->sxml

(define (lazy:xml->sxml port namespace-prefix-assig)
... Full Code ... )
 Produces a lazy SXML document, which corresponds to reading a source
 document in a stream-wise fashion


lazy:sxpath

(define lazy:sxpath
... Full Code ... )
 Support for native sxpath syntax



Lower-level infrastructure for lazy SXML processing

lazy:result->list

(define (lazy:result->list nodeset)
... Full Code ... )
 Converts the lazy result into a list, by forcing all the promises one by one


lazy:node->sxml

(define (lazy:node->sxml node)
... Full Code ... )
 Converts the lazy node to SXML, by forcing all of its descendants
 The node itself is not a promise
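
 The lazy functions above can be combined into one pipeline. The sketch
 below assumes the library is loaded and that `open-input-string' is
 available; the sample document is made up for illustration.

```scheme
; Build a lazy document from an in-memory XML source.
(define lazy-doc
  (lazy:xml->sxml
    (open-input-string "<doc><item>a</item><item>b</item></doc>")
    '()))

; Query it, force the lazy result into a list, then force every
; selected node into ordinary SXML.
(define lazy-items
  (map lazy:node->sxml
       (lazy:result->list
         ((lazy:sxpath '(doc item)) lazy-doc))))
```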



SXML Serialization

srl:sxml->xml

(define (srl:sxml->xml sxml-obj . port-or-filename)
... Full Code ... )
 procedure srl:sxml->xml :: SXML-OBJ [PORT-OR-FILENAME] -> STRING|unspecified

 Serializes the `sxml-obj' into XML, with indentation to facilitate
 readability by a human.

 sxml-obj - an SXML object (a node or a nodeset) to be serialized
 port-or-filename - an output port or an output file name, an optional
  argument
 If `port-or-filename' is not supplied, the function returns a string that
 contains the serialized representation of the `sxml-obj'.
 If `port-or-filename' is supplied and is a port, the function writes the
 serialized representation of `sxml-obj' to this port and returns an
 unspecified result.
 If `port-or-filename' is supplied and is a string, this string is treated as
 an output filename, the serialized representation of `sxml-obj' is written to
 that filename and an unspecified result is returned. If a file with the given
 name already exists, the effect is unspecified.
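
 The three ways of calling the serializer can be sketched as follows,
 assuming the library is loaded. The sample tree is made up, and "out.xml"
 is a hypothetical file name, so that call is left commented out.

```scheme
(define srl-tree '(doc (item "a") (item "b")))

; No second argument: the serialized text is returned as a string.
(define pretty-xml (srl:sxml->xml srl-tree))
(define compact-xml (srl:sxml->xml-noindent srl-tree))

; An output port: the text is written to the port.
(srl:sxml->xml srl-tree (current-output-port))
(newline)

; A string: treated as the name of the output file.
; (srl:sxml->xml srl-tree "out.xml")
```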


srl:sxml->xml-noindent

(define (srl:sxml->xml-noindent sxml-obj . port-or-filename)
... Full Code ... )
 procedure srl:sxml->xml-noindent :: SXML-OBJ [PORT-OR-FILENAME] ->
                                      -> STRING|unspecified

 Serializes the `sxml-obj' into XML, without indentation.


srl:sxml->html

(define (srl:sxml->html sxml-obj . port-or-filename)
... Full Code ... )
 procedure srl:sxml->html :: SXML-OBJ [PORT-OR-FILENAME] -> STRING|unspecified

 Serializes the `sxml-obj' into HTML, with indentation to facilitate
 readability by a human.

 sxml-obj - an SXML object (a node or a nodeset) to be serialized
 port-or-filename - an output port or an output file name, an optional
  argument
 If `port-or-filename' is not supplied, the function returns a string that
 contains the serialized representation of the `sxml-obj'.
 If `port-or-filename' is supplied and is a port, the function writes the
 serialized representation of `sxml-obj' to this port and returns an
 unspecified result.
 If `port-or-filename' is supplied and is a string, this string is treated as
 an output filename, the serialized representation of `sxml-obj' is written to
 that filename and an unspecified result is returned. If a file with the given
 name already exists, the effect is unspecified.


srl:sxml->html-noindent

(define (srl:sxml->html-noindent sxml-obj . port-or-filename)
... Full Code ... )
 procedure srl:sxml->html-noindent :: SXML-OBJ [PORT-OR-FILENAME] ->
                                       -> STRING|unspecified

 Serializes the `sxml-obj' into HTML, without indentation.



Parameterizing the SXML serializer

srl:parameterizable

(define (srl:parameterizable sxml-obj . port-or-filename+params)
... Full Code ... )
 procedure srl:parameterizable :: SXML-OBJ [PORT] {PARAM}* ->
                                    -> STRING|unspecified
 sxml-obj - an SXML object to serialize
 param ::= (cons param-name param-value)
 param-name ::= symbol
 
 1. cdata-section-elements
 value ::= (listof sxml-elem-name)
 sxml-elem-name ::= symbol

 2. indent
 value ::= 'yes | #t | 'no | #f | whitespace-string

 3. method
 value ::= 'xml | 'html

 4. ns-prefix-assig
 value ::= (listof (cons prefix namespace-uri))
 prefix ::= symbol
 namespace-uri ::= string

 5. omit-xml-declaration?
 value ::= 'yes | #t | 'no | #f

 6. standalone
 value ::= 'yes | #t | 'no | #f | 'omit

 7. version
 value ::= string | number

 ATTENTION: If a parameter name is unexpected or a parameter value is
 ill-formed, the parameter is silently ignored. Probably, a warning message
 in such a case would be more appropriate.

 Example:
 (srl:parameterizable 
   '(tag (@ (attr "value")) (nested "text node") (empty))
   (current-output-port)
   '(method . xml)  ; XML output method is used by default
   '(indent . "\t")  ; use a single tabulation to indent nested elements
   '(omit-xml-declaration . #f)  ; add XML declaration
   '(standalone . yes)  ; denote a standalone XML document
   '(version . "1.0"))  ; XML version