SXML Tools Tutorial
Abstract: This page is currently under construction.
However, if you have any comments/suggestions/corrections on the subject, feel free to contact me.
The SXML representation of an XML or HTML document can be constructed automatically. The library function sxml:document obtains the SXML representation of an XML/HTML document located either locally or remotely. The function accepts a filename or a Uniform Resource Identifier (URI) of the requested XML/HTML document and returns the corresponding SXML document.
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
==>
(*TOP*
(*PI* xml "version='1.0'")
(poem
(@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
(stanza
(line "Let us go then, you and I,")
(line "When the evening is spread out against the sky")
(line "Like a patient etherized upon a table:"))
(stanza
(line "In the room the women come and go")
(line "Talking of Michaelangelo."))))
(sxml:document "filename") ==> ; The SXML representation of your document located in "filename"
The sxml:document function provides a convenient wrapper for the
XML parser (SSAX), the HTML parser (HtmlPrag) and the remote resource accessor.
Depending on the type of the requested document, the appropriate parser (either
the XML one or the HTML one) is chosen automatically.
The interfaces of the SSAX and HtmlPrag parsers are considered in the next two
subsections respectively.
(ssax:xml->sxml (open-input-resource "http://modis.ispras.ru/Lizorkin/XML/poem.xml") '()) ==> ; Yields the same result as for Example 1
The second argument supplied to the ssax:xml->sxml function is the
so-called namespace prefix assignment.
The semantics of this argument is discussed below; the argument can be the
empty list, and it affects the result only for an XML document that uses XML
namespaces.
(ssax:xml->sxml
(open-input-resource "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml")
'())
==>
(*TOP*
(*PI* xml "version='1.0'")
(doc
(title "This document is linked with itself by XLink link")
(body
(link (@
(http://www.w3.org/1999/xlink:type "simple")
(http://www.w3.org/1999/xlink:href "selflinked.xml"))
"Following this link would result in this document again"))))
Please note the qualified SXML names of the two XLink attributes.
A namespace prefix assignment can be supplied to the SSAX parser:
(ssax:xml->sxml
(open-input-resource "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml")
'[(xl . "http://www.w3.org/1999/xlink")])
==>
(*TOP*
(@@ (*NAMESPACES* (xl "http://www.w3.org/1999/xlink")))
(*PI* xml "version='1.0'")
(doc
(title "This document is linked with itself by XLink link")
(body
(link (@ (xl:type "simple") (xl:href "selflinked.xml"))
"Following this link would result in this document again"))))
Note the namespace node that appeared in the auxiliary list of the SXML document node, and the qualified names of the XLink attributes.
(html->sxml
(open-input-resource "http://modis.ispras.ru/Lizorkin/XML/amorphis.html"))
==>
(*TOP*
(html
(head (title "Amorphis lyrics"))
(body
(h2 "To Fathers Cabin")
"(Lyrics: trad., Music: Holopainen, Laine)"
(p
"O old man, good god" (br)
"careful man of heaven" (br)
"keeper of storm clouds" (br)
"make misty weather" (br)
"and create a tiny cloud" (br)
"in whose shelter I may go"))))
The XPath support provided for SXML in SXPath is fully compatible with the XPath Recommendation version 1.0 [1].
((sxpath "doc/title")
'(*TOP*
(doc (title "Hello world"))))
==>
((title "Hello world"))
In this example, the call (sxpath "doc/title") corresponds to the static analysis phase [2], whereas applying the resulting function to the document,
((sxpath "doc/title")
'(*TOP*
(doc (title "Hello world"))))
corresponds to the dynamic evaluation phase.
Due to the similarity between SXML elements and attributes, the attribute value can be obtained by applying the child axis to the attribute node:
((sxpath "poem/@title/text()") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")) ==> ("The Lovesong of J. Alfred Prufrock")
Although not compatible with the XPath Specification [1], the functionality illustrated in example 4 allows addressing the attribute value. This is extremely important, for instance, when XML modification is considered [3].
((sxpath "bib/book/*[self::author or self::editor]")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((author (last "Stevens") (first "W."))
(author (last "Stevens") (first "W."))
(author (last "Abiteboul") (first "Serge"))
(author (last "Buneman") (first "Peter"))
(author (last "Suciu") (first "Dan"))
(editor (last "Gerbarg") (first "Darcy") (affiliation "CITI")))
In XPath 2.0, the solution would look more compact, namely “bib/book/(author | editor)”.
However, XPath 1.0 and, consequently, SXPath do not support union expressions
in location steps.
On the other hand, the solution presented in SXPath has the advantage of
automatically preserving the correct order of nodes in the result nodeset.
((sxpath "bib/book[publisher = 'Addison-Wesley']/title") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml")) ==> ((title "TCP/IP Illustrated") (title "Advanced Programming in the Unix environment"))
Many XPath operations (and comparison operations in particular) involve implicitly taking the string-value of their arguments. As defined in the XPath Specification [1], “The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order. The string-value of a text node is the character data.”
The idea can be best illustrated by the following example:
((sxpath "bib/book[author = 'StevensW.']") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml")) ==> ((book (@ (year "1994")) (title "TCP/IP Illustrated") (author (last "Stevens") (first "W.")) (publisher "Addison-Wesley") (price " 65.95")) (book (@ (year "1992")) (title "Advanced Programming in the Unix environment") (author (last "Stevens") (first "W.")) (publisher "Addison-Wesley") (price "65.95")))
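The string-value definition quoted above can be sketched in plain Scheme, independently of the SXML libraries. This is only an illustration under simplifying assumptions: the helper name my-string-value is hypothetical (the library's own function is not reproduced here), an element is taken to be a list headed by a symbol, an attribute list is a member headed by @, and a text node is a string.

```scheme
; Hypothetical helper: string-value of a simplified SXML node.
(define (my-string-value node)
  (cond
    ((string? node) node)                      ; text node: its character data
    ((and (pair? node) (symbol? (car node)))   ; element node
     (apply string-append
            (map (lambda (child)
                   (if (and (pair? child) (eq? (car child) '@))
                       ""                      ; attributes are not text descendants
                       (my-string-value child)))
                 (cdr node))))
    (else "")))

(my-string-value '(author (last "Stevens") (first "W.")))
; ==> "StevensW."
```

The concatenation of the two text descendants explains why the predicate author = 'StevensW.' in the query above selects both books by W. Stevens.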
Comparison operations in XPath have existential semantics: a comparison that involves a nodeset is true if it holds for at least one node of the nodeset:
((sxpath "bib/book[author/last = 'Abiteboul']")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((book (@ (year "2000"))
(title "Data on the Web")
(author (last "Abiteboul") (first "Serge"))
(author (last "Buneman") (first "Peter"))
(author (last "Suciu") (first "Dan"))
(publisher "Morgan Kaufmann Publishers")
(price "39.95")))
Note that the XML document considered contains books (a) with no authors, (b) with one author, and (c) with more than one author.
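The three cases (a), (b) and (c) can be illustrated with a small plain-Scheme sketch of the existential comparison. The helper names node-text and nodeset=string? are hypothetical, and the sketch assumes element nodes without attribute lists.

```scheme
; Hypothetical helper: concatenated text of an element without attributes.
(define (node-text node)
  (if (string? node)
      node
      (apply string-append (map node-text (cdr node)))))

; A nodeset equals a string if at least one of its nodes has that string-value.
(define (nodeset=string? nodeset str)
  (cond
    ((null? nodeset) #f)                              ; empty nodeset: never equal
    ((string=? (node-text (car nodeset)) str) #t)
    (else (nodeset=string? (cdr nodeset) str))))

(nodeset=string? '() "Abiteboul")                     ; (a) no authors   ==> #f
(nodeset=string? '((last "Stevens")) "Abiteboul")     ; (b) one author   ==> #f
(nodeset=string? '((last "Abiteboul") (last "Buneman") (last "Suciu"))
                 "Abiteboul")                         ; (c) many authors ==> #t
```

A book without authors yields an empty nodeset, so the comparison is false for it; a book with several authors is selected as soon as one of them matches.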
In XPath, arguments of a comparison operation are implicitly converted to a common datatype:
((sxpath "bib/book[price < 100]/title")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((title "TCP/IP Illustrated")
(title "Advanced Programming in the Unix environment")
(title "Data on the Web"))
In this example, both arguments of the comparison operation in the predicate are converted to numbers.
When multiple predicates are present in a location step, the syntactic order of the predicates is generally important:
((sxpath "bib/book[@year > 1993][position()<=2]")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((book (@ (year "1994"))
(title "TCP/IP Illustrated")
(author (last "Stevens") (first "W."))
(publisher "Addison-Wesley")
(price " 65.95"))
(book (@ (year "2000"))
(title "Data on the Web")
(author (last "Abiteboul") (first "Serge"))
(author (last "Buneman") (first "Peter"))
(author (last "Suciu") (first "Dan"))
(publisher "Morgan Kaufmann Publishers")
(price "39.95")))
This query has the semantics “select the first 2 books from the bibliography that have their publication year greater than 1993”.
((sxpath "bib/book[position()<=2][@year > 1993]")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((book (@ (year "1994"))
(title "TCP/IP Illustrated")
(author (last "Stevens") (first "W."))
(publisher "Addison-Wesley")
(price " 65.95")))
This query has the semantics “from the first 2 books in the bibliography, select those that have their publication year greater than 1993”.
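The effect of predicate order can be reproduced on an ordinary list. The following plain-Scheme sketch is an analogy only: it works on a list of publication years (1994, 1992 and 2000, taken from the results above) rather than on SXML nodes, and the helpers keep and take-up-to are hypothetical names introduced for this illustration.

```scheme
; Keep the members of lst that satisfy pred (a minimal filter).
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

; Take at most n leading members of lst, like [position() <= n].
(define (take-up-to lst n)
  (if (or (null? lst) (= n 0))
      '()
      (cons (car lst) (take-up-to (cdr lst) (- n 1)))))

(define years '(1994 1992 2000))
(define (after-1993? year) (> year 1993))

; Analogue of [@year > 1993][position()<=2]: filter first, then truncate.
(take-up-to (keep after-1993? years) 2)   ; ==> (1994 2000)

; Analogue of [position()<=2][@year > 1993]: truncate first, then filter.
(keep after-1993? (take-up-to years 2))   ; ==> (1994)
```

Each predicate is applied to the nodeset produced by the previous one, which is why position() refers to different positions in the two queries.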
Namespace bindings are supplied at XPath static analysis phase [2],
i.e. as an optional second argument of the sxpath function.
Selecting the xlink:href attribute of the XLink linking element:
((sxpath "doc/body/link/@xlink:href"
'[(xlink . "http://www.w3.org/1999/xlink")])
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml"))
==>
((http://www.w3.org/1999/xlink:href "selflinked.xml"))
Please note that prefixes in the XML document and in the XPath node test are
completely independent [4]: rather than prefixes, their
corresponding namespace URIs are compared when evaluating the node test.
In particular, a different prefix name can be chosen to stand for the same namespace URI:
((sxpath "doc/body/link/@x:href"
'[(x . "http://www.w3.org/1999/xlink")])
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml"))
==>
; Yields the same result
Note that (unlike XPath 2.0) the XPath 1.0 Specification implemented in SXPath does not support a default namespace declaration in node tests: in subsect. 2.3 of the XPath 1.0 Specification, it is stated that “... if the QName [in the node test] does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded).” Prefixes in XPath name tests thus have to be used when addressing a named node from a non-null namespace URI.
A more elaborate example:
((sxpath "rdf:RDF/rdf:Description/dc:title/text()"
'((rdf . "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
(dc . "http://purl.org/dc/elements/1.1/")))
'(*TOP*
(*PI* xml "version='1.0'")
(http://www.w3.org/1999/02/22-rdf-syntax-ns#:RDF
(http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description
(http://purl.org/dc/elements/1.1/:creator "Karl Mustermann")
(http://purl.org/dc/elements/1.1/:title "Algebra")
(http://purl.org/dc/elements/1.1/:subject "mathematics")
(http://purl.org/dc/elements/1.1/:date "2000-01-23")
(http://purl.org/dc/elements/1.1/:language "EN")))))
==>
("Algebra")
When addressing parts of an SXML document that uses namespace-ids instead of explicitly expanded qualified names, namespace prefixes in XPath node tests should be mapped to namespace-ids instead of namespace URIs:
((sxpath "my:schema/my:complexType/my:sequence/*"
'[(my . "xsd")])
'(*TOP*
(@ (*NAMESPACES* (xsd "http://www.w3.org/2001/XMLSchema")))
(xsd:schema
(xsd:complexType (@ (name "Address"))
(xsd:sequence
(xsd:element (@ (type "xsd:string") (name "name")))
(xsd:element (@ (type "xsd:string") (name "street")))
(xsd:element (@ (type "xsd:string") (name "city"))))
(xsd:attribute (@ (type "xsd:NMTOKEN") (name "country") (fixed "US")))))))
==>
((xsd:element (@ (type "xsd:string") (name "name")))
(xsd:element (@ (type "xsd:string") (name "street")))
(xsd:element (@ (type "xsd:string") (name "city"))))
XPath 1.0 has the node test that is “true for any node ... whose expanded-name has the namespace URI to which the prefix expands, regardless of the local part of the name” [1]. Such a node test allows selecting all the nodes that belong to a given namespace.
Selecting all attributes of the link element that belong to the XLink namespace:
((sxpath "//link/@xlink:*"
'[(xlink . "http://www.w3.org/1999/xlink")])
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml"))
==>
((http://www.w3.org/1999/xlink:type "simple")
(http://www.w3.org/1999/xlink:href "selflinked.xml"))
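The rule that namespace URIs rather than prefixes are compared can be sketched in plain Scheme as the expansion of a qualified name against a list of bindings. This is an illustration only and not the SXPath implementation: the helpers split-qname and expand-qname are hypothetical, and prefixes are represented as strings here (SXPath itself takes symbols, as shown in the examples above).

```scheme
; Split "prefix:local" at the first colon.
; Returns (prefix . local); prefix is #f when the name has no colon.
(define (split-qname qname)
  (let loop ((i 0))
    (cond ((= i (string-length qname)) (cons #f qname))
          ((char=? (string-ref qname i) #\:)
           (cons (substring qname 0 i)
                 (substring qname (+ i 1) (string-length qname))))
          (else (loop (+ i 1))))))

; Expand a QName into (namespace-uri . local-part).
; An unprefixed name gets the null namespace, represented as #f.
; Simplification: an unbound prefix is also mapped to #f, whereas a real
; implementation would be expected to report an error.
(define (expand-qname qname bindings)
  (let* ((parts (split-qname qname))
         (binding (and (car parts) (assoc (car parts) bindings))))
    (cons (and binding (cdr binding))
          (cdr parts))))

(expand-qname "xlink:href" '(("xlink" . "http://www.w3.org/1999/xlink")))
; ==> ("http://www.w3.org/1999/xlink" . "href")
(expand-qname "x:href" '(("x" . "http://www.w3.org/1999/xlink")))
; ==> ("http://www.w3.org/1999/xlink" . "href")
(expand-qname "href" '())
; ==> (#f . "href")
```

Since the first two expansions are equal, a node test written with either prefix matches the same attribute.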
Variable bindings are supplied at XPath dynamic evaluation
phase [2], i.e. as an optional second argument to the function
constructed by sxpath.
((sxpath "table/tr[$k]")
'(*TOP*
(table (tr "First table row")
(tr "Second table row")))
'[(k . 2)])
==>
((tr "Second table row"))
Multiple variables can be used in an XPath expression as well.
Selecting the titles of the first n books published after publ_year:
((sxpath "bib/book[@year > $publ_year][position() <= $n]/title") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml") '((publ_year . 1993) (n . 2))) ==> ((title "TCP/IP Illustrated") (title "Data on the Web"))
The variable passed to the SXPath evaluator may have any of the four data types supported in XPath 1.0 [1]:
((sxpath "items/item_tuple[offered_by = $i]/description") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/items.xml") `[(i . ,((sxpath "users/user_tuple[name='Tom Jones']/userid") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/users.xml")))]) ==> ((description "Red Bicycle") (description "Tricycle") (description "Broken Bicycle"))
Using a boolean variable retrieve_all_authors?:
((sxpath "bib/book[title = 'Data on the Web']/ author[$retrieve_all_authors? or position()=1]") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml") '[(retrieve_all_authors? . #f)]) ==> ((author (last "Abiteboul") (first "Serge")))
Rewriting Example 4 in SXPath native syntax:
([sxpath '(poem @ title *text*)] (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")) ==> ("The Lovesong of J. Alfred Prufrock")
Analogue of Example 5 in SXPath native syntax:
((sxpath '(bib book (*or* author editor))) (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml")) ==> ((author (last "Stevens") (first "W.")) (author (last "Stevens") (first "W.")) (author (last "Abiteboul") (first "Serge")) (author (last "Buneman") (first "Peter")) (author (last "Suciu") (first "Dan")) (editor (last "Gerbarg") (first "Darcy") (affiliation "CITI")))
Analogue of Example 6 in SXPath native syntax:
((sxpath '(bib (book [publisher (equal? "Addison-Wesley")]) title)) (sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml")) ==> ((title "TCP/IP Illustrated") (title "Advanced Programming in the Unix environment"))
((sxpath
`(poem stanza (line (,(lambda (nodeset var-binding)
(= 7
(length
(filter
(lambda (str) (not (string=? "" str)))
(string-split
(sxml:string-value (car nodeset))
'(#\space #\tab))))))
))))
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
==>
((line "Let us go then, you and I,")
(line "Like a patient etherized upon a table:"))
The solution in XPath, although possible, does not look straightforward:
((sxpath "poem/stanza/line[string-length() + 1 - string-length(translate(normalize-space(), ' ', '')) = 7 ]") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")) ==> ; Yields the same result
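The arithmetic in the XPath expression above counts words indirectly: removing all spaces shortens the string by the number of spaces, and a line with single spaces between its words has one more word than spaces. A plain-Scheme sketch of the same arithmetic follows; the helpers count-char and word-count are hypothetical and assume a string without leading, trailing or repeated spaces.

```scheme
; Number of occurrences of character ch in str.
(define (count-char str ch)
  (let loop ((i 0) (n 0))
    (cond ((= i (string-length str)) n)
          ((char=? (string-ref str i) ch) (loop (+ i 1) (+ n 1)))
          (else (loop (+ i 1) n)))))

; Words in a space-normalized string: number of spaces plus one.
(define (word-count str)
  (+ 1 (count-char str #\space)))

(word-count "Let us go then, you and I,")          ; ==> 7
(word-count "In the room the women come and go")   ; ==> 8
```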
There are certain queries that cannot be expressed in XPath, but can be expressed in native SXPath syntax with lambda steps. A couple of such examples are considered below:
((sxpath
`(poem stanza line ,(lambda (nodeset var-binding)
(filter
(lambda (node)
(not (null?
(filter
(lambda (str) (= (string-length str) 7))
(string-split
(sxml:string-value node)
'(#\space #\tab #\. #\, #\:))))))
nodeset))
))
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
==>
((line "When the evening is spread out against the sky")
(line "Like a patient etherized upon a table:")
(line "Talking of Michaelangelo."))
The 7-letter words in the above three lines are “evening”,
“patient” and “Talking” respectively.
((sxpath
`(bib book ,(lambda (book-set var-binding)
(map
(lambda (book)
(cons (abs
((sxpath "@year - 1993") book))
book))
book-set))
,(lambda (alist var-binding)
(let ((min-delta (apply min (map car alist))))
(filter
(lambda (pair) (= (car pair) min-delta))
alist)))
,(lambda (alist var-binding)
(map cdr alist))
))
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml"))
==>
((book
(@ (year "1994"))
(title "TCP/IP Illustrated")
(author (last "Stevens") (first "W."))
(publisher "Addison-Wesley")
(price " 65.95"))
(book
(@ (year "1992"))
(title "Advanced Programming in the Unix environment")
(author (last "Stevens") (first "W."))
(publisher "Addison-Wesley")
(price "65.95")))
The SXPath expression contains three lambda steps.
In the first lambda step, each book is cons'ed with the proximity of its
publication date to year 1993.
In the second lambda step, the association list is filtered with respect to
the minimal value of the proximity.
In the third lambda step, the (filtered) association list is turned into a
list of books again.
Jim Bender implemented the XQuery FLWOR-expression in Scheme.
In this section, this technique is demonstrated by a couple of examples from XQuery Use Cases [5].
The analogue in Scheme for the XQuery query 1.1.9.1 Q1 from [5]:
`(bib
. ,(for ((b ((sxpath "/bib/book")
(sxml:document
"http://modis.ispras.ru/Lizorkin/XML/bib.xml"))))
(where
((sxpath "./publisher = 'Addison-Wesley' and ./@year > 1991") b)
`(book
(@ year ,((sxpath "string(./@year)") b))
. ,((sxpath "./title") b)))))
==>
(bib
(book (@ year "1994")
(title "TCP/IP Illustrated"))
(book (@ year "1992")
(title "Advanced Programming in the Unix environment")))
The analogue in Scheme for the XQuery query 1.1.9.6 Q6 from [5]:
For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional
authors:
`(bib
,@(for ((b ((sxpath "//book")
(sxml:document
"http://modis.ispras.ru/Lizorkin/XML/bib.xml"))))
(where
((sxpath "count(./author) > 0") b)
`(book
,@((sxpath "./title") b)
,@(for ((a ((sxpath "./author[position()<=2]") b)))
a)
,@(if ((sxpath "count(./author) > 2") b)
'((et-al))
'())))))
==>
(bib
(book
(title "TCP/IP Illustrated")
(author (last "Stevens") (first "W.")))
(book
(title "Advanced Programming in the Unix environment")
(author (last "Stevens") (first "W.")))
(book
(title "Data on the Web")
(author (last "Abiteboul") (first "Serge"))
(author (last "Buneman") (first "Peter"))
(et-al)))
The FLWOR-expressions can involve multiple (S)XML documents as well as a single XML document.
The analogue in Scheme for the XQuery query 1.4.4.2 Q2 from [5]:
`(result
. ,(for ((i ((sxpath "//item_tuple")
(sxml:document
"http://modis.ispras.ru/Lizorkin/XML/items.xml"))))
(let ((b ((sxpath "//bid_tuple[itemno = $i/itemno]")
(sxml:document
"http://modis.ispras.ru/Lizorkin/XML/bids.xml")
`((i . ,(as-nodeset i))))))
(where
((sxpath "contains(./description, 'Bicycle')") i)
(order (by ((sxpath "./itemno") i))
`(item_tuple
,@((sxpath "./itemno") i)
,@((sxpath "./description") i)
(high_bid
,@((sxpath
`("./bid" ,(lambda (nodeset var-binding)
(if
(null? nodeset)
'()
(list
(apply max
(map
(lambda (node)
(sxml:number
(sxml:string-value node)))
nodeset)))))
))
b))))))))
==>
(result
(item_tuple (itemno "1001") (description "Red Bicycle") (high_bid 55))
(item_tuple (itemno "1003") (description "Old Bicycle") (high_bid 20))
(item_tuple (itemno "1007") (description "Racing Bicycle") (high_bid 225))
(item_tuple (itemno "1008") (description "Broken Bicycle") (high_bid)))
SXML transformations are used for transforming an SXML document tree into another tree. Two primary kinds of SXML transformation tools can be considered: the XSLT processor STX (by Kirill Lisovsky) and the pre-post-order SXML transformer (by Oleg Kiselyov).
(for-each
display
(sxml:clean-feed
(stx:transform-dynamic
(sxml:add-parents (sxml:document
"http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
(stx:make-stx-stylesheet
(sxml:document
"http://modis.ispras.ru/Lizorkin/XML/poem2html.xsl"
'[(xsl . "http://www.w3.org/1999/XSL/Transform")])))))
==>
; Produces the following output [additional whitespace is inserted here
; to improve readability]:
<html>
<head><title>The Lovesong of J. Alfred Prufrock</title></head>
<body>
<h1>The Lovesong of J. Alfred Prufrock</h1>
<p>
Let us go then, you and I,<br/>
When the evening is spread out against the sky<br/>
Like a patient etherized upon a table:<br/>
</p>
<p>
In the room the women come and go<br/>
Talking of Michaelangelo.<br/>
</p>
<i>T. S. Eliot</i>
</body>
</html>
The same transformation in Oleg Kiselyov's pre-post-order SXML transformer:
(pre-post-order
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
`((*TOP* *macro* . ,(lambda top
(car ((sxpath '(*)) top))))
(poem . ,(lambda elem
`(html
(head
(title ,((sxpath "string(@title)") elem)))
(body
(h1 ,((sxpath "string(@title)") elem))
,@((sxpath "node()") elem)
(i ,((sxpath "string(@poet)") elem))))))
(@ *preorder* . ,(lambda x x))
(stanza . ,(lambda (tag . content)
`(p ,@(map-union
(lambda (x) x)
content))))
(line . ,(lambda (tag . content)
(append content '((br)))))
(*text* . ,(lambda (tag text) text))))
==>
(html
(head (title "The Lovesong of J. Alfred Prufrock"))
(body
(h1 "The Lovesong of J. Alfred Prufrock")
(p
"Let us go then, you and I,"
(br)
"When the evening is spread out against the sky"
(br)
"Like a patient etherized upon a table:"
(br))
(p
"In the room the women come and go"
(br)
"Talking of Michaelangelo."
(br))
(i "T. S. Eliot")))
Under construction
http://modis.ispras.ru/Lizorkin/xpathlink.html
Select the link element and traverse it:
((sxpath/c "/doc/descendant::link/traverse::doc") (xlink:documents "http://modis.ispras.ru/Lizorkin/XML/selflinked.xml")) ==> ((doc (title "This document is linked with itself by XLink link") (body (link (@ (http://www.w3.org/1999/xlink:type "simple") (http://www.w3.org/1999/xlink:href "selflinked.xml")) "Following this link would result in this document again"))))
XLink extended links are supported in XPathLink as well as XLink simple links:
Select the chapter element that is linked with the first item in
the table of contents:
((sxpath/c "doc/item[1]/traverse::chapter") (xlink:documents "http://modis.ispras.ru/Lizorkin/XML/doc.xml")) ==> ((chapter (@ (id "chap1")) (title "Abstract") (p "This document describes about XLink Engine...")))
Note that the link is described in a different place of the XML document than the item element being the starting resource of this link.
Under construction
http://celtic.benderweb.net/sxml-match/
(sxml-match
(car ((sxpath "bib/book[1]")
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/bib.xml")))
[(book (@ (year ,publ-year))
(title ,book-title)
(author (last ,author-last-name)
(first ,author-first-name))
. ,rest-content)
(display "publ-year is bound to: ")
(pp publ-year)
(display "book-title is bound to: ")
(pp book-title)
(display "author-last-name is bound to: ")
(pp author-last-name)
(display "author-first-name is bound to: ")
(pp author-first-name)
(display "rest-content is bound to: ")
(pp rest-content)]
[,otherwise
(display "Match not found")])
==>
; Produces the following output:
publ-year is bound to: "1994"
book-title is bound to: "TCP/IP Illustrated"
author-last-name is bound to: "Stevens"
author-first-name is bound to: "W."
rest-content is bound to: ((publisher "Addison-Wesley") (price " 65.95"))
The SXML modification tool was designed in the spirit of [6].
In this subsection, the following document will be used as an illustration:
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(recipient
(name "Dennis Scannell")
(street "175 Perry Lea Side Road"))
(order
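(cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Each update statement generally consists of two parts: an XPath location path that selects the nodes to be updated, and an action to be performed on these nodes.
Deleting the recipient element together with all its content: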
([sxml:modify '("purchaseOrder/recipient" delete)] (sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")) ==> (*TOP* (*PI* xml "version='1.0'") (purchaseOrder (@ (orderDate "07.23.2001")) (order (cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Deleting the recipient element, while keeping its content:
([sxml:modify
'("purchaseOrder/recipient" delete-undeep)]
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml"))
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(name "Dennis Scannell")
(street "175 Perry Lea Side Road")
(order
(cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
A more elaborate example:
([sxml:modify
'("/descendant::text()[not(translate(., ' \r\n\t', ''))]" delete)]
'(table "\n "
(tr "\n\t"
(td "value")
"\n ")
"\n"))
==>
(table (tr (td "value")))
The call to the XPath translate function yields the
string-value of the text node with all the whitespace characters removed
(i.e. with space, carriage return, newline and tab characters removed).
For a whitespace-only text node, the result is an empty string, which is
treated as a logical false value in XPath.
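The same whitespace test can be sketched in plain Scheme without XPath. The helpers whitespace-only? and strip-whitespace are hypothetical and serve only to illustrate the logic; unlike the update statement above, this simplified version does not distinguish text nodes from attribute values.

```scheme
; The four characters removed by the translate call above.
(define whitespace-chars
  (list #\space #\newline #\tab (integer->char 13)))   ; 13 = carriage return

; True if str consists solely of whitespace (or is empty).
(define (whitespace-only? str)
  (let loop ((i 0))
    (cond ((= i (string-length str)) #t)
          ((memv (string-ref str i) whitespace-chars) (loop (+ i 1)))
          (else #f))))

; Remove whitespace-only strings from a tree, recursively.
(define (strip-whitespace node)
  (if (pair? node)
      (let loop ((members node))
        (cond ((null? members) '())
              ((and (string? (car members)) (whitespace-only? (car members)))
               (loop (cdr members)))
              (else (cons (strip-whitespace (car members))
                          (loop (cdr members))))))
      node))

(strip-whitespace '(table "\n " (tr "\n\t" (td "value") "\n ") "\n"))
; ==> (table (tr (td "value")))
```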
([sxml:modify
'("purchaseOrder/recipient" insert-into (postalCode "05676"))]
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml"))
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(recipient
(name "Dennis Scannell")
(street "175 Perry Lea Side Road")
(postalCode "05676"))
(order
(cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Attribute nodes can be inserted as well as element nodes:
([sxml:modify
'("purchaseOrder/recipient" insert-into
(@ (country "USA")))]
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml"))
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(recipient (@ (country "USA"))
(name "Dennis Scannell")
(street "175 Perry Lea Side Road"))
(order (cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Multiple nodes can be inserted with a single update statement as well:
([sxml:modify
'("purchaseOrder/recipient" insert-into (postalCode "05676")
(@ (country "USA")))]
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml"))
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(recipient (@ (country "USA"))
(name "Dennis Scannell")
(street "175 Perry Lea Side Road")
(postalCode "05676"))
(order
(cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Note that attribute nodes are inserted into the attribute list of an element, whereas the other nodes are inserted as the element's last children.
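The placement rule for inserted nodes can be sketched in plain Scheme. This is an illustration of the observable behaviour shown in the examples above, not the code of sxml:modify; the function name insert-into-sketch is hypothetical, and an element is assumed to carry at most one attribute list, placed directly after its name.

```scheme
(define (attr-list? x)
  (and (pair? x) (eq? (car x) '@)))

; Insert new-nodes into elem: attributes join the attribute list,
; all other nodes are appended after the existing children.
(define (insert-into-sketch elem new-nodes)
  (let* ((name (car elem))
         (has-attrs? (and (pair? (cdr elem)) (attr-list? (cadr elem))))
         (old-attrs (if has-attrs? (cdr (cadr elem)) '()))
         (children (if has-attrs? (cddr elem) (cdr elem)))
         (new-attrs (apply append
                           (map (lambda (n) (if (attr-list? n) (cdr n) '()))
                                new-nodes)))
         (new-children (apply append
                              (map (lambda (n) (if (attr-list? n) '() (list n)))
                                   new-nodes)))
         (attrs (append old-attrs new-attrs)))
    (append (list name)
            (if (null? attrs) '() (list (cons '@ attrs)))
            children
            new-children)))

(insert-into-sketch
 '(recipient (name "Dennis Scannell") (street "175 Perry Lea Side Road"))
 '((postalCode "05676") (@ (country "USA"))))
; ==> (recipient (@ (country "USA"))
;                (name "Dennis Scannell")
;                (street "175 Perry Lea Side Road")
;                (postalCode "05676"))
```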
Renaming the recipient element to a customer element:
([sxml:modify
'("purchaseOrder/recipient" rename customer)]
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml"))
==>
(*TOP*
(*PI* xml "version='1.0'")
(purchaseOrder (@ (orderDate "07.23.2001"))
(customer
(name "Dennis Scannell")
(street "175 Perry Lea Side Road"))
(order
(cd (@ (title "Little Lion") (artist "Brooks Williams"))))))
Under construction
http://modis.ispras.ru/Lizorkin/ddo.html
Selecting the text nodes that follow a text node containing 'XPointer' and are not located in an appendix:
((ddo:sxpath "//text()[contains(., 'XPointer')]/ following::text()[not(./ancestor::appendix)]") (sxml:document "http://modis.ispras.ru/Lizorkin/XML/doc.xml")) ==> ("XPointer is the fragment identifier of documents having the mime-type..." "Models for using XLink/XPointer " "There are important keywords." "samples" "Conclusion" "Thanks a lot.")
Under construction
Lazy XML-to-SXML conversion:
(define doc
[lazy:xml->sxml
(open-input-resource "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
'()])
doc
==>
(*TOP*
(*PI* xml "version='1.0'")
(poem
(@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
(stanza (line "Let us go then, you and I,") #<promise>)
#<promise>))
Please note that certain nodes of the SXML document are Scheme promises; the promises correspond to not-yet-parsed subtrees of the requested XML document. Once a promise is forced, XML document parsing continues, and the SXML representation of the corresponding branch is returned.
Querying a lazy SXML document, lazily:
(define res ((lazy:sxpath "poem/stanza/line[1]") [lazy:xml->sxml (open-input-resource "http://modis.ispras.ru/Lizorkin/XML/poem.xml") '()])) res ==> ((line "Let us go then, you and I,") #<promise>)
Obtaining the next portion of the result:
(force (cadr res)) ==> ((line "In the room the women come and go") #<promise>)
Converting the lazy result to a conventional SXML nodeset:
(lazy:result->list res) ==> ((line "Let us go then, you and I,") (line "In the room the women come and go"))
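The shape of the lazy results above (a computed head followed by a promise for the rest) can be reproduced with the standard delay and force. The following plain-Scheme sketch is an analogy that does not use the lazy SXML functions; lazy-lines and lazy->list are hypothetical names, and numbered (line n) nodes stand in for the lines of the poem.

```scheme
; A lazy nodeset: either the empty list, or a two-member list holding
; a computed node and a promise for the remaining lazy nodeset.
(define (lazy-lines n limit)
  (if (> n limit)
      '()
      (list (list 'line n)
            (delay (lazy-lines (+ n 1) limit)))))

; Force all promises, producing a conventional list.
(define (lazy->list lazy)
  (if (null? lazy)
      '()
      (cons (car lazy) (lazy->list (force (cadr lazy))))))

(define res (lazy-lines 1 3))

(car res)                   ; ==> (line 1)
(car (force (cadr res)))    ; ==> (line 2), computed only when forced
(lazy->list res)            ; ==> ((line 1) (line 2) (line 3))
```

Forcing the same promise again returns the already computed value, so converting the result to a list does not repeat the work done by the earlier force.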
The SXML serializer converts an SXML object (i.e. a node or a nodeset) into XML or HTML. It provides partial conformance with XSLT 2.0 and XQuery 1.0 Serialization [7].
For converting an SXML object to XML, the function srl:sxml->xml is provided. The function takes an SXML object as its first mandatory argument. If the second argument is not supplied, the function returns a string that contains the serialized representation of the SXML object as XML:
[srl:sxml->xml '(doc (title "Hello world"))] ==> "<doc>\n <title>Hello world</title>\n</doc>"
If the second optional argument is provided to the
function srl:sxml->xml, and this second argument is a port, the
serialized representation of the SXML object is output to this port:
[srl:sxml->xml '(doc (title "Hello world")) (current-output-port)]
==>
; Produces the following output:
<doc>
 <title>Hello world</title>
</doc>
If the function srl:sxml->xml is called with two arguments and the
second argument being a string, the serialized representation of the SXML
object is output to a file whose file name is that string:
[srl:sxml->xml '(doc (title "Hello world")) "output.xml"] ==> ; The file "output.xml" contains the serialized representation of the SXML object
The SXML serializer supports different kinds of nodes, including comment nodes, processing instruction nodes, namespace nodes and entities produced by HtmlPrag:
[srl:sxml->xml
'(*TOP*
(@ (*NAMESPACES*
(foo "http://www.foo.net")))
(*PI* xml "version='1.0'")
(shipment (@ (weight 100) (unit "kg") (delivered))
(*COMMENT* "Comment node")
(description "Shipment" (& 32) "description")
(foo:empty)))
(current-output-port)]
==>
; Produces the following output:
<?xml version='1.0'?>
<shipment weight="100" unit="kg" delivered="delivered">
<!--Comment node-->
<description>Shipment description</description>
<foo:empty xmlns:foo="http://www.foo.net"/>
</shipment>
It can be noted from the above examples that the
function srl:sxml->xml produces nested XML elements with indentation
that facilitates readability of the XML document by a human.
However, in certain cases (e.g. producing XML documents for their consumption
by machine) such indentation can be useless and even undesirable, as it
increases the size of the XML document being produced.
For outputting an SXML object as XML without indentation, the
function srl:sxml->xml-noindent
is provided.
Its signature and the semantics of its arguments are the same as for the
function srl:sxml->xml already discussed:
[srl:sxml->xml-noindent '(doc (title "Hello world")) (current-output-port)] ==> ; Produces the following output: <doc><title>Hello world</title></doc>
Following a similar interface, an SXML object can be serialized as HTML:
[srl:sxml->html
'(table (@ (border))
(tr (td "Item 1")
(td (@ (rowspan 2)) "Item 2")
(td "Item 3"))
(tr (td "Item 4") (td "Item 5")))
(current-output-port)]
==>
; Produces the following output:
<table border>
<tr>
<td>Item 1</td>
<td rowspan="2">Item 2</td>
<td>Item 3</td>
</tr>
<tr>
<td>Item 4</td>
<td>Item 5</td>
</tr>
</table>
Serializing an SXML object as HTML corresponds to the HTML output method [7]. The difference between the XML and HTML output methods is discussed in more detail in Subsect. 10.1.3 below.
Similarly to the function srl:sxml->xml-noindent, a function
srl:sxml->html-noindent is
provided for serializing an SXML object into HTML without indentation.
The functions srl:sxml->xml, srl:sxml->xml-noindent,
srl:sxml->html and srl:sxml->html-noindent described above
provide a practical but fixed serializer behaviour.
To give the application the full power of controlling serialization parameters
that influence how serialization is performed [7], the
function srl:parameterizable
is provided.
The first two arguments of the function srl:parameterizable are the
same as for the previously discussed srl:sxml->xml and the like:
1) an SXML object to be serialized and 2) optionally, an output port or a
file name.
Additionally, the function srl:parameterizable takes one or more
serialization parameter declarations.
Each parameter declaration takes the form
of (cons param-name param-value),
param-name being a Scheme symbol, and the type of
param-value depends on the particular serialization parameter.
Before going into details about each particular serialization parameter
supported in the SXML serializer, let us consider an example of specifying
serialization parameters:
[srl:parameterizable
'(tag (@ (attr1 "value1") (attr2 "value2"))
(nested "text node")
(empty))
(current-output-port)
'(method . xml)
'(indent . " ")
'(omit-xml-declaration . #f)
'(standalone . #t)]
==>
; Produces the following output:
<?xml version='1.0' standalone='yes'?>
<tag attr1="value1" attr2="value2">
<nested>text node</nested>
<empty/>
</tag>
The description of serialization parameters and their semantics is given in the following subsections.
For certain text nodes of the SXML document being serialized, it is sometimes
preferable to output them as XML CDATA sections (e.g. for text
nodes containing code in some scripting language).
For this purpose, the
cdata-section-elements parameter is introduced [7].
The cdata-section-elements parameter contains a list of SXML element
names.
If the name of the parent of a text node is a member of the list, then the
text node is output as a CDATA section [7].
[srl:parameterizable
'(Snippet
(Declarations
(Literal
(ID "name")
(Default "element")))
(Code (@ (Language "XML")) "<$name$>$selected$$end$</$name$>"))
(current-output-port)
'(cdata-section-elements . (Code))]
==>
; Produces the following output:
<Snippet>
<Declarations>
<Literal>
<ID>name</ID>
<Default>element</Default>
</Literal>
</Declarations>
<Code Language="XML"><![CDATA[<$name$>$selected$$end$</$name$>]]></Code>
</Snippet>
Without the CDATA section, the serialized content of the
Code element in the above example is hard to read
due to character escaping:
[srl:parameterizable
'(Snippet
(Declarations
(Literal
(ID "name")
(Default "element")))
(Code (@ (Language "XML")) "<$name$>$selected$$end$</$name$>"))
(current-output-port)]
==>
; Produces the following output:
<Snippet>
<Declarations>
<Literal>
<ID>name</ID>
<Default>element</Default>
</Literal>
</Declarations>
<Code Language="XML">&lt;$name$&gt;$selected$$end$&lt;/$name$&gt;</Code>
</Snippet>
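The two ways of writing a text node discussed above can be sketched in plain Scheme. The helpers escape-char-data and wrap-cdata are hypothetical names introduced for illustration; they are not the serializer's own code, and wrap-cdata assumes that the text does not contain the sequence "]]>".

```scheme
;; Hypothetical helper: escape the characters that cannot appear
;; literally in XML character data.
(define (escape-char-data str)
  (let loop ((chars (string->list str)) (acc '()))
    (if (null? chars)
        (apply string-append (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                (cons (cond ((char=? c #\&) "&amp;")
                            ((char=? c #\<) "&lt;")
                            ((char=? c #\>) "&gt;")
                            (else (string c)))
                      acc))))))

;; Hypothetical helper: emit the text unchanged inside a CDATA section.
;; Assumes the text does not itself contain "]]>".
(define (wrap-cdata str)
  (string-append "<![CDATA[" str "]]>"))

(define snippet "<$name$>$selected$$end$</$name$>")
(define escaped (escape-char-data snippet))
(define as-cdata (wrap-cdata snippet))
```

Here escaped holds the entity-escaped form of the snippet, while as-cdata keeps the original characters readable inside the CDATA delimiters.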
Indentation (if any) produced for the output document is controlled by the
indent parameter [7].
According to [7], “if the indent parameter has the
value yes, then the XML output method may output whitespace ... in
order to indent the result so that a person will find it easier to read; if
the indent parameter has the value no, it must not output any
additional whitespace.”
Default indentation for the SXML serializer corresponds to two-space indentation for each level of nested elements:
[srl:parameterizable
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")
(current-output-port)
'(indent . #t)]
==>
; Produces the following output:
<?xml version='1.0'?>
<purchaseOrder orderDate="07.23.2001">
  <recipient>
    <name>Dennis Scannell</name>
    <street>175 Perry Lea Side Road</street>
  </recipient>
  <order>
    <cd title="Little Lion" artist="Brooks Williams"/>
  </order>
</purchaseOrder>
No indentation:
[srl:parameterizable
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")
(current-output-port)
'(indent . #f)]
==>
; Produces the following output (I manually inserted a couple of line breaks below
; to avoid a long line):
<?xml version='1.0'?><purchaseOrder orderDate="07.23.2001"><recipient>
<name>Dennis Scannell</name><street>175 Perry Lea Side Road</street></recipient><order>
<cd title="Little Lion" artist="Brooks Williams"/></order></purchaseOrder>
In addition to the values #t and #f for the indent parameter
(and their synonyms 'yes and 'no, supported for
conformance with [7]), the SXML serializer also allows the
value of the indent parameter to be a whitespace string.
In the latter case, this whitespace string denotes a custom indentation
to be applied for each level of element nesting.
In the example below, the custom indentation is set to a single tab
character:
[srl:parameterizable
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")
(current-output-port)
'(indent . "\t")]
==>
; Produces the following output:
<?xml version='1.0'?>
<purchaseOrder orderDate="07.23.2001">
	<recipient>
		<name>Dennis Scannell</name>
		<street>175 Perry Lea Side Road</street>
	</recipient>
	<order>
		<cd title="Little Lion" artist="Brooks Williams"/>
	</order>
</purchaseOrder>
Custom indentation with no additional whitespace for nested elements, just line breaks:
[srl:parameterizable
(sxml:document "http://modis.ispras.ru/Lizorkin/XML/po-short.xml")
(current-output-port)
'(indent . "")]
==>
; Produces the following output:
<?xml version='1.0'?>
<purchaseOrder orderDate="07.23.2001">
<recipient>
<name>Dennis Scannell</name>
<street>175 Perry Lea Side Road</street>
</recipient>
<order>
<cd title="Little Lion" artist="Brooks Williams"/>
</order>
</purchaseOrder>
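The effect of the different indent values can be summarized by a small sketch that computes the whitespace placed before an element at a given nesting level. The function indent-prefix is a hypothetical model written for this tutorial, based only on the behaviour described above; it is not the serializer's implementation.

```scheme
;; Concatenate n copies of str.
(define (repeat-string str n)
  (if (<= n 0)
      ""
      (string-append str (repeat-string str (- n 1)))))

;; Hypothetical model of the indent parameter: return the whitespace
;; written before an element nested `level` levels deep.
(define (indent-prefix indent level)
  (cond ((memq indent '(#f no)) "")          ; no additional whitespace at all
        ((memq indent '(#t yes))             ; default: two spaces per level
         (indent-prefix "  " level))
        (else                                ; custom whitespace string per level
         (string-append (string #\newline)
                        (repeat-string indent level)))))
```

Under this model an empty indent string still yields a line break before each element, which is why the last example above has line breaks but no leading whitespace.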
Two output methods are supported by the SXML serializer: the XML output method and the HTML output method.
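Some of the differences between the XML and HTML output methods are [7]:
The HTML output method does not output an end tag for empty elements: area, br, img, etc.
The HTML output method does not escape the content of script and style elements.
The HTML output method does not escape “<” characters
occurring in attribute values.
The HTML output method does not add indentation whitespace inside elements such as pre, textarea, etc.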
The choice of a particular output method is controlled by the
method parameter [7].
The SXML serializer supports two values for the method parameter:
'xml and 'html which specify the XML and HTML output methods
respectively.
If the method parameter is omitted, the XML output method is used by
default.
In the two examples below, the same SXML element is serialized using the HTML and XML output methods respectively; the differences between the results concern the script content, the multiple and selected attributes, and the br and input elements:
[srl:parameterizable
'(form (@ (action "http://cats.org/select") (method "post"))
(script (@ (type "text/javascript"))
"document.write('<h1>Cats</h1>')")
(select (@ (name "cats") (multiple "multiple"))
(option (@ (value "1")) "Calico")
(option (@ (value "2")) "Tortie")
(option (@ (value "3") (selected)) "Siamese"))
(br)
(input (@ (type "submit") (value "Send"))))
(current-output-port)
'(method . html)]
==>
; Produces the following output:
<form action="http://cats.org/select" method="post">
<script type="text/javascript">document.write('<h1>Cats</h1>')</script>
<select name="cats" multiple>
<option value="1">Calico</option>
<option value="2">Tortie</option>
<option value="3" selected>Siamese</option>
</select>
<br>
<input type="submit" value="Send">
</form>
[srl:parameterizable
'(form (@ (action "http://cats.org/select") (method "post"))
(script (@ (type "text/javascript"))
"document.write('<h1>Cats</h1>')")
(select (@ (name "cats") (multiple "multiple"))
(option (@ (value "1")) "Calico")
(option (@ (value "2")) "Tortie")
(option (@ (value "3") (selected)) "Siamese"))
(br)
(input (@ (type "submit") (value "Send"))))
(current-output-port)
'(method . xml)]
==>
; Produces the following output:
<form action="http://cats.org/select" method="post">
<script type="text/javascript">document.write('&lt;h1&gt;Cats&lt;/h1&gt;')</script>
<select name="cats" multiple="multiple">
<option value="1">Calico</option>
<option value="2">Tortie</option>
<option value="3" selected="selected">Siamese</option>
</select>
<br/>
<input type="submit" value="Send"/>
</form>
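One of the differences visible in the two outputs above, the treatment of empty elements such as br and input, can be sketched as follows. The function empty-element-markup and the list html-void-elements are hypothetical and written for illustration; the list is only a subset of the HTML elements that have no end tag, and the rule for other empty elements under the HTML method is an assumption based on [7].

```scheme
;; Illustrative subset of HTML elements that are written without an end tag.
(define html-void-elements '(area br hr img input))

;; Hypothetical helper: markup produced for an element with no content
;; and no attributes under the given output method ('xml or 'html).
(define (empty-element-markup name method)
  (let ((n (symbol->string name)))
    (cond ((eq? method 'xml)
           (string-append "<" n "/>"))           ; XML: empty-element tag
          ((memq name html-void-elements)
           (string-append "<" n ">"))            ; HTML: start tag only
          (else
           ;; Assumption: other empty elements get explicit start and end tags.
           (string-append "<" n "></" n ">")))))
```

The sketch covers only empty elements; attribute minimization and the unescaped script content follow analogous method-dependent rules.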
The namespace prefix assignment parameter is specific to SXML and has no
analogue in [7].
The ns-prefix-assig parameter is introduced in the SXML serializer because
universal names
in SXML carry no information about their original namespace
prefix, unless this information is supplied by the application.
The ns-prefix-assig parameter allows the application to specify the
mapping between namespace URIs and the corresponding namespace prefixes to be
used for serialization.
The value of the ns-prefix-assig parameter takes the same form as the
argument of the same name to the high-level SSAX parser.
(srl:parameterizable
'(http://www.develop.com/student:student
(urn:schemas-develop-com:identifiers:id "3235329")
(name "Jeff Smith")
(urn:schemas-develop-com:programming-languages:language "C#")
(http://www.develop.com/student:rating "9.5"))
(current-output-port)
'[ns-prefix-assig
(dev . "http://www.develop.com/student")
(i . "urn:schemas-develop-com:identifiers")
(pl . "urn:schemas-develop-com:programming-languages")])
==>
; Produces the following output:
<dev:student xmlns:dev="http://www.develop.com/student">
<i:id xmlns:i="urn:schemas-develop-com:identifiers">3235329</i:id>
<name>Jeff Smith</name>
<pl:language xmlns:pl="urn:schemas-develop-com:programming-languages">C#</pl:language>
<dev:rating>9.5</dev:rating>
</dev:student>
When no namespace prefix assignment is provided for some namespace URI, the serializer generates an XML prefix name by itself. The generated prefix name is chosen so as to avoid prefix re-declarations within the XML document being produced:
(srl:parameterizable
'(http://www.develop.com/student:student
(urn:schemas-develop-com:identifiers:id "3235329")
(name "Jeff Smith")
(urn:schemas-develop-com:programming-languages:language "C#")
(http://www.develop.com/student:rating "9.5"))
(current-output-port))
==>
; Produces the following output:
<prfx1:student xmlns:prfx1="http://www.develop.com/student">
<prfx2:id xmlns:prfx2="urn:schemas-develop-com:identifiers">3235329</prfx2:id>
<name>Jeff Smith</name>
<prfx2:language xmlns:prfx2="urn:schemas-develop-com:programming-languages">C#</prfx2:language>
<prfx1:rating>9.5</prfx1:rating>
</prfx1:student>
However, the SXML serializer has a set of built-in namespace prefix
assignments for conventional namespace prefixes like xsl, rdf,
etc. and their corresponding namespace URIs.
There is thus no need to provide a namespace prefix assignment for a
popular namespace URI (unless you wish to use a different prefix name for such
a namespace in the serialized document).
The idea is illustrated below by serializing an SXML element
from the XSLT namespace:
[srl:parameterizable
'(http://www.w3.org/1999/XSL/Transform:stylesheet
(http://www.w3.org/1999/XSL/Transform:template (@ (match "/"))
(body
(p "Total Amount: "
(http://www.w3.org/1999/XSL/Transform:value-of
(@ (select "expense-report/total")))))))
(current-output-port)]
==>
; Produces the following output:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<body>
<p>Total Amount: <xsl:value-of select="expense-report/total"/></p>
</body>
</xsl:template>
</xsl:stylesheet>
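The mapping from a universal name to a prefixed name, as controlled by a namespace prefix assignment, can be sketched in plain Scheme. All three functions below are hypothetical helpers written for illustration, not part of the SXML serializer. Unlike the serializer, the sketch does not generate a prefix or consult built-in assignments when no matching entry is found; it simply returns the name unchanged.

```scheme
;; Index of the last colon in str, or #f if there is none.
(define (last-colon-index str)
  (let loop ((i (- (string-length str) 1)))
    (cond ((< i 0) #f)
          ((char=? (string-ref str i) #\:) i)
          (else (loop (- i 1))))))

;; Look up the prefix assigned to a namespace URI.
;; assig is a list of (prefix-symbol . "namespace-uri") pairs.
(define (prefix-for-uri uri assig)
  (cond ((null? assig) #f)
        ((string=? (cdar assig) uri) (symbol->string (caar assig)))
        (else (prefix-for-uri uri (cdr assig)))))

;; Hypothetical helper: turn a universal name (namespace-uri:local-name)
;; into a prefixed name string, using the given prefix assignment.
(define (universal-name->qname name assig)
  (let* ((str (symbol->string name))
         (i (last-colon-index str)))
    (if (not i)
        str                                   ; no namespace part
        (let ((prefix (prefix-for-uri (substring str 0 i) assig)))
          (if prefix
              (string-append prefix ":"
                             (substring str (+ i 1) (string-length str)))
              str)))))                        ; no assignment found: unchanged

(define example-assig
  '((dev . "http://www.develop.com/student")
    (i . "urn:schemas-develop-com:identifiers")))

(define example-qname
  (universal-name->qname
   (string->symbol "http://www.develop.com/student:rating")
   example-assig))
```

The namespace URI is taken to be everything before the last colon of the universal name, which is why URIs that themselves contain colons are handled correctly.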
[srl:parameterizable
'(doc (title "Hello world"))
(current-output-port)
'(omit-xml-declaration . #f)
'(standalone . yes)
'(version . "1.0")]
==>
; Produces the following output:
<?xml version='1.0' standalone='yes'?>
<doc>
<title>Hello world</title>
</doc>
[srl:parameterizable
'(doc (title "Hello world"))
(current-output-port)
'(omit-xml-declaration . #f)
'(standalone . omit)
'(version . "1.0")]
==>
; Produces the following output:
<?xml version='1.0'?>
<doc>
<title>Hello world</title>
</doc>
When the omit-xml-declaration parameter has the value #t (the
default value for this parameter), no XML declaration is produced for the
SXML document being serialized, and the values of the standalone and
version parameters are ignored.
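The interplay of the three parameters can be summarized by a sketch that builds the XML declaration string. The function xml-declaration is a hypothetical model derived from the outputs shown above, not the serializer's own code; the handling of the value 'no is an assumption based on [7].

```scheme
;; Hypothetical model: the XML declaration produced for the given values of
;; the omit-xml-declaration, version and standalone parameters.
(define (xml-declaration omit? version standalone)
  (if omit?
      ""                                     ; version and standalone are ignored
      (string-append
       "<?xml version='" version "'"
       (cond ((memq standalone '(yes #t)) " standalone='yes'")
             ((eq? standalone 'no) " standalone='no'")   ; assumption, see [7]
             (else ""))                      ; 'omit: no standalone attribute
       "?>")))
```

For instance, (xml-declaration #f "1.0" 'omit) yields a declaration with only the version attribute, as in the second example above.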
This document was translated from LaTeX by HEVEA.