SiSU information Structuring Universe
SiSU ("SiSU information Structuring Universe" or "Structured information, Serialized Units"),1 is a Unix command line oriented framework for document structuring, publishing and search. Featuring minimalistic markup, multiple standard outputs, a common citation system, and granular search.
Using markup applied to a document, SiSU can produce plain text, HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populate an SQL database with objects2 (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria is met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.
How it works
SiSU markup is fairly minimalistic, it consists of: a (largely optional) document header, made up of information about the document (such as when it was published, who authored it, and granting what rights) and any processing instructions; and markup within text which is related to document structure and typeface. SiSU must be able to discern the structure of a document, (text headings and their levels in relation to each other), either from information provided in the instruction header or from markup within the text (or from a combination of both). Processing is done against an abstraction of the document comprising of information on the document's structure and its objects,2 which the program serializes (providing the object numbers) and which are assigned hash sum values based on their content. This abstraction of information about document structure, objects, (and hash sums), provides considerable flexibility in representing documents different ways and for different purposes (e.g. search, document layout, publishing, content certification, concordance etc.), and makes it possible to take advantage of some of the strengths of established ways of representing documents, (or indeed to create new ones).
1. also chosen for the meaning of the Finnish term "sisu".
2 objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced.
More information on SiSU provided at: www.jus.uio.no/sisu/SiSU
SiSU was developed in relation to legal documents, and is strong across a wide variety of texts (law, literature...(humanities, law and part of the social sciences)). SiSU handles images but is not suitable for formulae/ statistics, or for technical writing at this time.
SiSU has been developed and has been in use for several years. Requirements to cover a wide range of documents within its use domain have been explored.
- 添加新评论
- 560 次阅读
最新评论
8 小时 18 分钟 前
2 周 6 天 前
3 周 18 小时 前
4 周 2 天 前
7 周 3 天 前
15 周 3 天 前
16 周 23 小时 前
16 周 2 天 前
16 周 3 天 前
16 周 3 天 前