|>On Mon, 14 Nov 1994, Phillip M. Hallam-Baker wrote:
|>
|>> Even amongst file based operating systems the UNIX model of `all files are
|>> linear sequences of characters' is not universal. There are many systems that
|>> have working models. The problem cited is yet another reason why the UNIX
|>> file model is broken.
|>
|>Personally I find record-oriented files awkward, not to mention that they
|>could always be represented in Unix as directories. Records are
|>candy-coating, they may taste good but they're hard to digest.
We call that candy coating `abstraction'. Because UNIX types have never had
it, they do not know what they are missing. Of course it takes more intense
mental effort to produce well-abstracted code. The point is that a week of hard
thinking may save several months of low-abstraction-level hacking.
|>> Better for whom? There are many instances in which obtaining a file as a single
|>> unit is much more convenient. For example when editing I do not want to be
|>> concerned with the fact that a single logical document is accessible at the
|>> chapter level as separate texts.
|>
|>A decent UI should be capable of attaining this level of abstraction.
|>Anyhow, HTML is no kind of markup language to write a book in. It's
|>optimized for short pages and small, simple structures.
Nope, that is not the idea. The optimisation is towards hyperlinked
segments of information. The `small' part comes in only as a style issue.
|>> I prefer a structure which uses the structure of the HTML to provide location
|>> information. For example how about:
|>>
|>> H1:2/H2:4/H3:3/P:4/10,15
|>>
|>> Being the tenth to fifteenth tokens in the fourth paragraph in the third
|>> H3 within the fourth H2 within the second H1.
|>>
|>>
|>> Actually this is nonsense in terms of SGML because H1s etc. are not containers.
|>> You need to incorporate Divisions for that. But remember the motto `screw SGML'.
|>
|>No, this is nonsense in terms of HTML. SGML is perfectly capable of
|>supporting adequate structural containers, but these were not included in
|>the HTML spec.
SGML is allegedly quite capable of doing everything. The problem is that the
designers did not try an implementation of many features before they piled
them into the spec. Test implementations are frowned upon in ISO because the
firm that builds one has a head start on the others.
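To make the locator concrete, here is a minimal Python sketch of resolving something like H1:2/H2:4/H3:3/P:4/10,15 against a document tree. It assumes a hypothetical tree in which each heading owns a division (the Divisions mentioned above). Each TAG:n step is assumed to pick the n-th child with that tag, counting from 1. The final a,b pair is assumed to select tokens a through b inclusive of the whitespace-separated text. Node and resolve() are illustrative names, not part of HTML or of any library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a division headed by `tag`, or a leaf
    paragraph carrying text. Not an HTML or SGML API."""
    tag: str
    children: list[Node] = field(default_factory=list)
    text: str = ""


def resolve(root: Node, locator: str) -> list[str]:
    """Resolve a locator such as 'H1:2/H2:4/H3:3/P:4/10,15'.

    Assumed semantics: each 'TAG:n' step picks the n-th child (1-based)
    with that tag; the last 'a,b' step picks tokens a..b (1-based,
    inclusive) of the selected node's whitespace-separated text.
    """
    *steps, token_range = locator.split("/")
    node = root
    for step in steps:
        tag, index = step.split(":")
        matches = [child for child in node.children if child.tag == tag]
        n = int(index)
        if not 1 <= n <= len(matches):
            raise LookupError(f"no {tag} number {n} under {node.tag}")
        node = matches[n - 1]
    first, last = (int(x) for x in token_range.split(","))
    tokens = node.text.split()
    return tokens[first - 1:last]
```

The sketch takes the tree for granted. Building it from flat HTML, where headings are not containers, is the part it leaves out.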
|>How do you incorporate lists and sectioning by horizontal rules in this?
<HR> elements are a Mosaic-inspired abomination. Lists are another type of
paragraph markup.
Actually, if you read the ISO stylesheet draft spec, it starts off by essentially
defining the sort of tree structure I propose. That tree has to be built at a
logical level somewhere in the parse module in any case.
|>HTML just generally loses at representing large-scale 'vertical' text
|>structures well. This is something that ought to be fixed, not kludged
|>around.
???? What do you mean by a vertical text structure?
|>> OK given such a labeling scheme we can define a few editing operations.
|>> Delete, Insert, Replace to operate on the texts. We then work out the minimal
|>> editing operations to transform one text into another. This is called
|>> Unification in the AI world, something I tend to associate with shaven-headed
|>> people wearing saffron robes and carrying bongo drums, but there you are.
|>
|>AI? Hardly. A simple variant of Unix diff could handle this fine.
The UNIX diff scheme is a unification algorithm. The problem is actually quite
hard (NP-complete in its general form). The UNIX diff program is inappropriate
because it is line-based and HTML is structural markup.
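The Delete/Insert/Replace formulation quoted above can be sketched with the textbook dynamic-programming edit distance (Levenshtein/Wagner-Fischer), run over tokens rather than lines. It is a stand-in technique for illustration, not the algorithm UNIX diff itself uses. It also handles flat token sequences only, so structural HTML would need a tree edit distance. edit_script() and apply_script() are hypothetical names. Positions are 0-based indices into the old token list, and an Insert at k goes before old token k.

```python
from __future__ import annotations


def edit_script(old: list[str], new: list[str]) -> list[tuple[str, int, str]]:
    """Return a minimal list of Delete/Insert/Replace edits turning old into new."""
    n, m = len(old), len(new)
    # cost[i][j] = fewest edits turning old[:i] into new[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if old[i - 1] == new[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j] + 1,         # Delete old[i-1]
                             cost[i][j - 1] + 1,         # Insert new[j-1]
                             cost[i - 1][j - 1] + diag)  # keep or Replace
    # Walk back from the corner to recover one minimal script.
    ops: list[tuple[str, int, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = 0 if old[i - 1] == new[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diag:
                if diag:
                    ops.append(("Replace", i - 1, new[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("Delete", i - 1, old[i - 1]))
            i -= 1
        else:
            ops.append(("Insert", i, new[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def apply_script(old: list[str], ops: list[tuple[str, int, str]]) -> list[str]:
    """Apply a script produced by edit_script() to the old token list."""
    inserts: dict[int, list[str]] = {}
    changed: dict[int, str | None] = {}  # None marks a deleted token
    for op, pos, tok in ops:
        if op == "Insert":
            inserts.setdefault(pos, []).append(tok)
        elif op == "Delete":
            changed[pos] = None
        else:
            changed[pos] = tok
    out: list[str] = []
    for k in range(len(old) + 1):
        out.extend(inserts.get(k, []))
        if k < len(old):
            if k not in changed:
                out.append(old[k])
            elif changed[k] is not None:
                out.append(changed[k])
    return out
```

For example, turning "the cat sat on the mat" into "the black cat sat on a mat" takes one Insert and one Replace. Applying the returned script to the old tokens gives back the new ones.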
|>> I would also humbly submit the same scheme to be used as an extension to the
|>> anchor label scheme in HTML proper ie say
|>>
|>> http://bongo.cern.ch/fred.html#H1:2/H2:4/H3:3/P:4/10,15
|>
|>This would be a great idea were there only some real containers to be used.
We come not to serve SGML, but to destroy it.
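For the anchor extension quoted above, a browser would only need to split the fragment off the URL and check it against the locator grammar. The sketch below assumes that grammar: tag names start with a letter, counts are positive integers, and a token range a,b with a <= b always closes the path. parse_anchor() is a hypothetical name. urllib.parse.urlsplit is the standard-library call that separates the fragment.

```python
import re
from urllib.parse import urlsplit

# Assumed grammar for one step ("H3:3") and for the closing token range ("10,15").
STEP = re.compile(r"([A-Za-z][A-Za-z0-9]*):([1-9][0-9]*)")
RANGE = re.compile(r"([1-9][0-9]*),([1-9][0-9]*)")


def parse_anchor(url: str) -> tuple[list[tuple[str, int]], tuple[int, int]]:
    """Split a URL such as .../fred.html#H1:2/P:4/10,15 into its
    (tag, count) steps and its (first, last) token range."""
    fragment = urlsplit(url).fragment
    *steps, last = fragment.split("/")
    path = []
    for step in steps:
        match = STEP.fullmatch(step)
        if match is None:
            raise ValueError(f"bad locator step {step!r}")
        path.append((match.group(1).upper(), int(match.group(2))))
    span = RANGE.fullmatch(last)
    if span is None:
        raise ValueError(f"bad token range {last!r}")
    first, end = int(span.group(1)), int(span.group(2))
    if first > end:
        raise ValueError("token range runs backwards")
    return path, (first, end)
```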
|>> I would also like a similar scheme for text/plain. There we need two index
|>> methods, row/column and octets from start:
|>>
|>> fred.txt#1,3:4,6 row 1 col 3 through row 4 col 6
|>> fred.txt#3:56 byte 3 for 56 bytes
|>
|>What form of line separation would you use for 'octets'? And how would
|>you deal with Unicode?
If a file is an octet stream (ie binary data) then it is by definition not
text/plain. UNICODE is more problematic since the file might be UTF encoded
or be in unpacked (explicit) form. Here I would suggest that the byte count
be an actual byte count and not a character count but that the row/col system
work by character.
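The two text/plain forms, read the way suggested above (byte counts on the encoded bytes, row/column on characters), can be sketched as follows. None of these assumptions come from any spec:

- A fragment containing a comma is the row/column form.
- Rows and columns count from 1, and both ends are inclusive.
- The byte offset counts from 0, so fred.txt#3:56 starts at the fourth octet.
- Line boundaries are whatever Python's str.splitlines() recognises.
- The file is UTF-8 unless told otherwise.

select() and select_rows_cols() are hypothetical names.

```python
from __future__ import annotations


def select_rows_cols(text: str, spec: str) -> str:
    """Return row r1 col c1 through row r2 col c2 of text.

    spec looks like 'r1,c1:r2,c2'; rows and columns count characters
    from 1 and both ends are inclusive (an assumption).
    """
    start, end = spec.split(":")
    r1, c1 = (int(x) for x in start.split(","))
    r2, c2 = (int(x) for x in end.split(","))
    # keepends=True so that row offsets include the line terminators.
    lines = text.splitlines(keepends=True)
    if not 1 <= r1 <= r2 <= len(lines):
        raise ValueError(f"rows {r1}..{r2} out of range")
    row_start = [0]  # character offset at which each row begins
    for line in lines:
        row_start.append(row_start[-1] + len(line))
    return text[row_start[r1 - 1] + c1 - 1 : row_start[r2 - 1] + c2]


def select(data: bytes, fragment: str, encoding: str = "utf-8") -> str | bytes:
    """Row/column fragments work on decoded characters;
    'offset:length' fragments work on raw bytes (offset from 0)."""
    if "," in fragment:
        return select_rows_cols(data.decode(encoding), fragment)
    offset, length = (int(x) for x in fragment.split(":"))
    return data[offset : offset + length]
```

With UTF-8 input the two forms part company as soon as a non-ASCII character appears. In "héllo" the é is one column but two bytes.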
Until UNICODE-based programming languages become commonplace I doubt that there
will be much use for the UNICODE variants since most other text needing fancy
fonts will have other formatting (eg HTML).
-- Phillip M. Hallam-Baker
Not speaking for anyone else.