Who invented SGML
When a product development group first gets your work, it is grateful, seeks out your help, and acknowledges what you've done. But by the time they've poured their own sweat into it for several years and a product comes out, they've often forgotten about the researchers who had the original bright idea.
So the name "Generalized Markup Language" -- because of its initials, GML -- was my way of labeling the technology so that its origin would be unmistakable. And there are a lot of statements being made that, at face value, seem to be outlandish. Michael Vizard had an editorial in InfoWorld in which he basically called it a whole new processing paradigm and said that it was going to change everything from electronic commerce to how you package objects to share them.
All that's true! But when you look at the explanations you see in these magazines about what XML is, there's nothing that would justify those assertions. It's just presented as "extend HTML, make up your own tags." Part of this is that "simplicity sells" seems to be the philosophy. When they drew the line on how much the Web developer needs to know to understand the power of XML, I think they drew the line in the wrong place.
As I understand it, the signing was so heavily attended that the bookshop was forced to turn people away. Were you expecting this kind of response? Gee, I thought they were there to see me [more laughter]. The publicity hype has been unending. But there's genuine support for it. Anyone doing electronic commerce is heavily into XML. Very few technologies have been embraced by all of the players -- even archcompetitors -- in the way that XML has.
Even Java -- which Microsoft now embraces -- they certainly weren't on board from the beginning. But in the case of XML, they were. SGML has been applied very successfully to document management. SGML has options for every occasion. You're dealing with potential users whom you might think of as small niche markets, very specialized -- aerospace, telecommunications, semiconductors, and so on.
Those little specialized niches have document collections that are bigger than the Web. So, when you're dealing with applications of that magnitude, if they feel it's going to save them time and money to be able to leave off the end tags of paragraphs, it's worth it to them. They can afford to have software with that customization option, so it makes sense. For the Web, what the XML committee basically did was look at all these options and come up with their own tailored version, just as a large user would do, but one that was optimized for Web purposes.
So, just by eliminating choice in those syntactic areas, they reduced the size of the parser by 80 to 90 percent. So, it made it much easier to implement. Also, there's less diversity. One of the options that's available in full SGML is the ability to omit some markup when you can do so unambiguously. In XML, you're never allowed to omit markup. As a result, if you're what they call a hacker -- someone who's written code that's not really parsing the XML properly, but is just kind of scanning it as you might do to locate things in a hurry -- you've got a much more consistent text stream to work with.
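The point about scanning can be illustrated with a rough sketch. The helper below is hypothetical (the name scan_elements and the sample strings are assumptions made for illustration, not anything from the XML or SGML specifications). It locates elements with a regular expression instead of a parser, which only works because XML never lets you leave off the end tag.

```python
import re


def scan_elements(text, name):
    """Return the raw content of each <name>...</name> span.

    This is a quick scan, not a parse: it assumes end tags are always
    present and that elements of the same name are not nested.
    """
    tag = re.escape(name)
    pattern = re.compile(r"<%s(?:\s[^>]*)?>(.*?)</%s\s*>" % (tag, tag), re.DOTALL)
    return [match.group(1) for match in pattern.finditer(text)]


# XML-style input: every paragraph has an explicit end tag.
xml_text = "<doc><p>First paragraph.</p><p>Second paragraph.</p></doc>"

# SGML-style input with paragraph end tags omitted (allowed when the DTD permits it).
sgml_text = "<doc><p>First paragraph.<p>Second paragraph.</doc>"

print(scan_elements(xml_text, "p"))   # both paragraphs are found
print(scan_elements(sgml_text, "p"))  # the scan finds nothing; a real parser is needed
```

The same scan returns an empty list on the SGML-style input, because working out where each paragraph ends requires the DTD and a real parser.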
Whereas in SGML, in order to do things safely, you pretty much have to parse all of the time, which isn't that big a deal. But in a networked environment, that can matter. Also, there are requirements in XML that make the document more robust if parts of it are missing, which is more likely to happen in a networked environment than in a more controlled environment. Whether you consider that a severe limitation or a powerful functional capability depends on how important [it is] to make choices about things that XML doesn't let you make choices about.
For the average Web developer, it's not an issue. You use XML. If, on the other hand, your employer is Boeing and you've got to turn out four million pages every quarter for each model of airplane, then you use SGML. Aside from the obvious fact that SGML is already used to manage terabytes of data, why is this scenario unlikely? For the same reason the Gap is not going to replace Savile Row. If you can afford custom tailoring and get a suit made exactly the way you want and the way you look the best, you'll do it.
One practical result is that SGML parsers are unable to make use of some advanced tools and techniques made possible by that theory. Consequently, they are large and complex pieces of computer software; as such they (a) suffer from reliability problems, (b) have in practice proven difficult to integrate into applications, and (c) change slowly in response to advances in software and document processing technology. Nonetheless, there remains a consensus that SGML's basic design (partition into entities, elements, and attributes) is correct and useful.
One result is a common tendency, in strategic projects involving SGML, to avoid using many advanced features and operate within the bounds of a highly restricted subset. This approach has generally met with success. However, this restricted subset has been re-invented by each successive group that has attacked the problem.
The design goals are that MGML shall: be an SGML application, and process a proper subset of SGML documents; provide full support for the basic mechanisms (entities, elements, and attributes) which have made SGML successful; unify the syntax of the meta-language and the generated languages (the DTD and the instances); and be defined by a simple, compact, formal specification that allows the easy implementation of MGML processors by taking advantage of standard formal-language technology.
The syntactic structure of MGML, enabling markup to be distinguished from data, is hardwired and has been straightforwardly and completely implemented using lex-style regular expressions. A DSD is a set of structure definitions that apply to all documents of a given class. In printed form, it occupies only 5 pages. An electronic form may be obtained here. A reference parser for a slightly earlier version, including fairly complete entity processing, implemented as two lex modules, one C module, and one yacc module, comprised about lines of code.
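As a rough illustration of what separating markup from data with lex-style regular expressions can look like, here is a minimal sketch. It is not the MGML reference parser. The token names, the angle-bracket tag syntax, and the tokenize function are all assumptions made for this example, since the MGML grammar itself is not reproduced here.

```python
import re

# Each rule pairs a token name with a regular expression, in the spirit of a
# lex specification. The concrete syntax is an assumed XML-like one.
TOKEN_RULES = [
    ("END_TAG", r"</[A-Za-z][\w.-]*\s*>"),
    ("START_TAG", r"<[A-Za-z][\w.-]*(?:\s+[^<>]*)?>"),
    ("ENTITY_REF", r"&[A-Za-z][\w.-]*;"),
    ("DATA", r"[^<&]+"),
]

# Combine the rules into one alternation with a named group per rule.
MASTER = re.compile("|".join("(?P<%s>%s)" % rule for rule in TOKEN_RULES))


def tokenize(text):
    """Split text into (token_name, lexeme) pairs, markup apart from data."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # A stray "<" or "&" that fits no rule is reported, not guessed at.
            raise ValueError("unrecognized input at offset %d" % pos)
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("<p>Fish &amp; chips</p>"):
    print(token)
```

Because the delimiters are fixed, the tokenizer needs no knowledge of the document type to tell a tag from character data. Checking structure against a DSD would be a separate, later step.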