Generic-Syntax for XML and HTML

Simple examples

An SVG file:

<svg:svg xmlns:svg="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 32 32">
	<svg:g>
		<svg:circle cx="2" cy="4" r="2"/>
		<svg:rect x="8" y="2" width="24" height="4"/>
	</svg:g>
</svg:svg>

An HTML file:

<html lang=en[
<head[
	<meta charset=UTF-8>
	<title"Html example">
]>
<body[
	<h1`Title`>
	<p`Paragraph with <em`inline tag`>.`>
]>
]>

A server-side HTML file with processing instructions:

<html lang=en[
<head>
<body[
    <%php"ECHO 'Hello GS!';">
  ]>
]>

GS benefits over XML and HTML

Syntax overhead

GS is noticeably more compact than XML and HTML in minified form.

Mixed content and DOM instantiation overhead

<div[
	<p `<em`consecutive`> <span`words`>`>
]>

XML and HTML do not distinguish insignificant white space added for human readability (indentation) from significant white space in text nodes.

In this example, the space between the em and span tags is semantically needed, whereas the white space between div and p is not.

As a result, when the DOM is instantiated, white-space-only text nodes are always built, although most of them are useless and cost time and memory.
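
To make this overhead concrete, here is a minimal sketch that counts the white-space-only text segments in the XML/HTML equivalent of the example. The function name and the naive tag-matching regular expression are assumptions made for illustration; a real parser handles comments, CDATA sections and attribute values that this sketch ignores.

```typescript
// Count the white-space-only text segments that an XML/HTML parser
// would turn into text nodes.
// Assumption: tags are matched with a naive regular expression, so comments,
// CDATA sections and '>' inside attribute values are not handled.
function countWhitespaceOnlyText(markup: string): number {
  return markup
    .split(/<[^>]*>/) // keep only what lies between tags
    .filter((segment) => segment.length > 0 && segment.trim() === "")
    .length;
}

// XML/HTML equivalent of the GS example above.
const sampleMarkup =
  "<div>\n\t<p><em>consecutive</em> <span>words</span></p>\n</div>";
console.log(countWhitespaceOnlyText(sampleMarkup)); // 3
```

Of the three white-space-only segments found, only the single space between em and span carries meaning.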

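In GS, authors choose the body type explicitly: in this example, div has a list body [], in which white space between child nodes is insignificant, whereas p has a mixed body ``, in which every space is significant.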

Note: in GS this example can also be written with a list node [] and a simple text node "":

<div[
	<p [
		<em"consecutive">
		" "
		<span"words">
	]>
]>

Mixed content and editor auto-formatting

Automatic re-formatting and re-indenting features in text editors are really useful, but they do not work reliably for XML and HTML because text nodes are undifferentiated.

In XML, this issue is addressed with the declarative and inelegant xml:space="preserve" attribute, but its inherited behavior runs contrary to the modularity principle and is a source of regressions (anyone who edits big XML files such as XSL stylesheets, where some parts must never be re-indented, knows this issue well).

Since HTML4 is a DSL, editors must know each tag name, and the issue is only partially worked around (as long as authors do not override the behavior of standard HTML tags with the white-space CSS property).

In HTML5 with web components, the DSL principle no longer applies and the issue becomes more serious, because there is no way to explicitly specify the white-space behavior:

<ml-gsml-compare format="xml">
	<pre>&lt;svg:svg xmlns:svg="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 32 32">
		&lt;svg:g>
			&lt;svg:circle cx="2" cy="4" r="2"/>
			&lt;svg:rect x="8" y="2" width="24" height="4"/>
		&lt;/svg:g>
	&lt;/svg:svg></pre>
</ml-gsml-compare>

This HTML fragment with a web component is taken from this page (see the first example). The <pre> tag inside the <ml-gsml-compare> web component is otherwise useless: it is added only to block automatic text indentation in editors!

Here is the same example as it would be written if this HTML page were in GS. The <pre> tag can be omitted because the body node can be correctly typed in GS:

<ml-gsml-compare format=xml !"<svg:svg xmlns:svg="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 32 32">
  <svg:g>
    <svg:circle cx="2" cy="4" r="2"/>
    <svg:rect x="8" y="2" width="24" height="4"/>
  </svg:g>
</svg:svg>!">

In GS, to explicitly allow text reformatting, a ~ is prepended to the text body ~"", the attribute value ~'' or the mixed body ~``:

<p "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi.">
<p ~"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi.">
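
As an illustration, an editor or formatter could honor this flag as sketched below. The GsTextBody model, the formatTextBody function and the greedy word-wrap strategy are all assumptions made for this example, not part of GS itself.

```typescript
// Hypothetical model of a GS text body: `reformattable` mirrors the `~` prefix.
interface GsTextBody {
  reformattable: boolean;
  value: string;
}

// Greedy word wrap, applied only when the author opted in with `~`.
function formatTextBody(body: GsTextBody, width: number): string {
  if (!body.reformattable) return body.value; // every character is significant
  const lines: string[] = [];
  let line = "";
  for (const word of body.value.split(/\s+/).filter((w) => w.length > 0)) {
    if (line.length > 0 && line.length + 1 + word.length > width) {
      lines.push(line); // the current line is full: start a new one
      line = word;
    } else {
      line = line.length > 0 ? `${line} ${word}` : word;
    }
  }
  if (line.length > 0) lines.push(line);
  return lines.join("\n");
}

const lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
console.log(formatTextBody({ reformattable: true, value: lorem }, 30));
console.log(formatTextBody({ reformattable: false, value: lorem }, 30));
```

Without the ~ prefix, the text is returned untouched, which is the safe default when the formatter cannot know whether white space matters.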

Comment restrictions and nesting

<section [
	<div [
		<button onclick='count--;'>
	]>
]>

To comment the button tag in HTML and XML:

HTML
<section>
	<div>
		<!--button onclick='count--;'-->
	</div>
</section>
XML: '--' is forbidden in a comment (and a comment cannot end with '-'), so the commented text must be altered!
<section>
	<div>
		<!--button onclick='count- -;'-->
	</div>
</section>
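
The XML constraint can be automated, at the price of altering the text. The helper below is a minimal sketch, assuming a lossy strategy is acceptable: XML 1.0 forbids '--' inside a comment body and also forbids a body ending with '-', so a space is inserted in both cases. The name toXmlComment is hypothetical.

```typescript
// Make arbitrary text safe for an XML 1.0 comment body.
// Strategy (an assumption of this sketch, and a lossy one): insert a space
// between consecutive dashes and after a trailing dash.
function toXmlComment(text: string): string {
  const safe = text
    .replace(/-(?=-)/g, "- ") // "--" becomes "- -"
    .replace(/-$/, "- "); // a comment body may not end with "-"
  return `<!--${safe}-->`;
}

console.log(toXmlComment("button onclick='count--;'"));
// <!--button onclick='count- -;'-->
```

The original text cannot be recovered exactly from such a comment, which is precisely the restriction GS comments avoid.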

In GS:

GS structured comment
<section [
	<div [
		<#button onclick='count--;'>
	]>
]>
GS text comment
<section [
	<div [
		<#"button onclick='count--;'">
	]>
]>

Now, commenting out the div tag in HTML and XML is really hard. Commenting each line separately is the simple but laborious way:

HTML
<section>
	<!--div-->
		<!--button onclick='count--;'-->
	<!--/div-->
</section>
XML
<section>
	<!--div-->
		<!--button onclick='count- -;'-->
	<!--/div-->
</section>

It's always easy in GS:

GS structured comment
<section [
  <#div [
    <#button onclick='count--;'>
  ]>
]>
GS text comment
<section [
  <#!"div [
    <#"button onclick='count--;'">
  ]!">
]>

GS benefits over HTML

Generic parsing and serializing

Unlike XML and JSON, HTML requires a dedicated and complex parser and serializer with many special edge cases.

GS offers a generic and simple syntax: parsers and serializers can easily be implemented in many languages. For example, an event-driven GS parser implemented in TypeScript is fewer than 700 lines of code (11 kB of minified JS), and a serializer is about 200 lines.

Are these HTML-specific parsing rules useful for authors?
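
To give an idea of what event-driven parsing of such a syntax looks like, here is a minimal sketch for a tiny GS subset: nodes with unquoted names and attributes, "text" bodies and [list] bodies. It is not the parser mentioned above. The event names, the function name and the token-termination rule are assumptions, and escaping, mixed bodies, quoted names and special node types are left out.

```typescript
// Events emitted by this sketch (hypothetical names, not an actual GS API).
type GsEvent =
  | { kind: "startNode"; name: string; attrs: [string, string][] }
  | { kind: "text"; value: string }
  | { kind: "endNode"; name: string };

// Parses a small GS subset: <name key=value ... body>, where body is either
// absent, a "text" body or a [list] body.
function parseGsSubset(src: string, emit: (e: GsEvent) => void): void {
  let i = 0;
  const isWs = (c: string): boolean => " \t\r\n".indexOf(c) >= 0;
  const skipWs = (): void => { while (i < src.length && isWs(src[i])) i++; };
  const expect = (c: string): void => {
    if (src[i] !== c) throw new Error(`'${c}' expected at offset ${i}`);
    i++;
  };
  // Raw token: assumed to end at white space or at one of < > [ ] = "
  const rawToken = (): string => {
    const start = i;
    while (i < src.length && !isWs(src[i]) && '<>[]="'.indexOf(src[i]) < 0) i++;
    if (i === start) throw new Error(`token expected at offset ${start}`);
    return src.slice(start, i);
  };
  const quotedText = (): string => {
    expect('"');
    const end = src.indexOf('"', i);
    if (end < 0) throw new Error("unterminated text");
    const value = src.slice(i, end);
    i = end + 1;
    return value;
  };
  const parseNode = (): void => {
    expect("<");
    const name = rawToken();
    const attrs: [string, string][] = [];
    skipWs();
    while (i < src.length && '>["'.indexOf(src[i]) < 0) {
      const key = rawToken();
      expect("=");
      attrs.push([key, rawToken()]);
      skipWs();
    }
    emit({ kind: "startNode", name, attrs });
    if (src[i] === '"') {
      emit({ kind: "text", value: quotedText() });
    } else if (src[i] === "[") {
      i++;
      parseItems("]");
      expect("]");
    }
    skipWs();
    expect(">");
    emit({ kind: "endNode", name });
  };
  // Items of a list body, or of the multi-root document when closer is null.
  const parseItems = (closer: string | null): void => {
    for (;;) {
      skipWs();
      if (i >= src.length || src[i] === closer) return;
      if (src[i] === '"') emit({ kind: "text", value: quotedText() });
      else parseNode();
    }
  };
  parseItems(null);
}

const seenEvents: string[] = [];
parseGsSubset('<html lang=en[ <head> <body[ <h1"Title"> ]> ]>', (e) => {
  seenEvents.push(e.kind === "text" ? `text(${e.value})` : `${e.kind}(${e.name})`);
});
console.log(seenEvents.join(" "));
```

Because every node closes with the same > character and bodies are explicitly typed, this subset parser needs no table of tag names and no per-element special cases.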

<script !"
  if( a > 0 && a < 10) body.innerHTML="<span>One digit</span>";
!">

The most useful HTML-specific syntax exception is the script tag (or style, textarea), which allows any characters without escaping until the </script (or </style, </textarea) sequence.

In GS, the generic solution is bounded escaping: the content between the boundaries is raw and never escaped. !" is the simplest boundary. If this character sequence is present in the content, any characters (except ") can be inserted in the middle: !!", !.", !°", !xyz", !☠"...

GS bounded escaping can also be useful where HTML does not offer good solutions, as in the previous pre tag example:
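
A serializer has to pick a boundary that cannot be confused with the content. The following is a minimal sketch of one way to do it, assuming (as the examples suggest) that the same boundary opens and closes the body. The function names and the list of candidate middle strings are hypothetical.

```typescript
// Candidate "middle" strings, tried in order; the list is arbitrary (assumption).
const MIDDLES = ["", "!", ".", "x", "xyz"];

// Return a boundary of the form !<middle>" that can safely close `content`.
function chooseBoundary(content: string): string {
  for (let n = 0; ; n++) {
    // After the fixed candidates, fall back to generated ones: !b0", !b1", ...
    const middle = n < MIDDLES.length ? MIDDLES[n] : `b${n - MIDDLES.length}`;
    const boundary = `!${middle}"`;
    // The first occurrence of the boundary must be the closing one.
    if ((content + boundary).indexOf(boundary) === content.length) return boundary;
  }
}

// Wrap raw content between an opening and a closing boundary.
function boundedBody(content: string): string {
  const boundary = chooseBoundary(content);
  return boundary + content + boundary;
}

console.log(boundedBody("if (a > 0 && a < 10) run();"));
console.log(boundedBody('text containing !" itself'));
```

The content is never modified: only the boundary changes, so the raw text stays readable whatever it contains.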

<pre !"
  <svg:svg xmlns:svg="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 32 32">
    <svg:g>
      <svg:circle cx="2" cy="4" r="2"/>
      <svg:rect x="8" y="2" width="24" height="4"/>
    </svg:g>
  </svg:svg>
!">

GS benefits over XML

Stricter and unreadable escaping

<script !"
  if( a > 0 && a < 10) body.innerHTML="<span>One digit</span>";
!">

One of the reasons for the failure of XHTML is its extreme syntactic strictness (relaxed by HTML).

XML is a generic and simplified SGML (the common ancestor of XML and HTML), but it is not flexible enough. Thirty years later, GS proposes a more elegant, simple and generic syntax.

Do it in GS (you can't in XML or HTML)

Structured comments and processing instructions

<#THREAD [
<comment by=mark date=2019-12-12 [
 <p`Comment with <em`rich html in GS format`>.`>
 <p`Useful with an IDE that instrument it!`>
]>
<comment by=john date=2019-12-18 [
 <p`Marvelous😉`>
]>
]>
<%repeat over=myUsers [
  <div `<%entry.name>`>
]>
<&http://schema.org/Product identifier=DMFL659{
  name= "USB switch"
  image= http://example.com/product/DMFL659.png
}>

Unlike in HTML and XML, "comments" and "processing instructions" (split into "instruction", "meta" and "syntax" in GS) are not restricted to text in GS.

In GS comments <#>, instructions <%>, meta <&> and syntax <?> are regular nodes with just a special type in addition.

Comments can be as simple as text or highly structured (see <#THREAD> example above).

The <&http://schema.org/Product> example illustrates how semantic metadata could be inserted in a presentation page as an alternative to JSON-LD or Microdata.

Note: due to DOM limitations, these richer comment and meta nodes cannot be instantiated as DOM nodes and are skipped in the DOM building process.

Attributes with a special type

 <div onclick='version1()' #onclick='version2()' #onclick='version3()'>
 <myTag #todo='To remove...'>

In GS, attributes can be commented.

More generally, attributes can have any special type: comment #, instruction %, meta & or syntax ?.

These attributes with a special type are skipped in the DOM building process.
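
A DOM builder could implement this rule as in the minimal sketch below. The GsAttribute model and the domAttributes function are hypothetical and only illustrate the filtering step.

```typescript
// Hypothetical attribute model: `special` holds the type prefix, if any.
type SpecialType = "#" | "%" | "&" | "?";
interface GsAttribute {
  special?: SpecialType;
  key: string;
  value: string;
}

// Keep only regular attributes, as a DOM builder would.
function domAttributes(attrs: GsAttribute[]): GsAttribute[] {
  return attrs.filter((attr) => attr.special === undefined);
}

// <div onclick='version1()' #onclick='version2()' #onclick='version3()'>
const divAttrs: GsAttribute[] = [
  { key: "onclick", value: "version1()" },
  { special: "#", key: "onclick", value: "version2()" },
  { special: "#", key: "onclick", value: "version3()" },
];
console.log(domAttributes(divAttrs).map((a) => `${a.key}=${a.value}`));
// [ 'onclick=version1()' ]
```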

No restriction in names

 <http://schema.org/Product>
 <'a strange node name 😏' 'a strange \'attribute\'!'=true |'and an other 'attribute' with bounded escaping|'=ok>

As in JSON, node names and attribute keys are unrestricted, thanks to escaping mechanisms.

Of course, you can't use these free names if you want to transform your GS to DOM or XML/HTML.

Multi-root document

Like HTML, but unlike XML, a GS document (or file) does not impose a single root element and can contain a set of nodes (see the examples above). It can be instantiated in DOM as a DocumentFragment.

Tail attributes

<content id=12 "
...
" sha256='2joE5T44CySmgKjv1KVAG1YIxyOxgiK5ZmDw4VUbrhc='>

In event-driven programming, it can be useful to add attributes to a node after the body has been processed, such as a computed size, CRC, hash key, or validated transaction id. This is possible in GS by adding attributes to a node's tail.

When parsed as DOM, the tail attributes are preserved and merged with the head attributes.
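
On the writing side, a streaming serializer could use a tail attribute as in the minimal sketch below. The writeContentNode function, the write callback and the size attribute are assumptions made for illustration, and text escaping is deliberately out of scope.

```typescript
// Stream a node whose body size is only known once the body has been written.
// `write` stands for any text sink (hypothetical).
function writeContentNode(
  id: number,
  chunks: string[],
  write: (s: string) => void
): void {
  write(`<content id=${id} "`); // head attribute, then the text body opens
  let size = 0;
  for (const chunk of chunks) {
    // Assumption: chunks needing escaping are rejected rather than escaped.
    if (chunk.indexOf('"') >= 0 || chunk.indexOf("\\") >= 0) {
      throw new Error("escaping is out of scope for this sketch");
    }
    size += chunk.length; // computed while streaming
    write(chunk);
  }
  write(`" size=${size}>`); // tail attribute, known only at this point
}

let output = "";
writeContentNode(12, ["first chunk, ", "second chunk"], (s) => { output += s; });
console.log(output);
```

The writer never has to buffer the body or seek backwards to patch the head of the node.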