Empowered data


This proposal tries not to add new entities to the language; it only takes what is already there and reuses it mercilessly. The number of abstractions is also lowered, since some of them can be implemented with existing ones with minimal changes. The result is a compact, smaller, more uniform and much more powerful language. ;-) Forward-compatible.

Motivation.

Blocks of code enclosed in curly braces were of two natures in ES3 and ES5 - there were code blocks, containing a sequence of instructions to perform, and there was the object literal, which contained a recipe for building a structured piece of data.

ES.next introduced some powerful additions to the object literal, introduced new use for it (.{...} and <| {...} operations) and brought in a new type of {...} block - the class block. The class block borrows many new features of object literal, but itself is something in-between.

The feeling that having more types of {...} source code constructs brings more confusion led me to think about their nature and their similarities. This proposal takes this train of thought to the extreme by proposing only two types of "brave new curly block" constructs with a strict role split - the imperative one for control flow and the declarative one for data structures - while building on the similarities between them, radically empowering the declarative one in the process, and not losing forward compatibility (by this I mean that old-style constructs work in the new proposal). Classes become one case of the declarative construct with class-specific extensions, changing the existing class syntax very slightly and not losing any semantics.

Curly blocks are similar. Reuse for much power with few features.

A simple code block:

{
  x = 4;
  receiver.f(x);
  function g() {
    do { nothing; } while (false);
  }
  x++;
  const prop = 5;
  if (x>5) {
    process.exit();
  }
  x--
}

Basic structure of this block is: there are simple statements, which are terminated by a semicolon (ignoring general semicolon insertion here). The last simple statement in a code block does not need a semicolon, though it can have it. These statements include assignment and function call.

Then there are structured statements which do not need to be ended with semicolon (do-while is a nasty exception), since they are ended with a sub-block. These are if/else, while loops, function declarations etc.

Not that this is a correct explanation of code block structure (for example, I ignore cases where if/else/while/... sub-statements are simple statements, not sub-blocks. For now, let us assume there are always sub-blocks). I also intentionally dismissed variable declarations, since they are not needed for this topic and would make things a little more complicated (look at that const line as an assignment ;-) ).

Now, for the simple object literal (with ES.next extensions):

{
  x: 4,
  g() {
    do { nothing; } while (false);
  }
  y: { foo: "bar" },
  get prop () { return 5; }
  z: 0
}

The basic structure of this block is: there are simple productions, which are terminated by a comma. The last simple production in a literal block does not need a comma, though it can have one. These productions are property initializations.

Then there are enhanced productions which do not need to be ended with a comma, since they are ended with a sub-block. These are get, set and method declarations.

Even though the range of possible building elements of an object literal is smaller than that of a code block, the similarities can be seen pretty well. There is an undoubted similarity between x = 4; and x: 4, - not only syntactic, but semantic, too. There is a strong syntactic similarity between the declaration of function g in the code block and method g in the literal. Semantically they are also pretty similar, though not as much as in the previous case.

The previous examples showed that there are formally (simple-simple, structured-enhanced), syntactically and functionally similar pairs of constructs in the code block and the object literal. These elements are, more or less, about the same thing. The difference between them is given by the context: assignment and function declaration perform actions (they are imperative), field specification and method specification produce data (they are declarative).

It can be said, with a lot of grains of salt, that a code block is "(ordered) collection of imperative elements, simple, semicolon delimited, as well as structured, undelimited" and an object literal is "(unordered) collection of declarative elements, simple, comma delimited, as well as structured, undelimited", but matching elements appear in both. This strawman is about completing this element similarity, mainly drawing from useful code elements and bringing their counterparts to the data domain.

1. if & Co. Conditional data structures.

The first idea to borrow from the code domain is the if statement - in this case, not a statement, but a data production. You have surely been in a situation where you were writing an object literal and wanted a field or two to be present only when a specific condition is met. The solution nowadays is either to leave it out and add it afterwards with an if statement in the code (which is not correct: a conditional data field was wanted, not a conditional action that assigns to that field) or to put the field in with the ?: or && operators, so the field has a null value in the case where it should not be there at all.

Why not have something like this?

{
  x: 4,
  g() {
    do { nothing; } while (false);
  }
  y: { foo: "bar" },
  if (bar > cowboy) { jar: ["whiskey"], wall: ["bottle", "bottle"] }
  get prop () { return 5; }
  z: 0
}

The data-domain if, in accordance with its code counterpart, is a structured element that does not need a comma at the end, since it ends with a sub-block. But the data-if governs a data-block. The curly block that is guarded by the data-if should be a normal data-block which is included if the condition is met, and is not included when the condition is not met.

Of course we can have if/else if/else combination, like in { name: name, if (age > 60) { retired: true } else if (age < 18) { minor: true } else { workplace: company } age: age }.

If the if/else could only govern (data) blocks, it would not be a true counterpart of the code-if. To be true, it should accept both simple elements ended by a comma and blocks in its syntax, so this should be possible, too: { name: name, if (age > 60) retired: true, else if (age < 18) minor: true, else workplace: company, age: age }. To avoid inconsistencies, I would allow this syntax as well. Be as true to the code-if as possible. In one line, this may look inferior, but when indented, it can be

{
  name: name,
  if (age > 60) retired: true,
  else if (age < 18) minor: true,
  else workplace: company,
  age: age
}
or
{
  name: name,
  if (age > 60)
    retired: true,
  else if (age < 18)
    minor: true,
  else
    workplace: company,
  age: age
}
which is not that bad.
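
For comparison, the effect of a data-if can be approximated in JavaScript as it was later standardized (object spread, ES2018), at the price of turning the condition into an expression. This is only a sketch of an emulation, not the syntax proposed here; the function name describePerson and the sample values are made up for illustration.

```javascript
// Emulation only: conditional fields via object spread (ES2018).
// describePerson and its arguments are hypothetical names for this sketch.
function describePerson(name, age, company) {
  return {
    name: name,
    // Each branch yields an object fragment that is merged in place.
    ...(age > 60 ? { retired: true }
      : age < 18 ? { minor: true }
      : { workplace: company }),
    age: age
  };
}

console.log(describePerson("Ann", 65, "ACME")); // has "retired", no "workplace"
console.log(describePerson("Bob", 30, "ACME")); // has "workplace", no "retired"
```

The field that does not apply is absent from the result, not present with a null value, which is the point of the data-if.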

Another conditional that can readily be adopted into the data domain is switch. Its fall-through, implicit-block, break-finished semantics are a bit unwieldy for a one-liner, like { name: name, switch (role) { case "manager": canSeeReports: true, case "admin": accessToServerRoom: true, break, case "developer": accessToLibrary: true, default: needsTask: true } if (boss) reportsTo: boss }, but again, formatting helps. Frankly, switch is not used that often in code and it won't be used that much in data either, but sometimes it is really helpful. For the sake of completeness, it should be in the data as well.
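
To make the fall-through behaviour of the one-liner above easier to follow, here is how the same fields could be produced in plain JavaScript today. It is a sketch under assumptions: roleFields is a hypothetical helper, the sample values are invented, and the switch is an ordinary code switch, not the proposed data switch.

```javascript
// Emulation only: the fall-through switch from the one-liner above, written
// as an ordinary function that builds the fragment imperatively.
// roleFields is a hypothetical helper name.
function roleFields(role) {
  const fields = {};
  switch (role) {
    case "manager":
      fields.canSeeReports = true;
      // falls through: managers also get the admin field
    case "admin":
      fields.accessToServerRoom = true;
      break;
    case "developer":
      fields.accessToLibrary = true;
      // falls through into default
    default:
      fields.needsTask = true;
  }
  return fields;
}

const boss = "Alice"; // sample value
const employee = {
  name: "John",
  ...roleFields("manager"),
  ...(boss ? { reportsTo: boss } : {})
};
console.log(Object.keys(employee).join(", "));
```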

2. f(x, y). Data-production macros.

Code has a function call amongst its "simple" building blocks. It allows you to define a little piece of code in one place and issue it later in many other places, possibly parametrized. Why not have something like that in data, too? What about these data productions?

{
  name("Doe", "John"),
  people.counter(),
  position: "manager",
  salary: 100000
}
{
  name("White Daemon", "Jinx Perry"),
  dogs.counter(),
  race: "cavalier King Charles spaniel",
  colors: [ white, brown ]
}

What are name(...) and repository.counter(), function calls? Not exactly - in code they would be calls to functions or methods that perform some imperative sequence of actions. In data, such a call "invokes" a named data production, which is just like a function or a method, but its block is declarative. Otherwise, they are defined the same way as functions or methods, with the exception of the % character used as a modifier, analogous to the * modifier of generators:

function% name(surname, givenNames) {
  fullname: (locale == "hu" || locale == "jp") ?
    surname+" "+givenNames : givenNames+" "+surname,
  catalogName: surname+", "+givenNames,
  givenNames: givenNames,
  surname: surname
}
class Repository {
  ...
  %counter() { id: this.maxId++, creationDate: Date.now() }
  ...
}

I call % functions and methods data-production macros. They are not in fact true functions - the semantics of dogs.counter() is to include id: dogs.maxId++, creationDate: Date.now() in the object literal. The semantics is this for a reason - so implementors can optimize it to any level they see fit. It is "just" an inclusion of a parameterized, ready-made data production.
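
The inclusion semantics can be imitated in today's JavaScript by letting a "macro" be an ordinary function that returns an object fragment and spreading the result into the literal. This is a rough sketch, not the proposed mechanism: the locale value and the Repository constructor are assumptions added to make it run, and unlike a real macro it creates an intermediate object on every call.

```javascript
// Emulation only: a "macro" is modelled as a function returning an object
// fragment, and "invoking" it inside a literal is modelled with spread.
const locale = "en"; // sample value; the original reads a free variable

function name(surname, givenNames) {
  return {
    fullname: (locale == "hu" || locale == "jp")
      ? surname + " " + givenNames
      : givenNames + " " + surname,
    catalogName: surname + ", " + givenNames,
    givenNames: givenNames,
    surname: surname
  };
}

class Repository {
  constructor() { this.maxId = 0; } // assumed initial state
  // Stand-in for the %counter() macro method.
  counter() { return { id: this.maxId++, creationDate: Date.now() }; }
}

const people = new Repository();
const john = {
  ...name("Doe", "John"),
  ...people.counter(),
  position: "manager",
  salary: 100000
};
console.log(john.fullname, john.id); // John Doe 0
```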

On the other hand, dynamics of true functions / methods and easy interoperability with code must be present, macros must be as flexible as code functions are. For this, I'd propose these rules:

  • macro is first-class object that is accessible by its property name for reading and writing (if not made const etc.)
  • you can create macro object inline by function% (args) { macro body }
  • typeof macro is "function", it has no [[Construct]] and behaviour of [[Call]] is deliberately undefined (to allow implementors freedom to use it as they see fit)
  • issuing non-macro object with typeof "function" from inside data block as a macro results in throwing TypeError

Since [[Call]] is implementation-specific, how do you reuse a macro from inside code? Simply: obj.{ macro(...) }. This is the officially recommended (and only supported) way of reusing a macro directly from code.

And yes, you can have recursion with macros. You are encouraged to.

One more note: macros can be even more powerful if they cleverly use the [expr]: expr data production. It is part of ES.next-enhanced object literal. The word is "cleverly", it can be colossally abused. You have been warned.
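
As an illustration of recursion combined with computed keys, here is an emulation in standard JavaScript, where a function returning an object fragment stands in for a macro. The name squares is hypothetical and the spread is only a stand-in for macro inclusion.

```javascript
// Emulation only: a recursive "macro" producing computed keys,
// written as a function that returns an object fragment.
// squares is a hypothetical name.
function squares(min, max) {
  return min > max
    ? {}
    : { [min]: min * min, ...squares(min + 1, max) };
}

const table = { operation: "square", min: 1, max: 4, ...squares(1, 4) };
console.log(table[3]); // 9
```

No variable is assigned and no loop is used; the recursion alone enumerates the keys.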

No loops, no variables. "Functional" object production.

There are two ways to continue the approach above. One is to adopt everything possible from the code side, however imperative, into the data side, so that we can have variables and loops in data as well and can issue something like this:

{
  operation: "square",
  min: 1,
  max: 10,
  for (var i = this.min; i <= this.max; i++) { [i]: i*i }
}

I argue that when you have this kind of imperativity (if is conditional but descriptive; a macro, even if powerful through recursion, is less imperative than a loop and a variable), you can just as well do it in plain code. After all, code is better for imperative things:

var result = {
  operation: "square",
  min: 1,
  max: 10
}
for (var i = result.min; i <= result.max; i++) result.{ [i]: i*i };

I used this mechanical translation instead of result[i] = i*i; for the sake of generality: you can issue loops in code but still use all of the power of enhanced descriptive blocks through the .{ data-production... } construct.
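
The .{ ... } operator did not exist in any shipped JavaScript, so to try the translation out today one can substitute Object.assign, which also extends an existing object in place. This is an approximation only: Object.assign copies values, so getters in the fragment would be read once and not carried over as getters.

```javascript
// Emulation only: Object.assign stands in for the proposed
// result.{ ... } operator, which extends an existing object in place.
var result = {
  operation: "square",
  min: 1,
  max: 10
};
for (var i = result.min; i <= result.max; i++) {
  Object.assign(result, { [i]: i * i });
}
console.log(result[7]); // 49
```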

If "side-effect" imperative things like variables and, consequently, loops are kept out of data-production blocks (and nothing else imperative in nature is added later, so that everything added is "side-effect-free" and non-imperative), we end up with a thing I'd call "functional data production". I think it is a desirable trait of a data production.

By "functional" I now mean the trait that is inherent to code in functional languages - if issued, with parameters, it produces value from them, but this value production has no side-effects. The most prominent of these side-effects is setting a value of a variable. One may also call this "stateless". Data production should be stateless, imperative code is one that should be stateful.

Being stateless (of course, the data production is not stateless in the strict sense - the values are computed by stateful code expressions, and [expr]: expr can bring expressions into keys as well; but avoiding variables and loops makes data production still less stateful) allows doing things that are typical for functional code (various behind-the-scenes optimizations, mainly; but also some proofs of correctness) for the data-production blocks. Since data production is a descriptive thing, one almost naturally expects it to be sort of "producing a value" instead of "starting a process of manufacturing a value". Though I can not give a convincing case for this, I believe it is a Good Thing (tm) to let the data production be stateless. In the long run it will bear fruit.

Parsing: ambiguities; syntax as opt-in philosophy.

This and lots of similar extensions are sooner or later called into question by parsing problems. For example:

{
  if (typeof window === "undefined") { server: true }
  else { browser: true }
}

is interpreted as code, with two expression statements labeled "server" and "browser", both statements being "true", if encountered in code. When parsed in expression contexts (after an assignment "=" or after a function call "("), it parses as a data production.

The condensed example of this phenomenon is:

{}.f()

If encountered in code context, it is a syntax error, because {} is a code block and "." is an unexpected token. If encountered in data context, it is the value of calling the f method of the {} object.

This unavoidable ambiguity may seem to doom any proposal like this one. But it does not. Even plain {} does not work - and we got used to putting parentheses around it whenever it appears at the beginning of an expression statement (it is not so common to start an expression statement with an object literal, but when it happens, a dot almost always follows and it produces an early syntax error). So this is an annoying but already known phenomenon, and we have learned to live with it. The bottom line is that it is orthogonal to this proposal.
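
The existing ambiguity can be observed in any current JavaScript engine. The sketch below uses new Function and eval purely as a way to parse a snippet in statement context or in expression context without breaking the surrounding file; toString is substituted for the f method so that the call actually resolves.

```javascript
// Statement context: "{}" is an empty block, so the following "." is an error.
let statementError = null;
try {
  new Function("{}.toString()");
} catch (e) {
  statementError = e;
}
console.log(statementError.name); // SyntaxError

// Expression context: parentheses force "{}" to be an object literal.
const asExpression = new Function("return ({}).toString()")();
console.log(asExpression); // [object Object]

// The same split for "{ server: true }": as code it is a block holding a
// labeled expression statement, as data it is an object with one property.
const asCode = eval("{ server: true }");
const asData = eval("({ server: true })");
console.log(typeof asCode, typeof asData); // boolean object
```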

One possible parsing problem is the combination of method declaration (f(args) {body}) with macro invocation (f(args)). But hopefully there will not be a problem, because the latter needs a comma delimiter unless it is last in the block.

One paragraph on the "syntax as opt-in" mindset, which seems to be part of ES.next. Conditionals and/or macro calls inside a data production block are to be treated as ES.next syntax and, consequently, opt it in. The same goes for function% and %-prefixed method names. The question of the scope of opt-in is still debated, but overall, this proposal seems to favor program-wide opt-in. It needs review by others to see the full consequences for "syntax as opt-in" if this proposal is considered. It brings some (non-breaking) changes to basic ECMAScript matter, that is, to the object literal. Also, if there were parsing guesses based on a block containing if, switch or a function call, they are invalidated.

3. Class is glorified declaration of prototype.

No offense meant. One of the motivations behind all this was the fact that the class block was neither imperative nor declarative but (at least syntactically) something of both, together with the need to have only two kinds of {...} - imperative (with all its consequences and common functionality all over) and declarative (ditto). And as I see it (I hope I am not alone), a class is a way to describe the prototype (and the constructor at the same time, but that is already nicely integrated). So taking the example from the class proposal (comments shortened; private changed to @ use, see below),

class Monster {
  // The contextual keyword "constructor" ... defines the body
  // of the class’s constructor function.
  constructor(name, health) {
    public name = name;
    @health = health;
  }
 
  // An identifier followed by an argument list and body defines a method. 
  attack(target) {
    log('The monster attacks ' + target);
  }
 
  // The contextual keyword "get" followed by an identifier and
  // a curly body defines a getter in the same way that "get"
  // defines one in an object literal.
  get isAlive() {
    return @health > 0;
  }
 
  // Likewise, "set" can be used to define setters.
  set health(value) {
    if (value < 0) {
      throw new Error('Health must be non-negative.')
    }
    @health = value
  }
 
  // After a "public" modifier, an identifier ... declares a prototype
  // property and initializes it
  public numAttacks = 0;
 
  // After a "public" modifier, the keyword "const" followed by an identifier
  // and an initializer declares a constant prototype property.
  public const attackMessage = 'The monster hits you!';
}
we can embrace "just describe the prototype object" and do this instead:
class Monster {
  // A method defined with name "constructor" is processed specially:
  // it _has_ [[Construct]] and is made a constructor of this class.
  // If not explicitly generated, empty one is provided.  
  constructor(name, health) {
    public name = name;
    @health = health;
  }
 
  // A method, as in every object literal.
  attack(target) {
    log('The monster attacks ' + target);
  }

  // A getter, as in every object literal. 
  get isAlive() {
    return @health > 0;
  }
 
  // A setter, as in every object literal.
  set health(value) {
    if (value < 0) {
      throw new Error('Health must be non-negative.')
    }
    @health = value
  }
 
  // A property definition, as in every object literal.
  numAttacks: 0,
 
  // A "const" property definition, as in every object literal.
  // (syntax of const property production is not yet agreed upon,
  // just use any one which is selected in the end)
  attackMessage := 'The monster hits you!'
}

Note to private removal: It seems private keyword will be removed in favor of foo.@bar syntax to access foo's property with private name bar. I am embracing this syntax in the document.

Apart from the different comments, which just show that a different implementation provides semantically the same result, the class code is nearly identical. Gone is the (superfluous) public keyword in the context of the prototype; I'd say it could go from the constructor method as well (this.name = name; works fine and does not create any exceptional situations for constructor/non-constructor). If you look at it, the class block really only did (declaratively) describe the prototype. So let us make class Clazz [extends Superclazz] an operator on the generic data-production block, which creates the class machinery from it and returns the constructor function. It can be de facto desugared to something like:


var _proto = (Superclazz || Object).prototype <| {
  ... the class body ...
};
if (!_proto.hasOwnProperty("constructor")) { _proto.{ constructor() {} } }
var _ctr = _proto.constructor;
__allowConstruct__(_ctr);
_ctr.prototype = _proto;
return _ctr;

except that the __allowConstruct__ will be inherent, not issued afterwards. The pros are clearly visible: fewer kinds of abstraction, and no need to manage making features in class and object literal work consistently (a class declaration is an object literal, so everything works automatically).
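
The desugaring can be tried out in standard JavaScript with a small helper. Everything here is an assumption made for the sake of a runnable sketch: defineClass is a hypothetical name, Object.create plus copied property descriptors replace the <| operator, _health stands in for the private name @health, and the constructor is written as a function expression because concise methods cannot be called with new (which is the job __allowConstruct__ would do).

```javascript
// Emulation only; see the assumptions listed in the paragraph above.
function defineClass(Superclass, body) {
  const proto = Object.create((Superclass || Object).prototype);
  // Copy descriptors so that getters and setters survive intact.
  Object.defineProperties(proto, Object.getOwnPropertyDescriptors(body));
  // Own-property check: an inherited constructor must not be reused.
  if (!Object.prototype.hasOwnProperty.call(proto, "constructor")) {
    proto.constructor = function () {};
  }
  const ctor = proto.constructor;
  ctor.prototype = proto;
  return ctor;
}

const Monster = defineClass(null, {
  constructor: function (name, health) {
    this.name = name;
    this._health = health;
  },
  attack(target) { return "The monster attacks " + target; },
  get isAlive() { return this._health > 0; },
  set health(value) {
    if (value < 0) throw new Error("Health must be non-negative.");
    this._health = value;
  },
  numAttacks: 0
});

const m = new Monster("orc", 3);
console.log(m.isAlive, m.numAttacks, m instanceof Monster); // true 0 true
```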

There are some open issues, definitely. The class proposal continues with this:

class Monster {
  // "static" places the property on the constructor.
  static allMonsters = [];
 
  // "public" declares on the prototype.
  public numAttacks = 0;
 
  // Although "public" is not required for prototype methods, 
  // "static" is required for constructor methods
  static numMonsters() { return Monster.allMonsters.length; }
}
which can be straightforwardly rewritten as
class Monster {
  // "static" places the property on the constructor.
  static allMonsters: [],
 
  // plain declares on the prototype.
  numAttacks: 0,
 
  // "static" is required for constructor methods
  static numMonsters() { return Monster.allMonsters.length; }
}
and yes, the object literal needs to know the static keyword if used in the context of the class operator. Yes, an exception, but a pretty clear one. We can live with it. The question arises: "What about static in macros?", which is not really easy to answer. One possibility may be to allow it (and any use of static) and throw an error if it is not (directly or through inclusion) happening inside the class operator.

To end this section on a more positive note: if you define the class block to be a data-production block, you make the language more cohesive, with features reused instead of coordinated, which should be a plus. Adoption should also be less fearful, because you do not need any "class magic", you are simply "declaring the structure of a prototype" (while constructor and static are taken care of by the class operator for you).

Classes + macros = free trait-based composition.

Obvious sexy freebie. Put traits into macros (you can create middlemen with other macros that import and glue some of them together), and then use them in a class production.

function% Pointish() {
  get r() { return Math.sqrt(this.x*this.x + this.y*this.y); }
  get phi() { return ... this.x }
  set r(newR) { ... }
  set phi(newPhi) { ... }
}

function% Circlish() {
  get area()  { return Math.PI*this.radius*this.radius; }
  get diameter() { return 2*this.radius; }
  get circumference() { return 2*Math.PI*this.radius; }
}

function% Translatable() {
  translate(dx, dy) { this.x += dx; this.y += dy; }
}

function% Rotatable() {
  rotate(angle) { this.phi += angle; }
  grow(quotient) { this.r *= quotient; }
}

class BasicPoint {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  
  Pointish()
}

class Vector extends BasicPoint {
  constructor(x, y) { super(x, y); }
  Rotatable(),
  Translatable()
}

class Circle extends BasicPoint {
  constructor(x, y, radius) {
    super(x, y);
    this.radius = radius;
  }
  
  Circlish(),
  Translatable(),
  
  grow(quotient) { this.radius *= quotient; }
}

//etc
And you can parametrize them, if you see the use (for example with names of properties to use for x, y, radius, ... while having defaults).
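
A rough equivalent can be built in standard JavaScript, which may help to see what the composition buys. It is a sketch under assumptions: traits are plain functions returning object fragments, include is a hypothetical helper that copies property descriptors onto the prototype, and radiusKey is an invented parameter showing the "names of properties with defaults" idea.

```javascript
// Emulation only: include() copies property descriptors so getters stay
// getters (plain spread would read them once and store the value).
function include(target, ...fragments) {
  for (const fragment of fragments) {
    Object.defineProperties(target, Object.getOwnPropertyDescriptors(fragment));
  }
  return target;
}

// Parametrized trait: the property holding the radius has a default name.
function Circlish(radiusKey = "radius") {
  return {
    get area() { return Math.PI * this[radiusKey] * this[radiusKey]; },
    get diameter() { return 2 * this[radiusKey]; }
  };
}

function Translatable() {
  return {
    translate(dx, dy) { this.x += dx; this.y += dy; }
  };
}

class BasicPoint {
  constructor(x, y) { this.x = x; this.y = y; }
}

class Circle extends BasicPoint {
  constructor(x, y, radius) { super(x, y); this.radius = radius; }
  grow(quotient) { this.radius *= quotient; }
}
include(Circle.prototype, Circlish(), Translatable());

const c = new Circle(0, 0, 2);
c.translate(1, 2);
c.grow(2);
console.log(c.x, c.y, c.diameter); // 1 2 8
```

Compared with the macro version, the traits are glued on after the class is declared instead of being named inside its body.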

That's it.

Known problems, open questions.

What if I want to include a trait in a class, or sub-data in an object, but do not want to call a macro, which must be evaluated? For performance reasons, there should be some kind of direct import there.
Do not worry and use parameterless macros. Premature optimization is the root of all evil. Leave the evil to the compiler. If your macro does not rely on side-effects, its invocation can be considerably optimized by the ECMAScript engine itself, down to one if from a PIC (polymorphic inline cache) followed by inlining. If you make the macro const, even that if can probably be eliminated.

Bonus: arrays and generators.

This is just a bonus idea, which sprang up from including code-like features in data. The array literal has not been enhanced in any way yet. But that is natural - in an array literal you rarely need to have some elements optional and some not. So no ifs here. As for the counterpart of a macro, let's postpone it for a while.

Arrays have a "listish", "linear" feel. So do loops. ;-) But loops are not the right addition to the data production - they use variables and are very code-like. But there is another element which is "listish" and "linear" - a generator. So, if you have defined some generator with

function* fib (upTo) {
  ...
}
why not include it when building an array literal, like:
[ "fibonacci", 10, *fib(10) ]
Not as sexy as traits through macros, but occasionally usable. Especially in the bare [ *gen(args) ] form, to have an intrinsic toArray. And generators are essentially macros of the array world, with a grain of salt.

And I think [ ..., *foo, ...] syntax could work for any iterable thing. Why only generators?
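
For what it is worth, JavaScript as standardized in ES2015 has a construct with this effect, spelled with three dots instead of a star, and it does accept any iterable. The sketch below shows it; firstFib is a hypothetical generator written only for this example, since the body of fib above is elided and its exact meaning of upTo is not given.

```javascript
// firstFib is a hypothetical generator yielding the first `count`
// Fibonacci numbers, starting from 0.
function* firstFib(count) {
  let a = 0, b = 1;
  for (let i = 0; i < count; i++) {
    yield a;
    [a, b] = [b, a + b];
  }
}

const viaGenerator = ["fibonacci", 5, ...firstFib(5)];
console.log(viaGenerator.join(", ")); // fibonacci, 5, 0, 1, 1, 2, 3

// Spread accepts any iterable, not only generators.
const viaSet = ["unique", ...new Set([1, 1, 2])];
console.log(viaSet.join(", ")); // unique, 1, 2

// The bare form gives the "intrinsic toArray".
const asArray = [...firstFib(3)];
console.log(asArray.join(", ")); // 0, 1, 1
```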


Thanks for patience, Herby
