teq Update (late May 2019)

It’s taken months, but the rip-up and replace has more or less stabilised.

The rebuild completely rewrote the parsing and triple generation code. 

Now, we have two independent symbol table universes. Bizarrely, this simplifies matters.

There’s one symbol table universe for real names in the model. Things like keywords, and register and field names, are held in a symbol table hierarchy. The hierarchy is a simple queue of symbol tables; when we lex in (as, for example, on encountering a ‘{’ in code) we create a new symbol table, whack it on the front of the symbol table queue, and insert new symbols into it. This makes checking for duplicate names trivial. And while searching the queue of symbol tables sounds laborious, it doesn’t seem to be. Meanwhile, lexing back out again is very quick - just detach the front of the symbol table queue. (In the classic symtab model, we walked through the symtab queue by queue, removing the symbols in each queue which had the current lex level.)

There’s another single symbol table used in parsing ‘ordinary code’ - that is, the block-structured C-like code we use to define behaviour. This is the raw symbol table and it contains token symbols. A token symbol is quite simple:

// first the 'tokSymbol' structure

typedef struct {
    qBlock header;        // next struct in the hash chain
    char *name;           // pointer to name
    TokClass tokclass;
    uint32 prehash;       // the prehash
} tokSymbol;

The tokclass field says what sort of beast this is:

typedef enum {
    tokClassNone,
    tokClassName,
    tokClassIntValue,
    tokClassHexValue,
    tokClassFloatValue,
    tokClassPunct,
    tokClassOp,
    tokClassPP,
    tokClassText
} TokClass;

As we parse the code, we look up symbols in the raw symbol table; whenever we find that we don’t have an entry, we create a new tokSymbol and insert it; the tokclass comes direct from the tokeniser. We use trees of operators and tokSymbols to represent expressions, and build up statements, which are queued up on a queue belonging (in the adl version) to the current instruction.

We then mechanically convert trees (etc.) into raw triples. These are held on a raw triple queue (per instruction); this is generally done with no type checking or semantic checking.

After we’ve got a complete queue of raw triples, it’s time to convert them to refined triples, which are essentially the same form but whose operands are symbols in the hierarchical symtab universe. The check-and-convert step is pretty simple - we have exactly one triple to look at a time, and we know the rules for operands and operators.

When a raw triple is an op_declare, we create a new full-blown symbol of the same name and insert it into the current symtab in the hierarchical symtab queue (we also create an op_declare triple for the symbol). When a raw triple references symbols, we check they’re declared and have the right types. We build up the refined triples on the refined triples queue of the instruction.
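A minimal sketch of the refine step, assuming a much-reduced triple (the RawTriple shape, op names beyond op_declare, and the flat declared-name pool are all invented here; the real step resolves against the hierarchical symtab and checks types too):

```c
#include <string.h>
#include <assert.h>

typedef enum { op_declare, op_assign } Op;

// A raw triple carries names; refining resolves them to real symbols.
typedef struct { Op op; const char *dst; const char *src; } RawTriple;

#define MAXSYMS 64
static const char *declared[MAXSYMS];
static int ndeclared;

static int isDeclared(const char *name) {
    for (int i = 0; i < ndeclared; i++)
        if (strcmp(declared[i], name) == 0) return 1;
    return 0;
}

// Refine exactly one triple at a time: an op_declare creates a symbol;
// any other triple may only reference symbols already declared.
static int refine(const RawTriple *t) {
    switch (t->op) {
    case op_declare:
        if (isDeclared(t->dst)) return 0;   // duplicate declaration
        declared[ndeclared++] = t->dst;
        return 1;
    case op_assign:
        return isDeclared(t->dst) && isDeclared(t->src);
    }
    return 0;
}
```

The one-triple-at-a-time shape is what keeps this step simple: there is never a need to look ahead in the queue.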

When we’ve done this for all instructions we can generate the model. We do this, mostly, by fprintf()-ing the triples, in a format acceptable to a C compiler, into the model.c and model.h files.
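The emit step is in the same spirit as this sketch (the Triple shape, the op set, and the exact C idiom emitted are assumptions; the real emitter writes into model.c and model.h):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

typedef enum { op_declare, op_assign, op_add } Op;
typedef struct { Op op; const char *dst; const char *a; const char *b; } Triple;

// Print one refined triple as a C statement a compiler will accept.
static void emitTriple(FILE *out, const Triple *t) {
    switch (t->op) {
    case op_declare: fprintf(out, "uint32_t %s;\n", t->dst);              break;
    case op_assign:  fprintf(out, "%s = %s;\n", t->dst, t->a);            break;
    case op_add:     fprintf(out, "%s = %s + %s;\n", t->dst, t->a, t->b); break;
    }
}
```

Since each triple maps to one flat C statement, there’s no code generation in the classical sense - the C compiler does the register allocation and optimisation for us.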

Current status is that the instruction work seems correct. However, teq does something that sadl never did (beyond understanding the action code): it also has sections for initialization and for instruction fetch-and-execute. Where sadl did this automatically, teq requires you to specify it in the obvious manner:

// the execute-forever loop that the machine performs

operate {
    uint32 ptr;
    ptr <- iptr;
    uint16 instruction;
    instrucread(imem, instruction, ptr);
    instrucReg <- instruction;
    execute(instruction);    // decode and execute
    iptr <- ptr;
}

We’re not correctly melding the ‘compiled’ version of this into the generated model at the moment. But it looks like more of a style decision than an incomprehensible bug, so things are good.

_____________________________________________________

© kiva design groupe • 2017