August Update

August Update

Well, that “it’s stabilized” turns out to be not quite true. More rip-up, but perhaps with more hope for stability because the beast is now generating code for executable models which at least looks right. More to do, but where we are now can be summarised as follows:

The teq Processing Flow

Pete Wilson v0.21 August 26 2019

Matches teq  teq 0.7v2.4[dev4.1] [August 14 2019]

This document describes how the teq architecture compiler functions, concentrating on the very high level aspects.

The general structure is:

- an architecture description parser, handwritten, for the architecture resources, along with a program parser (most of which we wish to re-use for parsing teq programs) which parses the 'action' definitions - which are just teq program, with the complication that we use the <- operator to mean 'read or write an architecture element (like a register)' rather than 'send or receive a message', as it would be in teq-for-programs 

The handwritten program parser works like this:

- it parses the source of the action definition

- using a token (the symbols which are read from the source) symbol table;  each Symbol has classification info based on its properties as a Token (a number, a string, a name...)

- with an operator-precedence expression parser, which generates simple trees whose elements are operator plus one or two operands held in that token symbol table. Errors detected are limited to malformed expressions. When reducing an expression, the parser generates new Symbols, which are marked as temporaries. We do not know the type of a temporary at this stage; it must be calculated later. The temps are added to the token symbol table.

A Symbol now looks like this:

typedef struct {

  qBlock header;// next struct in the hashchain

  char * name;// pointer to name

  TokClass tokclass;

  uint32 prehash;// the prehash - we've done the hashing computation 

                 // but not reduced to span the symtab size

  symClass symclass;         

  union {

    symSubClass subclass;   // for when we have a class/subclass relationship

    uint32 index;           // for when we want simply to count things, 

                            // as in a macro argument

  } sub;

} Symbol;

- errors detected in the expression parser are parseError() parseValError()(errors detected directly in the source), (errors detected in processing, like lacking a name or an operator in internal stacks) and badError()(implementation errors, like looking for the name of an operator and finding there isn't any such name)

- throughout the program, we do a lot of dynamic memory allocation. It's not likely to fail, but the standard action if it does is to call allocError()

- the parser generates a list of Statements of various types. Statements are structures of various types. Simple statements,like an assignment statement, have a tree representing the expression parsed.Errors detected include parseErrors()s, like not seeing a '{' where expected in an if or a while.}

- complex declarations (like int32 fred, jim, bert = 0, alf;nt32 fred; int32 jim; int32 bert; bert = 0; int32 alf;) are converted to a sequence of very simple declarations and separate assignment statements (so that would be i). Statements represent the declarations.

- we represent the behaviour of code as a sequence of Triples. A Triple may be queued using the standard teq functions. It contains, most importantly, three pointers to Symbols in the raw symtab, and three pointers to Vars in the refined symtab queue. Basic parsing fills in the Symbol pointers.

A Triple now looks like this:

typedef struct {

  qBlock header;

  uint32 seq;

  uint32 globalSeq;

  tripleType type;

  opSpec * op;

  // the operands for the initial node-to-triple pass, sans typechecking etc

  Symbol * symd0;

  Symbol * syms0;

  Symbol * syms1;

  // the operands for the type-checkable (etc) passes

  Var * d0;

  Var * s0;

  Var * s1;

 } Triple;

- the statements have their trees or other data structures pointed at by a void pointer to an approrpiate structure (several, for example in a while statement there's a controlling expression tree and a sequence of body statements) 

- a statement has a single triple queue 

- a first pass through the Statements and their queues of Triples creates Vars equivalent to Symbols in each Triple; Vars are added to the refined symbol table structure. The refined structure is just a queue of symbol tables;  this represents the block structure of the program in the simples way:  when we hit a new scope (such as an inner block) while parsing, we create and add a new symbol table to the front of the queue; on exiting the scope, we remove the scope's symbol table and attach it to something relevant (like the block statement). We provide lookup functions for both the front of queue (to check for within-scope multiple declaration errors) and up the queue (to find names in outer scopes).

- the Vars are created on encountering declaration statements, with the Symbol portion copied across from the declared Symbol. Once created, they are added to the refined symbol table.

- the Vars for temporaries are created and declared during the process of 'converting' raw triples into refined triples. We recognise a temp because its Symbol in the raw symtab is of symClasssymClassTemp .  We use checkDeclareTemp()convertTriplesonQ() in the conversion function  to see if the destination of the operation is a temp. If it is, we create a Var for it and fill in info marking it as a Var and that it needs its type calculated (symClassVar, symSubComputeType). We then create a new statement declaring it and add that statement in immediately prior to the current statement.

- while converting the Triples to have Vars, we look up the Var in the refined symbol structure. If the Var isn't found, we have an error (a statementError()).

- once we've built the refined Triples, we revisit it and run back down it checking for errors (type errors, operation errors,..). For each triple, we check the Vars exist and see if any are of class symClassVarsymSubComputeType, subclass . If so, we compute the type of the Var from the types of the source operand(s).

The end result is an architecture specification which has a hierarchical definition of the encoding of instructions, with data structures representing the architecture (resources and fields), with each instruction having a program representation consisting of a queue of Statements, each with their Triple queues (etc).

To create a model, we walk the specification,  writing a set of .h and .c files which can be linked to pre-written C code to construct an interpreter and an assembler. For the interpreter body, we use nested case statements to implement decode, and use a print tripleAsC() function to translate statements into C.

Futures: We’re looking at centralising the walking, so it’s always done the right way, with Actions being passed to specify conversion, checking, translation, translation to C etc.

_____________________________________________________


© kiva design groupe • 2017