January Update

Still re-assembling the parts, but meanwhile making progress on teq, the next-generation replacement for simpleadl.

Introduction to teq


The project is called teq.
Its goal is to provide a simple architecture design and experimentation toolkit.
In its initial version, it provides a tool, teq itself, which eats an architecture description and generates a
model and an assembler.
The model behaves like a processor with the provided architecture specification; the assembler can process programs written in the architecture’s assembly language into loadable binaries loadable and executable by the model
The toolkit is entirely written in standard C, and uses only the C standard library. It's also written fairly simply, and so can be understood (and improved) without understanding a vast panoply of external code. This does impose some limitations on the flexibility and power of the architectures specified - for example, currently instructions and data must be power-of-two sizes in bytes.


Status


Currently, the toolkit functions correctly for at least a pair of simple architecture specifications: one, acc32 is akin to a grown-up 32 bit 6502; the other, r32, is akin to a generic RISC machine.
These architectures don’t bother with specifying MMU’s etc, though there’s no obvious reason why they couldn’t.


Model Types


We currently generate two sorts of models - uni-context and multi-context.
Context models don’t worry about cores, caches, performance or whatever. They just model the
architecture.
A single-context model reads instructions from memory, executes them, and updates internal state and the memory.
A multicontext model runs multiple contexts one after the other in the hope of executing multiple instructions within a context before moving on to the next; this lets things run faster. But to get the interactions between programs in different contexts at least credible, whenever a context does something which could affect another context (eg access non-local memory (the current stackframe and registers are assumed local)) the current context is ripped untimely out of the queue and added to the back of the queue of contexts.
The architecture models allow you to compare architectures on the crude basics of code size and path lengths.
They also act a ‘golden models’ if you’re doing functional verification of a design
It’s conventional to add some instructions to the architecture to allow simple I/O - the arch language knows of a few standard ports which can read and write characters, integers, hex values and strings to/from
stdin and stdout.


Performance Models


A future version will be able to create a discrete-event model which maps contexts to cores and which models the tags of the caches (to avoid semantically unneeded data movements between memory and cache), which knows latencies and issue rates for the available function units and which knows what instructions use which units. A naive start at specifying this is already present. The microarchitecture of the cores will be abstracted away to get higher simulation performance.
Further away, there will be a finder-grain discrete event model which will use a microarchitecture description. We expect there’ll be an approximation version in which
some cores run with the full microarchitecture model but the majority just get given an ipc to use, which will be extracted from the running cores. (The assumption here is that the collection of cores is running “a program”; if it’s modelling a general purpose computer running a mix of workloads, you can’t do this).


Compilers


In the nearer future, the toolkit will start generating (not very good) compilers. The language will be a superset of a subset of C, and the compilers will generate assembler code for the targets specified.
The added semantics provide explicit concurrency (using
par and friends) and inter-context message passing. The compiler is the same parser, and translator, as used for the instruction semantics specification.
We foresee the ability to specify a few
styles of code generation for the compiler, including an accumulator style (e.g. acc32) and a RISC style (e.g. r32). We expect to add stack styles (for machines like Inmos’ transputer) and VLIW styles (supporting VLIW models requires some more work in the architecture tool itself).
In addition, we foresee the ability to manipulate the IR triples to prepare for a specific style of code generation (eg, we may wish to pre-group triples for a VLIW machine) along withs specific triple-to-code sequence transformation rules.
We don’t currently foresee much in the way of complex peephole optimisations being provided.


Magic Engines


Real systems include not just processors and memory, but also engines like DMA machines and peripherals.
The toolkit will allow you to write models of engines in the full teq language, which understands time - and thus looking at interfaces at specific times - as well as internal concurrency. The toolkit will generate discrete even models of the engines in C, mixable-in with the processor models.

_____________________________________________________


November Update

Getting the site back together; just copied-and-pasted the majority of the old blog content into the first page of this new blog.

Slowly (as ever) working on teq. Resuscitated an old language, charm, from the early 1990's and am adapting its code for conditionals for teq.

Introduction

Rebuilding the website



Running into problems using Karelia's Sandvox website builder, and support there seems to be not particularly existent, so trying RapidWeaver 8 instead.
This initial version will get the skeleton up, and then I'll backfill later.


Here's a copy'n'paste of the blog as it existed. Imperfect in form, but there y'go:

_____________________________________________________

February 25 2017

First Post


Creating this new Kiva Design website, kicking off the creation of the LLC…
The paperwork from the State arrived yesterday, making us official…
_____________________________________________________

May 8, 2017 at 10:50 AM

Architecture


Computer architecture has been annoyingly boring for a long time - on the whole, everybody’s doing RISC with a classical MMU and some form of cache coherence. (There are honourable exceptions - the Mill is worth reading about) but in general it’s all samey-samey.
This is about to change as systems which need lots of performance increasingly adopt multicore systems, with core counts running into the tens, hundreds, or thousands. There’s a bunch of examples:
Rex Computing has a (4-way VLIW) multicore solution which eschews caches and uses interrupts and ‘remote writes’ for intercore communication.
Adapteva has a similar approach.
Kalray also
What these all seem to lack is a basic understanding that these things need to be programmed, and “thinking parallel” when all you have is a sequential programming language is hard.
While there is progress being made (C++ atomics etc), in general current widely-used programming langauges either don’t support parallelism, or do it in an under-the counter way (such as providing a threads library or equivalent, along with locks and so forth). This generally makes writing correct, efficient, comprehensible code difficult.
An alternative approach is to notice that much “high performance computing” code is nested loops, and when you’re lucky, the iterations of the loops are independent, so that you can ignore identifying parallelism and let a compiler do it for you. Or, you can use an approach like OpenCL - then you identify those loops, rewrite your code as program + kernels (where the kernels capture the loops) and arrange to run the kernels on some collection of compute resource.
A more promising approach is to think of what the physics of this sort of parallel computing rewards and punishes. These multicore systems generally have a rectangular array of cores (each with some local memory) connected up with some form of mesh network - the network may provide one or more independent communications ‘planes’ to allow different sorts of data to proceed concurrently (examples - it can be helpful to have an ACK network separate from a data network, and if you’re moving cachelines-worth of data, it might also help to have an address network and a large data network).
This sort of system is capable of delivering data directly from one core into appropriate resources of another; you’re immediately tempted to think that an architecture which allows program to wait until some new data has arrived, and then process it, would be a natural fit. And so it seems - this style of programming is generally known as message-passing, and has the advantage that you don’t need locks or other shared-data-protecting structures. Also, by providing message queues/buffers in memory or registers, the architecture can speed up communication and synchronisation substantially while reducing power consumption at the same time.
But to include these features as an ordinary part of user mode program support you really want some new architecture. And the architecture tradeoffs for a very high core count system which leverages hardware message passing can be quite different from current mainstream practice.
And so there’s a new day dawning in processor/system architecture. Now all we need is a toolkit to go explore this new world.
As a first step in providing such a toolkit, we’re initiating a series of articles and software tools. The initial stuff provides an existence proof of how to define an architecture, and generate an executable model and an assembler automatically from that specification. The capabilities of this initial tool are exceedingly limited - despite the conversation above, it’s single processor only, for example. But most architects don’t use toolkits like this - they seem to work on squared paper or in text editors, and write plain text documents which seek to describe the architecture, and rely on tool makers to understand what’s written in the way they meant when constructing implementations, verification models and software tools. This inevitably leads to tears, so showing to a wider audience that this stuff needn’t be black magic seems a useful goal.
Look for the simpleADL toolkit to appear here quite soon.
_____________________________________________________


May 17, 2017 at 5:29 PM

Phew...


At last, a somewhat tidied version of the simpleADL tools and documentation are published.
This release is still rather alpha-y, but seems to work with the two example architectures and their two example programs.
We’ve found problems with XCode and symbolic links, and with make and symbolic links (macOS make doesn’t sem to follow the link back for the touch date of the original file, so doesn’t do what we’d like), and so we’ve fallen back on an odious install script which populates the various folders with necessary files and builds things and installs them in /usr/local/bin
_____________________________________________________


May 27, 2017 at 2:48 PM

Slowly getting there, perhaps


Just uploaded the 0.5 release of the simpleADL tools and associated documentation.
The toolkit itself seems to be able to do what it says on the can, although there’s still no example programs which do any memory write operations. Shocking, really.
But this release adds some implementation stuff. Article 3 talks about how a processor is implemented, and presents time-tagging, a simple addition to an architectural simulation which provides quite useful performance estimation. It also discusses the need for caches, and exposes the key elements of the changes to the generated model for the r32 architecture needed to support cache simulation and time-tagging.
What’s time-tagging? It’s a simple technique - every resource in the system that can be written and read has an associated 64 bit time tag value. The interpreter has a notion of current time, stored in the variable now. We start off with every tag zero. When the interpreter performs an ‘add’ instruction, for example, it needs to read the two source registers and write the destination register. and the operation takes some time - for an add, there’s a latency of 1 clock. So the interpreter works out when the instruction can start executing - it’s the maximum of now and the time tags of the registers involved. It sets the value of now to that result, and then adds the latency of the operation to now and writes that into the timetag of the destination register.
When we have caches, each line has its own tag, and so does main memory.
Function units need a tag if they’re not completely pipelined.
The whole scheme works quite well, without severely compromising the runtime performance of the interpreter.
Its weakness, of course, is that it doesn’t work so well when there are multiple agents - like multiple processors - making use of shared resources. But that’s a subject for the future.
_____________________________________________________
Apr 23, 2018 at 8:08 AM

teq (introduction)


Well, we’re about a year further on. simpleADL (sadl) has not been touched in all that time, because it’s a rather limited piece of work.
Instead, we’ve been working on teq, a new toolkit.
Note to the unwary - we don’t think teq contains major intellectual breakthroughs - its purpose is threefold: to provide a useful tool for architecture exploration (and to show by example how to construct such things); to present (eventually) some novel architectural thoughts aimed at IoT; and to provide us with (another) useful and pleasurable activity.
teq is an architecture description language, with (hopefully) the shortfalls of sadl as a language and as a tool fixed.
The project is just coming to life - we can consume architecture models and detect errors and provide useful error messages and build the data structures which represent the architecture, and are beginning to be able to generate the necessary .c and .h files. In addition to the architecture of a processor, teq will also be able to describe an implementation of an architecture (in the sense that the clock rate of an implementation, and the effects of the micro-architecture will be specifiable) so that performance studies can be done.
teq will also be able to describe large multiprocessor systems, in which many processors operate concurrently and interact.
Such systems will need software, so teq will also be a concurrent programming language. At the moment, we envisage this as being a slightly simplified C, with concurrency extensions more or less along the lines of occam (that is, channel-like concurrency, and static, block structured concurrency specifications) with aspects of go (that is, channels and dynamic concurrency) and a touch of csp (that is, perhaps no channels :-).
But writing code requires a compiler, so we’ve started working on how to automatically generate a compiler from the architecture and some extra information. At least initially, the compilers won’t be very good, but they’ll be of similar quality for all architectures and so not all comparisons will be useless. We’re thinking of a structure which compiles code into a simple intermediate language, being able to distributed code in that form, and having a sort of just in time converter to convert this into per-architecture loadable formats. But these are early days, and undoubtedly there will be gotchas.
And such systems also need useful widgets like DMA engines and other independently-executing non-programmable engines. teq will allow these to be defined; the approach will be to specify the behaviour of the state machine in a version of the teq programming language, with the code being executed on an appropriate abstract architecture. And, clearly, system descriptions will contain both processors and engines.
Finally, teq will allow the definition of a limited set of interconnect topologies, perhaps limited to regular meshes with some number of layers.
As each phase of this Grand Project gets to apparent usability, we shall upload the source and executable (for Mac OS X only, for the time being) along with some documentation.
The license for the teq stuff is as follows:

Copyright (c) 2018, Kiva Design Groupe LLC All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of anyone else.
_____________________________________________________

A Digression on an Aspect of Performance



teq (and simpleADL) are built from odds and sods of software I’ve had lying around for years.
One of these is the symbol package. The original version was essentially the example in Kernighan and Ritchie. This used a collection of queues, each holding a single-forward-linked queue to hold the symbol data structures. To insert a new symbol, you created it, filled in the name, computed a hash, and used the hash to select what queue to put it in.
(The Second Edition of The C programming Language has the example starting on page 143, under section 6.6 Table Lookup).
A key operation in any compiler or equivalent is looking up a name. You’ve just read a token, and eventually you’ll need to know what it is (if it’s been defined). So you’ll look it up in the name table. The simplest organization is a long list (single-forward chained queue) of all the names of interest. If this is, at some point 280 entries long, you’ll need to make on average 140 string comparisons. Since names vary from one character in length to 20 or more, these comparisons can take a long time.
A first step is to reduce the average work by any amount you like: instead of one queue, have N. This reduces the search to 140/N comparisons. How to choose what queue to use for a given name? Compute a small integer in the range 0..N-1 from the name.
K&R offer a simple function to compute a hash; here’s the code from the second edition::
/* hash: form hash value for string s */
unsigned hash(char * s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % HASHSIZE;
}

We note that often the % operator is relatively expensive on many processors, but we’ll ignore this because we don’t do it very often. But when looking up a name, we need to do a string compare between every entry in a queue and the name we’re looking up. This can be expensive. Luckily the standard library has the strcmp() function, generally optimised as much as practical. But it still needs to look at every name in the queue until a match is found or we reach the end of the queue indicating no match.
strcmp() looks like this, in non-weird form (from: https://opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c)
/* ANSI sez:
* The `strcmp' function compares the string pointed to by `s1' to the
* string pointed to by `s2'.
* The `strcmp' function returns an integer greater than, equal to, or less
* than zero, according as the string pointed to by `s1' is greater than,
* equal to, or less than the string pointed to by `s2'. [4.11.4.2]
*/
int strcmp(const char *s1, const char *s2) {
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}
With long names, that’s a lot of comparisons. Is there any way we could speed it up?
Probably. We have used one form of (intended) speedup for a long time. The structure holding the name also included the length of the string; not much point comparing the strings for equality of they’re different lengths, eh?
But that’s not all. Every name we encounter means we need to compute the hash. We could store the hash, and get a much stronger avoidance of doing the string compare. Naturally, we’d store the value computed as hashval in the hash() function, before it’s range-reduced.
Testing the hypothesis that these represent improving performance requires a fairly large experiment, because performance all depends on the names being looked up. Random names aren’t the same as English-language names; and these aren’t the same as names used in program (‘not the same as’ means distribution of length and of characters in the names differ).
So we performed a basic reality test. We created an array of pointers to names, and then malloc() space for each name, and created each name using our uniform() function to choose successive letters (‘a’ to ‘z’ more or less) and a separate call to uniform() to choose the length of each name. And installed them all, in order of creation, in a hash table. Then ran through the array in order, looking up each one.
We did two experiments. In the first, we chose a letter at random and used that for the first three characters of each name, a rough’n’ready nod to the fact that names are often ‘categorised’ thus. Results:
  • Using Raw, looking up 50,000 names in our symbol table took 1021692 microseconds
  • Using Length, looking up 50,000 names in our symbol table took 959724 microseconds
  • Using Hash, looking up 50,000 names in our symbol table took 901598 microseconds
  • And then, with no pre-string, we got:
  • Using Raw, looking up 50,000 names in our symbol table took 1012510 microseconds
  • Using Length, looking up 50,000 names in our symbol table took 990427 microseconds
  • Using Hash, looking up 50,000 names in our symbol table took 1020117 microseconds

Why isn’t hash better than length - or raw - for the random strings? Probably because the string compare can decide that the strings are different after looking at the first letter of the string. To do a proper test, we need a more realistic set of names.
So, yep. The thinking was good. But the effects are so small as to be unreliable..
From this we a reminded once again that optimizing without measuring is oft-times a Very Bad Idea…

_____________________________________________________

teq


Oof..
I’m generating correctly-executable code for an arbitrarily large number of processors from an architecture description and ‘hand-assembled’ program (using generated field macros)
And now I begin to see the lacunae….
For example, in thinking about how to incrementally add instructions, I realized that there were no checks in place to check the uniqueness of instruction decode. Bah, humbug…
Now to think hard about how either change the instruction encoding data structures; or to faff with a collection of instructions and discern the decode tree after the fact (as I did with the very first version of this stuff, dealing with instructions presented in more or less alphabetical order of mnemonics); or to work out how to insert things into the current structure.
This last seems most attractive at the moment.
It doesn’t fix the ‘uniqueness’ thingy, but that might fall out, or be amenable to a simple ’symbol table’ solution,,,
_____________________________________________________

Progress


Yep, had to rip up the instruction decode representation, but the new form is simpler to build and to walk. It’s essentially just a direct representation of a decode tree, with a tree of structures; in each structure, the level below is captured as an array of pointers to the decode nodes below. The bottom level, below which there are only instructions, holds a pointer to a single queue of instructions.
We capture the instructions as they arrive, and insert them into the single queue in field order (there’s always a ‘final opocde field’ to define this instruction rather than that, and that’s the field we use to insertion sort on.)
Insertion sorting means duplicates are easily found.
Also looked at the error reporting, and tidied that up extensively. The whole thing seems much more robust now - trying it on an increasing number of architectures In particular, we want an architecture whose instruction set is as close to one-to-one with the abstract architecture implied by the triples - if they were instructions, what would they look like? This lead to a new field type being needed - an ‘operand’, whose definition includes its base register. So now we can represent accumulator machines simply, too.
fields {
// the name of a field, for assembler definition, is the name given here
op8[0:7] opcode;// 4 bits of major opcode unsigned value
op9[9:16]opcode;
l_dest[9:15] opndsel fptr srcdest;
l_src1[16:23] opndsel fptr src;
l_src2[24:31] opndsel fptr src;
}


_____________________________________________________

Containers

In the expr.c expression package, we maintain a pair of stacks - one for names, one for operators. As we parse the source code, we push names onto the names stack, and operators onto the operator stack. When we encounter an operator in the source with a lower priority than the operator at the top of the operator stack, we begin reducing the stacks. Take off the top operator, and the number of names it needs, and generate an expression node, and add that node to an output tree. Keep doing this until the operator at the top no longer has higher priority than the one we just read.
(It’s a bit more complex than that really - parenthesised expressions, function calls, array indices - but that’s the basic operation of an operator-precedence expression parser).
So - simple question - how big should the name and operator stacks be?
And of course the answer is “it depends”.
In earlier versions of expr.c, the easy way out was taken - the stacks were a fixed size and the error message “expression too complex” was given if their limits were surpassed. But that’s just nasty.
So for teq, we have introduced a new useful building block, the Container.
typedef struct {
char * name;
uint32 max; // max number of elements
uint32 count; // how many elements we have
char ** data; // array of pointers
} Container;

It’s a simple structure with pointer to an array of pointers (data) and some metadata - count and max. max defines the current capacity of the data array, count the number of pointers in the array.
When a Container is created, the capacity is set to some default value, the array f pointers is malloc’d with that number of entries, and data is pointed at the allocated array.
Containers should be accessed only through the provided accessor functions (see below). When a pointer is added to a Container, the obvious check is done that this isn’t beyond capacity. If it is, we realloc() the data array to a capacity that is some multiple of the current capacity, check the realloc worked, and replace the data pointer if so.
Thus, we get self-growing arrays of exactly the right size.
container.h
//
// container.h
// teq
//
// Created by Pete Wilson on 6/6/17.
// Copyright © 2017 Kiva Design Groupe LLC. All rights reserved.
//
#ifndef container_h
#define container_h
typedef struct {
char * name;
uint32 max; // max number of elements
uint32 count; // how many elements we have
char ** data; // array of pointers
} Container;
char * contNameAndVersion(void);
// containers are self-extending arrays of pointers
uint32 ContainerCount(Container * c);
uint32 allocateInContainer(Container * c);
Container * newContainer(uint32 count, char * name);
void zeroContainer(Container * c);
void printContainer(Container * c);
void * getPtrFromContainer(Container * c, uint64 n); // get the n'th pointer; don't change the container
void addPtrToContainer(Container * c, void * p); // add a pointer to the container
void pushPtrToContainer(Container * c, void * ptr); // push a pointer into a container (same as add)
void * pullPtrFromContainer(Container * c); // pull a pointer from a container into ptr; return 1. If none, return 0
void * getTopPtrInContainer(Container * c); // if there is a pointer, copies top value into ptr and returns 1; else returns 0
#endif /* container_h */
container.c
//
// container.c
// teq
//
// Created by Pete Wilson on 6/6/17.
// Copyright © 2017 Kiva Design Groupe LLC. All rights reserved.
//
#include
#include
#include
#define EXTERN extern
#include "types.h"
#include "container.h"
extern void allocError(uint64 s, char * msg);
static char * name_and_version = "containers 0.1v6 [September 23 2018]";
/*
provides a uniform way of creating an array of objects which can grow in size
Versions
0.1v6 [September 28 2018]
- all Containers now hold just pointers. No 'ptrContainers'. Errors remain..
0.1v5 [April 16 2018]
- changed API for int pullValFromContainer(). returns 1 if there was something in the container, and writes the value removed to a pointer's variable
0.1v4 [November 2017]
- added pointer only containers (type Container)
0.1v3 August 14 2017
- added a set of functions to more efficiently push and pull 'word-sized' (64 bit) values to and from a Container
*/
// ------------------------- contNameAndVersion --------------
char * contNameAndVersion(void) {
return name_and_version;
}
// ------------------ printContainer --------------
void printContainer(Container * c) {
printf("\n\nContainer '%s':", c -> name);
printf("\n\tcapacity = %d", c -> max);
printf("\n\tcount = %d", c -> count);
// print the pointers
for (int i = 0; i < c -> count; i++) {
char * p = getPtrFromContainer(c, i);
printf("\n\t\tptr %d: %p", i, p);
}
//fflush(stdout);
}
// ----------------- zeroContainer ----------
void zeroContainer(Container * c) {
c -> count = 0;
}
// ------------------ ContainerCount -----------------
uint32 ContainerCount(Container * c) {
return c -> count;
}
// ----------- allocateInContainer ------------
uint32 allocateInContainer(Container * c) {
// allocate space for one more pointer in the container, and return the index to it
if (c -> count >= c -> max) {
// we need to grow the container - we use 50%
uint32 newmax = (c -> max + (c -> max/2));
//reallocate the data
//printf("\n-- growing container %p from %lld to %lld pointers", c, c -> max, newmax);
void * newdata = realloc(c -> data, newmax * sizeof(char *));
if (newdata) {
//printf("..succeeded.");
c -> data = newdata;
c -> max = newmax;
}
else {
allocError(newmax * sizeof(char *), "reallocation of a container's data");
}
}
// return the index to the current slot
uint32 index = c -> count;

// and increment the slot
c -> count++;
return index;
}
// -------------- getPtrFromContainer ---------
void * getPtrFromContainer(Container * c, uint64 n) {
// Read the n'th pointer
if (n >= c -> count) {
printf("\n=== container error - asking for item %llu in container '%s' when count is %u", n, c -> name, c -> count);
return NULL; // safety - a NULL pointer should cause havoc
}
char * ptr = c -> data[n];
// printf("\nget ptr[%lld] %p from %s", n, ptr, c -> name);
return ptr;
}

// -------------- addPtrToContainer ---------

void addPtrToContainer(Container * c, void * p) {
// Add the pointer p to the container
uint32 n = allocateInContainer(c); // find index at which to store our pointer
// printf("\nadd ptr %p to %s[%lld]", p, c -> name, n);
c -> data[n] = p; // and store it
}

// ------------ newContainer --------------

Container * newContainer(uint32 count, char * name) {
// create a new pointer container capable of holding up to 'count' pointers
uint64 s = sizeof(Container);
Container * c = malloc(s);
if (c) {
c -> name = strdup(name);
c -> data = malloc((count + 1) * sizeof(void *));
c -> max = count;
c -> count = 0;
if (c -> data == NULL) {
allocError(count * sizeof(char *), "cannot allocate a pointer container's data array");
free(c);
c = NULL;
}
}
else {
allocError(s, "cannot allocate a pointer container");
}
return c;
}
// ------------ pushPtrToContainer --------------
void pushPtrToContainer(Container * c, void * ptr) {
// push a pointer into a container. Same as add.
addPtrToContainer(c, ptr);
}
// ------------ pullPtrFromContainer --------------
void * pullPtrFromContainer(Container * c) {
// pull a pointer from a container into ptr; return 1. If none, return 0
if (c -> count > 0) {
char * p = getPtrFromContainer(c, c -> count - 1);
c -> count--;
return p;
}
return NULL;
}
// ------------ getTopPtrInContainer --------------
void * getTopPtrInContainer(Container * c) {
// if there is a pointer, copies top value into ptr and returns 1; else returns 0
uint64 n = c -> count;
if (n == 0) {
return NULL;
}
char * p = getPtrFromContainer(c, n - 1);
return p;
}

_____________________________________________________

teq Update(late May 2019)


It’s taken months, but the rip-up and replace has more or less stabilised.
The rebuild completely rewrote the parsing and triple generation code.
Now, we have two independent symbol table universes. Bizarrely, this simplifies matters.
There’s one symbol table universe for real names in the model. Things like keywords, and register and field names are held in a symbol table hierarchy. The hierarchy is a simple queue of symbol tbales; when we lex-in (as, for example, on encountering a ‘{‘ in code) we create a new symbol table, whack it on the front of the symbol table queue, and insert new symbols in it. We can check simply for duplicate names, clearly. And while searching the queue of symbol tables sounds laborious, it doesn’t seem to be. Meanwhile, lexing back out again is very very quick - just detach the front of the symbol table queue. (In the classic symtab model, we walked through the symtab queue by queue, removing the symbols in each queue which had the current lex level)
There’s another single symbol table used in parsing ‘ordinary code’ - that is, the block-strucred C-like code we use to define behaviour. This is the raw symbol table and it contains token symbols. A token symbol is quite simple:
// first the 'tokSymbol' structure
typedef struct {
qBlock header;// next struct in the hashchain
char * name;// pointer to name
TokClass tokclass;
uint32 prehash;// the prehash
} tokSymbol;
The tokclass field says what sort of beast this is:
typedef enum {
tokClassNone,
tokClassName,
tokClassIntValue,
tokClassHexValue,
tokClassFloatValue,
tokClassPunct,
tokClassOp,
tokClassPP,
tokClassText
} TokClass;
As we parse the code, we look up symbols in the raw symbol table; whenever we find that we dont have an entry, we create a new toksymbol and insert it; the tok class comes direct from the tokeniser. We use trees of operators and toksymbols to represent expressions, and build up statements, which are queued up on a queue belonging (in the adl version) to the current instruction.
We then mechanically convert trees (etc) into raw triples. These are held on a raw triple queue (per instruction); this is generally done with no typechecking nor semantic checking.
After we’ve got a complete queue of raw triples, it’s time to convert them to refine triples, which are essentially the same form but whose operands are symbols in the hierarchical symtab universe. The check and convert step is pretty simple - we have exactly one triple to look at a time, and we know the rules for operands and operators.
When a raw triple is an op_declare we create a new full-blown symbol of the same name, and insert it into the current symtab in the hierarchical symtab queue (we also create an op_declare triple for the symbol) . When a raw triple references symbols, we check they’re declared, and have the right types. We build up the refined triples on the refined triples queue of the instruction.
When we’ve done this for all instructions we can generate the model. We do this, mostly, by fprintf() ing the triples in a format acceptable to a C compiler, into the model.c and model.h files
Current status is that the instruction work seems correct; however, teq does something that sadl never did (beyond understanding the action code) - it also has sections for initialization and for instruction fetch-and-execute; where sadl did this automatically, teq requires you to specify it in the obvious manner:
// the execute-forever loop that the machine performs
operate {
uint32 ptr;
ptr <- iptr;
uint16 instruction;
instrucread(imem, instruction, ptr);
instrucReg <- instruction;
execute(instruction);// decode and execute
iptr <- ptr;
}
We’re not correctly melding the ‘compiled’ version of this into the generated model, at the moment. But it looks like more a style decision that an incomprehensible bug, so things are good.
_____________________________________________________

August Update



Well, that “it’s stabilized” turns out to be not quite true. More rip-up, but perhaps with more hope for stability because the beast is now generating code for executable models which at least looks right. More to do, but where we are now can be summarised as follows:
The teq Processing Flow
Pete Wilson v0.21 August 26 2019
Matches teq teq 0.7v2.4[dev4.1] [August 14 2019]
This document describes how the teq architecture compiler functions, concentrating on the very high level aspects.
The general structure is:
- an architecture description parser, handwritten, for the architecture resources, along with a program parser (most of which we wish to re-use for parsing teq programs) which parses the 'action' definitions - which are just teq program, with the complication that we use the <- operator to mean 'read or write an architecture element (like a register)' rather than 'send or receive a message', as it would be in teq-for-programs
The handwritten program parser works like this:
- it parses the source of the action definition
- using a token (the symbols which are read from the source) symbol table; each Symbol has classification info based on its properties as a Token (a number, a string, a name...)
- with an operator-precedence expression parser, which generates simple trees whose elements are operator plus one or two operands held in that token symbol table. Errors detected are limited to malformed expressions. When reducing an expression, the parser generates new Symbols, which are marked as temporaries. We do not know the type of a temporary at this stage; it must be calculated later. The temps are added to the token symbol table.
A Symbol now looks like this:
typedef struct {
qBlock header;// next struct in the hashchain
char * name;// pointer to name
TokClass tokclass;
uint32 prehash;// the prehash - we've done the hashing computation
// but not reduced to span the symtab size
symClass symclass;
union {
symSubClass subclass; // for when we have a class/subclass relationship
uint32 index; // for when we want simply to count things,
// as in a macro argument
} sub;
} Symbol;
- errors detected in the expression parser are parseError() parseValError()(errors detected directly in the source), (errors detected in processing, like lacking a name or an operator in internal stacks) and badError()(implementation errors, like looking for the name of an operator and finding there isn't any such name)
- throughout the program, we do a lot of dynamic memory allocation. It's not likely to fail, but the standard action if it does is to call allocError()
- the parser generates a list of Statements of various types. Statements are structures of various types. Simple statements,like an assignment statement, have a tree representing the expression parsed.Errors detected include parseErrors()s, like not seeing a '{' where expected in an if or a while.}
- complex declarations (like int32 fred, jim, bert = 0, alf;nt32 fred; int32 jim; int32 bert; bert = 0; int32 alf;) are converted to a sequence of very simple declarations and separate assignment statements (so that would be i). Statements represent the declarations.
- we represent the behaviour of code as a sequence of Triples. A Triple may be queued using the standard teq functions. It contains, most importantly, three pointers to Symbols in the raw symtab, and three pointers to Vars in the refined symtab queue. Basic parsing fills in the Symbol pointers.
A Triple now looks like this:
typedef struct {
qBlock header;
uint32 seq;
uint32 globalSeq;
tripleType type;
opSpec * op;
// the operands for the initial node-to-triple pass, sans typechecking etc
Symbol * symd0;
Symbol * syms0;
Symbol * syms1;
// the operands for the type-checkable (etc) passes
Var * d0;
Var * s0;
Var * s1;
} Triple;
- the statements have their trees or other data structures pointed at by a void pointer to an appropriate structure (several, for example in a while statement there's a controlling expression tree and a sequence of body statements)
- a statement has a single triple queue
- a first pass through the Statements and their queues of Triples creates Vars equivalent to Symbols in each Triple; Vars are added to the refined symbol table structure. The refined structure is just a queue of symbol tables; this represents the block structure of the program in the simples way: when we hit a new scope (such as an inner block) while parsing, we create and add a new symbol table to the front of the queue; on exiting the scope, we remove the scope's symbol table and attach it to something relevant (like the block statement). We provide lookup functions for both the front of queue (to check for within-scope multiple declaration errors) and up the queue (to find names in outer scopes).
- the Vars are created on encountering declaration statements, with the Symbol portion copied across from the declared Symbol. Once created, they are added to the refined symbol table.
- the Vars for temporaries are created and declared during the process of 'converting' raw triples into refined triples. We recognise a temp because its Symbol in the raw symtab is of symClasssymClassTemp . We use checkDeclareTemp()convertTriplesonQ() in the conversion function to see if the destination of the operation is a temp. If it is, we create a Var for it and fill in info marking it as a Var and that it needs its type calculated (symClassVar, symSubComputeType). We then create a new statement declaring it and add that statement in immediately prior to the current statement.
- while converting the Triples to have Vars, we look up the Var in the refined symbol structure. If the Var isn't found, we have an error (a statementError()).
- once we've built the refined Triples, we revisit it and run back down it checking for errors (type errors, operation errors,..). For each triple, we check the Vars exist and see if any are of class symClassVarsymSubComputeType, subclass . If so, we compute the type of the Var from the types of the source operand(s).
The end result is an architecture specification which has a hierarchical definition of the encoding of instructions, with data structures representing the architecture (resources and fields), with each instruction having a program representation consisting of a queue of Statements, each with their Triple queues (etc).
To create a model, we walk the specification, writing a set of .h and .c files which can be linked to pre-written C code to construct an interpreter and an assembler. For the interpreter body, we use nested case statements to implement decode, and use a print tripleAsC() function to translate statements into C.
Futures: We’re looking at centralising the walking, so it’s always done the right way, with Actions being passed to specify conversion, checking, translation, translation to C etc.
_____________________________________________________


October Update (2)



There’s been quite a bit of work to enhance teq since August. The software can now compile ordinary teq programs into a triples form (which has a textual representation we hope will be useful when writing the code generator and in avoiding a linker), and we have made a change to how symbols/variables etc are handled. We have also taken the first steps to specifying implementations.
Here’s a .pdf document with a description of the current situation.

Changes from prior version:
The most significant change is that things like variables, registers, fields etc, all of which need a Symbol to hold their name, no longer have an 'embedded' Symbol in their definition. Rather, the Symbol has one new field, a void * ptr, which points at a separate structure holding relevant information; and that structure has a pointer back to its Symbol.We provide new functions to get the separate structure from a Symbol; this checks the Symbol has the appropriate symclass and subclass.This change was done to help avoid stupid programming errors in treating one sort of structure as another, causing havoc in the symbol tables (when a smallish entity was mistakenly updates as if it were a larger one, bad things happened to memory-adjacent symbols)
_____________________________________________________