Computer architecture has been annoyingly boring for a long time - on the whole, everybody’s doing RISC with a classical MMU and some form of cache coherence. There are honourable exceptions (the Mill is worth reading about), but in general it’s all samey-samey.

This is about to change, as designs which need lots of performance increasingly go multicore, with core counts running into the tens, hundreds, or thousands. There’s a bunch of examples:

  • Rex Computing has a (4-way VLIW) multicore solution which eschews caches and uses interrupts and ‘remote writes’ for intercore communication. 
  • Adapteva has a similar approach.
  • Kalray does too.

What these all seem to lack is a basic understanding that these things need to be programmed, and “thinking parallel” when all you have is a sequential programming language is hard. 

While there is progress being made (C++ atomics, etc.), in general current widely-used programming languages either don’t support parallelism, or do it in an under-the-counter way (such as providing a threads library or equivalent, along with locks and so forth). This generally makes writing correct, efficient, comprehensible code difficult.

An alternative approach is to notice that much “high performance computing” code is nested loops, and when you’re lucky, the iterations of the loops are independent, so that you can skip identifying the parallelism yourself and let a compiler do it for you. Or, you can use an approach like OpenCL - then you identify those loops, rewrite your code as program + kernels (where the kernels capture the loops) and arrange to run the kernels on some collection of compute resources.
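To make the "independent iterations" point concrete, here is a minimal Python sketch. It is not OpenCL: the names `kernel`, `run_sequential` and `run_parallel` are made up for illustration, and a thread pool merely stands in for "some collection of compute resources" (Python threads won't actually speed up arithmetic like this). The point is only that, once the loop body is pulled out as a kernel which touches nothing but its own index, the iterations can be handed out in any order and still give the same answer.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(i, a, b):
    # Body of one loop iteration: reads a[i] and b[i], writes nothing shared.
    return a[i] * b[i] + 1.0

def run_sequential(a, b):
    # The original loop, one iteration after another.
    return [kernel(i, a, b) for i in range(len(a))]

def run_parallel(a, b, workers=4):
    # Iterations are independent, so they can be farmed out to a pool.
    # pool.map returns results in index order regardless of execution order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: kernel(i, a, b), range(len(a))))

a = [float(i) for i in range(8)]
b = [2.0] * 8
assert run_parallel(a, b) == run_sequential(a, b)
```

If an iteration read a value written by another iteration, this rewrite would no longer be safe, which is exactly the "when you’re lucky" caveat.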

A more promising approach is to think of what the physics of this sort of parallel computing rewards and punishes. These multicore systems generally have a rectangular array of cores (each with some local memory) connected up with some form of mesh network. The network may provide one or more independent communications ‘planes’ to allow different sorts of data to proceed concurrently. For example, it can be helpful to have an ACK network separate from a data network, and if you’re moving cachelines-worth of data, it might also help to have an address network and a large data network.
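One thing a mesh plainly rewards is locality: talking to a neighbour is cheap, talking to the far corner is not. A rough way to see this is to count hops. The sketch below assumes cores are numbered row by row and that messages travel along X and then along Y (dimension-ordered routing); both are assumptions for illustration, not something any of the systems above is claimed to do, and `core_xy` and `hops` are hypothetical names.

```python
def core_xy(core_id, width):
    # Map a row-major core number to (x, y) coordinates in the mesh.
    return core_id % width, core_id // width

def hops(src, dst, width):
    # Hop count under X-then-Y routing is the Manhattan distance.
    sx, sy = core_xy(src, width)
    dx, dy = core_xy(dst, width)
    return abs(sx - dx) + abs(sy - dy)

# On a 4x4 mesh: neighbours are one hop apart, opposite corners are six.
assert hops(5, 6, 4) == 1
assert hops(0, 15, 4) == 6
```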

This sort of system is capable of delivering data directly from one core into appropriate resources of another; you’re immediately tempted to think that an architecture which allows a program to wait until some new data has arrived, and then process it, would be a natural fit. And so it seems - this style of programming is generally known as message-passing, and has the advantage that you don’t need locks or other shared-data-protecting structures. Also, by providing message queues/buffers in memory or registers, the architecture can speed up communication and synchronisation substantially while reducing power consumption at the same time.
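The "wait until some new data has arrived, and then process it" style looks roughly like this in software. This is a sketch only: a Python thread stands in for a core, and `queue.Queue` stands in for the hardware message buffers (in software the queue uses locks internally, which is part of what hardware support would avoid). The names `worker` and `run`, and the use of `None` as an end-of-stream marker, are choices made for this example.

```python
import queue
import threading

def worker(inbox, outbox):
    # Block until a message arrives, process it, and send the result on.
    while True:
        msg = inbox.get()
        if msg is None:          # end-of-stream marker: pass it on and stop
            outbox.put(None)
            return
        outbox.put(msg * msg)

def run(values):
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for v in values:
        inbox.put(v)             # "remote write" into the worker's inbox
    inbox.put(None)
    results = []
    while True:
        r = outbox.get()         # wait for the next reply
        if r is None:
            break
        results.append(r)
    t.join()
    return results

assert run([1, 2, 3]) == [1, 4, 9]
```

Note that neither `worker` nor `run` shares any data structure other than the queues, so the program itself contains no locks.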

But to include these features as an ordinary part of user mode program support you really want some new architecture. And the architecture tradeoffs for a very high core count system which leverages hardware message passing can be quite different from current mainstream practice.

And so there’s a new day dawning in processor/system architecture. Now all we need is a toolkit to go explore this new world.

As a first step in providing such a toolkit, we’re initiating a series of articles and software tools. The initial stuff provides an existence proof of how to define an architecture, and generate an executable model and an assembler automatically from that specification. The capabilities of this initial tool are exceedingly limited - despite the conversation above, it’s single processor only, for example. But most architects don’t use toolkits like this - they seem to work on squared paper or in text editors, and write plain text documents which seek to describe the architecture, and rely on tool makers to understand what’s written in the way they meant when constructing implementations, verification models and software tools. This inevitably leads to tears, so showing to a wider audience that this stuff needn’t be black magic seems a useful goal.
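To give a flavour of what "generate an executable model and an assembler from a specification" can mean, here is a deliberately tiny Python sketch. It is not simpleADL and says nothing about its format: the `SPEC` table, the three instructions, the `op rd, a, b` syntax and the four-register machine are all invented for this example. The point is just that one table drives both tools, so they cannot disagree about what an instruction is.

```python
# Hypothetical toy specification: mnemonic -> opcode and semantics.
# Every instruction has the form "op rd, a, b"; the semantics return rd's new value.
SPEC = {
    "addi": {"opcode": 0, "sem": lambda r, a, b: r[a] + b},     # rd = r[a] + imm
    "add":  {"opcode": 1, "sem": lambda r, a, b: r[a] + r[b]},  # rd = r[a] + r[b]
    "mul":  {"opcode": 2, "sem": lambda r, a, b: r[a] * r[b]},  # rd = r[a] * r[b]
}

def make_assembler(spec):
    # Build an assembler that turns source text into (opcode, rd, a, b) tuples.
    def assemble(source):
        program = []
        for line in source.strip().splitlines():
            mnemonic, operands = line.split(None, 1)
            rd, a, b = (int(tok.strip().lstrip("r")) for tok in operands.split(","))
            program.append((spec[mnemonic]["opcode"], rd, a, b))
        return program
    return assemble

def make_model(spec, nregs=4):
    # Build an executable model that interprets the assembled tuples.
    by_opcode = {entry["opcode"]: entry["sem"] for entry in spec.values()}
    def run(program):
        regs = [0] * nregs
        for opcode, rd, a, b in program:
            regs[rd] = by_opcode[opcode](regs, a, b)
        return regs
    return run

assemble = make_assembler(SPEC)
run = make_model(SPEC)
program = assemble("""
addi r1, r0, 3
addi r2, r0, 4
mul  r3, r1, r2
""")
assert run(program) == [0, 3, 4, 12]
```

Adding an instruction means adding one line to `SPEC`; the assembler and the model both pick it up without being touched.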

Look for the simpleADL toolkit to appear here quite soon.

© kiva design groupe • 2017