The Blinking Computer

The Blinking Computer is an easy-to-build educational computer. As the name suggests, the idea is that you can write code and see it execute at every level, which means discrete components and lots of LEDs. The design has evolved many times since my Prof forbade me from cannibalising the old departmental Strowger switch telephone exchange in the late 1980s. Right now there are 32 words of DRAM and the program memory is a tape loop, just like Colossus.

It's not built yet (and it may never be), but this is the plan:

  • It's an 8 bit processor with only 32 bytes RAM but loads of program memory
  • There are no registers, only a single bank of data memory (the program memory is completely separate)
  • All instructions are of the form:
    • cnd Mi = Mj OP Mk
      • i, j, k are 6 bit integers
      • cnd indicates that all instructions are conditionally executed
    • The bottom 5 bits (M0 to M31) are DRAM
      • Probably this will be based on Homebrew Dynamic RAM but with a single bipolar transistor as the gate. The LEDs are optional, the reference design has them to show the state of everything.
      • It may be based on Yann Guidon's single capacitor, two diode DRAM, especially if the read diode could be an LED
      • It's possible/probable that there will be a DRAM refresh circuit. Maybe every read will force a write. Maybe random values will be read forcing a refresh of all.
      • Maybe the LED will only light during read. Maybe this will be smoothed by phosphorescence.
      • Core memory results in minimal components, but it's not easily visible (magnetic viewing film isn't sensitive enough)
    • The next 4 bits are constants 0 to 15 implemented by wiring the address to the data lines
    • The next 12 values are interesting constants, probably some will be hard wired, some DIP switch, some toggle switch for program input
    • M60/M61 form stack pointer SP0 and M62/M63 form SP1. Both are 5 bits wide
      • Reading/writing M60 or M62 works as any other memory location (except the bit width)
      • Reading/writing M61 or M63 results in the contents of M60/M62 being placed on the address bus, so the memory accessed is in the DRAM range of M0 to M31
    • Writing to M32 results in a JUMP to the label specified
    • Writing to M33 writes to the clock, including a HALT
    • There is no program counter! Instead the tape loop has labels.
      • If a JUMP occurs the tape spins until the desired label is read.
        • Not all instructions have a label so there are more instructions than the data width
        • This allows for duplicate labels!
    • One of the 'ALU' ops is to bypass the ALU and construct the result from the bits of i and j, so implementing a load immediate instruction.
    • There's a hack for two ROM registers to enable indirects - this needs clarification when I remember it…
  • The instruction rate is intentionally slow. 1Hz is maybe typical, everything should be visible. There will be a single step button that is active after HALT so that everything can be as slow as needed.
  • There will be a six phase clock:
    • Phase 0: read the instruction from the tape and decode
    • Phase 1: read Mj to a latch
    • Phase 2: read Mk to a latch
    • Phase 3: run the ALU
    • Phase 4: write Mi
    • Phase 5: wait for tape to spin - this is the rate limiting step
  • There are several status flags:
    • the carry bit
    • whether the last ALU result was zero
    • whether the last ALU result had the top bit set (negative)
    • ASL is hardwired in parallel, the rest are via a 1-bit serial ALU
    • There is a carry flag, ADD/SUB ignore it, ADDC/SUBC use it
  • Conditional execution is one of: never, always, ==0, !=0, <0, <=0, >=0, >0
  • The construction cost and time are very important. Ideally the bill of materials should come in at under £100 so that schools can buy and build it.
    • It's not a minimal processor, it should be understandable with school level electronics and be able to run complex programs (slowly!)
    • I expect that laminated A4 will work (print A4, laminate, punch holes, spaghetti wire through holes)
  • Input is likely to be from a simple scanner. A4 pages are taped into a loop, a motor runs until the label is reached, photodiodes/photoresistors read the instruction.
  • Output could be a DIY pen plotter: eight marker pens on solenoids/springs, A4 paper, stepper motor.
  • Some permissive FOSS licence will be adopted (e.g. Apache2/MIT) and experimentation and evolution are encouraged
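
To check that the plan hangs together, here is a minimal Python sketch of the data memory map and one `cnd Mi = Mj OP Mk` step. Everything named here (`BlinkingComputer`, `CONDITIONS`, `step`, the field names) is made up for illustration, the mapping from conditions to flags is an assumption, and only ADD and SUB are modelled, with the carry flag doubling as a borrow flag.

```python
# Hypothetical sketch only: names, flag mapping and carry-as-borrow are assumptions.
CONDITIONS = {
    "never":  lambda zero, negative: False,
    "always": lambda zero, negative: True,
    "==0":    lambda zero, negative: zero,
    "!=0":    lambda zero, negative: not zero,
    "<0":     lambda zero, negative: negative,
    "<=0":    lambda zero, negative: zero or negative,
    ">=0":    lambda zero, negative: not negative,
    ">0":     lambda zero, negative: not zero and not negative,
}


class BlinkingComputer:
    """Illustrative model of data memory M0..M63 and one instruction step."""

    def __init__(self):
        self.dram = [0] * 32      # M0..M31
        self.inputs = [0] * 12    # M48..M59: hard-wired values and switches
        self.sp = [0, 0]          # SP0 (M60) and SP1 (M62), 5 bits each
        self.zero = self.negative = self.carry = False
        self.jump_label = None    # set by a write to M32
        self.clock_value = None   # set by a write to M33

    def read(self, addr):
        if addr < 32:
            return self.dram[addr]
        if addr < 48:
            return addr - 32                  # constants 0..15
        if addr < 60:
            return self.inputs[addr - 48]
        pointer = self.sp[(addr - 60) // 2]
        # M60/M62 read the pointer itself, M61/M63 read the DRAM it points to.
        return pointer if addr % 2 == 0 else self.dram[pointer]

    def write(self, addr, value):
        if addr < 32:
            self.dram[addr] = value & 0xFF
        elif addr == 32:
            self.jump_label = value           # JUMP: the tape spins to this label
        elif addr == 33:
            self.clock_value = value          # clock control, including HALT
        elif addr >= 60:
            which = (addr - 60) // 2
            if addr % 2 == 0:
                self.sp[which] = value & 0x1F
            else:
                self.dram[self.sp[which]] = value & 0xFF
        # Writes to the remaining constant locations are ignored in this sketch.

    def step(self, cnd, i, j, op, k):
        """Run 'cnd Mi = Mj OP Mk'; returns True if the instruction executed."""
        if op not in ("ADD", "SUB"):
            raise ValueError(op)
        if not CONDITIONS[cnd](self.zero, self.negative):
            return False
        a, b = self.read(j), self.read(k)     # phases 1 and 2
        raw = a + b if op == "ADD" else a - b # phase 3
        result = raw & 0xFF
        self.carry = raw > 0xFF or raw < 0
        self.zero = result == 0
        self.negative = bool(result & 0x80)
        self.write(i, result)                 # phase 4
        return True
```

For example, `step("always", 0, 35, "ADD", 36)` adds the constants 3 and 4 (M35 and M36) and stores 7 in M0, and a following `step("==0", ...)` is skipped because the zero flag is clear.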

Next is to build it in PyRTL and write some code to see if it works well enough.


SIMPL is a Simple IMPerative Language. It's deliberately simple, the aim is to show how languages and compilation work in a very easily digestible way.

Statements are of the form: `a = function(b c)` where a, b and c are variables. b and/or c can of course be other function calls, so 2x+1 is `add(mul(x 2) 1)`
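
As a rough illustration of how such nested calls can be evaluated, here is a small Python sketch of a recursive evaluator for the `function(b c)` form. The names `tokenize`, `evaluate`, `run` and `FUNCTIONS` are invented for this example, only unsigned integer literals are handled, and only `add` and `mul` are provided; it is not a SIMPL implementation.

```python
import re

# Hypothetical helper names; only names, integers and brackets are tokenised.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[()])")
FUNCTIONS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def tokenize(text):
    """Split 'add(mul(x 2) 1)' into ['add', '(', 'mul', '(', 'x', '2', ')', '1', ')']."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(tokens, variables):
    """Consume one expression from the front of the token list and return its value."""
    token = tokens.pop(0)
    if token.isdigit():
        return int(token)
    if tokens and tokens[0] == "(":
        tokens.pop(0)                      # drop '('
        args = []
        while tokens[0] != ")":
            args.append(evaluate(tokens, variables))
        tokens.pop(0)                      # drop ')'
        return FUNCTIONS[token](*args)
    return variables[token]                # a plain variable reference


def run(expression, variables):
    return evaluate(tokenize(expression), variables)


print(run("add(mul(x 2) 1)", {"x": 20}))
```

Arguments are separated by whitespace rather than commas, so the evaluator simply keeps reading expressions until it meets the closing bracket.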

Variable types follow:

Type       Size/bytes  Syntax              Example
boolean    1           True|False          True
byte       1           \d+[U]              42
short      2           \d+[U]S             65535US
long       4           \d+[U]L             -2147483647L
character  1           '\w'                'a'
string     1+          "\w*"               "Hello World!"
float      3           \d+.\d*[e[-]\d+]    3.1415926535

Indentation defines a code block, which allows us to define functions:

def mul2(x)
  return mul(x 2)

and also to implement control flow:

if eq(answer 42)
  print("The Answer is 42")
else
  print("The Answer is not 42")

a = 0
while lt(a 42)
  a = add(a 1)


BELOW here is old, I need to rewrite to include the above, link in all the archive stuff below and link in the other old stuff at



A really wacky computer built using DIP switches for ROM, a few shift registers, no RAM, running a Forth-like instruction set and having a minimal transistor count.

Q: Why DIP switches for ROM? A: So that you can visibly see all of the instructions. There will be one LED per bank of DIP switches so that you can see which one is being read and the instruction bus will also have LEDs on it so that you can see that the values on the switch have made it to the processor.

Q: Why shift registers? A: To keep the component count down - registers/RAM and ALU are most of the count for a PISC 16 bit processor. Ideally the shift registers would be physical, needing components only to read or write, more on this below.

Q: Why no RAM? A: The idea is that the shift registers have enough storage to run really basic programs. External RAM violates the completeness of the solution and it wouldn't be visible.

Q: Why Forth-like instruction set? A: Because it is compact, I've got to squeeze in as much as possible here.

Q: Why minimal transistor count? A: Partly because of the challenge, partly because it is easier for others to replicate and build on this work, but mostly because I'm terrible at soldering so unless it's minimal it will never get finished.

Rough specs

10 bit instructions in a 9 bit read-only address space built from 512 10-way dip switches (e.g. eBay).

16 shift registers, holding at least 16 bits each, hopefully 32 bits. These form an ascending Forth data stack and a descending Forth address stack (even though addresses are 10 bits). Two 4 bit registers act as the data and address stack pointers. If not physical shift registers but four components per memory cell (maybe 5), then 16 registers at 16 bits would be 1024 components (512 transistors). At 32 bits per register that would be 2048 components (1024 transistors).

DIP switch ROM

The goal is a reasonably inexpensive program storage where it is clear how it operates.

A simple (linear) design would take the 9 bit address bus and invert each line to give 18 signals. Any and all addresses can now be decoded using 9 diodes and one transistor per address. Each diode is wired to either the corresponding address line or its inverse, so that current flows in all cases except the required address. This then feeds a transistor whose output is high when the required address is present. An LED can be used as the load so that there is a visual indication of the address used. The output feeds all the top pins of the DIP switches via diodes, with all lower pins connected to the instruction bus. There's a huge number of DIP switches and an enormous number of diodes…
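
The decoding rule can be checked with a short Python model. This is a logic-level sketch only (no voltages or diode drops), and the names `diode_taps` and `row_selected` are invented for the example: each row has one diode per address bit, wired to whichever of the true or inverted line is low when the row's own address is on the bus.

```python
ADDRESS_BITS = 9
ROWS = 1 << ADDRESS_BITS        # 512 banks of DIP switches
INSTRUCTION_BITS = 10


def diode_taps(target):
    """For each address bit, True means the diode goes to the inverted line."""
    return [bool((target >> bit) & 1) for bit in range(ADDRESS_BITS)]


def row_selected(taps, address):
    """A row is selected only when none of its nine diodes conducts."""
    for bit, use_inverse in enumerate(taps):
        line = (address >> bit) & 1
        signal = 1 - line if use_inverse else line
        if signal:
            return False        # a conducting diode holds the transistor off
    return True


# Rough component counts for the linear design (decode side and switch side).
decode_diodes = ROWS * ADDRESS_BITS
switch_diodes = ROWS * INSTRUCTION_BITS
print(decode_diodes, "decode diodes,", ROWS, "transistors,", switch_diodes, "switch diodes")
```

With these assumptions the linear design needs 4608 decode diodes and 512 transistors, plus 5120 diodes between the DIP switches and the instruction bus.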

An improved design would first get rid of all of the diodes to the instruction bus, then work on better decoding.

Shift Registers

I'd really like the registers to be 32 bits as, unusually, with shift registers the component count is independent of the register size, i.e. the register bits come for free (okay, that's assuming no acoustic dissipation and even then the instruction clock does run proportionally slower)

I'd also like to see all the bits. This is difficult with most hardware shift registers. One solution might be to take the input and power a scanning laser; persistence of vision may show all the bits.

Also ideally it would have a variable clock rate between one cycle every few seconds (so that you can see exactly how everything works) up to hundreds of kilohertz (the max switching speed of the transistors). Anyway, here are some options:

Acoustic tube

A 2m tube holding 32 bits would have to store one bit every 6.25cm, so just over 6cm a bit. Let's assume that's six wavelengths, so one is 0.01m and with the speed of sound at 340m/s that makes 34kHz or thereabouts. This seems just about possible, it's in the range of acoustics I know a little bit about. In order to transport the finished machine, and to hang it on the wall, the maximum dimension has to be 2m.

I originally started off with the idea of using 40kHz ultrasonic transducers. However, after playing about a bit I find that they have less than 5kHz bandwidth. In general any piezo transducer has a resonant frequency and so limited bandwidth (ref). Whilst it's great to filter out the low frequency background noise, I need more bandwidth to fit the bits into a 2m tube.

Recently I've found that small electret microphones can have a 30kHz frequency response. The idea of building a 32 bit computer is very appealing, so this is where my effort is going at the moment. potential speakers

The clock rate is dependent on the tube length, so at 2m length that's only 170 instructions a second! It's almost a shame it's not slower, then we'd be able to see the computer working. I'll probably hack in hardware NOPs to make everything run at about 1Hz and be visible.
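
The tube arithmetic can be reproduced in a few lines of Python. The function name `acoustic_delay_line` and its parameters are made up for this sketch, and the six wavelengths per bit is the same guess as in the text, so the carrier frequency is only a ballpark figure.

```python
def acoustic_delay_line(length_m=2.0, bits=32, wavelengths_per_bit=6, speed_of_sound=340.0):
    """Return rough figures for a tube used as a recirculating shift register."""
    bit_spacing_m = length_m / bits
    wavelength_m = bit_spacing_m / wavelengths_per_bit
    return {
        "bit_spacing_m": bit_spacing_m,
        "carrier_hz": speed_of_sound / wavelength_m,
        # One full word passes through the tube per transit time.
        "words_per_second": speed_of_sound / length_m,
    }


for name, value in acoustic_delay_line().items():
    print(f"{name}: {value:.4g}")
```

Without the rounding to 6cm and 0.01m the carrier comes out a little under 33kHz, and the word rate is 170 per second for a 2m tube.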

PAL 64μs delay line

TVs used to be analog and used ultrasonic delay lines (YouTube: Inside a PAL Delay Line, Delay line memory, Glass Delay Lines Part 2). Let's assume that we could store 32 bits in the 64μs delay (use ~4MHz carrier), then that's an 'instruction' rate of about 16kHz, which is acceptable. A bit-sliced ALU would have to run 32 times faster than that, or 500kHz, which is pushing my skills.
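
A quick Python check of these rates, as a sketch under the same assumptions (32 bits per 64μs transit and a 4MHz carrier); the variable names are invented for the example.

```python
DELAY_S = 64e-6       # PAL line delay
BITS = 32             # assumed bits stored per transit
CARRIER_HZ = 4e6      # assumed carrier frequency

word_rate_hz = 1 / DELAY_S                  # one register word per transit
bit_rate_hz = BITS / DELAY_S                # rate a 1-bit serial ALU must keep up with
carrier_cycles_per_bit = CARRIER_HZ / bit_rate_hz

print(f"word rate {word_rate_hz:.0f} Hz, bit rate {bit_rate_hz:.0f} Hz, "
      f"{carrier_cycles_per_bit:.0f} carrier cycles per bit")
```

That leaves about eight carrier cycles to represent each bit.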

Massive laser

A cheap eBay laser pen claims to have a 10 mile range, that's 53μs, about the same time delay as the PAL line. I can't imagine getting 16 of these working with line of sight or mirrors parallel enough to get multiple reflections, but someone else may know how to (see free-space optical communication). It seems that 24 core optic fibre comes in at about $240 per km, so if the rest was built in TTL it may work (much faster than acoustic).

Radio bounce

The start of the ionosphere is 75km up, so if the bounce was clean (which it won't be) then shortwave radio would give a 0.5ms delay. It would have to be quite broadband, but spread spectrum >50MHz is licence free. The real problem is the bounce has no chance of being clean. We can detect lasers on the moon and used to bounce communications off the moon before we had satellites. The round trip is about 2.6s, so the clock speed would be low. It also needs GHz or very high power, which runs into radio spectrum licensing issues.
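
The light-speed delays for the laser, ionosphere and moon options all come from distance divided by the speed of light. A small Python sketch, where the helper name `delay_seconds` is invented and the 384,400km mean Earth-Moon distance is a figure added here rather than taken from the text:

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s
METRES_PER_MILE = 1609.344


def delay_seconds(path_m):
    """Time for light or radio to cover the given path length."""
    return path_m / SPEED_OF_LIGHT


laser_pen = delay_seconds(10 * METRES_PER_MILE)    # one-way, 10 miles
ionosphere = delay_seconds(2 * 75_000)             # up and back down
moon = delay_seconds(2 * 384_400_000)              # there and back

print(f"laser {laser_pen * 1e6:.1f} us, ionosphere {ionosphere * 1e3:.2f} ms, moon {moon:.2f} s")
```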

Cassette tape

Cassette tape runs at 4.76 cm/s and can store at 300 or 1200 bps (BBC model B). Say 64 registers at 64 bits each, so we need 4096 bits on tape and the instruction cycle is about 14s or 3.4s. We really don't need much tape, only about 65cm max. Interesting. A very, very slow CPU, but with a variable speed read/write design it can be sped up by driving the motor faster. Push to max speed, 2560 bps and 16 bits in 16 registers, then 10 Hz…
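
The cassette figures are rough, so here is a Python sketch that recomputes them. The function name `tape_loop` is invented, and the tape speed is taken as the standard 1 7/8 inches per second; the bit rates and register sizes are the ones from the paragraph above.

```python
TAPE_SPEED_CM_S = 1.875 * 2.54   # 1 7/8 inches per second, about 4.76 cm/s


def tape_loop(registers, bits_per_register, bits_per_second):
    """Return (cycle time in seconds, tape length in cm) for one pass over all registers."""
    total_bits = registers * bits_per_register
    cycle_s = total_bits / bits_per_second
    return cycle_s, cycle_s * TAPE_SPEED_CM_S


for bps in (300, 1200):
    cycle_s, length_cm = tape_loop(64, 64, bps)
    print(f"{bps} bps: {cycle_s:.1f} s per cycle, {length_cm:.0f} cm of tape")

# The tape length for the fast case assumes the motor still runs at standard speed.
cycle_s, _ = tape_loop(16, 16, 2560)
print(f"2560 bps, 16 x 16 bits: {1 / cycle_s:.0f} Hz")
```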

Falling water drops

A solenoid can release/retain a water drop, the presence of which can be detected (say) 1m below. It turns out that timing water drops is really difficult.

Real bubble memory

2m of polycarbonate tube with internal diameter of 7mm may support a 3mm bubble. If we leave about four bubble diameters so they don't merge (just a guess), then we may fit 128 bits into one tube. Very visual, and I want visual memory if at all possible. As solenoid valves are used for the air bubbles this tilts in the direction of a relay computer…

Phosphorescent tape loop

Phosphorescence using Strontium aluminate. One large circle with (say) 1025+-1 dots of phosphorescent paint (or maybe 32 circles of 33+1 dots so that all registers can be seen). These are read, erased and written using light. A red LED erases, the difference of light levels before and after erasing say whether the dot was charged or not. On the next clock cycle the dot is written again. Green is the strongest, red LED erases. This has the huge advantage that the memory is visible and the clock speed is independent of the memory - phosphorescence can last hours and we only need minutes. If the disk is static and the read/write head rotates then the memory will be visible. Holes can be punched in the disk radially to the dots to give timings, then the clock is derived from the rotor speed. Alternatively use a tape loop, that's a lot like Colossus.
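
The read-by-erasing cycle can be modelled in a few lines of Python. This is a toy sketch with invented names (`PhosphorLoop`, `visit`, `rotate`) and made-up decay and threshold values; it also rewrites a dot immediately after reading it rather than on the next clock cycle.

```python
class PhosphorLoop:
    """Toy model: each dot holds a brightness that fades between visits."""

    def __init__(self, bits, decay=0.9, threshold=0.1):
        self.level = [1.0 if bit else 0.0 for bit in bits]
        self.decay = decay            # fraction of glow left after one rotation (assumed)
        self.threshold = threshold    # minimum drop in light that counts as a 1 (assumed)

    def visit(self, index):
        """Read one dot by erasing it, then recharge it if it held a 1."""
        before = self.level[index]
        self.level[index] = 0.0                       # red LED erases
        bit = (before - self.level[index]) > self.threshold
        if bit:
            self.level[index] = 1.0                   # write the dot again
        return bit

    def rotate(self):
        """Pass the head over every dot once, then let all dots fade a little."""
        bits = [self.visit(i) for i in range(len(self.level))]
        self.level = [level * self.decay for level in self.level]
        return bits


loop = PhosphorLoop([1, 0, 1, 1, 0])
for _ in range(50):
    stored = loop.rotate()
print(stored)
```

Because every read is followed by a rewrite, the pattern survives as long as a dot does not fade below the threshold within one rotation.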

Discrete physical options

Not well thought out, but a rotating metal drum with magnets in one of two positions might work. The idea is that it would be possible to read and write the position using electromagnets.

Discrete electronic options

I have a two-transistor memory cell which can drive an LED. It should be possible to store the output on a capacitor and so chain these. The idea is that the capacitor stores the previous output and all the read select lines are pulsed at once, so moving a bit pattern one step down. This may well require an additional resistor so that the capacitors don't change state whilst the memory cells are updating.

LED strip lights (WS2812B family or ws2812 alts) are cheating, but at 5p a memory cell and already wired up it's very tempting, especially if a 32 bit or 64 bit processor. The 4-connector options (GND, VCC, CLK, DATA) e.g. APA102 look better than 3 (GND, VCC, DATA) as maybe DATA can be fudged to 0V (black) or VCC (white) on each CLK. However, APA102 are hard to come by, at about 17p/bit. The APA102 replacement, SK9822 is bad as it needs a signal to display the result. Alternatively, use ws2812b/ws2813 and generate two DIN signals, one for 0 and one for 1, then use transistors to switch in the appropriate signal at the start of the shift register - it's still 450Hz not 17kHz of APA102.

Making shift registers visible

In an ideal world the computer would operate at very low frequencies and all state would be visible. However, many of the options above preclude this, e.g. shift registers from acoustic delay lines. Persistence of vision may work well here. The system can be halted between instructions even if delay lines have to keep circulating to keep information stored. Lasers wired to output bits and a mirror rotating using a stepper motor should project the internal state onto a display (e.g. wall) and so make everything visible.

Instruction set

b10  b9  b8-b6      b5-b1
0    0   JSR to 9 bit address ending in 0
0    1   LOAD - 8 bit immediate load
1    0   condition  branch relative: -16 to +15
1    1   condition  basic instruction (5 bit)

If top two bits are clear, then JSR to the remaining address (even addresses only). Thus this is subroutine linked Forth but without the overhead of the JSR instruction.

If next bit clear, load immediate the lower 8 bits (possibly sign extended - TBD)

Everything else is 3 bit conditional (1, lt0, le0, eq0, ne0, ge0, gt0, 0). Half of the space is for relative branching, of 5 bits (-16 to +15). The remaining is the basic instruction space:

  • IN input to top of stack
  • OUT output top of stack
  • RET return from JSR
  • NOT bitwise invert top of data stack
  • INC increment the value at the top of the data stack
  • DEC decrement the value at the top of the data stack
  • DROP decrement data pointer
  • SWAP swap top two items on data stack
  • AND/OR/XOR/ADD/SUB operate on top two items and leave one
  • D2R/R2D/R stack manipulation
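
A decoder for this first 10 bit layout can be sketched in Python as below. The function names are invented, the order of the condition codes is assumed to match the order they are listed in, and the basic instruction numbers are left undecoded because they have not been assigned yet.

```python
# Assumption: condition codes 0..7 follow the order given in the text.
CONDITIONS = ["1", "lt0", "le0", "eq0", "ne0", "ge0", "gt0", "0"]


def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(word):
    """Split a 10 bit instruction word into a tuple describing it."""
    if not 0 <= word < (1 << 10):
        raise ValueError("instruction words are 10 bits")
    top = word >> 8                        # b10 and b9
    if top == 0b00:
        return ("JSR", (word & 0xFF) << 1) # 9 bit address ending in 0
    if top == 0b01:
        return ("LOAD", word & 0xFF)       # sign extension is still TBD
    condition = CONDITIONS[(word >> 5) & 0b111]
    low = word & 0b11111
    if top == 0b10:
        return ("BRANCH", condition, sign_extend(low, 5))
    return ("BASIC", condition, low)


print(decode(0b10_011_10000))
```

The example word decodes as a branch on `eq0` with an offset of -16, the most negative relative branch.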

If I can keep the basic instructions to 16 then I can rejig the instruction space so that JSR doesn't have to end in zero. But if I use DIP switches to decode the instructions then it would be nice to allow others to add instructions just by setting these switches. Here is full 9 bit JSR addressing:

b10  b9  b8-b6      b5  b4-b1
0    JSR to full 9 bit address
1    0   8 bit immediate load
1    1   condition  0   branch relative -8 to +7
1    1   condition  1   basic instruction (4 bit)


I find a C like notation very convenient:

  • P is the program counter, P++ means increment the program counter
  • D is the data stack, D++ increments the data stack (-- decrements)
  • R is the return stack, R-- decrements the return stack (for pushing a value as it's a descending stack)
  • *P/*D/*R is the value of the data at the program/data/return counter
  • Z is a flag indicating that the value of the last ALU instruction was zero
  • N is a flag indicating that the value of the last ALU instruction was negative

INSTRUCTION  Register movement
JSR          *R-- = P ; P = I
RET          P = *R++
LOAD(X)      *D++ = X
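
The register movements can be tried out with a tiny Python model. The class name `StackMachine` is invented, both stacks are assumed to share one 16 entry register file, and RET is modelled as moving R back up before reading so that it undoes the post-decrement push in JSR; the strict C reading of the notation may differ.

```python
class StackMachine:
    """Hypothetical model: data stack grows up from 0, return stack grows down from the top."""

    def __init__(self, registers=16):
        self.mem = [0] * registers
        self.P = 0                    # program counter
        self.D = 0                    # next free data stack slot
        self.R = registers - 1        # next free return stack slot

    def load(self, x):
        """LOAD(X): *D++ = X"""
        self.mem[self.D] = x
        self.D += 1

    def jsr(self, target):
        """JSR: *R-- = P ; P = I"""
        self.mem[self.R] = self.P
        self.R -= 1
        self.P = target

    def ret(self):
        """RET: pop the return address pushed by JSR (assumed pre-increment)."""
        self.R += 1
        self.P = self.mem[self.R]


machine = StackMachine()
machine.P = 5
machine.load(42)
machine.jsr(100)
machine.ret()
print(machine.P, machine.mem[0])
```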

Make believe code

I've not yet written an emulator, or even fixed the instruction set, so none of this is final. Nevertheless, it's useful to write some code to see what is missing.

Flash some lights

LOAD(0) :loop INC DUP OUT B(:loop) 
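
Since there is no emulator yet, here is a throwaway Python interpreter for just the five words used above. It is a sketch with invented names (`assemble`, `run`), it assumes OUT consumes the top of the stack (which is why the DUP is needed), and it stops after a fixed number of outputs because the loop never ends.

```python
import re


def assemble(source):
    """Split the source into instructions and record where each :label points."""
    code, labels = [], {}
    for token in source.split():
        if token.startswith(":"):
            labels[token] = len(code)
        else:
            code.append(token)
    return code, labels


def run(source, max_outputs):
    code, labels = assemble(source)
    stack, outputs, pc = [], [], 0
    while pc < len(code) and len(outputs) < max_outputs:
        name, arg = re.fullmatch(r"(\w+)(?:\((.*)\))?", code[pc]).groups()
        pc += 1
        if name == "LOAD":
            stack.append(int(arg))
        elif name == "INC":
            stack[-1] += 1
        elif name == "DUP":
            stack.append(stack[-1])
        elif name == "OUT":
            outputs.append(stack.pop())     # assumption: OUT consumes the value
        elif name == "B":
            pc = labels[arg]                # unconditional branch
        else:
            raise ValueError(f"unknown instruction {name}")
    return outputs


print(run("LOAD(0) :loop INC DUP OUT B(:loop)", 5))
```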

Learning: DUP is a very common instruction and it may well be worth having a DUP-OUT as well as a DUP instruction. On the other hand, DUP OUT RET is only 3 or 4 words. Is 4x slower and 4x the memory worth it? Maybe it depends on what the microcode decode looks like and how much instruction space there is. AND/OR/XOR/ADD/SUB/D2R are all candidates for an extra DUP or two (e.g. DUP2ADD which is a non-destructive ADD).

Add two numbers


Learning: Input has to be buffered, that is the processor should stop if input is not yet available. Perhaps input is done with a 0/1 toggle switch and an add to buffer. Once it's full then the processor can continue. Another add-to-buffer switch which adds 8 copies may well be useful as a 32 bit input is probably all 1s or all 0s in the top bits.

It would be nice to have more than one register as output. Maybe an RPi will feed the input and store all output?


There are only a few registers and no carry bit, so this is just 32bit by 32bit giving a 32bit result. With no LSB it's hard to peel off the low bits and stop when the result is zero, which is a shame as most invocations won't be full width.

Simple first pass with LSR - return stack stores accumulator, works best with last arg +ve (can test and switch):

def MUL
LOAD(0) D2R # set accumulator to zero
DUP LOAD(1) AND BZ(:skip) # test low bit and skip hard work if not set
D2R DUP ADD R2D # double the first argument
LSR # halve the second argument
BNZ(:loop) # loop if not zero
DROP DROP # get rid of both arguments
R2D # retrieve result
RET # and exit happy


def DUPDUPADD OVER OVER ADD RET  # this would be much better with a non-consuming ADD

Learning: Really need both LSR and non-destructive ALU operations. What is a good naming convention for ALU operations that implicitly encodes the data stack changes?
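
For reference, the shift-and-add algorithm that the MUL sketch is aiming at looks like this in Python. It is a plain illustration of the technique with an invented function name, not a translation of the stack code; the comments show which stack operations each step would correspond to.

```python
MASK = 0xFFFFFFFF   # 32bit by 32bit giving a 32bit result


def mul32(a, b):
    """Multiply by adding a shifted copy of a for every set bit of b."""
    accumulator = 0
    while b != 0:                    # stops early when the remaining bits of b are zero
        if b & 1:                    # test low bit
            accumulator = (accumulator + a) & MASK
        a = (a + a) & MASK           # double the first argument (DUP ADD)
        b >>= 1                      # halve the second argument (LSR)
    return accumulator


print(mul32(6, 7))
```

The loop runs once per significant bit of the second argument, which is why it works best when that argument is small and positive.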


This is a major challenge as there is no LSR instruction (as this is very hard on a serial ALU).