The idea for Dippy originated on hackaday.io, but this is now the main site.
A really wacky computer built using DIP switches for ROM, a few shift registers, no RAM, running a Forth-like instruction set and having a minimal transistor count.
Q: Why DIP switches for ROM? A: So that you can see all of the instructions. There will be one LED per bank of DIP switches so that you can see which one is being read. The instruction bus will also have LEDs on it so that you can see that the values on the switches have made it to the processor.
Q: Why shift registers? A: To keep the component count down - registers/RAM can easily dominate the component count. Ideally the shift registers would be physical, needing components only to read or write; more on this below. A secondary consideration is that shift registers imply a bit-serial ALU, which simplifies the ALU a lot.
Q: Why no RAM? A: The idea is that the shift registers have enough storage for really basic programs. External RAM violates the completeness of the solution and it wouldn't be visible.
Q: Why Forth-like instruction set? A: Because it is compact; I've got to squeeze in as much as possible here.
Q: Why minimal transistor count? A: Partly because of the challenge, partly because it is easier for others to replicate and build on this work, and largely because I'm terrible at soldering, so unless it's minimal it may never get finished.
10 bit instructions in a 9 bit read-only address space, built from 512 10-way DIP switches (e.g. eBay).
16 shift registers, holding at least 16 bits each, hopefully 32 bits. These form an ascending Forth data stack and a descending Forth address stack (even though addresses are 10 bits). Two 4 bit registers act as the data and address stack pointers. At four components per memory cell (it may be five), 16 registers at 16 bits would be 1024 components (512 transistors). At 32 bits per register that would be 2048 components (1024 transistors).
The goal is a reasonably inexpensive program storage where it is clear how it operates.
A simple (linear) design would take the 9 bit address bus and invert each line to give 18 signals. Any and all addresses can now be decoded using 9 diodes and one transistor per address. Each diode is wired to either the corresponding address line or its inverse, so that current flows in all cases except the required address. This then feeds a transistor whose output is high when the required address is present. An LED can be used as the load so that there is a visual indication of the address used. The output feeds all the top pins of the DIP switches via diodes, with all lower pins connected to the instruction bus. There's a huge number of DIP switches and an enormous number of diodes…
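As a rough sketch of the linear decoder described above, the following models one diode-and-transistor decoder per address in Python. The polarity is an assumption: a diode conducts when the line it taps is high, and the transistor output is high only when no diode conducts. The function names are hypothetical.

```python
ADDRESS_BITS = 9  # from the text: 9 bit address bus, inverted to give 18 signals


def decoder_wiring(address):
    """Choose which of the 18 lines each of the 9 diodes taps for one address.

    Returns (bit_index, use_inverted) pairs. A diode taps the inverted line
    where the wanted bit is 1 and the true line where it is 0, so every
    tapped line is low only when the bus carries the wanted address.
    """
    return [(bit, bool((address >> bit) & 1)) for bit in range(ADDRESS_BITS)]


def decoder_output(wiring, bus_value):
    """Return True when this decoder's transistor output is high."""
    current_flows = False
    for bit, use_inverted in wiring:
        line = (bus_value >> bit) & 1
        if use_inverted:
            line ^= 1
        if line:  # assumption: any high tapped line pulls current through its diode
            current_flows = True
    return not current_flows


def selected_addresses(bus_value):
    """List every address whose decoder fires for a given bus value."""
    return [
        address
        for address in range(2 ** ADDRESS_BITS)
        if decoder_output(decoder_wiring(address), bus_value)
    ]


def decoder_diode_count():
    """Diodes needed for address decoding alone, 9 per address."""
    return ADDRESS_BITS * 2 ** ADDRESS_BITS
```

For any bus value exactly one decoder fires, and decoding alone costs 9 diodes for each of the 512 addresses before counting the diodes to the DIP switches.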
An improved design would first get rid of all of the diodes to the instruction bus, then work on better decoding.
I've no good answer for this yet. Ideally you'd be able to see all the digits and it would have a variable clock rate, from one cycle every few seconds (so that you can see exactly how everything works) up to hundreds of kilohertz (the max switching speed of the transistors). Anyway, here are some options:
Let's start off assuming that we can use acoustics to store data:
I'd really like the registers to be 32 bits as, unusually, the component count is independent of the register size, i.e. the register bits come for free (okay, that's assuming no acoustic dissipation and even then the instruction clock does run proportionally slower). At 2m tube length, that's just over 6cm a bit. Let's assume that's six wavelengths, so one is 0.01m and with the speed of sound at 340m/s that makes 34kHz or thereabouts. This seems possible, it's in the range of acoustics I know a little bit about. In order to transport the finished machine, and to hang it on the wall, the maximum dimension has to be 2m.
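The tube arithmetic above can be checked with a few lines of Python. This is only a sketch of the same back-of-envelope numbers; the function and key names are made up for illustration, and the six wavelengths per bit is the assumption already made in the text.

```python
SPEED_OF_SOUND_M_S = 340.0  # value used in the text


def tube_numbers(length_m=2.0, bits=32, wavelengths_per_bit=6):
    """Return the per-bit length, wavelength and carrier frequency for a tube."""
    bit_length_m = length_m / bits
    wavelength_m = bit_length_m / wavelengths_per_bit
    carrier_hz = SPEED_OF_SOUND_M_S / wavelength_m
    transit_s = length_m / SPEED_OF_SOUND_M_S  # time for one pass down the tube
    return {
        "bit_length_m": bit_length_m,
        "wavelength_m": wavelength_m,
        "carrier_hz": carrier_hz,
        "transit_s": transit_s,
    }


if __name__ == "__main__":
    for name, value in tube_numbers().items():
        print(f"{name}: {value:.6g}")
```

Without rounding the wavelength to 0.01m the carrier comes out slightly under 33kHz, which is still "34kHz or thereabouts".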
I originally started off with the idea of using 40kHz ultrasonic transducers. However, after playing about a bit I find that they have less than 5kHz bandwidth. In general any piezo transducer has a resonant frequency and so limited bandwidth (ref). Whilst it's great to filter out the low frequency background noise, I need more bandwidth to fit the bits into a 2m tube.
Recently I've found that small electret microphones can have a 30kHz frequency response. The idea of building a 32 bit computer is very appealing, so this is where my effort is going at the moment. potential speakers
The clock rate is dependent on the tube length, so at 2m length that's only 170 instructions a second! It's almost a shame it's not slower, then we'd be able to see the computer working. I'll probably hack in hardware NOPs to make everything run at about 1Hz and be visible.
I'd like the computer to run for 10 hours without error. Assuming 32 bits per register and 16 registers, that's no errors in 32 * 16 * 170 * 10 * 60 * 60 = 3,133,440,000 bits, or about 3 × 10^9. So we need a bit error rate of about 1 in 10^10, which will be quite tough given that we are competing with environmental noise.
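As a check on the arithmetic, this small sketch multiplies out all six factors of the error budget. The function name is hypothetical, and it assumes every bit of every register passes through the read/write electronics once per 170Hz cycle.

```python
def bits_recirculated(bits_per_register, registers, cycles_per_second, hours):
    """Count how many bits pass through the read/write electronics in a run."""
    seconds = hours * 60 * 60
    return bits_per_register * registers * cycles_per_second * seconds


if __name__ == "__main__":
    total = bits_recirculated(32, 16, 170, 10)
    print(f"{total:,} bits must all survive")  # 3,133,440,000
```

A ten hour run therefore recirculates a little over 3 × 10^9 bits, so the tolerable bit error rate has to be well below one in that many.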
TVs used to be analog. One PAL scan line is 64 μs (https://www.youtube.com/watch?v=bsk4WWtRx6M, https://www.youtube.com/watch?v=-qerYLM-eEg) and that's 720 'pixels' - lots of bits! I haven't found a source for these yet, but if I do then that would be great.
The physics behind an analog TV delay line is interesting, 1.3 μs delay from 2,816 turns of enamelled copper wire between two conducting tubes - http://www.hawestv.com/mtv_color/delayline.htm. Not easy to construct and just not enough delay to store enough bits.
A cheap eBay laser pen claims to have a 10 mile range; that's 53 μs, about the same time delay as the PAL line. I can't imagine getting 16 of these working with line of sight, or mirrors parallel enough to get multiple reflections, but someone else may know how to (see free-space optical communication). It seems that 24 core optic fibre comes in at about $240 per km, so if the rest was built in TTL it may work (much faster than acoustic).
The start of the ionosphere is 75km up; if the bounce was clean (which it won't be) then shortwave radio would give a 0.5ms delay. It would have to be quite broadband, but spread spectrum >50MHz is licence free. The real problem is that the bounce has no chance of being clean. We can detect lasers on the moon…
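The light and radio delays quoted above are just path length divided by the speed of light. Here is a minimal sketch, assuming a one-way 10 mile laser path and a straight up-and-back bounce off a 75km ionosphere; the names are illustrative only.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
METRES_PER_MILE = 1609.344


def delay_seconds(path_m):
    """Propagation delay over a free-space path of the given length."""
    return path_m / SPEED_OF_LIGHT_M_S


# One-way laser path over the claimed 10 mile range.
laser_delay_us = delay_seconds(10 * METRES_PER_MILE) * 1e6

# Up to the ionosphere and back down again (assumed vertical bounce).
ionosphere_delay_ms = delay_seconds(2 * 75_000) * 1e3

if __name__ == "__main__":
    print(f"laser: {laser_delay_us:.1f} us, ionosphere: {ionosphere_delay_ms:.2f} ms")
```

Both delays are short compared with the acoustic tube, so storing 32 bits in them would need a far higher bit rate.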
Cassette tape: it runs at 4.75 cm/s and can store data at 300 or 1200 bps (BBC Model B). Say 64 registers at 64 bits each: that needs 4096 bits on tape, so the instruction cycle is about 14s or 3.4s. It really doesn't need much tape - only about 65cm at most. Interesting. It makes a very, very slow CPU, but if the read/write is designed for variable speed then it can be sped up by driving the motor faster. Pushed to a max speed of 2560 bps, with 16 bits in each of 16 registers, that gives 10 Hz…
Persistence of vision display of bits? Needs an instruction cycle of > 10Hz. Would be cool - can see what is going on even if stored in non-visible memory. Slow down and see bits flipping, speed up and see all memory - really cool!
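The cassette timings follow from dividing the total number of stored bits by the bit rate. A quick sketch, with hypothetical function names and the 4.75 cm/s tape speed taken from the text:

```python
TAPE_SPEED_CM_S = 4.75  # standard cassette speed quoted in the text


def cycle_seconds(registers, bits_per_register, bits_per_second):
    """Time to stream every register past the head once (one instruction cycle)."""
    return registers * bits_per_register / bits_per_second


def tape_length_cm(registers, bits_per_register, bits_per_second):
    """Tape needed to hold all registers at the standard tape speed."""
    return cycle_seconds(registers, bits_per_register, bits_per_second) * TAPE_SPEED_CM_S


if __name__ == "__main__":
    for bps in (300, 1200):
        print(bps, "bps:", round(cycle_seconds(64, 64, bps), 1), "s,",
              round(tape_length_cm(64, 64, bps), 1), "cm")
    print("16 x 16 at 2560 bps:", 1 / cycle_seconds(16, 16, 2560), "Hz")
```

The 64 by 64 case works out at roughly 13.7s and 3.4s per cycle, and the 16 by 16 case at 2560 bps gives the 10 Hz figure.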
Not well thought out, but a rotating metal drum with magnets in one of two positions might work. The idea is that it would be possible to read and write the position using electromagnets.
I have a two-transistor memory cell which can drive a LED. It should be possible to store the output on a capacitor and so chain these. The idea is that the capacitor stores the previous output and all the read select lines are pulsed at once, so moving a bit pattern one step down. This may well require an additional resistor so that the capacitors don't change state whilst the memory cells are updating.
LED strip lights (WS2812B family) are somewhat of an overkill, but at 5p a memory cell and already wired up it's very tempting, especially if a 32 bit or 64 bit processor. The 4-connector options (GND, VCC, CLK, DATA) e.g. APA102 look better than 3 (GND, VCC, DATA) as maybe DATA can be fudged to 0V (black) or VCC (white) on each CLK. However, APA102 are hard to come by, at about 17p/bit. The APA102 replacement, SK9822 is bad as it needs a signal to display the result. Alternatively, use ws2812b/ws2813 and generate two DIN signals, one for 0 and one for 1, then use transistors to switch in the appropriate signal at the start of the shift register - it's still 450Hz not 17kHz of APA102.
|0||0||JSR to 9 bit address ending in 0|
|0||1||LOAD - 8 bit immediate load|
|1||0||condition||branch relative: -16 to +15|
|1||1||condition||basic instruction (5 bit)|
If top two bits are clear, then JSR to the remaining address (even addresses only). Thus this is subroutine linked Forth but without the overhead of the JSR instruction.
If the top bit is clear and the next bit is set, load immediate the lower 8 bits (possibly sign extended - TBD).
Everything else is 3 bit conditional (1, lt0, le0, eq0, ne0, ge0, gt0, 0). Half of the space is for relative branching, of 5 bits (-16 to +15). The remaining is the basic instruction space:
If I can keep the basic instructions to 16 then I can rejig the instruction space so that JSR doesn't have to end in zero. But if I use DIP switches to decode the instructions then it would be nice to allow others to add instructions just by setting these switches. Here is full 9 bit JSR addressing:
|0||JSR to full 9 bit address|
|1||0||8 bit immediate load|
|1||1||condition||0||branch relative -8 to +7|
|1||1||condition||1||basic instruction (4 bit)|
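To compare the two layouts, here is a sketch of a decoder for each in Python. Several details are assumptions rather than anything fixed by the tables: fields are read most significant bit first, condition codes are numbered in the order they are listed, branch offsets are two's complement, and the immediate is left unsigned because sign extension is still TBD. The function names are hypothetical.

```python
# Condition codes in the order the text lists them (the numbering is an assumption).
CONDITIONS = ["1", "lt0", "le0", "eq0", "ne0", "ge0", "gt0", "0"]


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_even_jsr(word):
    """First layout: 00 = JSR (even addresses), 01 = LOAD, 10 = branch, 11 = basic."""
    if not 0 <= word < 1 << 10:
        raise ValueError("instructions are 10 bits wide")
    top2, low8 = word >> 8, word & 0xFF
    if top2 == 0b00:
        return ("JSR", low8 << 1)  # 9 bit address ending in 0
    if top2 == 0b01:
        return ("LOAD", low8)  # unsigned here; sign extension is TBD
    condition = CONDITIONS[low8 >> 5]
    if top2 == 0b10:
        return ("B", condition, sign_extend(low8, 5))  # -16 to +15
    return ("OP", condition, low8 & 0b11111)  # 5 bit basic instruction


def decode_full_jsr(word):
    """Second layout: 0 = JSR (any address), 10 = LOAD, 11 then condition, then 0 = branch or 1 = basic."""
    if not 0 <= word < 1 << 10:
        raise ValueError("instructions are 10 bits wide")
    if word >> 9 == 0:
        return ("JSR", word & 0x1FF)  # full 9 bit address
    if (word >> 8) & 1 == 0:
        return ("LOAD", word & 0xFF)
    condition = CONDITIONS[(word >> 5) & 0b111]
    if (word >> 4) & 1 == 0:
        return ("B", condition, sign_extend(word, 4))  # -8 to +7
    return ("OP", condition, word & 0b1111)  # 4 bit basic instruction
```

The trade is visible in the return values: the second layout reaches odd JSR targets at the cost of one bit of branch range and half of the basic instruction space.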
I find a C like notation very convenient:
|JSR||*R-- = P ; P = I|
|RET||P = *R++|
|LOAD(X)||*D++ = X|
|B(OFFSET)||IF condition THEN P += OFFSET|
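The four operations in the table can be modelled in a few lines of Python. This is a toy sketch and not an emulator: it assumes the data stack starts at register 0 and the address stack at register 15, that P already points at the next instruction when JSR runs, and that RET increments R before reading so that it undoes JSR's post-decrement (read literally, `*R++` would fetch from a slot JSR has not written). The class and method names are hypothetical.

```python
class Dippy:
    """Toy model of JSR, RET, LOAD and B over 16 shared registers."""

    def __init__(self):
        self.reg = [0] * 16  # the 16 shift registers
        self.d = 0           # data stack pointer, ascending (assumed start)
        self.r = 15          # address stack pointer, descending (assumed start)
        self.p = 0           # program counter

    def jsr(self, target):
        """*R-- = P ; P = I"""
        self.reg[self.r] = self.p
        self.r = (self.r - 1) % 16
        self.p = target

    def ret(self):
        """P = *R++, with the increment applied first (assumption, see above)."""
        self.r = (self.r + 1) % 16
        self.p = self.reg[self.r]

    def load(self, x):
        """*D++ = X"""
        self.reg[self.d] = x
        self.d = (self.d + 1) % 16

    def branch(self, condition, offset):
        """IF condition THEN P += OFFSET"""
        if condition:
            self.p += offset
```

For example, calling `jsr(0x40)` from `p = 5`, then `load(7)` and `ret()`, leaves `p` back at 5 with 7 in register 0, which shows the two stacks growing towards each other in the same register file.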
I've not yet written an emulator, or even fixed the instruction set, so none of this is final. Nevertheless, it's useful to write some code to see what is missing.
LOAD(0) :loop INC DUP OUT B(:loop)
Learning: DUP is a very common instruction and it may well be worth having a DUP-OUT as well as a DUP instruction. On the other hand, DUP OUT RET is only 3 or 4 words. Is 4x slower and 4x the memory worth it? Maybe it depends on what the microcode decode looks like and how much instruction space there is. AND/OR/XOR/ADD/SUB/D2R are all candidates for an extra DUP or two (e.g. DUP2ADD which is a non-destructive ADD).
IN IN ADD OUT
Learning: Input has to be buffered, that is the processor should stop if input is not yet available. Perhaps input is done with a 0/1 toggle switch and an add to buffer. Once it's full then the processor can continue. Another add-to-buffer switch which adds 8 copies may well be useful as a 32 bit input is probably all 1s or all 0s in the top bits.
It would be nice to have more than one register as output. Maybe an RPi will feed the input and store all output?
There are only a few registers and no carry bit, so this is just 32 bit by 32 bit giving a 32 bit result. With no LSR it's hard to peel off the low bits and stop when the result is zero, which is a shame as most invocations won't be full width.
Simple first pass with LSR - return stack stores accumulator, works best with last arg +ve (can test and switch):
def MUL
  LOAD(0) D2R                  # set accumulator to zero
:loop
  DUP LOAD(1) AND BZ(:skip)    # test low bit and skip hard work if not set
  DUPDUPADD R2D ADD D2R
:skip
  D2R DUP ADD R2D              # double the first argument
  LSR                          # halve the second argument
  BNZ(:loop)                   # loop if not zero
  DROP DROP                    # get rid of both arguments
  R2D                          # retrieve result
  RET                          # and exit happy

def DUPDUPADD
  OVER OVER ADD RET            # this would be much better with a non-consuming ADD
def OVER
  SWAP DUP D2R SWAP R2D RET
Learning: Really need both LSR and non-destructive ALU operations. What is a good naming convention for ALU operations that implicitly encodes the data stack changes?
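For reference, the shift-and-add algorithm that the MUL listing is aiming for looks like this in Python. It is a model of the algorithm only, not of the stack shuffling, and it assumes 32 bit registers that wrap on overflow; `mul32` is a made-up name.

```python
MASK = 0xFFFFFFFF  # assumed 32 bit registers, no carry bit


def mul32(a, b):
    """Multiply two 32 bit values, keeping only the low 32 bits of the result."""
    accumulator = 0
    while b != 0:
        if b & 1:                       # test low bit, skip the add if not set
            accumulator = (accumulator + a) & MASK
        a = (a + a) & MASK              # double the first argument (DUP ADD)
        b >>= 1                         # halve the second argument (LSR)
    return accumulator


if __name__ == "__main__":
    print(mul32(6, 7))
```

The loop ends as soon as the second argument runs out of set bits, which is why putting the smaller positive value last saves so many cycles.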
This is a major challenge as there is no LSR instruction (as this is very hard on a serial ALU).