The idea for Dippy originated on hackaday.io, but this is now the main site.
There is now a plan. Organise around it, move all to one site.
M0 to M15 are ROM. M0 is an easy toggle; the rest are DIP switches.
R0 to R15 are registers/RAM.
3-cycle, single address bus, single-ported RAM/ROM.
Single-step mode: push to execute one more step.
Write to ROM to control the processor: JUMP, HALT, STOP, CONT, FREQ.
Next is to build in yacc, compile to C, write some code and see if it works well enough.
A really wacky computer built using DIP switches for ROM, a few shift registers, no RAM, running a Forth-like instruction set and having a minimal transistor count.
Q: Why DIP switches for ROM? A: So that you can see all of the instructions. There will be one LED per bank of DIP switches so that you can see which one is being read, and the instruction bus will also have LEDs on it so that you can see that the values on the switches have made it to the processor.
Q: Why shift registers? A: To keep the component count down - registers/RAM and ALU are most of the count for a PISC 16 bit processor. Ideally the shift registers would be physical, needing components only to read or write, more on this below.
Q: Why no RAM? A: The idea is that the shift registers have enough storage to run really basic programs. External RAM violates the completeness of the solution and it wouldn't be visible.
Q: Why a Forth-like instruction set? A: Because it is compact, and I've got to squeeze in as much as possible here.
Q: Why minimal transistor count? A: Partly because of the challenge, partly because it makes it easier for others to replicate and build on this work, but mostly because I'm terrible at soldering, so unless it's minimal it will never get finished.
10-bit instructions in a 9-bit read-only address space built from 512 10-way DIP switches (e.g. from eBay).
16 shift registers, holding at least 16 bits each, hopefully 32 bits. These form an ascending Forth data stack and a descending Forth address stack (even though addresses are only 9 bits). Two 4-bit registers act as the data and address stack pointers. If they are not physical shift registers but four components per memory cell (maybe 5), then 16 registers at 16 bits would be 1024 components (512 transistors). At 32 bits per register that would be 2048 components (1024 transistors).
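To double-check those component counts, here is the arithmetic as a small Python function. It assumes 4 components per cell, 2 of which are transistors, which is what the 1024 (512) figures imply; the `components_per_cell` parameter covers the pessimistic 5-component case.

```python
# Storage budget for registers built from discrete memory cells rather than
# physical shift registers. Assumed from the figures above: 4 components per
# cell, 2 of which are transistors.
COMPONENTS_PER_CELL = 4
TRANSISTORS_PER_CELL = 2


def storage_budget(registers: int, bits: int,
                   components_per_cell: int = COMPONENTS_PER_CELL,
                   transistors_per_cell: int = TRANSISTORS_PER_CELL) -> tuple[int, int]:
    """Return (components, transistors) for `registers` registers of `bits` bits."""
    cells = registers * bits
    return cells * components_per_cell, cells * transistors_per_cell


for bits in (16, 32):
    components, transistors = storage_budget(16, bits)
    print(f"16 x {bits} bits: {components} components ({transistors} transistors)")
```

With `components_per_cell=5` the 16 x 16-bit case comes to 1280 components; I haven't pinned down the transistor count for a 5-part cell, so only the component figure is meaningful there.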
The goal is a reasonably inexpensive program storage where it is clear how it operates.
A simple (linear) design would take the 9-bit address bus and invert each line to give 18 signals. Any and all addresses can now be decoded using 9 diodes and one transistor per address. Each diode is wired to either the corresponding address line or its inverse, so that current flows in all cases except the required address. This then feeds a transistor whose output is high when the required address is present. An LED can be used as the load so that there is a visual indication of the address used. The output feeds all the top pins of the DIP switches via diodes, with all lower pins connected to the instruction bus. There's a huge number of DIP switches and an enormous number of diodes…
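The decoding rule can be checked with a small Python model of the diode matrix. It is a logic-level sketch, not a circuit simulation: each row's node is assumed to be pulled high, and a diode pulls it low whenever the signal it is wired to is low. Wiring bit i to the true line when the row's address bit is 1, and to the inverted line when it is 0, means current flows through at least one diode for every address except the row's own.

```python
ADDRESS_BITS = 9


def taps_for(row: int) -> list[bool]:
    """Diode wiring for one decoder row: entry i is True if the diode for
    bit i goes to the true address line, False if to the inverted line."""
    return [bool((row >> i) & 1) for i in range(ADDRESS_BITS)]


def row_selected(row_taps: list[bool], address: int) -> bool:
    """The row node stays high (transistor on, LED lit) only if no diode
    conducts, i.e. every signal the row is wired to is high."""
    for i, to_true_line in enumerate(row_taps):
        line = (address >> i) & 1
        signal = line if to_true_line else 1 - line
        if signal == 0:          # this diode conducts and pulls the row low
            return False
    return True


def decode(address: int) -> list[int]:
    """Every row that lights up for `address`; should be exactly one."""
    return [row for row in range(1 << ADDRESS_BITS)
            if row_selected(taps_for(row), address)]


print(decode(0b101000011))                        # [323]
print(ADDRESS_BITS * (1 << ADDRESS_BITS), "decode diodes, 512 transistors")
```

The second line makes the cost concrete: 4608 diodes just for decoding, before the ones feeding the instruction bus.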
An improved design would first get rid of all of the diodes to the instruction bus, then work on better decoding.
I'd really like the registers to be 32 bits as, unusually, with shift registers the component count is independent of the register size, i.e. the register bits come for free (okay, that's assuming no acoustic dissipation, and even then the instruction clock does run proportionally slower).
I'd also like to see all the bits. This is difficult with most hardware shift registers. One solution might be to take the input and power a scanning laser; persistence of vision may show all the bits.
Also, ideally it would have a variable clock rate between one cycle every few seconds (so that you can see exactly how everything works) up to hundreds of kilohertz (the max switching speed of the transistors). Anyway, here are some options:
In order to transport the finished machine, and to hang it on the wall, the maximum dimension has to be 2m. A 2m tube holding a 32-bit register would have to store one bit every 2m/32, which is just over 6cm a bit. Let's assume that's six wavelengths, so one wavelength is about 0.01m, and with the speed of sound at 340m/s that makes 34kHz or thereabouts. This seems just about possible; it's in the range of acoustics I know a little bit about.
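The delay-line arithmetic, written out in Python with the same assumptions (2m tube, 32 bits, six carrier cycles per bit, 340m/s). Without rounding the wavelength to 0.01m the carrier comes out nearer 33kHz than 34kHz, which doesn't change the conclusion. The transit time also gives the 170 complete passes per second that limit the instruction rate.

```python
# Back-of-envelope numbers for one acoustic delay-line register.
# Assumptions (from the text): 2 m tube, 32 bits, six carrier cycles per bit,
# speed of sound in air 340 m/s.
SPEED_OF_SOUND = 340.0   # m/s
TUBE_LENGTH = 2.0        # m
BITS = 32
CYCLES_PER_BIT = 6

bit_length = TUBE_LENGTH / BITS             # 0.0625 m: "just over 6cm a bit"
wavelength = bit_length / CYCLES_PER_BIT    # ~0.0104 m
carrier_hz = SPEED_OF_SOUND / wavelength    # ~32.6 kHz
transit_s = TUBE_LENGTH / SPEED_OF_SOUND    # ~5.9 ms for one trip down the tube
word_rate_hz = 1 / transit_s                # complete words per second
bit_rate = BITS / transit_s                 # bits per second through the tube

print(f"bit length {bit_length * 100:.2f} cm, wavelength {wavelength * 1000:.1f} mm")
print(f"carrier {carrier_hz / 1000:.1f} kHz")
print(f"word rate {word_rate_hz:.0f} Hz, bit rate {bit_rate:.0f} bit/s")
```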
I originally started off with the idea of using 40kHz ultrasonic transducers. However, after playing about a bit I find that they have less than 5kHz bandwidth. In general any piezo transducer has a resonant frequency and so limited bandwidth (ref). Whilst it's great to filter out the low frequency background noise, I need more bandwidth to fit the bits into a 2m tube.
Recently I've found that small electret microphones can have a 30kHz frequency response. The idea of building a 32-bit computer is very appealing, so this is where my effort is going at the moment (potential speakers).
The clock rate is dependent on the tube length, so at 2m length that's only 170 instructions a second! It's almost a shame it's not slower, then we'd be able to see the computer working. I'll probably hack in hardware NOPs to make everything run at about 1Hz and be visible.
TVs used to be analog and use ultrasonic delay lines (YouTube: Inside a PAL Delay Line, Delay line memory, Glass Delay Lines Part 2). Let's assume that we could store 32 bits in the 64μs delay (using a ~4MHz carrier); then that's an 'instruction' rate of about 16kHz, which is acceptable. A bit-sliced ALU would have to run 32 times faster than that, or 500kHz, which is pushing my skills.
A cheap eBay laser pen claims to have a 10-mile range; that's a 53μs delay, about the same as the PAL line. I can't imagine getting 16 of these working with line of sight or mirrors parallel enough to get multiple reflections, but someone else may know how to (Free-space_optical_communication). It seems that 24-core optic fibre comes in at about $240 per km, so if the rest was built in TTL it may work (much faster than acoustic).
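The same arithmetic for a PAL delay line, assuming one 32-bit word in the 64μs delay and one instruction per circulation. The carrier figure shows how many cycles of a ~4MHz carrier would fall in each bit.

```python
# PAL delay-line register. Assumptions (from the text): 64 us delay holding
# one 32-bit word, ~4 MHz carrier, one instruction per circulation.
DELAY_S = 64e-6
BITS = 32
CARRIER_HZ = 4e6

instruction_hz = 1 / DELAY_S                      # ~15.6 k instructions per second
bit_time_s = DELAY_S / BITS                       # 2 us per bit
serial_alu_hz = 1 / bit_time_s                    # bit-serial ALU clock
carrier_cycles_per_bit = CARRIER_HZ * bit_time_s  # carrier cycles in each bit

print(f"instruction rate {instruction_hz:.0f} Hz")
print(f"ALU clock {serial_alu_hz / 1000:.0f} kHz")
print(f"{carrier_cycles_per_bit:.0f} carrier cycles per bit")
```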
The start of the ionosphere is 75km up, if the bounce was clean (which it won't be) the Shortwave_radio would give a 0.5ms delay. It would have to be quite broadband, but Spread_spectrum >50MHz is licence free. The real problem is the bounce has no chance of being clean. We can detect lasers on the moon and used to bounce communications off the moon before we had satellites. The round trip is about 2.6s, so the clock speed would be low. It also needs GHz or very high power, which runs into Radio spectrum licencing issues.
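Light-travel delays for the laser, ionosphere and Moon options, as a sanity check on the figures above. The Earth-Moon distance is the mean value of about 384,400km, and the ionosphere figure assumes a clean straight up-and-down bounce at 75km, which is optimistic.

```python
C = 299_792_458.0      # speed of light in vacuum, m/s
MILE = 1609.344        # metres per mile
MOON = 384_400e3       # mean Earth-Moon distance, m
IONOSPHERE = 75e3      # start of the ionosphere as quoted above, m


def delay_s(path_m: float) -> float:
    """Time for light or radio to cover a free-space path."""
    return path_m / C


laser_pen = delay_s(10 * MILE)          # one way
ionosphere = delay_s(2 * IONOSPHERE)    # up and back
moon = delay_s(2 * MOON)                # round trip

print(f"laser pen  {laser_pen * 1e6:.1f} us")
print(f"ionosphere {ionosphere * 1e3:.2f} ms")
print(f"moon       {moon:.2f} s")
```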
Cassette tape runs at 4.75 cm/s and can store at 300 or 1200 bps (BBC Model B). Say 64 registers at 64 bits each, so 4096 bits are needed on tape and the instruction cycle is about 14s or 3.5s. Really don't need much tape: only about 65cm max. Interesting. Very, very slow CPU; if the read/write is designed for variable speed, then it can be sped up by driving the motor faster. Push to max speed, 2560 bps, and 16 bits in 16 registers gives 10 Hz…
2m of polycarbonate tube with an internal diameter of 7mm may support a 3mm bubble. If we leave about four bubble diameters so they don't merge (just a guess), then we may fit 128 bits into one tube. Very visual - I want visual memory if at all possible, https://hackaday.com/2019/12/20/tiny-bubbles-in-the-clock/. As solenoid valves are used for the air bubbles, this tilts in the direction of a relay computer…
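The cassette numbers, assuming one instruction per full pass of the tape loop. Loop length is only computed at the nominal 4.75cm/s, since speeding up the motor raises the bit rate without changing how much tape the bits occupy.

```python
TAPE_SPEED_CM_S = 4.75   # nominal cassette speed quoted above


def pass_time_s(total_bits: int, bps: float) -> float:
    """Seconds for the whole register file to go past the head once."""
    return total_bits / bps


def loop_length_cm(total_bits: int, bps_at_nominal_speed: float) -> float:
    """Tape needed to hold the bits when recorded at the nominal speed."""
    return total_bits / bps_at_nominal_speed * TAPE_SPEED_CM_S


for bps in (300, 1200):
    t = pass_time_s(64 * 64, bps)
    print(f"64 x 64 bits at {bps} bps: {t:.2f} s per instruction, "
          f"{loop_length_cm(64 * 64, bps):.1f} cm of tape")

fast = pass_time_s(16 * 16, 2560)
print(f"16 x 16 bits at 2560 bps: {1 / fast:.0f} Hz")
```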
Phosphorescence using strontium aluminate. One large circle with (say) 1025±1 dots of phosphorescent paint (or maybe 32 circles of 33+1 dots so that all registers can be seen). These are read, erased and written using light: a red LED erases, and the difference in light level before and after erasing says whether the dot was charged or not. On the next clock cycle the dot is written again. Green is the strongest. This has the huge advantage that the memory is visible and the clock speed is independent of the memory: phosphorescence can last hours and we only need minutes. If the disk is static and the read/write head rotates, then the memory will be visible. Holes can be punched in the disk radially to the dots to give timings, so the clock is derived from the rotor speed. Alternatively, use a tape loop, which is a lot like Colossus.
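A logic-level sketch of one revolution of the read/write head over a ring of phosphorescent dots. The glow levels, threshold, ambient light and fade between visits are all made-up numbers. The point is that taking the difference before and after the red-LED erase cancels the constant background, and that rewriting each 1 on every pass refreshes it before it fades. For simplicity the sketch rewrites in the same step rather than on the next clock cycle.

```python
THRESHOLD = 0.2   # assumed: minimum glow drop that counts as a stored 1
AMBIENT = 0.05    # assumed: constant background light reaching the sensor


def revolve(ring: list[float], fade: float = 0.9) -> list[int]:
    """One pass of the head over every dot, in order. For each dot: sense,
    erase with the red LED, sense again; a large enough drop means the dot
    was charged. A charged dot is then rewritten at full strength."""
    bits = []
    for i, glow in enumerate(ring):
        glow *= fade                      # dot has faded since the last visit
        before = glow + AMBIENT           # sensor reading with the dot glowing
        after = AMBIENT                   # reading once the red LED has erased it
        bit = int(before - after > THRESHOLD)
        ring[i] = 1.0 if bit else 0.0     # rewrite: recharge a 1, leave a 0 dark
        bits.append(bit)
    return bits


ring = [1.0, 0.0, 1.0, 1.0, 0.0]
for _ in range(100):                      # the pattern survives repeated passes
    bits = revolve(ring)
print(bits)                               # [1, 0, 1, 1, 0]
```

If the dots fade too far between visits (try `fade=0.1`), the drop falls below the threshold and the bit is lost, which is the same refresh-rate trade-off as DRAM.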
Not well thought out, but a rotating metal drum with magnets in one of two positions might work. The idea is that it would be possible to read and write the position using electromagnets.
I have a two-transistor memory cell which can drive a LED. It should be possible to store the output on a capacitor and so chain these. The idea is that the capacitor stores the previous output and all the read select lines are pulsed at once, so moving a bit pattern one step down. This may well require an additional resistor so that the capacitors don't change state whilst the memory cells are updating.
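A behavioural sketch (not a circuit model) of why the capacitors matter. If each cell copies its neighbour while that neighbour may already have changed, the new bit smears down the whole chain. If every output is first held on a capacitor and all cells then update from the held values at once, the pattern moves exactly one step.

```python
def shift_without_hold(cells: list[int], new_bit: int) -> list[int]:
    """Cells update in order from the input end, each reading its neighbour's
    *current* output, which has already been overwritten."""
    cells = list(cells)
    cells[0] = new_bit
    for i in range(1, len(cells)):
        cells[i] = cells[i - 1]         # neighbour already holds its new value
    return cells


def shift_with_hold(cells: list[int], new_bit: int) -> list[int]:
    """Phase 1: each capacitor samples its cell's output.
    Phase 2: all read select lines are pulsed at once from the held values."""
    held = list(cells)
    return [new_bit] + held[:-1]


print(shift_without_hold([1, 0, 0, 0], 0))   # [0, 0, 0, 0]: the 1 is lost
print(shift_with_hold([1, 0, 0, 0], 0))      # [0, 1, 0, 0]
```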
LED strip lights (WS2812B family or WS2812 alternatives) are cheating, but at 5p a memory cell and already wired up they're very tempting, especially for a 32-bit or 64-bit processor. The 4-connector options (GND, VCC, CLK, DATA), e.g. APA102, look better than the 3-connector ones (GND, VCC, DATA), as maybe DATA can be fudged to 0V (black) or VCC (white) on each CLK. However, APA102s are hard to come by, at about 17p/bit. The APA102 replacement, SK9822, is bad as it needs a signal to display the result. Alternatively, use WS2812B/WS2813 and generate two DIN signals, one for 0 and one for 1, then use transistors to switch in the appropriate signal at the start of the shift register; but that's still only 450Hz rather than the 17kHz of the APA102.
In an ideal world the computer would operate at very low frequencies and all state would be visible. However, many of the options above preclude this, e.g. shift registers made from acoustic delay lines. Persistence of vision may work well here. The system can be halted between instructions even if the delay lines have to keep circulating to keep information stored. Lasers wired to the output bits and a mirror rotated by a stepper motor should project the internal state onto a display (e.g. a wall) and so make everything visible.
| Bits 9-8 | Bits 7-0 |
|---|---|
| 00 | JSR to 9-bit address ending in 0 |
| 01 | LOAD: 8-bit immediate load |
| 10 | condition (3 bits), branch relative: -16 to +15 (5 bits) |
| 11 | condition (3 bits), basic instruction (5 bits) |
If the top two bits are clear, then JSR to the remaining address (even addresses only). Thus this is subroutine-threaded Forth, but without the overhead of the JSR instruction.
If the top bit is clear and the next bit is set, load immediate the lower 8 bits (possibly sign extended, TBD).
Everything else (top bit set) has a 3-bit condition (1, lt0, le0, eq0, ne0, ge0, gt0, 0). Half of this space is for relative branching, with a 5-bit offset (-16 to +15). The remaining half is the basic instruction space.
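A minimal Python decoder for this first encoding, to check that every field fits in 10 bits. The numbering of the condition codes (in the order listed above) and the mapping of the 8 stored JSR bits to address bits 8-1 are my assumptions.

```python
# Condition codes in the order listed in the text; the numbering is assumed.
CONDITIONS = ["1", "lt0", "le0", "eq0", "ne0", "ge0", "gt0", "0"]


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word: int) -> tuple:
    """Decode a 10-bit instruction under the first encoding (JSR ending in 0)."""
    if not 0 <= word < 1 << 10:
        raise ValueError("instructions are 10 bits")
    top2, low8 = word >> 8, word & 0xFF
    if top2 == 0b00:
        return ("JSR", low8 << 1)                  # target is always even
    if top2 == 0b01:
        return ("LOAD", low8)                      # sign extension still TBD
    cond = CONDITIONS[(word >> 5) & 0b111]
    low5 = word & 0b11111
    if top2 == 0b10:
        return ("B", cond, sign_extend(low5, 5))   # -16 .. +15
    return ("OP", cond, low5)                      # one of 32 basic instructions


print(decode(0b00_0000_0101))   # ('JSR', 10)
print(decode(0b10_000_11111))   # ('B', '1', -1)
```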
If I can keep the basic instructions to 16, then I can rejig the instruction space so that JSR doesn't have to end in zero. But if I use DIP switches to decode the instructions, then it would be nice to allow others to add instructions just by setting these switches. Here is the encoding with full 9-bit JSR addressing:
| Prefix | Remaining bits |
|---|---|
| 0 | JSR to full 9-bit address |
| 10 | 8-bit immediate load |
| 11 | condition (3 bits), 0, branch relative -8 to +7 (4 bits) |
| 11 | condition (3 bits), 1, basic instruction (4 bits) |
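The same kind of decoder for the full 9-bit JSR encoding. Again the condition-code numbering is assumed. With bit 4 choosing between branch and instruction, there are 16 basic instructions, each usable with any of the 8 conditions.

```python
# Condition codes in the order listed in the text; the numbering is assumed.
CONDITIONS = ["1", "lt0", "le0", "eq0", "ne0", "ge0", "gt0", "0"]


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word: int) -> tuple:
    """Decode a 10-bit instruction under the full 9-bit JSR encoding."""
    if not 0 <= word < 1 << 10:
        raise ValueError("instructions are 10 bits")
    if (word >> 9) == 0:
        return ("JSR", word & 0x1FF)               # any of the 512 addresses
    if ((word >> 8) & 1) == 0:
        return ("LOAD", word & 0xFF)
    cond = CONDITIONS[(word >> 5) & 0b111]
    low4 = word & 0b1111
    if ((word >> 4) & 1) == 0:
        return ("B", cond, sign_extend(low4, 4))   # -8 .. +7
    return ("OP", cond, low4)                      # one of 16 basic instructions


print(decode(0b0_111111111))    # ('JSR', 511)
print(decode(0b11_000_0_1000))  # ('B', '1', -8)
```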
I find a C-like notation very convenient:
| Instruction | Meaning |
|---|---|
| JSR | *--R = P ; P = I |
| RET | P = *R++ |
| LOAD(X) | *D++ = X |
| B(OFFSET) | IF condition THEN P += OFFSET |
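A minimal Python model of these four rows, useful for checking the stack-pointer arithmetic before writing a real emulator. Assumptions: both stacks share the 16 registers, with 4-bit pointers that wrap; JSR pre-decrements R so that RET's post-increment pop reads back the same slot; OFFSET is relative to the instruction after the branch; and conditional branches pop the value they test, Forth-style, while "1" (always) and "0" (never) leave the stack alone. The `Machine` class and the tuple instruction format are only illustrations.

```python
MASK4 = 0xF  # D and R are 4-bit pointers into the same 16 registers

# Assumed condition semantics for conditional branches (value is popped).
TESTS = {
    "lt0": lambda v: v < 0, "le0": lambda v: v <= 0, "eq0": lambda v: v == 0,
    "ne0": lambda v: v != 0, "ge0": lambda v: v >= 0, "gt0": lambda v: v > 0,
}


class Machine:
    """Hypothetical model: data stack grows up from 0, return stack down from 15."""

    def __init__(self, program: list[tuple]):
        self.mem = program           # list of (op, *args) tuples
        self.regs = [0] * 16
        self.P = self.D = self.R = 0

    def push(self, x: int) -> None:  # *D++ = x
        self.regs[self.D] = x
        self.D = (self.D + 1) & MASK4

    def pop(self) -> int:            # x = *--D
        self.D = (self.D - 1) & MASK4
        return self.regs[self.D]

    def step(self) -> None:
        op, *args = self.mem[self.P]
        self.P += 1                  # P now points at the next instruction
        if op == "JSR":              # *--R = P ; P = I
            self.R = (self.R - 1) & MASK4
            self.regs[self.R] = self.P
            self.P = args[0]
        elif op == "RET":            # P = *R++
            self.P = self.regs[self.R]
            self.R = (self.R + 1) & MASK4
        elif op == "LOAD":           # *D++ = X
            self.push(args[0])
        elif op == "B":              # IF condition THEN P += OFFSET
            cond, offset = args
            taken = cond == "1" or (cond != "0" and TESTS[cond](self.pop()))
            if taken:
                self.P += offset
        else:
            raise ValueError(f"unknown op {op}")


prog = [
    ("LOAD", 7),       # 0
    ("JSR", 4),        # 1: call the subroutine at 4
    ("LOAD", 9),       # 2
    ("B", "1", -1),    # 3: branch to itself (halt by spinning)
    ("LOAD", 8),       # 4: subroutine body
    ("RET",),          # 5
]
m = Machine(prog)
for _ in range(6):
    m.step()
print(m.regs[:3], m.P, m.D, m.R)   # [7, 8, 9] 3 3 0
```

Because both stacks live in one 16-entry file, nothing stops them colliding; a real design would want an overflow check or simply careful programs.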
I've not yet written an emulator, or even fixed the instruction set, so none of this is final. Nevertheless, it's useful to write some code to see what is missing.
LOAD(0) :loop INC DUP OUT B(:loop)
Learning: DUP is a very common instruction and it may well be worth having a DUP-OUT as well as a DUP instruction. On the other hand, DUP OUT RET is only 3 or 4 words. Is 4x slower and 4x the memory worth it? Maybe it depends on what the microcode decode looks like and how much instruction space there is. AND/OR/XOR/ADD/SUB/D2R are all candidates for an extra DUP or two (e.g. DUP2ADD, which is a non-destructive ADD).
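To try the counting loop, and the DUP-OUT idea, here is a toy assembler and interpreter in Python. INC and a combined DUPOUT are hypothetical instructions, OUT is assumed to consume the top of the stack, and branch offsets are assumed relative to the next instruction. The loop costs four instructions per output value with DUP OUT and three with DUPOUT.

```python
def assemble(source: str) -> list[tuple]:
    """Toy assembler: ':name' defines a label, B(:name) branches to it,
    LOAD(n) pushes n, anything else is a plain opcode."""
    labels, words = {}, []
    for tok in source.split():
        if tok.startswith(":"):
            labels[tok] = len(words)
        else:
            words.append(tok)
    code = []
    for addr, tok in enumerate(words):
        if tok.startswith("B("):
            # offset relative to the instruction after the branch (assumed)
            code.append(("B", labels[tok[2:-1]] - (addr + 1)))
        elif tok.startswith("LOAD("):
            code.append(("LOAD", int(tok[5:-1])))
        else:
            code.append((tok,))
    return code


def run(code: list[tuple], steps: int) -> tuple[list[int], list[int]]:
    """Execute `steps` instructions; return (values output, final data stack)."""
    stack, out, pc = [], [], 0
    for _ in range(steps):
        op, *args = code[pc]
        pc += 1
        if op == "LOAD":
            stack.append(args[0])
        elif op == "INC":
            stack.append(stack.pop() + 1)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "OUT":                 # consumes the top of stack
            out.append(stack.pop())
        elif op == "DUPOUT":              # hypothetical: output without consuming
            out.append(stack[-1])
        elif op == "B":                   # unconditional branch
            pc += args[0]
        else:
            raise ValueError(f"unknown op {op}")
    return out, stack


plain = assemble("LOAD(0) :loop INC DUP OUT B(:loop)")
fused = assemble("LOAD(0) :loop INC DUPOUT B(:loop)")
print(run(plain, 1 + 4 * 3))   # ([1, 2, 3], [3])
print(run(fused, 1 + 3 * 3))   # ([1, 2, 3], [3])
```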
IN IN ADD OUT
Learning: Input has to be buffered, that is, the processor should stop if input is not yet available. Perhaps input is done with a 0/1 toggle switch and an add-to-buffer switch. Once the buffer is full, the processor can continue. Another add-to-buffer switch which adds 8 copies may well be useful, as a 32-bit input is probably all 1s or all 0s in the top bits.
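A sketch of the buffered input idea: IN returns nothing (so the processor would stall) until the switches have filled a 32-bit buffer. The bit order (most significant first) and the `InputBuffer` class are my own illustration.

```python
class InputBuffer:
    """Hypothetical bit-serial input buffer filled from front-panel switches."""

    def __init__(self, width: int = 32):
        self.width = width
        self.bits: list[int] = []      # most significant bit entered first (assumed)

    def add_bit(self, bit: int) -> None:
        """The add-to-buffer switch: append the toggle switch's 0/1."""
        if len(self.bits) < self.width:
            self.bits.append(bit & 1)

    def add_eight(self, bit: int) -> None:
        """The suggested second switch: add 8 copies of the toggle's value."""
        for _ in range(8):
            self.add_bit(bit)

    def take(self):
        """What IN would do: return the word once full, else None (stall)."""
        if len(self.bits) < self.width:
            return None
        value = 0
        for b in self.bits:
            value = (value << 1) | b
        self.bits = []
        return value


buf = InputBuffer()
for _ in range(3):
    buf.add_eight(0)                  # top 24 bits all zero
waiting = buf.take()                  # None: IN would stall here
for bit in (0, 0, 0, 0, 0, 1, 0, 1):
    buf.add_bit(bit)
value = buf.take()                    # 5
print(waiting, value)
```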
It would be nice to have more than one register as output. Maybe an RPi will feed the input and store all output?
There are only a few registers and no carry bit, so this is just 32-bit by 32-bit giving a 32-bit result. With no LSR it's hard to peel off the low bits and stop when the result is zero, which is a shame as most invocations won't be full width.
Simple first pass with LSR; the return stack holds the accumulator. It works best with the last argument positive (can test and swap):
    def MUL
      LOAD(0) D2R          # set accumulator to zero (held on the return stack)
    :loop
      DUP LOAD(1) AND      # isolate the low bit of the second argument
      BZ(:skip)            # skip the hard work if not set (assumes branches pop the value they test)
      OVER R2D ADD D2R     # accumulator += first argument
    :skip
      D2R DUP ADD R2D      # double the first argument
      LSR                  # halve the second argument
      DUP BNZ(:loop)       # loop if not zero (DUP keeps it for the next pass)
      DROP DROP            # get rid of both arguments
      R2D                  # retrieve result
      RET                  # and exit happy
    def DUPDUPADD OVER OVER ADD RET     # ( a b -- a b a+b ) much better with a non-consuming ADD
    def OVER SWAP DUP D2R SWAP R2D RET  # ( a b -- a b a )
Learning: Really need both LSR and non-destructive ALU operations. What is a good naming convention for ALU operations that implicitly encodes the data stack changes?
This is a major challenge, as there is no LSR instruction (a right shift is very hard on a serial ALU).
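The shift-and-add algorithm that MUL implements, modelled in Python with 32-bit wrap-around (no carry). It also counts passes of the loop: the count is the bit length of the second argument, which is why a small positive last argument is best and a negative one (all top bits set) always takes 32 passes. `mul` is only an illustration; unlike the stack version it tests for zero before the first pass, which gives the same product.

```python
MASK32 = 0xFFFFFFFF   # 32-bit registers, no carry: results wrap


def mul(a: int, b: int) -> tuple[int, int]:
    """Shift-and-add multiply in the style of MUL. The accumulator plays the
    role of the value parked on the return stack. Returns (product, passes)."""
    a, b = a & MASK32, b & MASK32
    acc, passes = 0, 0
    while b:                        # BNZ(:loop), tested at the top here
        if b & 1:                   # DUP LOAD(1) AND BZ(:skip)
            acc = (acc + a) & MASK32
        a = (a + a) & MASK32        # DUP ADD: double the first argument
        b >>= 1                     # LSR: halve the second argument
        passes += 1
    return acc, passes


print(mul(6, 7))                    # (42, 3)
print(mul(-3 & MASK32, 5))          # (4294967281, 3): -15 as 32 bits
print(mul(5, -3 & MASK32))          # (4294967281, 32): same product, 32 passes
```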