How to start up a Startup

This is all about the practical details of getting a startup off the ground. It's a personal view based on experience (my UK list): many things are easy once you've done them a few times, but can catch you out the first time.

Practice having good ideas:

Pick the best idea:

Good, that's the easy bit done!

Now do whatever is cheap and easy without spending much money (less than £1,000). The aim of this stage is to flesh out the idea without becoming emotionally attached to it; you should still be able to decide that it's not going to work and walk away:

Now find some quiet time to ask yourself: Is it really a good idea?

If you are really sure you can take the pain, then go for it:

Congratulations, you now have a Startup!

Things to remember/read:

Example: Neuracore.ai

Neuracore was incorporated on 12 December 2018 and dissolved on 30 April 2019.

What is the company going to do?

Design and license cores for the efficient execution of neural networks. This will enable “AI Everywhere”.

Why is it unique?

Extreme power efficiency obtained through low-precision integer operation (with supporting software): single propagation delay addition and very low propagation delay multiplication.
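
To make the low-precision claim concrete, here is a minimal NumPy sketch (my illustration, not Neuracore's actual design; the `quantize_int4` helper and the symmetric [-7, 7] scheme are assumptions) of a dot product carried out entirely in 4-bit integer arithmetic:

```python
import numpy as np

def quantize_int4(x):
    """Symmetric linear quantization to signed 4-bit codes in [-7, 7].

    Returns the integer codes plus the scale needed to recover
    approximate real values (value ~= code * scale). Illustrative only.
    """
    scale = np.max(np.abs(x)) / 7.0
    codes = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
a = rng.standard_normal(64).astype(np.float32)

wq, w_scale = quantize_int4(w)
aq, a_scale = quantize_int4(a)

# The accumulation runs in small-integer arithmetic (the cheap,
# low-propagation-delay operations); one float rescale at the end
# recovers an approximation of the fp32 result.
int_acc = int(np.dot(wq.astype(np.int32), aq.astype(np.int32)))
approx = int_acc * w_scale * a_scale

print(f"fp32 dot product  : {np.dot(w, a):+.3f}")
print(f"int4 approximation: {approx:+.3f}")
```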

How is it going to be successful?

License technology into a massive market, from servers through laptops and phones to smart watches.

Draft a quick business plan so that you have a story to tell others

Hardware acceleration for Neural Nets is already huge; the whole current wave of Deep Learning happened because GPUs became cheap enough. Google have enough services that need NNs to justify building their own ASIC, the TPU. Facebook is driven by AI, and the trend towards increasing automation is massive and well known.

Aim to get it out there in 5 years: any sooner and FPGAs will dominate; any later and there is too much risk.

Use the AI Index 2018 annual report as evidence of the AI gold rush. Neuracore sells the shovels: “During the gold rush it's a good time to be in the pick and shovel business” (Mark Twain).

Competitors

From: The Great Debate of AI Architecture

Idea killers

If we assume that neural nets will be a major consumer of power in the future, and that power is limited by convenience (on a phone), cost (in servers) or CO2 emissions (climate change), then there is a case for a power-efficient hardware implementation of neural networks.

Technical Summary

Problem statement/Diagnosis

DNNs are everywhere and growing in popularity; however, the popular hardware is very general and not power efficient. This limits both the scale of model that can be trained and the scope for deployment. Typically a 2-slot PCIe card can consume 300W, and only a small number of them fit in a server. GPUs from Nvidia are the current favourite; these perform fp16 calculations (previously fp32) using a dedicated architecture of local SIMD processors and local data. FPGAs are also receiving more attention; they are good at convolutional neural networks because their configurable datapaths suit the fixed, regular dataflow of convolutions. Any 10x improvement over current technology must both reduce the transistor count (so as to reduce power) and be very memory-bandwidth efficient (so as not to hit a memory bottleneck). The field is moving fast, so any solution must be easily adoptable in a short time period.
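
As a back-of-envelope check on the bandwidth and transistor-count argument, the sketch below uses an assumed 100M-parameter model (an illustrative figure, not from the original plan) to compare fp32 and 4-bit weight traffic:

```python
# Illustrative figures only: a 100M-parameter model whose weights
# are streamed once per forward pass.
params = 100e6
bytes_fp32 = params * 4    # 4 bytes per fp32 weight
bytes_int4 = params * 0.5  # half a byte per 4-bit weight

print(f"fp32 weight traffic: {bytes_fp32 / 1e6:.0f} MB per pass")
print(f"int4 weight traffic: {bytes_int4 / 1e6:.0f} MB per pass")
print(f"bandwidth reduction: {bytes_fp32 / bytes_int4:.0f}x")

# Multiplier area (and roughly power) grows about quadratically with
# operand width, so narrow integer multipliers also attack transistor count.
for bits in (32, 16, 4):
    print(f"{bits}-bit multiplier ~ {(bits / 4) ** 2:.0f}x the area of a 4-bit one")
```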

In order to make an impact, any solution must be complete, that is, almost invisible to the user. It needs to improve on the three major operations:

Guiding Principles
| Guiding Principle | Why |
| --- | --- |
| Minimise power | Aids deployability: (1) researchers get more power so can build bigger models, so will buy; (2) sells into areas not currently accessible (e.g. mobile). Cost and transistor count probably correlate with power, but they are secondary considerations |
| Scalable | 200W for data centre, 20W for laptop and 2W for phone |
| Sufficiently flexible | Blocker: if it can't implement what's needed then it won't be used |
| State-of-the-art results | Blocker: if better results are available elsewhere then people will go elsewhere |
| Easily trainable | Blocker: if not TensorFlow/PyTorch then adoption will be too slow |

Rejected and company closed

After considering many designs, including analog and ternary weights, I ended up with 4-bit weights and activations. This achieves the goals, albeit uncomfortably similar to the TPU. The scale of work needed to make the transition from fp32/fp16 to 4-bit is too great: the first prototype would be noticed by the giants and the company would be overtaken (defending IP is very expensive). This could well lead to a forced sale, which isn't great for anyone (especially founders/Ordinary shareholders).
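
To give a flavour of the software side of that fp32-to-4-bit transition, here is a hedged sketch of "fake quantization", the standard trick for training low-bit networks; the `fake_quant` function and its details are my illustration, not Neuracore's code:

```python
import numpy as np

def fake_quant(x, bits=4):
    """Simulate a low-bit datapath during float training ("fake quantization").

    The forward pass snaps values to the grid the hardware would use; in a
    real training loop the rounding is treated as identity for gradients
    (the straight-through estimator). Illustrative sketch only.
    """
    levels = 2 ** (bits - 1) - 1                       # 7 for signed 4-bit
    scale = np.max(np.abs(x)) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

w = np.random.default_rng(1).standard_normal((3, 3)).astype(np.float32)
print(fake_quant(w))  # the weights as a 4-bit core would see them
```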

Start October 2018, end February 2019, minimal external costs.

EDIT: Reopened 4 Nov 2021