  * Do the right thing, act responsibly and with integrity.  Your reputation is more important than your current startup.
==== Example: ====
Neuracore was incorporated on 12 December 2018 and dissolved on 30 April 2019.
=== What is the company going to do? ===
Design and license cores for the efficient execution of neural networks.  This will enable "AI Everywhere".
=== Why is it unique? ===
Extreme power efficiency obtained through low-precision integer operation (with supporting software): single propagation delay addition and very low propagation delay multiplication.
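To make the low-precision claim concrete, here is a hypothetical software sketch (not Neuracore's actual design) of the arithmetic such a core would perform: a dot product over signed 4-bit operands, with a wider accumulator as is usual in integer MAC units.

```python
# Hypothetical sketch: a dot product over signed 4-bit (int4) operands,
# the kind of low-precision integer arithmetic referred to above.
# Operands are clamped to the int4 range [-8, 7]; the accumulator is wider.

def clamp_int4(x: int) -> int:
    """Clamp an integer to the signed 4-bit range [-8, 7]."""
    return max(-8, min(7, x))

def int4_dot(weights, activations) -> int:
    """Dot product of two int4 vectors, accumulated in full precision."""
    assert len(weights) == len(activations)
    acc = 0
    for w, a in zip(weights, activations):
        acc += clamp_int4(w) * clamp_int4(a)
    return acc

print(int4_dot([3, -8, 7], [1, 2, -1]))  # 3 - 16 - 7 = -20
```

A 4x4-bit multiplier needs far fewer gates than an fp16 or fp32 multiplier, which is where the propagation-delay and power advantage comes from.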
=== How is it going to be successful? ===
Licence the technology into a massive market, from servers through laptops, phones and smart watches.
=== Draft a quick business plan so that you have a story to tell others ===
Hardware acceleration for neural nets is already huge: the whole current wave of Deep Learning happened because GPUs became cheap enough.  Google have enough services needing NNs to justify building their own ASIC, the TPU.  Facebook is driven by AI, and the trend towards increasing automation is massive and well known.
  * Stage 0:  Come up with a reasonable hardware design
  * Stage 1:  Do a patent review, then get a team together
  * Stage 2:  Partner with ARM (local and known), get them to fund joint work
  * Stage 3:  Sell to ARM, broaden the base.  Retain sufficient IP to be independent.
Aim to get it out there in 5 years: any sooner and FPGAs will dominate, any later and there is too much risk.
Use the AI Index 2018 annual report for evidence of the AI gold rush.  Neuracore sells the shovels: "During the gold rush it's a good time to be in the pick and shovel business" (attributed to Mark Twain).
=== Competitors ===
From: [[https://​​Hardware/​ArticleID/​16753/​The-Great-Debate-of-AI-Architecture.aspx|The Great Debate of AI Architecture]]
  * Nvidia - DNN training is a major part of their strategy
  * Intel ([[https://​​intel-nervana-neural-network-processors-nnp-redefine-ai-silicon|Nervana]] (estimated $408 million) and [[https://​|Movidius]]) - need to maintain their leading position
  * ARM [[https://​​products/​processors/​machine-learning/​arm-ml-processor|ML Processor]] - FPGA to rewire a fixed-point unit with local controller and memory.  Claim 4 TOps/s per Watt.
  * Google - have volume, have built the [[https://​​blog/​products/​gcp/​quantifying-the-performance-of-the-tpu-our-first-machine-learning-chip|TPU]] and TPU2
  * Microsoft [[https://​​en-us/​research/​uploads/​prod/​2018/​03/​mi0218_Chung-2018Mar25.pdf|BrainWave]] - catching up with Google
  * Baidu SDA - no definitive reference - https://​​2017/​08/​22/​first-look-baidus-custom-ai-analytics-processor/​
  * Xilinx - [[http://​|DeePhi Tech]]
  * IBM and fp8 - https://​​blogs/​research/​2018/​12/​8-bit-breakthroughs-ai/​
  * ESE - not sure what this refers to - maybe https://​​abs/​1612.00694
  * [[https://​|Teradeep]] - IP licence of RTL for SoC and FPGA [[https://​​organization/​teradeep|crunchbase]]
  * [[https://​|Cerebras]] - raised $112m [[https://​​organization/​cerebras-systems|crunchbase]]
  * [[https://​|Graphcore]] - raised $310m [[https://​​organization/​graphcore|crunchbase]].  Claim 0.5 ExaFlop per rack (assume 2-20 kW), so 250-25 TFlop per Watt.
  * [[https://​|Groq]] - raised $62m [[https://​​organization/​groq|crunchbase]].  Claim 8 TOps/s per Watt.
  * [[https://​|Wave Computing]] - raised $203m [[https://​​organization/​wave-semiconductor|crunchbase]]
  * [[https://​​2019/​02/​20/​global-deep-learning-chipsets-market-2019-2025-markets-major-players-are-google-intel-xilinx-amd-nvidia-arm-qualcomm-ibm-graphcore-brainchip-mobileye-wave-computing-ceva-movidius-nerv/​|Global Deep Learning Chipsets Market 2019-2025: Major Players Are Google, Intel, Xilinx, AMD...]]
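The Graphcore efficiency figure in the list above follows from simple arithmetic; the 2-20 kW rack power is an assumption, as noted there:

```python
# Back-of-envelope check on the Graphcore claim above:
# 0.5 ExaFlop/s per rack, assuming rack power between 2 kW and 20 kW.

flops_per_rack = 0.5e18          # 0.5 ExaFlop/s (claimed)
TERA = 1e12

for power_w in (2_000, 20_000):  # assumed rack power in watts
    tflops_per_watt = flops_per_rack / power_w / TERA
    print(f"{power_w} W -> {tflops_per_watt:.0f} TFlop/s per Watt")
# 2 kW gives 250, 20 kW gives 25 - hence the 250-25 range quoted
```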
=== Idea killers ===
  * Consumer/research grade has to be:
    * Faster than GPU, FPGA or TPU
    * Cheaper than GPU and FPGA (e.g. has more RAM)
    * Easy enough to use (it will have less precision than fp16)
  * Need to get memory side-by-side with logic to get the bandwidth
  * Must be able to do training on chip, as something will need this in 5 years' time, e.g. AGI
  * Must be flexible enough to keep up with NN developments over the next 5 years, including training
  * Hardware people have fixated on CNNs - are they right?  What does everyone want to use?
  * Must be able to use all common SGD optimisation techniques.
If we assume that neural nets will be a major consumer of power in the future, and that power is limited by convenience (on a phone), cost (servers) or CO2 emissions (climate change), then there is a case for a power-efficient hardware implementation of neural networks.
=== Technical Summary ===
== Problem statement/Diagnosis ==
DNNs are everywhere and growing in popularity; however, the popular hardware is very general and not power efficient.  This limits both the scale that can be trained and the scope for deployment.  Typically a 2-slot PCIe card can consume 300W, and only a small number of them fit in a server.  GPUs from Nvidia are the current favourite; these perform fp16 calculations (previously fp32) using a dedicated architecture of local SIMD processors and local data.  FPGAs are also receiving more attention; they are good at convolutional neural networks (say why).  Any 10x improvement over current technology must both reduce the transistor count (so as to reduce power) and be very memory-bandwidth efficient (so as not to have a memory bottleneck).  The field is moving fast, so the solution must be easily adoptable in a short time period.
In order to make an impact any solution must be complete, that is, almost invisible to the user.  It needs to improve on the three major operations:
  * forward:  the inference, or forward pass, of a DNN model
  * backward:  the backward error-propagation pass of stochastic gradient descent, which accumulates gradients over a batch
  * update:  the work needed to scale the batch gradient into a weight update (may need complex CPU-like operations)
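The three operations above can be sketched for a single linear layer in plain NumPy (illustrative only, not tied to any particular hardware; the data and learning rate are made up):

```python
# Minimal sketch of forward / backward / update for a linear layer y = W x
# with a squared-error loss, trained by one step of batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # weight matrix of the layer
X = rng.standard_normal((4, 3))   # a batch of 4 input vectors
T = rng.standard_normal((4, 2))   # target outputs

def loss(W):
    """Squared-error loss over the batch."""
    return float(((X @ W.T - T) ** 2).sum())

before = loss(W)

# forward: the inference pass
Y = X @ W.T

# backward: accumulate gradients over the batch
dY = 2 * (Y - T)                  # dLoss/dY for squared error
grad_W = dY.T @ X                 # gradient accumulated over the batch

# update: scale the batch gradient into a weight update
lr = 0.01
W = W - lr * grad_W / len(X)

print(before, loss(W))            # the loss drops after one step
```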
== Guiding Principles ==
^ Guiding Principle ^ Why ^
| Minimise power    | Aids deployability: (1) researchers get more compute per watt so can build bigger models, so they will buy; (2) sells into areas not currently accessible (e.g. mobile).  Cost and transistor count probably correlate with power, but they are secondary considerations |
| Scalable | 200W for the data centre, 20W for a laptop and 2W for a phone |
| Sufficiently flexible | Blocker: if it can't implement what's needed then it won't be used |
| State-of-the-art results | Blocker: if there are better results elsewhere then people will go elsewhere |
| Easily trainable | Blocker: if it doesn't work with TensorFlow/PyTorch then adoption will be too slow |
=== Rejected and company closed ===
After considering many designs, including analog and ternary weights, I ended up with 4-bit weights and activations.  This achieves the goals, albeit with something uncomfortably similar to the TPU.  The scale of work needed to make the transition from fp32/fp16 to 4-bit is too great - the first prototype would be noticed by the giants and the company would be overtaken (defending IP is very expensive).  This could well lead to a forced sale, which isn't great for anyone (especially founders/Ordinary shareholders).
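The fp32-to-4-bit gap can be made concrete with a minimal sketch of symmetric uniform quantisation (illustrative only; a real design would also need quantisation-aware retraining, which is a large part of the transition cost mentioned above):

```python
# Hedged sketch: symmetric uniform quantisation of a float tensor to
# signed 4-bit codes, with a single per-tensor scale factor.
import numpy as np

def quantize_int4(x):
    """Quantise a float array to signed 4-bit codes plus a scale."""
    scale = float(np.max(np.abs(x))) / 7.0   # map largest magnitude to +/-7
    if scale == 0.0:
        scale = 1.0                          # avoid division by zero
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the 4-bit codes."""
    return q.astype(np.float32) * scale

x = np.array([0.9, -0.1, 0.35, -0.7], dtype=np.float32)
q, s = quantize_int4(x)
print(q)                 # 16-level integer codes
print(dequantize(q, s))  # lossy reconstruction of x
```

With only 16 representable levels, small values round coarsely; this is why training (rather than just inference) at 4-bit was the hard part.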
Start October 2018, end February 2019, minimal external costs.
how_to_start_up_a_startup.1574789138.txt.gz · Last modified: 2019/11/26 17:25 by admin