how_to_start_up_a_startup [2020/05/17 07:32] admin [Example: Neuracore.ai]
To make an impact, any solution must be complete, that is, almost invisible to the user. It needs to improve on the three major operations:
  * forward: The inference, or forward pass of a DNN model
  * backward: The backward error-propagation pass of stochastic gradient descent, which accumulates gradients over a batch
  * update: The work needed to scale the batch gradient into a weight update (may need complex CPU-like operations)
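The three operations above can be sketched as a minimal training loop. This is an illustrative toy (a single linear layer trained with plain SGD on made-up data), not the actual workload the hardware would target; all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 8 inputs of width 4, scalar targets (illustrative only).
x = rng.normal(size=(8, 4))
t = rng.normal(size=(8, 1))
w = np.zeros((4, 1))   # weights of a single linear layer y = x @ w
lr = 0.1

def forward(x, w):
    # forward: inference / forward pass of the model
    return x @ w

def backward(x, y, t):
    # backward: error propagation; the per-example gradients of the
    # MSE loss are accumulated (averaged) over the batch
    err = y - t
    return x.T @ err / len(x)

loss0 = float(np.mean((forward(x, w) - t) ** 2))  # loss before training

for step in range(100):
    y = forward(x, w)          # forward pass
    grad = backward(x, y, t)   # backward pass, accumulated over the batch
    w -= lr * grad             # update: scale the batch gradient into a weight update

loss = float(np.mean((forward(x, w) - t) ** 2))   # loss after training
```

Note that the update step here is the trivial SGD rule; real optimisers (momentum, Adam, etc.) do more scalar bookkeeping per weight, which is part of why the update pass suits a general-purpose processor.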
== Guiding Principles ==
=== Rejected and company closed ===
After considering many designs, including analog and ternary weights, I ended up with 4-bit weights and activations. This achieves the goals, albeit uncomfortably similar to the TPU. The scale of work needed to make the transition from fp32/fp16 to 4-bit is too great: the first prototype would be noticed by the giants and the company would be overtaken (defending IP is very expensive). This could well lead to a forced sale, which isn't great for anyone (especially founders/ordinary shareholders).
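To make the 4-bit idea concrete, here is a minimal sketch of symmetric per-tensor 4-bit quantization, the kind of mapping a fp32-to-4-bit transition involves. The scheme shown (16 integer levels in [-8, 7] with a single scale) is a common textbook choice, assumed for illustration; it is not necessarily the scheme the design used.

```python
import numpy as np

def quantize_4bit(x):
    # Map float values onto the 16 signed 4-bit levels [-8, 7]
    # using a single per-tensor scale (symmetric quantization).
    scale = np.max(np.abs(x)) / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the integer codes.
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.35, 0.02, -0.7], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)   # w_hat approximates w to within half a step
```

The rounding error is bounded by half the quantization step, which is what makes training at 4 bits hard: gradients are often far smaller than that step, so the update pass must still be carried out at higher precision.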
Start October 2018, end February 2019, minimal external costs.