# Machine Learning

Because intuition fails in high dimensions, we resort to a combined package of representation, evaluation, and optimization, a.k.a. machine learning (ML). ML offers a powerful toolkit for building complex systems faster. A short paper showing the cautionary side of ML makes for an interesting read. A few more things I found useful related to the field:

- You may not need ML.
- Business logic is superior to ML when you have sufficient prior knowledge; where that knowledge is insufficient, ML offers good estimates.
- Understand what your model means.
- Keep data dependencies simple and crisp.
- Reduce code volume.
- ML processes should be designed based on the information flow paradigm.
- Training is a low effort exercise.
- Most of the functions placed at low levels of ML systems are redundant or of little value when compared with the cost of providing them at that low level.
- Drawing modular boundaries when designing ML systems is a bad idea. In other words, there should be no features for an ML system -- only functions at the highest application level.
- Choose how to represent your model first: K-NN, SVM, Bayes, Regression, Decision Trees, Rule-Based, Neural Networks, CRFs, or Bayesian Networks. Next, choose your evaluation method: Error Rate, Recall and Precision, Squared Error, Likelihood, KL-Divergence, Utility, or Margin-based. And finally, choose your optimization approach: Greedy, Branch-and-Bound, Beam Search, Linear Programming, Quadratic Programming, Gradient Descent, Conjugate Gradient, or Quasi-Newton.
- A dumb algorithm with lots of data beats a decent algorithm with modest amounts of data.
- There is no such thing as the minimum: in practice, your optimizer settles somewhere among many local minima and plateaus.
- Not every representable function can be learned.
- Any ML system operates only in a specific observational mode.
- If hyperparameter optimization is your only worry, you've got it all wrong.
- Training your ML model to convergence is impractical.
- Remember that modifying the model can have significant effects on memory layout.
- Reward function design is tricky and oftentimes, reinforcement learning isn't practical or accurate.
- Real models diverge.
- It is the behavior of your optimization algorithm that counts, not its 'zero' loss. Is it learning spurious correlations first, or genuine ones?
- Overparameterising neural networks is a simple way to get acceptable results.
- Your million-parameter neural network has an equivalent thousand-variable polynomial regression equation.
- There is no best learner. If your algorithm is better at solving one problem, it is worse at another.
- ML development is not a monolithic pursuit.
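The representation / evaluation / optimization decomposition above can be made concrete with a toy learner. Everything in this sketch is hypothetical and chosen only to keep the three components visible: the representation is a decision stump, the evaluation is the error rate, and the optimization is a greedy exhaustive search over candidate thresholds.

```python
# Representation: a decision stump -- predict 1 if x[feature] >= threshold.
def stump_predict(params, x):
    feature, threshold = params
    return 1 if x[feature] >= threshold else 0

# Evaluation: error rate over a labelled dataset of (x, y) pairs.
def error_rate(params, data):
    wrong = sum(1 for x, y in data if stump_predict(params, x) != y)
    return wrong / len(data)

# Optimization: greedy exhaustive search over every feature and every
# threshold actually observed in the data.
def fit_stump(data):
    n_features = len(data[0][0])
    candidates = [(f, x[f]) for f in range(n_features) for x, _ in data]
    return min(candidates, key=lambda p: error_rate(p, data))

# Toy dataset (made up): label is 1 exactly when the second feature >= 5.
data = [([1, 2], 0), ([3, 9], 1), ([2, 7], 1), ([8, 1], 0)]
params = fit_stump(data)
print(error_rate(params, data))  # 0.0 -- the data is separable by a stump
```

Swapping any one of the three components (say, likelihood for error rate, or gradient descent for greedy search) gives a different learner with the same skeleton, which is the point of the decomposition.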
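The polynomial-equivalence point can be illustrated at toy scale. This sketch assumes a hypothetical one-input network whose hidden layer uses a quadratic activation, where the equivalence is exact by expanding the square; for ReLU or tanh networks the correspondence is only approximate.

```python
import numpy as np

# Hypothetical network: input x, four hidden units with activation z**2,
# linear output:  net(x) = sum_i v_i * (w_i * x + b_i)**2 + c.
# Expanding the square gives a degree-2 polynomial in x:
#   (sum v_i w_i^2) x^2 + (sum 2 v_i w_i b_i) x + (sum v_i b_i^2 + c)
rng = np.random.default_rng(0)
w = rng.normal(size=4)   # hidden weights
b = rng.normal(size=4)   # hidden biases
v = rng.normal(size=4)   # output weights
c = 0.5                  # output bias

def net(x):
    return v @ (w * x + b) ** 2 + c

# The equivalent polynomial regression coefficients, read off the expansion.
a2 = np.sum(v * w**2)
a1 = np.sum(2 * v * w * b)
a0 = np.sum(v * b**2) + c

def poly(x):
    return a2 * x**2 + a1 * x + a0

x = 1.7
print(abs(net(x) - poly(x)) < 1e-9)  # True -- the two models agree
```

A million-parameter network expands the same way, just into vastly more monomial terms, which is what the maxim is gesturing at.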