Machine Learning
Because intuition fails in high dimensions, we fall back on a combined package of representation, evaluation, and optimization, a.k.a. machine learning (ML). ML offers a powerful toolkit for building complex systems faster. A short paper on the cautionary side of ML makes for an interesting read. A few more things I found useful related to the field:
- You may not need ML.
- Business logic is superior to ML. Where prior knowledge is insufficient, ML offers good estimates.
- Understand what your model means.
- Keep data dependencies simple and crisp.
- Reduce code volume.
- ML processes should be designed based on the information flow paradigm.
- Training is a low-effort exercise.
- Most of the functions placed at low levels of ML systems are redundant or of little value when compared with the cost of providing them at that low level.
- Drawing modular boundaries when designing ML systems is a bad idea. In other words, there should be no features for an ML system -- only functions at the highest application level.
- Choose how to represent your model first: K-NN, SVM, Bayes, Regression, Decision Trees, Rule-Based, Neural Networks, CRFs, or Bayesian Networks. Next, choose your evaluation method: Error Rate, Recall and Precision, Squared Error, Likelihood, KL-Divergence, Utility, or Margin-based. Finally, choose your optimization approach: Greedy, Branch-and-Bound, Beam Search, Linear Programming, Quadratic Programming, Gradient Descent, Conjugate Gradient, or Quasi-Newton. (A minimal sketch of this three-way split follows the list.)
- A dumb algorithm with lots of data beats a decent algorithm with modest amounts of data.
- There is no such thing as 'the' minimum; in practice you settle for a good-enough point on the loss surface.
- Not every representable function can be learned.
- Any ML system operates only in a specific observational mode: it sees the world only through the data and signals it was trained on.
- If hyperparameter optimization is your only worry, you've got it all wrong.
- Training your ML model to convergence is impractical; stop once held-out performance plateaus (see the early-stopping sketch after this list).
- Remember that modifying the model can have significant effects on memory layout.
- Reward function design is tricky, and oftentimes reinforcement learning isn't practical or accurate.
- Real models diverge.
- It is the behavior of your optimization algorithm that counts, not its 'zero' loss: is it learning spurious correlations first, or real ones?
- Overparameterizing neural networks is a simple way to get acceptable results.
- Your million-parameter neural network has an equivalent thousand-term polynomial regression (a toy illustration follows this list).
- There is no best learner. If your algorithm is better at solving one problem, it is worse at another.
- ML development is not a monolithic pursuit.
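A minimal sketch of the representation/evaluation/optimization split from the list above, in plain NumPy. The linear model, log loss, and gradient descent are one illustrative choice per slot, not the only ones; swapping any slot (say, hinge loss for log loss) yields a different learner.

```python
import numpy as np

# Representation: a linear model, sigmoid(X @ w + b).
def predict(w, b, X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Evaluation: log loss (one choice among error rate, squared error, margin, ...).
def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Optimization: plain gradient descent (could be conjugate gradient, quasi-Newton, ...).
def fit(X, y, lr=0.1, steps=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = predict(w, b, X)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = fit(X, y)
print("train log loss:", log_loss(y, predict(w, b, X)))
```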
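On training to convergence being impractical: the usual alternative is early stopping. A minimal sketch, assuming hypothetical `train_one_epoch` and `validation_loss` callables supplied by your own code.

```python
def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              patience=3, max_epochs=100):
    """Stop when held-out loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    stale_epochs = 0
    for _ in range(max_epochs):
        train_one_epoch(model)          # hypothetical: one pass over training data
        loss = validation_loss(model)   # hypothetical: loss on a held-out set
        if loss < best_loss:
            best_loss, stale_epochs = loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation has plateaued; stop well before "convergence"
    return model, best_loss
```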
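On the polynomial-regression point above: the full equivalence is a literature claim, but the flavor is easy to see on a toy example. Here a fixed random one-hidden-layer tanh "network" stands in for a trained model, and an ordinary degree-9 polynomial is fit to its input-output behavior on a bounded interval; the weights and the degree are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed one-hidden-layer tanh "network" standing in for a trained model.
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2 = rng.normal(size=8)

def network(x):
    return np.tanh(x[:, None] * W1 + b1) @ W2

# Fit an ordinary degree-9 polynomial to the network's input-output behavior.
x = np.linspace(-2, 2, 400)
y = network(x)
coeffs = np.polyfit(x, y, deg=9)
print("max |network - polynomial|:", np.max(np.abs(y - np.polyval(coeffs, x))))
```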