Computer programming

This page collects info and links (to this site or others) useful in computer programming and software development, including resources for various languages, editors, or testing tools, and notes/tips for using them effectively. These resources lean towards data science and data analysis applications.

General rules and philosophies

I like Rob Pike's 5 Rules of Programming:

Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Pike's rules 1 and 2 restate the famous maxim "Premature optimization is the root of all evil," popularized by Donald Knuth and often attributed to Tony Hoare.

Ken Thompson rephrased Pike's rules 3 and 4 as "When in doubt, use brute force."

Rules 3 and 4 are instances of the KISS design philosophy ("keep it simple, stupid").

Rule 5 was previously stated by Fred Brooks in The Mythical Man-Month. Rule 5 is often shortened to "write stupid code that uses smart objects".

These were found here but are restated in various places around the internet.

Some techniques and conventions

Notes about data analysis techniques/conventions, independent of language/interface.

  • Sensor data notes on working with continuous sensor timeseries (from dataloggers, SNOTEL sites, etc.)
  • Data analysis workflow - Notes on collecting, storing, and moving data through the analysis process.

Text editing and data file handling

Vim is a great text editor. Below are a few resources on using it effectively.

An excellent general overview of text/data file handling in a Unix environment is provided by Unix for Poets, by Kenneth Ward Church. PDFs of this are all over the internet.
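The classic Unix for Poets exercises (word counts, frequency lists) translate directly to Python when a shell pipeline isn't convenient. As a minimal sketch, here is a word-frequency counter in the spirit of Church's first example; the function name and punctuation handling are my own choices, not from the text:

```python
from collections import Counter

def word_frequencies(text):
    """Count word occurrences, ignoring case and punctuation at word edges."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)

freqs = word_frequencies("the cat sat on the mat. The cat slept.")
print(freqs.most_common(2))  # [('the', 3), ('cat', 2)]
```

This is roughly equivalent to the shell idiom `tr -sc 'A-Za-z' '\n' < file | sort | uniq -c` that Unix for Poets builds its examples around.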

Other useful resources (including some on this wiki)

Python

Python is a high-level, open-source programming language that, when combined with some numerical, scientific, and plotting packages, makes a very powerful tool for scientific computing and data analysis (on par with Matlab). Useful Python extensions for scientific computing are:

  • NumPy - provides n-dimensional array objects and other useful numeric extensions to Python
  • SciPy - provides a number of high-level mathematical tools for use in scientific computing (integration, optimization, Fourier transforms, etc.)
  • Matplotlib - a plotting library that provides publication quality plots and plotting routines that are similar to Matlab's.
  • IPython - an interactive shell that is designed to work well with NumPy, SciPy, and Matplotlib.
  • SciKits - add-on toolkits that complement SciPy (statistical models, timeseries analysis, machine learning, image processing, etc.)
  • The pandas library - provides high-performance, easy-to-use data structures (like data frames) and data analysis tools that sit on top of NumPy.
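These packages are designed to compose. As a minimal sketch (the column names and values here are invented for illustration), a NumPy array of simulated measurements can feed a pandas data frame, which then provides group-wise summaries in one line:

```python
import numpy as np
import pandas as pd

# Simulated temperature readings at two sites (values are illustrative only)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "temp_c": rng.normal(loc=10.0, scale=2.0, size=4),
})

# Group-wise summary: mean temperature per site
means = df.groupby("site")["temp_c"].mean()
print(means)
```

The same pattern (array in, data frame out, grouped summary) scales from this toy example to large sensor datasets.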

Official Python resources

My Python notes

Collected notes, tips, and tricks for using any of the Python tools above.

Other

MATLAB (and clones)

MATLAB is a proprietary programming language and IDE that is widely used in scientific and engineering computing.

Resources

Clones of Matlab

There are a bunch of free/open-source clones of Matlab that have various levels of syntax compatibility.

R

R is a free, open-source software environment for statistical computing and graphics.

Math and Stats tools

Many toolboxes are available, either standalone or in Python, R, and Matlab, for math and statistical applications. See the math toolbox page.

Testing data analysis functions

Code used in data analysis can perform fairly complex operations on a dataset and generate output that differs substantially from the original data. The code itself can also be complex enough that its actual behavior is difficult to discern just by reading it or inspecting the output. It is therefore important to verify that the code does what is expected and that its output is accurate. Writing test functions that run the analysis code on inputs with known answers and check the output against those answers is a useful way to do this.
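As a minimal sketch of this idea, the test below feeds a hypothetical analysis step (a daily-mean resampler; both the function and the data are invented for illustration) a synthetic timeseries whose correct answer is known in advance, then asserts that the code recovers it:

```python
import numpy as np
import pandas as pd

def daily_mean(series):
    """Hypothetical analysis step: average a sensor timeseries by calendar day."""
    return series.resample("D").mean()

def test_daily_mean_recovers_known_values():
    # Two days of hourly readings: constant 1.0 on day one, constant 3.0 on day two,
    # so the daily means are known exactly before the code runs.
    idx = pd.date_range("2024-01-01", periods=48, freq="h")
    s = pd.Series(np.concatenate([np.full(24, 1.0), np.full(24, 3.0)]), index=idx)
    result = daily_mean(s)
    assert len(result) == 2
    assert result.iloc[0] == 1.0
    assert result.iloc[1] == 3.0
```

A test runner such as pytest will discover and run any function whose name starts with `test_`; the same synthetic-input-with-known-answer pattern works for more realistic analysis functions.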