Opening the Washington Post today brought me a Proustian moment: encountering the name of Jack Dongarra. His op-ed on supercomputing involuntarily recalled to mind the dusty smell of the third floor MacLean Hall computer lab, xterm windows, clicking keys, and graphite smudges on spare printouts. Jack doesn’t know it, but he was a big part of my life for a few years in the 90s. I’d like to share some things I learned from him.
I am indebted to Jack. Odds are you are too. Nearly every data scientist on Earth uses Jack’s work every day, and most don’t even know it. Jack is one of the prime movers behind the BLAS and LAPACK numerical libraries, and many more. BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) are programming libraries that provide foundational routines for manipulating vectors and matrices. These routines range from the rocks and sticks of addition, subtraction, and scalar multiplication up to finely tuned engines for solving systems of linear equations, factorizing matrices, determining eigenvalues, and so on.
Much of modern data science is built upon these foundations. They are hidden by layers of abstractions, wheels, pips and tarballs, but when you hit bottom, this is what you reach. Much of ancient data science is also built upon them too, including the solvers I wrote as a graduate student when I was first exposed to his work. As important as LAPACK and BLAS are, that’s not the reason I feel compelled to write about Jack. It’s more about how he and his colleagues went about the whole thing. Here are four lessons:
Layering. If you dig into BLAS and LAPACK, you quickly find that the routines are carefully organized. Level 1 routines are the simplest “base” routines, for example adding two vectors. They have no dependencies. Level 2 routines are more complex because they depend on Level 1 routines – for example multiplying a matrix and a vector (because this can be implemented as repeatedly taking the dot product of vectors, a Level 1 operation). Level 3 routines use Level 2 routines, and so on. Of course all of this obvious. But we dipshits rarely do what is obvious, even these days. BLAS and LAPACK not only followed this pattern, they told you they were following this pattern.
I guess I have written enough code to have acquired the habit of thinking this way too. I recall having to rewrite a hilariously complex beast of project scheduling routines when I worked for Microsoft Project, and I tried to structure my routines exactly in this way. I will spare you the details, but there is no damn way it would have worked had I not strictly planned and mapped out my routines just like Jack did. It worked, we shipped, and I got promoted.
Naming. Fortran seems insane to modern coders, but it is of course awesome. It launched scientific computing as we know it. In the old days there were tight restrictions on Fortran variable names: 1-6 characters from [a-z0-9]. With a large number of routines, how does one choose names that are best for programmer productivity? Jack and team zigged where others might have zagged and chose names with very little connection to English naming.
“All driver and computational routines have names of the form XYYZZZ”
where X represents data type, YY represents type of matrix, and ZZZ is a passing gesture at the operation that is being performed. So SGEMV means “single precision general matrix-vector multiplication”.
This scheme is not “intuitive” in the sense that it is not named GeneralMatrixVectorMultiply or general_matrix_vector_multiply, but it is predictable. There are no surprises and the naming scheme itself is explicitly documented. Developers of new routines have very clear guidance on how to extend the library. In my career I have learned that all surprises are bad, so sensible naming counts for a lot. I have noticed that engineers whom I respect also think hard about naming schemes.
Documentation. BLAS and LAPACK have always had comprehensive documentation. Every parameter of every routine is documented, the semantics of the routine are made clear, and “things you should know” are called out. This has set a standard that high quality libraries (such as the tidyverse and Keras – mostly) have carried forward, extending this proud and helpful tradition.
Pride in workmanship. I can’t point to a single website or routine as proof, but the pride in workmanship in the Netlib has always shone through. It was in some sense a labor of love. This pride makes me happy, because I appreciate good work, and I aspire to good work. As a wise man once said:
Once a job is first begun,
Never leave it ’till it’s done.
Be the job great or small,
Do it right or not at all.
Jack Dongarra has done it right. That’s worth emulating. Read more about him here [pdf] and here.