Models are Analogies

My definition (*) of analytics is:

Analytics is the practice of building models on computers to learn about the world.

Models, computers, learning are all necessary to the definition. No models? Not analytics, otherwise browsing the web would qualify. No computers? Not analytics. You’re doing math or stats or something. Not learning about the world? Not analytics. You’re playing Sim City or Grand Theft Auto or whatever. I find that many data scientists (**) don’t spend much time thinking about what analytics actually is, perhaps because analytics is such a relentlessly practical discipline, or maybe because it is so nebulous. I bet more people would be able to define machine learning, or Big Data, or optimization than analytics. Everyone’s doing analytics but nobody can say what it is. That’s worth considering in its own right, but it’s not the subject of today’s post.

I want to talk about models because that’s the piece of the puzzle in my definition that isn’t obvious. A model is simply a representation of a thing that actually exists in the world. It’s a statement: “this thing is like that one”. This is every bit as true for a model railroad as it is for an analytics model. The way you build a model is to think about the key properties of the thing you want to model (“it rides on a track, has wheels, and a funnel on top”), and then use the tools you have at your disposal to reproduce those attributes in the model.

Computer models often differ from their targets in a key respect: computer models don’t physically exist, whereas the things they represent (shoppers in a grocery store, oil deposits underneath the ocean floor, cells reproducing and mutating) do. It is this difference that makes computer models so useful. Since they don’t physically exist, computer models are, by comparison, incredibly cheap to make, change, and rebuild. Experiments can be run on computer models without fear of explosions, crashes, reactions, or lawsuits. These days we’re witnessing exponential growth in the number of experiments that are conducted each day, and it’s entirely due to the use of computer models. I’ll bet there have been more experiments conducted in the first thirteen years of the 21st century than in all of prior human existence combined.

Models purposefully represent only certain aspects of the things they represent. After all, if a model represented all of them you’d have a clone and not a model. Of the nearly limitless properties we can ascribe to any object, a model of that object ignores most, modifies others, and mimics only a select few. A toy airplane has no engine, no seats, no electrical system, and no pilot; it is built from different materials, at a much smaller scale. It’s still entirely recognizable as a toy airplane because it has wings and is fun to move around. It’s the same deal with computer models, because most of the learning can be achieved by modeling a few key attributes. It’s so easy to forget this that when I am asked for advice about model building I often suggest going back to the start and building the simplest possible model that captures the phenomenon of interest. Add in the complexity later – oftentimes you don’t need it (***).
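To make the "simplest possible model" idea concrete, here is a rough sketch of a grocery store checkout in Python. Everything in it is my own assumption for illustration: the function name average_wait, the arrival rate, the service time, and the choice of exponential random draws. The model keeps only two attributes of real shoppers (when they arrive and how long checkout takes) and ignores everything else about them.

```python
import random


def average_wait(num_registers, num_shoppers=2000, arrival_rate=1.0,
                 mean_service_time=2.5, seed=0):
    """Average minutes a shopper waits before reaching a register.

    Hypothetical toy model: shoppers arrive at random, go to whichever
    register frees up first, and take a random amount of time to check out.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    free_at = [0.0] * num_registers    # time each register next becomes free
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_shoppers):
        # Time until the next shopper shows up (assumed exponential).
        clock += rng.expovariate(arrival_rate)
        # Send the shopper to the register that frees up earliest.
        i = min(range(num_registers), key=free_at.__getitem__)
        start = max(clock, free_at[i])
        total_wait += start - clock
        # Checkout duration (assumed exponential with the given mean).
        free_at[i] = start + rng.expovariate(1.0 / mean_service_time)
    return total_wait / num_shoppers


# The "experiment": change the number of registers and rerun.
for registers in (3, 4, 5):
    print(registers, round(average_wait(registers), 2))
```

Trying a different number of registers costs one more loop iteration, not a store remodel. If this crude version already answers the question, the extra detail (basket sizes, express lanes, shift changes) can stay out.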

Finding the right dimensions of similarity is important. It’s not always obvious which ones are the important ones to keep. Models, like analogies, are ways of seeking truth. As with analogies, models can be “faux amis”; they can enlighten or obscure. It’s also true that models and analogies are insufficient. They can be tortured and stretched too thin. We’ve got to be careful that the conclusions that we draw based on analytics apply to the things we are modeling, and not just the models themselves. There’s the old joke about spherical cows that applies. I hope it’s not too obvious if I say that there will always be room for experimental science: watching what actually happens in the actual universe and thinking about what’s been observed.

A recent Hofstadter book (which I have not read…) discusses the mind’s necessity for analogy, and I think this is what drives us to build models so relentlessly. Model building soothes great clusters of neurons inside our skulls. Even if we didn’t enjoy building models so much, they’d be necessary because they help us to understand the incomprehensible so much more quickly than we would if we relied solely on empirical science. 

(*) INFORMS defines analytics as follows: Analytics is the scientific process of transforming data into insight for making better decisions. That’s a great definition. 

(**) Most data scientists don’t call themselves data scientists. 

(***) I am not implying that “big data” is useless. I am speaking here of the complexity of input data rather than quantity. That said, in many, many cases I don’t think you need “big data”. So there.

Author: natebrix

Follow me on twitter at @natebrix.
