In my previous post I discussed the challenges in obtaining data to measure marketing effectiveness in marketing mix models. Getting good data fast is hard because of comprehensiveness and correctness concerns. In this post I want to address modeling challenges. The heart of most MMM estimation is multivariate regression: a statistical means for predicting one quantity (referred to as the dependent variable) in terms of others (the independent variables). MMM systems rely on prebuilt multivariate regression packages from SAS, R, SPSS, or somebody else.
The dependent variable in an MMM is typically related to brand volume: cases of green beans, for example. Sales may seem like a more natural choice, but there are difficulties. For a global product, this would require conversion rates, and even for single currency projects, inflation comes into play. More importantly, regular and promoted price are often independent variables in an MMM, so using sales gets confusing. So brand volume is often the way to go. As I noted in the previous post, a single MMM may model an entire brand – which may consist of UPCs with different sized packages or units: 12 oz, 16 oz, 12 pack, 2 liter for example. This means that to model with brand volume it’s necessary to convert the sales volume of each UPC into what is referred to “equivalized units”. For example, if equivalized units are expressed in terms of 24 count cases, then a 12 pack counts as 0.5.
You’ve got to mess with the independent variables too. The reason why is that you want to scale or transform them so that they are as useful as possible for predicting volume. There are tricks won through experience that depend on the quantity. As an example, weather often affects sales. Papa Murphy’s pizza offers discounts in the summer based on daily high temperature, in recognition of this fact. When average weekly temperature is used as an independent variable, it is often mean centered – that is, the average is subtracted from each week. Variation from average is more useful than a string of values in the 70s. A log transform may be applied in other cases, and so on.
Once the data is prepared, the regression comes out and estimated volumes come out based on the model. It’s tempting to simply compare the estimated and actual volumes: if they’re close, the model’s good. This can be done using standard statistical measures such as R^2 or MAPE. Evaluating a model solely on fit is a very bad idea. The biggest reason is that you risk overfitting. Overfitting happens when there are too many independent variables in the model, so that you are no longer modeling the underlying phenomena causing sales. I can get amazing R^2 for any marketing mix model by simply adding a bunch of independent variables with random noise. Random bumps in the series will happen to line up with portions of sales, and just like a big room full of monkeys accidentally banging out Shakespeare, out comes great fit. Overfitting is often more subtle, for example by trying to account for differences in regions, channels, stores, and so on. Each new variable by itself may seem reasonable but collectively the model becomes overspecified. The guard against this is to take holdout samples: randomly pull some percentage of the independent variable data prior to estimating. Then measure fit on both the modeled data and the holdout sample. If the fit on the modeled data is great but poor for the holdout sample, you’re overfitting.
Another important consideration is the level of data aggregation. A simple rule of thumb is to try and get all of data at the level of aggregation where it occurs in real life, and model at the lowest common level. If you can’t do that, aggregate up until you can. This implies that for MMM it would be great if I could get individual sales data for everyone in my modeling universe, along with information about all of their media exposure, the grocery store features they were exposed to, the coupons they received, etc. Not bloody likely, even with NSA assistance. And even if I could obtain this data, it might be difficult to clean, prepare and model at this level. Grocery store scanners yield very accurate store level sales data, therefore store level models are frequently used in the US for brands that are sold in grocery stores. In other situations a market level model is more appropriate. The danger of modeling at a higher level of aggregation is that we lose variation in the data, and therefore predictive power. This is easiest to see in the time dimension. Consider a TV ad campaign where we run ads for the first two weeks of a month, and then pull them for the last two. When viewed at the biweekly level, TV activity zigzags up and down in predictable fashion. When viewed at the monthly level, TV activity is uniform and would therefore be useless in an MMM.
A last (underrated) consideration is reasonableness. Does the result makes sense? This seems obvious but when the amount of input and output are considered, this can be laborious and tricky. Looking at different “pivots” of the output results is often helpful. Something is reasonable only with respect to a convention. The convention in this case can come from past models, industry norms, or even the opinion of the client. The latter is dangerous because there can be considerable pressure to bend the model so that the result is exactly what the client expects. Modeling is complicated and it’s usually pretty easy to second guess details at any step in the process, so the safe bet is simply to tell the client what they want to hear. Don’t do that! And don’t tell them what you want them to hear – tell them what they need to hear, based on facts. Reasonableness assessments are intended to ferret out flaws in data preparation or modeling, not to reject uncomfortable truths.