Perhaps you’re in the process of commissioning an econometric modelling project and want to avoid the pitfalls. Or maybe your project is well underway, numbers are coming at you, but you’ve got a sneaking suspicion that all is not quite as it should be.
Econometric modelling is a technical area and sometimes a little daunting. In this piece, we identify some of the most common errors, misunderstandings or plain old bad models that you may well encounter.
Statistics are the most obvious things an econometrician should check. Here are the key ones.
R2 is not the be-all and end-all, not by a long way, but it is the only statistic that clients routinely ask for. R2 is a summary measure of ‘goodness of fit’: how well the model that’s been built fits the data you’ve got, 0% being not at all and 100% being perfectly. The problem is that it’s a non-diminishing function of the number of variables in your model. In other words, the more variables you add to a model, the higher the R2!
It’s very easy for unwitting junior analysts (or unscrupulous senior ones!) to get a high R2, simply by dumping dozens of variables into your model. The problem is that such a model appears to account for the data well, but doesn’t explain anything. Also, what’s ‘good’ in one context may be less good in another. Typically, FMCG/CPG sales models have an R2 of 90-95%. If you’re modelling a ‘soft’ metric (e.g. awareness, consideration), 70% may be as good as it gets. In summary, watch out for models with dozens of variables – make your modeller explain why they’re there and what they tell you.
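The variable-dumping trick is easy to demonstrate. Here is a minimal sketch on made-up data (all names and numbers are invented for illustration): a sales model with one genuine driver, then the same model with twenty columns of pure random noise bolted on. In-sample R2 can only go up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# One genuine driver of sales, plus noise in the sales themselves.
price = rng.normal(0, 1, n)
sales = 50 - 3 * price + rng.normal(0, 1, n)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_small = r_squared(price.reshape(-1, 1), sales)

# Now "improve" the model with 20 variables of pure random noise.
junk = rng.normal(0, 1, (n, 20))
r2_big = r_squared(np.column_stack([price, junk]), sales)

# Adding regressors never lowers in-sample R^2, however meaningless they are.
assert r2_big >= r2_small
```

The second model looks better on paper, yet the extra twenty variables explain nothing about how sales actually work.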
T-statistics tell you how much confidence we can have that a particular number from the model, for example the response of sales to TV advertising, reflects a genuine effect rather than an accident of the sample. Think of it as the difference between having a hunch about the right answer and being almost 100% sure. The number in question may be the same in both cases, but you’d rather have numbers of the second sort. That, in effect, is what t-statistics tell you. And the higher the statistic, the more confidence you can have in the number.
However, t-statistics are purely a statistical measure. Bad modellers will sometimes include variables with sky-high t-statistics that have no business being in your model, and for which there is no credible explanation. Conversely, good modellers will sometimes retain, as justified exceptions, variables with low t-statistics because they make good economic sense.
The chances are that you won’t be offered your t-statistics by default (your consultant won’t want to bog you down with detail), so make sure you ask for them, and make sure they’re high (greater than 2 is a minimum). Beware of variables that have high t-statistics and make little sense. Make sure that any variables with low t-statistics are justified.
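For the curious, a t-statistic is simply the estimated coefficient divided by its standard error. The sketch below, on simulated data (all figures invented), shows the calculation for a model with one genuine driver and one junk variable: the genuine driver clears the t > 2 bar comfortably.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 104  # two years of weekly data

tv = rng.normal(0, 1, n)    # standardised TV pressure (genuine driver)
junk = rng.normal(0, 1, n)  # a variable with no business being here
sales = 100 + 5 * tv + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), tv, junk])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ beta

sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance matrix
t_stats = beta / np.sqrt(np.diag(cov))     # coefficient / standard error

# The genuine TV effect is estimated with real confidence...
assert abs(t_stats[1]) > 2
# ...whereas the junk variable's t-statistic will typically sit near zero.
```

If your consultant can’t show you a table like this for every variable in the model, ask why.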
One very useful check on model quality is the actual vs. ‘fitted’ data, where, by ‘fitted’, we mean the data predicted by the model. The better a model is, the more closely the fitted data should follow the actual data. (In fact, this is very closely related to the R2 statistic, see above.) When looking at fitted sales from a model vs. the actual sales, be aware of periods where one is consistently higher than the other, especially if this happens in periods where there is media activity. This can sometimes indicate that the media effect has been over-estimated (model > actual) or under-estimated (actual > model), or that another variable has been left out of the model.
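The same check can be done numerically rather than by eye. A hedged sketch on simulated data (the drivers and figures are invented): a model that wrongly leaves out a media burst will sit persistently below the actuals during the weeks of activity, and comparing average residuals on- vs. off-media flags the omission.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 104
media_on = np.zeros(n)
media_on[40:52] = 1  # a 12-week media burst
price = rng.normal(10, 1, n)

# True world: sales respond to price AND media.
sales = 200 - 5 * price + 30 * media_on + rng.normal(0, 3, n)

# Fit a model that (wrongly) leaves media out.
X = np.column_stack([np.ones(n), price])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
fitted = X @ beta
resid = sales - fitted  # actual minus fitted

gap_on = resid[media_on == 1].mean()
gap_off = resid[media_on == 0].mean()

# Actual > fitted throughout the burst: the tell-tale of an omitted driver.
assert gap_on > gap_off
```

In practice you would eyeball the actual-vs-fitted chart first; this is simply the same diagnostic made explicit.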
A model is exactly that – a model. It is intended to be a simplified representation of reality that retains only materially important factors. A model – any model – is just one such representation, one way of making sense of the world. So, when you’re reviewing models, be sure to apply liberal doses of common sense, and engage actively with the responsible consultant. Good consultants will be able to engage with you without being precious, and this process will improve the model and increase your confidence in it. The converse – staying quiet – will lead the model results to be quietly undermined behind the modeller’s (and maybe even your) back, which is the kiss of death where analytics are concerned.
There are natural limitations to what econometrics can do. National models CANNOT pick up small localised spends, so that £5,000 poster campaign around Sainsbury’s in Sheffield, running at the same time as the flagship £2m national TV re-launch campaign and a BOGOF promotion, will simply not be picked up by an econometric model. If your consultant is saying otherwise, they’re probably just picking up noise (random variation) in the data and misattributing it. That’s great in the short term (conclusion: £5,000 poster campaigns WORK), but is bound to surprise you in a bad way the next time around.
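This is a signal-to-noise problem, and it can be made concrete. In the sketch below (invented numbers throughout), a modest local uplift is buried in ordinary week-to-week variation in national sales: the standard error on the estimated lift is many times larger than the true effect, so any number the model produces for it is noise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 104
base = rng.normal(10_000, 500, n)  # national weekly sales: noise sd of 500 units

local_week = np.zeros(n)
local_week[30] = 1    # the week of the local poster campaign
true_lift = 20        # a plausible lift from a tiny local spend
sales = base + true_lift * local_week

X = np.column_stack([np.ones(n), local_week])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ beta
sigma2 = resid @ resid / (n - 2)
se_lift = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# The uncertainty dwarfs the effect: the campaign is unmeasurable nationally.
assert se_lift > abs(true_lift)
```

Measuring effects like this needs regional or store-level data, not a national model.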
Occasionally we have the privilege of seeing presentations of modelling work from suppliers who purport to have found incredibly subtle, rich and varied effects – microscopic cross-price elasticities with very distant competitors, subtle pack design changes, shelf wobblers vs. barkers and so on. ‘Incredible’ is the operative word here. For, on digging deeper, it almost always turns out that these effects reflect the fancies of the responsible modeller, rather than a serious attempt to understand how the world works. It looks great on paper (‘these guys can measure EVERYTHING!’) but once again it is bound to disappoint in the long-run.
Another classic watch-out is over-reliance on trends in models. In general, econometricians use trends as a last resort to capture any underlying movement which they can’t explain in terms of the variables they have. The trends may be labelled ‘long-term media trend’ or ‘healthy eating trend’ or anything else that’s plausible. Whilst these may be valiant attempts to interpret the uninterpretable, don’t lose sight of the fact that they are just that – interpretations. Again, good consultants will be open with you about their assumptions so that you are clear about the evidence base on which their conclusions rest. In the worst cases, trends are used by less diligent modellers to save themselves the time and effort it takes to understand how the variables actually interact.
Modelling at a brand level (i.e. combining all relevant products within a brand) can appear attractive, but there are two dangers here. Firstly, even closely related products may be priced and promoted very differently, in which case combining them into a single model will give poorly defined (low t-statistics) averaged effects. Secondly, combining products loses information which is often valuable, e.g. interactions between products. If you’re going to combine products make sure you’re combining things that you expect to behave similarly, and that don’t need the information you’ll lose. Nothing is more painful than commissioning an aggregate, brand-level model to answer THE big question, and then realising that to address all the secondary questions that this answer prompts you need disaggregated, product-level models.
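The averaging danger is easy to illustrate. In this sketch (two invented SKUs, invented numbers), one product is strongly price-sensitive and the other barely at all, and each is promoted on its own schedule. Product-level models fit each tightly; a brand-level model of total sales against the average price fits markedly worse, because the two distinct responses have been blurred into one.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 104
p1 = rng.normal(0, 1, n)  # standardised prices, promoted independently
p2 = rng.normal(0, 1, n)

s1 = 100 - 8 * p1 + rng.normal(0, 1, n)  # strongly price-sensitive SKU
s2 = 100 - 2 * p2 + rng.normal(0, 1, n)  # weakly price-sensitive SKU

def resid_sd(X, y):
    """Residual standard deviation of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ beta).std()

# Product-level models: tight fits around each SKU's own response.
sd1 = resid_sd(p1, s1)
sd2 = resid_sd(p2, s2)

# Brand-level model: total sales vs. the average price across SKUs.
sd_brand = resid_sd((p1 + p2) / 2, s1 + s2)

# The aggregated model leaves far more unexplained than either product model.
assert sd_brand > max(sd1, sd2)
```

If the two products were priced and promoted in lockstep, the brand model would do fine; it is precisely when they behave differently that aggregation destroys the information you paid for.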
1. Get the statistics – get your consultant to explain the numbers so you understand what they mean and their limitations
2. Use common sense – everything in a model should be translatable into plain English, or else it’s not much use. What does it mean, what is it saying, does it ring true?
3. Be clear on what you want to find out from the outset – clarity gives you the maximum chance of reaching your goals at the end of the project.
Talk to us about your bad models. We’re happy to advise you on modelling best practice.
Piquant only builds good models. Moreover, we only build the models necessary to address your business challenges. We start with your problems and develop suitable solutions, using analytical techniques in innovative ways to help you do the best marketing possible.