Approaching a Sales Forecast Problem (Part-2)
Prashant Brahmbhatt
September 12, 2020

Continuing with our previous blog post, we will now try to build some more models for our problem and compare their performance. Be sure to check out the first post here.


As the name suggests, ARMA (Auto-Regressive Moving Average) is a combination of the AR and MA models. We combine the two to get a more nuanced model that may give us better results.
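
To make the combination concrete, here is a minimal sketch of how an ARMA(p, q) series is generated: the AR part weighs past values of the series, and the MA part weighs past shocks. The function name `simulate_arma` and the coefficients are illustrative assumptions, not taken from the original notebook.

```python
def simulate_arma(phi, theta, shocks):
    """Generate x_t = sum(phi_i * x_{t-i}) + e_t + sum(theta_j * e_{t-j}).

    phi:    AR coefficients (weights on past values)
    theta:  MA coefficients (weights on past shocks)
    shocks: the noise terms e_t, passed in so the result is reproducible
    """
    series = []
    for t, shock in enumerate(shocks):
        # AR part: weighted sum of the most recent p values that exist so far
        ar_part = sum(c * series[t - i] for i, c in enumerate(phi, start=1) if t - i >= 0)
        # MA part: weighted sum of the most recent q shocks that exist so far
        ma_part = sum(c * shocks[t - j] for j, c in enumerate(theta, start=1) if t - j >= 0)
        series.append(ar_part + shock + ma_part)
    return series


# A single unit shock fed through an illustrative ARMA(1,1)
response = simulate_arma(phi=[0.5], theta=[0.4], shocks=[1.0, 0.0, 0.0, 0.0])
print(response)
```

With a single shock, the MA term affects only the step right after it, while the AR term keeps the effect decaying over later steps.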


For ARMA, we will begin with the more complex (higher-order) models and gradually step down to simpler ones.







Checking model significance with the log-likelihood ratio (LLR) test,

LLR test p-value for ARMA(3,1) vs. ARMA(5,1): 0.0

LLR test p-value for ARMA(5,1) vs. ARMA(5,2): 0.67

  • ARMA(5,1) is significantly better than ARMA(3,1)
  • ARMA(5,2) isn’t better than ARMA(5,1)
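
A log-likelihood ratio comparison like the one above can be sketched as follows. This is a minimal version that assumes the two models are nested and that each fitted result exposes its log-likelihood as `llf`, as statsmodels results do. The name `llr_test` and the log-likelihood values are made up for illustration.

```python
from types import SimpleNamespace

from scipy.stats import chi2


def llr_test(simple_fit, complex_fit, df=1):
    """Return the p-value of a log-likelihood ratio test for two nested models.

    Both arguments only need an `llf` attribute (the maximised log-likelihood).
    `df` is the number of extra parameters in the more complex model.
    """
    statistic = 2 * (complex_fit.llf - simple_fit.llf)
    # Survival function: probability of a statistic at least this large
    return chi2.sf(statistic, df)


# Made-up log-likelihoods, only to show the call
simple_fit = SimpleNamespace(llf=-100.0)
complex_fit = SimpleNamespace(llf=-90.0)
print(llr_test(simple_fit, complex_fit, df=2))
```

A small p-value suggests the extra parameters are worth keeping. Under this setup, comparing ARMA(3,1) with ARMA(5,1) would use `df=2`, and comparing ARMA(5,1) with ARMA(5,2) would use `df=1`.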

Comparison of model performance,


We observe that ARMA(5,1) is an adequate model, but it is still very similar to AR(5) and not an improvement over it. Another problem remains: we are still not able to account for the seasonality in the data. The pattern is still sneaking past our model and showing up in the error terms. We have to get a grip on this somehow, so we will now build a model that covers this seasonality.

We will not use the ARIMA model here: our series is already stationary, so we do not need the I (integrated) portion of the model.


Another upgrade on the ARMA model, the SARIMAX model has eight parameters. The X denotes an exogenous variable, one that can be used to explain some of the variance in our target variable. Since we are treating the problem as univariate, we will not be using the exog variable, which makes it a SARIMA model. And since our series is stationary, we drop the I as well, finally getting SARMA.

Format for SARIMAX: (p, d, q) (P, D, Q, s)

  • p: Trend Auto Regression
  • d: Trend Difference Order
  • q: Trend Moving Average
  • P: Seasonal Auto Regression
  • D: Seasonal Difference Order
  • Q: Seasonal Moving Average
  • s: Length of Cycle


Since the hike recurs every year and our periods are weekly, we should use 52 as the length of the cycle for SARMA: s = 52.

We fit similar order models as we did earlier and then get the following results.


We can clearly see the difference that our seasonal model makes in the results. The performance of the latter two models is more than acceptable. It corresponds to the actual values very well. This shows how crucial it is to capture the seasonality in the data.

To verify this we can compare the residual results as well.


We can see that the seasonal pattern in the error terms has been reduced a lot, signifying that it has been captured by our model.
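
Besides eyeballing the residual plot, one way to quantify leftover seasonality is the autocorrelation of the residuals at the seasonal lag. The sketch below uses synthetic residuals and a hypothetical helper `autocorr_at_lag`; it is not part of the original notebook.

```python
import numpy as np


def autocorr_at_lag(values, lag):
    """Sample autocorrelation of `values` at a single lag (lag >= 1)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Synthetic stand-ins: residuals that still carry a 52-week cycle vs. pure noise
rng = np.random.default_rng(2)
t = np.arange(520)
seasonal_resid = np.sin(2 * np.pi * t / 52)
clean_resid = rng.normal(size=520)

print("lag-52 autocorrelation, seasonal residuals:", round(autocorr_at_lag(seasonal_resid, 52), 2))
print("lag-52 autocorrelation, noise residuals:   ", round(autocorr_at_lag(clean_resid, 52), 2))
```

A value near zero at lag 52 suggests the yearly pattern is no longer sitting in the residuals.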


There are several methods for smoothing data. Smoothing removes some of the random noise so that the important patterns are emphasized. Some of the smoothing methods are the random method, random walk, moving average, simple exponential, linear exponential, and seasonal exponential smoothing.

We are going to implement exponential smoothing here.
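
To show the core idea, here is a minimal sketch of simple exponential smoothing written from scratch. The function name and the choice to start from the first observation are assumptions; the original notebook may well have used a library implementation or a seasonal variant.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    alpha lies in (0, 1]; smaller values smooth more aggressively.
    The first smoothed value is initialised to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(simple_exponential_smoothing([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

Because each smoothed value leans on the previous one, the output lags behind a rising series, which is one source of an offset between smoothed and actual values.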


We can see that the results of the smoothing are also reasonable in predicting the pattern correctly, but they are offset from the actual values, which makes this method somewhat less appealing to our taste.


In recent times, any conversation on time-series wouldn’t be complete without any mention of Facebook’s Prophet.

Prophet is an open-source additive model designed by Facebook that is best suited for time series with seasonality. It is fully automatic but also tunable. It is also remarkably fast compared with the other methods we tried. It can be used from both R and Python without any hassle.

It has three optional seasonality parameters: daily_seasonality, weekly_seasonality, and yearly_seasonality. Each can be set to True or False (the default, 'auto', lets Prophet decide from the data). We can configure them as per our time series. For our purpose, we can set yearly_seasonality to True.
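
A sketch of how this configuration might look in Python. The helper names `to_prophet_frame` and `fit_and_forecast` and the tiny weekly series are hypothetical. Prophet expects a frame with a `ds` (date) column and a `y` (value) column.

```python
import pandas as pd


def to_prophet_frame(series):
    """Convert a date-indexed Series into the two-column frame Prophet expects."""
    return pd.DataFrame({"ds": pd.to_datetime(series.index), "y": series.to_numpy()})


def fit_and_forecast(series, periods, freq="W"):
    """Fit Prophet with yearly seasonality only and forecast `periods` steps ahead."""
    # Imported here so the data-prep helper works even without prophet installed;
    # older releases use `from fbprophet import Prophet` instead
    from prophet import Prophet

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False,
    )
    model.fit(to_prophet_frame(series))
    future = model.make_future_dataframe(periods=periods, freq=freq)
    forecast = model.predict(future)
    # yhat is the prediction; yhat_lower / yhat_upper bound the uncertainty interval
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical weekly series, only to show the expected input shape
weekly = pd.Series(
    [10.0, 12.0, 11.0],
    index=pd.date_range("2020-01-05", periods=3, freq="W"),
)
print(to_prophet_frame(weekly))
```

With a real series of a few years of weekly data, `fit_and_forecast(series, periods=52)` would return one extra year of forecasts along with their bounds.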

So why don’t we try it on our data and see whether it lives up to the hype?


Pretty! Very pretty indeed! Definitely living up to the expectations. Be it the test results (between the saffron and green margins) or the future forecasts (from the green margin onwards), it shows a very reasonable prediction pattern along with the upper and lower bounds of the uncertainty interval.

In terms of speed, Prophet took 2–3 seconds to produce the above results, whereas SARMA took 13 seconds on the same amount of data. So yes, Prophet is faster in our case, confirmed!

Another advantage is that you can make a dynamic plot of the prophet model as well.


So concluding the modeling part, we can say that SARMA and PROPHET models are the ones that we could rely on the most for our problem.


Until now we have been modeling the overall sales aggregated across all departments; the series did not belong to any particular department or store. Now that we have chosen our models, we can model each department individually.

However, we cannot necessarily use the same model orders as we did earlier; we have to check again for appropriate orders for the new series of the department we are modeling. But now we know the drill, so it isn’t a herculean task anymore.


We can see how each department is essentially a new time-series in itself.

Note that we are still aggregating over the stores. We are not going to predict for each store separately here, only the department.
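
The per-department aggregation can be sketched with a pandas group-by. The column names `Store`, `Dept`, `Date`, and `Weekly_Sales`, the helper `department_series`, and the sample numbers are all assumptions for illustration; adjust them to match the actual dataset.

```python
import pandas as pd


def department_series(sales, dept):
    """Sum weekly sales across all stores for a single department."""
    # Column names here are assumed, not taken from the original data
    one_dept = sales[sales["Dept"] == dept]
    return one_dept.groupby("Date")["Weekly_Sales"].sum().sort_index()


# Tiny made-up frame: two stores, two departments, two weeks
sales = pd.DataFrame(
    {
        "Store": [1, 2, 1, 2, 1, 2],
        "Dept": [2, 2, 2, 2, 3, 3],
        "Date": pd.to_datetime(
            ["2012-01-06", "2012-01-06", "2012-01-13", "2012-01-13", "2012-01-06", "2012-01-06"]
        ),
        "Weekly_Sales": [100.0, 50.0, 120.0, 30.0, 10.0, 20.0],
    }
)
print(department_series(sales, dept=2))
```

Each call returns one date-indexed series per department, which can then go through the same order selection and fitting steps as the aggregated series.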

SARMA Performance

After sampling the series for an arbitrarily chosen department (say Department 2) and choosing the appropriate order, we got the following results,


We can see that SARMA is still holding very well given the new individual series.

PROPHET Performance

Now we can also see how prophet is doing with new series.


So Prophet is also doing very well. We can take both of these models forward: they are very likely to perform well for each of the departments, and we can also go down to store-level granularity if we wish.


In this post, we tackled a sales forecasting problem. We saw some of the initial processing of the data to prepare it for forecasting. We also looked at different models and how they perform, and walked through the process of selecting appropriate orders and complexities for those models.

We compared the results of models and also validated their performance in terms of residuals.

I hope this helps anyone trying to solve a similar problem. There is an additional Linear Regression model that has not been included here; you can check the complete IPython notebook here for the code of all of the above models and the Linear Regression as well.

Leave any suggestions or feedback if you like. Also if you are stuck somewhere, do reach out to me. Remember…

“Help will always be given, to those who ask for it!”

Until next time! Ciao!

