Linear Regression Model to Predict Unemployment Rates in Trinidad and Tobago — Part 2
Introduction
This article is a follow-up to my previous analysis of unemployment in Trinidad and Tobago. In Part 1, I highlighted all the steps in developing a linear regression model and performing diagnostics using Python. Although the model upheld all Ordinary Least Squares (OLS) assumptions, my analysis was not complete. In this article, I will focus on “Model Interrogation”, which is the process of answering one simple question: “Does this model make sense?” Even when a model does not violate any OLS assumptions and all independent variables are significant, you still have to determine whether the features genuinely explain the target (dependent) variable.
Model Interrogation
In my previous analysis, I stated that NPL, N_GAS, O_PROD and CEM_SALES are good predictors of U_RATE. However, is this true?
Even when two variables are related, this doesn’t automatically mean that one is causing the other. Remember the famous statement, “Correlation doesn’t mean Causation.”
Let’s take a deeper look at the independent variables in this model.
NPL
What this model states is that for every 1 percent increase in the NPL ratio, the average unemployment rate increases by 0.9709, holding all other variables constant. However, are changes in the NPL ratio affecting the unemployment rate or is it the other way around?
From my observations and basic macroeconomic knowledge, this is a classic case of Reverse Causality. If the unemployment rate increases, more people are without work and not receiving a salary. With no wages, these individuals are likely to fall behind on their loan payments, which leads to higher delinquency rates. Therefore, although the regression analysis shows that NPL is significant in this model (with a p-value < 0.05), it has to be removed: rather than NPL explaining unemployment, unemployment may be causing NPL.
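To illustrate the reverse-causality point, here is a minimal sketch using synthetic data (not the actual Trinidad and Tobago dataset). I assume, purely for illustration, that unemployment drives NPL through a made-up relationship, and then fit a simple least-squares line in both directions. The helper `ols_slope` and all the numbers are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the simple least-squares line of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(42)

# Assumption: unemployment CAUSES non-performing loans (synthetic values).
u_rate = [random.uniform(3.0, 7.0) for _ in range(200)]
npl = [0.5 + 0.8 * u + random.gauss(0, 0.3) for u in u_rate]

# Regressing in the "wrong" direction still yields a strong positive slope.
slope_u_on_npl = ols_slope(npl, u_rate)
# Regressing in the causal direction recovers roughly the true 0.8.
slope_npl_on_u = ols_slope(u_rate, npl)

print(f"U_RATE regressed on NPL: slope = {slope_u_on_npl:.3f}")
print(f"NPL regressed on U_RATE: slope = {slope_npl_on_u:.3f}")
```

Both regressions produce a clear positive slope, so the fit alone cannot tell you which variable is driving the other. That judgement has to come from domain knowledge.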
CEM_SALES
The linear model also shows that for every 1 tonne of cement sold, the mean unemployment rate falls by 0.4984 percent. But once again, I had to ask whether changes in cement sales were really affecting the unemployment rate. From an economic standpoint, cement sales are often a good proxy for construction sector activity in Trinidad and Tobago, and heightened construction activity often equates to the creation of more jobs. So, in reality, higher cement sales are a side effect of a boost in construction activity. Therefore, it is safe to say that cement sales do not directly impact the unemployment rate; construction sector growth does.
One way to see why the inclusion of cement sales does not make sense is to picture an unusual scenario in which construction sector activity is low but cement sales spike. Suppose construction projects are being delayed due to some legislative issue while, at the same time, there is mounting speculation about a shortage of cement. If contractors begin to panic buy, cement sales will increase even though there is no construction activity. Contractors are not hiring masons and other tradesmen for projects, so in this case the rise in cement sales will mislead the model into making inaccurate predictions. Therefore, CEM_SALES will need to be removed from this regression analysis.
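The panic-buying scenario can be sketched with synthetic data. In this hypothetical setup, construction activity drives both cement sales and unemployment; the model only sees cement sales. All coefficients, ranges, and the helper `fit_line` are assumptions made for illustration and are not taken from the real dataset.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

random.seed(0)

# Normal times: construction activity (hidden driver) moves both series.
construction = [random.uniform(50, 100) for _ in range(200)]
cement = [2.0 * c + random.gauss(0, 5) for c in construction]
u_rate = [10.0 - 0.06 * c + random.gauss(0, 0.2) for c in construction]

# The model is trained on the proxy (cement sales) only.
intercept, slope = fit_line(cement, u_rate)

# Panic buying: construction is low, yet cement sales spike.
low_construction = 50.0
panic_cement = 220.0
actual = 10.0 - 0.06 * low_construction      # what the true driver implies
predicted = intercept + slope * panic_cement  # what the proxy model says

print(f"Fitted slope on cement sales: {slope:.4f}")
print(f"Predicted U_RATE: {predicted:.2f}, actual U_RATE: {actual:.2f}")
```

Under these assumptions the model predicts a low unemployment rate because cement sales are high, while the true rate stays high because nobody is actually building. The proxy works only as long as it keeps moving with the thing it stands in for.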
N_GAS and O_PROD
Energy sector production can definitely affect the unemployment rate. Although the extraction of hydrocarbons is highly capital intensive, heightened energy-sector activity in Trinidad and Tobago tends to have a positive trickle-down effect on other sectors, like construction, distribution and other services. Also, natural gas output fuels activity in the downstream industries, which include the production of Liquefied Natural Gas (LNG), Ammonia, Urea and Methanol. Furthermore, around 87 percent of government revenue comes from the energy upstream market. During periods of high energy sector output, the government of Trinidad and Tobago profits heavily and this wealth is then invested into other industries to stimulate growth and create additional jobs.
Conclusion
Upon closer inspection, although Non-Performing Loans and Cement Sales are both statistically significant and all OLS assumptions are upheld, this linear regression model may run into problems when making predictions, because neither variable directly explains unemployment.
This shows that although your regression diagnostics may check out perfectly, this does not automatically translate into a viable model.
In the final part (Part 3) of my analysis, I will experiment with three additional variables which may have a more direct impact on the unemployment rate in Trinidad and Tobago. I will also demonstrate the importance of running diagnostics using the test data and provide my final thoughts on whether this linear regression model can be used to predict unemployment rates.