Home » Regression
Category Archives: Regression
Managers, by the end of this article, you will find that many of the statistics on which you rely could have the wrong conclusions, even if the sample is random and representative, and even if the analyst uses an academically-accepted method!
If there is nothing else I impart to you, I want to give you the idea that we help our stakeholders the best by insisting on the 2nd/3rd/nth opinion.
In this case study, you will find that 3 types of very brilliant, highly paid analysts will draw the WRONG conclusion, and it is up to people like you to offer new perspective towards a MORE ACCURATE conclusion. We get the best conclusions not with lone champions but with Champion Teams.
A Quick, Provocative Case
In the following data set, we will approach a familiar question: Do abrasive people earn more money? You have probably heard the bromide, “Nice guys [and gals] finish last.” It is a topic of decades-long (if not millennia-long) research.
Although I am using fictional data in this table, they might not be far from the truth in a number of organizations.
For the most thorough experience, please download the MS Excel Spreadsheet (XLSX, 2010+)
20 employees are randomly selected from a Strategic Business Unit (SBU). We will assume for the sake of argument that this sample is large enough and representative of the SBU. Salaries are in $1,000s, and an Abrasive person is marked with a “1,” while a Non-Abrasive person is marked with a “0.” So respondent #6 is an abrasive person who earns $43,000 annually.
Looking at these data, do you think abrasive behavior is associated with higher pay? Notice that most abrasive folk are at the top, and they predominantly have high salaries. Let’s see what 3 highly-paid analysts (possibly your hires) think.
Analyst #1: The Descriptive Statistics Analyst (Most Common Type)
- Abrasive people on average earn $42K in this sample.
- Non-Abrasive people on average earn $29.5K in this sample.
- Abrasive behavior is associated with $12.5K higher pay on average.
The typical business analyst at a typical consultancy (paid $50K+) might argue that people make more money because of abrasive behavior. What about someone more rigid?
Analyst #2: The More Rigid Statistician (More Academic, Higher Paid)
- Analyst #1 has failed to account for whether that $12.5K difference is significant. If salaries are volatile (have high standard deviations), then the pay differences could simply be by random chance!
- A 2-population hypothesis test indicates that there is only 1.2% chance that Non-Abrasives make less money through sheer dumb luck.
- Therefore there is overwhelming evidence to infer that Non-Abrasives do earn less money than Abrasives.
The Statistician who conducted an hypothesis test also thinks that Abrasive behavior is associated with higher pay. What about someone who uses computer-based skills, like linear regression?
Analyst #3: The Linear Regression Analyst (Might Work in your Sales Ops or Marketing Department or Thereabouts)
- Although a low R-squared accounts for only 25% of actual data variation, a variable P-value suggests the Abrasive variable has a strong linear relationship with Salary (never mind; your analyst might be trying to sound smart).
- It is very likely that being abrasive has a relationship with salary, and on average, being abrasive is associated with a $12.5K gain in salary, all else equal.
The Linear Regression Analyst also thinks that people make more money because of being abrasive (yes, I know none of these things actually prove cause). And yes, I have read a very expensive report in which someone used linear regression in similar fashion.
Well, 3 highly paid folks with educations and different methods all converge on the same conclusion. What do you think?
Here’s a Hint: They’re All Wrong!
Although these 3 are experts of numbers and methodologies, you the manager have your own expertise. Since you might focus on things besides data, what do you think warrants further inquiry?
Could it be possible that some of those people in the SBU were paid more simply because of different professions or positions?
In this case, we simply have 10 managers and 10 non-managers. Otherwise, no major professional difference exists. It seems that 9 out of 10 managers are abrasive anyway. Are we confounded (unable to distinguish being a manager or being abrasive), or can we do something more?
Since you thought of another variable, or factor, call back Analyst #3, who works in linear regression (which deals with variables). Inform the analyst of your input.
The New Results
- Being a manager is associated with $34.7K gain in salary, strictly for being a manager.
- Being abrasive is associated with $15.3K…loss in salary!
It seems that the managers were making money “in spite of” being abrasive, not “because of” being abrasive. At the risk of having mere outliers, the nice manager (respondent #10) was paid the highest here, and the nastiest non-manager (respondent #20) was paid the least.
The Taketa Takeaway
Analysts are expensive and educated, but taking their conclusions at face value, in spite of good sampling and their generally accepted methodologies, is dangerous. We need to employ diverse perspectives, including yours, in order to help our stakeholders the best. Analysis is not about an expert’s work but about an expert team. Why not have me on board to help you create it?
A fictional case study by Gregory Taketa. For non-data Managers, this case illustrates that your analysts will not be able to solve anything as complicated as weight loss using typical linear regression methods. Regression practitioners can use a MS Excel Data Set to approach this problem by transforming the variables with respect to growth and time.
I’m sure you or somebody you know has an interest in a healthier diet or lifestyle (I’ve lost 25 lbs. in 19 months by the time of this writing).
A friend of mine has a fantastic web site for those of you who are interested in low-carb recipes:
This site has many visitors, and you gotta visit at least once just for the clever name!
As you know, weight lifting is very helpful for weight loss because an increase in muscle mass, all else equal, yields higher fat burn. This burn happens even when you are not actively exercising, so in the diet world, this is comparable to passive income!
The Case: Estimating the Impact of Weight-Lifting
Our subject weighs 200 lbs. at the start of a diet. At the end of each week, he marks his weight and whether he did a diligent amount of weight-lifting (according to his health advisor) that week. As typical of any binary (dummy) variable, “1” indicates “yes,” and a “0” indicates “no.”
Although many other variables, including calorie intake versus metabolic requirements, influence body weight, we will assume for simplicity that these variables have remained fixed.
Key Case Facts
- For the first 8 weeks, our subject has lifted weights but notices no weight loss.
- After the 8th week, he gives up on lifting weights (he perceived no weight loss earlier).
- He exhibits some weight loss from weeks 9-16. This is because his gain in muscle mass takes time, about 8 weeks in this case.* He does not know about this.
*Disclaimer: I am not a fitness expert of any sort, and this case is simplified for the purpose of quick practice. I do not have factual evidence and do not suggest that 8 weeks is required to develop muscle mass.
Did the weight-lifting help our subject lose weight? How could you demonstrate it? What was the rate of weight loss for the given effort?
The purpose of this MS Excel-based Linear Regression tutorial is to teach a practitioner for a non-academic organization the basics necessary to form a model, test the model, and refine it. Secondary-school level algebra is required to read a model. Elementary statistics is needed to test a model. We cover the necessary skills step-by-step in the various colored sections.
To expedite the process, certain features which could be taught in an academic setting are omitted (e.g. a calculus and matrix-based proof of how to do linear regression).
By the end of this tutorial, you will be able to:
- Run linear models
- Attempt basic nonlinear models (e.g. power models, binary variables)
- Test the model statistically
- Test the model heuristically
- Apply the practice to business needs
I have written this tutorial on Excel so that you can easily see the data, formulas, charts, and commentaries all on one screen (instead of switching your head from screen to a book all the time).
To use a Linear Regression Package:
Please seek instructions for the respective toolkit to use their Regression application. I will not reinvent the wheel with my own set of instructions.
I have provided quizzes to test your understanding, since we never really know what we “don’t know.” Solutions are provided in a separate file. Good luck! -Gregory Taketa, Data Analyst Extraordinaire
LRWD Solutions (XLSX)