Winning Formulas (Regression – Fun Introductory Group Exercise)
Welcome to Cask Studies, where you can properly age your skills without getting old. Even sour grapes can become fine wines here.
An inebriating exercise by Gregory Taketa that helps both Non-Data Decision Makers and Soon-To-Be Data-Based Decision Makers quickly grasp how to use regression analysis (no serious math needed, don’t worry). It is also an excellent social exercise, similar to the fun Feedforward exercise developed by the famous executive coach Marshall Goldsmith.
OBJECTIVE: You create your own “winning formulas” based on your life experience and then collect “winning formulas” from other people.
HOW TO WRITE YOUR WINNING FORMULA (for your convenience, a Game Piece is provided at the end of the PDF document):
 On the Left side of the formula, write an ACHIEVEMENT, a result or outcome you have achieved at some point in your life.
 “Got a 3.8 GPA” is an achievement.
 “Got married to a beautiful spouse” is an achievement, though some couples question that years later…
 “Received a promotion” is an achievement.
 “Worked 40 hours a week” is NOT an achievement. It is an input.
 “Exercised 3 times a week” is NOT an achievement. That is an input, too.
 An ACHIEVEMENT, simply, is something you can succeed or fail to get. An INPUT, in contrast, is something you either do or do not do.
 On the Right side of the formula, write 3-5 INPUTS which you think led to getting that ACHIEVEMENT.
 For example, a friend won a lot of scholarship money (ACHIEVEMENT) because he applied a lot (Input #1), told stories (Input #2), and did volunteer work (Input #3).
 Fewer than 3 Inputs is not descriptive enough, while more than 5 Inputs is overkill (your credit scores are usually measured with 5-7 Inputs).
 For each Input, give a POWER score, such that higher scores indicate higher importance.
 For example, if you have 4 Inputs, you might give your most important Input a score of 4, and your least important Input a score of 1.
 Ideally, no 2 Inputs have the same score, so ranking might be a safe way to score.
 Be creative. You might not have as much fun writing formulas like “I made a lot of money by working hard (Score 3), studying hard (Score 1), and making lots of friends (Score 2).”
 Don’t worry too much about the accuracy of your choices. This exercise is for you, and nobody is judging.
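For readers who like to see the rules written down, here is a minimal Python sketch of a Winning Formula as a small record. The class name `WinningFormula`, its methods, and the POWER scores given to the scholarship example are all hypothetical illustrations, not part of the Game Piece. Loosely speaking, the ACHIEVEMENT plays the role of a regression's output variable and the INPUTS play the role of its input variables.

```python
from dataclasses import dataclass


@dataclass
class WinningFormula:
    """One formula: an achievement plus inputs mapped to POWER scores."""
    achievement: str
    inputs: dict  # input description -> POWER score (higher = more important)

    def problems(self):
        """Return guideline violations; an empty list means the formula is fine."""
        issues = []
        if not 3 <= len(self.inputs) <= 5:
            issues.append("use 3 to 5 inputs")
        scores = list(self.inputs.values())
        if len(set(scores)) != len(scores):
            issues.append("no two inputs should share a score")
        return issues

    def ranked(self):
        """Inputs ordered from most to least important."""
        return sorted(self.inputs, key=self.inputs.get, reverse=True)


# The scholarship example from the text; the scores are made up for illustration.
scholarship = WinningFormula(
    achievement="Won a lot of scholarship money",
    inputs={"Applied a lot": 3, "Told stories": 2, "Did volunteer work": 1},
)

print(scholarship.problems())
print(scholarship.ranked())
```

Ranking the inputs, as suggested above, automatically satisfies the "no two Inputs share a score" guideline.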
When you have 3 Winning Formulas listed, begin exchanging with other people.
RULES OF THE STOCK EXCHANGE:
 You give 1 formula, you get 1 formula.
 NO CRITIQUING. Especially you self-proclaimed experts! Everybody is at risk of attribution error, period. The goal of this exercise is to see others’ worldviews about achievement and their opinions of the meaningful factors. We’re not here to judge; we’re here to help each other.
 Asking clarifying questions is okay, but there is no room for opinions or arguments.
 After accepting the formula in a nonjudgmental manner, say “Thank you.”
 This exchange should last no more than 5-10 minutes.
ASSESSING YOUR PORTFOLIO, YOUR BOOK OF WINNING FORMULAS:
You now have an inventory of your own successes and others’ successes. In a short exercise, you had nothing to lose and everything to gain. A host could turn this into a game in which whoever collects the most formulas wins a prize, but I think the true prize is a portfolio of diverse winning formulas. On the Game Piece I provide, there are reflective questions to help you.
Cheers!
Electric Bill Savings (Regression – Binary, Subsampling)
A real case study by Gregory Taketa. Non-data managers can briefly read this document to see how data analysis helps in hidden ways. Meanwhile, regression practitioners can practice using binary and control variables with an MS Excel data set to approach a realistic problem.
A house of 4 in the San Francisco Bay Area is in the highest tier of electricity usage, and the members desire to decrease their electric consumption to lower the utility bill.
On the advice of Pacific Gas and Electric Company (PG&E), the utility supplier, one member decides to unplug a number of appliances. The rationale is that many appliances still draw residual power while plugged in, even when they are switched off. In general, expected savings are 5%, or about $60/year.
Unfortunately, implementation is not so simple. While one member unplugs many appliances, other members are disgruntled at having yet another (3-second) chore: plugging their appliances back in (e.g. the portable TV).
Because of this tradeoff, the energy conserver of the house has decided that if unplugging does not decrease consumption by a statistically significant amount, then the savings are too immaterial to justify the inconvenience.
PG&E has made the following 6 years’ data of electric consumption available to the household (some information, e.g. account information, has been omitted to maintain confidentiality of the household). The author has also added data, including the number of people in the household for any given month and the months of unplugging (3 months total):
Click Here to Download Data (MS Excel 2010+)
The relevant variables given in the monthly Source Data are as follows:
 kWh: the electric energy consumption measured in kilowatt-hours (output variable)
 People: the number of people living in the house for the respective month
 Unplug: a binary variable: “0” for months when the Energy Conserver did NOT unplug the appliances, and “1” for months when the Conserver did (again, 3 months total).
You may consider other variables to add to this data set.
The author has run a regression with 77 observations (more than sufficient) and has discovered that the experimental variable, UNPLUG, is not statistically significant.
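As a rough illustration of what such a regression looks like, here is a minimal Python sketch using NumPy. It is not the author's Excel model. The numbers are randomly generated placeholders, not the household's PG&E data, and the helper name `ols` as well as the assumption that the unplugged months are the last 3 rows are hypothetical. To try it on the real data, replace the three arrays with the columns from the downloaded workbook.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, t-statistics); index 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]               # observations minus parameters
    sigma2 = resid @ resid / dof            # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


# PLACEHOLDER data: 77 made-up months, NOT the real household data.
rng = np.random.default_rng(0)
n = 77
people = rng.integers(3, 6, size=n).astype(float)   # 3 to 5 residents
unplug = np.zeros(n)
unplug[-3:] = 1.0                                   # assumed: last 3 months
kwh = 300 + 90 * people - 25 * unplug + rng.normal(0, 60, size=n)

beta, t_stats = ols(np.column_stack([people, unplug]), kwh)
for name, b, t in zip(["Intercept", "People", "Unplug"], beta, t_stats):
    print(f"{name:9s} coefficient={b:8.1f}  t={t:6.2f}")
```

As a rule of thumb, a t-statistic with an absolute value below roughly 2 is not significant at the usual 5% level. With only 3 months of Unplug = 1, the standard error on that coefficient tends to be large no matter how many other rows the data set has.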
Do you tell the Energy Conserver to give up, or is there more to the story?
EPILOGUE: The author is convinced that the Energy Conserver IS saving a material amount of energy and money; the data are not as representative as the Energy Conserver initially believed.
Cask Questions:
 What is a variable or set of variables you think needs to be included before you run a regression off the source data?
 Run your own regression model using all the data. Do the coefficients of your own variables make sense? Did the coefficient of any variable surprise you?
 You will likely find, as the author did, that Unplug is not a statistically significant variable. Given that a whopping 77 data points exist in the model, a conventional regression analyst could argue that the data are sufficiently representative and that the experimental variable is not meaningful. What might make you doubt this argument?
 Run a regression using 75 data points, from End Date June 25, 2008, to End Date August 25, 2014.
 Run a regression using 76 data points, adding End Date September 24, 2014.
 What do you notice about Unplug as you marginally add 1 data point?
 What is your advice to the household? How does the above exercise buttress your argument?
 How will you apply this to your own analyses in your career?
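The subsampling questions above (75, then 76, then 77 data points) can be sketched in Python as follows. This is only one possible setup: the data are randomly generated placeholders rather than the household's figures, the function name `unplug_estimate` is hypothetical, and the sketch assumes the rows are sorted by End Date with the 3 unplugged months at the very end.

```python
import numpy as np


def unplug_estimate(people, unplug, kwh):
    """Fit kWh ~ intercept + People + Unplug.

    Returns the Unplug coefficient and its t-statistic.
    """
    X = np.column_stack([np.ones(len(kwh)), people, unplug])
    beta, *_ = np.linalg.lstsq(X, kwh, rcond=None)
    resid = kwh - X @ beta
    sigma2 = resid @ resid / (len(kwh) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[2], beta[2] / se[2]


# PLACEHOLDER data: 77 made-up months sorted oldest to newest.
rng = np.random.default_rng(0)
n = 77
people = rng.integers(3, 6, size=n).astype(float)
unplug = np.zeros(n)
unplug[-3:] = 1.0   # assumption: the unplugged months are the 3 most recent
kwh = 300 + 90 * people - 25 * unplug + rng.normal(0, 60, size=n)

# Refit on the first 75, 76, and 77 rows and watch how Unplug moves.
results = {}
for size in (75, 76, 77):
    coef, t_stat = unplug_estimate(people[:size], unplug[:size], kwh[:size])
    results[size] = (coef, t_stat)
    print(f"n={size}: Unplug coefficient={coef:8.1f}, t={t_stat:6.2f}")
```

Each added row changes the number of Unplug = 1 months the model sees, so the comparison shows how much the verdict on Unplug rests on a handful of observations rather than on the full 77.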
Zum Wohl!
Pay Discrimination (Regression – Binary, Control Variables)
A fictional case study by Gregory Taketa. Non-data managers can briefly read this document to see how data analysis helps in hidden ways. Meanwhile, regression practitioners can practice using binary and control variables with an MS Excel data set to approach a realistic problem.
Babs, an experienced office worker, believes that she and her female counterparts are victims of gross pay discrimination based on gender.
Babs: “I’m telling you, Gregory, there’s blatant gender discrimination at my workplace!”
Me: “What is your evidence of that, Babs?”
Babs: “I’ve been hearing about so many men at the office who are paid over $50,000 a year while a number of experienced women are still being paid in the $40,000s.”
Me: “I can understand that you perceive an injustice at first glance. Have you ruled out other factors for differences in pay, including education, hours worked, and certain results achieved?”
Babs: “Well, nobody gets paid more for being more educated, since our office work does not require formal education to be excellent. Some have claimed greater experience as a justification to be paid more. We also have a scorecard filled out by the manager during our performance reviews.”
Eventually, Babs and I discuss possible key factors the management uses for pay. The manager agrees that these are meaningful factors in deciding salary, and a random sample of 30 employees (14 men & 16 women) is interviewed. The factors and some statistics are shown below:
| Variable | The Logic Behind the Variable | Men’s Average | Women’s Average |
|---|---|---|---|
| Salary | This is the output variable: the pre-tax total compensation for this year. | $53,529 | $46,513 |
| Merit (Composite Performance Score) | Higher weighted-average scores suggest more valuable results achieved in the eyes of management. The score assesses performance across several criteria. | 5.52 (out of 10) | 5.16 (out of 10) |
| Hours/Week | Whether you are highly talented or almost highly talented, hard work is valued. | 42.4 | 40.3 |
| Years in This Position at the Company | Employee commitment and experience. | 5.43 | 5.31 |
| Years in That Profession | Experience and skills overall. | 8.43 | 8.38 |
| # Raises | The more raises you were awarded, the higher your salary. Normally, the employee asks for a raise. | 2.71 | 2.06 |
Manager: “Babs, I think it’s quite clear that there is no gender discrimination here. As you can see, the men on average have been achieving better results, working harder, holding more experience, and asking for more raises. Our company’s business analysts and general counsel have confirmed this after seeing these statistics.”
Babs: “Gregory, is he right? Or did you interview the wrong people?”
Me: “The sample is fine, Babs. Although these averages are as your manager and advisors say, I have a feeling you still have a case.”
Very quickly, I demonstrated a $4,500 shortfall for each woman in the sample and gave the management a couple of easy suggestions to implement, including encouraging women to ask for more raises. Babs and her coworkers were thrilled to quickly receive a well-deserved $72,000 collectively.
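The arithmetic behind these figures is easy to check. The short Python sketch below uses only numbers stated in the case. The variable names are my own, and reading the $4,500 as a gap that remains after accounting for the other factors is an assumption about how the figure was derived.

```python
# Figures quoted in the case study.
men_avg_salary = 53_529
women_avg_salary = 46_513
women_in_sample = 16
shortfall_per_woman = 4_500

# Raw difference between the two group averages.
raw_gap = men_avg_salary - women_avg_salary

# Collective amount implied by the per-woman shortfall.
total_payout = women_in_sample * shortfall_per_woman

print(f"Raw gap in average salary: ${raw_gap:,}")       # $7,016
print(f"Collective payout:         ${total_payout:,}")  # $72,000
```

The raw gap in averages ($7,016) is larger than the $4,500 shortfall, which is consistent with part of the raw difference being attributable to the other variables in the table.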
Click Here to Download Data (MS Excel 2007+)
Cask Questions:
 Most data sets in real life tend to report qualitative data such as gender in text format (e.g. “Female,” “Male”). How would you change these data to work under a mathematical model such as regression?
 Many data analysts would be satisfied to have only 1 input variable for their regression. In this case, they would simply use the gender variable. Why is this a poor practice for this case?
 Although the “Merit” variable comes from subjective, ordinal data (ranked scores of 1-10), we often accept this as a necessity. What is the real problem behind the Merit score, and how could you quickly get management to mitigate that problem?
 Do any of those variables seem redundant or conflicting with each other? What can you do to make the analysis easier?
 Run a regression model using your own judgment.
 What sorts of statistical diagnostics can you use to check that your model has satisfactory support?
 What do you infer from your own findings?
 What advice would you give this company based on your findings?
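Several of the questions above (recoding gender, adding control variables, checking for redundancy, and basic diagnostics) can be sketched together in Python with NumPy. This is a hedged illustration, not the approach in the Excel file. The salary data are randomly generated placeholders, the effect sizes used to generate them are arbitrary, and the helper name `fit` is hypothetical.

```python
import numpy as np


def fit(X, y):
    """OLS with an intercept; returns (coefficients, t-statistics, R-squared)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    se = np.sqrt(np.diag(resid @ resid / dof * np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, beta / se, r2


# Step 1: recode text labels as a binary (dummy) variable.
gender_text = ["Male"] * 14 + ["Female"] * 16
female = np.array([1.0 if g == "Female" else 0.0 for g in gender_text])

# PLACEHOLDER data for 30 employees; NOT the case study's data set.
rng = np.random.default_rng(1)
n = len(female)
merit = rng.uniform(1, 10, n)
hours = rng.normal(41, 3, n)
years_position = rng.uniform(1, 10, n)
years_profession = years_position + rng.uniform(0, 6, n)
raises = rng.integers(0, 5, n).astype(float)
salary = (30_000 + 800 * merit + 200 * hours + 500 * years_profession
          + 1_500 * raises - 4_500 * female + rng.normal(0, 2_000, n))

# Step 2: gender alone versus gender plus control variables.
_, _, r2_gender_only = fit(female, salary)
controls = np.column_stack([female, merit, hours, years_profession, raises])
beta, t_stats, r2_full = fit(controls, salary)

# Step 3: a quick redundancy check between two experience variables.
overlap = np.corrcoef(years_position, years_profession)[0, 1]

print(f"R-squared, gender only:   {r2_gender_only:.2f}")
print(f"R-squared, with controls: {r2_full:.2f}")
print(f"Female coefficient: {beta[1]:,.0f} (t={t_stats[1]:.2f})")
print(f"Correlation of the two experience variables: {overlap:.2f}")
```

In the gender-only model, the gender coefficient absorbs every difference between the two groups, including differences in hours, experience, and raises. The controls are what allow the remaining gap to be read as "holding the other factors equal". A high correlation between two inputs is one sign that they may be redundant, and in that case dropping or combining one of them can make the model easier to interpret.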
My own approach is provided in the latter red tabs of the Excel file. There is more than 1 way to legitimately approach the problem, and I do not claim to have the “perfect” or “the absolutely right” method. However, we can examine ourselves and determine whether we have a satisfactory model and inferences. Did you learn anything new from my own example?
How likely is it that your own business analysts would agree with the manager upon seeing the statistics? What new value does the data analyst provide?
Skol!