Author Archives: Gregory Taketa
Linear Regression Without Depression Tutorial
The purpose of this MS Excel-based Linear Regression tutorial is to teach practitioners in non-academic organizations the basics needed to form a model, test it, and refine it. Secondary-school algebra is required to read a model, and elementary statistics is needed to test one. We cover the necessary skills step by step in the color-coded sections of the workbook.
To expedite the process, certain topics that would be taught in an academic setting are omitted (e.g. a calculus- and matrix-based derivation of linear regression).
By the end of this tutorial, you will be able to:
- Run linear models
- Attempt basic nonlinear models (e.g. power models, binary variables)
- Test the model statistically
- Test the model heuristically
- Apply the practice to business needs
I have written this tutorial in Excel so that you can easily see the data, formulas, charts, and commentaries all on one screen (instead of constantly switching between a screen and a book).
To use a Linear Regression Package:
Windows/PC: MS Excel offers Regression among its Data Analysis tools, which appear once the Analysis ToolPak add-in is enabled.
Mac: StatPlus offers a Regression tool.
R: You can get R for free and use its built-in regression function, lm().
Please consult each toolkit's own instructions for running its Regression tool. I will not reinvent the wheel with my own set of instructions.
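If you are curious what any of these tools computes for a one-variable model, here is a minimal Python sketch of the least-squares line. The function name `fit_line` and the sample numbers are made up for illustration; they do not come from the tutorial workbook.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variation in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data, for illustration only.
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept, r_squared = fit_line(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend (R^2 = {r_squared:.3f})")
```

The slope and intercept play the same role as the coefficients in a regression package's output table, and R-squared is the same goodness-of-fit figure those packages print.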
I have provided quizzes to test your understanding, since we never really know what we “don’t know.” Solutions are provided in a separate file. Good luck! -Gregory Taketa, Data Analyst Extraordinaire
Download:
Linear Regression Without Depression (XLSX)
LRWD Solutions (XLSX)
Winning Formulas (Regression – Fun Introductory Group Exercise)

Welcome to Cask Studies, where you can properly age your skills without getting old. Even sour grapes can become fine wines here.
An inebriating exercise by Gregory Taketa for both Non-Data Decision Makers and Soon-To-Be Data-Based Decision Makers to quickly grasp how to use regression analysis (no serious math needed, don’t worry). This is also an excellent social exercise and similar to the fun Feedforward exercise developed by famous Executive Coach Marshall Goldsmith.
OBJECTIVE: You create your own “winning formulas” based on your life experience and then collect “winning formulas” from other people.
HOW TO WRITE YOUR WINNING FORMULA (for your convenience, a Game Piece is provided at the end of the PDF document):
- On the Left side of the formula, write an ACHIEVEMENT, a result or outcome you have achieved at some point in your life.
- “Got a 3.8 GPA” is an achievement.
- “Got married to a beautiful spouse” is an achievement, though some couples question that years later…
- “Received a promotion” is an achievement.
- “Worked 40 hours a week” is NOT an achievement. It is an input.
- “Exercised 3 times a week” is NOT an achievement. That is an input, too.
- An ACHIEVEMENT, simply, is something you can succeed or fail to get. An INPUT, in contrast, is something you either do or do not do.
- On the Right side of the formula, write 3-5 INPUTS which you think led to getting that ACHIEVEMENT.
- For example, a friend won a lot of scholarship money (ACHIEVEMENT) because he applied a lot (Input #1), told stories (Input #2), and did volunteer work (Input #3).
- Fewer than 3 Inputs is not descriptive enough, while more than 5 Inputs is overkill (for comparison, your credit scores are usually measured with 5-7 Inputs).
- For each Input, give a POWER score, such that higher scores indicate higher importance.
- For example, if you have 4 Inputs, you might give your most important Input a score of 4, and your least important Input a score of 1.
- Ideally, no 2 Inputs have the same score, so ranking might be a safe way to score.
- Be creative. You might not have as much fun writing formulas like “I made a lot of money by working hard (Score 3), studying hard (Score 1), and making lots of friends (Score 2).”
- Don’t worry too much about the accuracy of your choices. This exercise is for you, and nobody is judging.
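A Winning Formula has the same shape as a regression equation: one outcome on the left, several weighted Inputs on the right. As a rough sketch of the guidelines above, here is how a Game Piece might be represented in Python. The class name `WinningFormula` and its methods are hypothetical, and the example reuses the "made a lot of money" formula from the list.

```python
from dataclasses import dataclass


@dataclass
class WinningFormula:
    achievement: str
    inputs: dict  # maps each Input to its POWER score

    def problems(self):
        """List which of the exercise's guidelines this formula breaks."""
        issues = []
        if not 3 <= len(self.inputs) <= 5:
            issues.append("use 3-5 Inputs")
        scores = list(self.inputs.values())
        if len(set(scores)) != len(scores):
            issues.append("give each Input a different score")
        return issues

    def ranked(self):
        """Inputs ordered from most to least important."""
        return sorted(self.inputs, key=self.inputs.get, reverse=True)

    def __str__(self):
        terms = " + ".join(
            f"{self.inputs[name]} x ({name})" for name in self.ranked()
        )
        return f"{self.achievement} = {terms}"


formula = WinningFormula(
    "Made a lot of money",
    {"working hard": 3, "studying hard": 1, "making lots of friends": 2},
)
print(formula)
print(formula.problems())  # an empty list means the guidelines are met
```

In a real regression the scores are estimated from data rather than chosen by you, which is exactly the jump this exercise is meant to prepare you for.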
When you have 3 Winning Formulas listed, begin exchanging with other people.
RULES OF THE STOCK EXCHANGE:
- You give 1 formula, you get 1 formula.
- NO CRITIQUING. Especially you self-proclaimed experts! Everybody is at risk of attribution error, period. The goal of this exercise is to see others’ worldviews about achievement and their opinions of the meaningful factors. We’re not here to judge; we’re here to help each other.
- Asking clarifying questions is okay, but there is no room for opinions or arguments.
- After accepting the formula in a nonjudgmental manner, say “Thank you.”
- This exchange should last no more than 5-10 minutes.
ASSESSING YOUR PORTFOLIO, YOUR BOOK OF WINNING FORMULAS:
You now have an inventory of your successes and others’ successes. In a short exercise, you had nothing to lose and everything to gain. A host could set up the game so that whoever collects the most formulas wins a prize, but I think the true prize is collecting a portfolio of diverse winning formulas. On the Game Piece I provide, there are reflective questions to help you.
Cheers!
Electric Bill Savings (Regression – Binary, Subsampling)

A real case study by Gregory Taketa. Non-data managers can briefly read this document to see how data analysis helps in hidden ways. Meanwhile, regression practitioners can practice using binary and control variables on an MS Excel data set to approach a realistic problem.
A household of 4 in the San Francisco Bay Area is in the highest tier of electricity usage, and the members want to decrease their electric consumption to lower the utility bill.
Upon the advice of Pacific Gas and Electric Company (PG&E), the utility supplier, one member decides to unplug a number of appliances. The rationale is that many appliances still draw standby power while plugged in, even if they are not switched on. In general, expected savings are 5%, or about $60/year.
Unfortunately, implementation is not so simple. While one member unplugs many appliances, other members are disgruntled at having yet another (3-second) chore: plugging their appliances back in (e.g. the portable TV).
Because of this tradeoff, the energy conserver of the house has decided that if unplugging does not decrease consumption by a statistically significant amount, then the savings are too small to justify the inconvenience.
PG&E has made the following 6 years of electric consumption data available to the household (some information, e.g. account details, has been omitted to keep the household confidential). The author has also added data, including the number of people in the household for each month and the months of unplugging (3 months total):
Click Here to Download Data (MS Excel 2010+)
The relevant variables given in the monthly Source Data are as follows:
- kWh: the electric energy consumption measured in kilowatt-hours (output variable)
- People: the number of people living in the house for the respective month
- Unplug: a binary variable: “0” for when the Energy Conserver did NOT unplug the appliances, and “1” for when the Conserver did (again, 3 months total).
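To make the roles of these variables concrete, here is a minimal Python sketch of a regression of kWh on People and Unplug. The helper names `solve` and `ols` are my own, and the six months of numbers are invented and noise-free (they are NOT the household's data), so the fit simply recovers the equation used to build them.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize the squared error."""
    x = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(x[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented months: (People, Unplug) pairs and their kWh.
predictors = [(3, 0), (4, 0), (4, 0), (3, 0), (4, 1), (3, 1)]
kwh = [500, 600, 600, 500, 570, 470]  # built as 200 + 100*People - 30*Unplug

intercept, per_person, unplug_effect = ols(predictors, kwh)
print(f"kWh = {intercept:.1f} + {per_person:.1f}*People + {unplug_effect:.1f}*Unplug")
```

The coefficient on Unplug is read as the change in monthly kWh when Unplug switches from 0 to 1 with People held constant, which is why People acts as a control variable here.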
You may consider other variables to add to this data set.
The author has run a regression with 77 observations (more than sufficient) and has discovered that the experimental variable, UNPLUG, is not statistically significant.
Do you tell the Energy Conserver to give up, or is there more to the story?
EPILOGUE: The author is convinced that the Energy Conserver IS saving a material amount of energy and money; the data are not as representative as the Energy Conserver initially believed.
Cask Questions:
- What is a variable or set of variables you think needs to be included before you run a regression off the source data?
- Run your own regression model using all the data. Do the coefficients of your own variables make sense? Did the coefficient of any variable surprise you?
- You will likely find, as the author has, that Unplug is not a statistically significant variable. Given that a whopping 77 data points exist in the model, a conventional regression analyst could argue that the data are sufficiently representative, and the experimental variable is not meaningful. What might make you doubt this argument?
- Run a regression using 75 data points, from End Date June 25, 2008, to End Date August 25, 2014.
- Run a regression using 76 data points, adding End Date September 24, 2014.
- What do you notice about Unplug as you marginally add 1 data point?
- What is your advice to the household? How does the above exercise buttress your argument?
- How will you apply this to your own analyses in your career?
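If you would rather script the subsampling questions than rerun them by hand, the sketch below shows one way to refit the same model on successively longer samples and watch the t-statistic of Unplug. The helper names `invert` and `ols_t_stats` are hypothetical, and the ten months of numbers are invented (they are NOT the case data), so the printed values say nothing about the household; only the mechanics carry over.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_t_stats(rows, y):
    """Return (coefficients, t_statistics) for y = b0 + b1*x1 + ..."""
    x = [[1.0] + [float(v) for v in r] for r in rows]
    n, k = len(x), len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [yi - sum(c * v for c, v in zip(coefs, xi))
                 for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    # Standard error of coefficient i is sqrt(s2 * inv[i][i]).
    t_stats = [c / math.sqrt(s2 * inv[i][i]) for i, c in enumerate(coefs)]
    return coefs, t_stats


# Invented months in date order: (People, Unplug) and kWh.
rows = [(4, 0), (3, 0), (4, 0), (4, 0), (3, 0),
        (4, 0), (3, 0), (4, 0), (4, 1), (4, 1)]
kwh = [610, 495, 590, 605, 510, 600, 490, 595, 560, 575]

# Refit on the first n months, adding one month at a time.
for n in (len(rows) - 1, len(rows)):
    coefs, t = ols_t_stats(rows[:n], kwh[:n])
    print(f"n={n}: Unplug coefficient {coefs[2]:.1f}, t-statistic {t[2]:.2f}")
```

With the real data, you would load the monthly rows from the downloaded workbook in date order and choose the cutoffs named in the questions above.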
Zum Wohl!