If you ask my co-workers to describe me, there is a solid chance you would hear the phrase “map geek.” My love for maps springs from how incredibly insightful and useful I find them to be. I frequently work with investor-owned utilities to analyze participation in their energy efficiency programs and, when starting these projects, I almost always recommend examining program data using geographic analysis (or spatial analysis, as it is now called by those of us in the field). Even simple spatial analyses illustrate complex relationships, characterize the real-world experiences of customers, and make data more accessible for all parties. Developing these analyses can allow utilities to micro-target marketing efforts, predict future program participation, and develop new programs that cater to specific customer needs.
How you actually go about completing a spatial analysis, however, drastically affects the usefulness of the results. The geographic unit of analysis used to bind your data together determines how well the analysis represents the customer population. Using the wrong unit can not only skew the conclusions but ultimately lead to program decisions that don’t effectively address customers’ needs.
Not all units are created equal, and there is none more vilified in my mind than the ubiquitous, seemingly innocuous zip code.
I know what you must be thinking: Every customer has a zip code, zip codes are often included in customer databases, and you can group customers together without any further data processing. Zip codes are clearly the easiest way to spatially analyze data. Additionally, many companies that request zip codes as the unit of choice have seen zip code spatial analyses performed in the past, and they would like to be able to compare the analyses over time. Skipping further data processing and aligning with past analyses should mean that using zip codes will deliver the most bang for your buck, right? I get it. I really do.
But before you order up that research, here’s what I want you to know about zip codes:
- Five-digit zip codes are a fairly arbitrary grouping chosen by the U.S. Postal Service in the 1960s to more easily deliver mail. Because of that goal, places that receive more mail, such as universities or large businesses, may receive their own individual zip code. By any other metric, zip codes are not uniform. They do not contain a set number of households, do not span a uniform amount of geographic area, and are not shaped to conform to any particular geographic feature.
Because of this, you end up with zip codes like 79936. This beastly zip is located just outside of El Paso, Texas, and contains more than 114,000 residents.
Now compare that to 05901, which contains just 19 people. 
At the time I examined these populations, I found that there were more than 100 zip codes with a population of less than 20 people, and there were more than 100 zip codes with a population over 70,000 people. This means that zip codes with extremely small or extremely large populations are not just a fluke. If you use zip codes to analyze your data, you may be comparing one unit to another that is up to 6,000 times its size (114,000 residents versus 19). This is not what I’d call the ideal “apples to apples” comparison.
- There is also an issue with zip code accuracy. Customers often report extremely outdated zip codes as part of their address, and zip codes change at the whim of the postal service. This means that to analyze your data at the zip code level, you would need to (1) pick a version of the zip codes you consider the “true” zip codes, and then (2) map every customer’s address without relying on the reported zip code and trace each customer back to the map of “true” zip codes. Fixing these errors takes a good deal of time and can eat into the budget of a project.
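Step (2) above is, at its core, a point-in-polygon lookup: take each customer's mapped location and find which boundary contains it. The sketch below illustrates the idea in plain Python with a standard ray-casting test. The boundaries, the labels `ZIP_A` and `ZIP_B`, and the customer coordinates are all made up for illustration; a real project would more likely use a GIS package and published boundary files, and this toy version does not handle points that fall exactly on a boundary edge.

```python
from typing import Optional

Point = tuple[float, float]   # (x, y), e.g. longitude and latitude
Polygon = list[Point]         # boundary vertices in order


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_area(point: Point, boundaries: dict[str, Polygon]) -> Optional[str]:
    """Return the label of the first boundary containing the point, or None."""
    for area_id, polygon in boundaries.items():
        if point_in_polygon(point, polygon):
            return area_id
    return None


# Hypothetical "true" boundaries (simple rectangles) and customer locations.
boundaries = {
    "ZIP_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "ZIP_B": [(2, 0), (5, 0), (5, 2), (2, 2)],
}
customers = {
    "cust_1": (1.0, 1.0),
    "cust_2": (3.5, 0.5),
    "cust_3": (9.0, 9.0),  # falls outside every boundary
}

assigned = {name: assign_area(loc, boundaries) for name, loc in customers.items()}
print(assigned)
```

Because the assignment depends only on the mapped location, the same lookup works for any set of boundaries, which is part of why storing household-level locations keeps the data flexible.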
My advice is to think about the full life cycle of your data and to consider performing an analysis at the smallest “unit of analysis.” In a spatial analysis, the unit of analysis is the boundary box you plan to put around your data. For most organizations, customer data is harnessed in many departments and for many purposes. Though examining customer data at the zip code level may seem to serve your needs, it may not serve your entire organization’s needs. To make the data as nimble and widely useful as possible, I advise using a unit of analysis that has no boundary box (i.e., one household or one person). Analyzing spatial data at the household level allows you to develop a detailed understanding of customer behavior that is not muddied by boundaries and that can be used in a wide variety of applications.
If you can’t get your hands on individual household data, I recommend choosing a unit that groups data together in a logical manner. I prefer to use census block group (CBG) data. The CBG unit was developed by the U.S. Census Bureau and contains between 600 and 3,000 people. The bar graph below compares the minimum and maximum populations of zip codes versus CBGs.
CBGs also try to map to natural geographic features or known neighborhood boundaries, which helps CBGs more effectively characterize their contained populations. Customer groups tend to cluster on one side of natural neighborhood barriers such as highways or rivers. Utilities can more effectively micro-target customers, predict customer behavior, and effectively design new programs if they use a spatial analysis unit that aligns with natural customer groupings.
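As a rough illustration of grouping household records by CBG, the sketch below assumes each household has already been matched to a 15-digit census block identifier (GEOID). That matching step is an assumption, not something described here. Under the Census Bureau's GEOID scheme, the first 12 digits of a block identifier (state, county, tract, and block group) identify the CBG. The identifiers and participation flags below are invented for the example.

```python
from collections import defaultdict


def block_group_id(block_geoid: str) -> str:
    """Return the 12-digit block group GEOID from a 15-digit block GEOID."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:12]


def participation_by_block_group(households):
    """Compute the share of participating households in each block group.

    `households` is an iterable of (block_geoid, participated) pairs.
    """
    totals = defaultdict(int)
    participants = defaultdict(int)
    for block_geoid, participated in households:
        cbg = block_group_id(block_geoid)
        totals[cbg] += 1
        if participated:
            participants[cbg] += 1
    return {cbg: participants[cbg] / totals[cbg] for cbg in totals}


# Made-up household records: (census block GEOID, participated in program?)
households = [
    ("481410103221001", True),
    ("481410103221004", False),
    ("481410103222010", True),
    ("481410103222011", True),
]

rates = participation_by_block_group(households)
for cbg, rate in sorted(rates.items()):
    print(cbg, f"{rate:.0%}")
```

Keeping the household-level records and deriving the CBG on the fly means the same data can later be rolled up to a different unit without reprocessing.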
Selecting the proper unit of analysis is but one of the many ways to ensure that spatial analysis can benefit program evaluations.
See our spatial analyses in action:
For one of the nation’s largest natural gas utilities, we used spatial analysis to help them determine which sub-sectors of their customers had the greatest potential for energy savings. By using census tract data (one level up from a census block group), rather than zip codes, we were able to analyze datasets that represented our client’s customers, rather than groupings only meaningful to the postal service. Additionally, using census tract data for this study enabled us to link customer data to statewide environmental justice indicators, which provided richer data that could be used for a wider range of purposes than would have been possible with zip codes.