# Using SQL to calculate nearby ZIP codes

## This tip shows how to use SQL to calculate nearby ZIP codes.

Any type of application that involves names and addresses inevitably involves ZIP codes as well. If you've built...

the data backend for such an application, you've probably dealt with databases containing ZIP codes by city/state or even by specific street address. But have you ever needed to find ZIP codes within a certain distance of a given ZIP code, city, or street address? Maybe your company needs a "find a store near you" function for their web site. Or maybe your marketing department has asked to generate targeted mailing lists from ZIP code-based customer demographic information from the corporate database. Where do you start when you need to calculate the distance between two ZIP codes, cities, or other geographical features? If you're like me, you start with spherical coordinates and a lesson in trigonometry! Don't worry, if you want to skip to the end, I promise you'll get all the information you need without the gory details.

#### The Data

The very first ingredient for building a ZIP code-oriented application is the appropriate data. At a minimum, we'll need the ZIP code and a set of latitude and longitude values for that ZIP code. This information is available from a number of commercial vendors, which provide a lot of added value in the form of demographic information and other location-based information garnered from both government and private source. A short search of the web turned up over a dozen vendors with prices ranging from the low hundreds to well over a thousand. Another alternative is to choose a completely free data set. The best one I found was part of the information available from the U.S. Census Bureau. The downside is that it is based on the 1990 census, only included 5-digit ZIP codes, and has limited additional information that is completely out of date. But it gives us a functional database for building this application. The database can be downloaded in text format from this site.

One crucial factor to determine about the latitude and longitude information is whether it is in degrees/minutes/seconds format or in the decimal equivalent--the decimal equivalent is what you want. The quick way to translate is

Decimal value = degrees + (minutes/60) + (seconds/3600)

#### The Formula

Once you've got the ZIP code and latitude/longitude data, we've got the spherical coordinates to describe a location. The next step is to determine the distance between any two locations using the spherical coordinates. You'd think this would be a straightforward process, but in reality it is fairly complex to come up with an extremely precise distance between two points on the surface of the earth. For an extremely good discussion of the problem, and for the source I used to build my code, you can refer to this document, which discusses the details of what is basically the problem aircraft pilots face every day. For our purposes of calculating the distance between any two ZIP codes in the US, we can settle for the accuracy of the Haversine formula:

Distance = 2R *arcsin (x)
where x=(sin((lat2-lat1)/2))^2 + cos(lat1) * cos(lat2) * (sin((long2-long1)/2))^2.

It is crucial to realize that the trigonometric functions of most databases are expecting arguments to be in radians instead of degrees. The best approach is to pre-calculate the latitude and longitude values for each location in the database in terms of radians so that the SQL stored procedure will have that many less calculations to perform. The formula for the conversion is

radians = 57.295*degrees (or 180/pi * degrees)

#### The Fine Print

The algorithm presented here works extremely well for calculating the distance between two ZIP codes in the US (or cities or other locations). But if you read the full discussion about the various ways to calculate the distance between two points on the surface of the Earth (http://www.census.gov/cgi-bin/geo/gisfaq?Q5.1), you'll notice that the Haversine formula does not work perfectly for all possibilities. As the mathematically-inclined would say, it "blows up" under certain circumstances which results in an undefined mathematical value and thus a runtime error in the calculation. For the purposes of ZIP codes however, there's no problem since the distances involved are well shy of the danger points. It's just important to realize that you'd need to use a more sophisticated approach to calculate the distance between any two arbitrary points on the surface of the Earth. As the article shows, there are simpler formulas, but for ZIP code calculations, the Haversine formula is a good tradeoff between accuracy and performance.

It is also important to understand a little bit about how ZIP codes are distributed on the map of the US. It would be nice to think that they are all regularly-sized polygons that reproducibly blanket the country and are easily related to the latitude and longitude...but of course real life is never that easy. ZIP codes are basically ways to describe postal routes. Many of them represent squares or ovals or other regularly-shaped polygons with an easily-defined center, but some, especially in rural areas, are shaped in an arc along a road or highway. Keep in mind that each database of ZIP codes and latitude/longitude values makes different assumptions about how to assign a physical location to a ZIP code--the value for any given ZIP code will vary the equivalent of a few blocks to a few miles depending on the shape of the ZIP code. This means that for closely spaced ZIP codes, the distances may be off by a large relative percentage. For most applications, this will not matter, but it is an important limitation to consider.

#### The Bottom Line

The goal of this article is to create a stored procedure to determine all of the ZIP codes within a certain radius of a starting ZIP code. We'll use the Haversine formula and a database of latitude/longitude and ZIP codes to make the calculation. I implemented this in SQL Server 7.0, but any database should be fine as long as it has the arcsin, cosine, and sine trigonometric functions and the ability to do square roots and raise values to a power (Microsoft Access is out, but MySQL, Oracle, and DB2 certainly work).

``` CREATE PROCEDURE zip_nearby_locations @lat1 float, @long1 float, @radius float AS SELECT (2*3956*ASIN( (SQRT( POWER(SIN(((rlat-@lat1))/2),2) + COS(@lat1) * COS(rlat) * POWER(SIN(((abs(rlong)-@long1))/2),2) )) ) ) AS dist, location, zipcode FROM Zipcodes WHERE (2*3956*ASIN( (SQRT( POWER(SIN(((rlat-@lat1))/2),2) + COS(@lat1) * COS(rlat) * POWER(SIN(((abs(rlong)-@long1))/2),2) )) ) ) < @radius ORDER BY dist GO
```

In this stored procedure, R (radius of the earth) is 3956 miles so our result comes out in miles. The fields rlat and rlong are the latitude and longitude in radians. This procedure basically calculates the distance between the given ZIP and every ZIP in the database and numerically sorts the ones within the given radius.

As an example, try determining the number of ZIPs within 1 mile of the White House (ZIP 20001). Using the public domain database from the Census Bureau, I got the following 6 results:

``` ZIP Location Distance 20001 WASHINGTON, DC 0.0 20405 ZIP CODE 20405 0.52719256096716194 20060 ZIP CODE 20060 0.70030985977018034 20005 WASHINGTON, DC 0.7223858318763472 20552 ZIP CODE 20552 0.75187311639892684 20038 WASHINGTON, DC 0.840488509229948
```

If you get different results, first check whether the trigonometric functions in your database expect radians (most do), and if so ensure that your latitude/longitude data is in radians. This is the most common problem other than a typo in the equation.

Based on a very rough profiling of the stored procedure, the majority of the processing time is the ORDER BY statement. If you don't need an ordered list of nearby ZIP codes, dropping this clause should maximize performance.

I've put the code on SourceForge as well as the database I used so that you can keep up with changes and discuss any problems with the code. It is fairly easy to take this starting point and use it and a good database of ZIP codes, locations, and demographic data to explore all sorts of possible functionality.

#### About the Author

John Paul Ashenfelter is CTO & President of TransitionPoint.com, a technology development and analysis group focusing on web database applications. He is the author of "Choosing a Database for Your Web Site" (John Wiley & Sons, Inc.) and the co-author of "ColdFusion 4 for Dummies" (IDG Books). He also founded and maintains the webDatabase.org Web site.

This was last published in February 2001

## Content

Find more PRO+ content and other member only offers, here.

#### Start the conversation

Send me notifications when other members comment.

## SearchDataManagement

• ### With better scaling, semantic technology knocks on enterprise's door

Cambridge Semantics CTO Sean Martin says better scalability can lead to richer representations of data. Such advances are behind ...

• ### Big data management and analytics weather tumult -- with more in store

Cloud had a big impact on big data management and analytics last year. Machine learning and streaming designs will contribute to ...

• ### Embedded analytics to feel widest impact of machine learning projects

Ovum analyst Tony Baer discusses machine learning tools, IoT-driven streaming analytics and Hadoop in the cloud, all of which ...

• ### Efforts to monetize data should be built for the long haul

Most companies have data monetization opportunities they could exploit, experts say. But a clear strategy and long-term plan are ...

• ### Debate over big data and privacy is just getting started

For years, the tension between privacy and big data has been apparent, but with emerging technologies generating huge amounts of ...

• ### Missions for monetizing data need lift from upfront groundwork

Organizations launching data monetization strategies should factor some key initial steps into their plans to develop ...

## SearchSAP

• ### SAP HANA, express edition makes HANA easier for non-SAP developers

Relaxed RAM and licensing requirements and the ability to run on local machines could help popularize the HANA in-memory platform...

• ### SAP futurists name five future technology trends to plan for now

Three SAP futurists explain what future technology trends in augmented reality, blockchain, AI, robotics, and contingent labor ...

• ### Abakus acquisition beefs up SAP Hybris marketing portfolio

SAP acquires Abakus marketing measurement software for SAP Hybris marketing suite; openSAP adds new courses for S/4HANA, data ...

## SearchSQLServer

• ### Options for scaling out SQL Server applications to boost workloads

Scaling out a database to meet the needs of a heavy processing workload can be a challenge. Here are details on the SQL Server ...

• ### Four trends that will impact SQL Server DBAs in 2017

Flash storage adoption, cloud computing's growth, Linux's increased importance and broader big data integration are a few trends ...

• ### DATEADD and DATEDIFF SQL functions for datetime values

DATEADD and DATEDIFF SQL functions allow you to easily perform calculations, like adding a time interval from a datetime value. ...

## TheServerSide

• ### Is JSON and XML your REST performance bottleneck?

Learn how ASCII encoding formats like JSON and REST can adversely impact server application performance compared with emerging ...

• ### Making multi-cloud deployment a reality at Netflix with Spinnaker

Learn how Netflix leveraged automation and Spinnaker to perform 4,000 multi-cloud deployments per day.

• ### The benefits of Java microservices in a Docker and DevOps world

To seasoned developers, microservices may sound like SOA by another name. But Java microservices apps delivered via Docker ...

## SearchDataCenter

• ### HPE-SimpliVity deal raises support, price and development questions

With HPE's buy of No. 2 SimpliVity -- the first big deal in the HCI space -- IT pros see a more robust offering, but also higher ...

• ### Converged infrastructure drop-off doesn't mean data center death

Traditional converged infrastructure has been supplanted by hyper-converged infrastructure and cloud computing, but it remains a ...

• ### New options to evolve your data backup and recovery plan

The server backup market first evolved to protect VMs, but now it's undergoing another transformation. Find out how it's evolved ...

## SearchContentManagement

• ### Why SharePoint Framework could benefit business, not just developers

SharePoint Framework heralds a new, mobile-friendly, low-code era for Microsoft, which may give some organizations without .NET ...

• ### Using Microsoft MyAnalytics to improve workers' time management

Microsoft MyAnalytics has been hailed as the next great thing for analyzing worker productivity. But how can employees be assured...

• ### Is the Drupal content management system right for your business?

The open source Drupal content management system offers users responsive design and WYSIWYG content and web development tools, as...

## SearchFinancialApplications

• ### ADP acquires performance coaching and employee engagement software

The Marcus Buckingham Company's cloud-based performance management and employee engagement software is set to be integrated into ...

• ### Six keys to creating strong data-security measures

The rush to embrace digital technologies can put organizations at extreme risk. Here are six foundations for creating an ...

• ### Group-chat software sees explosive growth and intense competition

Microsoft Teams and Workplace by Facebook are facing off against rivals such as HipChat and Slack in a high-stakes competition ...

Close