Problem solve Get help with specific problems with your technologies, process and projects.

# Using SQL to calculate nearby ZIP codes

## This tip shows how to use SQL to calculate nearby ZIP codes.

Any type of application that involves names and addresses inevitably involves ZIP codes as well. If you've built...

the data backend for such an application, you've probably dealt with databases containing ZIP codes by city/state or even by specific street address. But have you ever needed to find ZIP codes within a certain distance of a given ZIP code, city, or street address? Maybe your company needs a "find a store near you" function for their web site. Or maybe your marketing department has asked to generate targeted mailing lists from ZIP code-based customer demographic information from the corporate database. Where do you start when you need to calculate the distance between two ZIP codes, cities, or other geographical features? If you're like me, you start with spherical coordinates and a lesson in trigonometry! Don't worry, if you want to skip to the end, I promise you'll get all the information you need without the gory details.

#### The Data

The very first ingredient for building a ZIP code-oriented application is the appropriate data. At a minimum, we'll need the ZIP code and a set of latitude and longitude values for that ZIP code. This information is available from a number of commercial vendors, which provide a lot of added value in the form of demographic information and other location-based information garnered from both government and private source. A short search of the web turned up over a dozen vendors with prices ranging from the low hundreds to well over a thousand. Another alternative is to choose a completely free data set. The best one I found was part of the information available from the U.S. Census Bureau. The downside is that it is based on the 1990 census, only included 5-digit ZIP codes, and has limited additional information that is completely out of date. But it gives us a functional database for building this application. The database can be downloaded in text format from this site.

One crucial factor to determine about the latitude and longitude information is whether it is in degrees/minutes/seconds format or in the decimal equivalent--the decimal equivalent is what you want. The quick way to translate is

Decimal value = degrees + (minutes/60) + (seconds/3600)

#### The Formula

Once you've got the ZIP code and latitude/longitude data, we've got the spherical coordinates to describe a location. The next step is to determine the distance between any two locations using the spherical coordinates. You'd think this would be a straightforward process, but in reality it is fairly complex to come up with an extremely precise distance between two points on the surface of the earth. For an extremely good discussion of the problem, and for the source I used to build my code, you can refer to this document, which discusses the details of what is basically the problem aircraft pilots face every day. For our purposes of calculating the distance between any two ZIP codes in the US, we can settle for the accuracy of the Haversine formula:

Distance = 2R *arcsin (x)
where x=(sin((lat2-lat1)/2))^2 + cos(lat1) * cos(lat2) * (sin((long2-long1)/2))^2.

It is crucial to realize that the trigonometric functions of most databases are expecting arguments to be in radians instead of degrees. The best approach is to pre-calculate the latitude and longitude values for each location in the database in terms of radians so that the SQL stored procedure will have that many less calculations to perform. The formula for the conversion is

radians = 57.295*degrees (or 180/pi * degrees)

#### The Fine Print

The algorithm presented here works extremely well for calculating the distance between two ZIP codes in the US (or cities or other locations). But if you read the full discussion about the various ways to calculate the distance between two points on the surface of the Earth (http://www.census.gov/cgi-bin/geo/gisfaq?Q5.1), you'll notice that the Haversine formula does not work perfectly for all possibilities. As the mathematically-inclined would say, it "blows up" under certain circumstances which results in an undefined mathematical value and thus a runtime error in the calculation. For the purposes of ZIP codes however, there's no problem since the distances involved are well shy of the danger points. It's just important to realize that you'd need to use a more sophisticated approach to calculate the distance between any two arbitrary points on the surface of the Earth. As the article shows, there are simpler formulas, but for ZIP code calculations, the Haversine formula is a good tradeoff between accuracy and performance.

It is also important to understand a little bit about how ZIP codes are distributed on the map of the US. It would be nice to think that they are all regularly-sized polygons that reproducibly blanket the country and are easily related to the latitude and longitude...but of course real life is never that easy. ZIP codes are basically ways to describe postal routes. Many of them represent squares or ovals or other regularly-shaped polygons with an easily-defined center, but some, especially in rural areas, are shaped in an arc along a road or highway. Keep in mind that each database of ZIP codes and latitude/longitude values makes different assumptions about how to assign a physical location to a ZIP code--the value for any given ZIP code will vary the equivalent of a few blocks to a few miles depending on the shape of the ZIP code. This means that for closely spaced ZIP codes, the distances may be off by a large relative percentage. For most applications, this will not matter, but it is an important limitation to consider.

#### The Bottom Line

The goal of this article is to create a stored procedure to determine all of the ZIP codes within a certain radius of a starting ZIP code. We'll use the Haversine formula and a database of latitude/longitude and ZIP codes to make the calculation. I implemented this in SQL Server 7.0, but any database should be fine as long as it has the arcsin, cosine, and sine trigonometric functions and the ability to do square roots and raise values to a power (Microsoft Access is out, but MySQL, Oracle, and DB2 certainly work).

``` CREATE PROCEDURE zip_nearby_locations @lat1 float, @long1 float, @radius float AS SELECT (2*3956*ASIN( (SQRT( POWER(SIN(((rlat-@lat1))/2),2) + COS(@lat1) * COS(rlat) * POWER(SIN(((abs(rlong)-@long1))/2),2) )) ) ) AS dist, location, zipcode FROM Zipcodes WHERE (2*3956*ASIN( (SQRT( POWER(SIN(((rlat-@lat1))/2),2) + COS(@lat1) * COS(rlat) * POWER(SIN(((abs(rlong)-@long1))/2),2) )) ) ) < @radius ORDER BY dist GO
```

In this stored procedure, R (radius of the earth) is 3956 miles so our result comes out in miles. The fields rlat and rlong are the latitude and longitude in radians. This procedure basically calculates the distance between the given ZIP and every ZIP in the database and numerically sorts the ones within the given radius.

As an example, try determining the number of ZIPs within 1 mile of the White House (ZIP 20001). Using the public domain database from the Census Bureau, I got the following 6 results:

``` ZIP Location Distance 20001 WASHINGTON, DC 0.0 20405 ZIP CODE 20405 0.52719256096716194 20060 ZIP CODE 20060 0.70030985977018034 20005 WASHINGTON, DC 0.7223858318763472 20552 ZIP CODE 20552 0.75187311639892684 20038 WASHINGTON, DC 0.840488509229948
```

If you get different results, first check whether the trigonometric functions in your database expect radians (most do), and if so ensure that your latitude/longitude data is in radians. This is the most common problem other than a typo in the equation.

Based on a very rough profiling of the stored procedure, the majority of the processing time is the ORDER BY statement. If you don't need an ordered list of nearby ZIP codes, dropping this clause should maximize performance.

I've put the code on SourceForge as well as the database I used so that you can keep up with changes and discuss any problems with the code. It is fairly easy to take this starting point and use it and a good database of ZIP codes, locations, and demographic data to explore all sorts of possible functionality.

John Paul Ashenfelter is CTO & President of TransitionPoint.com, a technology development and analysis group focusing on web database applications. He is the author of "Choosing a Database for Your Web Site" (John Wiley & Sons, Inc.) and the co-author of "ColdFusion 4 for Dummies" (IDG Books). He also founded and maintains the webDatabase.org Web site.

This was last published in February 2001

## Content

Find more PRO+ content and other member only offers, here.

#### Start the conversation

Send me notifications when other members comment.

## SearchDataManagement

• ### Hyperledger Fabric offers path to enterprise blockchain future

Blockchain arose from bitcoin, but it's looking to find a place in the enterprise. Frameworks like Hyperledger Fabric could ...

• ### MongoDB 4.0 takes ACID transactions to multi-document level

MongoDB is taking a deeper step into SQL-style processing waters with a 4.0 update that brings increased support for ...

• ### Data lake concept needs firm hand to pay big data dividends

Data lakes pose technology deployment and data management challenges that can leave analytics users high and dry if the ...

• ### AI functionality limited today but could be a game-changer

Limited AI capabilities could soon give way to technology that is truly transformative for enterprises, surpassing the overhyped ...

## SearchSAP

• ### SAP's Barry Padgett on future of SAP Ariba Network

In this Q&A, new SAP Ariba President Barry Padgett discusses the future of procurement and the experience he will bring to Ariba ...

• ### Avoiding SAP indirect access woes requires good faith

Some customers are concerned that SAP will hit them for indirect access licensing fees, but they can avoid trouble if they act in...

• ### ControlPanelGRC app eases Steelcase's compliance pain

When Steelcase's SAP environment grew in size and complexity, it turned to Symmetry ControlPanelGRC to save time, have more ...

## SearchSQLServer

SQL Operations Studio simplifies routine administration of SQL Server and Azure SQL databases, making database development and ...

• ### Meltdown and Spectre fixes eyed for SQL Server performance issues

Microsoft has responded to the Spectre and Meltdown chip vulnerabilities with patches and other fixes. But IT teams need to sort ...

• ### Five SQL Server maintenance steps you should take -- ASAP

Putting off SQL Server administration tasks can lead to database problems. Enact these often-neglected maintenance items to help ...

## TheServerSide.com

• ### How DevOps concepts eluted from cloud computing and service platforms

The popularity of DevOps can be traced back to the emergence of cloud computing. As programmers began scripting their ...

• ### Pluralsight IQ, Stack Overflow boost developer street cred

Tying the Pluralsight IQ skills test to the Stack Overflow Developer Story helps developers measure their technical skills and ...

• ### Why this quantum computing breakthrough is a security risk

Quantum computing will void pretty much all security encryption techniques and open the door to hackers. Here's how to protect ...

## SearchDataCenter

• ### IBM Power9 servers seek market inroads to AI, cloud

IBM follows up its first Power9 server with a raft of systems designed to appeal to a wider array of markets -- most notably, AI ...

• ### Evaluate read-intensive and write-intensive SSD use cases

Consider write wear, performance and other factors when choosing between read-intensive, write-intensive and mixed-use ...

• ### Some hyper-converged infrastructure use cases pose pitfalls

Hyper-converged infrastructure adoption is skyrocketing, but that doesn't mean that the technology is the best choice for every ...

## SearchContentManagement

• ### Content management in the cloud a main theme in 2018

The future of content management resides in the cloud and with AI, as several 2018 conferences will assure you.

• ### Six things to know about today's SharePoint implementations

As companies migrate their on-premises Microsoft SharePoint sites to the cloud, here are some things they should know about the ...

• ### Upgrades for the SharePoint Online portal

As more organizations migrate SharePoint sites to the cloud, Microsoft has increased at-a-glance dashboard data and analytics to ...

## SearchHRSoftware

• ### HR plays to win with gamification for learning

At some companies, corporate education has become a tool for engagement and employee experience. See how gamification is putting ...

• ### HR recruiting tools get VC funding, as tech wages rise

The HR recruiting tools market shows no shortage of investor interest. In two months, three platforms have launched. Rising tech ...

• ### Social media key to future of learning and development

Employees aren't just wasting time on social media; some are learning the career skills that will take them to the next level. ...

Close