My question is an ongoing point of discussion between programmers and
DBAs: under what circumstances should a numeric string be stored as
VARCHAR2 rather than NUMBER? Some examples would be zip codes, addresses,
SSNs and customer IDs. The response I have received before is to
store all numeric strings as VARCHAR2 unless the intention is to
manipulate the field mathematically, but no reason has been given.
Could you please clarify?
One of the guidelines I use is to ask whether you will ever perform arithmetic on the data. If so, I store the data in a NUMBER datatype. If not, I am free to use the VARCHAR2 datatype. Personally, I cannot think of a reason to perform arithmetic on a zip code or phone number, so for those I would store the data in a VARCHAR2 datatype.
The biggest reason is that a VARCHAR2 column retains leading zeros, whereas a NUMBER column does not. For instance, many zip codes on the East Coast start with a zero. If the zip code is "04217" and I store it in a NUMBER column, I get back "4217" when I query for that value. The data has lost meaning: the leading zero is gone, and every time I print a zip code I need to add that zero back. I should never have to guess whether a digit is missing from my data value. If that zero is needed, it should be stored in a datatype that preserves it. Of course, you will probably want to write a trigger to verify that the zip code is well formed when it is inserted into the table. You do not want someone inserting "abcde" into a ZIP_CODE column.
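The leading-zero problem and the validation idea can be sketched outside the database. The following is a minimal Python illustration, not Oracle code: the helper name is_valid_zip and the five-digit-only rule are assumptions made for the example, and int() merely stands in for what a numeric column does to the value.

```python
import re

# Assumption for this sketch: a valid zip code is exactly five ASCII digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}")


def is_valid_zip(value: str) -> bool:
    """Hypothetical check of the kind a validation trigger might enforce."""
    return ZIP_PATTERN.fullmatch(value) is not None


zip_as_text = "04217"

# Treating the value as a number drops the leading zero.
zip_as_number = int(zip_as_text)

# To print it correctly again, the zero has to be padded back in,
# which only works if you already know the intended width.
restored = str(zip_as_number).zfill(5)

print(zip_as_number)          # the numeric form, without the zero
print(restored)               # the padded form
print(is_valid_zip("abcde"))  # rejected by the format check
```

Keeping the value as text avoids the padding step entirely, and the format check is what keeps values such as "abcde" out of the column.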
Similarly, I store phone numbers in a VARCHAR2 datatype. This is typically because I run some sort of data verification routine before the phone number is inserted into the table to ensure that the value is correctly formatted. For instance, you may decide that your phone numbers must be stored in the format (XXX)XXX-XXXX, complete with the parentheses and the hyphen. You could opt to store the value in a NUMBER datatype as XXXXXXXXXX if you want. No area code in the US has a leading zero, so you would not lose meaning in this case. But you probably will not be performing arithmetic on these values either. It is more likely that you will be performing string manipulation instead. For instance, you might query for all phone numbers in the 555 area code (WHERE phone_num LIKE '555%' for digits-only values, or LIKE '(555)%' for the format above). If you were storing this value in a NUMBER datatype, the number would first have to be converted to a string. So a string datatype makes a little more sense here.
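The same reasoning can be illustrated for phone numbers. This is a rough Python sketch, again not Oracle code: the pattern assumes the (XXX)XXX-XXXX layout mentioned above, the function name area_code is hypothetical, and the sample numbers are made up.

```python
import re

# Assumption: phone numbers use the (XXX)XXX-XXXX layout described above.
PHONE_PATTERN = re.compile(r"\(([0-9]{3})\)[0-9]{3}-[0-9]{4}")


def area_code(phone: str):
    """Return the area code if the value is well formed, else None."""
    match = PHONE_PATTERN.fullmatch(phone)
    return match.group(1) if match else None


# Stored as text, an area-code search is a plain prefix match,
# comparable to a LIKE '(555)%' predicate.
phones_as_text = ["(555)123-4567", "(207)555-0100"]
in_555 = [p for p in phones_as_text if p.startswith("(555)")]

# Stored as a number, the value must be converted to a string
# before the same prefix test can be applied.
phone_as_number = 5551234567
starts_with_555 = str(phone_as_number).startswith("555")

print(in_555)
print(starts_with_555)
print(area_code("555-123-4567"))  # wrong layout, so no area code
```

The conversion step in the numeric case is the extra work the text refers to. It is small, but it buys nothing when no arithmetic is ever done on the value.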
This was first published in August 2005