Extracting a part number from a text field
Our database contains a text field that users input information in that is not in a specific format. We need to extract data (specifically a part number) from this field for reporting purposes. What would be the best method to extract this information? The part numbers always begin with a 0, 1, 5 or 7 and are 7 characters in length.
If the text field is exactly 7 characters long, then it either contains a part number or it doesn't --
select textfield from yourtable where substring(textfield from 1 for 1) in ('0','1','5','7')
If it's longer than 7 characters, you're in trouble. You want my honest answer? Write a program. In robust databases like Oracle and SQL/Server, the program can be a stored procedure. But do yourself a favour and don't try it in SQL.
To get an idea how tough this is, consider that the SQL has to extract a 7-character substring from somewhere within the text field, starting at the first position within the text field where a certain character is found. The problem is, you don't have too many tools -- expressions and functions -- at your disposal for this kind of processing.
Let's start by extracting part numbers that begin with a 5 --
select substring( textfield from position('5' in textfield) for 7 ) from yourtable where position('5' in textfield) > 0
The WHERE condition ensures that you actually do find a '5' somewhere in the text field. That's important, because the POSITION function returns a 0 if it doesn't find what it's looking for, and then the SUBSTRING function would end up taking the first six characters of the text field (since position 0 is outside of the string).
However, the above query will still produce incorrect results. Consider the following rows --
row textfield 1 part number 5093705 2 5234567 is the part number 3 i wanted 5 widgets, fred, not 3 4 6365937 is the part number
The query above will return
5093705 5234567 5 widge 5937 is
So perhaps you need to look for ' 5' instead, i.e. with a leading blank. But that wouldn't find the 2nd row above, so you'd have to test for that condition separately...
select substring( textfield from position(' 5' in textfield) + 1 for 7 ) from yourtable where position(' 5' in textfield) > 0 union all select substring( textfield from 1 for 7 ) from yourtable where substring( textfield from 1 for 1 ) = '5'
This would still pick up rows 3 and 4 (incorrectly). And that's only the beginning of our problems, because we've only done part numbers beginning with a 5.
What you really want is a stored procedure that can loop through the text field character by character. PL/SQL and Transact-SQL can do this -- contact your resident database programmer.
For More Information
- The Best SQL Web Links: tips, tutorials, scripts, and more.
- Have a SQL tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical SQL questions--or help out your peers by answering them--in our live discussion forums.
- Ask the Experts yourself: Our SQL, database design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.