Q
Problem solve Get help with specific problems with your technologies, process and projects.

Normalize a column containing a list

I've just completed an assignment for my databases class in which we were given a database and told to put it in...

1NF. Several of the columns contained multiple values (e.g. 1;2;3;4), and I wanted a new row for each value in the column. I ended up projecting out the columns with multiple values and then importing them back in split up by semicolons, copying them into the original database and then using a bunch of select statements to get it down to what I wanted. I'm dying to know what a better way to do this is. If you don't understand what I'm talking about, just try to explain how to split a column that contains multiple values separated by a semicolon into separate columns.

The obvious way to do this is with a scripting language. Both Microsoft SQL Server and Oracle have comprehensive languages (Transact-SQL and PL/SQL, respectively) in which this would be both easy and efficient.

The scripting approach involves a loop, in which the target column is searched for an occurrence of the list separator (a semicolon in your case, more often a comma). In pseudocode:

variable tempstring varying character
set tempstring = columnvalue
while position(';' in tempstring) > 0
  begin
    insert 
      into normalizedtable 
         ( newcolumn )
    values 
         ( left( tempstring
               , position(';' in tempstring)-1) ) 
    set tempstring = 
      substring(tempstring
         from position(';' in tempstring)+1
          for length(tempstring)
              - position(';' in tempstring) )               
  end 
insert 
  into normalizedtable
     ( newcolumn )
values
     ( tempstring )

The POSITION function finds the separator, or returns zero if one is not found. If a separator is found, then the substring, up to the position before the separator, is extracted with the LEFT function, and used in the insert statement to insert a new row. Then the substring just used is chopped out of the string, by setting the string equal to the remainder of the string starting one position beyond the separator, and looping continues. After looping has finished, the last part of the string is used to generate the last row.

In a database that has a built-in scripting language like SQL Server or Oracle, this script would be saved as a stored procedure and invoked with one call to the database. With an external scripting language, like PHP with MySQL for example, control would be bouncing back and forth between the scripting engine and the database engine for each insert, so an external script, while perhaps just as easy to write, is substantially less efficient, although still better than doing it manually.

But is there a straight SQL solution, one that does not involve scripting? If there is, it would have to involve a cross join of some sort, since it must be able to generate multiple rows for each row that contains at least one separator. It will more than likely involve a cross join with an integers table, so that if, for example, there are four separators in a given column value, then five rows would be generated, probably by the cross join with integers 0 through 4.

That said, I have seen an SQL solution, one written by Joe Celko. It was truly complex, involving CAST, MAX, COUNT, SUBSTRING, and DATALENGTH functions, a double cross join with two copies of the integers table, and a GROUP BY clause. Not for the faint of heart. My advice is to use a script, or even do it by hand. After all, you don't normalize on the fly all the time, right?


This was last published in November 2002

Dig Deeper on Oracle and SQL

PRO+

Content

Find more PRO+ content and other member only offers, here.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide.com

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close