I have a CLOB column in my table which contains delimited 10 digit codes. Earlier, the application allowed duplicate codes in this column. But now when I add some more codes to the existing rows, I should remove the existing duplicates and the codes to be added should also not be part of existing codes in that row.
Should we clean the value first and then add the new codes with existence check, or it can be done parallely? When I append code one by one doing existence check, it takes more time for me. Could you please give me some clarity on this?
I think it might be faster to scan the CLOB for an occurrence of the 10 digit code before appending. You may have to load the CLOB into a variable first.
I can think of a few ways to scan the CLOB to see whether the new code exists already:
- Prepend and append the CLOB string with commas so you can search for ',1234567890,'
- Use a regular expression which allows you to treat comma, beginning-of-string and end-of-string
as delimiters, which simplifies the code to write
- Write a function to parse and load the current set of codes into a collection of number, then use SQL to scan the collection for a match.
If you find the code already exists, then you can skip updating that row entirely.
If you maintain the set of codes in ascending order, then perhaps whatever scanning method you use can be short-circuited to end when the value you’re checking is lower than the one you wish to add.
Have a question for Dan Clamage? Send an e-mail to firstname.lastname@example.org
This was first published in February 2010