Problem solve Get help with specific problems with your technologies, process and projects.

Fixing duplicate primary keys

I created several tables and inserted some test data; then I found out about sequences and primary keys. I created a sequence for each table, and now I'm creating unique indexes for all my primary keys, but some tables have duplicate values. Is there an easy way to correct this? When there are two rows with the same primary key, I don't mind deleting one (either one: it's all fake data) but I don't want to delete them both, because the next thing I'm going to work on is foreign keys.

The main problem here is telling one row from another when you can't count on any of the data being distinctive. If two rows are completely identical, how can you write a WHERE clause to DELETE one and not the other? The trick is to use the pseudo-column ROWID. ROWIDs are useful because they have these properties:

  • Every row in every table has a ROWID
  • They're all unique
  • They're guaranteed not to change until the row is DELETEd
  • Comparison operators (like <) work on them

ROWIDs are based on where the row is physically stored, so they have other fascinating properties that aren't important for us now.

This script UPDATEs all but one row (the one with the lowest ROWID) using the new sequence you created. If you'd really prefer to DELETE the duplicates, the basic idea is the same. This script also creates a unique index, so the problem won't recur.

--  Fix_Duplicate.sql - change duplicate primary keys and
--     create unique index

DEFINE  table_name    = &1
DEFINE  column_name   = &2
DEFINE  sequence_name = &3

PROMPT  =====  Fixing duplicates in table &table_name  =====

UPDATE  &table_name    main
SET     &column_name = &sequence_name..NEXTVAL
        SELECT  &column_name
        FROM    &table_name
        WHERE   &column_name = main.&column_name   AND
                ROWID        < main.ROWID

PROMPT  =====  Creating index on column &column_name  =====

CREATE UNIQUE INDEX  &table_name._pk_idx
    ON               &table_name (&column_name)
    TABLESPACE       indx;
Run this script from SQL*Plus once for every table. For example, if the table is called org, and the column org_id should be the primary key, taking its values from the sequence org_id_seq, you can say
@fix_duplicate  org  org_id  org_id_seq

Dig Deeper on Using Oracle PL-SQL

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.