Musings of a Software Developer: Deleting duplicate rows with unique key from MySQL

A requirement just popped up for me, where I have to delete rows with duplicate "MainUrl" but these rows have unique ID, generated as an auto-increment column. Thought, it would be a straight-forward thing, but not until I actually sat down writing the query. Finally, after some lookup and some meditation, I ended up with the following query.
DELETE FROM    CrawledUrl
WHERE ID NOT IN
(SELECT    MIN(ID)
FROM CrawledUrl
GROUP BY   MainUrl)
AND IsProcessed <> 'True'

Well, now that I have tested it in SQL Server, it worked and I was happy. So I decided to try it out on my production database (I now call my thesis database as production database, sounds professional). So, I ran the query and I ended up with this error:
ERROR 1093 (HY000): You can't specify target table 'CrawledUrl' for update in FROM clause
So after doing a little more meditation, I came to know that MySQL right now does not support both updates and select on the same table.
So, the fix? The problem, like I mentioned is to work on the same table which MySQL does not allow. So we need a temporary table - create one or let MySQL create one.

So when does MySQL create a temporary database? It does when you create an alias in the query saying (select * from myTable) as mT. Now mT is a temporary table and no longer the same myTable. So pretty neat huh!

Finally, the changed query was:

        1: delete FROM thesis.CrawledUrl

       2:     WHERE  ID NOT IN

       3:     (

       4:     SELECT MIN(ID)

       5:         FROM  (select Id,MainUrl from thesis.CrawledUrl) as x

       6:         GROUP BY  MainUrl)

       7:     and IsProcessed = 'False'

Musings of a Software Developer

Monday, October 13, 2008

Deleting duplicate rows with unique key from MySQL

No comments: