Monday, October 13, 2008

Deleting duplicate rows with unique key from MySQL

A requirement just popped up for me, where I have to delete rows with duplicate "MainUrl" but these rows have unique ID, generated as an auto-increment column. Thought, it would be a straight-forward thing, but not until I actually sat down writing the query. Finally, after some lookup and some meditation, I ended up with the following query.
DELETE FROM    CrawledUrl
WHERE ID NOT IN
(
SELECT    MIN(ID)
 
FROM  CrawledUrl
  GROUP BY   MainUrl)
  AND
IsProcessed <> 'True'

Well, now that I have tested it in SQL Server, it worked and I was happy. So I decided to try it out on my production database (I now call my thesis database as production database, sounds professional). So, I ran the query  and I ended up with this error:
ERROR 1093 (HY000): You can't specify target table 'CrawledUrl' for update in FROM clause
So after doing a little more meditation, I came to know that MySQL right now does not support both updates and select on the same table.
So, the fix? The problem, like I mentioned is to work on the same table which MySQL does not allow. So we need a temporary table - create one or let MySQL create one.

So when does MySQL create a temporary database? It does when you create an alias in the query saying (select * from myTable) as mT. Now mT is a temporary table and no longer the same myTable. So pretty neat huh!

Finally, the changed query was:
   1: delete FROM thesis.CrawledUrl


   2:     WHERE  ID NOT IN


   3:     (


   4:     SELECT MIN(ID)


   5:         FROM  (select Id,MainUrl from thesis.CrawledUrl) as x


   6:         GROUP BY  MainUrl)


   7:     and IsProcessed = 'False'


No comments: