look for duplicates using #mongodb #mapreduce

I recently had to look for duplicates inside our MongoDB instance. I wrote a simple script, which failed because our data is too big, so I then wrote a MapReduce version, which ran beautifully.

here is a sample of the data (you can re-create it on your own)

results

and here are the simple and complex (mapreduce) versions of my script
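As a rough illustration of the two approaches (a minimal sketch, not the original scripts), the block below counts duplicates first with a plain in-memory counter and then with a map function and a reduce function of the shape MongoDB's mapReduce expects. The sample documents, the `email` field, and the helper names (`findDupsSimple`, `runMapReduce`, `findDupsMapReduce`) are all assumptions made up for the example. The small `runMapReduce` harness only imitates what the server does, so the sketch runs without a database.

```javascript
// Hypothetical sample documents; the "email" field is an assumption.
const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
  { _id: 4, email: "c@example.com" },
  { _id: 5, email: "b@example.com" },
  { _id: 6, email: "a@example.com" },
];

// Simple version: a single pass with an in-memory counter.
// The counter grows with the number of distinct keys.
function findDupsSimple(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const dups = {};
  for (const [key, count] of counts) {
    if (count > 1) dups[key] = count;
  }
  return dups;
}

// MapReduce version. In MongoDB, `this` is the current document and
// `emit` is supplied by the server; here the harness below binds it.
let emit;
function mapFn() {
  emit(this.email, 1);
}
function reduceFn(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// In-memory stand-in for the server side of mapReduce (illustrative only).
function runMapReduce(documents, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of documents) map.call(doc);
  const out = {};
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce for keys that have a single value.
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

// Keep only the keys that occur more than once.
function findDupsMapReduce(documents) {
  const counts = runMapReduce(documents, mapFn, reduceFn);
  const dups = {};
  for (const key of Object.keys(counts)) {
    if (counts[key] > 1) dups[key] = counts[key];
  }
  return dups;
}

console.log(findDupsSimple(docs, "email"));
console.log(findDupsMapReduce(docs));
```

Against a real instance, the same two functions could be passed to something like `db.users.mapReduce(mapFn, reduceFn, { out: "dup_counts" })`, followed by `db.dup_counts.find({ value: { $gt: 1 } })` to list the duplicated keys. The collection names `users` and `dup_counts` are placeholders.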

Christian Sanz
