Announcing Google Refine 2.0, a power tool for data wranglers
Wednesday, November 10, 2010 | 2:05 PM
Labels: data sets, freebase, Metaweb
Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.
Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
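To give a feel for what "cleaning up inconsistencies" can mean in practice, here is a minimal Python sketch that groups values differing only in case, punctuation, spacing, or word order. It is an illustration only, not Google Refine's own code; the function names `fingerprint` and `cluster` and the sample values are assumptions made up for this example.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: trim, lowercase, drop ASCII punctuation,
    then sort the unique whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one member."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical messy column: three spellings of one name plus an unrelated value.
names = ["Chicago Tribune", "chicago tribune ", "Tribune, Chicago", "ProPublica"]
clusters = cluster(names)
```

Each group in `clusters` is a candidate set of values that could be merged into one spelling; a person would still review the groups before merging.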
Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:
http://www.youtube.com/watch?v=yNccGtn3Wb0 (7 min)
http://www.youtube.com/watch?v=45EnWK-fE9k (9 min)
http://www.youtube.com/watch?v=m5ER2qRH1OQ (6 min)

23 comments:
Travis Harshaw said...
Completely awesome.
November 10, 2010 5:35 PM
TJGodel said...
Awesome tool!!
November 10, 2010 6:22 PM
ollerac said...
This is awesome. You guys are awesome.
November 10, 2010 7:49 PM
Yoav said...
w00t!
November 11, 2010 5:35 AM
Onegreen said...
Being an IT architect and having worked on EII/ETL technologies like Denodo, Composite, Ab Initio, Informatica and others, I have to say you guys have delivered a really powerful yet simple open-source data refining tool for public use. Now it's up to the public and the open-source community to really make sense of this tool. Thanks and keep it up.
November 11, 2010 6:51 AM
GR said...
Impressive! If only I had this all those times I needed it.
November 11, 2010 6:54 AM
Thad said...
Thanks Onegreen, being compared to those guys is a true honor.
November 11, 2010 7:58 AM
Britt said...
Hi David, Nice to find you at Google! You'll recall we are big fans of timeline at MIT.
Best regards,
Britt Blaser
November 11, 2010 9:15 AM
William Lindner said...
This looks awesome! Is it possible to export it to an Oracle table?
November 11, 2010 11:19 AM
WatchSteveDrum said...
Awesome. Totally awesome. And open source!
You guys just love putting stuff in the browser though, don't you?
...That's ok. I love having stuff in the browser.
November 11, 2010 12:54 PM
Raag said...
This is extremely useful and awesome.
November 11, 2010 1:42 PM
Alex said...
Nice demo. Does anybody know of a data source similar to Google Geocoding for Census data? Like if I wanted to augment zip codes with that data?
thanks
November 11, 2010 2:49 PM
Luko said...
Incredible.
November 11, 2010 3:57 PM
amit said...
The tool is totally awesome but of limited use. Excel can already handle all the refining, just not as simply.
I don't see a great future for it unless it can handle large volumes of data, has automation capabilities, and can also build charts and stuff. In other words, Microsoft should learn something here and make Excel much simpler.
November 12, 2010 3:48 AM
Ashutosh Kumar said...
Awesome Tool... It'll be very useful in data analytics
November 12, 2010 10:23 AM
id said...
It would be nice if Google also had a reconciliation API ... ;-)
November 13, 2010 7:46 AM
Moonglare said...
The content of third video is pure gold. I will surely use this.
November 13, 2010 2:14 PM
Samar said...
It seems like such a simple concept, but alas it takes Google to provide it. Thanks
November 14, 2010 7:53 PM
Patricia said...
wow....awesome videos.
Thanks for sharing.
December 8, 2010 3:03 AM
Christian said...
Very COOL!
December 9, 2010 1:48 PM
Guo said...
Man I just spent 4 hours cleaning data and NOW you tell me Google Refine can sort it out in no time?
January 29, 2011 6:08 AM
Jim said...
No new info or updates in over a year? Some more documentation would be great for us non-programmers.
December 8, 2011 6:30 PM
Lighton Phiri said...
You just saved me the hours I would have had to put in revising sed and awk. My boss recommended it to me and I haven't looked back since.
January 9, 2012 8:33 AM