Announcing Google Refine 2.0, a power tool for data wranglers

Wednesday, November 10, 2010 | 2:05 PM

Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.

Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:

http://www.youtube.com/watch?v=yNccGtn3Wb0 (7 min)

http://www.youtube.com/watch?v=45EnWK-fE9k (9 min)

http://www.youtube.com/watch?v=m5ER2qRH1OQ (6 min)

The project is open source and its code and downloads are available here. Changes from version 1.1 to 2.0 are listed here.

23 comments:

Travis Harshaw said...

Completely awesome.

TJGodel said...

Awesome tool!!

ollerac said...

This is awesome. You guys are awesome.

Yoav said...

w00t!

Onegreen said...

Being an IT architect and having worked on EII/ETL technologies like Denodo, Composite, Ab Initio, Informatica and others, I have to say you guys have delivered a really powerful yet simple open-source data refining tool for public use. Now it's up to the public and the open-source community to really make sense of this tool. Thanks and keep it up.

GR said...

Impressive! If only I had this all those times I needed it.

Thad said...

Thanks Onegreen, being compared to those guys is a true honor.

Britt said...

Hi David, Nice to find you at Google! You'll recall we are big fans of timeline at MIT.

Best regards,

Britt Blaser

William Lindner said...

This looks awesome! Is it possible to export it to an Oracle table?

WatchSteveDrum said...

Awesome. Totally awesome. And open source!

You guys just love putting stuff in the browser though, don't you?
...That's ok. I love having stuff in the browser.

Raag said...

This is extremely useful and awesome.

Alex said...

Nice demo. Does anybody know of a data source similar to Google Geocoding, but for Census data? For example, if I wanted to augment zip codes with that data?

thanks

Luko said...

Incredible.

amit said...

The tool is totally awesome but of limited use. Excel can already handle all of this refining, though not as simply.
I don't see a great future for it unless it can handle large volumes of data, offers automation capabilities, and can also build charts and the like. In other words, Microsoft should learn something here and make Excel much simpler.

Ashutosh Kumar said...

Awesome tool... It'll be very useful in data analytics.

id said...

It would be nice if Google also had a reconciliation API ... ;-)

Moonglare said...

The content of the third video is pure gold. I will surely use this.

Samar said...

It seems like such a simple concept, but alas it takes Google to provide it. Thanks.

Patricia said...

Wow... awesome videos.
Thanks for sharing.

Christian said...

Very COOL!

Guo said...

Man I just spent 4 hours cleaning data and NOW you tell me Google Refine can sort it out in no time?

Jim said...

No new info or updates in over a year? Some more documentation would be great for us non-programmers.

Lighton Phiri said...

You just saved me the hours I would have had to put in revising sed and awk. My boss recommended it to me and I haven't looked back since.