Open Source Blog
News about Google's open source student programs and software releases
Learning the meaning behind words
Wednesday, August 14, 2013
Today computers aren't very good at understanding human language, and that forces people to do a lot of the heavy lifting—for example, speaking "searchese" to find information online, or slogging through lengthy forms to book a trip. Computers should understand natural language better, so people can interact with them more easily and get on with the interesting parts of life.
While state-of-the-art technology is still a ways from this goal, we’re making significant progress using the latest machine learning and natural language processing techniques.
Deep learning has markedly improved speech recognition and image classification. For example, we’ve shown that computers can learn to recognize cats (and many other objects) just by observing large numbers of images, without being trained explicitly on what a cat looks like. Now we apply neural networks to understanding words by having them “read” vast quantities of text on the web. We’re scaling this approach to datasets thousands of times larger than what has been possible before, and we’ve seen a dramatic improvement in performance, but we think it could be even better. To promote research on how machine learning can apply to natural language problems, we’re publishing an open source toolkit called word2vec that aims to learn the meaning behind words.
Word2vec uses distributed representations of text to capture similarities among concepts. For example, it understands that Paris and France are related the same way Berlin and Germany are (capital and country), and not the same way Madrid and Italy are. This chart shows how well it can learn the concept of capital cities, just by reading lots of news articles, with no human supervision:
The model not only places similar countries next to each other, but also arranges their capital cities in parallel. The most interesting part is that we didn’t provide any supervised information before or during training. Many more patterns like this arise automatically in training.
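To make the "parallel" arrangement concrete, here is a minimal sketch in Python. It is not the word2vec code, and the numbers are not output from the toolkit: the two-dimensional `toy_vectors` below are made up by hand so that each country-to-capital offset points in roughly the same direction, and the helper names `cosine` and `analogy` are hypothetical. Under those assumptions, adding the France-to-Paris offset to Germany lands nearest to Berlin.

```python
import math

# Hand-made 2-D toy vectors for illustration only (NOT real word2vec output).
# The first coordinate separates the countries from one another; the second
# coordinate is roughly 0 for a country and roughly 1 for its capital, so the
# country-to-capital offsets are nearly parallel.
toy_vectors = {
    "France":  [1.0, 0.0],
    "Paris":   [1.1, 1.0],
    "Germany": [2.0, 0.1],
    "Berlin":  [2.1, 1.1],
    "Italy":   [3.0, -0.1],
    "Rome":    [3.1, 0.9],
    "Spain":   [4.0, 0.0],
    "Madrid":  [4.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" by finding the word nearest b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("France", "Paris", "Germany", toy_vectors))  # Berlin
```

With real learned vectors, which typically have hundreds of dimensions, the offsets are only approximately parallel, so a lookup of this kind returns the closest match rather than an exact one.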
This has a very broad range of potential applications: knowledge representation and extraction; machine translation; question answering; conversational systems; and many others. We’re open sourcing the code for computing these text representations efficiently (on even a single machine) so the research community can take these models further.
We hope this helps connect researchers on machine learning, artificial intelligence, and natural language so they can create amazing real-world applications.
By Tomas Mikolov, Ilya Sutskever, and Quoc Le, Google Knowledge