Learning the meaning behind words
While state-of-the-art technology is still a long way from this goal, we're making significant progress using the latest machine learning and natural language processing techniques. Deep learning has markedly improved speech recognition and image classification. For example, we've shown that computers can learn to recognize cats (and many other objects) just by observing large amounts of images, without being explicitly taught what a cat looks like. Now we're applying neural networks to understanding words by having them "read" vast quantities of text on the web. We're scaling this approach to datasets thousands of times larger than what has been possible before, and we've seen a dramatic improvement in performance -- but we think it could be even better. To promote research on how machine learning can be applied to natural language problems, we're publishing an open source toolkit called word2vec that aims to learn the meaning behind words.
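To make "reading" concrete: models in this family learn from raw text by sliding a window over it and predicting words from their neighbors. The post doesn't spell out the training details, so this is an illustrative sketch (not the released word2vec code) of how a skip-gram-style model would turn a sentence into (target, context) training pairs:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs within a symmetric window
    around each word -- the raw training examples a skip-gram
    style model learns to predict."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens, window=1))
```

Words that appear in similar contexts end up generating similar pair distributions, which is what pushes their learned vectors close together.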
Word2vec uses distributed representations of text to capture similarities among concepts. For example, it understands that Paris and France are related the same way Berlin and Germany are (capital and country), and not the same way Madrid and Italy are. This chart shows how well it can learn the concept of capital cities, just by reading lots of news articles -- with no human supervision:
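Relationships like capital-and-country show up as consistent directions in the vector space, so analogies can be answered with vector arithmetic and cosine similarity. Here is a minimal sketch of that idea using tiny hand-crafted vectors (real word2vec embeddings are learned from text and have hundreds of dimensions; the numbers below are made up purely for illustration):

```python
import math

# Toy 4-d vectors, hand-crafted for illustration only.
VECS = {
    "france":  [1, 0, 0, 0],
    "paris":   [1, 0, 0, 1],
    "germany": [0, 1, 0, 0],
    "berlin":  [0, 1, 0, 1],
    "spain":   [0, 0, 1, 0],
    "madrid":  [0, 0, 1, 1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """a is to b as ? is to c: return the word closest to
    vec(a) - vec(b) + vec(c), excluding the query words."""
    query = [x - y + z for x, y, z in zip(VECS[a], VECS[b], VECS[c])]
    candidates = [w for w in VECS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, VECS[w]))

print(analogy("paris", "france", "germany"))  # -> berlin
```

The same subtraction-and-addition trick works for any relation the model has captured as a consistent offset, which is why the country/capital pattern emerges without supervision.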
The model not only places similar countries next to each other, but also arranges their capital cities in parallel. The most interesting part is that we didn’t provide any supervised information before or during training. Many more patterns like this arise automatically in training.
This has a very broad range of potential applications: knowledge representation and extraction; machine translation; question answering; conversational systems; and many others. We’re open sourcing the code for computing these text representations efficiently (on even a single machine) so the research community can take these models further.
We hope this helps connect researchers on machine learning, artificial intelligence, and natural language so they can create amazing real-world applications.
By Tomas Mikolov, Ilya Sutskever, and Quoc Le, Google Knowledge
5 comments:
-
Very interesting stuff. I can see that this will help us use computers more effectively. However, when I saw the headline of the article, I was attracted because I frequently find myself in situations where people don't get the meaning behind a word, and I was hoping to learn more about that. E.g., I recently read an article in which it was very difficult for me to discern the writer's use of "myth": it was not clear whether he was referring to a traditional or legendary story or to an invented idea or fantasy concept. Makes me wonder whether human communication is declining while machines get smarter.
-
As a (former) interpreter, I know that there is a vast difference between translating using only "languages" and using the intermediate step of a representation of a reality. If anyone could, Google could build a machine (physical or theoretical) that could "learn" something about the meaning of words - but it needs to use the resources it has to first construct some kind of "consciousness", some kind of reality, to use as an intermediate step.
That would change computing!
Paolo -
Paolo, that will not be possible in the near future. However, as the world is becoming ever more inscribed in data, in the end Google will have access to real meaning, not only from words but from everything you communicate and do.
The problem is that languages are bound to humans and our experiences, to which machines (until recently) had no access. Machines extract meaning from a different reality and communicate through other means.
We are reaching a point, however, where our world will be immersed in (and duplicated as) a digital format (as a stream of data). That will allow machines to understand us. -
It is a fallacy to assume that consciousness is nearly impossible to duplicate; after all, what are we but a bunch of neurons communicating with electricity? Moreover, if you try to pinpoint what consciousness (which we take to mean "subjective experience") really means and HOW it arises, there is not a single brain process that answers the question (emotions, higher thinking, self-awareness etc. still could be a PHILOSOPHICAL ZOMBIE). It leads to the only rational conclusion: Consciousness is a continuum, and literally everything in the world is at least somewhat conscious.
I think that given enough time, there will be no need to explicitly create a "consciousness" to interpret all these words -- as soon as word interpretation becomes sophisticated enough to be indistinguishable from a human interpreting it, it will for all intents and purposes be "conscious". There is no magical dust in the brain that makes the brain conscious. -
I am working on image recognition software that separates items in an image. I can use it for speech recognition. It could be used to determine processes in the brain. It began as a handwriting recognition method. The method can be used for AI and many other applications.
Who do I contact at Google? Email me at vangie.allums@aol.com

