Let's say I were to search Google for the word Jaguar:
There's an immediate problem. The semantics of Jaguar only exist in my head, not in any search box. Did I mean...
Whichever it is, Google is displaying a lot of search results that are totally irrelevant to me. Sure, I could type in more words, but that's at odds with the Google philosophy of simplicity. A single word should get me what I want.
Now compare the same search for Jaguar on eBay:
Although I get the same poor results initially, I can indicate which kind of Jaguar I really meant with an additional click on the categories on the left side of the page. This immediately filters the search results to something relevant with almost no effort on my part.
Search dominates the web now, and for good reason. My apologies to Yet Another Hierarchically Organized Oracle and the Open Directory Project, but rigid hierarchy is evil. However, a rigid hierarchy is tremendously powerful as a semantic-narrowing filter on search results.
In a brave new Google world of "I'll just type in what I want and hit Enter" search, there may still be room for some quasi-evil hierarchy in there somewhere. For example, if Google is going to suggest "Did you mean.." corrections when I misspell a search term, why don't they do the same thing to disambiguate semantics?
Unlike the rigid, manual categorizations of eBay and DMOZ, you could probably automate this kind of semantic suggestion engine using Markov chain probabilities on existing web pages.
Google has no metadata about content except the stuff that goes into the PageRank system, and they're already using that. If I understand correctly, any after-the-fact categorization in any search engine is done with human intervention.
Certain words are much more likely to follow other words in text:
http://www.cs.bell-labs.com/cm/cs/pearls/sec153.html
Eg, if the words "Jaguar" and "cat" are found together (or nearby) many times more often than, say, "Jaguar" and "can-opener"..
This is also the basis of many excellent spam filters, so it's already known to be eminently automatable:
http://www.codinghorror.com/blog/archives/000423.html
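To make the "Jaguar" and "cat" versus "Jaguar" and "can-opener" idea concrete, here is a minimal sketch of counting which words show up near a search term and turning the most frequent ones into one-click refinements. It is plain windowed co-occurrence counting rather than a full Markov chain, and it is not a description of how any real search engine works. The function names, the window size, the stopword list, and the toy pages are all assumptions made up for illustration.

```python
import re
from collections import Counter


def cooccurring_terms(documents, term, window=5, stopwords=frozenset()):
    """Count words that appear within `window` tokens of `term`."""
    term = term.lower()
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z0-9'-]+", text.lower())
        for i, token in enumerate(tokens):
            if token != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbor = tokens[j]
                # Skip the term itself and uninformative words.
                if j != i and neighbor != term and neighbor not in stopwords:
                    counts[neighbor] += 1
    return counts


def suggest_refinements(documents, term, top_n=3, **kwargs):
    """Turn the most frequent neighbors into 'term neighbor' suggestions."""
    counts = cooccurring_terms(documents, term, **kwargs)
    return [f"{term} {word}" for word, _ in counts.most_common(top_n)]


# Toy corpus standing in for crawled web pages (made-up data).
PAGES = [
    "The jaguar is a big cat native to the Americas.",
    "A jaguar is the largest cat in South America.",
    "The new Jaguar car has a supercharged engine.",
    "Jaguar car dealers offer the latest Jaguar models.",
    "Mac OS X Jaguar is an operating system from Apple.",
]
STOPWORDS = frozenset(
    {"the", "a", "an", "is", "in", "to", "of", "from", "has", "new"}
)

if __name__ == "__main__":
    print(suggest_refinements(PAGES, "jaguar", top_n=3, stopwords=STOPWORDS))
```

On a real crawl the raw counts would need normalizing (very common words co-occur with everything), but even this crude version separates "cat" and "car" from "can-opener".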
That said, the "human touch" could still be relevant for tweaking results on common queries. And why not? If you took the top 100 queries (probably all sex related, but humor me) and hand-optimized the results using information science experts, is that a bad thing?
Jeff Atwood on November 16, 2005 2:56 AM
Google did have something like this in the labs section, don't know what ever happened to it. IIRC as you typed in information it used AJAX to generate a popup list of available topics based on what you had typed so far.
---------
Read this somewhere...
The Google "Did you mean.." feature does not do spell checking. Instead, they check what users searched for right after your current search, and if a large number of people entered the same thing right after, that is what is presented.
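If the mechanism really is "what did people search for next", it could be sketched roughly like this. This is only an illustration of the idea as described in this comment, not Google's actual method. The session format, the function names, and the `min_share` threshold are assumptions, and the query log is made-up data.

```python
from collections import Counter, defaultdict


def build_followup_table(sessions):
    """Map each query to a Counter of the queries typed immediately after it."""
    followups = defaultdict(Counter)
    for queries in sessions:
        for current, following in zip(queries, queries[1:]):
            if current != following:  # ignore simple re-submits
                followups[current][following] += 1
    return followups


def did_you_mean(followups, query, min_share=0.5):
    """Suggest the most common follow-up query, if enough users agree on it."""
    counts = followups.get(query)
    if not counts:
        return None
    candidate, hits = counts.most_common(1)[0]
    total = sum(counts.values())
    return candidate if hits / total >= min_share else None


# Hypothetical query log: each inner list is one user's session, in order.
SESSIONS = [
    ["jagaur", "jaguar"],
    ["jagaur", "jaguar", "jaguar car"],
    ["jagaur", "jaguar"],
    ["jagaur", "jaguars roster"],
]

if __name__ == "__main__":
    table = build_followup_table(SESSIONS)
    print(did_you_mean(table, "jagaur"))
```

Nothing here knows anything about spelling; the "correction" falls out of what most people typed next.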
will dieterich on November 16, 2005 5:12 AM
Your example of word proximity analysis does not necessarily solve the problem, does it? You still want to be able to type "Jaguar" and have Google know you mean cat, not car. There's no guarantee that even with proximity analysis the results would be any different than what you're getting now. Or am I missing something here?
Hand-optimization, IMO, is a "bad thing" inasmuch as it essentially breaks the Google model. Some queries are "optimized," some are not; the algorithm used in one search is not the same as that used in others. Moreover, the human touch implies judgement by humans who _still_ might not see the world the way that people do who are actually doing the searches. And that's assuming that Google could even keep up with their Top 100 searches, which surely change by the second.
I stand by my statement that Google already has powerful syntactical tools that can help you find dang near anything, and that a two-word search will winnow your list by a huge percentage. Based on the Google searches that bring people to my blog (/BlogGoogleSearches.aspx), it seems that people frequently type MORE, not less, than they need. Perhaps that's a self-selected audience, but still.
mike on November 16, 2005 5:59 AM
Try http://clusty.com/ - it does what you need.
IainW on November 16, 2005 6:32 AM
Your example of word proximity analysis does not necessarily solve the problem, does it?
I think it does, since that's how http://www.clusty.com appears to work..
You still want to be able to type "Jaguar" and have Google know you mean cat, not car
Not quite: I want Google to give me a one-click method of refining my search, in exactly the same way they do with the existing "Did you mean.." feature.
Hand-optimization, IMO, is a "bad thing" inasmuch as it essentially breaks the Google model
The whole historical argument is that hand-built directories like DMOZ and Yahoo are obsolete. I agree; however, that's considering them as opposing poles of an either/or solution. When considered alone, search is the clear winner, but I don't think it has to be a simpleminded choice of one method or the other. They can be quite complementary when used together.
So, therefore, "hand optimization" (eg, categorization) can still be useful.
I'm not entirely sure we're talking about the same thing, though. You seem to be implying that somebody would go in and re-order search results, which isn't what I'm proposing at all. I propose exactly what is shown in my screenshots: showing the hierarchy as an optional aid to filtering your search.
Jeff Atwood on November 16, 2005 6:57 AM
Have you looked at Vivisimo (www.vivisimo.com) or Clusty (www.clusty.com)? They both do exactly what you're talking about, and even bring up results for the Jacksonville Jaguars (aka "Steeler Fodder").
Richard Dudley on November 16, 2005 8:42 AM
Compare the search results from searching "half.com" on Google, versus the results on Yahoo. Yahoo pulls up results about the dinky township that changed their name to half.com, and searching for "half.com books" is required to get me to the site I want.
The site is technically www.half.ebay.com now, but still... get it together, Yahoo.
(Working on multiple machines not all under my direct control, Yahoo was the home page, yadda yadda. Besides, I learned something!)
Todd Derscheid on November 16, 2005 10:24 AM
+1 for Clusty. I rarely use it except when I can't figure out the words I want to use to search on. Google would be way more useful with something like that, but I'd bet there are stupid patent issues.
Damien Katz on November 16, 2005 11:28 AM
Google did have something like this in the labs section, don't know what ever happened to it. IIRC as you typed in information it used AJAX to generate a popup list of available topics based on what you had typed so far.
That is Google Suggest
http://www.google.com/webhp?complete=1&hl=en
A variation of which is included in the latest Google Toolbar, at least the one in Firefox.
David Grant on November 16, 2005 11:41 AM
Holy crap, Clusty *IS* exactly what I wanted. Why had I never heard of this until today? Is Google really so dominant that the mainstream doesn't publicize these great alternatives?
Jeff Atwood on November 16, 2005 12:45 PM
Also, I don't think Google suggest is quite the same thing. That's *popular* searches, not semantically related ones.
Jeff Atwood on November 16, 2005 12:47 PM
I guess I'm in the minority here, but I don't actually see much of a problem. Given a) the incredible syntactical tools that Google allows you to construct your search with and b) the Google API that anyone is free to leverage and, say, add their own front end, it just isn't that hard to zero in on what you want. When I look something up and it isn't first or second in the list, I reflexively look at the number of hits, and if it's in the gajillions, I refine my search. Add one more word to your search (any word remotely connected with your topic) and you're golden in 99.999% of the cases. (Statistics (c) 2005 Mike Pope, any resemblance to real people or numbers strictly coincidental.)
Don't forget that all Google is really doing is showing you the results of a popularity contest for your term. Just because YOU didn't mean "Jaguar-the-mispronounced-car-name" doesn't mean others -- the majority of others, poor blokes -- were not searching for overpriced vehicles.
I'm also interested in how you think Google could actually implement categories. Google has no metadata about content except the stuff that goes into the PageRank system, and they're already using that. If I understand correctly, any after-the-fact categorization in any search engine is done with human intervention. (?)
Incidentally, there's a certain irony here in that Google's initial success was precisely that their ranking algorithm was spookily prescient about what you meant, as distinct from engines that weighted pages based on, say, word count. We sure have become spoiled ... :-)
mike on November 16, 2005 12:53 PM
PS Google Suggest would in this case not help -- it wouldn't be until the second word that the search would be sufficiently refined ... other than that you'd know it before hitting I Feel Lucky, I guess.
mike on November 16, 2005 12:54 PM
Some related posts:
Help Grandfather Google Improve Search With Wiki Directories
http://www.marketanomaly.com/?p=63
The answer you're searching for is "browse"
http://www.humanfactors.com/downloads/jan05.asp
Your example of word proximity analysis does not necessarily solve the problem, does it?
I think it does, since that's how http://www.clusty.com appears to work..
Well, I typed "Jaguar" into Clusty and the first hit was for the car. They pull up Wikipedia's (not their) page as a disambiguator -- is that the result of the proximity analysis?
mike on November 17, 2005 1:28 AM
It seems that Google does do what you ask, sometimes:
http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2005-31,GGLG:en&q=%40%40identity
Notice a few hits down it says:
See results for: @@identity sql
So it kind of does what you're looking for. However, after reading these comments I'm going to check out Clusty...
Bryant Likes on November 17, 2005 4:29 AM
They pull up Wikipedia's (not their) page as a disambiguator -- is that the result of the proximity analysis?
Well, I'm referring to the category list on the left of the Clusty results. That pretty effectively mirrors what I see in the eBay screenshot.
Notice a few hits down it says: See results for: @@identity sql
Hmm, interesting, that is what I'm proposing. But in the zillions of Google searches I've performed, that's the very first time I've seen that behavior!
There must be some special consideration given to a technical search term like "@@identity"?
Jeff Atwood on November 17, 2005 6:12 AM
From what I've been able to see (in 5 minutes of playing on Google), when you search for a word that is highly visible from one or two very different vocabularies (e.g. Tacoma or Basic) you can get those suggestions to come up. It'll be interesting to see if I see that type of behavior more often. This is a very interesting topic indeed.
Ryan on November 21, 2005 6:28 AM
Google has people with Ph.D's that pick up trash and clean bathrooms why would they care that an idoit like you wants to be able to put in one word and find what you want. Oh no I'm a typical american that is too lazy to add a second word and I have a website to bitch on...go fuck yourself!
Fuck Jeff on November 19, 2007 7:46 AM
Google has people with Ph.D's that pick up trash and clean bathrooms why would they care that an idoit like you wants to be able to put in one word and find what you want.
Maybe because making a good search engine is what Google is all about?
Now, it seems eBay have started to also second guess what we're looking for, but I guess they only have people with degrees cleaning their toilets since it doesn't work that well...
http://www.piku.org.uk/diary/2008/07/17/ebays-search-system-is-broken
James on July 17, 2008 8:21 AM
Heck, lots of search engines, and meta search engines, these days offer either (usually) to narrow the phrase with related phrases, or (sometimes) clustering.
Ask jeeves, Gigablast, Wisenut, Exalead, Looksmart (who are silly enough to put it at the bottom of the page where you won't find it without scrolling), Altavista, and plenty of others.
The fact that Google doesn't have it now, doesn't mean nobody does, or even that it's a rare and unique feature.
Yaron on February 6, 2010 9:47 PM