Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. While that approach works well for the pages that make up the surface Web, these programs have a harder time penetrating databases that are set up to respond to typed queries.

“The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (a Deep Web search start-up whose investors include Jeffrey P.). Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources. “Most search engines try to help you find a needle in a haystack,” Mr. Rajaraman said, “but what we’re trying to do is help you explore the haystack.”

Google’s Deep Web search strategy involves sending out a program to analyze the contents of every database it encounters. “This is the most interesting data integration problem imaginable,” says Alon Halevy, a former computer science professor at the University of Washington who is now leading a team at Google that is trying to solve the Deep Web conundrum. That approach may sound straightforward in theory, but in practice the vast variety of database structures and possible search terms poses a thorny computational challenge. With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine, no matter how powerful, to sift through every possible combination of data on the fly.
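The crawler model described above can be sketched as a breadth-first traversal over hyperlinks. The sketch below is purely illustrative, not any real search engine's implementation: the page graph, the `crawl` function, and the use of a `?` to mark query-driven URLs are all assumptions made for this example. It shows why pages that sit behind typed-query forms never enter the index: the crawler can only follow links it can see, and it cannot enumerate the queries a database form would accept.

```python
from collections import deque

# A tiny in-memory "web": each page maps to the hyperlinks it contains.
# (Hypothetical data; a real crawler would fetch pages over HTTP.)
PAGES = {
    "/home": ["/about", "/news"],
    "/about": ["/home"],
    "/news": ["/news/deep-web", "/search?q=widgets"],  # last link is a query URL
    "/news/deep-web": [],
}

def crawl(start):
    """Breadth-first crawl that follows hyperlinks, skipping query URLs.

    Content reachable only through typed-query forms (marked here by '?')
    stays invisible to this kind of crawler -- that is the Deep Web.
    """
    seen = set()
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        if url in seen or "?" in url:
            continue  # already visited, or a query the crawler cannot guess
        seen.add(url)
        for link in PAGES.get(url, []):
            frontier.append(link)
    return seen

reachable = crawl("/home")
# "/search?q=widgets" is never indexed: the crawler cannot invent query terms.
```

Run against this toy graph, the crawl reaches every statically linked page but never the query-driven one, which is the gap that Deep Web approaches like those of Kosmix and Google try to close.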