Search better with “” and -’s; and setting up OpenDNS

August 27th, 2008

This Sunday in the Missoulian:


I’ll explain how to search better with Google and other search engines: put quotes around two or more words or a phrase to find exact matches, and put a “-” in front of a word you want to exclude from the results. I’ll offer other search details, too.
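As a rough illustration of those two operators, here is a minimal Python sketch that assembles a query string. The helper name `build_query` and the sample search terms are made up for this example, and the Google URL is shown only to illustrate how the query ends up encoded in a link.

```python
from urllib.parse import urlencode

def build_query(phrases=(), words=(), exclude=()):
    """Combine exact phrases, plain words and excluded words into one query."""
    parts = [f'"{p}"' for p in phrases]   # quotes ask for an exact phrase match
    parts += list(words)                  # ordinary search terms
    parts += [f"-{w}" for w in exclude]   # a leading minus excludes a word
    return " ".join(parts)

# Hypothetical search: the exact phrase "hiking boots", in Montana, no rentals.
query = build_query(phrases=["hiking boots"], words=["Montana"], exclude=["rental"])
encoded = urlencode({"q": query})

print(query)
print("https://www.google.com/search?" + encoded)
```

Typing the printed query straight into the search box does the same thing; the code only makes the pattern explicit.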

And, more about OpenDNS. If you change your DNS server setting to OpenDNS to guard against DNS poisoning attacks (which are still happening, even though they are less widespread than originally thought after the discovery of the DNS flaw last month), you only have to go through the first step of the instructions at setting up OpenDNS. They want you to register, but you don’t have to. So complete step one, and when you get to step 2 or any other step where OpenDNS wants you to register, you can close your network control panels and reboot if necessary.
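If you want to double-check the change afterward, here is a small Python sketch, assuming a Unix-style `resolv.conf` file (on Windows and Mac OS X the setting lives in the network control panel instead). The two addresses are OpenDNS’s public resolvers; the function names and the sample file contents are invented for this example.

```python
# OpenDNS's two public resolver addresses.
OPENDNS_RESOLVERS = {"208.67.222.222", "208.67.220.220"}

def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def uses_opendns(resolv_conf_text):
    """True only if at least one server is listed and all are OpenDNS."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(s in OPENDNS_RESOLVERS for s in servers)

# Hypothetical file contents, for illustration only.
sample = """\
# set by hand
nameserver 208.67.222.222
nameserver 208.67.220.220
"""
print(nameservers(sample))
print(uses_opendns(sample))
```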


    How Search Engines build indexes

    August 24th, 2008

    My 8/24/08 Missoulian column: Complex indexes answer queries

    Internet search engines such as Google, Yahoo! and MSN help us find news, blogs, people, hiking boots, definitions of words and collectible widgets. Even hackers use search engines to find vulnerable Web sites. The term “search” - as both a noun and a verb - has become ubiquitous, and “Google” has entered the lexicon as a generic term like Kleenex and Xerox.

    Search is invaluable to users like you and me, but it’s also big business. Search engines make money by selling clickable advertisements and by drawing eyeballs to their “portals,” Web pages that also provide links to other services. Some portals are so crammed with news and personals that search seems to be an afterthought.


    But how do search engines work? It isn’t magic, but requires a lot of money, hardware and software, as well as very high-capacity “pipes” to the Internet.

    The basic concept behind search is to create an index of what’s available on the Web. Search engines build and maintain extremely large and complex computer networks to catalog the Internet and respond to your queries any time of day or night. Indexes are based on words in Web sites and documents, file names, etc. - wherever a particular word exists online, regardless of context, it will end up in an index.
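To make the idea of an index concrete, here is a toy “inverted index” in Python: a lookup table from each word to the pages that contain it. This is only a sketch of the general concept, not how any real search engine stores its data; the page names and text are made up.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map every word to the set of pages it appears on, ignoring context."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())   # keep only pages with this word too
    return results

# Hypothetical pages, for illustration only.
pages = {
    "gadgetmontana.example": "Gadgets made in Montana",
    "boots.example": "Hiking boots for Montana trails",
}
index = build_index(pages)
print(sorted(search(index, "montana gadgets")))
print(sorted(search(index, "montana")))
```

The index is built once, ahead of time; answering a query is then just a few table lookups.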

    Google reportedly has around three-quarters of a million to a million personal computers spread around the world in data centers, arranged for speedy access and redundancy in case undersea cables are cut or satellites go dark. Google uses fast, desktop-style PCs because they’re inexpensive to install and easy to change out when they die.

    When you search for something, Google doesn’t send a query out right then to find what you’re looking for. It refers to its vast pre-built index of the Internet. The job of indexing the Internet is done by “bots,” automatic software critters that traverse the Web “reading” sites. The bots are called “spiders” - a play on the term World Wide Web - and they work 24/7, collecting information as fast as they can to compete for market share with other search engines, as the best results will bring a user back to the same site.
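A spider’s basic routine can be sketched in a few lines of Python: start at one page, note its links, then visit each link it hasn’t seen yet. This sketch crawls a tiny made-up “web” held in a dictionary instead of fetching real pages, so the site names and the `crawl` function are assumptions for illustration.

```python
from collections import deque

# A pretend web: each page maps to the pages it links to.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["a.example"],   # nothing links TO this page
}

def crawl(start, web, limit=100):
    """Visit pages breadth-first by following links, each page only once."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)               # "read" the page
        for link in web.get(url, []):
            if link not in seen:          # skip pages already queued or read
                seen.add(link)
                queue.append(link)
    return visited

visited = crawl("a.example", FAKE_WEB)
print(visited)
```

Notice that `d.example` never shows up: a spider that starts at `a.example` can find only the pages that something else links to.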

    Spiders can work fairly slowly, too. The number of Web sites is growing so fast that even Google, with all its vast computing power, sometimes can visit a Web site only once a month or less. According to Internet services company Netcraft, there were more than 175 million Web servers online in June, with a growth rate of about 3 million a month, and because one server can host many sites, the number of Web sites is actually higher.

    Some early search engines - like the old Yahoo! - operated on indexes compiled by people rather than spiders. While there are still some human-generated indexes around, modern Internet users want access to everything - Web sites, images, documents, databases - so a search engine must constantly update by spider to keep up.

    These days, Google is the search leader, accounting for nearly 70 percent of search traffic. Google has a leg up on Yahoo! and other search engines - even one called Cuil (pronounced “cool”), which was supposed to be a “Google killer” but failed miserably - because of its search algorithm, the complex method by which its index is developed.

    The algorithm is a corporate secret, except for the way search results are ranked, called PageRank: Sites at the top of the list are there because other sites are linked to them, the idea being that a Web site with other sites pointing to it should be more important and relevant than a loner.

    Let’s say there are two Web sites that have similar content, one called Gadget Montana and the other Made in Montana Gadgets. If you manage Gadget Montana and have other quality sites link to it - such as the National Gadget Association and International Gadgets - or have blogs or news stories mention it and your competitor doesn’t, there’s a good chance Google will rank your site higher after its spiders see the difference in links. Now, “a good chance” is the key phrase, as other factors come into play. The best way to rank high in Web search is to have high-quality, original content that is rich with links. There once were ways to game how high your site ranked, but most don’t work anymore.
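The link-counting idea can be sketched with the two gadget sites. The Python below is a simplified textbook version of PageRank, not Google’s actual ranking code (which stays secret). The site addresses, the damping value of 0.85 and the number of rounds are all assumptions for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: a page's score is fed by the pages linking to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start everyone off equal
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link structure: three sites link to Gadget Montana,
# only one also links to the competitor.
links = {
    "gadget-association.example": ["gadgetmontana.example"],
    "international-gadgets.example": ["gadgetmontana.example"],
    "news-blog.example": ["gadgetmontana.example", "madeinmontanagadgets.example"],
}
ranks = pagerank(links)
for site in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{site}: {ranks[site]:.3f}")
```

With more sites pointing at it, Gadget Montana comes out with the higher score. In real search results this is only one factor among many.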

    Unless you’re savvy enough with technology to know how to block them, nothing much escapes a spider. Most are “nice” in that they will respect the two primary ways you can control their crawl of your site, through “robots.txt” files or “.htaccess” files (that leading dot is important). Some spiders aren’t nice and will grab your content for what I call “bottom feeder” sites, which use excerpted text and images to promote clickable ads and make money.
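Here is roughly how a “nice” spider decides whether it may read a page, using the robots.txt parser that ships with Python. The robots.txt contents, the bot name “ExampleBot” and the site address are made up for this sketch. It only shows the polite side of the bargain: nothing in the file itself stops a rude bot.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking all bots to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())   # read the rules from text, no network needed

public_ok = parser.can_fetch("ExampleBot", "http://gadgets.example/index.html")
private_ok = parser.can_fetch("ExampleBot", "http://gadgets.example/private/prices.html")

print(public_ok)    # allowed: no rule covers this page
print(private_ok)   # not allowed: the page is under /private/
```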


    The next big step in search engine technology is called “semantic search.” For the user that means getting more relevant results because the search engine understands the context of the word, not just its existence on a Web site.

    With semantic search, a search for “Montana Gadgets” would bring back results for gadget manufacturers and dealers in Montana, and not results for tween singer Hannah Montana and the gadgets she sells, regardless of how popular she might be. For more reading on this subject and when it might become a reality, try Googling “semantic search.”


      This Sunday in the Missoulian; More DNS news

      August 22nd, 2008

      This Sunday in the Missoulian I’m going to look into search engines a bit deeper: how they work and why some of the details of how they work are secret; how the search engine “bot” looks at a web site; and more.

      The DNS flaw story continues:

      The recent DNS fix was developed in (mostly) secret and has exploitable flaws, according to the 8/08/08 New York Times:

      Faced with the discovery of a serious flaw in the Internet’s workings, computer network administrators around the world have been rushing to fix their systems with a cobbled-together patch. Now it appears that the patch has some gaping holes. On Friday, a Russian physicist demonstrated that the emergency fix to the basic Internet address system, known as the Domain Name System, is vulnerable and will almost certainly be exploited by criminals.

      Dan Kaminsky released some details of the flaw and fix the same week at the annual Black Hat conference in Las Vegas.

      According to an engineer of the original DNS software,

      “We’ve bought some time,” said Paul Mockapetris, chairman of Nominum, a firm that makes a version of the D.N.S. software that is not vulnerable to the current flaw. Mr. Mockapetris described the patch that is now being put in place as the equivalent of “playing Russian roulette with a gun that has 100 bullet chambers instead of six… The point,” he said, “should be to take the gun out of people’s hands.”

      And the National Telecommunications and Information Administration is accused of moving too slowly to help fix the system.
