Trusted Search Sites
When you do your research, finding web sites with a search engine like Google or Yahoo, or through a site like Wikipedia, raises the question of which results you can trust and actually use in your paper. Here are two short articles, written for librarians, who face the same problem:
January 2006
How Does Google Determine Which Web Sites Are the Most "Trusted"?
In the debut issue of the Google Librarian newsletter, we published an article by quality engineer Matt Cutts explaining how Google collects and ranks search results. The most common question we heard in response was "How does Google determine which web sites are the most 'trusted'?" Here, his reply:
This question goes to the heart of what we do. You already know the short answer: Google uses more than 100 different factors, including the PageRank algorithm, to determine whether a site is trusted or reputable. If you think of the internet as a democracy, a web page that links to another page is "voting" for the value of the page. As we explain in our Technology Overview, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. But that's not the end of the story. If Page A itself has more votes from other pages, the vote carries more weight. Or to put it another way, if more people trust your site, your trust is more valuable.
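The voting idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the published PageRank concept, not Google's actual ranking code: the function name, the damping value of 0.85 (taken from the academic literature, not from this article), the fixed iteration count, and the three-page example graph are all assumptions made for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    `damping` and `iterations` are illustrative assumptions, not values
    stated in the article.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally trusted.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it "votes" for,
                # so votes from highly ranked pages carry more weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and C both link to B, and B links back to A.
example_links = {"A": ["B"], "B": ["A"], "C": ["B"]}
example_ranks = pagerank(example_links)
```

In this made-up graph, B collects two votes and ends up with the highest score, A is second because its single vote comes from the well-regarded B, and C, which nobody links to, comes last.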
In addition to using the PageRank algorithm, we automatically analyze the content of pages we crawl. This goes beyond scanning page-based text, which webmasters can easily manipulate through meta-tags. We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted and will be relevant to users.
The long answer is more complicated. Since how we determine search results is the core of our business, there are some ingredients in our "special sauce" that we can't share. In addition, it goes without saying that we're on constant guard against people exploiting the information to achieve artificially high placement in our search results. At the same time, Google was born in a university research environment, and there is a large and growing body of academic work exploring and analyzing our technology. That includes the grand-daddy of them all, The PageRank Citation Ranking: Bringing Order to the Web, the original Stanford University paper by Larry Page, Sergey Brin, Rajeev Motwani and Terry Winograd. If you'd like to take a look, Google Scholar is a good place to start (especially if you click on the citations as well as the papers themselves).
Finally, you might also want to check out this link, which takes you to a collection of technology papers written by people now at Google. It contains oldies-but-goodies like the Stanford paper on PageRank, but also brand new research about everything from algorithms to artificial intelligence. Enjoy!
January 2006
Beyond Algorithms: A Librarian's Guide to Finding Web Sites You Can Trust
Okay, so your favorite search engine has turned up thousands of web sites on the topic of your choosing. Which ones should you trust?
As a librarian who runs a web site catering to people with a hunger for authoritative resources, I'm often asked that question. As a result, my colleagues and I have developed a five-point system for separating the wheat from the chaff. While we pride ourselves on our small but well-groomed collection of reliable, trustworthy, librarian-selected web sites, there's really no magic to what we do. It's a simple methodology you can use at the reference desk or any other place you find yourself staring at a page of search results and wondering where to begin.
Whether we're selecting new web sites for our newsletter or deciding whether to toss or keep sites already in our collection, we rely primarily on what we call the "big five show-stoppers": availability, credibility, authorship, external links and legality.
1. Availability
Is the site up and running? Is the information freely available?
The first question can be answered fairly easily — either the web site is there, or it's not — but the second question is more complicated. Many web sites put information behind walls of one sort or another. It may be worth it for you to pay a fee or register to gain access to a web site, but at the Librarians' Internet Index (LII), we pass along only freely available sites because our working assumption is that when you're hunting for information, either for yourself or for a library patron, free access is good. In addition, we are leery of sites that require registration to view most or all of the site, since it's often unclear how your personal information will be used.
Of course, this isn't a hard-and-fast rule. We don't reject a site just because it has some information behind a wall. For instance, Open WorldCat is a terrific database for locating books in libraries worldwide, and if the book is available for purchase, you'll find at least one link to an online store. There's no harm in that (except to bookaholics like us, where it's dangerous to our pocketbooks). But if most or all of the WorldCat site were fee-based, it wouldn't be very useful to anyone who isn't a subscriber.
Shortcut: To determine if information "behind the wall" is worth your time and/or money, skim the web site's mission statement, "About" page, or registration sign-up page. For example, the Ellis Island Foundation makes it clear that by registering for free, you'll be able to take full advantage of the site's functionality.
2. Credibility
Does the web site contribute current, accurate information? Is the site's author (or authors) qualified to present the content provided?
In reviewing the sites we've rejected for LII in the past six months, we found that the majority had credibility problems. Either the content was clearly substandard (including, for instance, recipes that misstated quantities, or definitions we knew to be wrong) or the author lacked the credentials to present the content on the site.
We don't rule out personal web sites, but we scrutinize them carefully. Sometimes we select sites maintained by hobbyists when the content is fun or recreational, such as Patently Absurd, a web site featuring weird patents. Sometimes we select sites when we can use our own subject knowledge to assess the content, as we did when we chose the yummy web site, Tiramisu: Heaven In Your Mouth. But personal web sites aren't always what they seem, and we wouldn't want anyone following health advice from a quack, or using a knitting pattern that results in the proverbial sweater with three arms.
We're always surprised when potentially good web sites don't provide information about the author's credentials right up front. If we aren't sure about a site, we write the author. If they don't respond, or we're not convinced of their credibility when they answer, we reject the site.
Shortcut: Look for an "About" page or an author biography.
Shortcut: There are some sources that you can nearly always trust. Many librarians busy helping patrons at the desk, over the phone, or in instant messaging sessions use Google searches limited to the .edu or .gov domains to quickly winnow the search to sites known to be authoritative. For example, a Google search for "breast cancer site:gov" will yield high-quality web sites.
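If you run domain-limited searches often, the query can be assembled programmatically. The following is a minimal sketch: the helper names are hypothetical, and the search address and its `q` parameter are assumptions about how Google's public search page accepts a query, not something stated in the article.

```python
from urllib.parse import urlencode


def build_site_query(terms, domain):
    """Append a site: restriction to the search terms."""
    return f"{terms} site:{domain}"


def build_search_url(terms, domain, base_url="https://www.google.com/search"):
    """Return a search URL for the domain-limited query.

    `base_url` and the `q` parameter name are assumptions for this sketch.
    """
    return base_url + "?" + urlencode({"q": build_site_query(terms, domain)})


# The article's own example, limited to the .gov domain.
query = build_site_query("breast cancer", "gov")
url = build_search_url("breast cancer", "gov")
```

Here `query` is the same string you would type into the search box, and `url` is its encoded form, which can be bookmarked or shared with a patron.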
3. Authorship
At LII we're very skeptical of web sites with more than a couple of typographical or grammatical errors. In addition to how poorly it would reflect on us to point someone to a grammatically challenged web site, it's a big hint that the content on the site is generally not up to snuff.
We do make some exceptions for web sites translated from languages other than English, if we can find someone to verify that the content in the original language has correct spelling and grammar. The English is a little rocky on the lovely web site, Paris at the Time of Philippe Auguste, but we have it on good word that the original French is très bien.
Shortcut: If you think a web site has more than the average number of typos, copy a representative page and dump it into a Word document for a spell-check.
4. External Links
Nothing kills a web site's reputation faster than broken links leading elsewhere.
Broken links are a flag that the author is not paying attention to the content. Give web sites some latitude, though; there was a time when one broken link among many would cause us to reject a web site, but it's more common nowadays for people to move content to another URL, making it difficult for even the most fastidious webmasters to keep up. If you spot a broken link on a site you like and use, let the webmaster know; we appreciate these tips, and so do people at the sites we communicate with. But if you see many broken links rather than just a few, that's a cue to pass the site by.
Shortcut: Look for evidence that the web site maintains its links, such as notes indicating when a page was last updated, and beware of student project web sites and personal web pages with many, many links!
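The "a few versus many" judgment on broken links can be roughed out with Python's standard library. This is a sketch under stated assumptions, not a tool LII uses: the names are hypothetical, and the 20 percent cutoff for "many" is an arbitrary illustration, since the article gives no number.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the web links found in <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep absolute web links and site-relative paths; skip mailto: etc.
            if name == "href" and value and value.startswith(("http://", "https://", "/")):
                self.links.append(urljoin(self.base_url, value))


def is_reachable(url, timeout=10):
    """Return True if the URL answers a HEAD request without an error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False


def broken_link_report(html, base_url, check=is_reachable, max_broken_ratio=0.2):
    """Summarize broken links on a page.

    `max_broken_ratio` is an assumed cutoff for "many" broken links.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    links = sorted(set(collector.links))
    broken = [url for url in links if not check(url)]
    ratio = len(broken) / len(links) if links else 0.0
    return {"total": len(links), "broken": broken, "pass_by": ratio > max_broken_ratio}
```

Passing the checker in as the `check` argument keeps the network call separate from the counting logic, so the report can be tried on a saved page with a stand-in checker before pointing it at live sites. Some servers reject HEAD requests, so a link flagged here still deserves a manual click before you write to the webmaster.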
5. Legality
The author of a legitimate web site will ensure that she is legally entitled to publish the content on her site, working within copyright and fair use guidelines.
It's common to hear the author of a web site claim she is engaging in "fair use." Sometimes this is a reasonable argument, such as when an author uses examples of an artist's work in order to discuss it. Sometimes it's just a smokescreen — an excuse to justify posting someone else's work.
Shortcut: It's a lot easier to assess whether a web site complies with copyright law when you're familiar with its basic principles. Brad Templeton's guide to common copyright myths is a good primer.
Shortcut: Trust your instincts. If a web site looks and feels like a rip-off, it probably is. Take a chunk of its text and paste it into Google to see if it shows up elsewhere.
Shortcut: Avoid fan sites, lyric sites, paper mills, and any site posting newspaper or magazine articles (the full articles, not quotes or links) without also posting explicit permission statements.
So there you have it — the big five show-stoppers. Of course, once a web site makes it past the first cut, there are more finely grained heuristics for gauging authority. But you'll have what you need to be sure it's worth your time to dig deeper: a site you can trust.
Digging Deeper: A Few More Questions to Consider
1. Does the author provide sources for information?
2. If the site provides opinion rather than facts, are these opinions clearly identifiable as such?
3. Who are the audiences for this site? Is the site appropriate for the intended audiences?
4. Does the point of view provide balance to the information seeker?
5. How does the site compare with other sites on the same subject?
Karen G. Schneider, a librarian and writer, is the Director of Librarians' Internet Index (LII). Her personal blog is Free Range Librarian. She freelances for the library press, most recently at the ALA TechSource blog.
