What it is, Why it exists, How to find it, and Its inherent ambiguity (2024)

Finding Information on the Internet: A Tutorial
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

Invisible or Deep Web:
What it is, How to find it, and Its inherent ambiguity
UC Berkeley - Teaching Library Internet Workshops

What is the "Invisible Web", a.k.a. the "Deep Web"?

The "visible web" is what you can find using general web search engines. It's also what you see in almost all subject directories. The "invisible web" is what you cannot find using these types of tools.

The first version of this web page was written in 2000, when this topic was new and baffling to many web searchers. Since then, search engines' crawlers and indexing programs have overcome many of the technical barriers that made it impossible for them to find "invisible" web pages.

These types of pages used to be invisible but can now be found in most search engine results:

  • Pages in non-HTML formats (pdf, Word, Excel, PowerPoint), now converted into HTML.
  • Script-based pages, whose URLs contain a ? or other script coding.
  • Pages generated dynamically by other types of database software (e.g., Active Server Pages, Cold Fusion). These can be indexed if there is a stable URL somewhere that search engine crawlers can find.
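
As a rough illustration of what "script-based" means here, the sketch below uses Python's standard urllib.parse module to split a URL at the ? and list the parameters passed to the script. The function name describe_url and the example.org addresses are hypothetical, chosen only for illustration; this is not how any particular search engine classifies pages.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into its path and any query parameters after the '?'."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        # A non-empty query string is the "?" part the tutorial refers to.
        "is_script_based": bool(parts.query),
        "params": parse_qs(parts.query),
    }


# Hypothetical addresses, used only as examples.
dynamic = describe_url("http://www.example.org/catalog/search.asp?subject=history&page=2")
static = describe_url("http://www.example.org/about.html")

print(dynamic["is_script_based"])  # True
print(dynamic["params"])           # {'subject': ['history'], 'page': ['2']}
print(static["is_script_based"])   # False
```

A URL like the first one is stable: anyone who follows the same link gets the same page, which is why crawlers can index it once they find it.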

Why isn't everything visible?

There are still some hurdles search engine crawlers cannot leap. Here are some examples of material that remains hidden from general search engines:

  • The Contents of Searchable Databases. When you search in a library catalog, article database, statistical database, etc., the results are generated "on the fly" in answer to your search. Because the crawler programs cannot type or think, they cannot enter passwords on a login screen or keywords in a search box. Thus, these databases must be searched separately.
    • A special case: Google Scholar is part of the public or visible web. It contains citations to journal articles and other publications, with links to publishers or other sources where one can try to access the full text of the items. This is convenient, but results in Google Scholar are only a small fraction of all the scholarly publications that exist online. Much more - including most of the full text - is available through article databases that are part of the invisible web.

  • Excluded Pages. Search engine companies exclude some types of pages by policy, to avoid cluttering their databases with unwanted content.
    • Dynamically generated pages of little value beyond single use. Think of the billions of possible web pages generated by searches for books in library catalogs, public-record databases, etc. Each of these is created in response to a specific need. Search engines do not want all these pages in their web databases, since they generally are not of broad interest.

    • Pages deliberately excluded by their owners. A web page creator who does not want his/her page showing up in search engines can insert special "meta tags" that will not display on the screen, but will cause most search engines' crawlers to avoid the page.
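
To make these hurdles concrete, here is a minimal sketch of what a simple link-following crawler might notice on a page. It assumes the "meta tags" mentioned above are the widely used robots meta tag (for example, content="noindex"), and it treats any form as something the crawler cannot fill in. The class name CrawlerView, the helper inspect_page, and the sample page are hypothetical; real crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collects what a simple link-following crawler would notice on a page."""

    def __init__(self):
        super().__init__()
        self.links = []       # href targets the crawler could follow
        self.form_count = 0   # search or login forms it cannot fill in
        self.robots = set()   # directives found in <meta name="robots">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.form_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def inspect_page(html):
    """Parse an HTML string and return the populated CrawlerView."""
    view = CrawlerView()
    view.feed(html)
    view.close()
    return view


# Hypothetical page: one ordinary link, one search box, and a robots meta tag.
SAMPLE = """
<html><head>
<meta name="robots" content="noindex, nofollow">
</head><body>
<a href="/about.html">About</a>
<form action="/search"><input name="q"></form>
</body></html>
"""

page = inspect_page(SAMPLE)
print(page.links)                # ['/about.html']
print(page.form_count)           # 1
print("noindex" in page.robots)  # True
```

The link is the only thing this crawler can act on. Whatever sits behind the search form is reachable only by typing a query, and the noindex directive asks search engines to leave the page itself out of their results.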

How to Find the Invisible Web

Simply think "databases" and keep your eyes open. You can find searchable databases containing invisible web pages in the course of routine searching in most general web directories. Of particular value in academic research are:

Use Google and other search engines to locate searchable databases by searching a subject term and the word "database". If the database uses the word database in its own pages, you are likely to find it in Google. The word "database" is also useful in searching a topic in the Google Directory or the Yahoo! directory, because they sometimes use the term to describe searchable databases in their listings.

Examples:
plane crash database
languages database
toxic chemicals database
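
If you have a list of topics to check, the pattern above (subject term plus the word "database") is easy to generate. The following is only a small sketch; the function name database_queries is hypothetical, and the resulting strings are meant to be typed or pasted into whichever search engine you use.

```python
def database_queries(subjects, keyword="database"):
    """Append the keyword to each non-empty subject term."""
    return [f"{s.strip()} {keyword}" for s in subjects if s.strip()]


queries = database_queries(["plane crash", "languages", "toxic chemicals"])
for q in queries:
    print(q)
# plane crash database
# languages database
# toxic chemicals database
```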

Remember that the Invisible Web exists. In addition to what you find in search engine results (including Google Scholar) and most web directories, there are other gold mines you have to search directly. This includes all of the licensed article, magazine, reference, news archives, and other research resources that libraries and some industries buy for those authorized to use them.

As part of your web search strategy, spend a little time looking for databases in your field or topic of study or research. The contents of these may not be freely available: libraries and corporations buy the rights for their authorized users to view the contents. If they appear free, it's because you are somehow authorized to search and read the contents (library card holder, company employee, etc.).

The Ambiguity Inherent in the Invisible Web:

It is very difficult to predict what sites or kinds of sites or portions of sites will or won't be part of the Invisible Web. There are several factors involved:

    • Which sites replicate some of their content in static pages (hybrid of visible and invisible in some combination)?
    • Which replicate it all (visible in search engines if you construct a search matching terms in the page)?
    • Which databases replicate none of their dynamically generated pages in links and must be searched directly (totally invisible)?
    • Search engines can change their policies on what they exclude and include.
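
The first three questions can be pictured as a simple three-way classification. The sketch below is purely illustrative: it assumes you somehow know how many of a database's records also exist as static, linkable pages, which in practice is exactly the information a searcher does not have. The function name visibility and the numbers are hypothetical, and the sketch ignores the fourth factor, changing search engine policies.

```python
def visibility(total_records, records_with_static_pages):
    """Label a database by how much of its content is replicated in static pages."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= records_with_static_pages <= total_records:
        raise ValueError("records_with_static_pages must be between 0 and total_records")
    if records_with_static_pages == 0:
        return "invisible"  # must be searched directly
    if records_with_static_pages == total_records:
        return "visible"    # findable through search engines
    return "hybrid"         # some of each


print(visibility(1000, 0))     # invisible
print(visibility(1000, 250))   # hybrid
print(visibility(1000, 1000))  # visible
```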


Copyright (C) 2010 by the Regents of the University of California. All rights reserved.
Last update 01/08/10

Article information

Author: Merrill Bechtelar CPA
