Searching and finding information

Clicking through the pages of one of the large web sites ("portal sites"), one finds information about all aspects of practical life: the latest world news, the results of the baseball leagues or the horoscope, almost lost among lots of advertising. Finding exactly the information one is looking for is much more difficult. In the following we present some of the most important strategies.

$\triangleright$
Often one can find the homepage of a company or an organisation simply by guessing.

$\diamond$
Pattern          Group                                    Example
www.MYCOMP.com   companies (often in the U.S.)            www.compaq.com
www.MYCOMP.de    German companies (sometimes under .com)  www.siemens.de
www.NAME.org     all kinds of organisations               www.linux.org
www.uni-CITY.de  German universities                      www.uni-frankfurt.de
www.tu-CITY.de                                            www.tu-harburg.de
www.fh-CITY.de                                            www.fh-friedberg.de
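
Guessing can be made a little more systematic. The following Python sketch expands the patterns of the table into candidate addresses for a given name. The function name candidate_urls and the group labels are hypothetical, and every generated address is only a guess that still has to be tried in the browser.

```python
# Hypothetical helper: turn the naming patterns from the table into
# candidate homepage addresses. The results are guesses, not verified sites.
PATTERNS = {
    "company": ["www.{name}.com", "www.{name}.de"],
    "organisation": ["www.{name}.org"],
    "university": ["www.uni-{name}.de", "www.tu-{name}.de", "www.fh-{name}.de"],
}


def candidate_urls(name: str, group: str) -> list[str]:
    """Return guessed homepage URLs for `name` within the given group."""
    # Host names are case-insensitive and may not contain spaces,
    # so normalise to lower case and replace spaces by hyphens.
    slug = name.strip().lower().replace(" ", "-")
    return ["http://" + pattern.format(name=slug) for pattern in PATTERNS[group]]


if __name__ == "__main__":
    for url in candidate_urls("Frankfurt", "university"):
        print(url)
```

For "Frankfurt" this produces www.uni-frankfurt.de from the table, but also www.tu-frankfurt.de and www.fh-frankfurt.de, which need not exist.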

$\triangleright$
Collections of important links exist for many special topics.

$\diamond$
Several mathematical institutes in Germany present a large collection of links relevant to mathematics, the MathNet project: http://www.math-net.de/

$\diamond$
The computer magazine c't has several pages that help to solve the most common (and many less common) problems with hardware and software:

http://www.heise.de/ct/tipsundtricks/

$\triangleright$
Most university libraries have online access to their catalogues (OPAC).

$\diamond$
The German central library for mathematics and physics is the TIB/UB at the University of Hannover. Its huge catalogue is online at

telnet://opac.tib.uni-hannover.de

$\triangleright$
Most articles posted to the newsgroups since 1995 are archived on the Dejanews server:

http://www.dejanews.com/

One can search for keywords within specific newsgroups.

$\diamond$
Looking for the keyword Strassen in the newsgroup sci.math.research, one finds 6 articles related to the famous Strassen algorithm for fast matrix multiplication. Searching all newsgroups with math as part of their name (i.e. *math*) gives 72 results, some of which discuss the probability of winning the game of Monopoly with certain "Strassen" (German for "streets"). Of course these numbers change frequently.
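
The pattern *math* is a wildcard in the style of shell file names. Assuming Dejanews interpreted it in this way (its exact matching rules are not given here), the selection of newsgroups can be sketched with Python's fnmatch module; the list of group names is only an illustration.

```python
from fnmatch import fnmatchcase

# Illustrative newsgroup names, not the full group list of any news server.
GROUPS = [
    "sci.math",
    "sci.math.research",
    "sci.math.num-analysis",
    "alt.math.undergrad",
    "sci.physics",
    "comp.lang.c",
]


def matching_groups(pattern: str, names: list[str]) -> list[str]:
    """Return the names matching a shell-style pattern (* = any characters)."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # prints the four groups whose names contain "math"
    print(matching_groups("*math*", GROUPS))
```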

$\triangleright$
Several large servers present a (more or less) large part of the internet in searchable form. Such services are usually financed by advertising.

Web catalogue, search service in which editors manually sort information into topics and subtopics.

$\diamond$
Name     URL                   Comments
Dino     www.dino-online.de    German pages, started at the University of Göttingen
Yahoo!   www.yahoo.com         mostly US American pages, classic catalogue
         www.yahoo.de          Yahoo! for German pages

$\triangleright$
Web catalogues can only represent a small part of the internet, but they give a good overview of important sites. They are especially useful for

- a first survey of a topic
- broad searches

$\diamond$
At Yahoo, under "Science", subtopic "Mathematics", one finds e.g. a math FAQ, resources for math exams or the Quantum Lie Group page, and lots of further subsubtopics.
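
The topic hierarchy of such a catalogue is essentially a tree. A minimal Python sketch using only the entries named in the example; the nested-dictionary layout and the "_sites" key are assumptions for illustration, not how Yahoo actually stores its data.

```python
# Hypothetical miniature of a web catalogue: each topic maps to its
# subtopics, and the key "_sites" holds the entries filed by the editors.
CATALOGUE = {
    "Science": {
        "Mathematics": {
            "_sites": ["math FAQ", "resources for math exams", "Quantum Lie Group page"],
        },
    },
}


def browse(catalogue: dict, *path: str) -> tuple[list[str], list[str]]:
    """Follow a topic path and return (subtopics, sites) at that node."""
    node = catalogue
    for topic in path:
        node = node[topic]  # raises KeyError for an unknown topic
    subtopics = sorted(key for key in node if key != "_sites")
    return subtopics, node.get("_sites", [])


if __name__ == "__main__":
    print(browse(CATALOGUE, "Science"))
    print(browse(CATALOGUE, "Science", "Mathematics"))
```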

Search engine, system that automatically searches through the internet and creates a large database of keywords and pages. One can search for combinations of keywords with a simple search form or using a complex query language.
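
The core of such a database is an inverted index, which maps each keyword to the pages containing it. A minimal Python sketch, assuming the pages have already been fetched (the URLs and texts are made up); real search engines also store word positions and ranking information, and this sketch ignores upper and lower case entirely.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-case words (runs of word characters)."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every keyword to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], *keywords: str) -> set[str]:
    """Return the URLs of pages containing all given keywords."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*hits) if hits else set()


if __name__ == "__main__":
    pages = {  # hypothetical pages
        "http://a.example/strassen": "Strassen algorithm for matrix multiplication",
        "http://b.example/hannover": "Strassen und Plaetze in Hannover",
    }
    index = build_index(pages)
    print(search(index, "Strassen"))            # both pages
    print(search(index, "Strassen", "matrix"))  # only the algorithm page
```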

$\diamond$
Name        URL                Comments
Altavista   www.altavista.com  the first huge search engine
HotBot      www.hotbot.com
Fireball    www.fireball.de    German documents, same query language as Altavista

$\triangleright$
Even the largest search engines cannot cover more than about 10% of the internet. They revisit pages at intervals of several weeks, which means that many of the links in their databases are already outdated or point to pages that no longer exist.

$\triangleright$
Search engines are especially useful for

- searching for concrete names or rare terms
- searches that can be refined by several conditions

$\triangleright$
Simply entering a single keyword usually returns a huge number of hits. It is important to restrict the search further using the query language.

$\triangleright$
Different search engines often use different query languages. For efficient searching one should learn the basics of some of the most widely used ones.

$\diamond$
Some elements of the query language of Altavista:

Syntax                  Meaning
alllower                a word in lower case matches any capitalization, e.g. alllower, AllLoweR
upAndDown               a word containing capitals matches only the exact form
WORD1 WORD2 ...         at least one of the words has to appear in the page; pages are sorted by the number of hits
"WORD1 WORD2"           the phrase has to match exactly (apart from upper/lower case)
+WORD1 +WORD2 -WORD3    WORD1 and WORD2 have to appear, WORD3 must not appear
do*ument                * is a wildcard that matches any combination of characters, e.g. document, dokument
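
To make these rules concrete, here is a small Python model of how such a query could be evaluated against the text of a page. It is a simplified sketch under our own assumptions (words are runs of word characters; the ranking just counts matching plain words), not Altavista's actual implementation, and the page texts are invented.

```python
import re

# One query term: optional +/- prefix, then a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

# Hypothetical example pages.
PAGES = {
    "p1": "Strassen's algorithm for fast matrix multiplication",
    "p2": "Strassen und Plaetze in Hannover",
    "p3": "An implementation of the Strassen matrix algorithm",
}


def parse(query: str) -> list[tuple[str, list[str]]]:
    """Split a query into (prefix, words) pairs; a phrase yields several words."""
    terms = []
    for prefix, phrase, word in TOKEN.findall(query):
        words = [word] if word else phrase.split()
        if words:  # skip empty phrases such as ""
            terms.append((prefix, words))
    return terms


def word_matches(pattern: str, word: str) -> bool:
    """Lower-case patterns ignore case, others must match exactly; * is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    flags = re.IGNORECASE if pattern == pattern.lower() else 0
    return re.fullmatch(regex, word, flags) is not None


def term_matches(term: list[str], page_words: list[str]) -> bool:
    """True if the words of `term` occur consecutively in the page."""
    n = len(term)
    return any(
        all(word_matches(p, w) for p, w in zip(term, page_words[i:i + n]))
        for i in range(len(page_words) - n + 1)
    )


def search(query: str, pages: dict[str, str]) -> list[str]:
    """Return matching page names, pages with more matching plain words first."""
    terms = parse(query)
    required = [t for p, t in terms if p == "+"]
    excluded = [t for p, t in terms if p == "-"]
    optional = [t for p, t in terms if p == ""]
    results = []
    for url, text in pages.items():
        page_words = re.findall(r"\w+", text)
        if not all(term_matches(t, page_words) for t in required):
            continue
        if any(term_matches(t, page_words) for t in excluded):
            continue
        hits = sum(term_matches(t, page_words) for t in optional)
        if not required and hits == 0:
            continue  # without +terms at least one plain word must occur
        results.append((-hits, url))
    return [url for _, url in sorted(results)]


if __name__ == "__main__":
    print(search("+Strassen +matrix", PAGES))
    print(search("+Strassen +matrix +implementation", PAGES))
```

Adding further +terms can only shrink the result set, which is the effect seen in the refinement example that follows.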

$\diamond$
A search at Altavista:

Query                               Result
Strassen                            89540 hits, almost all in German, referring to streets
+Strassen +matrix                   3100 hits
+Strassen +matrix +implementation   196 hits

$\triangleright$
Many portal sites combine a web catalogue and a search engine.

$\diamond$
If a keyword is not found in the Yahoo pages, the query is forwarded to a search engine.
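
This combination can be sketched as a simple fallback in Python. The catalogue is reduced to a dictionary from keywords to entries, and the search engine is any function returning a result list; both are hypothetical stand-ins.

```python
from typing import Callable


def portal_search(
    keyword: str,
    catalogue: dict[str, list[str]],
    search_engine: Callable[[str], list[str]],
) -> tuple[str, list[str]]:
    """Look the keyword up in the catalogue first; otherwise ask the search engine."""
    entries = catalogue.get(keyword.lower(), [])
    if entries:
        return "catalogue", entries
    return "search engine", search_engine(keyword)


if __name__ == "__main__":
    catalogue = {"mathematics": ["math FAQ", "Quantum Lie Group page"]}

    def fake_engine(keyword: str) -> list[str]:
        # hypothetical stand-in for a real search engine
        return [f"page mentioning {keyword}"]

    print(portal_search("Mathematics", catalogue, fake_engine))
    print(portal_search("Strassen", catalogue, fake_engine))
```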

$\triangleright$
The number of large search engines and catalogues keeps increasing (there are already hundreds), and each covers only a small part of the web.

Meta search engine, special server that forwards a query to several search engines simultaneously. It combines the results that arrive in time and tries to rate their relevance.
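
The two steps, sending the query in parallel and merging whatever arrives before a deadline, can be sketched in Python with the standard concurrent.futures module. The engines are hypothetical functions, and the relevance rating uses reciprocal rank fusion (each page scores the sum of 1/(k + rank) over all result lists, with the customary k = 60). This is one common merging rule, not necessarily the one used by the services listed below.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

Engine = Callable[[str], list[str]]  # query -> ranked list of URLs


def metasearch(query: str, engines: dict[str, Engine],
               timeout: float = 2.0, k: int = 60) -> list[str]:
    """Query all engines at once and merge the lists that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = [pool.submit(engine, query) for engine in engines.values()]
    done, _late = wait(futures, timeout=timeout)
    # Do not wait for slow engines; their threads finish in the background
    # and their results are simply ignored.
    pool.shutdown(wait=False)
    scores: dict[str, float] = {}
    for future in done:
        if future.exception() is not None:
            continue  # an engine that failed contributes nothing
        for rank, url in enumerate(future.result(), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # highest combined score first; URL as tie-breaker
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    def slow_engine(query: str) -> list[str]:
        time.sleep(1.0)  # answers too late to be included
        return ["http://late.example/"]

    engines = {
        "a": lambda q: ["http://u1.example/", "http://u2.example/"],
        "b": lambda q: ["http://u2.example/", "http://u3.example/"],
        "slow": slow_engine,
    }
    print(metasearch("Strassen", engines, timeout=0.3))
```

A page ranked by several engines (here u2.example) rises to the top, which is a simple way of "rating relevance" without access to the engines' internal scores.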

$\triangleright$
Meta search engines cannot use the query languages of the individual search engines they query. At best they have their own syntax, which is translated for each engine.
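
Such a translation can be sketched in Python for a query given as lists of required and excluded terms. The Altavista form follows the query-language table above; the Boolean AND/NOT dialect stands for a hypothetical second engine, since the real syntax of each engine would have to be taken from its own documentation.

```python
def quote(term: str) -> str:
    """Put phrases (terms containing spaces) in double quotes."""
    return f'"{term}"' if " " in term else term


def to_altavista(required: list[str], excluded: list[str]) -> str:
    """+term must appear, -term must not appear (as in the Altavista table)."""
    return " ".join(["+" + quote(t) for t in required] +
                    ["-" + quote(t) for t in excluded])


def to_boolean(required: list[str], excluded: list[str]) -> str:
    """Hypothetical engine with a Boolean syntax using AND and NOT."""
    query = " AND ".join(quote(t) for t in required)
    for t in excluded:
        query += (" AND NOT " if query else "NOT ") + quote(t)
    return query


# Hypothetical set of engines the meta search engine forwards the query to.
TRANSLATORS = {"altavista": to_altavista, "boolean": to_boolean}


def translate(required: list[str], excluded: list[str]) -> dict[str, str]:
    """Translate one query into the syntax of every engine."""
    return {name: fn(required, excluded) for name, fn in TRANSLATORS.items()}


if __name__ == "__main__":
    print(translate(["Strassen", "matrix multiplication"], ["Monopoly"]))
```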

$\diamond$
Name         URL                         Comments
MetaCrawler  www.metacrawler.com         own query syntax
Highway 61   www.highway61.com           allows repeated searches
MetaGer      meta.rrzn.uni-hannover.de   queries German search engines

$\triangleright$
A good strategy is to look for pages that themselves contain links to the areas one is interested in, instead of a specific document, and to bookmark them. This leads to fewer "dead links". Such a collection soon becomes the best personal "search engine".

Peter Junglas 8.3.2000