


                                   SniffURI

                     by Erik van der Poel <erik@vanderpoel.org>

                           originally created in 1998

                              http://sniffuri.org/


Introduction

This is a set of tools to work with the protocols underlying the Web.


Description of Tools

  view.cgi

    This is an HTML form that allows the user to enter a URL. The CGI then
    fetches the object associated with that URL, and presents it to the
    user in a colorful way. For example, HTTP headers are shown, HTML
    documents are parsed and colored, and non-ASCII characters are shown in
    hex. Links are turned into live links that can be clicked to see the
    source of the target URL, allowing the user to "browse" source.
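
    As an illustration of the "non-ASCII characters are shown in hex"
    step, here is a minimal sketch in C. It is not taken from cgiview.c
    or view.c: the function name hex_non_ascii and the "<HH>" output
    notation are assumptions made for this example, and view.cgi may
    well render such bytes differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the SniffURI sources.
 *
 * Copies len bytes from in to out. Bytes in the 7-bit ASCII range are
 * copied unchanged; every other byte is written as "<HH>", where HH is
 * the byte value in uppercase hex (the bracket notation is an assumption).
 *
 * Returns the number of characters written, not counting the
 * terminating NUL, or (size_t)-1 if out is too small.
 */
size_t hex_non_ascii(const unsigned char *in, size_t len,
                     char *out, size_t outsize)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i, n = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* Need room for this character plus the final NUL. */
            if (n + 1 >= outsize)
                return (size_t)-1;
            out[n++] = (char)in[i];
        } else {
            /* Need room for four characters plus the final NUL. */
            if (n + 4 >= outsize)
                return (size_t)-1;
            out[n++] = '<';
            out[n++] = digits[in[i] >> 4];
            out[n++] = digits[in[i] & 0x0F];
            out[n++] = '>';
        }
    }
    if (n >= outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

    Given the UTF-8 bytes of "café" (63 61 66 C3 A9), this sketch
    produces the string "caf<C3><A9>". A real viewer would also need to
    escape HTML metacharacters before sending the result to a browser.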

  robot

    Originally written to see how many documents actually include the HTTP
    and HTML charsets, this tool has developed into a more general robot
    that collects various statistics, including HTML tag statistics, DNS
    lookup timing, etc. This robot does not obey the standard robot
    exclusion rules (robots.txt), so please exercise caution if you use it.
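
    To make the "HTTP charset" statistic concrete, the sketch below pulls
    the charset parameter out of an HTTP Content-Type header value. It is
    an illustration only, not the code in mime.c or robot.c; the names
    content_type_charset and starts_with_ci are invented for this
    example, and the parsing is deliberately simplified (it does not
    handle a ';' inside a quoted parameter value, for instance).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check: does s begin with prefix? */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Hypothetical helper, not part of the SniffURI sources.
 *
 * Looks for a "charset=" parameter in a Content-Type value such as
 * "text/html; charset=ISO-8859-1" and copies its value into out,
 * without surrounding double quotes. The value is silently truncated
 * if out is too small.
 *
 * Returns 1 if a non-empty charset was found, 0 otherwise (out is then
 * set to the empty string).
 */
int content_type_charset(const char *value, char *out, size_t outsize)
{
    const char *p;
    size_t n = 0;

    assert(value != NULL && out != NULL && outsize > 0);

    p = strchr(value, ';');
    while (p != NULL) {
        p++;                                  /* skip the ';' */
        while (*p == ' ' || *p == '\t')       /* skip optional blanks */
            p++;
        if (starts_with_ci(p, "charset=")) {
            int quoted;

            p += strlen("charset=");
            quoted = (*p == '"');
            if (quoted)
                p++;
            while (*p != '\0' && n + 1 < outsize) {
                if (quoted ? (*p == '"')
                           : (*p == ';' || *p == ' ' || *p == '\t'))
                    break;
                out[n++] = *p++;
            }
            out[n] = '\0';
            return n > 0;
        }
        p = strchr(p, ';');                   /* try the next parameter */
    }
    out[0] = '\0';
    return 0;
}
```

    A robot could call this once per response and count how many
    responses yield a charset. The HTML-level charset (declared in a
    META element) would need the HTML parser and is not covered here.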

  proxy

    This is an HTTP proxy that sits between the user's browser and another
    HTTP proxy. It captures all of the HTTP traffic between the browser and
    the Internet, and presents it to the user in the same colorful way as
    the above-mentioned view.cgi.

  grab

    Allows the user to "grab" a whole Web site, or everything under a
    particular directory. This is useful if you want to grab a bunch of
    related HTML files, e.g. the whole CSS2 spec.

  link

    Allows the user to recursively check for bad links in a Web site or
    under a particular directory.
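
    Both grab and link restrict themselves to "everything under a
    particular directory". One simple way to express that test is a
    prefix comparison on URLs, sketched below. This is an assumption
    about how such a check could look, not a description of grab.c or
    link.c; the name url_is_under is invented, and the sketch assumes
    absolute, already-normalized URLs whose starting URL contains a path
    (at least a trailing '/').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the SniffURI sources.
 *
 * Treats everything in start up to and including its last '/' as the
 * starting directory, and returns 1 if url begins with that directory,
 * 0 otherwise. No normalization is done: case differences, "..",
 * default ports and the like are not taken into account.
 */
int url_is_under(const char *start, const char *url)
{
    const char *slash;
    size_t dirlen;

    assert(start != NULL && url != NULL);

    slash = strrchr(start, '/');
    if (slash == NULL)
        return 0;                      /* no directory part at all */

    dirlen = (size_t)(slash - start) + 1;
    return strncmp(start, url, dirlen) == 0;
}
```

    With a starting URL of http://www.example.org/css2/cover.html, the
    URL http://www.example.org/css2/box.html is accepted and
    http://www.example.org/html4/index.html is rejected. A crawler would
    apply a test like this to each link it finds before following it.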


Description of Files

  addurl.c, addurl.h: adds URLs to a list
  all.h: header that includes all the required headers
  buf.c, buf.h: I/O routines
  cgiview.c: the view.cgi tool
  config*: autoconf-related files
  dns.c: experimental DNS toy
  doRun: used with robot
  file.c, file.h: support for file: URLs
  ftp.c: experimental FTP toy
  grab.c: the "grab" tool
  hash.c, hash.h: incomplete hash table routines
  HISTORY: brief history of this project
  html.c, html.h: HTML parser
  http.c, http.h: simple HTTP implementation
  index.html: used with view.cgi tool
  INSTALL: see this file for build instructions
  link.c: the "link" tool
  main.h: very simple callbacks, could be more object-oriented
  Makefile, Makefile.in: to build everything
  mime.c, mime.h: MIME Content-Type parser
  net.c, net.h: low-level Internet APIs
  pop.c: experimental POP toy
  proxy.c: the "proxy" tool
  robot.c: the "robot" tool
  run: used with robot
  test: directory used in testing
  thread.c, thread.h: for threading in the robot and locks elsewhere
  TODO: notes to myself
  url.c, url.h, urltest.c: implementation of absolute and relative URLs
  utils.c, utils.h: some little utility routines
  view.c, view.h: presents stuff to the user


Description of Code

  The code is extremely quick-and-dirty. It could be a lot more elegant,
  e.g. written in C++, object-oriented, extensible, etc.

  The point of this exercise was not to design and write a program well,
  but to create some useful tools and to learn about Internet protocols.