NAME
BW Whois -- A whois client by Bill Weinman
SYNOPSIS
whois [options] request[@host[:port]] [ ... ]
VERSION
This documents BW Whois version 3.4
DESCRIPTION
BW Whois was originally designed to work with the new "Shared
Registration System" whois introduced 1 December 1999. This new system
has proved to be remarkably disorganized and inconsistent, resulting in
tremendous confusion for those of us who need to find the ownership of a
domain now and then.
This program mitigates most of that confusion by referring to a table of
TLDs (Top-Level Domains) and associated registrars in the tld.conf file.
Over the past few years this program has evolved into the most
full-featured whois client available, providing features like a
self-detecting CGI mode and SQL database caching, for those who need
such features, while still maintaining a simple command-line interface
for those who just need that.
The CGI mode can be secured against abuse by a number of different
methods including "Referer:" headers, IP addresses, and a system of
128-bit hashed cookies. These security options can be tailored to suit
the demands of a given installation using the whois.conf configuration
file.
There are features to support a web-based whois service, including
support for Apache-style server-side includes, and support for a
distinct initial page and a "domain not found" page.
An optional caching capability is provided using an SQL database
(currently MySQL is supported). When configured for caching, requests
are forwarded to the corresponding whois server only if the cache does
not contain a result for the given request/server combination. Cached
values are expired after a configurable amount of time.
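The caching rule described above can be sketched in a few lines. This is
an illustration in Python, not the program's own code: it keeps results
in an in-memory dictionary rather than MySQL, and the names ResultCache,
lookup, and fake_fetch are hypothetical. The default expiry of 432000
seconds is the cache_expire default documented under CONFIGURATION FILE.

```python
import time


class ResultCache:
    """In-memory stand-in for the SQL results cache (illustration only)."""

    def __init__(self, expire=432000):
        self.expire = expire      # seconds before a cached result is stale
        self.entries = {}         # (request, server) -> (time stored, result)

    def lookup(self, request, server, fetch, now=None, refresh=False):
        """Return a cached result, calling fetch() only when needed."""
        now = time.time() if now is None else now
        key = (request.lower(), server.lower())
        entry = self.entries.get(key)
        if entry and not refresh and now - entry[0] <= self.expire:
            return entry[1]                    # fresh cache hit
        result = fetch(request, server)        # miss, stale, or forced refresh
        self.entries[key] = (now, result)
        return result


calls = []


def fake_fetch(request, server):
    """Hypothetical stand-in for a real whois query; counts its calls."""
    calls.append((request, server))
    return "result %d" % len(calls)
```

The refresh argument plays the role of the --refresh switch: it forces
the request to go to the server even when a fresh result is cached.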
OPERATION
When given a request, the program first checks the requested domain
against the tld.conf file for an associated whois server. If none is found,
the program will then submit the request to the "root" whois server
(currently whois.crsnic.net) and wait for a referral to a registrar's
whois server.
If given a referral, the program will then submit the request a second
time to the referred whois server.
The request can be a domain name (e.g. whois bw.org) or any other
entity that the given host can resolve (e.g. whois
!ww104@whois.networksolutions.com).
If request is an IP address (or part thereof), the ARIN whois server
will be used as a root server (whois.arin.net).
If host is specified, the request will be sent literally to the
specified host.
If both host and port are specified, the request will be sent to that
host using the specified port instead of the normal whois port (43).
Multiple requests on a single command line are supported.
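As an illustration of the request[@host[:port]] syntax, here is a
minimal Python sketch of how one argument might be split into its parts.
It is not the program's own code; the names Target and parse_target are
hypothetical, and splitting on the last "@" is an assumption.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 43  # the normal whois port


class Target(NamedTuple):
    request: str
    host: Optional[str]   # None means "let the program choose a server"
    port: int


def parse_target(arg: str) -> Target:
    """Split 'request[@host[:port]]' into request, host, and port."""
    request, at, hostport = arg.rpartition("@")
    if not at:
        # No "@": the whole argument is the request.
        return Target(arg, None, DEFAULT_PORT)
    host, _, port = hostport.partition(":")
    return Target(request, host, int(port) if port else DEFAULT_PORT)
```

With this sketch, "!ww104@whois.networksolutions.com" yields the request
"!ww104", the host "whois.networksolutions.com", and port 43.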
Self-detecting CGI Support
BW Whois detects CGI operation by looking for the standard "SCRIPT_NAME"
environment variable. This behavior can be overridden by using the
--nocgi switch.
In CGI mode the program attempts to make intelligent links out of IP
addresses, domain names, and handles. It doesn't always get it right,
but it tries real hard!
You can also specify an optional whois.html file to create your own
look. The HTML file will need a few simple "placeholders" in it. The
placeholders are replaced at runtime with the various values which make
this work. These placeholders are represented by text enclosed in '$'
signs like this: "$PLACEHOLDER$"
Separate HTML files may be specified for an initial page and a "not
found" page, if desired.
The placeholders are described here:
$SELF$
The URI path of the program on your web server, taken from the value
of the "SCRIPT_NAME" environment variable.
$DOMAIN$
The domain that was last looked up, if any.
$RESULT$
The result of the whois query from BW Whois.
You can get an example file from the program with:
whois --makehtml > whois.html
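The placeholder substitution described above amounts to a simple search
and replace. The following Python sketch shows the idea; it is not the
program's own code, the function name fill_template is hypothetical, and
the form field name "domain" and the path /cgi-bin/whois are made up for
the example.

```python
import re


def fill_template(template, values):
    """Replace $NAME$ placeholders; unknown names are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z]+)\$", substitute, template)


template = (
    '<form action="$SELF$"><input name="domain" value="$DOMAIN$"></form>\n'
    "<pre>$RESULT$</pre>"
)
page = fill_template(
    template,
    {"SELF": "/cgi-bin/whois", "DOMAIN": "bw.org", "RESULT": "(whois output)"},
)
```

A real page would also need the result text escaped for HTML before it
is substituted; that step is left out here.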
Optional Apache SSI Support
If you need to include other files into your HTML file dynamically,
experimental support for Apache-style SSI (server-side includes) is
provided with the bwInclude.pm module. This currently works only for
"include virtual" and "echo var" directives.
Simply place the bwInclude.pm file with your other perl module files, or
specify the directory that contains the module in the "use lib" line in
the source code.
Optional TLD Table Support
Because of the unfortunate design of the Shared Registration System, only
the .COM, .NET, and .ORG Top-Level Domains (TLDs) are referred by the
"root" domain servers at whois.crsnic.net and whois.internic.net. If you
want results for other TLDs you must know where to find them, and there
is no central repository for current whois server referrals.
The optional tld.conf file includes whois servers for all known TLDs,
and some second-level domains that are registered separately (e.g.
.net.au, .uk.com, etc.).
The format of the tld.conf file is as follows:
Lines that begin with "#" are ignored.
Token lines are like:
token token optional comments
The first token is the TLD; the leading dot (".") is required.
The second token is the fully-qualified domain name for the whois
server that responds to requests for the given TLD.
The two tokens can be separated by spaces and/or tabs.
Anything on the line after the second token is ignored.
A leading "#" for in-line comments is not required, but may be in
the future.
The file is searched sequentially, so it's important to have
2nd-level domains earlier in the file than corresponding top-level
domains. (e.g. .net.au before .au).
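The rules above, and the reason the order of entries matters, can be
sketched in Python as follows. This is an illustration only, not the
program's own code: the function names are hypothetical, the server
names in SAMPLE are invented, and matching by case-insensitive suffix is
an assumption about how the table is searched.

```python
def parse_tld_conf(text):
    """Return (tld, server) pairs in file order."""
    entries = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue                      # comment line
        tokens = line.split()             # spaces and/or tabs
        if len(tokens) >= 2:
            # Anything after the second token is ignored.
            entries.append((tokens[0].lower(), tokens[1]))
    return entries


def find_server(domain, entries):
    """Return the server of the first entry whose TLD ends the domain."""
    domain = domain.lower()
    for tld, server in entries:
        if domain.endswith(tld):
            return server
    return None


SAMPLE = """\
# hypothetical entries, for illustration only
.net.au\twhois.net-au.example   second-level registry
.au      whois.au.example       top-level registry
"""
entries = parse_tld_conf(SAMPLE)
```

Because the search stops at the first match, example.net.au would be
sent to the wrong server if the .au line came first.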
Optional Support for Stripping Disclaimers
Most whois servers deliver a disclaimer along with their whois results.
The disclaimer generally says something like "By submitting the request
that you already submitted before you saw this agreement you have agreed
to this binding contract. Haha!"
Many people who are not otherwise lawyers are annoyed by this. The
stripdisclaimer option will remove the disclaimers before you see them.
This feature requires the sd.conf file.
The format of the sd.conf file is:
server "first line" "last line"
server is the DNS name of the whois server
"first line" and "last line" are regular expressions that match
the first and last line (respectively) of the disclaimer to be
stripped. The quotes are required.
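A minimal Python sketch of the sd.conf idea follows. It is not the
program's own code: the function names, the server name, and the sample
patterns are hypothetical, Python's regular expression syntax stands in
for Perl's, and the sketch assumes that the patterns contain no double
quotes.

```python
import re

# server "first line" "last line"
LINE_RE = re.compile(r'^(\S+)\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_sd_conf(text):
    """Map each server to a (first, last) pair of compiled patterns."""
    rules = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            server, first, last = match.groups()
            rules[server] = (re.compile(first), re.compile(last))
    return rules


def strip_disclaimer(server, result, rules):
    """Drop the lines from the first-line match through the last-line match."""
    if server not in rules:
        return result
    first, last = rules[server]
    kept, inside = [], False
    for line in result.splitlines():
        if not inside and first.search(line):
            inside = True                 # disclaimer starts on this line
        if inside:
            if last.search(line):
                inside = False            # disclaimer ends on this line
            continue
        kept.append(line)
    return "\n".join(kept)


rules = parse_sd_conf('whois.example.net "^NOTICE:" "^END OF NOTICE"\n')
raw = "NOTICE: terms of use\nmore legal text\nEND OF NOTICE\nDomain Name: EXAMPLE.NET"
clean = strip_disclaimer("whois.example.net", raw, rules)
```

Note that in this sketch a disclaimer whose last line never matches
swallows the rest of the result, so the patterns need to be chosen with
care.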
Netblock Referrals
This program attempts to find netblock requests. If a request is
entirely numeric (e.g. 123.234), the program first checks with
whois.arin.net (ARIN). If an ARIN record contains a referral to another
whois system, (e.g. RIPE or APNIC) the program will attempt to detect
that and snatch the record from the referenced whois system. Note: ARIN's
records are very inconsistent in their formatting, so this may not
always do something intelligent.
Packed IP addresses
If the request is a string of numbers without any other characters, the
program will treat it as a 32-bit (packed) IP address. It will first
unpack it into dotted-quad notation and then submit it to the ARIN whois
server.
Packed IP addresses are often used by spammers in an attempt to confuse
those who might try to report their abuse. This feature makes it easy
for you to decipher those addresses and find the owner of the netblock
all in one step.
IP addresses are actually 32-bit integers (until we get IPv6 -- but
that's another story). The common notation represents the address as
four separate 8-bit integers, like this: 192.149.252.21 (actually one of
ARIN's servers). That's called "dotted-quad" notation. If you were to
represent that address as one big 32-bit integer it would look like
this: 3231054869. I call that a "packed" IP address.
Sometimes a spammer will use a packed IP address in a URL like this:
http://3231054869/index.html
That address will work in a web browser, but it's hard to look up. This
program will accept a packed IP address like this:
whois 3231054869
The program will unpack it into dotted-quad notation, and submit it to
the ARIN whois server just like a normal IP address.
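The arithmetic behind the conversion is short enough to show. This is a
Python sketch for illustration, not the program's own code, and the
function names are hypothetical. Each octet is one byte of the 32-bit
value, most significant byte first.

```python
def unpack_ip(packed):
    """Convert a packed (32-bit integer) address to dotted-quad notation."""
    value = int(packed)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address: %r" % (packed,))
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def pack_ip(dotted):
    """Convert dotted-quad notation back to a single 32-bit integer."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value
```

For the address used above, 192 * 16777216 + 149 * 65536 + 252 * 256 +
21 = 3231054869, so unpack_ip("3231054869") gives "192.149.252.21".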
COMMAND LINE SWITCHES
--help
Print a usage message.
--version
Print the version information and exit.
--config=path
Full path to the configuration file. Default: /etc/whois/whois.conf
--refresh, -r
Refresh the cache for this query. Forces the request to go to the
whois server even if the result is cached. (Only valid if caching is
configured.)
--tld=path
Full path/file name for tld.conf file. Default: /etc/whois/tld.conf
--host=host, -h host
Specify a specific host.
--port=port, -p port
Specify an alternate port.
--timeout=seconds
Set the timeout to a number of seconds. The default is 60 seconds if
this is not specified.
--quiet, -q
Be wery, wery quiet. I'm hunting wabbits. (--quiet overrides
--verbose)
--verbose, -v
Show details of every step. (--quiet overrides --verbose)
--stripdisclaimer, -s
Sets the stripdisclaimer mode. The program makes an attempt to strip
off those inane disclaimers that so many registries are starting to
include with their whois records. This feature requires the sd.conf
file.
--makehtml
Writes a sample HTML file (for CGI mode use) to standard out.
--nocgi
Prevent CGI mode. This is useful if you have a script that used a
legacy character-mode whois program.
--html
Create HTML links of handles, IP addresses, and domains without
using HTML in the rest of the output. Useful with --nocgi for using
an external wrapper CGI program.
--jpokay
Allow Japanese output from nic.ad.jp.
CONFIGURATION FILE
A sample whois.conf file is included with the BW Whois distribution. It
is not necessary to use the whois.conf file to use the program.
If you want to use advanced features, such as caching or optional CGI
security features, you will need to install the whois.conf file and
configure it to reflect your preferences.
The standard location for whois.conf is in the /etc/whois directory. If
you do not have access to that directory, or are running on a non-UNIX
operating system that does not use the /etc directory, you may specify
another location by setting the "WHOIS_CONF" environment variable or by
editing the source code.
If you need to edit the source code, be sure you are using a plain
text editor (not a word processor!) and that you save the file with
appropriate line-endings for your system. If you do not understand those
distinctions I highly recommend that you find a friend or hire a
consultant who knows about such things. (The author is occasionally
available for such small consulting tasks -- feel free to contact him if
you need help.)
Format of the Config File
The config file format is very simple.
Lines that begin with "#" are considered comments and are ignored.
Anything after a "#" to the end of a line is considered a comment and
ignored.
The format of each non-comment line is:
option value
For logical values, "1" or "true" (without the quotes) are considered
true. Anything else is considered false.
For options that take a list of values, the list is separated by colons
(":") without spaces. Spaces are not currently supported in any value.
See the SECURITY section of this man page for more information about
security features.
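The format rules above can be sketched in Python like this. It is an
illustration, not the program's own parser: the function names are
hypothetical, and treating "true" as case-sensitive is an assumption.

```python
def parse_whois_conf(text):
    """Return a dict of option name -> raw string value."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        parts = line.split(None, 1)            # option, then value
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def is_true(value):
    """Logical values: "1" or "true" are true, anything else is false."""
    return value in ("1", "true")


def as_list(value):
    """List values are separated by colons, without spaces."""
    return value.split(":") if value else []


SAMPLE = """\
# sample settings, for illustration only
stripdisclaimer true
timeout 30   # seconds
allow_referer www.example.com:example.com
"""
options = parse_whois_conf(SAMPLE)
```

One consequence of the comment rule is that a value cannot contain a
"#" character, since everything after it is discarded.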
The following options are supported:
stripdisclaimer true|false
Strip off the disclaimer/header from the results returned by many
registrars. This feature requires the sd.conf file.
tld_conf filepath
Alternate location for the tld.conf file. Default:
/etc/whois/tld.conf
sd_conf filepath
Alternate location for the sd.conf file. Default: /etc/whois/sd.conf
timeout number
The number of seconds to timeout if a result is not returned by a
whois server. Default: 60 seconds.
default_host hostname
A hostname to use as a default whois server if the TLD is not found
in the tld.conf file. Default: whois.crsnic.net
htmlfile filepath
An HTML file to use for queries and results. Default: internal
htmlfirst filepath
An HTML file to use for the initial page. This is the page displayed
when no query is submitted. Default: htmlfile or internal
htmlnotfound filepath
An HTML file to use for results that are not found. This is the page
displayed when a query returns a negative response. It may be used
to display a page indicating that a domain may be available for
registration. Default: htmlfile or internal
htmlfound filepath
An HTML file to use for results that are found. This is the page
displayed when a query returns a positive response. It may be used
to display a page indicating that a domain is not available for
registration. Default: htmlfile or internal
error_403 filepath
An HTML file to use for error 403 (Forbidden) results. Default:
internal
error_408 filepath
An HTML file to use for error 408 (Expired Session) results.
Default: internal
logfile filepath
This option enables logging and provides a path and filename for the
log. Log entries look like this:
2002-12-11 20:06:00 [12745] (192.168.0.30) whois.cgi: cgi domain: bw.org (1)
Items are, from left to right:
Date and time (UTC) of the log entry.
The process ID, enclosed in square brackets.
The IP address of the CGI client, enclosed in parentheses. This
item only appears in CGI mode.
The process name, or the log_name (see below), followed by a
colon.
The text of the log entry (in this case, "cgi domain: bw.org").
A log-level for this item. The log-level only appears if
log_level (see below) is provided in the config file.
Make sure the user-ID that owns the whois process has permission to
write the log file. This option is usually used when running in CGI
mode. In that case, you need to ensure that the user-ID of the web
server has permission to write to the log file.
log_level level
level can be a number from 1-9.
This item specifies what level of logging you want. Without this
item, events with log-levels higher than 1 will not be logged. For
most purposes, that will be fine. The higher the number, the more
events get logged.
log_name name
This option provides a specific name for log entries. This will be
used instead of the process-name in log entries.
database token
This option enables database operations. Currently the only token
allowed is mysql.
connect connect string
This option is required if database is used. It specifies the
connection parameters used to access the database. The format is:
database:host:port:user:pass
For example, if your database were named "whois" on the local
machine, on the standard port (3306) and the user was "web" and the
password was "foo.bar" you could use:
connect whois:localhost:3306:web:foo.bar
cache_table table_name
The name of the database table to use for the results cache. This
also serves to enable results caching.
cache_expire seconds
The number of seconds to hold a result before it is considered
stale. Stale results will be refreshed when requested again.
Default: 432000 seconds (five days).
control_table table_name
The table name to use for security control records. This is required
to enable security control features.
cookie_name cookie_name
The name to use for control cookies. This also serves to enable the
cookie control feature.
cookie_expire seconds
How many seconds a cookie is valid for. Default: 3600 seconds (one
hour).
ip_control number
The number of hits allowed from one IP address within the ip_expire
time. This also serves to enable the IP control feature.
ip_expire seconds
The number of seconds required between hits from one IP address
before that address is expired from the control table.
allow_referer list:of:domains
A list of valid hostnames to allow in the "Referer:" header. Use a
value of to turn off referer checking entirely. Default: The
hostname in the HTTP "Host:" header.
direct_link number
Allow links to a whois record without a cookie or a referer. This is
useful for providing a link in an email message. The number is how
many seconds apart to allow linked hits from the same IP address.
This requires control_table and ip_control.
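Two of the options above define concrete text formats: the log entry
written under logfile and the value of the connect option. The Python
sketch below shows one way to take each apart. It is an illustration
only, not part of the program: the function names are hypothetical, and
the regular expression is inferred from the single sample log entry
shown above, so it may not cover every entry the program writes.

```python
import re

LOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "  # date and time (UTC)
    r"\[(?P<pid>\d+)\] "                                      # process ID
    r"(?:\((?P<client>[^)]+)\) )?"                            # client IP, CGI mode only
    r"(?P<name>[^:]+): "                                      # process name or log_name
    r"(?P<text>.*?)"                                          # text of the entry
    r"(?: \((?P<level>\d)\))?$"                               # log-level, if configured
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if no match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def parse_connect(value):
    """Split 'database:host:port:user:pass' into named fields."""
    database, host, port, user, password = value.split(":", 4)
    return {
        "database": database,
        "host": host,
        "port": int(port),
        "user": user,
        "password": password,
    }


entry = parse_log_line(
    "2002-12-11 20:06:00 [12745] (192.168.0.30) whois.cgi: cgi domain: bw.org (1)"
)
settings = parse_connect("whois:localhost:3306:web:foo.bar")
```

Splitting the connect value at most four times keeps a password such as
"foo.bar" intact; whether the program itself accepts a colon inside the
password is not stated above.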
ENVIRONMENT
The environment variable "WHOIS_CONF" may be used to specify an
alternate path to the whois.conf file.
The environment variable "BW_WHOIS" is no longer supported.
SECURITY
This version of BW Whois contains features to help secure a
web-accessible installation from abuse.
Over the past few months many users of BW Whois have sustained attacks
from automated web clients (bad robots) that would rapidly request whois
results, presumably for illicit purposes. My own server was attacked and
queries from my server became disallowed by Verisign (née Network
Solutions).
When I first detected these attacks on my own site, I quickly
implemented a simple control that kept a flat-file list of IP addresses
and refused connections from an IP address after it was represented more
than a given number of times in that file.
A few weeks later the attack started up again from a number of IP
addresses too large to control in this manner. I was amazed, to say the
least. My server was blocked again by NSI. This was a coordinated attack
from a large number of hosts on a large number of disparate networks.
This time I buckled down and devised a set of controls that would
require a lot more sophistication to subvert. So far these controls have
been very successful on my server.
Three Types of Controls
There are three distinct types of controls. They can be used separately,
but personally, I use all three and I recommend you do the same.
Referer Controls
The referer controls are enabled by default and do not require that a
database be installed.
If a request is received that does not provide an HTTP "Referer:"
header, or provides a referer that does not match the hostname in the
"Host:" header, the request is denied and a 403 (Forbidden) result code
is returned.
So far the robots do not provide an HTTP "Referer:" header, but I expect
they will soon if people rely on this control without the others. It
would be a trivial addition to their code.
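The referer test can be sketched in Python as follows. This illustrates
the rule described above and is not the program's own code: the function
name referer_allowed is hypothetical, and the optional allowed list
stands in for the allow_referer option.

```python
from urllib.parse import urlsplit


def referer_allowed(referer, host_header, allowed=None):
    """Return True if the Referer: header names an acceptable host."""
    if not referer:
        return False                           # no Referer: header at all
    ref_host = urlsplit(referer).hostname
    if ref_host is None:
        return False                           # header is not a usable URL
    # Default: only the host named in the Host: header (port removed).
    valid = allowed if allowed else [host_header.split(":")[0]]
    return ref_host.lower() in {host.lower() for host in valid}
```

A request for which this returns False would receive the 403
(Forbidden) result.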
IP Controls
The IP control requires an SQL database. Currently only MySQL is
supported (by far the most popular database on the net). Support for
others will come later.
Whenever a request comes in from a web client, the database is queried
to see if that IP address has visited recently. If not found, the
request is allowed and a record is created.
If the IP address is found in the database, a counter is updated to
reflect how many hits have arrived from that address. If the count is
above the limit, the request is denied and a 403 (Forbidden) result code
is returned. If more than "ip_expire" seconds have passed since the last
hit from that IP address, the count is reset and the request is allowed.
This control will be difficult to subvert. The problem is that the count
must be high enough to permit hits from clients behind proxy servers,
such as AOL and Earthlink users.
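The counting logic can be sketched in Python as follows. This is an
illustration only: it uses an in-memory dictionary where the program
uses the control table in MySQL, and the names IpControl and allow are
hypothetical. The limit and expire arguments correspond to the
ip_control and ip_expire options.

```python
class IpControl:
    """In-memory stand-in for the IP control table (illustration only)."""

    def __init__(self, limit, expire):
        self.limit = limit        # hits allowed (ip_control)
        self.expire = expire      # quiet seconds before reset (ip_expire)
        self.records = {}         # ip -> [hit count, time of last hit]

    def allow(self, ip, now):
        """Record one hit at time `now`; return True if it is allowed."""
        record = self.records.get(ip)
        if record is None or now - record[1] > self.expire:
            # First visit, or quiet for longer than the expiry time.
            self.records[ip] = [1, now]
            return True
        record[0] += 1
        record[1] = now
        return record[0] <= self.limit


control = IpControl(limit=2, expire=60)
decisions = [control.allow("192.0.2.10", t) for t in (0, 10, 20, 100)]
```

In this run the first two hits are allowed, the third is denied, and the
fourth is allowed again because more than 60 seconds have passed since
the previous hit.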
Cookie Controls
The cookie controls also require an SQL database. Currently only MySQL
is supported (by far the most popular database on the net). Support for
others will come later.
When a first request comes in from a web client (e.g., a request for a
web form, but not for data), a unique cookie is generated with a 128-bit
pseudo-random hash, and given to the browser. The cookie is then stored
in the database with a timestamp showing when it was generated.
When a web client makes a request that requires a data response, a
registered cookie is required. If no cookie is provided a 403
(Forbidden) result code is returned. If an expired cookie is provided a
408 (Expired Session) result code is returned.
A new cookie is generated on each connection from each client.
In order to subvert this control, a robot would have to process and
store actual cookies. So far, they don't do that.
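The cookie life cycle can be sketched in Python as follows. This is an
illustration, not the program's own code: the cookies are kept in memory
rather than in MySQL, the names CookieControl, issue, and check are
hypothetical, and Python's secrets module is swapped in for whatever
hashing the program uses to produce its 128-bit values. The step of
issuing a fresh cookie on each connection is left out.

```python
import secrets


class CookieControl:
    """In-memory stand-in for the cookie control table (illustration only)."""

    def __init__(self, expire=3600):
        self.expire = expire      # seconds a cookie stays valid (cookie_expire)
        self.issued = {}          # cookie value -> time issued

    def issue(self, now):
        """Create and register a new 128-bit cookie value."""
        cookie = secrets.token_hex(16)    # 16 random bytes = 128 bits
        self.issued[cookie] = now
        return cookie

    def check(self, cookie, now):
        """Return the result code for a request that needs data."""
        issued = self.issued.get(cookie) if cookie else None
        if issued is None:
            return 403                    # missing or unregistered cookie
        if now - issued > self.expire:
            return 408                    # expired session
        return 200


cookies = CookieControl(expire=3600)
value = cookies.issue(now=0)
```

Here cookies.check(value, 10) returns 200, the same check at 4000
seconds returns 408, and a request with no cookie returns 403.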
Direct Links
Some users have requested a way to provide links to individual whois
records to their clients in email messages. A facility is provided to
allow this practice without significant compromise to the system.
When the direct_link option is set in the whois.conf file, links are
allowed with neither a cookie nor a referer, but not if that IP address
has been used within the number of seconds provided in the option line.
This has the same problem as the IP controls with proxy clients, but it
should work under most circumstances.
CAVEATS
Not all whois servers comply with RFC 954. Unfortunately that lack of
compliance is so inconsistent that the same commands can produce wildly
different results from server to server.
This client deals with the situation by sending fully-qualified requests
only to NSI's servers, and the simplest form of request to other
servers. This tactic is not entirely reliable.
SEE ALSO
RFC 954: NICNAME/WHOIS
http://www.ietf.org/rfc/rfc0954.txt
FILES
/etc/whois/tld.conf
An optional table of TLDs and associated whois servers.
/etc/whois/whois.conf
A configuration file for optional flags and other configurable
values.
/etc/whois/sd.conf
A configuration file for optional stripdisclaimer feature.
NOTA BENE
The format of the tld.conf file changed in version 2.7. Please be sure
your file has leading dots (e.g. .au) if you are using a current version
of BW Whois.
The tld.conf file for versions 3.0 and above includes servers for the
.COM, .NET, and .ORG domains. Older versions of the program did not
support tld.conf file lookups for these domains.
The default location for all the configuration files was changed to
/etc/whois/ in version 3.1.
The stripheader feature was changed to stripdisclaimer in version 3.1.
This feature now requires the sd.conf configuration file.
HISTORY
The whois command first appeared in 4.3BSD. The BW Whois command first
appeared 2 December 1999.
See the HISTORY file for more detail about the history of BW Whois.
AUTHOR
Bill Weinman
You can find the latest version of BW Whois at .
You can send email to Bill Weinman using the web form at
.
COPYRIGHT
Copyright 1999-2003 William E. Weinman
This program is free software. You may modify and distribute it under
the same terms as perl itself.