> Is there a utility to strip away HTML tags?
If you can't find anything else, the following
Perl script (which I call 'unhtml') will work OK:
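#!/usr/bin/perl
# No $* needed (modern Perl no longer supports it): [^>] already
# matches newlines, so tags that span lines are removed anyway.
undef($/);          # slurp mode: each read returns a whole file
while (<>) {        # read each input file in turn
    s/<[^>]+>//g;   # remove <...>'s in the entire string
    print;          # print what's left
}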
Run it like this:
unhtml file.html >file.txt
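If you want the same thing from inside other Perl code, the substitution can be wrapped in a subroutine. This is a minimal sketch using only core Perl; the name strip_tags is just for illustration. It also shows that a tag split across lines is removed with no special multi-line mode, because [^>] matches a newline like any other character except '>'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_tags (illustrative name): the same s/<[^>]+>//g technique
# as the script, wrapped in a sub so other code can call it.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;    # [^>]+ also matches "\n", so multi-line tags go too
    return $html;
}

# A small page with a tag broken across lines.
my $page = "<html><body>\n<p>Hello,\n<b>world</b></p>\n</body></html>\n";
print strip_tags($page);
```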
It's not by any means perfect: a '>' inside a quoted
attribute value (as in <img alt="a > b">) ends the match
early and leaves the rest of the tag behind, and nothing
is done with entities (like &amp;), which pass through as-is.
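If those two shortcomings matter, a somewhat sturdier sketch treats quoted strings inside a tag as opaque and decodes a few common entities. The sub name unhtml, the tiny entity table (lt, gt, amp, quot, nbsp, with nbsp mapped to a plain space) and the UTF-8 output layer are assumptions made for this sketch. It still does not handle comments (<!-- ... -->), <script>/<style> contents or a stray '<' in the text, so a real HTML parser remains the safer choice for messy pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # numeric references may yield non-ASCII characters

# A tag is '<', then one or more quoted strings or non-quote, non-'>'
# characters, then '>'. Quoted attribute values are consumed whole,
# so a '>' inside one does not end the match early.
my $TAG = qr{<(?:"[^"]*"|'[^']*'|[^'">])+>};

# Assumed, deliberately tiny entity table (not the full HTML list).
my %ENTITY = (lt => '<', gt => '>', amp => '&', quot => '"', nbsp => ' ');

sub unhtml {
    my ($html) = @_;
    $html =~ s/$TAG//g;    # remove tags
    # Decode &#NN;, &#xNN; and named entities in ONE pass, so text such
    # as "&amp;lt;" becomes "&lt;" instead of being decoded twice.
    # Unknown named entities are left untouched.
    $html =~ s{&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));}{
          defined $1         ? chr($1)
        : defined $2         ? chr(hex $2)
        : exists $ENTITY{$3} ? $ENTITY{$3}
        :                      "&$3;"
    }ge;
    return $html;
}

print unhtml(qq{<img alt="a > b" src="x.gif">Fish &amp; chips &lt;3 &#65;\n});
# prints "Fish & chips <3 A"
```

To use it as a filter like the original script, replace the demo print line with `undef $/; print unhtml(scalar <>);`.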
Another option, especially if you want the text
formatted roughly the way a browser would display it,
is to use the Lynx browser in 'dump' mode:
% lynx -dump file.html >file.txt
Hope this helps.
-- 
John Labovitz
Technical Services Manager, Global Network Navigator <http://gnn.com/>
O'Reilly & Associates, Sebastopol, California, USA
(+1 707 829 0515)