For the WWW client I wrote for VM, I chose:
f) If the file contains control characters (in the ranges 0x00-0x1F
and 0x80-0x9F, other than CR, LF, FF, and HT), it must be binary.
Otherwise, if the upper-cased file suffix is HTM or HTML, or if the
file contains (after upper-casing) tags such as <HTML>, <HEAD>,
<TITLE>, or <PRE>, it must be HTML. Otherwise it must be plaintext.
I also use this same rule to handle documents retrieved via FTP,
Gopher, etc.
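
For illustration, here is a minimal sketch of rule (f) in Python. It is not the code from the VM client. The names guess_type and is_control are made up for this example, the tag list holds only the four tags named above (the rule says "such as", so a real list may be longer), and the bytes are assumed to be in ASCII or ISO 8859-1 by the time they are inspected.

```python
import os

# Control bytes that are still allowed in text: HT, LF, FF, CR.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}

# Tags whose presence marks a document as HTML (illustrative list only).
HTML_TAGS = (b"<HTML>", b"<HEAD>", b"<TITLE>", b"<PRE>")


def is_control(byte):
    """True for bytes in 0x00-0x1F or 0x80-0x9F, except the allowed four."""
    in_range = byte <= 0x1F or 0x80 <= byte <= 0x9F
    return in_range and byte not in ALLOWED_CONTROLS


def guess_type(data, filename=""):
    """Classify a document as 'binary', 'html', or 'plaintext'."""
    # Step 1: any disallowed control byte means binary.
    if any(is_control(b) for b in data):
        return "binary"

    # Step 2: an upper-cased suffix of HTM or HTML means HTML.
    suffix = os.path.splitext(filename)[1].lstrip(".").upper()
    if suffix in ("HTM", "HTML"):
        return "html"

    # Step 3: a known tag anywhere in the upper-cased content means HTML.
    upper = data.upper()
    if any(tag in upper for tag in HTML_TAGS):
        return "html"

    # Step 4: everything else is plain text.
    return "plaintext"
```

Because the control-character test runs first, a file named x.html that contains a NUL byte is still treated as binary, which matches the order of the tests in the rule.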
I claim this method follows the "rule of least surprise".
-david