Robert P. Goldman said:
> Kevin, I think there is an *in principle* reason why this should not
> be possible.
>
> Parsing HTML is a context-free parsing problem (since the tags can
> embed and you have to have a stack to track the things you want to
> match), not a regular expression parsing problem (there's no fixed
> bound of memory you need to do this job).

I disagree. Unless there's more going on here than the original
question stated, Kevin doesn't sound like he's interested in the
structure of the HTML tags or whether they match up. He just wants to
create a list of 'approved' tags and make everything else go away. At
worst, he might need to walk through the (surviving) tags with a set
of flags for whether, e.g., <I> is turned on and append a </I> to the
document if the submitter forgot to close it.

-- 
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphilis. Good thing we have penicillin." - Matthew Alton

Geek Code 3.1: GCS d- s+: a- C++ UL++$ P+>+++ L+++>++++ E- W--(++) N+ o+
!K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI++++ D G e* h+ r++ y+
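
For illustration, here is a rough Python sketch of the filter-and-flags idea from the reply: keep tags on an 'approved' list, drop every other tag, and append closers for anything left open. The whitelist contents, the names APPROVED, VOID, TAG_RE and filter_tags, and the choice of Python are all assumptions made for this example; none of them come from the original thread. It does not validate nesting and leaves attributes untouched, so it is not a security sanitizer.

```python
import re

# Hypothetical whitelist; the thread does not name specific tags.
APPROVED = {"b", "i", "u", "p", "br", "a"}
VOID = {"br"}  # approved tags that never take a closing tag

# Matches an opening or closing tag: optional slash, tag name, then the rest.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def filter_tags(html):
    """Drop tags not in APPROVED and close any approved tags left open."""
    open_counts = {}  # tag name -> number of currently unclosed openings

    def keep_or_drop(match):
        closing, name = match.group(1), match.group(2).lower()
        if name not in APPROVED:
            return ""  # unapproved tag: remove the tag, keep surrounding text
        if name in VOID:
            return match.group(0)
        if closing:
            if open_counts.get(name, 0) == 0:
                return ""  # stray closing tag with nothing to close
            open_counts[name] -= 1
        else:
            open_counts[name] = open_counts.get(name, 0) + 1
        return match.group(0)

    cleaned = TAG_RE.sub(keep_or_drop, html)
    # Append closers for anything the submitter forgot to close,
    # most recently first-opened tag name first.
    for name, count in reversed(list(open_counts.items())):
        cleaned += f"</{name}>" * count
    return cleaned
```

Calling filter_tags on a submission strips the unapproved tags but keeps the text between them, and a forgotten closing tag is appended at the end of the document. Because only counters are kept rather than a stack, the result is not guaranteed to be properly nested, which matches the modest goal described in the reply.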