Geek question: RTF to HTML - shadows of echoes of memories of songs — LiveJournal
j4
Geek question: RTF to HTML
Does anybody know of any good, preferably FREE (as in beer) software for converting RTF to HTML? We've been using r2h95 (which is shareware) but it doesn't work with Win2K, which we've just upgraded to. We need something that will run on Win2K or linux, preferably, though MacOS stuff could be considered.

Yes, I know we could roll our own, but it would be nice not to have to reinvent wheels which are already rolling along happily.

* * *

Whew. Just tried this one and while it does convert, the last line of the HTML it outputs is:


</font></B></font></font></font></B></font></font></Body></font></Body></Html>


*groan*
Comments
huskyteer From: huskyteer Date: June 22nd, 2004 01:27 am (UTC) (Link)
</font></B></font></font></font></B></font></font></Body></font></Body></Html>

Wow, it's almost as good as Dreamweaver!
wechsler From: wechsler Date: June 22nd, 2004 01:31 am (UTC) (Link)
Try running the output of that through Tidy: http://tidy.sourceforge.net/ ? ;)
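For anyone wanting to script that clean-up step, here is a minimal sketch in Python. It assumes the `tidy` command-line tool is installed and on the PATH; the function names and file names are made up for illustration, and only the `-q` (quiet) and `-o` (output file) options are used.

```python
import shutil
import subprocess


def tidy_command(src, dest):
    """Build a Tidy command line that cleans src and writes the result to dest."""
    # -q suppresses non-essential output; -o names the output file.
    return ["tidy", "-q", "-o", dest, src]


def run_tidy(src, dest):
    """Run Tidy if it is installed; return its exit status, or None if it is missing."""
    if shutil.which("tidy") is None:
        return None
    # Tidy exits with 0 for clean input, 1 for warnings and 2 for errors,
    # so a non-zero status is to be expected on messy converter output.
    return subprocess.run(tidy_command(src, dest)).returncode
```

Something like `run_tidy("converted.html", "clean.html")` would then sit at the end of whatever conversion pipeline is in use (both file names are hypothetical).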
j4 From: j4 Date: June 22nd, 2004 02:22 am (UTC) (Link)
Yeah, that helps, but it'd be nice to start with something a bit tidier...
rbarclay From: rbarclay Date: June 22nd, 2004 01:52 am (UTC) (Link)
You could always resort to OOo, even if it nearly redefines the meaning of BloatWare.
j4 From: j4 Date: June 22nd, 2004 02:25 am (UTC) (Link)
I assume this is a replacement for MSOffice? If so, I'm afraid it's no help -- we don't get any say in how the original document is created, it will come to us as something which Word can output, whether we like it or not. :-(
rbarclay From: rbarclay Date: June 22nd, 2004 02:55 am (UTC) (Link)
Yeah, it's an office suite (free as in speech). But I meant using it just as a converter, e.g. open the file, save it as something different.
j4 From: j4 Date: June 22nd, 2004 04:13 am (UTC) (Link)
Oh, I see. ... Is its save-as-HTML any better than Word's, then?
oldbloke From: oldbloke Date: June 22nd, 2004 06:20 am (UTC) (Link)
How could it be worse?
rbarclay From: rbarclay Date: June 22nd, 2004 07:41 am (UTC) (Link)
I've no idea.
crazyscot From: crazyscot Date: June 22nd, 2004 01:53 am (UTC) (Link)
My Debian box at work knows about unrtf, which sounds like it might do the job though it's no longer supported by the original author, who has gone down the shareware road - see http://home.comcast.net/~smithz/. Beware, though, that the darker depths of RTF are rumoured to be Microsoft-proprietary.
imc From: imc Date: June 22nd, 2004 01:56 am (UTC) (Link)
I've previously used GNU UnRTF with a certain amount of success to read RTF files. I don't know much about its HTML output because I usually use the plain text filter. (It has LaTeX and PostScript filters too, but the HTML one claims to be the most developed.)

No idea what systems it runs on, but it certainly runs on Unix, and Mac is Unixy, right?
imc From: imc Date: June 22nd, 2004 02:02 am (UTC) (Link)
(OK, call me silly - I didn't notice you'd actually mentioned Linux in the question. You'll have no problems getting it to run on that.)

The latest version seems to be 0.19.1 and it's worth getting because 0.18.1 sometimes crashes and they claim to have fixed that (or at least some crashing).
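A rough sketch of driving UnRTF from a script, in Python. It assumes the `unrtf` binary is on the PATH and relies on it writing the converted document to standard output; the helper names and paths are hypothetical, and only the `--html`, `--text` and `--latex` output selectors are used.

```python
import shutil
import subprocess


def unrtf_command(rtf_path, fmt="html"):
    """Build an UnRTF command line for one of its output filters."""
    if fmt not in ("html", "text", "latex"):
        raise ValueError("unsupported output format: " + fmt)
    return ["unrtf", "--" + fmt, rtf_path]


def convert(rtf_path, out_path, fmt="html"):
    """Convert rtf_path and save the result to out_path; False if it did not work."""
    if shutil.which("unrtf") is None:
        return False
    # UnRTF prints the converted document on stdout, so redirect it to a file.
    with open(out_path, "wb") as out:
        result = subprocess.run(unrtf_command(rtf_path, fmt), stdout=out)
    return result.returncode == 0
```

Whether the HTML it produces is tidy enough for heavily formatted documents is a separate question that this does not answer.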
chrisvenus From: chrisvenus Date: June 22nd, 2004 03:05 am (UTC) (Link)
Anything that writes HTML with two body tags is not to be trusted... Unfortunately I have no idea what would be best to convert. I'd probably go a similar route to somebody else's suggestion and open it in Word and save it as non-bloaty HTML. Probably doesn't produce great HTML either but at least it wouldn't start wrapping body tags in font tags.... Eww! :)

And not helpful I know. I just needed to briefly release my anger at those body tags. :)
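A quick way to screen converter output for this kind of damage is to count the body tags and look for closing tags that never had an opener. The sketch below uses Python's standard `html.parser`; the class and function names are made up for illustration, and it is a rough sanity check rather than a validator.

```python
from html.parser import HTMLParser


class TagAudit(HTMLParser):
    """Count <body> openers and collect closing tags that have no matching opener."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # openers seen so far that are still unclosed
        self.body_opens = 0      # number of <body> start tags
        self.stray_closes = []   # closing tags with nothing left to close

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.body_opens += 1
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pair the close with the most recent unclosed opener of the same name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                return
        self.stray_closes.append(tag)


def audit(html):
    """Return (number of <body> openers, list of stray closing tags)."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return parser.body_opens, parser.stray_closes
```

Anything other than one body opener and an empty list of stray closes suggests the file needs cleaning up before it goes anywhere near a browser.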
j4 From: j4 Date: June 22nd, 2004 04:14 am (UTC) (Link)
Have you seen Word's save-to-HTML? Nested body/font tags would be the least of your worries!
chrisvenus From: chrisvenus Date: June 22nd, 2004 04:48 am (UTC) (Link)
It's not too bad. At least it is valid. And with Office 2000 you can get an HTML filter thingy from Microsoft that will allow you to save without the Office metainfo in there, and this comes built in to later versions (I think). I'd much rather have bloated but valid HTML because that is easier to filter. If something is giving me two body tags, something that really shouldn't even be allowed, I wouldn't trust it to give me any kind of markup that would be properly interpreted by a browser. On the other hand this is almost certainly a personal preference thing, and I would agree that loading into Word and saving is not the best option here anyway, so it's something of a moot point.
j4 From: j4 Date: June 22nd, 2004 04:58 am (UTC) (Link)
Agreed valid HTML is better than invalid, but Word's HTML has so much extra crap in it that "filtering it" involves basically rewriting the HTML from scratch. Not convinced that's any better than rolling your own RTF-to-HTML converter in the first place!
oldbloke From: oldbloke Date: June 22nd, 2004 06:28 am (UTC) (Link)
I just tried opening an rtf (created in Word) in StarOffice6 and saving it as html.
It puts more in than I'd like (blank lines replaced by p blocks with a style attribute), but fairly sensible other than that.
It does put some meta stuff in the top so you know it went through Star Office.
otoh, the original rtf was really a notepad file with zero interesting content, so i dunno how SO6 would get on with something more complex.
Can you run SO on your platform?
j4 From: j4 Date: June 22nd, 2004 06:59 am (UTC) (Link)
Dunno if we can run SO -- I don't have a linux box, only the boys have those. 8-) Will ask 'em.

"Something more complex" is the problem -- we get lots of stuff with weird-ass formatting which we're expected to preserve and/or turn into something useful. Lots of styles, headers, tables, lions, tigers, bears -- oh my! -- bells, whistles, Old Uncle Tom Cobbleigh and all.

Which I really should get back to. :-(
burkesworks From: burkesworks Date: June 22nd, 2004 09:54 am (UTC) (Link)
Pretty sure you can run StarOffice on Win2k.... got a copy lying around doing nothing that you can have.
oldbloke From: oldbloke Date: June 23rd, 2004 01:06 am (UTC) (Link)
If you can't run SO, you should be able to run its kissin' cousin OpenOffice - they cover almost all platforms between them.