Why OpenOffice.org? Open Document Formats
Jem Berkes writes "In this current article about OpenOffice.org (also covered at Linux Today), I try to make a point about OpenOffice's commitment to open document formats and interchange as the strongest selling point - never mind cost. The OOo developers are putting a lot of effort into their XML format; will this pay off, and will users notice the significance of OpenDocument/OASIS document formats?" This can't be said enough: file formats are what determine whether and how easily data is portable, or whether the user is just stuck.
file size (Score:5, Interesting)
OO in law offices (Score:5, Interesting)
I'm learning, ever so slowly, more about Linux and Samba so I can complete the office transformation some day. It's hard to find patient teachers, and tech understanding comes slowly to some of us. It's worth the effort, though.
Re:The sad thing is... (Score:5, Interesting)
A non binary filetype has many more perks as well (Score:4, Interesting)
XML Formats rock! (Score:5, Interesting)
With a proprietary data format, I'd be lost now. With an XML format, I just opened the file in a text editor, checked what had happened since my last (regular) save, and copied and pasted the changes step by step into the old file until it crashed.
Then I went one step back, analyzed the problem, sent a bug report to the Scribus developers, and was a happy man.
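The "what changed since my last save" step is exactly what an ordinary text diff does, and it only works because the file is plain XML. Here is a minimal Python sketch using the standard `difflib` module. The file labels and the Scribus-style sample tags are made up for illustration. It also assumes the XML is written with roughly one element per line, so a file saved on a single line would need pretty-printing first.

```python
import difflib


def changed_lines(old_xml: str, new_xml: str) -> list[str]:
    """Return unified-diff lines showing what changed between two XML saves."""
    return list(
        difflib.unified_diff(
            old_xml.splitlines(),
            new_xml.splitlines(),
            fromfile="last_save.sla",  # hypothetical names, used only as diff labels
            tofile="crashed.sla",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustrative tags only, not a real Scribus document.
    old = '<PAGE NUM="0"/>\n<PAGEOBJECT XPOS="10"/>\n'
    new = '<PAGE NUM="0"/>\n<PAGEOBJECT XPOS="25"/>\n'
    for line in changed_lines(old, new):
        print(line)
```

The lines starting with `-` and `+` are then the candidates to copy back one at a time.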
Data Interchange with Open File Formats (Score:5, Interesting)
In another project, I use a similar technique to visualize raw data supplied as CSV (e.g. AdSense data). It saves me a bunch of work I would otherwise have to do manually in Excel.
Magic like this would not be possible with proprietary file formats. OOo's XML file format has made my life easier. And I love OOo for it :)
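The poster doesn't show their script, so this is only a guess at the general idea: read a CSV export (AdSense, say) and emit spreadsheet table markup that OOo can open. The minimal Python sketch below builds only the `table:table` element that would sit inside a spreadsheet's `content.xml`. It leaves out the rest of a complete document: the `mimetype` entry, `META-INF/manifest.xml`, and the `office:document-content` root. The namespace URIs are the OASIS OpenDocument ones, which is an assumption about the target format. The older OOo 1.x format used different URIs.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: OASIS OpenDocument 1.0 namespace URIs. The older OOo 1.x format
# used different URIs (e.g. http://openoffice.org/2000/table).
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

for prefix, uri in (("table", TABLE_NS), ("office", OFFICE_NS), ("text", TEXT_NS)):
    ET.register_namespace(prefix, uri)


def csv_to_table(csv_text: str, name: str = "Data") -> ET.Element:
    """Build a <table:table> element with one table-row per CSV record."""
    table = ET.Element(f"{{{TABLE_NS}}}table", {f"{{{TABLE_NS}}}name": name})
    for record in csv.reader(io.StringIO(csv_text)):
        row = ET.SubElement(table, f"{{{TABLE_NS}}}table-row")
        for value in record:
            cell = ET.SubElement(row, f"{{{TABLE_NS}}}table-cell")
            try:
                float(value)
            except ValueError:
                cell.set(f"{{{OFFICE_NS}}}value-type", "string")
            else:
                # Numeric cells also carry office:value so charts/formulas see a number.
                cell.set(f"{{{OFFICE_NS}}}value-type", "float")
                cell.set(f"{{{OFFICE_NS}}}value", value)
            ET.SubElement(cell, f"{{{TEXT_NS}}}p").text = value
    return table


if __name__ == "__main__":
    sample = "date,clicks\n2004-10-01,12\n2004-10-02,7\n"
    print(ET.tostring(csv_to_table(sample), encoding="unicode"))
```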
Re:Who cares if its XML? (Score:2, Interesting)
You are right; still, XML is a hard-hitting buzzword that has the attention of politicians. XML and open formats have been synonymous, at least in my country (Denmark), where open formats are something no politician speaks against (as opposed to open source).
Re:file size (Score:4, Interesting)
Re:Who cares if its XML? (Score:3, Interesting)
Might other word processors adopt the format?? (Score:4, Interesting)
I wonder how feasible it would be for other word processors, such as AbiWord, to use this format natively, or at least to appear to use it natively.
That is, after all, what happens in other areas: MS owns the market-leading proprietary format/protocol, and then the others rally around an open alternative.
BTW, I don't think that the XML encoding is important. What matters is that the format is legally open, that it is published with good documentation, and that there is nothing hidden in it to tie people to OOo.
Re:Too Bad OO Sucks So Bad (Score:2, Interesting)
Which is quite odd, because a huge number of people are still using Office 97. The bank I work for is 100% Office 97 (on NT4, not kidding), and at home I use Office 97. Actually, I strongly dislike anything beyond Office 97. I don't see any reason to upgrade... many people don't. So OpenOffice is probably what I need to install to get what I need without having to battle with Office XP (or whatever it's called these days).
Also note that many OEM machines don't come with Office. They have Word. All the rest is Works, and Works really is a bad, bad suite.
wouldn't that make data recovery harder? (Score:3, Interesting)
File format interchange (Score:2, Interesting)
I favor HTML over .doc (in any shape or form), but what I do like about OOo is its file conversions, which are still a little clunky but usable. I find the following especially useful:
Re:Why not just .pdf? (Score:1, Interesting)
Are you serious? Just to give you two very common examples (since this is what I need to do frequently):
Studying: group assignments in courses
Working: I run a small consulting/custom software development business. We have had cases where clients want to change some terms in our default contract; in those cases we send it as a
Re:Who cares if its XML? (Score:2, Interesting)
I would still fear working with binary formats [pipex.com] (not that the example I cite is properly documented, but the bits people have figured out give me nightmares).
Re:50 years from now (Score:4, Interesting)
Re:Formatting Woes (Score:5, Interesting)
It's by design. When Microsoft was pushing MS Word as the "industry standard" (back in the late '80s and early '90s), it came with dozens of import filters for just about any word processor format known to Man. So the MS salesperson could always point out that no one would lose any old data, because Word was quite capable of reading the format in question.
With later versions, the number of file formats MS Word supported shrank. Today it is reduced to old MS Word formats (and it handles none of them as well as other office suites do) and a number of well-documented formats (RTF, HTML, plain text). I remember when the company I was working for was converting from OS/2 to Windows NT 4.0 and the old Ami Pro documents were no longer readable. It was quite an effort to finally find an old copy of Winword 6.0a to import the Ami Pro files, because the later incarnations of MS Word weren't able to read them directly.
Re:How to speed OpenOffice file-format adoption (Score:5, Interesting)
PDF? Proprietary? Only if you mean Adobe's implementation. There are thousands of tools out there for generating and viewing PDF content in the open source world. Calling PDF proprietary simply because Adobe doesn't provide a viewer for all platforms would be like calling multicast DNS proprietary because at least initially, stock versions of Rendezvous wouldn't compile under Linux.
Based on that same definition, Postscript is proprietary. Oddly enough, Ghostscript is sometimes known to open encapsulated postscript files generated by Adobe Illustrator that Adobe's own Photoshop can't. When the open source software exceeds the quality and reliability of the reference implementation, it can no longer reasonably be described as proprietary, even if the reference implementation happens to be, IMHO.
That said, I would no more recommend people posting PDF or OOo docs than Word docs, for exactly the same reason. You have to download special software to view it. Even if Firefox had a plug-in in the shipping version, most people wouldn't have that version. For that matter, most people don't use Firefox.
The web is a powerful platform for deployment of information precisely because there are a very limited number of standard formats for contents, and a single standard environment for viewing them. It pisses me off to no end when I see a PDF file without an HTML version alongside it. The last thing I want to do is deal with a whole different environment to view content---whether it's Acrobat or a viewer plug-in makes no difference. Ditto for Word, OOo, etc. (As I always say, "Repeat after me: 'HTML is for Viewing, PDF is for Printing'.")
And I hope I -never- have to read something that some clueless person uploaded in Postscript again. Yes, there's software for every platform, but no, most people don't have it installed, and it's a pain in the ass to distill to PDF just to view something that's usually mostly plain text anyway. And before you ask, yes, sometimes I have been known to just read the Postscript file in vi.
Bottom line, if in doubt, HTML. If HTML won't work because the person posting it is too anal about formatting... HTML anyway, and post a nice, neat, formatted PDF for the three other people in the world who are as anal as they are. ;-)
</rant>
We now return you to your regularly scheduled discussion of open formats.
Re:Might other word processors adopt the format?? (Score:4, Interesting)
Re:Data Interchange with Open File Formats (Score:3, Interesting)
While it is true that MSO has some interoperability features, it might not have the ones I need. My accounting suite uses Postgres. So, great: there seems to be no way to make an invoice with Word or Excel from a single database entry. With OOo, I write my interoperability features myself, in any language I like, using any input format I want.
And try triggering MSO's interoperability features from a cron job (on the first day of every month, print the Finanzamt [German IRS] paperwork).
Those are the reasons I like my Linux, and those are the reasons I like open file formats.
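For the invoice case, one possible approach (a sketch, not the poster's actual code) is to keep an OOo document as a template containing placeholder markers and substitute values from the database row into its `content.xml`. The `{{field}}` marker syntax and the `fill_template` name are hypothetical. Fetching the row from Postgres is left out, so the function just takes a dict. One caveat: if a marker was typed with mixed formatting, OOo may split it across several `text:span` elements, and a plain string replace will miss it.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_template(template_path: str, output_path: str, fields: dict[str, str]) -> None:
    """Copy a zipped OOo document, replacing hypothetical {{key}} markers in content.xml."""
    with zipfile.ZipFile(template_path) as src, zipfile.ZipFile(output_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in fields.items():
                    # Escape &, < and > so database values cannot break the XML.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps each member's name, order and
            # compression method (e.g. an uncompressed "mimetype" entry stays first).
            dst.writestr(info, data)
```

To run it unattended, a crontab line such as `0 6 1 * * python3 make_invoice.py` (hypothetical script name) would fire at 06:00 on the first day of every month.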
Yes, open formats are required. (Score:3, Interesting)
This was a big reason they did NOT adopt OpenOffice, because in their corporate world (which is the opposite of real life), Microsoft Office was the guarantee that their documents would be accessible in 10 years or more. I disagreed and argued with them for the importance of open formats, but in the end they chose Microsoft Office, because in the corporate world, Microsoft is king.
I believe they made the wrong choice, and (IMO) the correct way of following FDA regulations, etc., is to use open formats for data/documents/etc. However, this has not yet been realized by the industry (or the FDA, I believe).
However, when the industry DOES realize it, open formats will be in a very nice spot compared to Microsoft Office and other closed document formats.
Re:wouldn't that make data recovery harder? (Score:4, Interesting)
Nope. Zip files can be recovered either entirely or in part, depending on the damage. A minor amount of corruption may not lead to any data loss, which isn't true if the original uncompressed data is damaged by the same amount.
Since the contents of the zip are text files, at worst they could be edited by hand to correct them. I can't think of a more stable document format that doesn't involve having multiple copies of the document.
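Each member of a zip archive is compressed and checksummed separately, so damage to one file (say `styles.xml`) need not take `content.xml` down with it. Here is a minimal Python sketch using the standard `zipfile` module. It reads each member independently and keeps whatever still decompresses and passes its CRC check. `salvage` is a hypothetical name. The sketch assumes the archive's central directory is intact. If it is not, `zipfile.ZipFile` itself raises `BadZipFile`, and a repair tool such as Info-ZIP's `zip -FF` would be the next thing to try.

```python
import zipfile
import zlib


def salvage(path: str) -> dict[str, bytes]:
    """Read every member of a zip archive that still decompresses and passes its CRC."""
    recovered: dict[str, bytes] = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            try:
                recovered[name] = archive.read(name)
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                # Members are independent streams, so one bad entry is simply skipped.
                print(f"skipping {name}: {exc}")
    return recovered
```

Any recovered XML member can then be inspected or hand-edited as plain text, as described above.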
Re:[OT] devolution of MS Office (Score:5, Interesting)
"WtF?!" you might ask :) A colleague tried switching to OpenOffice. We got into swapping a PowerPoint document back and forth, and at some point I started getting .ppt files that PowerPoint 97 could not open, claiming that the file had been created by a future version of PowerPoint. So something is broken in OpenOffice's "export to PowerPoint" that is emitting files that PowerPoint 97 cannot read.
Oh, the irony. Forced to upgrade to Office 2003 because someone in my organization tried OpenOffice :(
Crispin
Re:[OT] devolution of MS Office (Score:3, Interesting)
If there is already a macro language that works in a very similar way, it would not take much effort to fill in the gaps and change the syntax so it's VBA-compatible.
Re:Open formats are good (Score:4, Interesting)
Many people are telling me that OpenOffice could be faster and less demanding on memory, and these are areas where our own products shine. Have you never wanted OpenOffice to start a little quicker?
My personal feeling is that even open source products are not beyond the realm of criticism in areas where they fall down. Mind you, I am seeing that our little PlanMaker/OpenOffice comparison page [softmaker.de] is causing the OOo developers to improve their product. So, even if you never use TextMaker or PlanMaker, you profit from our little row.
Apart from that, I am still convinced that open document formats are the way to go if we all (united and apart) want to break Microsoft's monopoly.
Re:Who cares if its XML? (Score:3, Interesting)
Just look at the XML parts of Winamp 5. Colors are specified on a scale running from about +4,000 to -4,000 for each shade, instead of, say, 24-bit RGB, along with other required settings. Various parts of the skin may get variable names like "Glass Highlight", "Glass Substrate Highlight", "Glass Shadow Highlight", "Glassy Text Area", "Glassy Text Substrate Area" and "Glassy Text Shadow Substrate Highlight Area", all in the same skin, or buttons defined only as the "Hard Button Group" and the "Soft Button Group", with no way to figure out which is which except hacking in some value and running the program. Some skins with 80 or so color themes include a 150 KB+ XML file.
These examples come from skins with good, professional graphics, and even well-written code in other areas. That is, some people who are actually coding whole new functions into the Maki code still don't hesitate to write XML like this to accompany it. At this rate, we could use an obfuscated XML contest.
I'll start (Score:3, Interesting)
Don't lose graphics in imported Word documents.
When you export Word documents, they need to have file sizes that are similar to what they would have if you saved them with Word. I can't email someone back a document that has had a huge increase in file size. Word is bad enough with file sizes, but OO.o is much, much worse.
Don't crash so much. That's just annoying.
A grammar checker would be nice. Word and WordPerfect have had this for over a decade.
Faster load times would be great. Word loads in about one second on my computer; there is no excuse for OO.o taking more than ten seconds.
This is just a minor nit, but still... I use a text editor to edit text documents. OO.o shouldn't claim that its formatted word processor document is a text document.
The dialog box that asks if you are sure you want to export to a non-native file format because you might lose information should tell you what information you might lose. When I import a document, add a few sentences, then save it, I should not be seeing this nonsensical warning. In fairness, Word has this problem as well for some older formats, although not for Word 97 or later formats.
The most annoying point for me (since this one means I can't even use OO.o for documents that I distribute only in PDF form): support for using custom styles for section numbering.
Fix the last one of those and I will use OO.o again. Fix most of them and I will give it another try for regular use. Right now, though, OO.o is as useful to me as WordPerfect for the Atari ST is.
Re:XML Formats rock! (Score:3, Interesting)
Re:Stability (Score:2, Interesting)
Possibly, but they're the best damned brain-dead monkeys that money couldn't buy!
Besides, I doubt that OpenOffice is inherently unstable. I use it exclusively now, and apart from minor irritations (such as spacing inconsistencies when converting to/from MS Office), I've never run into any serious issues. I've used it for some very large projects (such as essays that I leave running in the taskbar for days at a time while I "research"), and I've also used it to take notes (daily).
If I did have any issues with OpenOffice, they would be with the automatic PDF generation. It's a wonderful tool, and every office app should have it, BUT... Under Windows, I use a different program to make PDFs (PDF995 [pdf995.com], a free virtual "printer" that makes PDFs), and I find it outputs much higher-quality PDFs that are SMALLER in size. (For example, my resume comes out at 30 KB with PDF995 versus 60 KB with OOo's PDF export.) Not that big of a deal, but when emailing resumes, it makes a difference.
However, since this only works in windows, and it's not "open source" (AFAIK), it's not a solution for everyone.
Re:Stability (Score:1, Interesting)
And believe it or not, saving to
OO is still in its infancy, having been around for a very short period of time. The Word
Story from the front lines (Score:4, Interesting)
Source: a poorly rendered GIF.
Equipment: one Linux machine, with OpenOffice.org installed.
I found the matching font, got the dots lined up, converted it to a traced object, found the right "burnt sienna" color... but that pukey-green was nowhere in any color selector I could find.
After nearly half an hour of hunting for an edit box that would let me enter an arbitrary hex triplet, I just saved the file and quit OOo. Then I unzipped the document, opened the style sheet in NEdit, and changed the hex triplets by hand. Save, exit, re-zip, and open it in OOo to see if the changes were correct. Voila!
I never, never ever would have been able to do that in a Microsoft product. I will grant that Microsoft may have made the hex triplet entry somewhat more obvious, but that doesn't mean I would have been able to find it any more easily. They absolutely control how the user accesses the document. OOo lets you access it any way you want.
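The unzip, edit and re-zip routine can also be scripted. Here is a minimal Python sketch that assumes the color appears as a literal `#rrggbb` string in `styles.xml` or in `content.xml`, where OOo keeps automatic styles. `replace_color` and the sample colors are hypothetical. The sketch rewrites the archive member by member and reuses each original `ZipInfo`. That keeps the member order and compression methods, so an uncompressed `mimetype` entry stays first, as OpenDocument packaging expects.

```python
import re
import zipfile


def replace_color(src_path: str, dst_path: str, old_hex: str, new_hex: str) -> int:
    """Copy a zipped OOo document, swapping one hex color; return replacements made."""
    pattern = re.compile(re.escape(old_hex), re.IGNORECASE)  # #9ab03c matches #9AB03C
    replaced = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename in ("styles.xml", "content.xml"):
                text, count = pattern.subn(new_hex, data.decode("utf-8"))
                data = text.encode("utf-8")
                replaced += count
            # Reusing the original ZipInfo keeps member order and compression method.
            dst.writestr(info, data)
    return replaced
```

A return value of 0 is a hint that the color is stored somewhere else, or in a different notation.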
Re:Righto Mate (Score:2, Interesting)
Several of us have maintained for quite some time [that] Microsoft wanted to patent their XML format so something such as OpenOffice can't write to MS Office. You can see the format, read the format, but not write the format. Frustrate the geeks. (don't think this happened by mistake in Redmond)
It also forces the business-end of decisions when it comes to migrating away from MS Office.
No one is going to move from one app to another overnight in a large environment, no matter how good or inexpensive the proposition. This means a one-way bridge: everyone who moves across can't come back, just as their material can't come back.
Why is Microsoft so touchy about MS Office? It represents [at least] 1/3 of their profit (not revenue, profit). They have to protect their cash cow someway until they can supplement it with another pass-the-hat release of Windows.
Pass-the-hat as in what they did with 98SE, ME as intermediate releases of 98 before XP was done cooking. Add a couple of changes, pass the hat around, those who buy anything new will pay and the revenue stream increases a little bit. This is why the mags are jumping on XP-to-Longhorn intermediate releases - more income until Longhorn is done.
Office 2003 XML (Score:1, Interesting)
<w:start w:val="1"/><w:nfc w:val="4"/><w:lvlText w:val="%2."/><w:lvlJc w:val="left"/><w:pPr><w:tabs><w:tab w:val="list" w:pos="1800"/></w:tabs>
The hell. It would take me days to decode what the tags mean! Here is a snippet from the same document (not the same part of the document) in OOo XML:
<text:span text:style-name="T1">- ANOVA model: For all subjects with a given level, say j, of the explanatory variable, the mean</text:span></text:p><text:p text:style-name="P7">outcome is j and the distribution of outcomes is Normal. The errors (deviations of actual</text:p><text:p text:style-name="P6"><text:span text:style-name="T1">values from predicted value) are independent and the spread (sig-squared) is the same for every j.</text:span>
Much easier to decode
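Part of why the OOo markup is easier to decode is that any stock XML parser can read it. Here is a minimal Python sketch that lists each paragraph's style name and plain text. The `paragraphs` helper is hypothetical. The sample is a cleaned-up, well-formed excerpt of the snippet above, since the original starts and ends mid-paragraph. The namespace URI is assumed to be the OOo 1.x one. Any URI would work here, because the sketch declares it itself.

```python
import xml.etree.ElementTree as ET

# Assumed OOo 1.x text namespace; OASIS OpenDocument uses
# urn:oasis:names:tc:opendocument:xmlns:text:1.0 instead.
TEXT_NS = "http://openoffice.org/2000/text"


def paragraphs(fragment: str) -> list[tuple[str, str]]:
    """Return (style-name, plain text) for every text:p in an XML fragment."""
    # Wrap the fragment so the "text:" prefix is declared and there is one root element.
    root = ET.fromstring(f'<root xmlns:text="{TEXT_NS}">{fragment}</root>')
    return [
        (p.get(f"{{{TEXT_NS}}}style-name", ""), "".join(p.itertext()))
        for p in root.iter(f"{{{TEXT_NS}}}p")
    ]


if __name__ == "__main__":
    sample = (
        '<text:p text:style-name="P7">outcome is j and the distribution'
        " of outcomes is Normal.</text:p>"
        '<text:p text:style-name="P6"><text:span text:style-name="T1">'
        "values from predicted value)</text:span></text:p>"
    )
    for style, text in paragraphs(sample):
        print(style, text)
```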
Re:Why bother with WYSIWYG? (Score:3, Interesting)
I just looked at the OpenOffice file format specification. The page from which it is downloaded states:
Presumably brought to us by the department of redundancy department. The specification itself is a PDF that was obviously created in OpenOffice. It is 571 pages long, and yet doesn't include a PDF table of contents, making it very hard to navigate (these are created automatically from any LaTeX document including the hyperref package). It contains things that look like hyperlinks. These probably aren't meant to be - they are XML namespaces - but OpenOffice has converted them to hyperlinks (and made them blue to highlight this) and then completely failed to make them clickable in the PDF. This is completely inconsistent. Either they are links, in which case clicking on them should do something, or they are not links, in which case they should not be randomly made blue and underlined.[1] A word processor does the same thing to words that a food processor does to food.
For those who haven't learnt the lesson of history (Score:2, Interesting)
A long time ago the bible was only available in Latin. Very, very few lay people could understand Latin and hence most had to use the services of a priest to read, and interpret, said tome for them. In other words a nice little earner for the priests who carved themselves a niche as "official middlemen to Grud" and who resisted all attempts to break up this monopoly. (Hmmm... methinks they were more like the *AA of their day)
Anyway I think it's simple. Proprietary data formats return us to the spirit of these times.
Let's face it, the only use for a computer is as a tool (admittedly a tremendously versatile and powerful tool). To all intents and purposes, the only thing that's really important is the result of using that tool, i.e. your data.
Saving your data in a non-open format is like putting your work at the mercy of a "digital priest". It's simply stupid. And on this note, having had numerous run-ins with data in crappy undocumented formats over the years, I have also learnt the lesson of the Unix masters first hand, i.e. wherever possible, use plain text (ASCII or EBCDIC).
Personally I will no longer use a tool that doesn't produce data in an open format. The tool itself can be licensed however the writers choose (I'm quite happy to pay for good tools) but if MY data isn't stored in an open format then, unless there really is no alternative and I simply must get the job done, I won't use the tool.
People who don't understand this argument leave themselves open to extortion and, quite simply, deserve everything they get.
Furthermore if data's held in an open format everyone can compete on a level playing field to produce the best tool to manipulate it.
So, to get back on topic, not only is OpenOffice.org a very capable office suite, but its data is held in a published open format and the authors are committed to keeping it that way. It's got my vote. It's on my desktop. It's staying there.