Vendor Neutral File Formats?
timmyv asks: "I have recently been tasked with developing a corporate-wide policy that will standardize all employee-created documents on vendor-neutral file formats. OASIS is good in theory, but I haven't been able to locate enough concrete examples of policies or implementation schemes that work at a corporate level. Does anyone work at a company where documents can only be saved as RTF, HTML, etc., or have any experience with this type of problem?"
RTF (Score:2, Informative)
I work in a nice company (Score:3, Insightful)
I've already tried to encourage the adoption of hassle-free formats (RTF, HTML, TXT, whatever), but they never catch on.
It seems that people simply don't get it.
Unfortunately.
OpenOffice (Score:3, Informative)
Re:OpenOffice (Score:2, Interesting)
what are you talking about? (Score:1, Informative)
It is restricted by patents, see..
http://news.com.com/Microsoft+seeks+XML-re
Re:OpenOffice (Score:4, Insightful)
Re:OpenOffice (Score:1, Informative)
So... the format is pretty much vendor neutral.
Cheers,
Daniel Carrera.
OpenOffice.org volunteer.
OASIS Open Document vendor independent (Score:2, Interesting)
Like HTML, which surprised people in the 1990s, the OASIS OpenOffice.org file format is indeed vendor independent, though it is now called Open Document [oasis-open.org]. Anyone can use it or develop tools for it without restriction. Even Microsoft is part of the team at OASIS, at least on paper [zdnet.com.au]. And even if MS doesn't get out of the way, interesting things will happen with Open Document.
So far OASIS Open Document is being used by at least the following:
Wrong question for the task. (Score:3, Interesting)
Sorry, but looking at that statement, it seems to me that you are asking the wrong questions. Rather than getting concerned about formats and standards organizations, you should realize that to replace certain formats you will need to improve open source projects that currently have no funding for that development. If management says "no" to this, then congratulations, you don't actually have to do this research. Nothing's quite as useless as an unfunded mandate.
Sadly, I'm not sure if this post is meant to be funny.
Nonsense. (Score:2)
For all applications there are formats that are industry standards and unencumbered by patents (as far as it is possible to ensure this in certain litigious countries).
The knee-jerk reaction "boooh! Open Source software is not ready" should only be used when Open Source is actually a necessary part of a requested solution.
Not sure what the question is limited to (Score:5, Insightful)
There could be a huge number of different files you need. CAD files, images, PowerPoint presentations, and complex spreadsheets will all mess up any format you can come up with (e.g. HTML). How would you even edit some of these things?
Even OpenOffice formats are not vendor neutral; only one product out there really uses them.
Re:Not sure what the question is limited to (Score:3, Informative)
Also, don't forget that it may be made an ISO standard [slashdot.org].
Re:Not sure what the question is limited to (Score:2)
Re:OOo file format is open though (Score:2)
Re:OOo file format is open though (Score:3, Insightful)
After that, you can use your favorite XML widget, such as the XML::Parser [cpan.org] Perl module, to turn it into HTML or other things of your choosing.
Or create an XSLT file and use something like Xalan [apache.org] to format it on the fly.
Gotta love OOo and those open formats!
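For anyone who would rather not use Perl, here is a minimal Python sketch of the approach described above: pull the XML out of the file, then turn it into HTML. It assumes an OpenDocument-style file, i.e. a zip archive whose content.xml uses the OASIS text namespace. Older OpenOffice.org 1.x files use different namespace URIs as far as I know, so TEXT_NS would need adjusting for those. The names odf_to_html and make_sample are made up for illustration, and every heading is flattened to <h1>, which a real converter would not do.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# Namespace of text elements in an OpenDocument content.xml (assumed format).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_to_html(data: bytes) -> str:
    """Turn the headings and paragraphs of an OpenDocument file into bare HTML."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    parts = []
    for elem in root.iter():
        text = escape("".join(elem.itertext()))
        if elem.tag == f"{{{TEXT_NS}}}h":
            parts.append(f"<h1>{text}</h1>")  # simplification: ignores outline level
        elif elem.tag == f"{{{TEXT_NS}}}p":
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


def make_sample() -> bytes:
    """Build a tiny in-memory stand-in for a real document (hypothetical content)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:h>Policy</text:h>"
        "<text:p>Use open formats &amp; tools.</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    return buffer.getvalue()


print(odf_to_html(make_sample()))
```

To run it on a real file, read the bytes with open(path, "rb").read() and pass them to odf_to_html instead of using make_sample.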
What's the true question? (Score:2)
There could be a huge number of different files you need. CAD files, images, ...
Before starting, try to determine what the true question is. Were you asked to choose something that is truly vendor neutral, or were you asked to choose corporate standards that will interoperate with your customers and suppliers? The first question is *very* difficult to answer; the second one is easily solved (albeit in a non-Slashdot friendly manner).
I will assume the latter question is the true question, and continue
PDF (Score:3, Interesting)
but with PDF printers (files are printed to PDFs) for Linux [sourceforge.net] and Windows [sourceforge.net] (I assume Mac has it built in), it's a good option for creating documents that'll be displayed everywhere in the same manner.
Re:PDF (Score:3, Informative)
The only issue with PDF is the tendency to be one-way. But there are programs out there designed to convert PDF documents to other formats.
Re:PDF (Score:3, Informative)
There's also -
pdf2txt@adobe.com
pdf2html@adobe.com
Re:PDF (Score:2, Informative)
My company is standardized (at least for production work) on PDF format, which everything can make. The problem is getting things back out or editing such documents...
It seems that the only truly accurate interpreter is Adobe's Acrobat software, but it 'just works' for the final output. Converting it to anything else usable doesn't seem to work very well or reliably.
Editing these things is a bit of a pain, but it can be done, and we do for a chunk of the production
Re:PDF (Score:2)
Hmmm. (Score:1)
Re:Hmmm. (Score:3, Insightful)
Re:Hmmm. (Score:2)
Make it mauve - I hear it's faster.
Re:Hmmm. (Score:2)
Re:Hmmm. (Score:1)
Re:Hmmm. (Score:4, Insightful)
Re:Hmmm. (Score:2)
I work at a company that regularly consumes vendor data - We're plagued by a certain unnamed corporate entity's lack of technical knowledge and insistence upon using XML. I don't understand what it is about that format that draws additional users, but it drives me fsckin nuts.
Re:Hmmm. (Score:2)
But I do agree that there's too much hype around XML.
postscript/PDF and XML? (Score:4, Interesting)
XML is a good start, because it's easy for a new app (the fictional YCircuit) to add support for the format, but you are still stuck unable to print it if you don't have the skills to write a conversion script and no one else has written it for you.
Why not combine the two? XML embedded in a standard PDF file would allow any application with support for the creator's XML tagset to import the file, and at the very least those without any similar application could view and print the file.
SVG instead (Score:1)
XML embedded in a standard PDF file would allow any application with support for the creator's XML tagset to import the file, and at the very least those without any similar application could view and print the file.
For a more pure XML solution, it'd be better to embed domain-specific XML data in an SVG document, which Adobe's SVG viewer [adobe.com] can display and print. In fact, it might even be possible to XSLT the XML into SVG.
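A rough sketch of the "turn the XML into SVG" half of that idea, done in plain Python rather than XSLT. The <chart>/<bar> input format is entirely hypothetical, as is the function name bars_to_svg; the only real-world constant is the SVG namespace URI. It does not embed the source data in the SVG, it only draws it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bars_to_svg(xml_text: str, bar_width: int = 20, gap: int = 5) -> str:
    """Draw one <rect> per <bar value="..."> element of a made-up chart format."""
    source = ET.fromstring(xml_text)
    values = [int(bar.get("value")) for bar in source.findall("bar")]
    height = max(values, default=0)
    width = len(values) * (bar_width + gap)

    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": str(width), "height": str(height)})
    for index, value in enumerate(values):
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(index * (bar_width + gap)),
            "y": str(height - value),  # SVG's y axis points down
            "width": str(bar_width),
            "height": str(value),
        })
    return ET.tostring(svg, encoding="unicode")


sample = '<chart><bar label="a" value="10"/><bar label="b" value="30"/></chart>'
print(bars_to_svg(sample))
```

An XSLT stylesheet would express the same mapping declaratively; the Python version is just easier to run without extra tools.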
PDF and the Things That Turn Into It (Score:3, Interesting)
The point is that it doesn't matter which method I used to create the document; I can convert any of them into either of the other formats without losing information, and any of the three can be turned into HTML or PDF for display purposes.
You've probably got several different types of documents to mess with. Technical papers with plots, accounting spreadsheets, secretary generated memos, and presentations with pretty pictures so that management can understand what's going on. LaTeX alone could handle all of these situations. Create document types and environments to match the needs of each type of document. XML, being completely generic, could also handle any of the situations, but it's easier to type LaTeX markup than it is XML. There is at least one caveat: you have to be careful what type of images you feed TeX.
Heck, you could use Perl bindings to MS-Excel to snag data out of spreadsheets and export it into a format that some other chart making tool uses. You could use Excel itself to export as CSV files, which you could then use awk to convert into some other format.
Basically, it doesn't matter what tool each person uses, as long as what they export off their own workstation is in a standard format.
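As an illustration of the CSV step, here is a small Python sketch that takes what a spreadsheet might export and rewrites it as JSON. The column names and the function name csv_to_json are invented for the example. Python's csv module is used instead of awk mainly because it handles quoted fields that contain commas, which a naive split on "," would break.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical export; note the quoted value containing a comma.
exported = 'quarter,revenue\nQ1,100\nQ2,"1,250"\n'
print(csv_to_json(exported))
```

All values come out as strings; converting them to numbers would be a separate, format-specific step.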
Re:PDF and the Things That Turn Into It (Score:3, Insightful)
Re:PDF and the Things That Turn Into It (Score:2)
And this isn't a mystery?
Re:PDF and the Things That Turn Into It (Score:1, Insightful)
No. It's a matter of researching documentation.
Re:PDF and the Things That Turn Into It (Score:2, Interesting)
Re:PDF and the Things That Turn Into It (Score:2, Informative)
I don't know of converters that turn XML,SGML->HTML, but they probably exist.
The tool to convert from domain-specific XML to XHTML is called XSLT. For more info, Ask Google [google.com].
Re:PDF and the Things That Turn Into It (Score:2)
Vendor neutral is not always the answer.... (Score:4, Insightful)
Comfort level:
It's like having designers switch from Photoshop to The GIMP, or MS Word to OO Writer. Granted, the apps accomplish the same thing, but it's not the *same* program. People will resist the change because they know how to use the first program, and the reason for the change isn't a concern for them.
Dominance:
Going vendor neutral when the majority still use vendor-specific formats requires you to check whether your users rely on vendor-specific features that are not available in the neutral format. If those features aren't there, then what do you do? Write code to compensate for the missing feature, get plugins, or do nothing if there's nothing you can do? Are there tools that can do as good a job as the old tools in this neutral environment?
It would help more if you stated your case in more detail.
Re:Vendor neutral is not always the answer.... (Score:1)
the gimp isn't even close to paint shop pro's level yet
In what way, specifically?
Re:Vendor neutral is not always the answer.... (Score:1)
It is not possible to move from Photoshop to Gimp in many incredibly common situations. Assuming one would even want to.
Vendor Neutral? (Score:2)
Unless you have pretty carefully surveyed all of those people you really can't choose one file format over another.
In other words, you're asking the wrong question. Instead of trying to figure out what your employees can standardize on, you will first need to find out what the majority of your vendors have standardized on.
Of course you'll have problems. HTML or PDF are hor
Right motivation, wrong question... (Score:3, Insightful)
What matters is that the data you own is readily transformable into a Fully Open and documented format independent of your chosen platform; normally (but not necessarily) this will mean your native format is Fully Open and documented. This includes all data, styling, formatting, metadata and interrelationships. Basically you should be able to quickly jump ship, even if your vendor has been wiped off the earth or there are legal/technical issues preventing you from running the original platform, without loss or 'damage' of any information. There must be at least one other clear route to all your information, completely bypassing the original platform.
As an example
Similarly, prior to its release as open source software and even immediately after
There are grey areas such as databases, which have no common datafile format but do expose Fully Open interfaces such as ODBC or JDBC.
With this in mind I would argue that forcing everyone to save documents in 'basic' formats such as HTML and RTF is counterproductive: they lack wide support for features such as styling and precise page layout. Any format will do as long as you can readily, fully & demonstrably extract all your information, independently of the platform that created it.
Alex
"Vendor Neutral"???!!! (Score:5, Insightful)
HTML is only vendor neutral if you don't use any vendor-specific extensions. So you can't just say, "Everybody save your files as HTML". You also have to forbid anybody using apps (such as Word) that save to a non-standard HTML.
In theory, you can create an XML-based format that looks the same in Word, OpenOffice, FrameMaker, and any other XML-aware app. But doing so means designing a schema in extreme nit-picking detail, and writing a lot of transformations to get that XML in and out of all the apps that need to read or write it. It's a lot of work, and nobody does it unless they have a specific application that requires highly-structured information. Like if you have a huge set of technical documentation that you need to update a lot. (I was involved in just such a project -- and the politics of converting all those documents to XML cost me my job.) Or if you have invoices or similar business documents that need to go into or out of a web services app.
But for the big mass of unstructured documents, there just isn't a vendor-neutral solution, and nobody has any real incentive to create one. The solution remains the same: standardize on certain specific applications. Which boils down to using OpenOffice if you hate giving money to Bill and/or want a platform-neutral solution. Otherwise you standardize on Microsoft Office, because it's what everybody knows how to use.
Re:"Vendor Neutral"???!!! (Score:2, Informative)
That it is not.
RTF does contain, in theory, sufficient control words to describe everything that Word 2000 can do, but it's hardly a direct translation and things get lost a lot. Furthermore, RTF contains a few control words that Microsoft didn't put there: such as \collapsed (added by NeXT to describe paragraphs that had been hidden by the user).
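To see which control words a given RTF file actually uses, something like the following Python sketch could help. It assumes control words are a backslash followed by lowercase letters and an optional numeric parameter, and it ignores embedded binary data. The KNOWN allowlist is a tiny hypothetical one, not the real list from the RTF specification, and unknown_control_words is just a name chosen for the example.

```python
import re

# A control word is a backslash plus lowercase letters and an optional
# numeric parameter; any other backslash sequence (\\, \{, \}, \') is a
# control symbol and is matched by the "." branch so it gets skipped.
TOKEN = re.compile(r"\\(?:([a-z]+)(-?\d+)?|.)", re.DOTALL)


def unknown_control_words(rtf: str, known: set[str]) -> list[str]:
    """List control words in the RTF text that are not in the allowlist."""
    found = {m.group(1) for m in TOKEN.finditer(rtf) if m.group(1)}
    return sorted(found - known)


KNOWN = {"rtf", "ansi", "b", "par"}  # hypothetical allowlist, far from complete
sample = r"{\rtf1\ansi{\b Hello}\par\collapsed text \\par}"
print(unknown_control_words(sample, KNOWN))  # ['collapsed']
```

The escaped backslash near the end of the sample is literal text, so the "par" after it is not counted as a control word.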
Re:"Vendor Neutral"???!!! (Score:2)
And although it's easier to find documentation for RTF than for Word native, the latter does exist. You
NEXTSTEP is now Mac OS X (Score:1)
I mean, if an unsuccessful platform is your best example of non-Microsoft development of RTF-based software
Unsuccessful my ass [apple.com]; learn why [slashdot.org].
Re:NEXTSTEP is now Mac OS X (Score:2)
Because OpenStep is now Cocoa (Score:1)
How many NextStep applications have migrated to OS X?
Depends on whether the developer is still around. Mac OS X implements the Mac OS Toolbox API as "Carbon" and the OpenStep API as "Cocoa". If the developer still has the source code and wants to reach thousands of Mac users, porting starts with a recompile. But if your developer has gone out of business, on the other hand...
Re:NEXTSTEP is now Mac OS X (Score:2)
RTF (Score:2)
> RTF does contain, in theory, sufficient control words to describe
> everything that Word 2000 can do, but it's hardly a direct translation and
> things get lost a lot.
What gets lost?
Examples please.
Easy (Score:1, Funny)
That is what i do... (Score:1)
You're asking the wrong questions (Score:5, Insightful)
Are they doing this to save money? to clamp down on the uppity workers? because the CEO got emailed an AppleWorks attachment with no file extension from some Mac user? to avoid the risks of single vendor lock-in?
Many document formats can be converted back and forth with some degree of effectiveness. Yes, if you open a document from WordPerfect in Microsoft Office, the word spacing may change a little. However, this also happens if you move from a machine connected to an HP4000 printer to one connected to an HP2100. Some formats also offer different feature sets; saving from DOC to RTF will cause (as an example) tables to shift about a bit. TXT format is readable by almost anything, but the formatting capabilities are nigh nonexistent. (Ooh! Tabs!) While WordPerfect and Word will each open the other's documents, they aren't so good for saving in open formats
What formats are currently used? Why are they needed? Will everyone need to be able to write to them, or are pay-writer/free-reader combos acceptable? And, *ARE* there any "vendor neutral" formats out there? (For desktop publishing, the real answer is "no". Publisher is a joke, and while Adobe and Quark maintain some import compatibilities, the formats AREN'T neutral.)
For myself, working in a small department, "Let a thousand flowers bloom" is just fine. I accept that I will occasionally get forwarded an e-mail with an attachment that the user can't figure out how to open-- usually Mac/PC file extension name issues solved easily by renaming. Once in a blue moon I have to explain to someone that no, not everyone has FooBarBaz market research organizer, since for most the $800 license cost for it would be more beneficially used for other things, and they will probably need to examine such data files once in their career, if that.
Perhaps a list of universally accepted formats-- that is, formats that must be used for wide distribution-- would be more appropriate, after considering what features are needed in said formats. After all, Photoshop .PSD documents are harder to view outside Photoshop, but far more useful for subtle graphics work than JPEGs.
I suspect you are being sent out on a project inadequately considered. Depending on the pointy-hairyness of the person who assigned it to you, you may find some substantial benefit to reconsidering the ground assumptions.
Re:You're asking the wrong questions (Score:1)
I know it's illegal, but there was a torrent for the latest FooBarBaz on SuprNova just before it got shot down... you may be able to still find it out there.
Re:You're asking the wrong questions (Score:2, Informative)
The types of files we are talking about are essentially textual documents, spreadsheets, databases, etc. 2 of the 3 OOo provides, but I have a pretty good idea of how our user base would resp
Re:You're asking the wrong questions (Score:3, Informative)
For free access to documents by citizens, PDF is pretty good. There are viewers for most platforms (I don't know about BSD or Solaris, but Mac/PC/Linux all are OK), and there are non-Acrobat print-to-PDF knockoffs at economical prices. Requiring PDF publication of all publicly available printed documents in, say, PDFv1.2, PDFv1.3 or PDFv1.4 would be a useful and not overly onerous step. (Adding forms-completion ability to the PDF requirement might well be too much.)
Re:You're asking the wrong questions (Score:2)
I realize a lot of people do not like PDF; but any other format is asking for grief from end-users.
A company I currently do a lot of work for is slowly migrating towards PDF, and each step along the way has been pretty smooth. It's easy enough for the users to understand they 'print to PDF' to make a presentation version of a document.
I don't believe intermediate documents (works in process) should be stored in open formats. Not enough open formats support enough features, you would simply end up with a half
Permanence (Score:2)
I guess the question is: how permanent is permanent? It's very hard to store data electronically long term and have it be accessible years later. How many computer techs today could even deal with a 9-track data tape (a state-of-the-art archival format 20 years ago)? While PCs can handle Bus and Tag data streams, the adapter card is $3k per. No one 30 years ago would have conceived of having individual users not connected in any meaningful way to an operations center.
I've done a lot of work tak
Re:You're asking the wrong questions (Score:2)
LaTeX (Score:2, Insightful)
Re:LaTeX (Score:2)
if you can't do that, it's not worth his time.
Re:LaTeX (Score:1)
Re:LaTeX (Score:2)
Got a link for "possible"?
Infer what? (Score:1)
True, but given an RTF using visual formatting, how can a program know in advance which font size was meant to be "heading level 1", which was meant to be "heading level 2", whether italics represent emphasis or the title of a work, etc?
Re:Infer what? (Score:2)
Number one: the office tells them. I.e., "use everything that's size 14 as Heading 1, use italics as italics, etc."
Number two: write a program to figure it out. This could be done in Office VB to apply and redefine headings for any given document.
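A sketch of what "number two" might look like, in Python rather than Office VB. It assumes the document has already been reduced to (font size, text) runs, which is the hard part and is not shown. The heuristic is an assumption of mine: treat the size carrying the most text as body text, then rank every larger size as heading level 1, 2, and so on. It does nothing about the italics question, since size alone cannot tell emphasis from a title.

```python
from collections import Counter


def infer_heading_levels(runs):
    """Return {font_size: heading_level} for sizes larger than the body text."""
    weight = Counter()
    for size, text in runs:
        weight[size] += len(text)  # weight each size by how much text uses it
    body_size = weight.most_common(1)[0][0]
    larger = sorted((size for size in weight if size > body_size), reverse=True)
    return {size: level for level, size in enumerate(larger, start=1)}


# Hypothetical runs extracted from a visually formatted document.
runs = [
    (18, "File Format Policy"),
    (12, "All documents leaving a workstation must use an approved format."),
    (14, "Scope"),
    (12, "This applies to text documents and spreadsheets."),
]
print(infer_heading_levels(runs))  # {18: 1, 14: 2}
```

Option one (the office simply declaring "size 14 is Heading 1") amounts to hard-coding that dictionary instead of inferring it.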
What is your point? (Score:2)
In such a situation why would they need to do such conversions?
Re:What is your point? (Score:3, Interesting)
This may result in dropping MS Office entirely -- or it may just result in changing the default "save as" settings for every install of Word, or the creation of an "archive and share" custom function that takes DOCs or WPSs or whatever and turns them into the new neutral format.
Bad Assignment (Score:3, Insightful)
I always try and use portable files (Score:4, Informative)
Well, for CAD, it's a screwed up world. The best/most portable format is probably IGES, except it's such a huge specification that nobody's IGES file is compatible with anybody else's. I'm an engineer and for myself I use TurboCAD 10 Professional at home. It reads/writes AutoCAD files and numerous other formats, and is somewhere in between AutoCAD and Pro/Engineer in terms of its capabilities. You'll have a tough time convincing any corporation to use TurboCAD though.
For text documents, HTML would be good, except MS products tend to produce the most screwed up HTML files I've ever seen. All I can recommend is to use PDF files for important and official documents because they are essentially immutable and tend to produce consistent hardcopies from any computer.
OpenOffice formats are nice, and if I were starting up a new business I would of course set up Linux workstations to use OO exclusively, and put a Windows machine down in the IT room so the IT staff could convert any troublesome documents that come through the email.
For Visio, there is no equivalent, other than exporting the Visio file as a DXF or maybe a WMF. Windows MetaFiles never seem to load right in other apps though, so that's something to think about. SVG files will probably be the future here if Dia starts using them.
The Shot Heard Round the World (Score:2, Informative)
There are no "StarOfficeisms" in the OASIS XML Open Document file format specification. Leastways not any we know of. By December of 2004, when the OASIS TC submitted the XML file format specification to ISO, all known references and anachronisms that might be called starisms were changed. Neutralizing changes were even made to such things as the file format extensions and MIME type registrations. We even changed the name from OASIS Open Office to OASIS/ISO Open Document.
Separating the file format fr