Data Storage

Object Storage and POSIX Should Merge

storagedude writes: Object storage's low cost and ease of use have made it all the rage, but a few additional features would make it a worthier competitor to POSIX-based file systems, writes Jeff Layton at Enterprise Storage Forum. Byte-level access, easier application portability and a few commands like open, close, read, write and lseek could make object storage a force to be reckoned with.

'Having an object storage system that allows byte-range access is very appealing,' writes Layton. 'It means that rewriting applications to access object storage is now an infinitely easier task. It can also mean that the amount of data touched when reading just a few bytes of a file is greatly reduced (by several orders of magnitude). Conceptually, the idea has great appeal. Because I'm not a file system developer I can't work out the details, but the end result could be something amazing.'
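Layton's byte-range point can be sketched in a few lines; the sizes below are illustrative, not from the article:

```python
def read_byte_range(obj: bytes, start: int, end: int) -> bytes:
    """Mimic an HTTP 'Range: bytes=start-end' request (end inclusive):
    only the requested slice of the object is returned."""
    return obj[start:end + 1]

blob = b"\x00" * (1 << 20)       # a 1 MiB stored object
chunk = read_byte_range(blob, 4096, 4099)
print(len(blob), len(chunk))     # 1048576 4 -- orders of magnitude less data moved
```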
  • by 14erCleaner ( 745600 ) <FourteenerCleaner@yahoo.com> on Friday August 21, 2015 @06:36PM (#50366775) Homepage Journal
    "If we support POSIX, then we'll support POSIX".
  • The article seems to go into solutions that let you access S3 as a FUSE module, but it fails to consider that you can go the other way. Gluster, Ceph, and probably others let you access data both as a filesystem and as an object store. It's a little more complex to set up and maintain than what this article seems to be envisioning, but it can offer a lot of flexibility. I suppose it's not as cheap to run these yourself as to use S3 in most cases.

    • by Lennie ( 16154 )

      And a lot less failure tolerant, because when you start adding all kinds of extra features you need more and more locking. And more locking makes it much more brittle.

      I think the author should just try using FUSE.

  • by Anonymous Coward on Friday August 21, 2015 @06:42PM (#50366805)

    Because I'm not a file system developer I can't work out the details, but the end result could be something amazing.'

    But you can put the check in my name.

  • by godrik ( 1287354 ) on Friday August 21, 2015 @06:56PM (#50366881)

    I do not understand this at the highest level. How is this an improvement over POSIX? My understanding is that object storage is essentially a dumbed-down file system where you have to read the entire object (file) at once. Or have to write the object (file) at once. Why does it improve anything? Is it just because the "address" can be a url? Just write that as a specific file system so that you can read/write to /dev/url/http/slashdot.org/ and be done with it.

    What am I missing?

    • by radish ( 98371 ) on Friday August 21, 2015 @07:04PM (#50366927) Homepage

      It allows the implementation to make a lot of assumptions & simplifications, which in turn makes things like S3 possible. There's no way you could practically offer POSIX style FS access in a cloud-like environment.

      • by PPH ( 736903 )

        no way you could practically offer POSIX style FS access in a cloud-like environment.

        NFS? [wikipedia.org]

      • by Salamander ( 33735 ) <`jeff' `at' `pl.atyp.us'> on Friday August 21, 2015 @08:55PM (#50367413) Homepage Journal

        That is very untrue. I'm on the GlusterFS team, and we've had users providing "POSIX style FS access in a cloud-like environment" for years. Amazon recently started doing the same with EFS, and there are others. It's sure not easy, and I wouldn't say any of the alternatives have all of the isolation or ease of use that they should, but it's certainly possible.

        • by radish ( 98371 )

          I never said it was impossible :) I'm glad EFS is coming along; it's certainly a welcome addition. But the fact that it's still in preview, many years after S3 became commonplace, and that it costs more than 10x as much as S3, tells me it's not easy.

          What I was trying to get across was that the reason for the popularity of the object storage model is that it benefits the storage provider - not the client.

          I apologize if I misrepresented your efforts!

      • HTTP has Range headers for both PUT and GET requests, so if only services like S3 would use those standard headers, POSIX-style FS access would be more than possible.
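        On the GET side, a ranged request really is just one standard header; a small sketch (the URL is made up for illustration, and no request is actually sent):

```python
import urllib.request

# Hypothetical object URL -- the request object is only constructed, not sent.
req = urllib.request.Request(
    "https://example-bucket.s3.amazonaws.com/data.bin",
    headers={"Range": "bytes=1024-2047"},  # ask for 1 KiB from the middle
)
print(req.get_header("Range"))  # a compliant server answers 206 Partial Content
```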
    • by Dutch Gun ( 899105 ) on Friday August 21, 2015 @09:07PM (#50367469)

      Think for a moment about how much network overhead it would require to:

      a) Open a specific file
      b) Seek to a specific location
      c) Read or write data down to the byte level an arbitrary number of times
      d) Close the file.

      Each one of those operations needs back and forth communication across the internet, with error-checking and encryption overhead. Now, remember that these operations need to be synchronized across multiple machines, probably in multiple data centers across the world as well.

      Compare that to atomic per-object operations, and how much more straightforward that is for network-intensive operations. In the end, it's probably much more efficient to simply send an entirely new file than trying to change a single byte inside a file.

      Besides, if you really need byte-level access to remote storage... we have that already. It's called a database.
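      The round-trip arithmetic above can be made concrete with a toy latency model (the 80 ms figure is an assumption, not a measurement):

```python
RTT_MS = 80  # assumed latency of one internet round trip

def posix_style_edit(num_writes: int) -> int:
    """open + seek + N writes + close, each costing one round trip."""
    return (1 + 1 + num_writes + 1) * RTT_MS

def object_style_edit() -> int:
    """One GET to fetch the whole object, one PUT to replace it."""
    return 2 * RTT_MS

print(posix_style_edit(10))  # 1040 ms of accumulated latency
print(object_style_edit())   # 160 ms, independent of how many bytes change
```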

      • Yeah, but that's all latency, none of it is throughput. Maybe I buy your argument if you're talking about changing a few bytes in a file of size 4K or so, but if your file is megabytes or gigabytes in size (like a bigass complex-valued double precision matrix), then I don't think you necessarily want to shuffle all of it across the wire and back.
        • I think my overall point is that I'd bet that cloud-object storage is designed around the premise of per-object atomic transactions at a very fundamental level. If you look at some of the most popular applications, like backup solutions, off-site storage, or data synchronization, this model makes a lot of sense. Essentially, the per-object model is prioritizing file transfer efficiency and design simplicity over flexibility. A POSIX interface would be making the opposite tradeoff. As such, I think it's

      • by godrik ( 1287354 )

        So what is the difference with this new fancy API?
        You will GET the entire file.
        Then you do whatever locally.
        And then PUT it back.

        How is this ANY different? There is no specification in POSIX that prevents you from doing that in the background while still exposing a POSIX API.

        Now if the argument is that, if you do something like "grep something s3://foo/bar; grep something_else s3://foo/bar" and that is inefficient because it requires the transfer of the file twice. First you could cache a hash to avoid redo
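        godrik's get-modify-put loop behind a POSIX-looking API can be sketched directly; a plain dict stands in for the object store, and the key name is invented:

```python
import io

class ObjectFile:
    """POSIX-looking open/seek/read/write/close over a whole-object
    GET-on-open / PUT-on-close store. The dict stands in for S3."""

    def __init__(self, store, key):
        self.store, self.key = store, key
        self.buf = io.BytesIO(store.get(key, b""))  # whole-object GET on open
        self.dirty = False

    def seek(self, pos):
        self.buf.seek(pos)

    def read(self, n=-1):
        return self.buf.read(n)

    def write(self, data):
        self.buf.write(data)
        self.dirty = True

    def close(self):
        if self.dirty:                      # whole-object PUT on close
            self.store[self.key] = self.buf.getvalue()

store = {"foo/bar": b"hello world"}
f = ObjectFile(store, "foo/bar")
f.seek(6)
f.write(b"slash")                           # edits happen locally
f.close()                                   # one PUT pushes the result back
print(store["foo/bar"])  # b'hello slash'
```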

      • by sjames ( 1099 )

        Fundamentally, all filesystems are object based; it's just that the object is called a "block".

      • by spitzak ( 4019 )

        The intention is to have the database update when the close() is done, not on every write().

        It is pretty obvious that the desired functionality could be done by FUSE, where a get() is done on open and a put() is done on close if write was ever called.

        I think modern-day applications that write only part of a file are nearly non-existent (and in fact a partial update that another program can see before your writes are finished is usually a bug, not a feature). So there is no need for any API other than put().

        Th

    • "I do not understand this at the highest level. How is this an improvement over POSIX? "

      I'll go one further. What does this have to do with POSIX [wikipedia.org]? One could add an object storage API to POSIX, but that wouldn't be "merging", and POSIX says nothing about the underlying filesystem/storage implementation. Why? Because, again, POSIX is an API, and the purpose of an API is to provide a layer of abstraction that hides the underlying details of the implementation.

    • by Salamander ( 33735 ) <`jeff' `at' `pl.atyp.us'> on Friday August 21, 2015 @09:26PM (#50367571) Homepage Journal

      It's because they throw out a lot of POSIX features/requirements - e.g. nested directories, rename, links, durability/consistency guarantees. In other areas, such as permissions, they have their own POSIX-incompatible alternatives. These shortcuts do make implementation easier, allowing a stronger focus on pure scalability. The theory is that the combined complexity of POSIX semantics and dealing with high scale (including issues such as performance and fault handling) is just too much, and it becomes an either/or situation. As a member of the GlusterFS team, I strongly disagree. My colleagues, including those on the Ceph team, probably do as well. The semantics of object stores like S3 have been designed to make their own developers' lives easier, and to hell with the users.

      Not all POSIX features are necessary. Some are outdated, poorly specified, or truly too cumbersome to live. On the other hand, the object-store feature set is *too* small. I've seen too many users start with an object store, then slowly reimplement much of what's missing themselves. The result is a horde of slow, buggy, incompatible implementations of functionality that should be natively provided by the underlying storage. That's a pretty lousy situation even before we start to talk about being able to share files/objects with any kind of sane semantics. You want to write a file on one machine, send a message to another machine, and be sure they'll read what you just wrote? Yeah, you can do that, but the techniques you'll have to use are the same ones that are already inside your distributed object store. Even if both their implementation and yours are done well, the duplication will be disastrous for both performance and fault handling. It would be *far* better to enhance object stores than to keep making those mistakes . . . or you could just deploy a distributed file system and use the appropriate subset of the functionality that's already built in.

      A semantically-rich object store like Ceph's RADOS can be a wonderful thing, but the dumbed-down kind is a disgrace.

      • by mcrbids ( 148650 )

        +1 If I had mod points you'd get one!

        There have been a *lot* of smart people in the history of Computer Science over a very *long* period of time, and the best of the best of their innovations we now call "classic solutions". Solutions like SQL, POSIX, etc.

        It has become popular to decide that such solutions are "antiquated" in the face of some new "great thing".

        Remember NoSQL? Well, yeah. There actually *is* a very small set of problems for relating data not best served by SQL. But even those cases often co

      • I still don't understand exactly what this object-storage "fad" is all about.
        Is it a limited special-purpose filesystem that hooks into a webserver instead of the normal VFS or other disk subsystems?
        And isn't the Range HTTP header already used to effectively "fseek/read" in a file?
      • Couldn't agree more. That is why I am building a new kind of object store from the ground up. See it in action at http://youtu.be/2uUvGMUyFhY [youtu.be]
    • On cloud services, storing all your files as "objects" is much cheaper than renting a filesystem to store them on. The gist of this article is, "if S3 allowed block-level access, it would be as cheap as S3 and as flexible as a filesystem."

      The most powerful sentence in the article is "I can't work out the details." I can't imagine any cloud-services engineer reading this article and thinking, "ooh, I'd never thought of adding block-level access!" I think block-level access is the most-requested feature since

  • If file systems allowed arbitrary attributes per folder/file, then file systems could serve as both CMS's and light-duty CRUD storage. Most intranet CMS content is just lists of documents and links, with a few notes. They could be queried via SQL or an SQL-like language[1], along with the usual file-oriented techniques.

    In addition to the arbitrary attributes, a set of common attributes would be reserved, at least by convention:

    * title (file/folder name)
    * synopsis
    * content (file bytes)
    * type (type of content
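    The attribute-query idea could be approximated today with a metadata table; a sketch using SQLite, where the paths and attribute values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attrs (path TEXT, name TEXT, value TEXT)")
con.executemany("INSERT INTO attrs VALUES (?, ?, ?)", [
    ("/docs/plan.md",  "title",    "Q3 Plan"),
    ("/docs/plan.md",  "synopsis", "Quarterly roadmap"),
    ("/docs/memo.txt", "title",    "Offsite memo"),
])

# An SQL query over per-file attributes, as the comment envisions
hits = con.execute(
    "SELECT path FROM attrs WHERE name = 'title' AND value LIKE 'Q3%'"
).fetchall()
print(hits)  # [('/docs/plan.md',)]
```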

    • If file systems allowed arbitrary attributes per folder/file

      Mac OS was doing this back in 1984.

    • You mean like extended attributes, which have been around for decades.

      • by Tablizer ( 95088 )

        They seem implemented with arbitrary and inconsistent limits or usage steps per OS or file-system version. This typically makes any product that uses them married to a platform. It will probably require a "blockbuster" product to make anyone care enough to clean them up.

    • Working on it. And unlike the "storagedude", I do have the expertise to implement it. I have it about half implemented so far. I have created an object store where every object can have lots of attributes or tags attached to them. Unlike extended attributes, you can actually find things based on them quickly. For example, I can create a container and put 100 million of my data objects in it (photos, mp3s, software, documents, etc.) and find anything and/or everything in just a few seconds. If I had 10 milli
      • That all works fine and dandy if you "tag" all your data properly in the first place.
        Now you have two data stores to manage: the actual filesystem and the database of metadata.
        • That might be how others have implemented an object store, but not how I am doing it. There are not two separate systems to manage in my case. I don't use a file system to store the unstructured data and store all the metadata in a database and thus have two different systems to try and keep in sync. Instead, I built a new system from the ground up (actually from the block pool or disk partition up). It stores structured data and unstructured data natively in a very unique fashion.
  • by PPH ( 736903 )

    Byte-level access, easier application portability and a few commands like open, close, read, write and lseek could make object storage a force to be reckoned with.

    Got all of that already. Perhaps not well defined by the POSIX standard. But only because certain implementers whined and cried that they would be cut out of the party if they had to support real O/S standards.

    But this isn't 'object storage' (unless all your objects are bytes). Object storage is an extension of higher level record access that VMS and other (mainframe) systems have had for years (decades). But now combined with object method storage. Starting to sound like RPC (server run) or write once, ru

  • You had an idea. Just implement it. If it is of any value, people will pick it up and you will get famous (and perhaps rich, if you can leverage that).
  • by QuietLagoon ( 813062 ) on Friday August 21, 2015 @08:27PM (#50367303)
    There I fixed the title for you.

    Aside from trying to leverage the huge portability of POSIX by using its name, what exactly would the benefit be if the merger occurred?

    Why not standardize and implement Object Store across many different operating systems (working code would be required for each OS), and then submit Object Store to be a part of the POSIX standard?

  • I always was under the impression that POSIX has something close to byte access with lseek().

    • by Trepidity ( 597 )

      Yes, I think it's saying that object storage should get byte-range access, not that POSIX should; POSIX, as well as basically any local filesystem API, already does.

      A lot of object-storage systems do already have byte-range access, though, implemented via HTTP range requests. They're not nice seekable streams, but if the specific functionality you want is to retrieve a range of bytes from a file, that's already here.

      • They're not nice seekable streams

        Many network file systems don't have seekable streams at the protocol level; it makes more sense to transmit range requests and keep the "stream position" and the rest of the stream abstraction on the client.
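        The client-side "stream position" described here can be sketched as a wrapper that turns read() calls into range requests; the fetch_range callback stands in for a real HTTP Range fetch:

```python
class RangeStream:
    """Read-only seekable stream whose position lives on the client;
    each read() becomes one (start, end_inclusive) range request."""

    def __init__(self, fetch_range, size):
        self.fetch_range = fetch_range
        self.size = size
        self.pos = 0

    def seek(self, pos):
        self.pos = min(pos, self.size)

    def read(self, n):
        end = min(self.pos + n, self.size) - 1
        if end < self.pos:
            return b""
        data = self.fetch_range(self.pos, end)
        self.pos = end + 1
        return data

blob = b"0123456789abcdef"
s = RangeStream(lambda a, b: blob[a:b + 1], len(blob))
s.seek(10)
print(s.read(4))  # b'abcd'
```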

  • AWS offers Object Storage for its scalability. Cloud file services sit on top of that and only accept "complete" uploads.

    The only happy medium I know of is www.Bitcasa.com [bitcasa.com], which implements POSIX (most of it) atop S3 in the form of a virtual drive. Their Linux client is only for corporate users due to a lack of consumer-side focus, but their Windows & Mac clients offer the virtual drive.

    Ref: I work for Bitcasa

  • by Anonymous Coward

    "POST (dds an object using HTML forms — similar to PUT but using HTML)"

    What does that even mean? Evidently, the author meant "HTTP".

    In regards to "merging" obejct storage and POSIX, that's been done. That's what the Joyent people did with their Manta object storage: you operate on the objects using standard *nix tools. They've recently open-sourced it under a free and GPLv2-compatible license (MPLv2).

  • What really is needed is a hybrid file system that contains some of the aspects of object storage and some of the aspects of a POSIX file system. ... Add byte-level access in some fashion to object storage

    As in byte serving and the range header? https://en.wikipedia.org/wiki/... [wikipedia.org]

    or object storage functions to POSIX storage.

    You could read and write entire files easily in POSIX, last I checked. You know, as in Python "open(filename).read()".

  • by HockeyPuck ( 141947 ) on Saturday August 22, 2015 @07:56AM (#50369479)

    One of the great advantages that allows object storage to scale is that it's completely stateless. A single command has no dependency on the previous or next command. There's no modification of existing objects, and no "seek then write" commands either. This allows object storage to maintain one of the key tenets of cloud storage: the point is not to provide high availability of a given instance, but to guarantee that the "retry" or the "allocation of a new resource" always succeeds. For example, VMs can go down at any time, but there should never be an instance where you cannot create a new VM to replace the one that just died. While VMs can die at any time, the VM service (EC2, Nova) can never go down.

    With this crap like "seek", "open" then "read" that the author proposes, you now have commands that depend on each other and thus create state, which is something we want to avoid.
