Ask Slashdot: Is It Time To Replace File Systems? (substack.com) 209
DidgetMaster writes: Hard drive costs now hover around $20 per terabyte (TB). Drives bigger than 20TB are now available. Fast SSDs are more expensive, but the average user can now afford these in TB capacities as well. Yet, we are still using antiquated file systems that were designed decades ago when the biggest drives were much less than a single gigabyte (GB). Their oversized file records and slow directory traversal search algorithms make finding files on volumes that can hold more than 100 million files a nightmare. Rather than flexible tagging systems that could make searches quick and easy, they have things like "extended attributes" that are painfully slow to search on. Indexing services can be built on top of them, but these are not an integral part of the file system so they can be bypassed and become out of sync with the file system itself.
It is time to replace file systems with something better. A local object store that can effectively manage hundreds of millions of files and find things in seconds based on file type and/or tags attached is possible. File systems are usually free and come with your operating system, so there seems to be little incentive for someone to build a new system from scratch, but just like we needed the internet to come along and change everything we need a better data storage manager.
See Didgets for an example of what is possible. In a Substack article, Didgets developer Andy Lawrence argues his system solves many of the problems associated with the antiquated file systems still in use today. "With Didgets, each record is only 64 bytes which means a table with 200 million records is less than 13GB total, which is much more manageable," writes Lawrence. Didgets also has "a small field in its metadata record that tells whether the file is a photo or a document or a video or some other type," helping to dramatically speed up searches.
Do you think it's time to replace file systems with an alternative system, such as Didgets? Why or why not?
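For what it's worth, the 13GB figure quoted from Lawrence is at least arithmetically sound; a quick sanity check of the claim (64-byte records, 200 million of them):

```python
# Sanity-check the metadata-table size claim from the summary:
# 200 million fixed-size 64-byte records.
RECORD_SIZE = 64           # bytes per metadata record, per the article
NUM_RECORDS = 200_000_000

total_bytes = RECORD_SIZE * NUM_RECORDS
total_gib = total_bytes / 2**30

print(f"{total_gib:.1f} GiB")  # about 11.9 GiB, under the quoted 13GB
```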
Yes it is (Score:3, Interesting)
Re:Yes it is (Score:4, Interesting)
Didn't Microsoft's WinFS [wikipedia.org] fail?
Re: (Score:2)
Didn't Microsoft's WinFS [wikipedia.org] fail?
The WinFS idea failed several times because MS could not make it work well and they really tried. That is the reason why MS still uses the dog-slow NTFS, which suffers from a really bad VFS layer. Still a modern filesystem and still pretty good overall.
Re:Yes it is (Score:5, Informative)
Personally, I don't think tagging is the solution.
I basically have raccoon-like tendencies when it comes to personal organisation. I don't stick all files on my desktop, because for rather obvious reasons that doesn't work for coding, where you need to have makefiles, resources, etc. placed relatively. But for many other files, ~ is a good home, as is ~/Downloads. But since I suck at personal organisation, I sure as shit ain't going to tag files, in the same way I never tag emails.
Immutability: sure? But some fs can already do that (ext4). Sure root can unset the immutable bit, but root can also access the raw partition, so protecting from root isn't really a thing anyway.
Searching by file type is interesting, but how does it work? What's responsible for setting the type, and what do you do when there is more than one type for a file? Would a search for zip files bring up JARs, for example? Would "P2 1 1 255 1" be a text file or an image, or both? If I have a "blob of data" type of file, then install a new tool which can now recognise and process that data type, how does the FS reflect that?
As for unique identifiers/moving files... hard links solve that problem already. Much of the time, when I refer to a file by name, I want whatever the latest thing with that name is, i.e. the symbolic name given by the path (usually a relative path for sanity) is the key; I don't want it locked to the contents at the time I make the connection. E.g. in video editing, I'll import a bunch of assets (cards, overlays etc) from files, and kdenlive will refer to those files by relative path. When I re-generate the assets with slightly different settings (because I'm pedantic AF and my 4k videos must be pixel perfect with precisely the right colours), I want that to be automatically reflected in the editor. If I had to re-import everything each time that would be a huge pain in the neck.
For something like a presentation where you can link to a video rather than import, it could be useful, except I always fully import videos, because while having the pointees disappearing is annoying, without the full import it's an exercise in frustration trying to use the presentation on another machine.
Now, naturally, my workflows have grown up around the tools as they are, so I'm not going to claim that the existing state of the art is the be-all and end-all of filesystems. Apart from having a stock FS which doesn't suck with millions of files in one directory I don't feel hindered (and I don't care enough to use ReiserFS).
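The file-type ambiguity raised above (is "P2 1 1 255 1" text or an image? would a search for zips bring up JARs?) is easy to demonstrate with a toy magic-byte sniffer. The signatures below are the real zip and plain-Netpbm ones; the type names returned are invented for illustration:

```python
def sniff_types(data: bytes) -> list[str]:
    """Return every type whose signature matches -- ambiguity is the point."""
    matches = []
    if data[:4] == b"PK\x03\x04":
        # .zip, .jar, .docx, .apk... all start with this same signature.
        matches.append("zip-family")
    if data[:2] in (b"P1", b"P2", b"P3"):
        # Plain-text Netpbm image headers (PBM/PGM/PPM).
        matches.append("netpbm-image")
    if all(32 <= b <= 126 or b in (9, 10, 13) for b in data):
        # Printable ASCII plus tab/newline: also a valid text file.
        matches.append("plain-text")
    return matches

print(sniff_types(b"P2 1 1 255 1"))   # ['netpbm-image', 'plain-text']
print(sniff_types(b"PK\x03\x04rest")) # ['zip-family']
```

Any filesystem-level "type" field has to pick one answer here, which is exactly the problem the comment points at.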
Re: (Score:3)
The reason tagging has mostly failed for files (with a few notable exceptions) is because it's extra work for the user. The only time tags stick is when the overhead is very low, e.g. when you rip a CD the software can look up all the metadata in an online database do you don't have to type it in yourself.
Computers are reaching the point where they can tag stuff automatically. Google Photos and image search does it, if you search for "cat" it will find all the photos of a cat because their AI can recognize
Re: (Score:3)
Re: (Score:2)
The problem with that concept is that it's too open ended to get much value out of it. You have a fairly narrow set of generally applicable ways to reference a chunk of data. Turns out that a filesystem pretty well covers the utterly common/generic scenarios. If you get more domain specific you can get value out of storage, but trying to cover arbitrarily many domain-specific storage aspirations in a single generic design is not going to go well. Besides, the actual demand is pretty low too, there's not
Re:Yes it is (Score:5, Insightful)
Did we learn nothing from the massive, read-only (at least when it didn't matter) single-point-of-failure that is/was Windows Registry??? Must a multi-bit (in a single byte) error cause complete and utter system degradation/destruction?
MUST those who refuse to learn the lessons of history relearn them in screaming agony???
Learning from history (Score:3, Informative)
MUST those who refuse to learn the lessons of history relearn them in screaming agony???
No, some earn the Darwin Award before the pain reaches their brain cells.
Re: (Score:2)
It's the users (Score:2)
MUST those who refuse to learn the lessons of history relearn them in screaming agony???
No, they learn the lessons of history while their users are screaming in agony.
Re: (Score:3)
Re:Yes it is (Score:4, Insightful)
That you compare the windows registry to a fully functional database is frankly terrifying.
He's not exactly wrong that the registry was a bad fucking idea. Most programs in the dos days were self contained and contained their config data in ini files in their directory. They were also super easy to migrate by simply copying the directory. Microsoft wanted to hamstring casual piracy and so we got the shit that was the windows registry. There's a lot of bad design in windows.
Re: (Score:2)
But the design is actually very good and helps address DLL hell as well as configuration management.
[citation needed]. It has always seemed the exact opposite to me.
Re: (Score:3)
That's my point, the registry was widely abused. From Vista onwards the ability of apps to put crap in there and to read/write existing stuff was curtailed.
Copy protection in games only used the registry to the extent of looking for things like virtual CD drive software. The copy protection itself was done by looking at the response times from optical drives to see if they were real or emulated, and by having bad sectors that couldn't be copied on the disc.
Most are hierarchical databases (Score:3)
Typical filesystems are the most common case of hierarchical databases. What general class of database architecture do you have in mind?
> the objects can be stored as anything, like... an email message, or an image, or an email attachment
That's true of virtually any filesystem.
> You could reference every part of your information with it
I'm not sure what you mean by that.
> you could tag and manage permissions much more granularly
More granularly than per-file?
You are proposing adding permissions on
Re: (Score:3)
No, why make it so much harder to repair and rescue?
Every time, every damn time, someone comes up with "thing better than last thing", and all that comes of it is more data loss.
Here's what I'd do, and please take this with a grain of salt, as it's off the cuff, and based on 30 years of experience fixing other peoples messes:
1. Store every file as a binary blob like so:
128-bit UUID - Header (true filename, true directory structure, implied permissions, thumbnail, length of file) - File binary data
2. Disk jo
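A toy version of the blob layout sketched in step 1 is straightforward to pack with the standard library. The field widths and ordering below are invented for this sketch (and the thumbnail field is omitted):

```python
import struct
import uuid

# Illustrative encoding of the layout above: 128-bit UUID, then a fixed
# header (name length, permission bits, payload length), then the data.
HEADER = struct.Struct("<H I Q")  # 2 + 4 + 8 = 14 bytes, little-endian

def pack_blob(name: str, mode: int, data: bytes) -> bytes:
    name_b = name.encode("utf-8")
    return (uuid.uuid4().bytes                      # 128-bit UUID
            + HEADER.pack(len(name_b), mode, len(data))
            + name_b
            + data)

def unpack_blob(blob: bytes):
    uid = uuid.UUID(bytes=blob[:16])
    name_len, mode, data_len = HEADER.unpack_from(blob, 16)
    off = 16 + HEADER.size
    name = blob[off:off + name_len].decode("utf-8")
    data = blob[off + name_len:off + name_len + data_len]
    return uid, name, mode, data

blob = pack_blob("notes/readme.txt", 0o644, b"hello")
_, name, mode, data = unpack_blob(blob)
print(name, oct(mode), data)  # notes/readme.txt 0o644 b'hello'
```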
Maybe rolodex's will get popular (Score:2)
Keep it simple... (Score:5, Insightful)
The one who wrote the article has no f****ng idea what a filesystem is.
He tried to make us believe that a "database" is a filesystem alternative... poor human.
Re: Keep it simple... (Score:2)
Thus creating.... (Score:5, Funny)
It could be. Some databases don't even need a filesystem, and can take raw block devices and allocate table space directly on the blocks, like Oracle ASM.
A FILE SYSTEM with markers in the DATABASE as to the location of WHICH BLOCKS are WHICH FILE.
AMAZING.
Re: (Score:2)
100%
Re: (Score:2)
I seem to recall reading that Microsoft was going to do this about 20 years ago.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:3)
AS/400 treats the filesystem as a database
Re: (Score:3)
Yeah, this isn't about file systems as we understand them. This is about how people interact with file systems. The whole file/folder concept. You know, the single most useful thing the average user should learn about computer operation. Curiously, he doesn't abandon this, he just adds some useless features.
I RTFA, and I can assure you that Andy "DidgetMaster" Lawrence is a moron and his system is idiotic. Even the name is stupid.
This reminds me of one of those old late night infomercials. You
Re: (Score:3)
The one who wrote the article has no f****ng idea what a filesystem is.
He tried to make us believe that a "database" is a filesystem alternative... poor human.
It however is. Nothing "poor human" about it. And he's not the first to propose it. He's not even the first to have a working example of it. https://en.wikipedia.org/wiki/... [wikipedia.org]. WinFS failed due to performance issues they couldn't resolve, but it worked and did address the searching and traversal problem very well. Oracle's DBFS exists too, as does Google's GFS (though the latter focuses more on its distribution capabilities than its database structure).
The only true poor human is the one who believes
Re:On the other hand... Re:Keep it simple... (Score:5, Insightful)
I mean, he's absolutely correct in many ways. The problem with the idea of replacing "crusty old technology" with "shiny new thing" is that we've spent decades building on top of that technology. The classic filesystem is damn near the bedrock of modern computing, whatever flaws it has. So he kind of ignores the staggering amount of reworking or re-engineering one would have to do to replace the "antiquated" file system with a new database with all the bells and whistles we'd like to see from a user's or application's perspective.
The proposal is not a deep dive at all. It's a wishlist stating "look how much more convenient my database is than a traditional filesystem", and for the use cases he lists, he's certainly correct. But the world isn't so kind to new technologies that one can just sweep aside existing technology to make way for the new. I mean, look at IPv6 adoption. That whole process is literally taking decades. That's what happens with foundational technologies. They don't like to move much, and if they do move, it's with slow, painful deliberation.
Technology that's "good enough" tends to have enormous inertia, because no one wants to replace hardware or software just because someone has a shiny new toy. Especially if they have to pay for it.
Re: (Score:2)
To go with your example; IPv6 adoption is nontrivial; but it's on track to, however unevenly, happen sooner or later. Something like the 'semantic web', on the other hand, that tried to bring ontology into scope, does not look nearly so hopeful.
Re: (Score:2)
I'm not convinced that any of the problems he mentions are problems that anyone has ever had, or that they are actually solved by his system.
Re: (Score:2)
BTW, it's called an inode (Score:5, Informative)
> I can assure you that I definitely know the difference between a database and a file system.
So then you understand that a filesystem *is* a database.
It's the most common example of a hierarchical database.
What you've proposed is adding an index on mime type.
You've also proposed another great idea. You propose files could be identified by a 64-bit number, rather than a full path. So it's still the same file if it gets moved. You didn't mention another cool thing that allows: one file can have more than one name. You thought that was new because you forgot that's what existing filesystems already do (since 1978, anyway). It's called the inode. Check out this cool trick:
echo a >one.txt
ln one.txt two.txt
echo b >one.txt
cat two.txt
Yeah, one.txt and two.txt are two names for the same file.
The actual file ID is the inode. The file ID has been separate from any of the (potentially many) names it might have since 1978.
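The same demo works from any language that exposes stat(); for instance, in Python on a POSIX system:

```python
import os
import tempfile

# Reproduce the shell hard-link demo: two directory entries, one inode.
with tempfile.TemporaryDirectory() as d:
    one = os.path.join(d, "one.txt")
    two = os.path.join(d, "two.txt")

    with open(one, "w") as f:       # echo a >one.txt
        f.write("a\n")
    os.link(one, two)               # ln one.txt two.txt

    with open(one, "w") as f:       # echo b >one.txt
        f.write("b\n")

    same_inode = os.stat(one).st_ino == os.stat(two).st_ino
    with open(two) as f:            # cat two.txt
        content = f.read()

print(same_inode, content.strip())  # True b
```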
Hans Reiser (Score:5, Funny)
Re: (Score:2)
You're just begging for a dupe [slashdot.org], aren't you?
Re: (Score:2)
I do remember a blog post he wrote many years ago that hard mocked this kind of idea (and the people who had the idea).
The desktop is enough for storing everything (Score:5, Funny)
My kids store everything on the desktop so why would anything else be needed?
The entire filesystem could just be reduced to a single folder called 'Desktop'.
Re: (Score:2)
My kids store everything on the desktop so why would anything else be needed?
The entire filesystem could just be reduced to a single folder called 'Desktop'.
I knew people who were following this approach back in the Windows 95 days.
If I'd paid attention, I'd bet they were doing it in the Windows 3.1 days too.
Re: (Score:2)
If you were paying attention, you'd know that you couldn't store files on the desktop in Windows 3.1.
Re: (Score:3)
Exactly.
It doesn't matter how large my hard drive is. I just need to upgrade my monitor periodically.
What a maroon! (Score:5, Informative)
This guy reinvented the database, but is calling it a filesystem. The semantics and performance characteristics are totally different between the two. Serious users and developers will recognize this not as a bad code smell, but as a bad architecture smell.
I can't believe we have a dozen or so comments already and nobody else has commented on this yet.
Re: (Score:2)
Yep. You are absolutely on point.
Re: (Score:2)
Oh, come on, mods... I was going for +5 Funny and yet you saddled me with +5 Informative. Slashdot is really slipping these days!
Re: (Score:3)
Not to worry, somebody down-modded you!
Re: (Score:2)
Are we still using ancient filesystems? (Score:3)
Re: (Score:2)
And if you use a proper utility on Windows, searches are great too. I use "Everything" on my Windows computer and it is great for such searches, but there are a few others too.
What I do not understand is why Microsoft, which clearly cannot build a file search tool that is even halfway decent, has not bought one of those tools or the companies making them.
Re: (Score:2)
Can’t recommend Everything enough. Searches are as fast as your typing.
No (Score:3)
Yet, we are still using antiquated file systems that were designed decades ago
Speak for yourself.
Re: (Score:2)
Re: (Score:3)
Antiquated? The wheel is antiquated, but we still use it. Beer was old even by Egyptian times, but we still quaff it.
As for filesystems designed decades ago, I can point to any recent filesystem, be it APFS, btrfs, ZFS, ReFS, even NTFS, that always get maintained with new features added often. For example, ZFS 2.0 has zstd, which is an excellent performer for compression.
Finally, filesystems are the last thing I ever want to replace willy-nilly. Of all the things that has to work perfectly, no matter wh
Re: (Score:2)
I don't quaff anything I didn't find lying on the floor in the dungeons of doom.
Re: (Score:2)
Yet, we are still using antiquated file systems that were designed decades ago
Speak for yourself.
Indeed. This idiot wants to throw out hammers simply because they are "antiquated" because they were "designed decades ago". He is probably unaware that we are still using computer systems that were essentially "designed" decades ago. Maybe we should throw them all out and start over? Surely not.
Poster does not understand... (Score:5, Insightful)
... the computer is not a god device.
File systems are designed around the limitations of the CPU, bus and memory. There are "some" bottlenecks that can be slightly enhanced (aka the move to SSDs), but in general the poster does not sound like he understands that the reason why things are as they are is not merely because hard drives, CPU and RAM were slower 20 years ago.
The reality is they had to try to balance performance against what the CPU was capable of processing. When you write or read file to disk and from memory, those are based on the limitations of the system itself.
Microsoft wanted a deeper, database-backed file system for Windows, but they found the "data explosion" (aka the performance hit) was awful: the more data you store about data, the longer it takes to parse.
You can all take a look here at WinFS (the cancelled "next gen" file system). So the idea that people haven't wanted to improve file systems before is naive.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Parsing data is not some trivial operation, when you read directories or files from a disk, your CPU is literally reading and parsing data.
There is definitely a case to be made for storage and the filesystem itself to become "its own separate computer" at some point in the future, but that would increase costs for the home user who is not a computer enthusiast.
Things like network attached storage and RAID are there to multiply backup/read speeds.
The more information you want about objects, the more CPU and memory that takes up. You can have infinite abstract details about any file. That is why file systems of the future will essentially mean that an entire computer will have to be dedicated to the file system if one is to have anything like serious performance.
File systems are designed around... (Score:2)
WinFS may have failed because such a thing is impossible, or it may have failed because of lack of experience.
How about this: put tags associated with files in a SQLite database, one database per account, off of $HOME in an existing ext4 file system - many of whose features we won't use. Make an API and a file manager that gives a way to create and use tags. Have file formats with headers tha
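That sketch is small enough to prototype with nothing but the standard library. Every table name and function below is invented for illustration:

```python
import sqlite3

# Minimal tag store along the lines sketched above: tags live in one
# SQLite database per account, files stay on the ordinary filesystem.
db = sqlite3.connect(":memory:")  # in real use: a file under $HOME
db.execute("CREATE TABLE tags (path TEXT, tag TEXT, UNIQUE(path, tag))")

def tag(path: str, *labels: str) -> None:
    """Attach one or more tags to a file path."""
    db.executemany("INSERT OR IGNORE INTO tags VALUES (?, ?)",
                   [(path, t) for t in labels])

def find(label: str) -> list[str]:
    """Return every path carrying the given tag."""
    rows = db.execute("SELECT path FROM tags WHERE tag = ?", (label,))
    return [r[0] for r in rows]

tag("~/photos/cat1.jpg", "cat", "photo")
tag("~/docs/taxes.pdf", "document", "2022")
print(find("cat"))  # ['~/photos/cat1.jpg']
```

The obvious weakness (which the replies below this comment get at) is that the database keys on paths, so a plain mv desyncs it.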
Re: (Score:2)
the limitations of both hardware and system architects' imaginations, circa the era when the hard drive was invented.
The filesystem as a user facing interface is based around the limitations of the users which have not really 'evolved', the on-disk format backend has evolved greatly since those days and explicitly has been designed around the changes and advances in technology.
WinFS was a non-starter because it's a solution looking for a problem, and it was a supremely messy concept to try to use for anyone but a developer, and even then, a developer might struggle. Even advanced concepts in 'old fashioned' filesystems can
Re: (Score:2)
WinFS was a non-starter because it's a solution looking for a problem, and it was a supremely messy concept to try to use for anyone but a developer, and even then, a developer might struggle.
It's an absolute nightmare. It dramatically over-complicates just about everything while adding no obvious value. Why someone didn't shut that down in the "napkin" stage, I'll never figure out.
Re: (Score:2)
How about this: put tags associated with files in a SQLite database, one database per account, off of $HOME in an existing ext4 file system
How do you extend that to files that can be read by more than one user, especially in directories that may have different combinations of group read and execute (traverse) permissions on top of different group ownership and ACLs?
Re: (Score:2)
In the system he describes, tags are stored in the DB, not the files. Things will work fine until someone moves or renames a file, then it's chaos.
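One mitigation, at least within a single filesystem, is to key the tags on the inode rather than the path. A minimal sketch, relying on the POSIX guarantee that a rename within one filesystem keeps st_ino stable (it does not survive copies or cross-volume moves):

```python
import os
import tempfile

# Key tags by inode number instead of path, so renames don't orphan them.
tags_by_inode: dict[int, set[str]] = {}

def tag(path: str, label: str) -> None:
    tags_by_inode.setdefault(os.stat(path).st_ino, set()).add(label)

def tags_of(path: str) -> set[str]:
    return tags_by_inode.get(os.stat(path).st_ino, set())

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "draft.txt")
    open(old, "w").close()
    tag(old, "important")

    new = os.path.join(d, "final.txt")
    os.rename(old, new)        # the path changes, the inode does not
    surviving = tags_of(new)

print(surviving)  # {'important'}
```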
Re:Poster does not understand... (Score:5, Insightful)
The poster does not sound like he understands that the reason why things are as they are is not merely because hard drives, CPU and RAM were slower 20 years ago.
Oh, it's worse than that.
20 years ago was 2002. Laptops had 20GB hard disks at the time, desktops sporting 40GB drives were common as well. CPUs were already comfortably in the 1.5GHz range, and 128MB of RAM wasn't unheard of. While 1TB wasn't yet a common capacity of a single drive, SCSI arrays with that much storage could be acquired and still leave half a 42U rack free for the servers accessing it.
Meanwhile, it's not like NTFS got coded back on NT 3.5 and everyone packed up and went home; the file system has undergone plenty of revisions over the years, and while it's still got its problems (enumerating folders with more than a thousand files gets tedious quickly), it's not like there's been no work done on it, even since 2002. While ReFS is starting to make inroads in the more recent releases of Windows, NTFS has had modifications and upgrades that make it entirely possible to leverage it on petabyte volumes without much of a problem. On the flip side, WinFS was promised to do exactly what is being described in TFS, but not only was it unable to make its way out of Longhorn beyond the earliest of beta builds, but it didn't make it into Vista, 7, 8, 10, or 11...meaning that it either was fundamentally flawed or the wrong solution to the problem.
On the Linux/BSD side, however, there's patently no way to substantiate the claim. EXT4 handles massive file systems just fine, btrfs is making inroads and achieving maturity, and OpenZFS has limits on files and folders that well surpass physical constraints. These file systems are used day in and day out in environments very small and very large, so clearly, something is being done correctly already if AWS and Google and Facebook and Oracle have made these file systems work already.
So, I don't know whether the poster is for some absurd reason still running FAT16 partitions, or whether the goal is to optimize something that is only a bottleneck in the most extreme cases, or if this is simply a thought exercise regarding how to reinvent the wheel, as if a whole lot of extremely smart people haven't dedicated a whole lot of thought as to how the existing systems currently work.
Re: (Score:2)
and 128MB of RAM wasn't unheard of
In 2002, 128MB would have been on the low-end. Hell, you could get a graphics card with that much. 256 and 512 were common. A friend of mine spent way too much money to put together a 1GB machine sometime in late 2000 or early 2001.
It was crazy how fast things were moving then. My 486 66 was upgraded from 8 to 32mb of ram, 8mb at a time, between 1994 and 1997. It was finally replaced out of necessity with a K6-2 266 with 64mb of ram in 1998. I don't think that one made it even 3 years before being rep
Re: (Score:2)
There is definitely a case to be made for storage and the filesystem itself to become "its own separate computer" at some point in the future, but that would increase costs for the home user who is not a computer enthusiast.
Like the disk drives used on the Commodore PET, Vic-20, and C64. They were complete computers with their own CPU, RAM, and ROM that served files to the main system using a custom filesystem kernel. These disk drives ended up costing as much money (and eventually more) as the main computer!
What's really amusing is that demo coders often use the disk drives as coprocessors, for things like streaming music.
Re: (Score:2)
That is why file systems of the future will essentially mean that an entire computer will have to be dedicated to the file system if one is to have anything like serious performance.
Yes, pretty much. Just like you often dedicate a separate computer to a database system these days. In many cases, it will not be worth it at all. The cases where it will be are already nicely addressed by DB systems today. Hence I do not see that happening anytime soon. It significantly increases complexity overall without any real gains, and for the few cases where it actually is needed, it can already be done and done well.
Re: (Score:2)
There is definitely a case to be made for storage and the filesystem itself to become "its own separate computer" at some point in the future
The 1541 disk drive was exactly that.
No. Period. (Score:2)
This article is just a follow-up to another Slashdot article published a few weeks ago about millennials having everything stored in a single storage pool on their phones and not being familiar with a proper filesystem. The answer is no: a global object store creates more chaos than it solves. Yes, tags are useful, but if implemented they should work within a well-structured folder hierarchy, which is the only sane answer to managing the chaos on our hard drives.
You may be right. (Score:2)
It may be that the millennials would be even more disorganized without their tags.
Re: (Score:3)
Put tags on an existing file system and see if anyone still cares about subfolders.
KDE did that with the introduction of the semantic desktop, and it turned out to be so annoying that I think all access to it was eventually removed (I can't find it on my systems anymore). It was, and still is, a terrible idea that no one wants. I think even both people who used it shrugged off the removal as "nothing of value was lost."
Re: (Score:2)
MacOS did that in 1991, and OSX still does it today. It hasn't caught on to any considerable degree, but it's kind of cool.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Don't worry, Lennart Pottering will... (Score:5, Funny)
Don't worry. Lennart Pottering will add something like this to systemd next week.
tautology (Score:2)
A local object store that can effectively manage hundreds of millions of files and find things in seconds based on file type and/or tags attached is possible.
Yes, we call such things a filesystem.
File systems are usually free and come with your operating system, so there seems to be little incentive for someone to build a new system from scratch, but just like we needed the internet to come along and change everything we need a better data storage manager.
So ext3, ext4, jfs, iso9660, BTRFS, ZFS, venti, etc, etc, ad nauseam are just figments of our imagination?
So, the question is "Is it time to replace file systems with other file systems?"
No (Score:2)
File systems have evolved? (Score:2)
Has the file system been perfected? Or simply exhausted its possibilities for improvement? Because I can't help dreaming of something less awkward than nested subdirectories.
Be File System (Score:2)
Be File System, the file system of BeOS, had a lot of modern features when it was introduced 24 years ago. One of them was database-like extended attributes, which allowed really fast file queries, and a bunch of other stuff.
You can read more on this Ars article:
https://arstechnica.com/inform... [arstechnica.com]
Re: (Score:2)
There's also a book on BeFS. PDF: http://nobius.org/~dbg/practic... [nobius.org]
no (Score:2)
Big Blue (Score:2)
Didn't IBM try a Database (ish) tech as a primary file system at one stage?
No (Score:2)
It's the last thing we need. (Score:2)
What files should be saved? Which files are useless? What do we store in the cloud and what do we keep locally? How do we merge-purge storage media? Where can we organize photos? etc. etc.
Take a typical phone user. Look at how often they scroll and scroll. What a bloody waste of time. We need a filing syste
Re: (Score:2)
We need a filing system that is intuitive so that they can find the pictures they want just by describing what they are looking for... etc. etc.
I like this idea but have no idea how it might be made.
Self-plug (Score:2)
I think he's frustrated that no one seems to care about his blog. It's a slog to read through and I'll confess I didn't have the willpower to get through much, but at a glance I feel like he misses the fact that modern filesystems already have a lot of what he thinks is novel; he just does it in an incompatible way.
In at-scale storage, 'object store' is a buzzword with some meaning. Basically it's 'just enough filesystem', without some of the burdens associated with network/cluster filesystems. The thing is
Question is Good but Misdirected (Score:2)
The question isn't whether we should replace filesystems, but rather whether we should move core file-indexing services *into* the filesystem. That is, should we embed all of the things that locate does into the filesystem? My answer would be "no" (I prefer single-task entities where possible), but making a filesystem "hook" wouldn't be bad (i.e., trigger X when a file is updated, where X might be an indexing operation). Perhaps we should standardize more metadata, where it is stored, and how it is accessed. T
Re: (Score:2)
but making a filesystem "hook" wouldn't be bad (i.e., trigger X when a file is updated, where X might be an indexing operation)
For what it is worth, that exists, in the form of inotify. An application can register to be notified should any of a number of events occur relative to files that the application cares to subscribe to. Technically the events *could* carry more data (e.g. IN_MODIFY only tells you that *something* changed, with no indication of offsets or how much changed, so you have to pessimistically assume the whole file changed), but it's not really worth it.
This guy has no fucking clue what a FS is (Score:4, Informative)
/Oblg. Those who don't understand file systems, such as BeOS's BeFS [wikipedia.org]. are doomed to re-implement it.
The reason we don't use tags for filenames is simple: How do you handle multiple files with the same tag?
We don't use 64-bit hashes for filenames for the same reason: how do you QUICKLY iterate over a sub-set of files??? Second, how does this scheme handle filenames, specifically globbing, where a user selects a sub-set of files via wildcards such as "*.txt"? The reason we even use filenames in the first place is that people NAME things, i.e. "Basket Weaving: Dead career or Undervalued skill?"
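The globbing operation referred to above can be illustrated with Python's standard fnmatch module (the file names here are invented purely for illustration):

```python
import fnmatch

# A user-facing list of named files; wildcard selection is cheap because
# names carry human-meaningful structure (the extension).
names = ["notes.txt", "todo.txt", "photo.jpg", "README"]

# Select the sub-set matching the shell-style pattern "*.txt".
matches = fnmatch.filter(names, "*.txt")
print(matches)  # ['notes.txt', 'todo.txt']
```

With opaque 64-bit hashes as identifiers there is no equivalent cheap operation: a pattern has nothing human-meaningful to match against.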
If you can't find files then you have a shitty organizational structure. Blaming or changing the FS is NOT going to fix that.
A file name along with the directory path provides a UNIQUE name. This allows you to have files with the same name in DIFFERENT directories, i.e. the ubiquitous README.TXT in many projects.
Every file system needs metadata. The problem is that metadata is NOT static. Every year there are new file types. How does this scheme handle that?
Apple tried file types back with its ProDOS [wikipedia.org] FS in 1983 on the Apple II, which allowed for 256 types. Mac OS extended this to a 4CC (four-character code), which caused constant annoyance as programs could read one file type but not another. Unix "solved" this problem by using file extensions, which is infinitely more flexible.
Just because a file system "looks" like a database doesn't mean it is one. Conceptually, at a high level, they are similar, but under the hood there are fundamental differences.
Windows NTFS has piss-poor performance [superuser.com] when many files are in the same directory, i.e. more than 5,000 files. The solution is NOT to switch to a new, unproven file system but to organize your data better: namely, use the first 1 or 2 characters as a key and make sub-directories, i.e. 26 sub-directories for A-Z, etc.
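The sharding workaround described above can be sketched as a small helper (a toy illustration; `shard_path` is my own name, not any standard API):

```python
import os

def shard_path(root, filename):
    """Bucket `filename` into a one-character subdirectory under `root`.

    Files are keyed by their first letter (uppercased), with "_" as a
    catch-all for names that don't start with a letter, so that no single
    directory accumulates an unbounded number of entries.
    """
    first = filename[:1]
    key = first.upper() if first.isalpha() else "_"
    return os.path.join(root, key, filename)
```

For example, `shard_path("data", "report.txt")` places the file under `data/R/`, keeping each directory to a manageable fraction of the total file count.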
The author needs to go take a Comp. Sci. Operating Systems 300 or 400 class, because they have no fucking clue about how or why file systems are designed the way they are, with modern features such as journaling, data integrity, pooling, TRIM support, slack, etc. Or at the very least read up on the ZFS design [fujitsu.com].
Now, get off my LAN. /s
Re:This guy has no fucking clue what a FS is (Score:4, Informative)
CP/M used *.ext, and MS-DOS popularized it. Unix-like OSes have no concept of a file extension. You don't need a dot. A filename is a single string. (Unix does have special files, e.g. device files, named pipes, etc., which could be called types, but real, normal files containing 0 or more bytes are all the same type.)
Re: (Score:2)
The reason we don't use tags for filenames is simple: How do you handle multiple files with the same tag?
A filename is just a tag. Sometimes the filename gets so mangled that your terminal can't represent it and you have to play stupid tricks to even refer to the file (like referring to it by inode.) But then again, a filesystem is just a [typically] hierarchical database.
Some file systems solve huge # of files (Score:2)
Don't believe me?
ZFS supports:
- L2ARC which can be used as a read cache for metadata only
- Metadata vDevs, which can be used for small files, metadata or both
Either can be an SSD via SATA/SAS, or an NVMe drive. (Or even exotic PCIe storage.)
To be clear, the default L2ARC configuration is to cache both metadata & data. But, you can restrict it to metadata.
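As a hedged sketch of the configuration described above, assuming standard OpenZFS tooling, a pool named `tank` (a placeholder), and example devices `/dev/sdb` and `/dev/sdc`; check zfsprops(7) and zpoolconcepts(7) before applying anything like this:

```shell
# Restrict the L2ARC to caching metadata only (the default value is "all",
# which caches both data and metadata):
zfs set secondarycache=metadata tank

# Add a mirrored special (metadata) vdev on two SSDs:
zpool add tank special mirror /dev/sdb /dev/sdc

# Optionally steer small file blocks (here <= 32K) onto the special vdev
# as well, not just metadata:
zfs set special_small_blocks=32K tank
```

These are configuration commands against a live pool, so they are shown as an illustrative fragment rather than a runnable script.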
That of course overlooks the ARC
Short answer: no (Score:3)
People don't organize files today. They won't fill in reasonable sets of tags for an object-store system. They'll end up with the same problems locating stuff either way.
The complaint about the overhead of directory structure is a valid one, but that's more about the implementation of the directory structure than a question of filesystem vs. something else. I notice too that one of the things the proposed object store does is sacrifice available metadata for space. That seriously limits how people can search for objects.

And really, most people don't search for metadata about files, they search based on the content of the files. The only solution for that for binary files like images is tagging to describe what's in the image, seeing as image-recognition systems are too complex and too compute-intensive for the average desktop system. For text, though, it's easy enough to scan files and build a database of content vs. file path. Unix "locate" does that already.

And yes, the "locate" database is big. Do you expect any other database needing to store the same information to be any smaller? That's one of the trade-offs I have to make any time I build a database: speed of access vs. storage needed for the indexes that speed things up. Thankfully storage is relatively cheap, so I can usually give up space for speed without running out of space. The same holds true for the "locate" database: if I have a 20TB drive then giving up 10% for the database probably won't put me in a bind as far as space goes.
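A toy version of such a name-to-path database, in the spirit of locate(1) (the function names `build_name_index` and `lookup` are my own; a real locate implementation persists its database and handles far more):

```python
import os

def build_name_index(root):
    """Walk `root` once and map each file name to the directories that
    contain it: a miniature stand-in for the locate(1) database."""
    index = {}
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            index.setdefault(name, []).append(dirpath)
    return index

def lookup(index, fragment):
    """Return all indexed file names containing `fragment`,
    roughly what `locate fragment` does against its database."""
    return sorted(name for name in index if fragment in name)
```

The space/speed trade-off mentioned above shows up directly here: the dictionary duplicates every name and path, but turns a full-tree walk into a single hash or substring lookup.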
What's really needed isn't a new approach, it's a new implementation of a directory structure that's speed-efficient at handling very large numbers of files in a single directory. Because frankly the average person isn't going to do more than rudimentary filing of files in any organized way so the storage system is just going to have to deal with it.
No (Score:2)
The gains are just not worth the time it would take for such a change to be fully accepted and implemented. I am not even convinced that there are any gains. Just different ways to look at something. I feel this is really just some developers out there looking to "disrupt" something. Long story short, if it isn't broke, don't try to fix it.
Re: (Score:2)
The gains are just not worth the time it would take for such a change to be fully accepted and implemented.
Indeed. If history is any guide, even the snazziest all-singing, all-dancing database filesystem would be hamstrung by the fact that people need to transfer their files to other filesystems without losing metadata every time they do so. That means people would avoid the cool filesystem-specific functionality in favor of remaining portable, so it would go largely unused, and eventually it would be abandoned in favor of a simpler, faster, traditional filesystem implementation.
didgets (Score:2)
I went to look at the website this is all about; there are very few details about the implementation.
https://didgets.substack.com/p... [substack.com]
Is it a graph? A tree? An index with metadata on top of existing filesystems?
Re: (Score:2)
Wife: New Shimmer is a floor wax!
Husband: No, new Shimmer is a dessert topping!
Now, improved new Shimmer is also a sandwich spread!
No, and stop calling for stupid things (Score:2)
And do not forget that MS tried this several times so far and failed resoundingly each time. There is a reason for that: there is absolutely nothing "antiquated" about modern file systems. Even the dog-slow MS NTFS is far superior to the alternatives, and the truly modern filesystems found in the Unix world do an excellent job and are quite mature.
As to searching the contents of file systems, that is solved and has been for a long, long time: just build and maintain a database of file contents
Microsoft have been trying this for ages (Score:2)
We were told there would be something like this in Longhorn. Perhaps when Longhorn, suitably renamed, finally arrives, it will have a database FS. But hierarchical FSs are good for many things, and aren't about to disappear any time soon.
Re: (Score:3)
I never said that Didgets was a perfect system or that it solves every problem
will anyone look at the data? (Score:2)
Re: (Score:2)
The percentage someone will look at will decrease, but the chance that the information someone wants will be in there will increase.
We'll just put everything in the cloud... (Score:2)
So sad (Score:2)
Ha, his file system uses 64 Bytes per file, what a waste. If only he used 64 bits per file it would use even less space for 200 Million Files.
Afford? (Score:2)
Huge sized SSDs are still too expensive compared to huge sized HDDs.
File systems already get replaced regularly (Score:2)
NTFS, FAT, exFAT, HFS Plus, EXT. These are just some of the file systems we've seen over the past 3 decades.
In reality, file systems ARE a type of database, known as a hierarchical database. Not all databases are SQL.
Cloud services offer things called "blob storage" that "replace" file systems. Relational databases like SQL Server, Oracle, Postgres have blob storage. The trouble is, most software can't use the blobs directly. And worse, the performance sucks compared to actual file systems, because database
Bullshit arguments. (Score:3)
In a Substack article, Didgets developer Andy Lawrence argues his system solves many of the problems associated with the antiquated file systems still in use today. "With Didgets, each record is only 64 bytes which means a table with 200 million records is less than 13GB total, which is much more manageable," writes Lawrence. Didgets also has "a small field in its metadata record that tells whether the file is a photo or a document or a video or some other type," helping to dramatically speed up searches.
Do you think it's time to replace file systems with an alternative system, such as Didgets? Why or why not?
The arguments given are that this filesystem makes searching for files easier.
But I have two problems with this philosophy.
1. If you're a light user then you won't have many files to search for. A brute-force approach works pretty well and this FS is not really helpful.
2. If you're a heavy user then you will already be organizing things much better and won't need the facilities offered by this new FS.
If you're in the middle between these just realize that organizing your files is the only good solution to the problem of not organizing your files.
Re: (Score:2)
I work on Windows platform stuff and end up searching for a bunch of stuff the Windows search doesn't index.
I use dir /a/s/b c:\ > c:\filelist.txt and then grep that.