Submitted by joelhalse on
OpenEdit's Joel Halse believes that file based applications, not database driven systems, are the next evolution for information management. The article below is written by Joel Halse and explains how he reached this conclusion.

Let's say you need to organize 2000 people on a football field. A relational database would create 2000 little boxes and make everyone stay in their little box. If someone needed to move around, they would first need to inform the administrator so that the administrator doesn't lose track of everyone. A file based system on the other hand would hand out a cell phone to everyone and tell them to have fun. If someone needs you, we'll give you a call. Just make sure you don't lose your cell phone. Beyond that, have a great day.

A relational database was a good system. It was also created at a time when searching a million files took more than milliseconds. It was a product of limitations. It wasn't necessarily the ideal solution, but it was a good solution given the tools at hand. Those limitations are gone. Those limitations are in the past. New technology and mind-boggling search capabilities have opened the door for new options that weren't available 20 years ago.

File based applications are the next evolution for information management. Especially for the web.

Why? Because it's easier to understand. It's not that you aren't smart enough to understand a database. It's that you don't have to understand a database. Especially when you already understand how to use a file based system.

Who is YOU?! That's the key to my argument. YOU is everyone!

To run a database driven web application you'll need a DBA. If you don't know what that stands for you aren't alone. It stands for Database Administrator. But if you want to work with a file based system, all you need is you. If you can find files on your computer, you can find the files within your file based web application.

First let me highlight two historical examples of emerging technologies that changed the face of data management, and then let me draw the parallel to the emergence of file-based systems. Notice that each example required a few iterations to achieve true success but the magnitude of their impact is enormous.

The first example is Windows. Windows crushed DOS as the operating system of choice because it focused on people. All of a sudden the common user could touch and feel files. Move them around. Drag and drop. Everything became accessible because you could wander around and find what you were looking for. This concept has revolutionized the way people interact with their digital information. Just imagine if you started your computer this morning and had to interact with your files using a command prompt. ACK!! The idea is unimaginable for the vast majority of the computer marketplace. The average computer user knows nothing about developing software and writing code. Keep in mind that the average person publishing to the web is the same average person who owns a personal computer. Most people publishing to the web aren't programmers.

My next example is Google. Google dominates the search engine marketplace because it focused on people. Google is like a person. A very reliable and friendly person who greets everyone with the same comforting approach that 50% of the WHOLE WORLD trusts to find what they are looking for online. That's a heck of a lot of people and a heck of a lot of trust. Google gained our trust with a very simple approach. If Google were a person, Google would probably look like a friendly librarian that instantly makes you feel relaxed and less stupid. And we all hate feeling stupid. Google would say something like, "Hi. Just type what you are looking for in this magic little box and I'll go find it. Don't worry about being wrong either because I'll only take 0.22 seconds to find 34 000 000 options. I'll sort these options for you too so what you are looking for will probably be one of the first ten options on the list. If I don't find the right thing then just ask me again. I'll keep looking. How much? *laughs quietly* Oh my dear boy, you don't have to pay me a red cent. I'll be here 24 hours a day, 7 days a week. If you need something, just ask." Incredible!

What has this got to do with file based web applications? Well, let me explain how you do a simple change within a file based system vs a relational database.

Let's say we want to make a change to a file. For this example the file is a web page. Specifically, the web page is http://demo.openedit.org/folder/file.html

Now let's make a change to this file or web page.

File Based:

This is the location of the computer: http://demo.openedit.org

This is the location of the file on the computer: /folder/file.html

Now go ahead and change the file. Use any tool you like. OpenEdit's file manager is one option. FTP is another option. You can also use the operating system running on the host machine to locate and change the file. That's it! Go make a change to that file. No caveats. It's that easy.
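To make the file based steps concrete, here is a minimal Python sketch. It assumes the URL path maps one-to-one onto a path under the server's web root; the web root, the helper names url_to_file and change_page, and the sample page content are all made up for illustration and are not part of OpenEdit.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def url_to_file(web_root: Path, url: str) -> Path:
    """Map a page URL to a file under web_root (assumed: URL path == file path)."""
    return web_root / urlparse(url).path.lstrip("/")


def change_page(web_root: Path, url: str, old: str, new: str) -> Path:
    """Replace `old` with `new` directly in the file behind `url`."""
    page = url_to_file(web_root, url)
    text = page.read_text(encoding="utf-8")
    page.write_text(text.replace(old, new), encoding="utf-8")
    return page


if __name__ == "__main__":
    # A temporary directory stands in for the real web root on the host machine.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "folder").mkdir()
        (root / "folder" / "file.html").write_text("<h1>Hello</h1>", encoding="utf-8")
        page = change_page(
            root, "http://demo.openedit.org/folder/file.html", "Hello", "Goodbye"
        )
        print(page.read_text(encoding="utf-8"))
```

Any editor, FTP client or file manager does the same job; the sketch only shows that nothing more than a path and a write is involved.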

Relational Database:

Umm.. I don't know how to make changes to files within a database. I know how they work but if I want to figure out how to actually use one, I would have to start reading books about databases. But I don't want to learn about databases. I want to create web applications. I've been developing web applications for over 4 years and I know pretty much nothing about managing a database.

I can create something that works like a database. I can organize over 100 000 files into an organized, fully indexed structure of files that can be searched much the same as Google. But I can't run a database. Why should I? The goal isn't to have a cool database. The goal is to have organized and easily accessible information. I want to put all my files somewhere, have them organized, and then be able to access them when I need them. No rules. Just let me find my files and give me access to them when I need them.
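As a rough illustration of what "fully indexed" files can mean, here is a small Python sketch of a word index built over a folder of plain files. It is an assumption about how such an index could look, not a description of how OpenEdit does it; build_index and search are hypothetical names, and a real system would use a proper search library and keep the index up to date as files change.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")


def build_index(root: Path) -> dict[str, set[Path]]:
    """Walk every file under root and record which files contain each word."""
    index: dict[str, set[Path]] = defaultdict(set)
    for path in root.rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            for word in set(WORD.findall(text)):
                index[word].add(path)
    return index


def search(index: dict[str, set[Path]], query: str) -> set[Path]:
    """Return the files that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


if __name__ == "__main__":
    # Two throwaway files stand in for a real content folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.html").write_text("football field", encoding="utf-8")
        (root / "b.html").write_text("cell phone", encoding="utf-8")
        index = build_index(root)
        print(sorted(p.name for p in search(index, "football")))
```

The files themselves stay where they are and stay directly editable; the index is a separate lookup structure that can be rebuilt from them at any time.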

A relational database can't give you direct access to your files. Notice I didn't say won't. A relational database would give you direct access if it could. But it can't.

Here's why.

In a relational database everything needs to be in a specific spot and giving you direct access to the file can have dire consequences. A database doesn't understand that file.html is in a folder called /folder on a computer called http://demo.openedit.org. Instead, a database understands the location of file.html in relation to everything else around it. Recall the football field example? For the database to work you have to ask the database to go get the file for you. In fact, if you went and deleted the file without telling the database you risk corrupting the entire thing. If box 1276 on our football field had the wrong person in it then you probably can't be sure that the rest of your boxes were correct either. If file.html isn't where it's supposed to be then the database often gets confused and can't find anything. When a database gets confused it can lose track of everything. And that's bad. Really bad. So bad they created a term for it. Corrupt. Being corrupt is so bad that every relational database requires an administrator function that acts as the gate keeper to ensure this very bad thing never happens.

It is this requirement that causes so much grief for database management. Despite the enormous amount of effort that the industry has invested into making these administrator functions easy to work with, they are a REQUIREMENT. A relational database requires an administrator function and can never allow you direct access to the file system. It's part of the system's architecture.

Until now this was the best way to manage large amounts of information. New technology keeps changing the rules and we are in the middle of another big change.

If you want to organize over a million files you no longer need a database. You can keep your files unlocked. You can say goodbye to database administrators. You can regain control.

This new reality is probably terrifying for the database industry but it's proving to be nothing short of earth shattering for me!

Comments

You Must Be Joking

Lawrence Salberg

At least you qualified this as an "opinion", but this was the most hilarious article I've read in some time - and coming from a source that should know better.

Be bold and keep it online so we can all write back in 2010 and laugh even harder. I find it almost stammeringly confounding that the same quarter in which Sun buys MySQL for one billion dollars, Joel makes the stunning observation that databases are dying and that the database industry should be fearful of the future.

And why does he believe all of this? Besides a hogwash analogy of historical perspectives (which are highly skewed to his own recollection and worldview), the central reason for his thoughts here are that, well, it's just dog-gone easier for him to deal with flat files.

How wonderful. Hey, it's not exactly like databases are super tough these days. But he's completely rogue if he thinks that flat file data storage has anywhere near the power and scalability of a database. In fact, there's not a single major web application out there that uses flat file storage. And any minor ones that do will be upgrading to a database server as soon as they get the one thing they crave: users.

Still, for a Monday morning, I can't think of a more encouraging article. It shows that we web developers are going to be in business for a long time to come. There are still, quite obviously, a huge gamut of "should-ers" out there - those who believe things should just work the way they believe - regardless of history, regardless of raw data and facts, and regardless of their understanding. And no, I'm not speaking of Apple fanboys, but they run a close second here.

Complex Rollups

Matt Farina

How do you handle complex rollups, aggregation, files (as opposed to text) and other data manipulations in your system? I guess I fail to see how this can work when your situation goes beyond simple static pages.

Can you explain further?

Still worth thinking about it

JC

First, thanks for the article ! It gives some basic insights of what we can think about databases "complicated utility". I mean, database management is really hair-pulling stuff !(I don't have a degree in computer science...)

Because nowadays, we can build websites without having other knowledge than typing Word documents ! I am learning Python and stuff related to what I want to do, but it is still an awful lot of time invested compared to the majority of 'bloggers' !

To complement the comment of Joel Halse, I would like to say that when thinking about a personal website, having a file directory or a database has the same utility for the webmaster. Of course, if it is static content ! And it is still the main part of what I am looking for, content !

A database is less handy because you have to control it. It is not expandable when you do software, but what I think we often forget is, a website is not a "software". Or not necessarily.

Forums, wikis, blogs, are software made to edit online content. They (as far as I know) have always a database, in order to control concurrency of editing, relationship between posts/comments, etc.

But when I am thinking of publishing online my stuff, why would it have to be interactive ? If it is really stuff for which I want to control the content directly, then flat file system is good. I mean, really good, simple, easy to visualize and update, etc. No need of database if the only content published is the one published by me ! No concurrency, no DBM, no need to even think about it ! As a beginner in programming, it is like "thinking outside of the box", after having been reading so much stuff about information technology !

So, to me, yes the "death" of database is really hypothetical, but its ominous presence for regulating any web content is quite exaggerated. I think that a database is a tool, not "the big solution to do the internets". But when we fray in "website programming", it is kind of hard to know why database exist, what are their specific functions, and for what job they are made.

Oops, way too long ! thanks for reading

File v D/B systems

Philip Daniels

What 'Da!

How come you knew that the data you wanted was in a file named "file.html", in a directory named "folder" which is on the root of a computer called "demo.openedit.org"?

I have about 1.1M files on my system, no way I can remember them all. So I use search engines that find stuff based on name (Locate32) and content (AimAtFile) both of which use databases - whether they are "relational", "plex" or "hierarchical" databases I've no idea, nor do I care.

A URL yields one and only one access path to the data. Under your premise if I wanted multiple access paths I would use file system links (hardlinks, symlinks, junctions etc), which many professionals don't understand and few if any end users understand.
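For readers who have not met file system links, here is a short Python sketch of what a second access path looks like in practice. It assumes a platform where symbolic links are permitted (on Windows this can require extra privileges), and add_access_path and the by-topic folder are invented names for illustration only.

```python
import tempfile
from pathlib import Path


def add_access_path(target: Path, alias: Path) -> Path:
    """Create a symbolic link at `alias` that points at the existing file `target`."""
    alias.parent.mkdir(parents=True, exist_ok=True)
    # Resolve first so the link stays valid wherever the alias lives.
    alias.symlink_to(target.resolve())
    return alias


if __name__ == "__main__":
    # A temporary directory stands in for the site's content folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        page = root / "folder" / "file.html"
        page.parent.mkdir()
        page.write_text("<h1>Hello</h1>", encoding="utf-8")
        alias = add_access_path(page, root / "by-topic" / "cms" / "file.html")
        # Both paths now read the same bytes; there is still only one real file.
        print(alias.read_text(encoding="utf-8"))
```

Every additional way of reaching the same content needs another link like this, created and maintained by hand, which is the burden the comment is pointing at.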

Some observations

Nir Gavish

First of all, @Lawrence Salberg: You gave away the age of your career :) some of the largest, most robust, longest-running applications in banking and publishing work using flat files, and they do a hell of a job.

With that said, as long as we are talking about web apps, i'd have to agree, most of the article was wishful thinking, rather than a solid argument.

The problems with indexing, sorting and searching weigh heavily on a file-based application. if you're trying to keep it simple - an external indexing process is just the opposite of simple, and it will have to be relational anyway (you could buy the yellow abomination from google, and run a kickass indexer locally, it's only 30k).

As far as scalability is concerned, flat-file has an upside and a downside, so i can't agree that flatfile systems do not scale: it absolutely depends on the needs of your app, an update-intensive app will scale horribly, while a read-intensive app will scale (and perform) like no database you've ever seen.

I have a lot of experience building flat file apps, and to be honest, the point of failure (every time) is sorting/ordering and searching. if your app doesn't need to maintain arbitrary order of entities, you can make it work with a simple indexing service. but in all other cases, a database is still the only way to go.

if, however, we just keep the *metaphor* of a filesystem, while doing away with its actual limitations - and develop our management systems to behave as closely as possible to a filesystem while using the strength of both filesystem and relational DBs, we will end up with the best apps in the world.

Note: if you're looking for good ways to store abstract or arbitrary data, look into native xml databases, they rock.

Mostly Wrong, Partly Right

Anonymous

The OP is definitely out of his depth when suggesting that dbs are passe. Just about any developer knows that a database will smoke a flatfile system in just about any data-intensive application.

Where Joel is right is in inferring that dbs are not usually essential for website CMSs. Despite the fantasies of their owners, most website content isn't changing hourly (some NEVER change...). Even for a dynamic news-type site, the most active content is only changing maybe every few minutes, and of course the site structure, templates, CSS, js, etc, change only infrequently. So, in this application, as long as there is sensible caching going on, a db won't necessarily produce a faster website, or speed up the task of content management.
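One way to picture "sensible caching" over plain files is the Python sketch below. It is only an assumed design: PageCache is a made-up class, the render function is whatever turns a source file into a finished page, and the cache decides a file has changed by comparing its modification time and size.

```python
import tempfile
from pathlib import Path
from typing import Callable


class PageCache:
    """Keep rendered pages in memory and re-render only when the file changes."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        # path -> ((mtime in ns, size in bytes), rendered page)
        self._cache: dict[Path, tuple[tuple[int, int], str]] = {}

    def get(self, path: Path) -> str:
        stat = path.stat()
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # file unchanged since last render
        page = self._render(path.read_text(encoding="utf-8"))
        self._cache[path] = (stamp, page)
        return page


if __name__ == "__main__":
    # A throwaway file and a trivial render step stand in for a real template engine.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "file.html"
        source.write_text("hello", encoding="utf-8")
        cache = PageCache(lambda text: f"<h1>{text}</h1>")
        print(cache.get(source))  # rendered from disk
        print(cache.get(source))  # served from memory
```

For content that changes every few minutes at most, nearly every request is answered from memory, whether the source behind it is a file or a database row.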

A proper CMS should hide all of the above, and just plain work, at any rate.

I don't see XML as a magic bullet for dbs. XML combines the speed of flat files with the ease of writing software ;-)