
Why filesystems need optimizing for the data they’re storing

This is an attempt to explain why filesystems need to know what kind of files they're storing, and why storing items of very different sizes results in inefficient storage. I'll be using a library as an analogy for what happens.

Let’s take a big disk drive.

It might be an Autodesk Stone array, an Apple Xserve RAID on an Xsan, or just a huge FireWire drive.


Now the filesystem is like an old library. All the book details (metadata) are stored on filing cards kept in a filing card box. All the cards do is tell the librarian which shelves the books are on, who wrote them, when they were last taken out and so on. This card index is the filesystem.

The books (data) are stored on separate shelves, sometimes in the same room as the cards (on the same disk) and sometimes in a separate one (when a dedicated metadata drive is used). When the filesystem is new, the index cards are empty. There are only a finite number of them, and they have to reference the entire library. So we start with a clean index and no books on the shelves.

Let's put some data in. In our library we can fill the shelves with big volumes or with lots of little ones. If we have big volumes, the shelves will fill up quickly, but because each volume is physically large, it only takes a few index cards to reference the entire contents of the library.

 


Now the library is full and all the volumes have an entry on the index cards.

Now consider what happens if we delete some of the files, i.e. remove some of the books.

We get a library shelf with gaps scattered along it where the removed books used to be.

 

 

Any new books we put in have to go in the spaces left by the ones we removed. A book that doesn't fit in a single gap has to be split across several. If you're the librarian putting these on and off the shelves, you're going to spend more time walking between shelves than actually moving books. The fuller the library gets, the more difficult and time-consuming it becomes to find room for the books and index them. This is why nearly full disks tend to have much slower read and write speeds.
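To make the shelf picture concrete, here is a minimal sketch in Python. It is only an illustration of the analogy, not how any real filesystem allocates space: the shelf is a list of slots, and the helper names `free_gaps` and `store` are made up for this example. A new five-slot book ends up in three separate pieces because no single gap is big enough.

```python
def free_gaps(shelf):
    """Return (start, length) for each run of empty slots on the shelf."""
    gaps, start = [], None
    for i, slot in enumerate(shelf):
        if slot is None and start is None:
            start = i
        elif slot is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(shelf) - start))
    return gaps


def store(shelf, name, size):
    """Fill gaps in shelf order with `size` slots of `name`; return the pieces used."""
    gaps = free_gaps(shelf)
    if size > sum(length for _, length in gaps):
        raise ValueError("not enough free space")
    pieces, remaining = [], size
    for start, length in gaps:
        if remaining == 0:
            break
        used = min(length, remaining)
        for i in range(start, start + used):
            shelf[i] = name
        pieces.append((start, used))
        remaining -= used
    return pieces


# A full 12-slot shelf holding six two-slot books, A to F.
shelf = [book for book in "ABCDEF" for _ in range(2)]
# Remove books B, D and F, leaving three two-slot gaps.
shelf = [None if book in "BDF" else book for book in shelf]

pieces = store(shelf, "G", 5)
print(pieces)  # [(2, 2), (6, 2), (10, 1)]
```

Each entry in `pieces` is a separate trip for the librarian (a separate seek for the disk), which is where the time goes.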

A sensible librarian might re-arrange books in their spare time so that gaps are kept to a minimum (de-fragmenting). Alternatively, they can only put books where there is enough contiguous space for them in the first place (optimizing). The librarian may also keep a trolley next to their desk for the most commonly used books, because they come and go more often (caching). If the power fails and the lights go out, they might have a torch handy so they can still see the cards and amend them (battery backup).
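The spare-time tidying can be sketched in a few lines of Python. This is a toy model under the same shelf-of-slots assumption, with hypothetical helper names `defragment` and `largest_gap`; real defragmenters work in place and far more carefully. It regroups each book's pieces together and pushes all the empty slots to the end of the shelf.

```python
def defragment(shelf):
    """Return a tidy shelf: each book contiguous, all empty slots at the end."""
    order, counts = [], {}
    for slot in shelf:
        if slot is None:
            continue
        if slot not in counts:
            order.append(slot)  # remember books in order of first appearance
            counts[slot] = 0
        counts[slot] += 1
    tidy = []
    for name in order:
        tidy.extend([name] * counts[name])
    tidy.extend([None] * (len(shelf) - len(tidy)))
    return tidy


def largest_gap(shelf):
    """Length of the longest run of empty slots."""
    best = run = 0
    for slot in shelf:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


# Book G is split into two pieces and the free space is scattered.
shelf = ["A", "A", "G", "G", "C", "C", None, None, "E", "E", "G", None]
tidy = defragment(shelf)

print(largest_gap(shelf))  # 2
print(tidy)
print(largest_gap(tidy))   # 3
```

After tidying, the same amount of free space exists, but it sits in one run, so the next book can go in as a single piece.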

Now if our library is optimized for lots of big volumes and we start storing lots of smaller books, we're going to run out of index cards before we run out of shelf space. Our library isn't very good at storing lots of small books. We'd need a lot more index cards, but that wasn't how the library index was set up. In filesystem terms this means that the filesystem index is full, but the storage isn't. This is why video volumes are optimized for HD or SD, and why mixing resolutions can leave you seeing a lot of free space that can't be used.
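The "index full, storage not full" situation is easy to reproduce in a toy model. The sketch below is an illustration only; the `Library` class, the `fill` helper and the numbers (1000 shelf slots, 100 index cards) are all invented for this example. The same library is filled once with ten-slot volumes and once with one-slot books.

```python
class Library:
    """A toy filesystem: a fixed number of shelf slots and index cards."""

    def __init__(self, shelf_slots, index_cards):
        self.free_slots = shelf_slots
        self.free_cards = index_cards

    def add_book(self, size):
        """Store one book of `size` slots; return False if it cannot be stored."""
        if self.free_cards == 0 or size > self.free_slots:
            return False
        self.free_cards -= 1  # every book needs exactly one index card
        self.free_slots -= size
        return True


def fill(library, book_size):
    """Keep adding books of one size until the library refuses; return the count."""
    count = 0
    while library.add_book(book_size):
        count += 1
    return count


# Hypothetical library set up for big volumes: 1000 slots, 100 cards.
big = Library(shelf_slots=1000, index_cards=100)
big_count = fill(big, book_size=10)

small = Library(shelf_slots=1000, index_cards=100)
small_count = fill(small, book_size=1)

print(big_count, big.free_slots, big.free_cards)        # 100 0 0
print(small_count, small.free_slots, small.free_cards)  # 100 900 0
```

With big volumes the shelves and the index fill up together. With small books the cards run out while 900 of the 1000 slots are still empty, and that free space cannot be used.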

If we set up our library for lots of small books, we need a huge number of index cards. As the cards themselves need room (memory and disk space), they add a lot of overhead. So a library set up for small books suits small volumes, and one set up for big books suits big ones.
Just bear this in mind when mixing file sizes on large disk drives.
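A rough back-of-envelope calculation shows how the index overhead grows as files get smaller. The figures here are assumptions chosen for illustration (a 1 TB drive, 256 bytes per index entry, one entry per file), and the function name `index_overhead` is made up; real filesystems differ in the details.

```python
def index_overhead(capacity_bytes, avg_file_bytes, card_bytes):
    """Return (index entries needed, fraction of capacity the index would take)."""
    cards = capacity_bytes // avg_file_bytes
    return cards, cards * card_bytes / capacity_bytes


TB = 10**12  # assumed drive size

# Large video clips of about 1 GB each.
clip_cards, clip_fraction = index_overhead(TB, 10**9, 256)
# Small files of about 4 KB each.
small_cards, small_fraction = index_overhead(TB, 4096, 256)

print(clip_cards, clip_fraction)    # 1000 2.56e-07
print(small_cards, small_fraction)  # 244140625 0.0625
```

Under these assumptions the index for large clips is negligible, while the index for small files would need roughly 6% of the drive before a single byte of data is stored.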

This applies to Xsan, Bright drives, Stone and Wire, Windows, Mac, Linux and so on.

Stone and Wire stores clip information (the index) in /usr/discreet/clip on the local disk and audio/video on the Stone array.
StorNext (Xsan, Bright, Quantum SNFS) stores user data and metadata on separate drives.
OS disk filesystems (NTFS, HFS+, FAT32) store the index and the data on the same drive.

 

©2007 XTFX Ltd