
Watch folders, not a substitute for good integration

It seems that every integration project I see at the moment calls for some kind of ‘watch folder’ as the preferred mechanism for moving content from one system to another. However, watch folders may not be the best solution. To find out why, keep reading.

But I’ve seen watch folders, so they must be good, right?

You have probably seen watch folders before. Maybe it was in iTunes, maybe it was Adobe Media Encoder, or one of countless other applications. These watch folders even work. Life is better with watch folders.


You can start up your favourite desktop application, confident in the knowledge that it will show you a simplified view of all the content in your “My Music”, or “My Video”, or “My Photos” folders. It’s great – no more importing, just drop it in the folder and everything else is taken care of.

You can even see the application “indexing”, or “adding” in the user interface. In the world of desktop applications the watch folder is king.

So, why can’t {insert any platform or service you can think of} support watch folders too?

Well, it’s because most platforms and services are no longer standalone desktop applications.

Are watch folders so bad?

You have some software that is monitoring a folder. You drop files into that folder, said software notices and does something with those new files. What could be simpler? What could possibly go wrong? Well, there are a few things to consider.

It’s a slow way to move data around

Let’s start at the beginning. You drop a file in a watch folder and, most likely, the file will now be copied to the watch folder. Copying files takes time, and for large video files it often takes a long time.

The software or application watching the folder has to wait for the file to finish being written before processing it. Usually the file is then copied again to wherever it will actually be processed, so by using a watch folder we have just doubled the time it takes to bring in an incoming file. If you are processing big video files, that extra copy can add a long delay to every ingest.

When is the file ready?

When you drop a file into a folder, the new copy appears immediately. You can see the file in the watch folder even though it isn’t completely copied yet. On the desktop, you can watch the copy progress bar to see when the file is ready, but the watching software, probably across the network somewhere, doesn’t have the benefit of a progress bar. We try to get around this by, for example, periodically checking whether new files are still being written to. But this is not an exact science and cannot be 100% reliable.
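To make the “is it still being written?” check concrete, here is a minimal sketch in Python. The function name looks_stable and its checks and interval parameters are my own inventions for illustration, not part of any product. It simply polls the file's size and modification time and treats the file as ready if nothing changes for a while.

```python
import os
import time


def looks_stable(path, checks=3, interval=1.0):
    """Guess whether a file has finished copying.

    Returns True if the size and modification time stay unchanged across
    `checks` consecutive polls spaced `interval` seconds apart. This is a
    heuristic only: a stalled or very slow copy looks exactly like a
    finished one.
    """
    try:
        last = os.stat(path)
    except FileNotFoundError:
        return False
    for _ in range(checks):
        time.sleep(interval)
        try:
            current = os.stat(path)
        except FileNotFoundError:
            # The file vanished while we were watching it.
            return False
        if (current.st_size, current.st_mtime_ns) != (last.st_size, last.st_mtime_ns):
            return False
        last = current
    return True
```

Note what the sketch cannot do: if the network stalls for longer than checks × interval seconds, a half-copied file will be reported as ready. Lengthening the wait reduces that risk but slows every ingest, which is exactly the trade-off that makes this approach unreliable.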

Sidecar data files

Most media that we deal with comes with an external set of data that describes the file. This metadata is almost always in another, much smaller file. Usually, your integrated system needs both the metadata and the media file so you put both files in the watch folder. Right?

This is where watch folders might not be sufficient for any serious system. We are now a long way from the simple experience you might be familiar with from the likes of iTunes.

Let’s say a pair of files are dropped into the watch folder. One is a big video file and the other is a small XML file with some descriptive information about the video.

The first problem the watcher has is working out which files belong together. After all, someone might drop 10 XML files and 10 video files at the same time. You could give both files the same name but different extensions, or maybe the XML file will contain the name of the video file. Whatever rule you use, the watcher must know it in advance and must be customizable enough to apply it.
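As an illustration of the “same name, different extension” rule, here is a rough Python sketch. The function pair_by_stem and the list of video extensions are assumptions made up for this example. It matches each video file to the XML file with the same base name and reports anything left over.

```python
from pathlib import Path

# Assumed set of media extensions, purely for illustration.
VIDEO_EXTENSIONS = {".mov", ".mp4", ".mxf"}


def pair_by_stem(filenames):
    """Pair each video with the XML sidecar that shares its base name.

    Returns (pairs, unmatched): a sorted list of (video, xml) tuples and a
    sorted list of files that have no partner yet.
    """
    media = {}
    sidecars = {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix == ".xml":
            sidecars[path.stem] = name
        elif suffix in VIDEO_EXTENSIONS:
            media[path.stem] = name
    pairs = sorted((media[s], sidecars[s]) for s in media.keys() & sidecars.keys())
    unmatched = sorted(
        [media[s] for s in media.keys() - sidecars.keys()]
        + [sidecars[s] for s in sidecars.keys() - media.keys()]
    )
    return pairs, unmatched
```

Even this tiny version raises awkward questions. How long should an unmatched file wait for its partner? What if two videos share a base name but have different extensions? Every answer is another rule the watcher has to be taught in advance.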

The next step is to try and figure out when the two files are ready – this means putting in a lot of checks to deal with possible error combinations. This is becoming much more difficult to implement than we first thought…

What happens when something goes wrong?

If you pull a memory stick out of your computer in the middle of copying a file from it, you’ll see an error message telling you that the memory stick was removed.

What if someone does the equivalent with a watch folder? They copy a file, then moments later decide it was the wrong file and delete it from the watch folder. One minute the watching application is happily processing the new file, the next it’s gone! The application has no idea what happened or why. From its point of view this looks bad, maybe even “hard-disk failure” bad.

A desktop application can notify the user of a fault but an enterprise application on a remote server has no way to contact the actual user who may understand and can explain what happened. All the application can do is log the unexpected error and perhaps notify an administrator. Quite often, these “unexplained errors” end up consuming lots of support time and can even undermine confidence in a perfectly good system.

What if there are no folders?

The lack of feedback should be the nail in the coffin for watch folders, for anything but desktop applications with a single user. But if that is not enough to convince you that watch folders are not the integration solution you are looking for, there is one more thing.

Where we are going, there are no folders. Cloud-scale infrastructure is built on blob or object-based storage, a very different world to the desktop we grew up with, where folders and directories exist in a tree. In the cloud there are no folders to watch.

What is the better way?

I hope that watch folders are starting to sound more like a message in a bottle than a reliable way of notifying one software system about new content. To do better, what we need is the equivalent of a signed-for delivery by a reliable courier service.

Luckily there is such a thing. In fact, the document you are reading was likely delivered using it! For over 24 years the Hypertext Transfer Protocol (HTTP) has been used to connect software together. To start with, HTTP was used exclusively for serving up web pages, but the protocol has proven to be extraordinarily adaptable and resilient and is now used as the plumbing for much of the Internet infrastructure that we take for granted every day.

Using HTTP, we can quite easily design a reliable process where application A posts some XML to application B. The XML sent by application A could contain the location of a source media file, and application B could read the file directly from its present location. With no intermediate copy needed, we have roughly doubled the ingest speed. HTTP provides lots of useful result codes that can be used to report problems, and it is easy to extend the process so that progress and success information is also returned.
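Here is a minimal sketch of that exchange using only the Python standard library. Everything specific in it is an assumption for illustration: the /ingest path, the <ingest> and <location> element names, the media path, and the choice of 202 and 400 as result codes. Application B is a tiny HTTP server; application A is a function that posts the XML and gets back a status code it can act on.

```python
import threading
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer


class IngestHandler(BaseHTTPRequestHandler):
    """Application B: accepts an ingest request and answers with a result code."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            location = ET.fromstring(body).findtext("location")
        except ET.ParseError:
            location = None
        if not location:
            self._reply(400, "missing <location>")
        else:
            # A real service would start reading the media from `location` here.
            self._reply(202, "accepted " + location)

    def _reply(self, status, text):
        payload = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Talk to the local demo server directly, ignoring any configured proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_ingest(url, xml_text):
    """Application A: POST the XML and return (status code, response text)."""
    request = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    try:
        with _opener.open(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Error statuses arrive as exceptions; the sender still gets an answer.
        return error.code, error.read().decode("utf-8")


def demo():
    """Run B on a free local port, send one good and one bad request from A."""
    server = HTTPServer(("127.0.0.1", 0), IngestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/ingest" % server.server_port
    try:
        ok = post_ingest(url, "<ingest><location>/mnt/media/clip.mov</location></ingest>")
        bad = post_ingest(url, "<ingest/>")
    finally:
        server.shutdown()
        server.server_close()
    return ok, bad
```

The point to notice is that the sender always gets an answer. A malformed request is rejected immediately with a 400, rather than sitting unexplained in a folder, and the sender can retry, alert someone, or log the exact reason.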

What I’m describing here is known as REST. Integrating systems using REST does require some level of development or at least scripting, but the benefits you get compared to the watch folder model are huge.

If you are planning the architecture of a system that involves several different software components, set your standards high. There are faster, more secure processes than using watch folders and, to put it frankly, if you are a professional, you need something more reliable.
