Best Practices for Data Management

A while back, a production company I work with asked me to help them improve the way they backed up and managed the many terabytes of data their productions created.

The following is the procedure I suggested they implement.  Note that while this is written for smaller productions that will end up with only a single pair of cloned hard drives, it can be scaled for larger productions by simply applying the same principles to each pair of cloned hard drives.

Data Management Basics

The above graphic shows the basic process for handling production data. It is based on the 3-2-1 backup rule, a guideline used across many industries. The rule calls for three copies of the data on two different storage media, with one copy stored off-site. This provides multiple levels of redundancy and protects the data in almost any conceivable situation.

The basic process is as follows: on set, cards are copied to two production drives.  After wrap, one drive is sent to the editor, and the other is copied to a RAID 5 array.  Once post is completed, all post assets are copied to the RAID 5 array and the other production drive.  One of the drives is then stored at a separate location from the other drive and RAID array.
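The 3-2-1 rule is simple enough to express as a check over an inventory of copies. The sketch below is a hypothetical illustration and is not part of any real tool. The `Copy` class and the `satisfies_3_2_1` function are names invented for this example. Following the reasoning in this post, it treats "hard drive" and "RAID 5" as distinct media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the production data (illustrative model only)."""
    name: str
    medium: str     # e.g. "hard drive" or "RAID 5", following this post's categories
    off_site: bool  # True if stored away from the other copies


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 copies, on at least 2 media,
    with at least 1 copy stored off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site


# The archived end state described above: two production drives plus the
# RAID array, with one drive stored at a separate location.
archive = [
    Copy("production drive A", "hard drive", off_site=True),
    Copy("production drive B", "hard drive", off_site=False),
    Copy("RAID 5 array", "RAID 5", off_site=False),
]
print(satisfies_3_2_1(archive))      # True
print(satisfies_3_2_1(archive[1:]))  # False: only two copies, none off-site
```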

While this might seem like a lot of cost and effort, remember that this data is the sole product of the production and the production company has spent many thousands or even millions of dollars to produce it.  The cost of a few hard drives and a few work-hours is insignificant compared to the cost of having to reshoot a project because the footage was lost.

On-set Data Management

The first step in proper data management begins before a single byte of data is recorded: buying a pair of hard drives. The drives should be bought new at the beginning of every production, and both should be the exact same type of drive. There are two reasons to use new drives.

The first reason is that, once the project is archived, going back and finding its data is extremely easy. There is no searching through multiple hard drives trying to work out which one holds the footage from which production.

The second reason is that, while most hard drives will last over 4 years, every use of a drive is another opportunity for it to fail, be physically damaged, or have data unintentionally deleted. Reusing hard drives only creates more opportunities for something to go wrong, and if a reused drive fails, you’ve now lost data from two productions. A new hard drive costs far less than a reshoot.

If a production does need to save money on drives, the best way to do that is by choosing drives that suit the production’s needs. A web series being shot on DSLRs does not need a 4TB RAID 0 drive and could easily get by on a 1TB mobile drive, even if shooting on multiple cameras for several days. Larger productions, however, shoot on cameras that produce far more data, and they will benefit from large RAID 0 drives with high-speed I/O ports. For them, the savings come from not having to pay the DIT overtime to babysit cards as they offload at the end of the day.
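Choosing an appropriately sized drive comes down to simple arithmetic: bitrate × recorded time × number of cameras. The function below is a minimal sketch of that estimate. The 50 Mb/s bitrate and the shoot schedule in the usage example are placeholder numbers, not figures for any particular camera. Substitute the recording bitrate from your own camera's specifications.

```python
def storage_needed_tb(bitrate_mbps: float, hours_per_day: float,
                      cameras: int, days: int) -> float:
    """Estimate total footage size in decimal terabytes (10**12 bytes).

    bitrate_mbps: recording bitrate in megabits per second.
    hours_per_day: hours of footage each camera actually records per shoot day.
    """
    seconds = hours_per_day * 3600 * days
    total_bits = bitrate_mbps * 1_000_000 * seconds * cameras
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes


# Hypothetical shoot: 50 Mb/s codec, 2 recorded hours per camera per day,
# 2 cameras, 3 days.
estimate = storage_needed_tb(50, 2, 2, 3)
print(f"{estimate:.2f} TB")  # 0.27 TB
```

Watch the units. Camera bitrates are usually quoted in megabits per second, while drive capacities are sold in decimal terabytes. That is why the function divides by 8 and by 10**12.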

Once production begins, the DIT (digital imaging technician) or DMT (data management technician) should copy all footage to each of the two drives. Corrupted data won’t necessarily show up as a change in the number or size of the copied files, and it’s unrealistic for the DIT to watch all the footage looking for errors. For that reason, she or he should use offload software that calculates a checksum for each file and verifies every copy against the original.
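To make the checksum idea concrete, here is a minimal sketch of a verified offload written with Python's standard library. It hashes each source file, copies it to every destination drive, re-hashes each copy, and stops on any mismatch. The function names and the folder layout (`drive/card name/...`) are assumptions made for this example. MD5 is used only because it ships with Python; dedicated offload tools may offer other algorithms, such as xxHash.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 hex digest of a file, read in chunks so large clips
    never have to fit in memory at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def offload_card(card: Path, drives: list[Path]) -> dict[str, str]:
    """Copy every file on a card to each drive, then re-hash each copy and
    compare it with the source checksum. Returns {relative path: checksum}
    and raises OSError on the first mismatch."""
    manifest = {}
    for src in sorted(p for p in card.rglob("*") if p.is_file()):
        rel = src.relative_to(card)
        source_sum = file_md5(src)
        for drive in drives:
            dst = drive / card.name / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            if file_md5(dst) != source_sum:
                raise OSError(f"checksum mismatch after copying to {dst}")
        manifest[rel.as_posix()] = source_sum
    return manifest


# Usage with throwaway folders standing in for a card and two drives.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    card = root / "A001"  # hypothetical card name
    (card / "CLIPS").mkdir(parents=True)
    (card / "CLIPS" / "clip1.mov").write_bytes(b"fake footage")
    manifest = offload_card(card, [root / "drive_a", root / "drive_b"])
    both_copied = all(
        (root / d / "A001" / "CLIPS" / "clip1.mov").read_bytes() == b"fake footage"
        for d in ("drive_a", "drive_b")
    )
print(manifest)
```

One caveat applies to this sketch. Reading a file back immediately after writing it may be served from the operating system's cache rather than from the drive itself. A production tool would need to make sure the read-back actually comes from the destination drive, so treat this example as an illustration of the principle only.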

Post Production Data Management

After wrap, one of the drives is sent to editorial. There, the editor and any other post personnel should have their own backup procedure to protect any data they create. On simple projects, this may just mean saving copies of the project file to the production drive, a local hard drive, a USB flash drive and/or a cloud drive.

The other production drive is then copied to a RAID 5 array. This fulfills the requirements of the 3-2-1 rule. There are now three copies of the footage (the two production drives and the RAID array), on two media (stand-alone hard drives and RAID 5), with one stored off-site (the drive with the editor). A RAID 5 array is built from hard drives, but it still counts as a second storage medium. The reason is that it stores data very differently from a stand-alone hard drive.
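One way to see how differently RAID 5 stores data is to look at its parity. RAID 5 stripes data across several disks and adds a parity block to each stripe, computed with XOR. This lets the array rebuild the contents of any single failed disk from the surviving ones. The sketch below shows only that XOR arithmetic on one stripe. Real arrays rotate the parity block across disks and work on much larger blocks, and the four-drive layout and block contents here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"CLIP", b"0001", b".MOV"]
parity = xor_blocks(data)

# Simulate losing the drive that held the second block, then rebuild it
# by XOR-ing the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'0001'
```

The same arithmetic explains RAID 5's limit. If two disks in a stripe are lost, there is no longer enough information to rebuild either one.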

Archiving Completed Projects

Once the editor returns his or her drive to the production company, all the post files are copied to the RAID array and the other hard drive.  All three storage devices now each have all the data created by the production.  The last step is to take one of the hard drives and store it off-site.
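Before a drive goes off-site, it is worth confirming that all three storage devices really do hold the same files. One way to sketch this is to build a checksum manifest for each copy and compare it against the RAID array's manifest. The sketch assumes each device's copy lives in its own folder. `build_manifest` and `compare_manifests` are hypothetical helper names, and MD5 is used only because it ships with Python.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 checksum."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        manifest[path.relative_to(root).as_posix()] = h.hexdigest()
    return manifest


def compare_manifests(reference: dict[str, str],
                      other: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different on another copy."""
    return {
        "missing": sorted(reference.keys() - other.keys()),
        "extra": sorted(other.keys() - reference.keys()),
        "different": sorted(k for k in reference.keys() & other.keys()
                            if reference[k] != other[k]),
    }


# Usage: the RAID holds footage plus post files; the drive has only footage.
with tempfile.TemporaryDirectory() as tmp:
    raid, drive = Path(tmp, "raid"), Path(tmp, "drive_b")
    for root in (raid, drive):
        (root / "footage").mkdir(parents=True)
        (root / "footage" / "clip1.mov").write_bytes(b"fake footage")
    (raid / "post").mkdir()
    (raid / "post" / "master.mov").write_bytes(b"final master")  # not yet on drive_b
    report = compare_manifests(build_manifest(raid), build_manifest(drive))
print(report)  # {'missing': ['post/master.mov'], 'extra': [], 'different': []}
```

An empty report for every device is a reasonable signal that the archive is complete and one drive can be moved off-site.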

At this point, the production company could discover that there was a disgruntled employee at the hard drive manufacturer when both the hard drives simultaneously fall apart, or that new intern could “borrow” a hard drive so he can download every episode of Game of Thrones in ProRes 4444, or a meteorite could hit just the storage room where the archived hard drives and RAID array are kept, or any number of other likely or unlikely disasters could take place and there would still be at least one copy of the data from the production.
