Since I haven't written a blog post for a while, thought I'd do one about the small project I just finished. This is something we're not particularly intending to release publicly because it's probably not useful to anybody else, but it might be slightly interesting to other large-scale Moodle users. Or not. Let's see.
Basically we had two problems with our Moodle system that this is trying to solve:
- People host large files (e.g. 1GB) on the system, but Moodle isn't optimised for serving large files; this isn't causing a problem now but we're concerned it might do in future.
- Our Moodle filesystem is quite large and rapidly increasing in size. Certain maintenance operations, like when we take a copy of the entire system onto our acceptance test servers each time we test a release, are getting slower.
The OU happens to have an EMC Atmos file store, a commercial product which is obviously designed specifically to store files, so the suggestion was that we should move large files out of Moodle and into Atmos. That way we don't have to include them in the copy for acct (we'll give our acct system read-only access to the filestore so they will 'still be there') and serving the files can also be passed off to Atmos.
Conveniently, there is a new feature in Moodle 2.3 which allows files to be stored in the Moodle filesystem by 'reference' - instead of storing the actual file, it stores a reference to an external repository system.
Using this feature, I built a repository that, during cron, finds all files larger than 10MB. It checks (based on contenthash) to see if they are already in the Atmos system and, if not, transfers them to Atmos, deletes them from Moodle and replaces with the reference. Then, when users click to download a file, it redirects to an Atmos shared link (complete with security key to prove they can access it; we don't lose security by this process, it only happens after the Moodle security checks).
This wasn't very difficult; one of the 'gotchas' is that you have to make sure you set your reference's content to nothing (empty string); if you let the reference keep its original contenthash then Moodle won't delete the file.
A few things in Moodle meant the implementation is not quite as nice as it should be. First there's no way to have a repository that only lets you create one instance, or only lets you create it at system level. I hacked it so that in the creation process it throws an error if you try to create two. Second I couldn't find a way to stop this repository showing up on the file picker for admin users. For everyone else, it doesn't show because I made a capability and didn't give it to anyone.
Overall, though, neither of these are serious problems and the process appears to work well. Although I should admit - I finished the code but it hasn't been through testing yet.
Just by the way - this is only part of our 'filesystem too large' solution. I think this will take out about a third of the filesystem size; another third will come by removing course backup files. Those can be huge and they appear to persist forever; we'll probably do a local plugin that deletes them after a month or something. (We don't use those files for backup purposes.) So anyone concerned about filesystem size should also check how many backup files there are in course/user areas - for many of our courses, one backup can be well over a gigabyte, so these add up quickly.