Since I haven't written a blog post for a while, I thought I'd do one about the small project I just finished. We're not particularly intending to release it publicly because it's probably not useful to anybody else, but it might be slightly interesting to other large-scale Moodle users. Or not. Let's see.
Basically we had two problems with our Moodle system that this is trying to solve:
- People host large files (e.g. 1GB) on the system, but Moodle isn't optimised for serving large files; this isn't causing a problem now but we're concerned it might do in future.
- Our Moodle filesystem is quite large and rapidly increasing in size. Certain maintenance operations, like when we take a copy of the entire system onto our acceptance test servers each time we test a release, are getting slower.
The OU happens to have an EMC Atmos file store, a commercial product designed specifically to store files, so the suggestion was that we should move large files out of Moodle and into Atmos. That way we don't have to include them in the copy for our acceptance test ('acct') servers (we'll give the acct system read-only access to the file store, so the files will 'still be there'), and serving the files can also be passed off to Atmos.
Conveniently, there is a new feature in Moodle 2.3 which allows files to be stored in the Moodle filesystem by 'reference' - instead of storing the actual file, it stores a reference to an external repository system.
Using this feature, I built a repository that, during cron, finds all files larger than 10MB. It checks (based on contenthash) whether they are already in the Atmos system and, if not, transfers them to Atmos; it then deletes them from Moodle and replaces them with a reference. Then, when users click to download a file, Moodle redirects them to an Atmos shared link (complete with a security key to prove they can access it; we don't lose security by this process, because the redirect only happens after the Moodle security checks).
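To give a rough idea of what a shared link with a security key involves, here is a minimal sketch in Python. It is not the Atmos URL format and not the plugin code (which is PHP inside Moodle); the function names, the query parameter names and the example host are all made up for illustration. The idea is simply that the link carries an expiry time plus an HMAC signature over the object id and expiry, so it can only be produced by something that knows the shared secret.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def make_shared_link(base_url, object_id, secret, ttl_seconds=300, now=None):
    """Build an expiring, signed download link (hypothetical format)."""
    issued = int(now if now is not None else time.time())
    expires = issued + ttl_seconds
    message = f"{object_id}\n{expires}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"{base_url}/{object_id}?{query}"


def verify_shared_link(url, secret, now=None):
    """Check what the file store would check: not expired, signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires:
        return False
    object_id = parsed.path.rsplit("/", 1)[-1]
    message = f"{object_id}\n{expires}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

In this sketch the link would be generated only after the normal access checks have passed, and a short lifetime limits the damage if somebody passes the link on.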
This wasn't very difficult; one of the 'gotchas' is that you have to make sure you set your reference's content to nothing (empty string); if you let the reference keep its original contenthash then Moodle won't delete the file.
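The cron job and the gotcha can be sketched together in a few lines of Python. This is a toy model, not the plugin itself: the record fields, the two dictionaries standing in for the Moodle file pool and the Atmos store, and the function names are all assumptions for illustration, and I'm assuming SHA-1 content hashes. The point it shows is that the local copy only becomes deletable once no file record still carries its contenthash.

```python
import hashlib

SIZE_THRESHOLD = 10 * 1024 * 1024  # the 10MB cutoff mentioned above
EMPTY_HASH = hashlib.sha1(b"").hexdigest()  # content hash of an empty string


def offload_large_files(records, local_pool, external_store, threshold=SIZE_THRESHOLD):
    """Turn large stored files into references to the external store.

    records: list of dicts with 'filesize', 'contenthash' and 'reference' keys
    local_pool / external_store: dicts mapping contenthash -> bytes
    Returns the number of records converted.
    """
    converted = 0
    for record in records:
        if record["reference"] is not None or record["filesize"] <= threshold:
            continue  # already a reference, or too small to bother with
        contenthash = record["contenthash"]
        if contenthash not in external_store:
            # Only transfer content the external store doesn't already hold.
            external_store[contenthash] = local_pool[contenthash]
        record["reference"] = contenthash
        # The gotcha: point the record at empty content, otherwise the
        # original content still looks 'in use' and is never cleaned up.
        record["contenthash"] = EMPTY_HASH
        converted += 1
    return converted


def collect_garbage(records, local_pool):
    """Delete local content that no record's contenthash points to any more."""
    in_use = {record["contenthash"] for record in records}
    orphaned = [h for h in local_pool if h not in in_use]
    for h in orphaned:
        del local_pool[h]
    return orphaned
```

If the line that sets `contenthash` to `EMPTY_HASH` is left out, `collect_garbage` finds nothing to delete, which is the toy version of the behaviour described above.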
A few things in Moodle meant the implementation is not quite as nice as it should be. First, there's no way to have a repository that only lets you create one instance, or only lets you create it at system level. I hacked it so that the creation process throws an error if you try to create a second instance. Second, I couldn't find a way to stop this repository showing up on the file picker for admin users. For everyone else, it doesn't show because I made a capability and didn't give it to anyone.
Overall, though, neither of these is a serious problem and the process appears to work well. Although I should admit - I finished the code but it hasn't been through testing yet.
Just by the way - this is only part of our 'filesystem too large' solution. I think this will take out about a third of the filesystem size; another third will come from removing course backup files. Those can be huge and they appear to persist forever; we'll probably do a local plugin that deletes them after a month or something. (We don't use those files for backup purposes.) So anyone concerned about filesystem size should also check how many backup files there are in course/user areas - for many of our courses, one backup can be well over a gigabyte, so these add up quickly.
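The backup clean-up hasn't been written yet, but the selection logic would be something like this Python sketch. The field names (`filearea`, `timecreated`, `filesize`) and the 30-day cutoff are assumptions for illustration, standing in for whatever the real plugin ends up querying.

```python
from datetime import datetime, timedelta


def find_expired_backups(files, now, max_age=timedelta(days=30)):
    """Return backup files created more than max_age before 'now'.

    files: list of dicts with 'filearea', 'timecreated' (datetime)
    and 'filesize' (bytes) keys; these names are hypothetical.
    """
    cutoff = now - max_age
    return [
        f for f in files
        if f["filearea"] == "backup" and f["timecreated"] < cutoff
    ]


def total_size(files):
    """Total bytes that deleting the given files would free."""
    return sum(f["filesize"] for f in files)
```

Running the selection and printing `total_size` before deleting anything is also an easy way to check how much space backups are really taking up.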
Comments
Hi Sam, I am facing some problems of the same nature; maybe you can give me a clue. I am creating a new Moodle site for a non-profit organization that promises to be quite big: it will be accessed from up to 270 different cities in Africa. The first course will have a lot of huge media files, high-resolution audio and video, added to the lessons (they need to be big; it's a course in critical listening, among other audio/video stuff).
The problem is that most of the places where people will access the site have terribly poor internet connections, so I am thinking of a way to overcome this. I thought about creating a file structure on the clients so that, when a student clicks to play the media, our Moodle site first checks whether the file is stored locally. If it is, it plays the local file. If not, it downloads the file from the server and stores it permanently on the client machine, so that it is available next time.
Questions:
1) Do you think this solution is suitable? Could you give me some tips about how to implement it? I was a programmer a long time ago, but never did anything on Moodle. I am studying the HTML5 File API, but am still not sure how to implement it.
2) If it's not, do you have an alternative to solve our problem?
I am very thankful for any help that you can give me. Thanks!
Patricia Moura (pkmoura@gmail.com)
Patricia: Sorry, don't think I can help!
Moodle repositories are server-side, so I don't see how they would solve this problem. You would need to write special client-side code to transfer the data to the student, while checking whether they already have it. I'm not sure if it is possible to write this kind of application in HTML5. If it isn't, you'd need to write a client-side program in some other suitable programming language, and that will be harder to integrate with a website.
We have something like this at the OU where there is a new (custom) app for tablets that will download media files if the students don't already have them.
Without writing any code, one option would be to provide the files for download rather than direct play. Obviously the interface is less good, but then the students can 'manually cache' the files by leaving them in their download folder, in a 'Now play Video 4. If you don't have video 4 yet, download it with this link' kind of way.
--sam