Backup and restore shenanigans

Edited by Sam Marshall, Wednesday, 23 Oct 2013, 11:21

This blog post is about the work that the Open University has funded (and I've coded) to improve backup and restore in Moodle 2.6.

I was going to talk about this in the Moodle developer meeting, but decided to skip my item because it isn't that critical for developers to know about, I had problems connecting to the technology, and the meeting would have run late. So I'm writing a blog post instead!

There were a number of problems that meant backup and restore was not reliable for large courses. These included:

  • Backup and restore didn't display any progress during the operation - you'd just see the browser waiting for a page to load. This wasn't a very reassuring user interface, and it also came with other problems:
    • If the backup or restore took longer than an hour it would fail due to PHP timeout.
    • On load-balanced/clustered systems with multiple webservers and a front-end server that shares load between them, the front-end server typically has a timeout of a few minutes if connections do not send any output; on these systems, any backup that ran longer than that timeout would fail, even for a fairly small course.
  • Before you get to the actual backup and restore, some of the interface pages can also take a long time to load when you have a very big course. Again this is a bad user interface and causes problems for systems with front-end servers. In addition, because the default PHP time and memory limits for these pages were low, they would fail for courses with a large number of activities, even on a 'normal' Moodle installation with one server.
  • Backup files were limited to 4GB in size due to restrictions in PHP's support for the zip file format. (Some of our courses contain more than 4GB of files.) In addition, even when a backup file stayed under that limit, it wasn't possible to actually download it if it was larger than 2GB, due to limitations of a function used in Moodle file download code.
  • If there are minor problems in backup or restore, the code may add a warning message in the backup/restore log. In Moodle 2.5 these logs are never shown to users, so this is not very helpful.
  • There wasn't a simple mechanism for developers to create a standardised large course for testing backup, so while we were seeing major problems in our real usage, it was difficult for developers to deal with the equivalent situation.
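The 4GB and 2GB figures above happen to sit right at 32-bit boundaries. As a rough illustration only: the classic zip format stores sizes in 32-bit fields, but linking the 2GB download limit to a signed 32-bit integer is an assumption, not something confirmed here. The course size in the example is made up.

```python
# Illustration only: the sizes a 32-bit field can represent.
# Tying these numbers to the 4GB and 2GB limits is an assumption.
GIB = 1024 ** 3

max_unsigned_32 = 2 ** 32 - 1  # largest value in an unsigned 32-bit field
max_signed_32 = 2 ** 31 - 1    # largest value in a signed 32-bit integer


def fits(size_bytes: int, limit: int) -> bool:
    """Return True if a size in bytes can be represented within limit."""
    return size_bytes <= limit


course_files = 5 * GIB  # hypothetical course with 5GB of files

print(max_unsigned_32 / GIB)  # just under 4
print(max_signed_32 / GIB)    # just under 2
print(fits(course_files, max_unsigned_32))
```

A 3GB file shows why the two limits bite separately: it fits under the first limit but not the second.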

To solve these problems I made a number of changes, mostly listed under MDL-38189.

My basic approach was to start with the last problem - using the excellent 'generator' mechanism already in Moodle, I built a simple admin tool to create large courses for testing. The tool had options for different sizes ranging from XS to XXL (this last size I never tried; it may possibly be stupidly big!), with L designed to be similar to the largest courses on our system, and XL designed to be larger than any of them.

After that I simply tried to back up, and then restore, courses at each size. The XS and S sizes worked immediately, but the M size failed. Once I got the M course to work I tried with the L course and finally the XL course.

The changes I made that correspond to the above problems are:

  • I added a progress reporting API inside backup and restore. This displays a progress bar and also a wibbler bar underneath it. The progress bar shows 'known' progress as a percentage (when the code knows this is step 47 out of 300) and the wibbler bar just wibbles (changes colour) to indicate that progress is happening when the code doesn't know how many steps there are going to be.
    • The PHP timeout is reset each time progress is displayed, so the backup/restore will never time out as long as something is happening and reporting progress.
    • Because HTML code is frequently output to the browser (usually once per second), any front-end server will not time out.
    • I had to change many areas of backup code (any that can potentially take time) to use the new progress reporting API so that this works. I also had to change the API in some other areas of code that backup/restore uses (most notably the file packer) to add a mechanism for reporting progress in those areas too.
    • You don't normally have to add progress reporting to individual plugins such as activity modules, if they handle backup/restore in the usual sort of way, because I already put progress reporting calls in the lower-level mechanisms that they generally use - for example in the restore, progress reporting happens automatically while it's reading the XML file with your plugin's data.
    • If you are writing custom code that calls backup and restore programmatically, you can add your own callback to get the progress information - you could use the standard 'progress bar + wibbler' display, or your own custom mechanism.
  • I added and used a similar progress reporting mechanism for the steps included in the backup/restore user interface pages before backup starts (e.g. preparing the large forms for selecting activities; unzipping the backup file at the start of restore). These too can display progress as the pages are prepared for display. To avoid unnecessary distraction, these progress bars don't appear unless it takes at least five seconds to prepare the page. (I also increased the memory limit for these pages so they don't run out of memory.)
  • I made a new backup format that supports files larger than 4GB - you have to turn this on via experimental settings. The files are still called .mbz, but if you turn on the new format, internally it uses the Unix standard .tar.gz compression format. (You can rename the file to .tar.gz if you want to unpack it using WinZip, or the Unix 'tar xf' command.) When restoring, the system supports either type of file automatically, regardless of the experimental setting. I also changed Moodle file download so that it can handle large files. (In some cases this might require that the servers are running in 64-bit mode, although I'm not actually certain of that.)
  • When a backup or restore completes, if there are any messages in the log, these are displayed on the last screen of the backup/restore (the one which previously just had a 'continue to course' button). This gives the backup system and plugins a chance to report problems and may see more use in future now that the log actually displays to users.
  • As noted above, there is a 'Make test course' tool in the administration block within the Development folder for developers who want a large course to test against. Other people have enhanced this further since I started it, too!
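Because a new-format backup is internally a .tar.gz file, standard tools can read it. Here is a minimal sketch using Python's standard tarfile module, purely as an illustration outside Moodle. The file and member names are made up, and the example builds its own tiny stand-in archive instead of using a real backup.

```python
import io
import os
import tarfile
import tempfile


def list_backup_contents(path):
    """Return the member names of a gzip-compressed tar archive."""
    # "r:gz" reads a gzip-compressed tar whatever the file extension is,
    # so a file named .mbz does not need renaming first.
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def make_demo_archive(path):
    """Create a tiny stand-in archive (not a real Moodle backup)."""
    data = b"<demo/>"
    with tarfile.open(path, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="demo.xml")  # hypothetical member name
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))


workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "demo-backup.mbz")  # hypothetical file
make_demo_archive(demo_path)
names = list_backup_contents(demo_path)
print(names)
```

This only applies to backups made with the new format turned on; an old-format .mbz is a zip file and would need a zip reader instead.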

What's next? Well, I'm not planning any more large changes. But first, it's still possible there might be regressions caused by some of this code, which hopefully will be spotted before the 2.6 release. And second, I expect there will be some areas of backup and restore which weren't exercised by my large test courses, and which can still potentially time out - so we might need to add progress reporting calls to more areas of the code that I didn't yet spot. The general principle is that if some operation might take longer than, say, a minute, it's probably necessary to find a way of calling into the progress API periodically during that operation.
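To make that principle concrete, here is a small sketch of a long operation that calls into a progress reporter after every step, with the reporter passing updates on at most about once per second. It is written in Python and is not the Moodle API: every name in it (ThrottledProgress, report, process_items, describe) is hypothetical, and the clock is simulated so the example runs instantly.

```python
import time


class ThrottledProgress:
    """Forward progress to a callback at most once per min_interval seconds."""

    def __init__(self, callback, min_interval=1.0, clock=time.monotonic):
        self.callback = callback
        self.min_interval = min_interval
        self.clock = clock
        self.last_report = None

    def report(self, done, total=None):
        now = self.clock()
        due = (self.last_report is None
               or now - self.last_report >= self.min_interval)
        if due:
            self.last_report = now
            # A real system would also send output to the browser and
            # reset its time limit at this point (assumption for this sketch).
            self.callback(done, total)


def process_items(items, progress):
    """A long operation that calls into the reporter after every item."""
    results = []
    for index, item in enumerate(items, start=1):
        results.append(item * 2)  # stand-in for real work
        progress.report(index, len(items))
    return results


def describe(done, total):
    """Format known progress as a percentage, unknown progress as activity."""
    if total is None:
        return "working... (%d steps so far)" % done  # the 'wibbler' case
    return "%d%%" % (100 * done // total)


messages = []
fake_time = iter([0.0, 0.4, 1.1, 1.5, 2.2])  # simulated clock readings
progress = ThrottledProgress(
    lambda done, total: messages.append(describe(done, total)),
    clock=lambda: next(fake_time),
)
results = process_items([1, 2, 3, 4, 5], progress)
print(messages)
```

The operation itself can report as often as it likes; the reporter decides how often anything is actually shown, so the calling code stays simple.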

Hopefully this is more than anyone wanted to know about the backup/restore changes. :) For most people I expect all they'll notice is that they'll get a progress bar when they do backup or restore.

Finally, thanks to all the developers who helped with peer reviews and integration reviews for the many changes related to this! And also to those who worked on (or are still working on) their own separate changes to improve it.

 
