One of the great things about Moodle is that it can be used with a number of different database systems. It can use popular open source databases such as MySQL or Postgres, or commercial offerings from Oracle or Microsoft.
The great thing about offering this level of flexibility is that it can make the adoption of Moodle into an existing IT infrastructure a whole lot easier. If you have an IT department which is Microsoft centric, then adopting Moodle or slotting it into an existing IT infrastructure might not cause too much upset. Similarly, if your IT department uses Linux and has a dedicated database server that runs Postgres, offering choice of back end technologies can make things easier for system administrators.
This post is all about exploring how Moodle works with so many different database systems. The keywords that I am starting with is database abstraction layer. Wikipedia defines a database abstraction layer as ‘an application programming interface which unifies the communication between a computer application and databases’. In some cases, a database abstraction layer can also help to maintain good application performance by caching important data, avoiding the need to repeatedly request data from a database engine.
Here are my questions: how does a Moodle developer save stuff to and get stuff from a database? Does Moodle have a database abstraction layer? If it does, how might it work? Finally, are there other database abstraction layers or mechanisms out there that could be used? Let’s begin with the first question.
Getting stuff in and out
What instructions or mechanisms do developers used to get data into and out of Moodle, or a database that Moodle is using? My first port of call is the Moodle documentation. After a couple of clicks I find something called the Moodle Database Abstraction Layer. This looks interesting but way too complicated (and initially confusing) for me to understand in one go. What I’m interested in is an example.
I turn to the Moodle codebase and using my development environment I perform a text search (or grep) with the word SELECT, which I know to be a frequently used part of the SQL database language which underpins most relational database systems, and browse through the results. I quickly uncover a function called get_record_sql which seems to be the way to send SQL language commands to a database.
Another search reveals that the function is defined within a file called dmlib.php. This library is said to contain all the Data Manipulation Language functions used to interact with the DB. Comments within the file are reported to be ‘generic’ and work with a number of different databases. A link to a documentation page is also provided, but seems to be describe functions that relate to the development version of Moodle, not the version that I am using (version 1.9).
It seems that functions named get_record_sql, get_record_select and update_record (amongst others) are all used to write to and read from a database that is used with Moodle. To write new Moodle modules requires a developer to know a vocabulary of abstraction functions.
The second question can be answered relatively easily: Moodle does seem to have a database abstraction layer. Judging from the documentation it seems to have two different types of abstraction layers: one for the usage of a database, another for the creation of database structures. I’ll try to write something about this second type in another post.
How does it work?
How does the Moodle abstraction layer work? How does it act as an intermediary between the Moodle application and the chosen database engine? There seems to be a magic global variable called $db, and the abstraction layer code seems to be replete with comments about something called ADOdb. Is ADOdb the magic that speaks to the different databases?
Another search for the phrase '$db =’ yields a set of interesting results, including files contained within a folder called adodb (lib/adodb). This seems to be a database access library for PHP. I uncover a link to the ADOdb sourceforge project from where the code originated and I’m rudely confronted with some sample code.
At this point, it seems that Moodle uses different two layers to 'get' and 'set' data. It begins with the Moodle-world functions (the database manipulation language functions). Calls are then passed to ADOdb, where they are magically ushered towards the database.
Other questions come to mind, such as: why did the Moodle developers choose ADOdb? This question does not have an answer that can be easily uncovered.
Other abstraction layers
A quick glance at two of my PHP books point towards different database (or data) abstraction layers. My copy of Programming PHP, for example, emphasises the use of a library called PEAR DB (named after the PHP Extension and Application Repository). Clicking this previous link tells me that the PEAR DB library has since been replaced by something called MDB2. My PHP Cookbook, on the other hand emphasises the use of PDO, which is a part of PHP 5 (a version of the PHP engine that the Moodle community has only relatively recently adopted).
So, why did the Moodle developers choose ADOdb when there are all these other mechanisms on offer? I haven't managed to uncover forum discussion that explains the precise motivation for the choice. Moodle release notes go back to May 2005, but the earliest forum discussion I can find that relates to ADOdb dates back to 2002. Perhaps the choice could be put down as a happy accident of history and one that has facilitated amazing database interoperability.
One thing is clear: PDO is the (relatively) new kid on the 'database abstraction' block, and other software developers are asking the interesting (and difficult to answer) question of 'ADOdb or PDO: which is better?' In trying to answer this question myself, I uncovered a slideshare presentation and a blog post that tries to compare the two technologies by using benchmarks to see which is faster. PDO, it seems, is a central part of PHP 5 and has been written in 'native code' which might explain why is reported as being faster.
The debates about which database interface technology is better are interesting but don't directly arrive at a clear conclusion. Different technologies may do similar things in slightly different ways, and sometimes a choice of one or the other may boil down to what the programmers have used in the past. Unpicking the subtle advantages and disadvantages of each approach needs lots of time and determination. And when you have an answer, affecting a change may be difficult.
I recently uncovered a really interesting Moodle forum discussion on the topic of database abstraction (amongst other things). Subjects included differences between various database systems, the possibility of using stored procedures, the difficulty of mapping object-oriented data structures to relational database engines and so on. All great fun for computer scientists and application developers!
One thing bugs me about the Moodle database abstraction layer is that it is very shallow. It requires module developers to know a lot of stuff about things that ideally they shouldn't need to know about. To add courses and modules, you have to know a little about the structure of the Moodle database and how to work with it. There is very little code that separates the world of SQL statements (passed on to databases using DML and ADOdb) and the interfaces that are presented to users.
It could be argued adding additional layers of abstraction to more firmly manage data flow between Moodle application code and the database would place additional demands on the Moodle programmers. In turn, it this could make it harder for occasional contributors, particularly those working within academic institutions to make contributions to the code base. I strongly disagree with this argument. Creating a more sophisticated (or layered) database abstraction approach may open up the possibility of making more effective use of functions of different database engines and make the Moodle code base easier to understand (if the abstractions are designed correctly).
One way to consider ways about how the abstraction layer might be improved is to look at how other open source projects solve the same problem. I was recently told about the Drupal database abstraction layer. One useful activity might be to investigate its design and learn about what decisions have helped to guide its development.
Databases can be a tough subject. Creating an application that can work with different database engines effectively and efficiently is a big software engineering challenge. This challenge, on the other hand, can make things a lot easier for those people who are responsible for the management and operation of IT services. Providing application choice can increase the opportunities for an application to be used.
What is certain is that the database abstraction mechanisms that are currently used in Moodle will change as Moodle evolves and database engines are updated. At the time of writing work is underway to further develop the Moodle database abstraction layer. I look forward to seeing how it changes.
Image acknowledgement: pinksherbert, from Flickr.