The term ‘software testing’ can be associated with a very simple yet essential question: ‘does it do what it is supposed to do?’
There is, of course, a clear link to the topic of requirements, which express what software should do from the perspective of different stakeholders. One complexity lies in the fact that different stakeholders can have requirements that sometimes conflict with each other.
Ideally, it should be possible to trace software requirements all the way through to software code. The extent to which formal traceability is required, and the types of tests you need to carry out, will depend on the character of the software that you are building. The tests that you need for a real-time healthcare monitor will be quite different to the tests you need for a consumer website.
Due to the differences in the scale, type and character of software, software testing is a large topic in software engineering. Chapter 5 of SWEBOK v4, the software engineering body of knowledge, highlights different levels of test: unit testing, integration testing, system testing, and acceptance testing. It also highlights different types of test: conformance, compliance, installation, alpha and beta, regression, prioritization, non-functional, security, privacy, API, configuration, and usability.
In the article, The Practical Test Pyramid, Ham Vocke describes a simple model: a test pyramid. At the bottom of the pyramid are unit tests that test code. These unit tests run quickly. At the top, there are user interface tests, which can take time to complete. In the middle sit service tests (which can also be known as component tests). Vocke’s article is pretty long, and quickly gets into a lot of technical detail.
What follows are some highlights from some Software Engineering Radio episodes that are about testing. A couple of these podcasts mention the test pyramid. Although testing is a broad subject, the podcasts that I’ve chosen emphasise unit testing.
The first podcast concerns the history of unit testing. The last podcast featured in this article offers some thoughts about where the practice of ‘testing’ may be heading. Before I share some personal reflections, some other types of test are briefly mentioned.
The History of JUnit and the Future of Testing
Returning to the opening question, how do you know your software does what it is supposed to do? A simple answer is: you get your software to do things, and then check to see if it has done what you expect. It is this principle that underpins a testing framework called JUnit, which is used to test software written in the Java programming language.
The episode SE Radio 167: The History of JUnit and the Future of Testing with Kent Beck begins with a short history of the JUnit framework (3:20). The simple idea of JUnit is that you are able to write tests as code; one bit of code tests another. All tests are run by a test framework which tells you which tests pass and which tests fail. An important reflection by Beck is that when you read a test, it should tell you a story. Beck goes on to say that someone reading a test should understand something important about the software code. Tests are also about communication; “if you have a test and it doesn’t help your understanding … it is probably a useless test”.
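To make this idea concrete, here is a minimal sketch of what ‘one bit of code testing another’ might look like using JUnit 5. The ShoppingBasket class is a hypothetical example of my own, not something from the podcast; note how the test method names try to tell a small story about what the code should do.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Tests for a hypothetical ShoppingBasket class; the method names
// are written so that each test reads as a small story.
class ShoppingBasketTest {

    @Test
    void emptyBasketHasZeroTotal() {
        ShoppingBasket basket = new ShoppingBasket();
        assertEquals(0, basket.totalInPence());
    }

    @Test
    void addingTwoItemsSumsTheirPrices() {
        ShoppingBasket basket = new ShoppingBasket();
        basket.add("tea", 150);     // item name and unit price in pence
        basket.add("coffee", 250);
        assertEquals(400, basket.totalInPence());
    }
}
```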
Beck is asked to explain the concept of Test Driven Development (TDD) (14:00). He describes it as “a crazy idea that when you want to code, you write a test that fails”. The test only passes when the code does what the test expects. The podcast discussion suggests that a product might contain thousands of tiny tests, with the implication that there might be as much test code as production code (the code that implements features and solves problems).
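The rhythm that Beck describes can be sketched in a few lines. This is an illustrative, made-up example (the Greeter class is hypothetical): the test is written first and fails, then just enough production code is written to make it pass.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Step 1 (red): write a test that fails, because Greeter doesn't exist yet.
class GreeterTest {

    @Test
    void greetsAPersonByName() {
        assertEquals("Hello, Ada!", new Greeter().greet("Ada"));
    }
}

// Step 2 (green): write just enough production code to make the test pass.
class Greeter {

    String greet(String name) {
        return "Hello, " + name + "!";
    }
}
```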
When considering the future of testing (45:20), there was the suggestion that “tests will become as important to programming as the compiler”. This implies that tests give engineers useful feedback. This may be especially significant during periods of maintenance, when code begins to adapt and change. There was also the notion that engineers could “design for testability”, so that unit tests offer more value.
Although the podcast presents a helpful summary of unit testing, there is an obvious question that needs asking: what unit tests should engineers be creating? One school of thought is that engineers should create tests that cover as much of the software code as possible, a measure known as code coverage. Chapter 5 of SWEBOK shares a large number of useful test techniques that can help with the creation of tests (p.5-10).
Since errors can sometimes creep into conditional statements and loops, a well-known technique is boundary-value analysis. Put more simply, given a problem, such as choosing a numbered item from a menu, does the software do what it is supposed to do if the highest number is selected (say, 50)? Does it continue to work if a number just inside the boundary is selected (say, 49)? And does it respond sensibly if a number just beyond the boundary is selected (say, 51)?
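As a sketch, the boundary-value tests for that menu example might look something like this in JUnit 5. The Menu class and its isValidSelection method are assumptions of mine, used only for illustration.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Boundary-value tests for a hypothetical Menu class whose items
// are numbered from 1 up to a highest value supplied at construction.
class MenuBoundaryTest {

    private final Menu menu = new Menu(50);    // items numbered 1 to 50

    @Test
    void acceptsTheHighestValidItemNumber() {
        assertTrue(menu.isValidSelection(50));  // exactly on the boundary
    }

    @Test
    void acceptsANumberJustInsideTheBoundary() {
        assertTrue(menu.isValidSelection(49));  // just inside the boundary
    }

    @Test
    void rejectsANumberJustBeyondTheBoundary() {
        assertFalse(menu.isValidSelection(51)); // just beyond the boundary
    }

    @Test
    void rejectsANumberBelowTheLowerBoundary() {
        assertFalse(menu.isValidSelection(0));  // below the lower bound
    }
}
```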
Working Effectively with Unit Tests
Another podcast on unit testing is SE Radio 256: Jay Fields on Working Effectively with Unit Tests. Between 30:00 and 33:00, there is an interesting discussion that highlights some of the terms that feature within Vocke’s article. A test that doesn’t cross any class boundaries and focuses on a single class could be termed a ‘solitary unit test’. This can be contrasted with a ‘sociable unit test’, where the class under test is allowed to work with its real collaborators, so the behaviour of one class may influence the test of another. Other terms are introduced, such as stubs and mocks, which are again mentioned by Vocke.
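The following sketch suggests what a solitary unit test might look like in Java, using the Mockito library to create a test double. The Till class and PriceService interface are hypothetical examples of mine, not code from the podcast.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

// A hypothetical collaborator and the class under test.
interface PriceService {
    int priceInPence(String item);
}

class Till {
    private final PriceService prices;

    Till(PriceService prices) {
        this.prices = prices;
    }

    int charge(String item, int quantity) {
        return prices.priceInPence(item) * quantity;
    }
}

class TillSolitaryTest {

    @Test
    void chargesQuantityTimesUnitPrice() {
        // Stub the collaborator so the test never crosses the class boundary.
        PriceService stubPrices = mock(PriceService.class);
        when(stubPrices.priceInPence("tea")).thenReturn(150);

        Till till = new Till(stubPrices);

        assertEquals(450, till.charge("tea", 3));
    }
}
```

A sociable version of the same test would pass in a real PriceService implementation instead of the stub, allowing the behaviour of one class to influence the test of another.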
Automated Testing with Generative AI
To deliberately mix a metaphor, a glimpse of the (potential) future can be heard within SE Radio 633: Itamar Friedman on Automated Testing with Generative AI. The big (and simple) idea is to have an AI helper look at your software and ask it to generate test cases for you. A tool called CoverAgent was mentioned, along with an article entitled Automated Unit Test Improvement using Large Language Models at Meta (2024). A key point is: you still need a software engineer to sense-check what is created. AI tools will not solve your problems on their own, since these automated, code-centric tools know nothing of your software requirements and your software engineering priorities.
Since we are beginning to consider artificial intelligence, this leads on to another obvious question: how do we go about testing AI? Also, how do we make sure AI systems do not embody or perpetuate biases or security risks, especially if they are used to help solve software engineering problems?
Different types of testing
The SWEBOK states that “software testing is usually performed at different levels throughout development and maintenance” (p.5-6). The key levels are: unit, integration, system and acceptance.
Unit testing is carried out on individual “subprograms or components”, typically, but not always, by the person who wrote the code (p.5-6). Integration testing “verifies the interaction among” the components of the system under test; this is testing where different parts of the system are brought together, and it may require different test objectives to be completed. System testing goes even wider and “is usually considered appropriate for assessing non-functional system requirements, such as security, privacy, speed, accuracy, and reliability” (p.5-7). Acceptance testing is all about whether the software is accepted by key stakeholders, and relates back to key requirements. In other words, “it is run by or with the end-users to perform those functions and tasks for which the software was built”.
To complete a ‘test level’, a number of test objectives may need to be satisfied. The SWEBOK presents 12 of these. I will have a quick look at two of them: regression testing and usability testing.
Regression testing is defined as “selective retesting of a SUT to verify that modifications have not caused unintended effects and that the SUT still complies with its specified requirements” (p.5-8). SUT is, of course, an abbreviation for ‘system under test’. Put another way, a regression test checks that any change you have made hasn’t broken anything. One of the benefits of unit testing frameworks such as JUnit is that it is possible to quickly and easily run a series of unit tests to carry out a regression test.
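As a small illustration, JUnit 5’s suite support (the junit-platform-suite artifact) makes it possible to bundle all of a project’s tests into a single suite that can be re-run after every change; the com.example.app package name is, of course, a placeholder.

```java
import org.junit.platform.suite.api.SelectPackages;
import org.junit.platform.suite.api.Suite;

// Re-running this suite after each modification is, in effect, a
// regression test: it checks that nothing that used to work is now broken.
@Suite
@SelectPackages("com.example.app")
class RegressionSuite { }
```

In practice, many teams simply re-run the whole test suite through their build tool (for example, as part of a continuous integration pipeline) to get the same effect.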
Usability testing is defined as “testing the software functions that support user tasks, the documentation that aids users, and the system’s ability to recover from user errors” (p.5-10), and sits at the top of the test pyramid. Usability testing should involve real users. In addition, there are, of course, automated tools that help software engineers to make sure that a product deployment works with different browsers and devices.
Reflections
When I worked as a software engineer, I used JUnit to solve a very particular problem. I needed to create a data structure that is known as a circular queue. I wouldn’t need to write it in the same way these days since Java now has more useful libraries. At the time, I needed to make sure that my queue code did what I expected it to. To give me confidence in the code I had created, I wrote a bunch of tests. I enjoyed seeing the tests pass whenever I recompiled my code.
I liked JUnit. I specifically liked the declarative nature of the tests that I created. My code did something, but my tests described what my code did. Creating a test was a bit like writing a specification. I remember applying a variety of techniques. I used boundary-value analysis to look at the status of my queue when it was in different states: when it was nearly full, and when it was full.
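To illustrate the kind of tests I remember writing, here is a reconstruction; the CircularQueue class shown here is a hypothetical stand-in, not my original code.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Boundary-value tests for a hypothetical fixed-capacity CircularQueue.
class CircularQueueTest {

    @Test
    void reportsFullOnlyWhenCapacityIsReached() {
        CircularQueue<String> queue = new CircularQueue<>(2);

        queue.enqueue("first");
        assertFalse(queue.isFull());  // nearly full: one slot left

        queue.enqueue("second");
        assertTrue(queue.isFull());   // exactly full: on the boundary
    }

    @Test
    void wrapsAroundAfterADequeue() {
        CircularQueue<String> queue = new CircularQueue<>(2);
        queue.enqueue("first");
        queue.enqueue("second");

        assertEquals("first", queue.dequeue()); // FIFO order preserved
        queue.enqueue("third");                 // reuses the freed slot
        assertTrue(queue.isFull());
    }
}
```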
Echoing Beck, I appreciated that my tests also told a story. I also appreciated that these tests might not only be for me, but might be useful for other developers who might have the misfortune of working with my code in the future.
The other aspect of unit testing that I liked was that it proactively added friction to the code. If I started to maintain it, pulling apart functions and classes, the tests would begin to break. The tests became statements of ‘what should be’. I didn’t view tests in terms of their code coverage (making sure that every single bit of software was evaluated) but as simple practical tools that gave alternative expressions of the purpose of my software. In turn, they helped me to move forward.
It is interesting and useful to reflect on the differences between the test pyramid and the SWEBOK test levels. In some respects, the UI testing of the pyramid can be aligned with the acceptance testing of the SWEBOK. I do consider the SWEBOK’s explicit integration and system testing levels to be helpful.
An important point that I haven’t discussed is the question of when a software engineer should carry out testing. A simple answer is, of course, as soon as practically possible. The longer it takes to identify an issue, the more significant the impact and the greater the economic cost. The ideal of early testing (or early problem detection) is reflected in the term ‘shift-left’ testing, which essentially means ‘try to carry out testing towards the left-hand side of your project plan’. Put even more simply: the earlier the better.
Returning to the overriding aim of software testing, testing isn’t just about figuring out whether your software does what it is supposed to do. It is also about managing risk. If there are significant societal, environmental, institutional and individual impacts when software doesn’t work, you or your organisation needs to do whatever it can to ensure that everything is as correct and as effective as possible. Another point is that sometimes the weak spot isn’t the code, but the spaces where people and technology intersect. Testing is socio-technical.
To conclude, it is worth asking a final question: where is software testing heading? Some of these podcasts suggest some pointers. In the recent past, we have seen the emergence of automation and the engineering of software development pipelines to facilitate the continuous deployment or delivery of software. I do expect that artificial intelligence, in one form or another, will influence testing practice, but AI tools can’t know everything about our requirements. There will be testing using artificial intelligence and testing of artificial intelligence. As software reaches into so many different areas of society, there will also be testing for sustainability.
Resources
JUnit is one of many bits of technology that can help to automate software testing. Two other tools I have heard of are Cucumber, which implements a language called Gherkin, a formal but human-readable language used to describe test cases, and Selenium, which is “a suite of tools for automating web browsers”.
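To give a flavour of how Cucumber connects Gherkin to code, here is a small, hypothetical sketch in Java: the Gherkin scenario appears in the comments, and each step is bound to a method using Cucumber’s annotations (the Menu class is again a made-up example).

```java
// A Gherkin scenario, as it might appear in a .feature file:
//
//   Scenario: Selecting the last item on the menu
//     Given a menu with 50 items
//     When the customer selects item 50
//     Then the selection is accepted

import static org.junit.jupiter.api.Assertions.assertTrue;

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

// Step definitions binding each Gherkin step to Java code.
public class MenuSteps {

    private Menu menu;        // hypothetical class under test
    private boolean accepted;

    @Given("a menu with {int} items")
    public void aMenuWithItems(int count) {
        menu = new Menu(count);
    }

    @When("the customer selects item {int}")
    public void theCustomerSelectsItem(int number) {
        accepted = menu.isValidSelection(number);
    }

    @Then("the selection is accepted")
    public void theSelectionIsAccepted() {
        assertTrue(accepted);
    }
}
```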
Since software testing is such an important specialism within software engineering, there is a series of industrial certifications created by the International Software Testing Qualifications Board (ISTQB). As well as foundation-level certifications, there are also certifications for specialisms such as agile, security and usability. Many of the topics mentioned in the certifications are also covered in Chapter 5 of SWEBOK v4.
I was alerted to a site called the Ministry of Testing which shares details of UK conferences and events about testing and software quality.
One of the points that I picked up from the podcasts was that, when working at the forefront of an engineering subject, a lot of sharing takes place through blogs. A name that was mentioned was Dan North, who has written two articles that resonate: We need to talk about testing (or how programmers and testers can work together for a happy and fulfilling life), and Introducing BDD (BDD being an abbreviation for Behaviour-Driven Development).
Acknowledgements
Many thanks to Josh King, a fellow TM354 tutor, who was kind enough to share some useful resources about testing.