Integration Tests Data Cleanup in C#

When writing integration tests against a database, we have to be think about the following three points:

  • Test Data Handling – Create test data and clean it up after the test run
  • Test Isolation – Ensure that no other test cases influence the current one
  • Execution Performance – All tests should execute within reasonable time

In the next sections I want to look at some of the common strategies, solutions and why they might not always be a good fit.

Random assortment of cables and computer parts, not very well integrated at all

Transaction & Rollback

One of the most common strategies when it comes to test isolation is to use transactions, encapsule the test with it and rollback any database changes at the end. C# offers a nice solution for this, namely the TransactionScope class in combination with an using scope.

using var scope = new TransactionScope();

This can then also be neatly packed into an abstract class and used as base class for every test class. Meaning at the start of the test ([TestInitialize]) the transaction scope is created and at the end ([TestCleanup]) the scope is discarded and consequently the transaction is rolledback.

This works very well if you have just one database you have to take care of.

Once you move to multiple databases, it becomes more complicated, as your transaction doesn’t just apply to this one database anymore, but it needs to be distributed across multiple databases. In order for this to happen, Microsoft makes use of the Microsoft Distributed Transaction Coordinator (MSDTC), meaning if you’re on Windows, things should continue to work just fine, but if you try to run the test on Linux, it will fail, as no Distributed Transaction Coordinator can be found1).

There are workarounds to prevent distributed transactions by setting enlist=false on the connection string, essentially saying that transactions on this connection string shouldn’t be distributed. This works for simple cases, but has two potential issues:

  • If you’re using transactions in your application, the transaction might not be properly respected
  • If you have close coupling between multiple databases, you can’t easily ensure that the data is cleanup and have to manually write cleanup statements

1) Looks like there is a MSDTC for Linux these days, so maybe one just needs to configure that one properly.

Restore Database / Snapshot

One of the next obvious step is to simply restore the whole databases or at least a snapshot of it every test run. This ensures on a fundamental level, that no test can influence the other.

It’s a bit more tricky to implement than a TransactionScope, but still rather straight forward .

However, this comes with two issues:

  • It’s painfully slow. Given that you’d want to do this for every test case and thus it might only be useful for a selected few cases, which don’t work with enlist=false
  • When a test run is halted in a way that the cleanup code isn’t executed, you can end up with half restored databases or lingering snapshot files

In-Memory Database

On paper this often reads like a silver bullet, as it’s supposedly rather fast and also fundamentally ensures test isolation. If you use Entity Framework, you’ll need EF Core and .NET Core or newer, which is a non-option for all the .NET Framework projects out there.

Additionally, you’re not running your test against a real DBMS (Database Management System) and thus won’t be noticing potential quirks of a real DBMS. Plus some specific magic just might not be supported with in-memory databases. Jimmy Bogard wrote a more detailed post about this topic, in short if you really want to test your code using a database, you should be running the test against the actual database.

Disable Foreign Keys & Truncate the Tables

Deleting the contents of a database table can be very slow, as it requires checks on every foreign key and potential cascading deletions. The solution would be to truncate the table instead, but in order to do that, you’ll need to disable foreign key constraints first, otherwise the truncation will simply fail.

This works well and can be a viable solution in some cases. With larger test sets however, each foreign key disabling can cost you somewhere around half a second, which for 3000 tests for example would already cost you 25min without having any tests run. It’s still a lot faster than using snapshots for every test, but the cost might be too high for some projects.

Test Database & Delete the Tables

While foreign key constraints are certainly slowing down delete statements, this is only really true, if you have a lot of data in your tables. A deletion on an empty table or a tables with a handful of entries is actually quite fast. This means that if you create a database backup that doesn’t contain any data or only important reference data, you can still utilize deletions against a so called test database.

In order to have data isolated between tests, you won’t just use mostly empty tables, but you actively delete all the non-reference data before every test execution. That way you can be sure that every test starts with a clean slate and builds up its own tests data.

The deletion of that data however still needs to follow the foreign key constraints and so it can be quite tedious work to track the correct order of deletes.

Arguments can be made that using separate test databases is a downside, but I’ve noticed the opposite, as it separates the database used to develop against from the databased used to run the tests against. So when you change some test data during the development and then run a test, your data won’t influence the test and it won’t be deleted by the test. Additionally, you can utilize the same exact database back ups for running tests locally and running tests in your CI pipeline.

Track Foreign Keys & Delete the Tables

One trick on top of using separate test databases and deleting tables is to automatically track foreign key constraints and delete the various tables in smart way, starting from tables without or fewer relationships and working inwards until all the tables have been deleted.

This is one of the fastest solutions that I’ve come across, as it doesn’t require foreign key disabling or deletion and thus beats truncating measurably. Additionally, since the deletions happen at every test start, you have a higher certainty that the database is in a clean state, where other methods tend to only clean up after the test execution, which can leave some lingering data if the test is aborted mid run.

I’m not crazy enough to implement this smart foreign key tracking myself, but I instead used the Respawn library, which is dead simple to use and works great, so I can highly recommend it!

Conclusion

The point of this post isn’t to argue that there’s just one perfect way to handle data in tests, but to show the benefits and downsides of each option, so you get to choose which option works best for your own project setup.

When we moved from a mix of truncating, manual deleting and use of snapshots to smart deleting with Respawn, we saved around 10-15min of test execution time, but we also saved ourselves a lot of maintenance overhead and required brainpower to write tests. Now, you don’t have to think about how the tests data is tracked or cleaned or anything (in 99% of the cases), but you just write tests and the magic of Respawn will keep the database clean.

Leave a Comment

Your email address will not be published. Required fields are marked *

 

This site uses Akismet to reduce spam. Learn how your comment data is processed.