Chaos has been caused by the Government using Excel as a repository for Covid-19 test results in England.
Records have been lost and not reported, and there has been huge reputational damage to the Government and a loss of confidence in the official figures. The methodology used has become the butt of many jokes on social media.
The BBC (https://www.bbc.co.uk/news/technology-54423988) puts the blame on Public Health England for using an old file format (.XLS) which only allows 65,536 rows as opposed to the new post 2007 file format (.XLSX) which allows 1,048,576 rows, which is a huge difference.
The source data was provided by a number of commercial firms and the transfer medium was CSV (comma separated value) files. Bearing in mind the purpose of the data, these files could be very large, sometimes larger than 65,536 rows.
Excel is very useful for consolidating data and presenting it in any easy to read dashboard format, using charts and summaries, but this is only as good as the integrity of the imported data.
One of the problems with using the old XLS format is that even if you load the file into a post 2007 version of Excel, it still remains at 65,536 rows unless you do ‘Save As’ and save it in the new XLSX format. This looks like it was not done.
I suspect that the developers ‘cannibalised’ an old XLS dashboard without considering the lack of row capacity. This probably had VBA code to automate loading the CSV files into Excel.
If VBA is used to load in a CSV file into an XLS file, there is no warning that the number of rows has been exceeded and the data truncated. I have tested this in Excel 2003 using a very large CSV file (1m records) and using VBA code to open the file, no warning message is shown that the data is truncated.
It is easy to see why the error has occurred. It is basically down to users / developers not being aware of the row constraint on XLS files.
The whole import procedure was probably working quite well until one day a CSV file came along that was over the limit of 65,536 rows, and data was lost.
How this could have been prevented
In the import code within VBA, once the CSV file was imported, a copy should have been saved off as a CSV file under a different name. VBA could then be used to check the copy file size against the original. If the sizes did not agree, then this would be flagged up as a danger.
Using a reference in VBA to Microsoft Active Data Objects (ADO), VBA could link directly to each of the CSV files and use each file as a database table. An ADO recordset could be created using SQL (Structured Query Language) for each CSV file, number of rows would be checked against the recordset ‘RecordCount’ property and the worksheet ‘Rows.Count’ property, and then using the ‘CopyFromRecordset’ method the data would be copied into the worksheet.
Using ADO, it is also possible to connect back to the current workbook and use each worksheet as a separate table, provide that the information is tabular with column headings at row 1. This enables full SQL commands to be used to manipulate the data with enormous flexibility, and effectively turns Excel into a full database application.
Working on the assumption that the Excel workbook was ‘cannibalised’ from an earlier application, it would have been better if the Excel application had been developed from scratch using XLSX files. However, I suspect that there was a time constraint here and the whole thing was rushed through without time to create an original application.
Excel can be used as a database, having the advantage of being able to service a dashboard within the same application.
The problem here is poor design and implementation which did not cover every possible import file size. Testing was also lacking – large CSV files should have been used for testing and checks made to see that all data was appearing.
It is possible that the developers of the application did not understand the full capabilities of Excel, particularly using ADO, and sadly, this has reflected badly on the use of Excel workbooks for solving problems like this.
This article was commissioned from one of our senior consultants at The Excel Experts. Further inquiries please email firstname.lastname@example.org with the subject line #CallTheExperts .