Last weekend I saw a tweet mentioning Steve Stedman's Database Corruption Challenge (DBCC), where he publishes a database that's been corrupted and challenges the community to recover it (hopefully) without any data loss. What a great idea! I had to enter, and after some digging around I figured out a way to bring the database online with all data intact.
Of the 21 successful entries there were many different solutions. Steve wrote that one even involved modifying the database files on disk using a hex editor. A hex editor?! What kind of crazy person would do that? Well… umm… actually… that was me. This post covers how I found the problem and what I did to fix the corruption.
(You'll notice that my final solution doesn't open the database files in a hex editor, even though most of my working out and modifications used one. For simplicity I've switched to DBCC commands here, as they're easier to repeat.)
Steve's original blog covers how to get the corrupt database restored and ready for troubleshooting, so I won't repeat that here; let's start with the DBCC CHECKDB error message instead.
Msg 8944, Level 16, State 13, Line 2
Table error: Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data), page (1:280), row 3. Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 3139 and 288.
Msg 8944, Level 16, State 13, Line 2
Table error: Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data), page (1:280), row 3. Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 3139 and 288.
Msg 8928, Level 16, State 1, Line 2
Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data): Page (1:280) could not be processed. See other errors for details.
Msg 8976, Level 16, State 1, Line 2
Table error: Object ID 245575913, index ID 1, partition ID 72057594040614912, alloc unit ID 72057594045857792 (type In-row data). Page (1:280) was not seen in the scan although its parent (1:281) and previous (1:246) refer to it. Check any previous errors.
CHECKDB found 0 allocation errors and 4 consistency errors in table 'Revenue' (object ID 245575913).
CHECKDB found 0 allocation errors and 4 consistency errors in database 'CorruptionChallenge1'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (CorruptionChallenge1).
The first line of the DBCC error message is very helpful as it quickly identifies the page that was corrupt (file 1 page 280) and the record in that page (record 3).
The object with object ID 245575913 is the table Revenue, and we can use DBCC IND to find out that page 1:280 is part of the clustered index. Unfortunately that's a pain: because the page belongs to the clustered index, a simple index rebuild won't fix the problem. Instead let's look into the header of record 3 and try to figure out what's wrong.
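For completeness, this is roughly how that lookup goes (DBCC IND is undocumented, so treat it with the usual care):

USE CorruptionChallenge1;
SELECT OBJECT_NAME(245575913);                    -- returns 'Revenue'
DBCC IND('CorruptionChallenge1', 'Revenue', -1);  -- lists every page belonging to the table;
                                                  -- page (1:280) shows up with IndexID = 1, i.e. the clustered index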
First we can view the page structure using DBCC PAGE:
dbcc traceon(3604)  -- display dbcc page output in the console
dbcc page('CorruptionChallenge1', 1, 280, 2)
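For reference, the last parameter is the print option. DBCC PAGE is undocumented, but the options are well known:

-- 0 = page header only
-- 1 = page header plus hex dumps of each record and the row offset array
-- 2 = page header plus a hex dump of the whole page
-- 3 = page header plus per-record, per-column interpretation

Option 2 gives us the raw bytes we need.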
At the end of the page is the record offset array, which is used to locate individual records. From the offset array we can see that record 3 starts at offset 960 (0x3c0). (Note: row numbers start from 0. This is important, as otherwise we'd waste our time figuring out what's corrupt about a perfectly healthy row. Not that that happened, of course!)
... rest of page ...
5 (0x5) - 1536 (0x600)
4 (0x4) - 1248 (0x4e0)
3 (0x3) - 960 (0x3c0)
2 (0x2) - 672 (0x2a0)
1 (0x1) - 384 (0x180)
0 (0x0) - 96 (0x60)
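Incidentally, the offset array also explains the second number in the original error message. The failed test was (ColumnOffsets <= (nextRec - pRec)) with values 3139 and 288: record 3 starts at byte 960 and the next record at byte 1248, so the gap is 1248 - 960 = 288 bytes, matching the 288 that CHECKDB reported, while 3139 looks like a garbage value derived from the corrupt bytes.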
Before looking at the actual corrupt record, a quick recap of the SQL Server record structure. A record consists of the record header, followed by the fixed-length data, then the NULL bitmap, and finally the data for the variable-length columns (see Paul Randal's blog for more information).

Since we know which table the corruption was found in, we can look at its schema to see how an individual record should be laid out.
CREATE TABLE [dbo].[Revenue](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [DepartmentID] [int] NULL,
    [Revenue] [int] NULL,
    [Year] [int] NULL,
    [Notes] [varchar](300) NULL
)
The four fixed-length columns are all 4-byte integers, so together with the 4-byte record header the fixed-length portion takes up the first 20 bytes of the record.

Note: I’ve deleted the last two columns for simplicity
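Putting the schema and the record format together, a healthy record in this table should be laid out roughly as follows (a sketch based on the standard record format described above, not an actual page dump):

-- Rough layout of a Revenue clustered index record
-- bytes  0-1    status bits A and B
-- bytes  2-3    offset to the number-of-columns field (0x0014 = 20 for this table)
-- bytes  4-19   fixed-length data: id, DepartmentID, Revenue, Year (four 4-byte ints)
-- bytes 20-21   number of columns (0x0006: the five table columns plus the uniquifier)
-- byte   22     NULL bitmap (one bit per column; unused bits zero)
-- bytes 23+     variable-length section: column count, end offsets and the data
--               for the uniquifier and the Notes column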
Now let’s look at the corrupt record in detail. Below is the relevant part of the output of DBCC PAGE for record 3 in page 280.

We can transpose the red portion of the record onto our table. Note that multi-byte values are stored little-endian, so you read them from right to left: a 2-byte value shown as 14 00 in the DBCC PAGE output is actually 0x0014. In my diagram and description below I've reversed the bytes so they're easier to understand, but we'll need to re-reverse them later when writing the fix back to the page.

The fixed length portion of the record isn’t interesting so I’ll skip it.
Bytes 20-21 contain the total number of columns in the record. The value is 0x6161, which is 24929 in decimal. There's definitely something fishy going on, as we know from looking at the table DDL that there are five columns, right?
Well… not quite. Because the clustered index is non-unique there's an additional uniquifier column, which takes the total number of columns per record to six. To confirm this you could look at bytes 20-21 of row 2 (row 2 starts at page offset 672, so they're the two bytes at offset 692 in the dump) and see the value 0x0006 (6 in decimal).
So let's change the value from 0x6161 to match the actual number of columns, 0x0006.

To make the change we can use DBCC WRITEPAGE. This has the benefit of recalculating the page checksum for us, which matters because Steve was clever enough to make sure the page checksum was still valid after his modifications. WRITEPAGE takes the database name, the file and page IDs, the byte offset within the page, the number of bytes to write, the bytes themselves, and a final flag controlling whether the write bypasses the buffer pool. In our case the offset is 980 bytes: 960 bytes to the start of record 3 plus 20 bytes into that record.
IMPORTANT NOTE! Please be careful running DBCC WRITEPAGE. It blindly trusts the input you give and therefore can very easily corrupt a database. Do not run it in production unless you fancy a career change!
DBCC WRITEPAGE('CorruptionChallenge1', 1, 280, 980, 2, 0x0600, 0)
Calling WRITEPAGE returns an error, but perhaps it fixed the problem anyway?
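Whether or not the error matters, we can dump the page again and check whether the bytes actually landed (980 in decimal is 0x3D4):

DBCC PAGE('CorruptionChallenge1', 1, 280, 2)
-- trace flag 3604 still needs to be on from earlier;
-- the two bytes at page offset 0x3D4 (980) should now read 06 00, i.e. 0x0006

The real test is CHECKDB, though: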
USE CorruptionChallenge1;
DBCC CheckDB() with no_infomsgs;

Msg 8951, Level 16, State 1, Line 58
Table error: table 'Revenue' (ID 245575913). Data row does not have a matching index row in the index 'ncBadNameForAnIndex' (ID 3). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 58
Data row (1:280:3) identified by (id = 31 and UNIQUIFIER = 0) with index values 'Year = 2008 and id = 31 and UNIQUIFIER = 0'.
Msg 8952, Level 16, State 1, Line 58
Table error: table 'Revenue' (ID 245575913). Index row in index 'ncBadNameForAnIndex' (ID 3) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 58
Index row (1:286:1) with values (Year = 2008 and id = 31 and UNIQUIFIER = 0) pointing to the data row identified by (id = 31 and UNIQUIFIER = 0).
CHECKDB found 0 allocation errors and 2 consistency errors in table 'Revenue' (object ID 245575913).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'CorruptionChallenge1'.
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (CorruptionChallenge1).
Hmm. Something’s still not right. The error message still mentions record 3 of page 280 so let’s carry on looking at the broken record.

Byte 22 contains the NULL bitmap itself (in blue). The bitmap uses one bit per column, so it takes one byte for every eight columns (rounded up); with six columns the bitmap is just a single byte. In our corrupt row the value is 0x61, which in binary is 01100001. This took some investigating, but in SQL Server 2014 the unused bits of the NULL bitmap are set to 0. As there are only six columns, the two most significant bits should be 00, but here they're 01: bit 6 is set even though there's no seventh column for it to describe.
Reconstructing the NULL bitmap involved some guesswork. I decided to copy the NULL bitmap from row 2 (value 0x00), but it was possible that row 3 had some NULL columns that weren't NULL in row 2. In hindsight I could have used one of the techniques in Steve's answer and viewed the data through a different index, but I didn't think of that at the time.
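Something along these lines would have done it, using the index name and keys visible in the CHECKDB output above. Note the nonclustered index only covers Year plus the clustering key, so it could only have confirmed one of the potentially NULL columns:

SELECT [Year], id
FROM dbo.Revenue WITH (INDEX (ncBadNameForAnIndex))
WHERE id = 31;   -- the row flagged by CHECKDB; reads only the intact nonclustered index pages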
Back to the fix. Adding the extra byte means writing three bytes (06 00 00) at the same offset, covering both the column count and the NULL bitmap, which gives the following DBCC WRITEPAGE command:
DBCC WRITEPAGE('CorruptionChallenge1', 1, 280, 980, 3, 0x060000, 0)
This time no error. Let’s run CHECKDB:
USE CorruptionChallenge1;
DBCC CheckDB() with no_infomsgs;

Command(s) completed successfully.
Looking better! Can we retrieve data from the table?

Yes – we can even see the record we fixed (in red).
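And to check the specific row rather than eyeballing the whole table, the id that CHECKDB flagged earlier (31) gives us a targeted query:

SELECT id, DepartmentID, Revenue, [Year], Notes
FROM dbo.Revenue
WHERE id = 31;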
The corruption has been fixed and the database is back online – it’s as simple as that!
Conclusion
Of course, this is not something you want to be doing with production data. It's one thing to play around on a Sunday afternoon; it's a completely different activity when you're under pressure from users to get their system back online. In my environment I'd definitely be reaching for a recent backup instead!