ÿþ<html> <title>antapex.org : Solving SQL Server corruption</title> <body> <h1>Some notes on handling SQL Server corruption</h1> <B>Version</B> : 1.1<br> <B>Date</B> : 23/09/2010<br> <B>By</B> : Albert van der Sel<br> <B>Type of doc</B> : It's just a few notes on SQL Server corruption in very simple words. It's no more than "entry level".<br> <B>For who</B> : For anyone who likes a short orientation on the subject.<br> <hr/> <br> <font face="arial" size=2 color="black"> <br> When you face a serious corruption in a SQL Server database, you would normally grab your<br> most recent backups.<br> <br> Secondly, if you cannot repair it, or you can't apply backups, you ofcourse should contact Microsoft Support at once.<br> <br> If, for some reason, the above is not possible, you might consider the idea's which are presented in this note.<br> Be warned though, that the material below, is just slightly above "entry level". So please don't expect<br> tremendously clever ideas. It's all just "average" stuff.<br> <br> It's indeed true that some commercial third-party repair tools exist, which I think do a good job.<br> I recommend you to do a little research on those tools, and maybe you then consider to purchase one.<br> However, this document will not refer to, or describe those tools, in any way.<br> <br> In a sense, this document might be considered as a (lightweight) demo for certain simple recoveries of corruption.<br> <br> Although you do not need to be an expert, it is assumed that you have at least a "reasonable" level of knowledge of SQL Server.<br> <br> Also, I did my best to put trustworthy and correct information in this note, but due to the "nature of the game",<br> use it at your own risk.<br> <br> Chapter 1 just describes the creation of a Demo database which will be used to simulate several sorts of corruption.<br> But, as of Chapter 2, real scenario's are described.<br> <br> <B>Contents:<br> <br> Chapter 1. Creating the example database, and getting a hex editor.<br> <br> -- <I>Notes on object- or data page corruption:</I><br> Chapter 2: "Soft" corruption on (ordinary) data and index pages (corrupt table).<br> Chapter 3: Severe corruption on (ordinary) data and index pages (corrupt table).<br> <br> -- <I>Notes on file- or admin page corruption:</I><br> Chapter 4: Inaccesible or Suspect database due to a Missing or Corrupt Transaction log.<br> Chapter 5: Inaccesible or Suspect database due to Severe corruption on a .ndf file.<br> <br> -- <I>Other notes:</I><br> Chapter 6: Overview of Microsoft SQL Server Repair commands.<br> Chapter 7: Using queries to salvage data from corrupt tables.<br> </B> <br> <br> <h2>Chapter 1. Creating the example database, and getting a hex editor.</h2><br> <B>Example Database:</B><br> <br> In the following sections, we will use the following example database.<br> With the script below, it's easy to create the demo database. Note that we have several filegroups and multiple logfiles,<br> in order to "simulate" a certain level of "complexity".<br> Also, we create a few sample tables, and index, on different filegroups.<br> <br> To follow this note, you do not need to create this sample database. You can just browse through, or just read this note.<br> But, if you like to repeat the actions yourself, it is really advisable to create it.<br> And, since we have a script, it's not too bad if you hose up the database. Creating it again is just a matter of seconds.<br> <br> I really advise you to create the database on SQL Server 2005 or 2008, and follow along with me, starting at Chapter 2,<br> and work your way to each following chapter.<br> <br> <font face="courier" size=2 color="black"> create database SALES<br> on PRIMARY<br> (<br> name='SALES',<br> filename='c:\mssql\data\SALES.mdf',<br> size=40MB,<br> filegrowth= 10MB,<br> maxsize= 100MB<br> ),<br> FILEGROUP SALESDATA01<br> (<br> name='SALES_DATA_01',<br> filename='c:\mssql\data\SALES_DATA_01.ndf',<br> size= 40MB,<br> filegrowth= 10MB,<br> maxsize= 100MB<br> ),<br> FILEGROUP SALESINDEX01<br> (<br> name='SALES_INDEX_01',<br> filename='c:\mssql\data\SALES_INDEX_01.ndf',<br> size= 40MB,<br> filegrowth= 10MB,<br> maxsize= 100MB<br> )<br> LOG ON<br> (<br> name='SALES_LOG_001',<br> filename='c:\mssql\data\SALES_LOG_001.ldf',<br> size= 40MB,<br> filegrowth= 10MB,<br> maxsize= 100MB<br> )<br> <br> ALTER DATABASE SALES<br> MODIFY FILEGROUP SALESDATA01 DEFAULT<br> GO<br> <br> USE SALES<br> GO<br> <br> CREATE TABLE dbo.EMPLOYEE<br> (<br> EMP_ID INT NOT NULL,<br> EMP_NAME VARCHAR(20) NOT NULL,<br> SALARY DECIMAL(7,2)<br> )<br> ON [SALESDATA01]<br> <br> CREATE TABLE dbo.EMPLOYEE2<br> (<br> EMP_ID INT NOT NULL,<br> EMP_NAME VARCHAR(20) NOT NULL,<br> SALARY DECIMAL(7,2)<br> )<br> ON [PRIMARY]<br> <br> USE SALES<br> GO<br> <br> -- insert data into the EMPLOYEE table in Filegroup SALESDATA01 (the .ndf file)<br> insert into EMPLOYEE<br> values<br> (1,'Harry',2000.50)<br> <br> insert into EMPLOYEE<br> values<br> (2,'Nadia',3000.00)<br> <br> insert into EMPLOYEE<br> values<br> (3,'Albert',5000.00)<br> <br> -- insert data into the EMPLOYEE2 table in Filegroup PRIMARY (the .mdf file)<br> insert into EMPLOYEE2<br> values<br> (1,'Harry',2000.50)<br> <br> insert into EMPLOYEE2<br> values<br> (2,'Nadia',3000.00)<br> <br> insert into EMPLOYEE2<br> values<br> (3,'Albert',5000.00)<br> <br> CREATE NONCLUSTERED INDEX indx_employee_empname ON EMPLOYEE(EMP_NAME) ON SALESINDEX01<br> GO<br> <br> <br> <font face="arial" size=2 color="black"> Ok, we have a database right now.<br> <br> <B>Get a Hex editor:</B><br> <br> Next, to emulate several crashes, we need a <I>hex editor</I>, by which we can open a database file,<br> and "corrupt" the file manually.<br> There are several choices here. You might consider the free utility XVI32, which you can download from<br> the site below.<br> <br> http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm<br> <br> You only have to download the zip file, and extract it. Then you are ready to run it.<br> There exists many alternatives too. Just "google" on "hex editor".<br> But I really recommend xvi32, due to its simplicity and capabilities.<br> <br> <font face="arial" size=2 color="black"> <br> <h2>Chapter 2: A "soft" corruption on an ordinary data page (1).</h2> This section primarily deals with a "soft corruption" based error, in an ordinary data page,<br> and how to understand it. In this section we used SQL Server 2008 32bit.<br> This type of corruption is not very severe. We will do the corruption ourselves, by using<br> a hexeditor, and change some value in a datapage while the database is offline.<br> In this case, we do not "destroy" a page by killing a pageheader or other critical parts.<br> This section only deals with a <I>data</I> alteration, but which will be detected by SQL Server<br> and it will throw an errormessage (if the "page verify" option is higher than "none').<br> <br> The involved error message is:<br> <br> <font face="courier" size=2 color="black"> Message 824: logical consistency-based I/O error: incorrect checksum <br> <br> <font face="arial" size=2 color="black"> As we have seen in Section 1, we have put three records in the EMPLOYEE table. This table is stored<br> in the SALESDAT01 filegroup, which consist of the "c:\mssql\data\SALES_DATA_01.ndf" file.<br> <br> In section 2.1, we will explore where those records are located, that is, in which page the records are stored.<br> Various methods exists. Here we will use the DBCC PAGE() function.<br> <br> In section 2.2, we will use a hexeditor (like xvi32), to emulate a soft corruption, by altering<br> an Employee name, directly in the file itself. Thus we bypass SQL Server, (using the hexeditor) while the sales database is offline.<br> Here we alter a 'non critical page', meaning that we just alter a datapage.<br> What is the effect?<br> <br> <h3>Section 2.1: DBCC PAGE()</h3> Now, the three records of the EMPLOYEE table, are likedly to be located at the "first" usable datapage<br> in the "SALES_DATA_01.ndf" file, which (in my case) is page 8.<br> <br> You know that <B>the very first pages</B> in any database file, are used for administrative purposes, like the "fileheader page" (page 0),<br> the "Page Free Space (PSF) page" (page 1) etc.. It's easy to check that our records are in the 8th page, <br> using the DBCC PAGE() command. Let's try that.<br> <br> First use "DBCC TRACEON (3604)" in your database, in order to view output of the DBCC PAGE() command,<br> otherwise we won't see any output at all.<br> <br> <font face="courier" size=2 color="black"> DBCC TRACEON (3604)<br> GO<br> <br> <font face="arial" size=2 color="black"> Since a database usually consists of several datafiles, a page in a database, is uniquely identified using the "file_id" <br> <I>and</I> the "pagenumber" (in that file), like for example (1,355), meaning page 355 in file number 1.<br> let's first see which file_id's exists in our sales database. This is shown in the figure below.<br> Just use the query <B>"select * from sysfiles"</B>.<br> <br> <img src="sqlcorr1.jpg" align="center"/> <br> So, let's take a look at the database page in file number 3 (SALES_DATA_01.ndf), which contains the three records.<br> We are going to use DBCC PAGE() for that. The command takes a number of parameters, which are:<br> <I>database, file_id, page number, output_mode.</I><br> So, the following would be our statement, to view the contents of page 8:<br> <br> <br> <font face="courier" size=2 color="black"> DBCC PAGE('sales',3,8,3)<br> <br> <font face="arial" size=2 color="black"> This command produces a lot of output, of which below a small sample is shown:<br> <br> <font face="courier" size=2 color="black"> ..<br> Slot 0 Column 2 Offset 0x14 Length 5 Length (physical) 5<br> EMP_NAME = Harry <br> Slot 0 Column 3 Offset 0x8 Length 5 Length (physical) 5<br> SALARY = 2000.50 <br> ..<br> etc..<br> <br> <font face="arial" size=2 color="black"> Anyway, we have shown that the records are indeed stored in page 8.<br> Also, we have learned how to use the DBCC PAGE(0) command.<br> <br> <h3>Section 2.2: Creating a soft corruption in the page:</h3> Now, with our hexeditor, we are going to open the "c:\mssql\data\SALES_DATA_01.ndf" datafile.<br> First, ofcourse, we put the sales database offline, using the Management Console (SSMS), or with the following<br> SQL command:<br> <br> alter database sales set offline<br> <br> So, now start your hexeditor (like "xvi32") and open "c:\mssql\data\SALES_DATA_01.ndf"<br> Since page 8 starts at 8 x 8192= 65536, which is "10000" in hexidecimal, let's scroll down to offset 10000.<br> Below you can see the contents of the file, at that position.<br> <br> <img src="sqlcorr2.jpg" align="center"/> <br> As you can clearly see in the right pane, you see the names of our records, like "Albert".<br> <br> Now, let's "emulate" a soft corruption in that page. Navigate to the string Albert and replace that string<br> with "Nelson". What we have done here, is just replace a piece of text, with some other characters of the same length.<br> Now, save the file, and close your hexeditor.<br> Secondly, try to put the database online, using "alter database sales set online".<br> <br> As you see, the database will open without any complaints. But, actually we <I>have</I> altered a page!<br> Why does SQL Server does NOT throw an error message at us?<br> <br> It would have done that right away, at the moment it was trying to set the database online, if we would have altered<br> a <B>critical page</B>, like page 0 (the fileheader), or the bootpage (page 9 in the .mdf file).<br> Now, we cannot expect SQL Server to check all pages of the database. This is just an ordinary datapage.<br> But it will detect the "corruption" at the moment it accesses that page.<br> <br> <B>Suspect Mode:</B> If you have some sort of corruption, and for example, after a restart SQL Server needs<br> to <I>rollforward or rollback transactions</I>, but it can't for some reason (for example, due to one or more corrupt tables),<br> it will likely place the database in <B>"Suspect"</B> mode.<br> There are other reasons too, as to why SQL Server might place a database in "Suspect" mode. In later chapters, we will see<br> how to deal with this state of a database.<br> <br> Let's try to query the EMPLOYEE table:<br> <br> <font face="courier" size=2 color="black"> SELECT * FROM EMPLOYEE<br> <br> <font face="arial" size=2 color="black"> This will SQL Server accesses the page, and it will discover the alteration we have done.<br> The following output appears:<br> <br> <font face="courier" size=2 color="black"> Msg 824, Level 24, State 2, Line 1<br> SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x794965ef; actual: 0x724e6c66). <br> It occurred during a read of page (3:8) in database ID 5 at offset 0x00000000010000 in file 'c:\mssql\data\SALES_DATA_01.ndf'.<br> Additional messages in the SQL Server error log or system event log may provide more detail. <br> This is a severe error condition that threatens database integrity and must be corrected immediately.<br> <br> <font face="arial" size=2 color="black"> Note that indeed the page involved was page 8 (in file no 3). At the moment that SQL Server accessed the page, it was discovered<br> that the current checksum is different from the checksum stored in the pageheader.<br> Indeed, by altering the Employee name, we altered (a newly computed) checksum as well !<br. <br> Ok, but how to recover from this situation.<br> <br> Did you noticed how SQL server only complained about the wrong checksum?<br> Physically, there is nothing wrong with the page and it's data. We only changed an Employee name.<br> <br> Take a look at this. In SSMS, rightclick the sales database, and chooce "properties".<br> In the "Options" pane, you will see an option called "page verify". The possible options are:<br> Torn Page Detection, Checksum, None.<br> <br> <img src="sqlcorr3.jpg" align="center"/> <br> <br> Now, just change the "page verify" to None, and run the query again.<br> <br> This time, SQL Server does not care anymore on changed page checksums, so it just returns the records from the table,<br> although "formally" a corruption exists.<br> <br> It's also instructive to run the DBCC CHECKTABLE() command on the EMPLOYEE table.<br> <br> While the page verify option is still none, DBCC CHECKTABLE(EMPLOYEE) comes with the following output.<br> <br> <font face="courier" size=2 color="black"> DBCC CHECKTABLE(EMPLOYEE)<br> <br> DBCC results for 'EMPLOYEE'.<br> Msg 8951, Level 16, State 1, Line 1<br> Table error: Table 'EMPLOYEE' (ID 2073058421). Data row does not have a matching index row in index 'indx_employee_empname' (ID 2). Possible missing or invalid keys for the index row matching:<br> Msg 8955, Level 16, State 1, Line 1<br> Data row (3:8:2) identified by (HEAP RID = (3:8:2)) with index values (EMP_NAME = 'Nelson' and HEAP RID = (3:8:2)).<br> Msg 8952, Level 16, State 1, Line 1<br> Table error: Table 'EMPLOYEE' (ID 2073058421). Index row in index 'indx_employee_empname' (ID 2) does not match any data row.<br> Possible extra or invalid keys for:<br> Msg 8956, Level 16, State 1, Line 1<br> Index row (4:8:0) with values (EMP_NAME = 'Albert' and HEAP RID = (3:8:2)) pointing to the data row identified by (HEAP RID = (3:8:2)).<br> There are 3 rows in 1 pages for object "EMPLOYEE".<br> CHECKTABLE found 0 allocation errors and 2 consistency errors in table 'EMPLOYEE' (object ID 2073058421).<br> repair_rebuild is the minimum repair level for the errors found by DBCC CHECKTABLE (SALES.dbo.EMPLOYEE).<br> DBCC execution completed. If DBCC printed error messages, contact your system administrator.<br> <br> <font face="arial" size=2 color="black"> <br> Note also, how the command found that the index entry (of Albert) does not match the table data (which is Nelson).<br> <br> In fact, with this type of soft corruption, you can "repair" the table (actually page 8), and the index entry, by using<br> <br> <font face="courier" size=2 color="black"> DBCC CHECKTABLE(EMPLOYEE,repair_rebuild)<br> <br> <font face="arial" size=2 color="black"> This command will repair the checksum, and makes sure that the index is in sync with the tabledata.<br> If you change the "page verify" option back to CHECKSUM, you can query the table and no inconsistencies are reported.<br> <br> I admit that this section was more for demonstration purposes only, than for a real recovery operation.<br> Anyway, it learned us a lot about certain details of corruption.<br> <br> Suppose you encounter a situation similar as above, with many checksum errors on many pages.<br> Then, by just changing the "page verify" database option, you might be able to select all records of the effected tables.<br> But that condition is a bit "disturbing", ofcourse. You might consider:<br> <br> <B>1: Use DBCC CHECKTABLE(table_name,repair_rebuild)</B><br> Just try the upper command on all effected tables.<br> <br> <B>2: "rebuilding" the effected tables into new tables.</B><br> Essentially this means putting the data into <i>new pages</I>, with correct checksums.<br> You can simply insert the data from the old tables into new tables (if "select *" works, with page verify=NONE)<br> You can use statements like:<br> <br> SELECT * INTO table_b <I>-- create table_b "on the fly"</I><br> FROM table_a<br> <br> INSERT INTO table_b <I>-- table_b need to exist already</I><br> SELECT * FROM table_a<br> <br> (Note: it sounds easy, but could be complicated due to PK-FK relations, renaming tables, triggers etc..)<br> <br> <B>3: Restore individual pages from backup.</B><br> In principle, if your database uses the "Full Recovery model" (or Bulk-logged model), then you<br> are able to restore an individual page, in case that page is corrupt.<br> The syntax is like in the following example:<br> <br> RESTORE DATABASE SALES PAGE = '2:155' FROM DISK='d:\backups\sales.dmp' WITH RECOVERY<br> <br> Here, we restore the pagenumber 155 in fileno 2, from the diskbased backup.<br> <br> <br> <h2>Chapter 3: Severe corruption on ordinary data pages.</h2> In section 2, we only changed an employee name in a data page, using a hex editor.<br> In fact, it wasn't a severe form of corruption, but it was corruption allright.<br> <br> In this section, I will put a lot of records into the EMPLOYEE table. Then, when the database is offline,<br> I will use xvi32 to wipe out pages N and N+1. I will start somewhere "in the middle" of page N, all the way<br> through to the middle of page N+1, and fill that region with "all zero's".<br> Actually, <I>this is</I> a severe corruption because I cross pageboundaries and also destroy a page header.<br> <br> What will be the effect, and how can we resolve the situation?<br> <br> If you want follow along the exercise, then delete the sales database, and create it again using the script of section 1.<br> Then, only create the EMPLOYEE and EMPLOYEE2 tables, but don't insert the 3 records per table.<br> Also, do not create the index (the last statement).<br> <br> Ok, let's insert a bunch of records in the EMPLOYEE table.<br> Use the following script to insert 9999 records in the EMPLOYEE table<br> <br> <font face="courier" size=2 color="black"> DECLARE @i INT<br> <br> SET @i=1<br> <br> WHILE (@i<10000)<br> BEGIN<br> INSERT INTO EMPLOYEE<br> VALUES (@i,'Harry',1000)<br> SELECT @i=@i+1<br> END<br> <br> <font face="arial" size=2 color="black"> Thanks to the @i, we have unique records in the table, although the employee name and salary are same in all records.<br> Actually, for our purpose, it does not matter at all, as long as we are able to destroy a few datapages,<br> with good pages on the "left" and "right" of pages N, and N+1.<br> Ofcourse, if you like you can insert many more rows, by just varying the while clause, like for example using<br> (@i<1000000), which will insert a million rows.<br> <br> Actually, how many pages are allocated to the EMPLOYEE table right now? Various options exists, but we can use<br> the DBCC PAGE() command again, and this time request information from "page 1" in "filenumber 3" (sales_data_01.ndf).<br> This is the Page Free Space (PSF) page, which stores information about the allocation and free space of pages.<br> So, let's try that:<br> <br> <font face="courier" size=2 color="black"> DBCC TRACEON (3604)<br> GO<br> <br> DBCC PAGE('sales',3,1,3)<br> GO<br> <br> Output:<br> <I>(skipping a lot of lines, only showing the last lines...)</I><br> <br> PFS: Page Alloc Status @0x4416C000<br> <br> (3:0) - (3:3) = ALLOCATED 0_PCT_FULL<br> (3:4) - (3:5) = NOT ALLOCATED 0_PCT_FULL<br> (3:6) - (3:7) = ALLOCATED 0_PCT_FULL<br> (3:8) - = ALLOCATED 100_PCT_FULL Mixed Ext<br> (3:9) - = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext<br> (3:10) - (3:16) = ALLOCATED 100_PCT_FULL Mixed Ext<br> (3:17) - (3:23) = NOT ALLOCATED 0_PCT_FULL<br> (3:24) - (3:50) = ALLOCATED 100_PCT_FULL<br> (3:51) - = ALLOCATED 50_PCT_FULL<br> (3:52) - (3:5119) = NOT ALLOCATED 0_PCT_FULL<br> <br> <font face="arial" size=2 color="black"> What we see here, is that for example pages 3:24 to 3:50 are allocated to the EMPLOYEE table, and are filled with records.<br> So, if I kill pages 30 and 31, we have the situation that we want to investigate.<br> <br> Next, put the sales database offline.<br> Start xvi32, and open the "c:\mssql\data\SALES_DATA_01.ndf" datafile. <br> In xvi32, we want to go to database page 30. And 30x8192=245760, or in hexidecimal that would be 3C000.<br> The 31st page, would start at 3E000. So, I will go to somewhere in between 3C000 and 3E000, and fill<br> a region with all 0's, crossing the boundary to the next page.<br> <br> Save, the file, and close xvi32.<br> Next, put the sales database online. As expected, the SQL Server does not complain at all.<br> <br> Now let's query the EMPLOYEE table.<br> <br> <font face="courier" size=2 color="black"> SELECT * FROM EMPLOYEE<br> <br> <font face="arial" size=2 color="black"> We will see a number of records pass the screen, as SQL Server will just read the table from the beginning.<br> Then, it will try to read the corrupt page, and an error message appears:<br> <font face="courier" size=2 color="black"> <br> Msg 824, Level 24, State 2, Line 1<br> SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x90ee5853; actual: 0x949c5001).<br> It occurred during a read of page (3:30) in database ID 5 at offset 0x0000000003c000 in file 'c:\mssql\data\SALES_DATA_01.ndf'.<br> This is a severe error condition that threatens database integrity and must be corrected immediately.<br> Complete a full database consistency check (DBCC CHECKDB).<br> <br> <font face="arial" size=2 color="black"> This time, it will <B>not help</B> to put the database option "page verify" from "CHECKSUM" to "NONE", since this time<br> we not dealing with just a "soft" corruption. No, this time, two database pages are truly destroyed.<br> <br> How can we recover from this situation?<br> Let's try to repair the table with:<br> <br> Method 1. using SQL server repair methods like DBCC CHECKTABLE(table_name, repair_clause).<br> Method 2. Our own "monkey" method where we try to salvage as many rows as possible.<br> <br> <h3>Method 1: DBCC CHECKTABLE (table_name,repair_allow_data_loss)</h3> Let's first SQL Server check the integrety of the table, using the DBCC CHECKTABLE(EMPLOYEE) command.<br> <br> <font face="courier" size=2 color="black"> DBCC CHECKTABLE(EMPLOYEE)<br> <br> <I>A lot of output appears, but the most important lines are the following:</I><br> <br> ..<br> repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (SALES.dbo.EMPLOYEE).<br> <br> <font face="arial" size=2 color="black"> Ok, so if we want SQL Server to repair the table, our only chance is by using <br> the DBCC CHECKTABLE (EMPLOYEE,repair_allow_data_loss) command.<br> <br> Here, SQL Server warns us that we may experience possible dataloss. In our case, the data in a few pages<br> is <I>really lost</I>. But if we can save the majority of rows in the EMPLOYEE table, the damage is somewhat limited,<br> so let's try that command.<br> <br> First, in order to use the command, you need to put the database in "Single User Mode". After you did that, we try the SQL Server repair method:<br> <br> <font face="courier" size=2 color="black"> DBCC CHECKTABLE (EMPLOYEE,repair_allow_data_loss)<br> <br> <I>A lot of output appears, but the most important lines are the following:</I><br> <br> ..<br> Repair: The page (3:30) has been deallocated from object ID 2073058421, index ID 0,<br> partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data).<br> Repair: The page (3:31) has been deallocated from object ID 2073058421,<br> index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data).<br> ..<br> Msg 8928, Level 16, State 1, Line 1<br> The error has been repaired.<br> ..<br> <br> <font face="arial" size=2 color="black"> Yes, we have succes! SQL Server essentially deallocated pages 30 and 31 (as expected), and moved some pointers<br> so that all pages of the table have pointers to the former and next page.<br> The table is repaired, only we miss the data from pages 30 and 31, which was unavoidable ofcourse.<br> Again, we can perform queries on the EMPLOYEE table in the usual way.<br> <br> The following figure shows clearly what is missing:<br> <br> <font face="courier" size=2 color="black"> SELECT * FROM EMPLOYEE<br> <br> <I>skipping a lot of records</I><br> ..<br> 3987 Harry 1000.00<br> 3988 Harry 1000.00<br> 3989 Harry 1000.00<br> 3990 Harry 1000.00 <- here the "gap" starts<br> 4561 Harry 1000.00 <- here the "gap" ends<br> 4562 Harry 1000.00<br> 4563 Harry 1000.00<br> 4564 Harry 1000.00<br> 4565 Harry 1000.00<br> 4566 Harry 1000.00<br> 4567 Harry 1000.00<br> ..<br> <br> <font face="arial" size=2 color="black"> From the 9999 records, we miss 571 records. That's the penalty of killing 2 database pages.<br> But since our records <B>are so small</B>, many records will fit in a database page. So, with many other<br> true production tables, you might "only" miss just a few records.<br> I am not saying that it's perfect, but at least you have a sound table back, with the majority of records.<br> <br> <h3>Method 2: Selecting "around" corrupt pages, just using queries.</h3> In the unlikely event, that "method 1" does not work for some reason, and you still have the corrupt table,<br> you might consider this method:<br> It's also possible to "select around" the corrupt pages. In effect, you just use SQL to select as much rows<br> as you can, and skipping the rows on the damaged pages.<br> This "trick" usually is not easy. It certainly helps if the table uses a unique key, and that an index is defined<br> on that key.<br> <br> Still, using no more than common sense, we can imaging that we can just select rows, and put them in a copy table<br> as long as SQL Server does not "touches" the damaged pages.<br> <br> Here is a trivial (wrong) example that intuitively might look allright (but most often don't work well)<br> in order to try to salvage 'sound" rows from the corrupt table.<br> <br> <font face="courier" size=2 color="black"> <br> begin try<br> declare @i int<br> set @i=0<br> <br> while (@i<100000)<br> begin<br> insert into employee_copy<br> select * from employee where emp_id>@i and emp_id<@i+10<br> select @i=@i+10<br> end<br> end try<br> <br> begin catch<br> print @i<br> end catch<br> <br> <font face="arial" size=2 color="black"> As you can see, we try to start at "the top" of the table, and work our way right trough it, copying rows<br> to a copy table. Until SQL Server meets the corrupt page ofcourse.<br> Then we could restart this game (setting @i at an appropriate value) to process the records "below" the<br> damaged pages.<br> Well, it could work. But the above code is way too simple and often does not work.<br> That's why Chapter 7 will go into some details of this method.<br> <br> <h3>Recap:</h3> In Chapters 2 and 3, we have seen how we can handle corruption on ordinary datapages.<br> <br> <ol> <li>We have seen what happens if a "soft" corruption in a tablepage have occurred, which we solved without dataloss.</li><br> <li>We have seen what happens if a "hard" corruption in a few tablepages have occurred, which we solved with some dataloss.<br> In the latter case, a few table pages were truly destroyed, so, some dataloss was actually unavoidable.</li><br> </ol> <br> => <U><B>For finding problems, we used: </B></U><br> <br> <B>DBCC CHECKTABLE(table_name)</B><br> <br> This will not repair or alter anything, but the output will show you (in most cases) what's wrong, and might give<br> a hint on which minimum level of DBCC command should be used next.<br> <br> In order to check the whole database in one run, you can use:<br> <br> <B>DBCC CHECKDB(database_name)</B><br> <br> Here too, only problems will be reported, but not repaired. Since it checks the whole database, possibly multiple<br> objects are reported.<br> Ofcourse, it's recommended to check your databases on a regular basis.<br> <br> => <U><B>For solving the softcorruption of (a) tablepage(s) we used:</B></U><br> <br> (1): Put the "PAGE_VERIFY" database option to NONE, and see if you now can select the whole table.<br> If you can retreive all records, multiple options exists to rebuild the data (like INSERTING rows into a copy table).<br> But you likely have no unsurmountable problems, so you just might as well try to repair the table.<br> <br> (2): In order to repair the table, we used:<br> <br> <B>DBCC CHECKTABLE(table_name,repair_rebuild)</B><br> <br> This will not lead to dataloss, since it will only rebuild checksums, pointers, and likely indexentries too.<br> <br> Here too, a similar "database wide" DBCC CHECKDB command exists as well. it's:<br> <br> <B>DBCC CHECKDB(database_name, repair_rebuild)</B><br> <br> However, if you have multiple corrupt objects, I still would recommend to handle them one a the time.<br> <br> => <U><B>For solving the hard corruption of (a) tablepage(s) we used:</B></U><br> <br> <B>DBCC CHECKTABLE (table_name,repair_allow_data_loss)</B><br> <br> It does not neccessarily lead to dataloss. In the example we have used, it was a brutal attack on datapages, fully<br> destroying them. Obviously, some dataloss after repair was unavoidable.<br> In general, it's likely (but not always so) that the "bad" pages gets deallocated from the object.<br> <br> Here too, a similar "database wide" DBCC CHECKDB command exists as well. it's obviously:<br> <br> <B>DBCC CHECKDB(database_name, repair_allow_data_loss)</B><br> <br> However, if you have multiple corrupt objects, I still would recommend to handle them one a the time.<br> <br> As another option for repair, is the use of "smart" queries (the monkey method) where we try to select "around"<br> the bad pages. However, certain conditions should be in place, like a unique key.<br> This will be the subject of Chapter 8.<br> <br> Please note that all of the above handle ordinary datapages. If a "special" page in a database file<br> get corrupted (like page 0: the header page, or page 1: the PFS page, or page 2: the GAM page, etc..)<br> we have a larger problem, and in many cases, DBCC commands will not be of help.<br> <br> <I> Note: if you would have determined that the object is "just" an index, you might just DROP and CREATE the index again.<br> An index is just "derived" data, and can be rebuild without loss of tabledata. You only have to keep in mind<br> that the index might support a "constraint". If that's true, there are some follow-up actions.<br> </I> <br> <br> <h2>Chapter 4: Inaccesible or Suspect database, due to a Missing or corrupt transaction log. Msg 945.</h2><br> In this chapter, we will explore what we can do when the Transaction log is missing, or is corrupt.<br> <br> -If the Transaction Log is missing, your database will be "inaccesible". You will get an errormessage that will<br> inform you to that fact.<br> -If the Transaction Log is really corrupt, the database might be inaccesible, or it will be put in "Suspect" mode.<br> -If you have corrupt objects like tables and indexes, and SQL Server needs to rollback or rollforward data<br> related to those tables, the database will be placed in Suspect mode. Thats why after a crash, a database might<br> show up as Suspect. In chapter 3 we have seen, that even if some ordinary <B>datapages</B> (table or index) are hard corrupted, but they<br> do not need to be accessed during the start (at the recovery phase), SQL Server will just start normally.<br> <br> As said before, in this chapter we will see what we can do if the Transaction log is corrupted or missing.<br> Ofcourse, such a small note as this one, cannot list all possible solutions. But three realistic solutions<br> are presented here, which are:<br> <br> <B>(1) ALTER DATABASE DATABASE_NAME SET EMERGENCY</B> <I>not recommended.</I><br> This allows you to SELECT all tables, and thus you can salvage all data. But this method is not recommended,<br> since below two methods are listed that might repair your database.<br> However, the method is often used to get out of the Suspect state, and to go to Emergency state.<br> <br> <B>(2) ALTER DATABASE DATABASE_NAME REBUILD LOG ON (NAME=logical_name,FILENAME=physical_name)</B><br> This method is recommended. In many cases, if the transacionlog files were gone, it will rebuild<br> a new transaction logfile. If it succeeds, you have an operational database.<br> <br> <B>(3) CREATE DATABASE DATABASE_NAME ON (FILENAME = 'path to mdf file') FOR ATTACH_FORCE_REBUILD_LOG</B><br> If method (2) has failed, possibly due to the fact that SQL Server thinks the database is "recovery_pending"<br> or due to other reasons, we can try to "detach" and "attach" the database again, with a ATTACH_FORCE_REBUILD_LOG clause.<br> There is a warning though! Preferrably, create copies of your databasefiles to another disk or folder(s).<br> <br> <h3>4.1 Simulating a Missing (or corrupt) Transaction log file.</h3> Please create the SALES database again, using the full script as listed in chapter 1.<br> <br> Again, we have a sound database, with a couple of tables which we can access in the usual way.<br> Now, to simulate a serious error condition, in which the transaction log file is missing, do the following:<br> <br> - Shut down the SQL Server service.<br> - Navigate the filesystem to the "c:\mssql\data" folder.<br> - Delete the "SALES_LOG_001.ldf" transaction log file.<br> - Start the SQL Server service again.<br> <br> next, open SQL Server Management Studio (SSMS), and try to access the SALES database.<br> Likely, SSMS will throw the error message "SALES is not accessible.." to you.<br> In SSMS, you also see that next to the SALES database icon, there is no plus sign (+) to examine objects in that database. <br> If we take a look at the SQL Server log, we find entries like the following:<br> <br> <font face="courier" size=2 color="black"> File activation failure. The physical file name "c:\mssql\data\SALES_LOG_001.ldf" may be incorrect.<br> The log cannot be rebuilt because the database was not cleanly shut down.<br> Msg 945, Level 14, State 2, Line 1<br> Database 'SALES' cannot be opened due to inaccessible files or insufficient memory or disk space.<br> Msg 5069, Level 16, State 1, Line 1<br> ALTER DATABASE statement failed.<br> <br> and maybe also: <br> FileMgr::StartLogFiles: Operating system error 2(error not found) occurred while creating or opening file <br> 'c:\mssql\data\SALES_LOG_001.ldf'. >br> Diagnose and correct the operating system error, and retry the operation.<br> <br> <font face="arial" size=2 color="black"> We are not able to open and access the SALES database. There are several strategies that we can follow.<br> The next sections will discuss various methods.<br> <br> <h3>4.2 Solution 1: Perform the neccessary Trivial checks</h3> Here, ofcourse we <I>know</I> that the logfile is missing, but in general you should quicly check the following trivial items:<br> <br> - did somebody accidently <i>moved</I> database files (on your Server) to another location?<br> - is the file perhaps in the Recycle bin?<br> - do you have an NTFS Undelete utility (there are even a couple of free tools available)<br> - are perhaps the filesystem permissions changed, so that the service cannot access the file?<br> <br> We are not going to undelete the file, and we further treat this case as if the logfile is permanently missing.<br> <br> <h3>4.3 Solution 2: place the Database in Emergency Mode (not recommended)</h3> <br> If it's clear from the eror messages, that <I>it is the Transaction log, that is missing or corrupt</I>,<br> then this method is not recommended.<br> But in other circumstances, for example, when you first see a "Suspect" database, then it could be<br> a good action.<br> <br> But, suppose we get spooked by now, because we know we have a serious problem. Although better methods exists to get "life"<br> again, we might decide to retrieve the information from the SALES database, as soon as possible.<br> <br> In many cases, if you can't access a database (for example, due to missing transaction log files, or due to corrupt objects), you can place a database in "Emergency Mode".<br> If the database was "cleanly" shutdown before the deletion of the transaction logfile, it probably will work.<br> <br> If it works, a database is then effectively to be considered as "read-only", but we are able to SELECT all tables,<br> and copy the data to another database. Also, export tools as "bcp" will work.<br> There is no garantee that it will <I>always</I> work, but in many cases, it really does.<br> <br> Let's try it. Start a Query Window, and execute the following TSQL statement:<br> <br> <font face="courier" size=2 color="black"> ALTER DATABASE SALES SET EMERGENCY<br> <br> <font face="arial" size=2 color="black"> <I>If it worked</I>, then if you refresh the databases icon in SSMS, you will now see the "+" appear before the SALES database,<br> and you can view the objects as usual. Effectively, the database is read-only, but you are able to retrieve the data.<br> Just try to select the rows from the EMPLOYEE table, to check if it works.<br> <br> At least this solution is a method to salvage all your data, but it is not recommended.<br> <br> It can also be interresting to take a look at a couple of systemviews, like for example:<br> <br> <font face="courier" size=2 color="black"> select substring(name,1,30), database_id, state, state_desc from sys.databases<br> select * from sys.dm_tran_database_transactions<br> <br> <font face="arial" size=2 color="black"> <br> <h3>4.4 Solution 3: Rebuild the Transaction Log (recommended)</h3> Several approaches exists to let SQL Server create a new Transaction logfile.<br> <br> Once, we had the DBCC REBUILD_LOG() statement to our disposal, but it's not available over all versions,<br> so we don't consider that one.<br> <br> What we will try instead is the following command:<br> <br> ALTER DATABASE SALES REBUILD LOG ON (NAME=logical_name,FILENAME=physical_name)<br> <br> So, let's try it:<br> <br> <font face="courier" size=2 color="black"> ALTER DATABASE SALES REBUILD LOG ON (NAME=Sales_Log,FILENAME='c:\mssql\data\sales_log.ldf')<br> <br> The following output hopefully appears:<br> <br> Warning: The log for database 'SALES' has been rebuilt. Transactional consistency has been lost.<br> The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were.<br> You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode.<br> When you are ready to make the database available for use, you will need to reset database options and delete any extra log files. <br> <font face="arial" size=2 color="black"> <br> If the upper output appeared, we have succes! If now you refresh SSMS, the Emergency mode is replaced by "Restricted User" mode.<br> Now, change the database properties so that the Database is multi-user again.<br> Also, we have a rebuild transaction log, and you can add files, to what you see is appropriate for the Log.<br> <br> <h3>4.5 Solution 4: Rebuild log with CREATE DATABASE.. ATTACH_FORCE_REBUILD_LOG).</h3> If the solution of section 4.4 has failed, we are still not out of business yet.<br> Indeed, in some cases, SQL Server might "think" that a database is in a "RECOVERY_PENDING" state.<br> Then, if a Log is lost, it might just protest in rebuilding the transaction logfile, but there are other causes as well.<br> In this simple note, we emulate problems with the transaction log, by just deleting it (while the database is offline).<br> But there could be really complex problems due to (what SQL Server thinks) are unfinished transactions that must rollback.<br> <br> Below, is actually a bit of a trick, so if you are working right now on the SALES demo database, you do not have to take<br> any precautions.<br> <br> But if you want to apply the following method to a database that is of any value, I would like you to create copies of the<br> database files (e.g. to another disk, or folder). So, you might consider shutting down the SQL Server service, and make copies.<br> <br> Anyway, we go ahead with SALES database.<br> Please drop it, and create it again using the full script in Chapter 1.<br> <br> Again, we have a sound database, with a couple of tables which we can access in the usual way.<br> Now, just as in section 4.1, do the following:<br> <br> - Shut down the SQL Server service.<br> - Navigate the filesystem to the "c:\mssql\data" folder.<br> - Delete the "SALES_LOG_001.ldf" transaction log file.<br> - Start the SQL Server service again.<br> <br> If you try to access the database, you get the familiar errormessages, just as described in section 4.1.<br> <br> This time we are going to "detach" the database, so that SQL Server places it in a specific state, just as if<br> it does not know the SALES database anymore.<br> But again, it's advisable to create copies of the databasefiles first (if possible), while SQL Server is down<br> <br> Let's try the following:<br> <br> <font face="courier" size=2 color="black"> sp_detach_db sales<br> <br> <font face="arial" size=2 color="black"> Now, you may receive output that the command completed, but it's more likely you get output similar to this:<br> <br> <font face="courier" size=2 color="black"> Msg 947, Level 16, State 1, Line 1<br> Error while closing database 'SALES'. Check for previous additional errors and retry the operation.<br> <br> <font face="arial" size=2 color="black"> Now, don't worry about the message above. The database is de-registered from SQL Server<br> Also, we still have the original database files (.mdf and .ndf files), and next we tell SQL Server to attach<br> the SALES database, just as if you were moving it from one Server to another.<br> <br> <font face="courier" size=2 color="black"> CREATE DATABASE SALES<br> ON (FILENAME = 'c:\mssql\data\sales.mdf')<br> FOR ATTACH_FORCE_REBUILD_LOG<br> <br> <font face="arial" size=2 color="black"> You probably get the following output, which actually means succes, if you inspect the second line closely:<br> <br> <font face="courier" size=2 color="black"> File activation failure. The physical file name "c:\mssql\data\SALES_LOG_001.ldf" may be incorrect.<br> New log file 'c:\mssql\data\SALES_log.LDF' was created.<br> <br> <font face="arial" size=2 color="black"> Next, try if you can for example select the EMPLOYEE table again.<br> Hopefully it works, and we conclude that we have repaired the SALES database.<br> <br> <br> <h2>Chapter 5: Inaccesible or Suspect database, due to Severe corruption.</h2> <h3>5.1 A few remarks on databases pages: the ordinary datapages and the special pages</h3> Chapters 2 and 3 dealt with corruption on (ordinary) data pages. These pages will generally hold table- and index data.<br> If objects on ordinary data or index pages went corrupt, then SQL Server did not alter the "state of the database",<br> unless there is a reason to put it into the 'Suspect" state (like that at startup, rollback is needed, but the object is corrupt).<br> <br> You know that a database will have one .mdf (primary) file, and usually one or more .ndf (secondary) files,<br> which will hold (or are supposed to hold), the actual data (tables, indexes etc..). <br> The <B>first couple of pages</B> in the .mdf or any .ndf file, are "special", and will contain (among other things) <br> identification data, and metadata, describing the database, and the (starting)location of all objects in the file.<br> These pages form the 'administration" of the files and the database.<br> <br> A page is a unit of 8192 bytes with a specific structure. The larger part of a database file, is further organized<br> into extents. Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages.<br> <br> The "administration" of pages/extents is covered by some very specific pages:<br> <br> - The "GAM" (Global Allocation Map) and "SGAM" (Shared Global Allocation Map) keeps track of which extents <br> have been allocated, and which are free.<br> - The "PFS" page (Page Free Space page) keeps information about page allocation and free space available on pages.<br> <br> You will find a GAM and SGAM page as page 2 and page 3 at the begin of a file, covering 4G (64000) of extents.<br> PFS pages can be located at multiple places, but you will find one as page 1 at the beginning of a file.<br> By the way, PFS pages are only used for "heaps", which are tables without a clustered index.<br> <br> So, the first pages at the beginning of a database file, have a specific purpose. let's recapitulate them first.<br> <br> <B>Overview Specific Pages:</B><br> <br> <font face="courier" size=2 color="black"> page 0: The very first page in any file. It's the "fileheader page". it contains information about the file itself,<br> like to what "dbid" (database id) it belongs, on which pages the GAM and SGAM (sort of directory pages) are located etc..<br> <br> page 1: The (first) PFS pages. It keeps track of allocated pages and free space in pages.<br> <br> page 2: The first GAM page.<br> <br> page 3: The first SGAM page.<br> <br> pages 4-7: Specific pages which will track differential, changed pages and logging information.<br> <br> Usually, the first usable page (in an .ndf file) for regular data will start from page 8.<br> If an index was created in the database, page 9 could also be the first IAM page, unless the first<br> table creation and inserts of data, spanned a number of pages (as of page 8). Then, IAM's occur at later positions.<br> <br> Only for the primary .mdf datafile, page 10 is the "bootpage". It stores specific metadata, as well as Log Sequence Numbers<br> for all sorts of operations as full backups, differential backups, create index LSN's etc..<br> <br> <font face="arial" size=2 color="black"> let's play around a bit with the DBCC PAGE() and DBCC DBINFO() commands.<br> <br> Let's first create a new sales database again. If you still have the old one, then just drop it using SQL Server management Studio.<br> Now, create a new SALES database using the script of chapter 1. This script creates a database with<br> a specific layout with multiple filegroups, and also will create a few tables on different filegroups.<br> <br> After you have executed the script, let's explore the special pages a bit:<br> <br> <B>Playing around to view some specific pages:</B><br> <br> <font face="courier" size=2 color="black"> DBCC TRACEON (3604)<br> GO<br> <br> <B>-- 1. let's take a look at page 0 (fileheader) in file no 3 (which is "sales_data-01.ndf");</B><br> <br> DBCC PAGE('sales',3,0,3)<br> <br> <img src="sqlcorr4.jpg" align="center"/> <br> Only partial output is shown above.<br> Do you for example notice the location of the GAM and SGAM pages (at the bottom)<br> <br> <br> <B>-- 2. let's take a look at page 1 (PFS) in file no 3:</B><br> <br> DBCC PAGE('sales',3,1,3)<br> <br> <img src="sqlcorr5.jpg" align="center"/> <br> Only partial output is shown above.<br> Do you for example notice the listing of free and allocated pages (at the bottom)<br> <br> <br> <B>-- 3. let's take a look at page 2 (GAM page) in file no 3:</B><br> <br> DBCC PAGE('sales',3,2,3)<br> <br> <img src="sqlcorr6.jpg" align="center"/> <br> Only partial output is shown above.<br> Do you for example notice the listing of free and allocated pages (at the bottom)<br> <br> <br> <font face="arial" size=2 color="black"> <h3>5.2 Inaccesible or Suspect database due to corruption of an .ndf file</h3> Note: for information on what to do at a missing or corrupt Transaction log: please see Chapter 4.<br> <br> Ofcourse, most pages in a database are ordinary data pages (like table- and index) pages. If corruption is found<br> here, we have a good chance to repair it.<br> <br> Situation 1: Suspect database, and/or ordinary data pages in an .ndf file are corrupted, then: <ol> <li> Either SQL Server will start normally if it does not need to access the related objects for recovery purpose.</li><br> <li> Or, it might put the database in a suspect state, if at startup it needs to access the corrupt objects (like at "rollback"/"rollforward").</li><br> </ol> <br> Situation 2: Severe corruption or missing .ndf file.<br> <br> In section 5.1, we have seen a few examples of "special" system pages. They surely exist in the beginning of a database file,<br> but some (might) be present at certain locations throughout the file (like the PFS page, IAM's etc..), depending<br> on the size of the file, and the number of objects.<br> <br> <B>Situation 1: ordinary data pages in an .ndf file are corrupted</B><br> <br> Let's first deal with "Situation 1".<br> Situation 1 is, in general, survivable using the techniques we have seen before.<br> But it must be said, that if the Transaction Log is (physically or logically) corrupt at the same time, it gets a bit harder.<br> <br> Let's first explore the more general case:<br> <br> First, if the database is Suspect, you can try to put the database in Emergency mode.<br> After that, it is (likely) to be accesible. Then you can try the DBCC CHECKDB command to obtain a list of objects<br> which have corrupt pages.<br> Then, it might get clear that you must run DBCC CHECKTABLE with the "repair_allow_data_loss" option, to repair<br> the list of the objects. This might result in (usually slight) dataloss. But at times that I had to use<br> the procedure for "for real", dataloss really did not happen often. So, many corruptions are just logical corruptions,<br> instead of really trashing pages (like we have seen in Chapter 3).<br> <br> To put the above in a "neat" list:<br> <br> (1): Database is suspect. You then try:<br> (2): ALTER DATABASE database_name SET EMERGENCY<br> (3): If it turns out that the database is accesible (for example. you can query a Table), you then try:<br> (4): DBCC CHECKDB(database_name) to obtain a list of corrupt objects and other information.<br> (5): If it turns out, that only tables and/or indexes are affected, then<br> (6): Run DBCC CHECKTABLE(table_name) for those tables. See what what minimum "repair clause" is needed.<br> (7): Run DBCC CHECKTABLE(table_name,repair_rebuild) or DBCC CHECKTABLE (table_name,repair_allow_data_loss).<br> (8): If all tables are done, run DBCC CHECKDB(database_name) again.<br> <br> Hopefully it has worked.<br> <br> If it turns out, that the Transaction log was "bad" also, try the solutions of sections 4.4 or 4.5.<br> <br> <B>Situation 2: Severe corruption in an .ndf file</B><br> <br> It's possible, that in such a situation, "partial database availability", can be of use.<br> <br> Since SQL Server 2005, there is a new "state" in which the database can exist, and it's called "partial database availability".<br> <br> If you are familiar with Oracle, you know that you can put a "tablespace" offline. That feature is available since a very long time.<br> The "Tablespace" concept of Oracle, is very similar to the "Filegroup" concept in SQL Server.<br> <br> In principle, it's now possible to place a Filegroup offline, while the rest of the database is still accessible.<br> Ofcourse, if you only have a few Filegroups, then probably you are not helped much, if a large part of the database is not online.<br> But, even if you have only a small number of filegroups, then you still might have "distributed" the objects in a sort of<br> "functional" meaningfull way, so that the application can still function even if a part of the database is offline.<br> Ofcourse, you cannot expect this to work if you use a simple structure like having one .mdf file and a transaction log.<br> <br> So, there are a few requirements, before you can meaningfully use the "partial database availability".<br> <br> - The database should use the "Full Recovery model" for total recovery, but the feature will still work if the model is "Simple".<br> - There exists sufficient "structure" in the number and purpose of filegroups.<br> <br> let's try demonstrate the "partial database availablity" feature. Create the SALES database again, using the script from Chapter 1.<br> After the SALES database is created again, we know that we have the EMPLOYEE table in the "SALES_DATA_01" file group,<br> while the EMPLOYEE2 table resides in the Primary Filegroup.<br> Now, we execute the following:<br> <br> <font face="courier" size=2 color="black"> ALTER DATABASE SALES<br> MODIFY FILE (NAME = 'SALES_DATA_01' , OFFLINE)<br> GO<br> <br> ALTER DATABASE SALES SET ONLINE<br> GO<br> <br> <font face="arial" size=2 color="black"> So, the tablespace "SALES_DATA_01" (and thus the file "c:\mssql\data\SALES_DATA_01.ndf") is offline,<br> while the rest of the database is still open. <br> Now, what happens if we query the EMPLOYEE and EMPLOYEE2 tables?<br> We expect that we are not able to access the EMPLOYEE table, but querying the EMPLOYEE2 table should work.<br> <br> <font face="courier" size=2 color="black"> SELECT * FROM EMPLOYEE<br> <br> Msg 8653, Level 16, State 1, Line 1<br> The query processor is unable to produce a plan for the table or view 'EMPLOYEE' because the table resides in a filegroup which is not online.<br> <br> <br> SELECT * FROM EMPLOYEE2<br> <br> MP_ID EMP_NAME SALARY<br> <br> 1 Harry 2000.50<br> 2 Nadia 3000.00<br> 3 Albert 5000.00<br> <br> <font face="arial" size=2 color="black"> This feature indeed works. <br> <br> So, as of SQL Server 2005, it is possible to mark a lost or corrupt filegroup as offline, and then bring the rest<br> of the unaffected database online, so that an application might continue to function.<br> Ofcourse, objects that are located on the offline filegroup, are not available.<br> <br> The larger your database is, and the more distributed over different filegroups, the better this feature will work.<br> <br> Ofcourse, you should restore the filegroup and use transaction log backups to fully recover.<br> But at least it is a feature where not your entire database is lost.<br> <br> <br> <br> <h2>Chapter 6: Overview of SQL Server Repair commands.</h2><br> <font face="courier" size=2 color="black"> <h3>1. PROBLEMS WITH TRANSACTION LOG:</h3> See chapter 4 for explanations. You might try one of the following:<br> <br> 1. ALTER DATABASE DATABASE_NAME REBUILD LOG ON (NAME=logical_name,FILENAME=physical_name)<br> <br> 2. CREATE DATABASE DATABASE_NAME ON (FILENAME = 'path to mdf file') FOR ATTACH_FORCE_REBUILD_LOG<br> <br> 3. ALTER DATABASE DATABASE_NAME SET EMERGENCY (not recommended)<br> <br> 4. If everything fails (don't use it, unless out of options):<br> <br> - Detach the database.<br> - Shutdown SQL Server.<br> - Rename .mdf and all .ndf files to .old (all database files)<br> - Create a small blank database with the same name, and exactly same filegroups and physical files.<br> - Detach the new database<br> - Shutdown SQL Server.<br> - Delete the .mdf and .ndf files of blank database, except the log .ldf file.<br> - Rename the .old files to original .mdf and .ndf files.<br> - Start SQL Server<br> - Use CREATE DATABASE DATABASE_NAME ON (FILENAME = 'path to mdf file') FOR ATTACH_FORCE_REBUILD_LOG<br> - If database openend: use DBCC CHECKDB<br> - If database Suspect: ALTER DATABASE DATABASE_NAME SET EMERGENCY<br> - use DBCC CHECKDB, and then DBCC CHECKTABLE() for all affected objects.<br> <br> <br> <h3>2. PROBLEMS WITH OBJECTS LIKE TABLES, CORRUPT DATA PAGES:</h3> See chapters 2 and 3 for explanations. You might try one of the following:<br> <br> 1. Retreiving information on possible corrupt objects:<br> <br> DBCC CHECKDB(DATABASE_NAME) -- only listing, not repairing<br> <br> 2. Establishing level of corruption of a table and associated objects:<br> <br> DBCC CHECKTABLE(TABLE_NAME) -- only listing, not repairing<br> <br> 3. Repair logical corruption like checksum, index entries, pointers:<br> <br> DBCC CHECKTABLE(TABLE_NAME,repair_rebuild) -- no dataloss<br> <br> 4. Repair hard corruption (possible de-allocation of some pages):<br> <br> DBCC CHECKTABLE (TABLE_NAME,repair_allow_data_loss) -- possibly some dataloss<br> <br> 5. REPAIR A COMPLETE DATABASE (not recommended, see also chapter 3):<br> <br> DBCC CHECKDB(database_name, repair_rebuild)<br> or<br> DBCC CHECKDB(database_name, repair_allow_data_loss)<br> <br> Note:<br> If a table has only "soft" corruption (for example wrong checksum on one or more data pages),<br> setting the "page verify" database option to NONE, allows you to select the table again.<br> <br> <h3>3. PROBLEMS WITH A FILE OR FILEGROUP:</h3> See chapter 5 for explanations.<br> <br> => Try to place the "faulty" Filegroup offline.<br> <br> -- Execute below statements as "one" batch:<br> <br> ALTER DATABASE DATABASE_NAME<br> MODIFY FILE (NAME = 'Filegroup_Name' , OFFLINE)<br> GO<br> <br> ALTER DATABASE DATABASE_NAME SET ONLINE<br> GO<br> <br> If it works, the rest of the database is online.<br> <br> => As another possible method:<br> <br> (1): Database is suspect. You then try:<br> (2): ALTER DATABASE database_name SET EMERGENCY<br> (3): If it turns out that the database is accesible (for example. you can query a Table), you then try:<br> (4): DBCC CHECKDB(database_name) to obtain a list of corrupt objects and other information.<br> (5): If it turns out, that only tables and/or indexes are affected, then<br> (6): Run DBCC CHECKTABLE(table_name) for those tables. See what what minimum "repair clause" is needed.<br> (7): Run DBCC CHECKTABLE(table_name,repair_rebuild) or DBCC CHECKTABLE (table_name,repair_allow_data_loss).<br> (8): If all tables are done, run DBCC CHECKDB(database_name) again.<br> <br> <br> <h3>4. SOME SUPPORTING SQL</h3> See chapter 2 and 3 for examples.<br> <br> 1. View a database page:<br> <br> DBCC TRACEON (3604)<br> GO<br> <br> DBCC PAGE('database_name', file_id, pagenumber, output_mode) -- e.g.: DBCC PAGE('sales',3,30,3)<br> <br> 2. Get object_id from object_name, or the other way around:<br> <br> SELECT OBJECT_ID(object_name)<br> SELECT OBJECT_NAME(object_id)<br> <br> 3. Get a list of tables and their associated indexes:<br> <br> SELECT<br> substring(sysobjects.name,1,50) AS TABLENAME,<br> substring(sysindexes.name,1,50) AS INDEXNAME,<br> sysobjects.id, sysindexes.indid, sysindexes.groupid,sysindexes.rows<br> FROM sysobjects, sysindexes<br> WHERE sysobjects.id=sysindexes.id<br> ORDER BY sysindexes.rows desc<br> <br> 4. Kill a session (spid):<br> <br> kill spid -- e.g.: kill 132<br> <br> 5. Show SQL Server version:<br> <br> SELECT @@VERSION<br> <br> 6. Get Windows version:<br> <br> -> start cmd -> winver<br> -> start cmd -> systeminfo<br> -> start cmd -> systeminfo > list.txt (then open list.txt with notepad)<br> <br> 7. Other 2005/2008:<br> <br> <a href="http://antapex.org/sqlserversessions.txt">Showing sessions, spids, and blocked sessions</a><br> <a href="http://antapex.org/sqlserverbackups.txt">Showing backup history from the MSDB database</a><br> <a href="http://antapex.org/sqlserverindexes.txt">Showing index information (number of rows, fragmentation etc..)</a><br> <a href="http://antapex.org/sqlserverfiles.txt">Showing database file info (sizes,paths,%free/%full etc..)</a><br> <a href="http://antapex.org/sqlserverwaits.txt">Showing SQL Server waits</a><br> <a href="http://antapex.org/sqlserverbackup.txt">Note describing SQL Server backup and restore using TSQL</a><br> <br> <font face="arial" size=2 color="black"> <br> <h2>Chapter 7: Using queries to salvage data from corrupt tables.</h2><br> Instead of using repair commands, we might be able to use an alternative method. In this case, we simply use queries.<br> The trick then is, to try to select "around" corrupt pages.<br> <br> Actually, the best way to repair your table, is by using the methods as shown in chapters 2 and 3.<br> But in the unlikely event that those methods does not work, you might save data using queries.<br. <br> We might consider two categories:<br> <br> 1. The table has one or more non-clustered indexes.<br> 2. The table does not have indexes, or those indexes are unusable for our purposes.<br> <br> <B>Case 1: The table has one or more non-clustered indexes:</B><br> <br> Why would indexes help?<br> Suppose a table is so severely damaged, that you are hardly able to query data from the table itself.<br> Now, you know that a nonclustered index is an external object, and it "contains" one or more columns<br> of the table, from all rows of the table.<br> So, if you would use a <I>covering query</I>, that only selects the columns contained in the index,<br> and the Query Optimizer decides to use that index, <U>then</U> the table is <I>not accessed at all!</I><br> And suppose further, that multiple indexes exist, then you might even be able to retreive a relevant amount of data.<br> <br> Let's try it using a very simple example. We will use the example of the "hard" corrupted pages<br> as shown in Chapter 3.<br> Only this time, for the sake of demonstration, the index on the EMPLOYEE table, will use the EMP_NAME and SALARY columns.<br> So, create the SALES database, the EMPLOYEE and EMPLOYEE2 tables, and create a non-clustered index as follows:<br> <br> <font face="courier" size=2 color="black"> CREATE NONCLUSTERED INDEX indx_employee_empname ON EMPLOYEE(EMP_NAME,SALARY) ON SALESINDEX01<br> GO<br> <br> <font face="arial" size=2 color="black"> Now, execute the script that inserts 9999 rows in the EMPLOYEE table.<br> Next, do the procedure as listed in chapter 3, to hard corrupt a couple of data pages of the EMPLOYEE table.<br> <br> As you have seen in Chapter3 (and what you maybe are doing again if you practise this example), you cannot<br> select the EMPLOYEE table: SQL Server will come up with an errormessage preventing you to retreive the rows.<br> <br> Now, with our non-clustered index, we might be able to use this trick. Actually, it's not a "trick" ofcourse,<br> since we will use a "covering" query which forces the optimizer to use the index, and the table is not touched at all.<br> So, now try this:<br> <br> <font face="courier" size=2 color="black"> SELECT emp_name, salary FROM EMPLOYEE WITH (INDEX(indx_employee_empname))<br> GO<br> <br> <font face="arial" size=2 color="black"> SQL Server will return all rows, that is all EMP_NAMES and all SALARIES. Only the EMP_ID is missing,<br> since that is not "part" of the index.<br> <br> True, it's certainly not perfect. But suppose multiple non-clustered indexes on your damaged table exists,<br> all using different columns, then you would be able to construct several covering queries, and you might be able<br> to retreive a relevant amount of data.<br> <br> <br> So, for now I will quit working on this document. Next, I will start a similar note on Oracle.<br> <br> I certainly hope it was of use ! <br> <br> </body> </html>