SQL Server Patches and Hotfix Notification

December 1, 2014 2 comments

I often get asked, if there is an easy way to get notifications for Patches and Hotfixes for SQL Server.  Generally I recommend people keep an eye on http://sqlserverbuilds.blogspot.com.  However I just came across another site that can be used, http://blogs.msdn.com/b/sqlreleaseservices/default.aspx.  Unlike the SQL Server Builds website, you can subscribe to RSS feed on MSDN site, this way whenever Microsoft releases update you can review it right away.  I really give kudos to who ever is maintain SQL Server Builds website; by far the most compressive and easy to use site for SQL Server build details.  So both sites in conjunction, I think are beneficial links to have in your favorites for patching and hotfix release information.

As Microsoft best practices we recommend you keep your servers up-to-date on latest service pack level.  It can be challenging in my scenarios; however, if you know what fixes are being released and what are their impact.  Then you can have an proactive discussion with business owner and management on why certain patch level is required.

For example, lets say the patch required because of memory leak, or corruption fix, etc.  These kind of patches you don’t want to wait for until you run into issue, we need to be proactive about addressing this to maximum the availability and uptime for service provided by SQL Server.

Edmonton SQLPASS Chapter – SQL Presentation on SQL Server 2014: In-Memory OLTP Engine -

November 17, 2014 6 comments

Will be speaking about SQL Server 2014 In-Memory OLTP Engine @ The Edmonton SQLPASS Chapter @ Canadian Western Bank Place.  If your in town please register and drop by.

Edmonton SQLPASS Website

http://edmpass.sqlpass.org/Home.aspx

SQL Server Failover Cluster, Agent Services Failover to come online, Error 435.

November 3, 2014 Leave a comment

This was a troublesome error, because there is not much information available on what is going wrong.  I looked the application log, SQL Agent Error Log, cluster logs; but neither gave me much details on what was causing the issue.  So after research on internet found an article by Tom Green (Blog).  However after looking at the said keys in [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\SQLServerAgent\ values WorkingDirectory and ErrorLogFile were already correct for me; however agent still refused to start up.

So another suggestion Tom makes is it might be permission issue on directory.  Which were also correct; slowly running out of options…. but …

It dawned on me this key was recently imported because of corruption issue; and with Tom’s article’s hint, I figured what if permissions in registry got deleted.

Sure enough SQL Agent services had lost permission to [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\SQLServerAgent\ hive.  After adding the agent account back in with Full Control, services started without problems.

  1. Run regedit.msc
  2. Navigate to [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\
  3. Right click on SQLServerAgent, go to Permissions.
  4. Click Add…
  5. If using Service SIDS …
    1. Change Location from Domain to Local Computer
    2. If using Default Instance …
      1. Type “NT Service\SQLServerAgent”
    3. If using Named Instance …
      1. Type “NT Service\SQLAgent$InstanceName”
    4. Click OK.
  6. If using Domain Service Accounts …
    1. Type in Domain\AccountName
    2. Click OK.
  7. Select Full Control in Permissions Dialog Box.
  8. Click OK.

Restart SQL Agent, should work without problem.

Calgary SQLPASS Chapter – SQL Presentation on SQL Server 2014: In-Memory OLTP Engine -

October 28, 2014 Leave a comment

Will be speaking about SQL Server 2014 In-Memory OLTP Engine @ The Calgary SQLPASS Chapter @ Microsoft Office Calgary.  If your in town please register and drop by.

Calgary SQLPASS Website
http://calgary.sqlpass.org/

Missing registery settings in cluster nodes for SQL Server

October 27, 2014 Leave a comment

I run into this occasionally, I think in last 3-4 years being SQL Server PFE, I have seen this issue total of 4 times.  So its not common, issue.  So for this post, I’ll use example architecture.  Two-Node Cluster, Node A and NodeB running SQLFCI1 on it.  SQLFCI1 runs fine on NodeA but fails on NodeB.  Looking at the Application Log we see strange messages like “Could not open error log file ”“.  Other messages might around missing various configuration settings that SQL Server needs to start up.  So how can that happen?

When SQL Server is running as a Failover Cluster Instance (FCI); its configuration settings (a.k.a registry keys under HKLM\Software\Microsoft\Microsoft SQL Server) are saved in a cluster hive in registry.  So when the node fails over from active to passive these settings get carried over and applied to passive node.  That is why we have best practice to make all configuration settings on active node only, if you make it on passive node, or if instance is offline.  The Cluster Service will over write them with what it know of the settings.  This is called CheckPoint process.

We can check if all the required SQL Server keys being copied to cluster hive or not.  We can do that from Command Prompt using following command:

cluster.exe . res “SQL Network Name (SQLFCI1)” /CheckPoints

You will get a output similar to below:

Listing registry checkpoints for resource ‘SQL Network Name (SQLFCI1)’…

Resource                   Registry Checkpoint
————————– —————————————————————————-
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE’
SQL Network Name (SQLFCI1) ‘SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent’

However if you get only some or no record back we have an issue.

PLEASE NOTE DO THIS ON NODE THAT IS WORKING.  IF YOU DO IT ON NODE THAT IS NOT, YOU WILL LOSE ALL YOUR REGISTRY SETTINGS.

  1. Backup the HKLM\Software\Microsoft\Microsoft SQL Server\ hive on both NodeA and NodeB (just in case, you ignore my warning above/ or murphy’s law kicks in).
  2. Confirm instance is on NodeA, if not failback to NodeA from NodeB (NodeA was the good guy in my scenario above).
  3. Execute following commands to add each of the key registry settings to cluster checkpoint.

cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE”
cluster.exe . res “SQL Network Name (SQLFCI1)” /Addcheckpoint:”SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent”

After this re-run the /CheckPoints command above to verify they were added successfully.

The Database Engine system data directory in the registry is not valid

October 20, 2014 Leave a comment

Recently I was helping a client with this issue; they were trying to repair their instances due to some registry setting issues.  However kept getting this message; this error message in the Bootstrap Logs means one of the key parameters for setup are missing or incorrect.  So in your setup logs you might have error similar to below:

(01) 2014-10-17 12:34:38 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing Validation and scenario Validation.

(01) 2014-10-17 12:34:38 Slp: Microsoft.SqlServer.Configuration.SqlConfigBase.ValidationStateException: The Database Engine system data directory in the registry is not valid. —> Microsoft.SqlServer.Configuration.SqlEngine.RepairException: The Database Engine system data directory in the registry is not valid.

(01) 2014-10-17 12:34:38 Slp:    — End of inner exception stack trace —

(01) 2014-10-17 12:34:38 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.ValidateMergedSettings(EffectiveProperties properties)

(01) 2014-10-17 12:34:38 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.RepairCorruptInstance(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcb)

(01) 2014-10-17 12:34:38 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.ExecuteAction(String actionId)

(01) 2014-10-17 12:34:38 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)

(01) 2014-10-17 12:34:38 Slp: The following is an exception stack listing the exceptions in outermost to innermost order

(01) 2014-10-17 12:34:38 Slp: Inner exceptions are being indented

(01) 2014-10-17 12:34:38 Slp: 

(01) 2014-10-17 12:34:38 Slp: Exception type: Microsoft.SqlServer.Configuration.SqlConfigBase.ValidationStateException

(01) 2014-10-17 12:34:38 Slp:     Message: 

(01) 2014-10-17 12:34:38 Slp:         The Database Engine system data directory in the registry is not valid.

(01) 2014-10-17 12:34:38 Slp:     HResult : 0x851a0042

(01) 2014-10-17 12:34:38 Slp:         FacilityCode : 1306 (51a)

(01) 2014-10-17 12:34:38 Slp:         ErrorCode : 66 (0042)

(01) 2014-10-17 12:34:38 Slp:     Data: 

(01) 2014-10-17 12:34:38 Slp:       SQL.Setup.FailureCategory = ConfigurationValidationFailure

(01) 2014-10-17 12:34:38 Slp:       WatsonConfigActionData = REPAIR@VALIDATION@SQL_ENGINE_CORE_INST

(01) 2014-10-17 12:34:38 Slp:       WatsonExceptionFeatureIdsActionData = System.String[]

(01) 2014-10-17 12:34:38 Slp:     Stack: 

(01) 2014-10-17 12:34:38 Slp:         at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.ValidateMergedSettings(EffectiveProperties properties)

(01) 2014-10-17 12:34:38 Slp:         at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.RepairCorruptInstance(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcb)

(01) 2014-10-17 12:34:38 Slp:         at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.ExecuteAction(String actionId)

(01) 2014-10-17 12:34:38 Slp:         at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)

(01) 2014-10-17 12:34:38 Slp:     Inner exception type: Microsoft.SqlServer.Configuration.SqlEngine.RepairException

(01) 2014-10-17 12:34:38 Slp:         Message: 

(01) 2014-10-17 12:34:38 Slp:                 The Database Engine system data directory in the registry is not valid.

(01) 2014-10-17 12:34:38 Slp:         HResult : 0x851a0042

(01) 2014-10-17 12:34:38 Slp:                 FacilityCode : 1306 (51a)

(01) 2014-10-17 12:34:38 Slp:                 ErrorCode : 66 (0042)

Its easy to resolve please go and verify following settings are correct:

  1. Run regedit.exe
  2. Navigate to [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.InstanceID\Setup\
  3. Look for a String Key, SQLDataRoot.  The value should be where the System Data, Log, etc. folders are.  For example, its D:\MSSQL10.SQLInstance\MSSQL\

If setting isn’t there add it, if the path is incorrect fix it and run your setup/repair/upgrade again.

 

How do I calculate database growth 12 months from now?

October 6, 2014 Leave a comment

I often get asked a question similar to that; I need to figure out how to calculate what my database size will be 12 months down the road, or 24 months, or 36 months?  But how do I do that?

There is a simple formula we can use to calculate this; however in order to do it we need to know few key values.

  • What is current size? (CurrentSize(mb))
  • What is the expected monthly growth in percent? (ProjectedGrowthPerMonth(%))
  • How many months we want to know growth for? (NumberOfMonthsToProject)

GrowthFormula

So lets take this formula and try to calculate the growth for a database.  Lets say we have a database that is 10240MB and it is projected to grow by 10% per month.  What would be its size in 12 months from now?

NewSize (MB) = 10240MB * [(1 + 10%) ^ 12] = 10240MB * [1.10 ^ 12] = 10240MB * 3.14 = 32153MB approx.

We can verify our final value by doing manual calculation like so …

Month Size @ Start Growth Final Size
1      10,240.00    1,024.00    11,264.00
2      11,264.00    1,126.40    12,390.40
3      12,390.40    1,239.04    13,629.44
4      13,629.44    1,362.94    14,992.38
5      14,992.38    1,499.24    16,491.62
6      16,491.62    1,649.16    18,140.78
7      18,140.78    1,814.08    19,954.86
8      19,954.86    1,995.49    21,950.35
9      21,950.35    2,195.03    24,145.38
10      24,145.38    2,414.54    26,559.92
11      26,559.92    2,655.99    29,215.92
12      29,215.92    2,921.59    32,137.51

We can see the final size is approx same as one calculated above. Most likely difference occured because I rounded off the “3.14”; where as table above is generated via Microsoft Excel handling the rounding for me.

Help! How do I find disk I/O problems?

October 3, 2014 Leave a comment

This is a big topic, so the advice below is not meant to be comprehensive; but more of generaly troubleshooting walkthrought that you can use to try to identify the problem.

Disk performance generally is measured in the disk latency we are experience for reads and writes.  For reads going to disk we would like to have disk latency between 8 – 15 ms (measured by Avg. Disk Sec/Read) and for writes with cache backup we would like disk latency to bet between 4 – 8 ms (measured by Avg. Disk Sec/Write).   In addition to these we need to understand how my IOPS (Input/Outputs Per Second) are being generated on the server.  These counters can be measured by (Disk Reads/Sec, Disk Writes/Sec, and Disk Transfers/sec).  We need to understand the IOPS because generally there is upper limit on amount of IOPS a disk can handle.  For example a 15000 RMP disk can handle approx. 150 – 225 IOPS/sec; or if we are working with SAN then undestanding the ture volume can be benfical.  It might be difficult to assess how many IOPS your SAN can support; however you can have an intelligent discussion if you know what are the IOPS you are generating.

For SAN communication through Fiber Channel, we have to also consider the HBA Queue Deapth Setting.  By default HBA cards Queue Depth Length now days is set to 32.  Which is usually sufficient, however we can use SQLIO (stress testing tool) and SQLIOSIM (testing San/server with similar I/O patterns as SQL will generate), to test other settings; in some cases higher HBA queue depth setting can lead to better performance.  Recommend working with SAN and HBA vendor to figure out best setting for you.

After this we have the throughput being generated, we can look how many bytes are being transferred per second (Avg. Disk Bytes/Transfer).  This can help us assess the total data being pushed through HBA (Fiber Channel) or Network (iSCSI).  A 4Gbs HBA can support 476MB/s throughput at MAX; a 10Gbs can support 1192MB/s throughput at MAX.  However both these methods have limitations defined by the fiber and switches they must travel through and the speed negotiate.

We also like to look at the Avg. Disk Queue Length for both Reads and Writes (Avg. Disk Read Queue Length, Avg. Disk Write Queue Length, and Avg. Disk Queue Length).  It is hard define a range for these values because this calculation used to be done by knowing the number of spindles assigned to a LUN.  However in most modern SANs that is difficult to assess.  That said, it is still important this number is large like 30 or 40+; that still represents an issue. If that happens you should see multiple I/O stalls in SQL Server and disk latency should be poor.

These counters together with underline hardware configuration understand can be used to understand the bottlenecks on the disk subsystem.  For example, lets say we have a blade center enclouser with 16 servers.  Which has 4 HBA cards, each being 4Gbs (total throughput of 1904MB/s) being shared amount 16 servers in total.  However each server in blade center can access at most two HBA at any given time, therefore each server effective throughput is limited to 952MB/s.  So therefore 952MB/s is being shared by at least 8 servers within the enclosure; without knowing the other server workload and assuming equal distribution; we are left with final bandwidth limitation of 119MB/s per blade centre server.  Therefore having understanding of underlying hardware can help use runderstand the numbers we collected; are they an issue or they with in acceptable parameters.

Other indication of the I/O subsystem bottle neck is we start seeing stalled I/O messages with in SQL Server error logs.  In addition to this you can also see stall i/o requests with in SQL Server DMVs (sys.dm_io_pending_io_requests) and the waits being generated on disk by looking at latch waits in (sys.dm_os_wait_stats or sys.dm_os_latch_stats).

Enterprise Policy Management V4.0 Released

October 1, 2014 Leave a comment

EPM is great tool that builds on top of the SQL Server Policy Mangement and Central Management Server to provide you overview of your enviornment.

It is fairly easy to implment and EPM team has made many updates to the indexes, statistics, to improve its performance.

Please check it out @ http://epmframework.codeplex.com/

SQL Server PFE Position in Canada

September 30, 2014 Leave a comment

ROLE DESCRIPTION

The Premier Field Engineer provides advanced on-site and remote technical troubleshooting support and problem resolution for Microsoft Premier Customers. Candidate must be able to resolve complex issues escalated to the highest levels of management while ensuring satisfaction with Microsoft software by setting customer expectations, devising and implementing action plans and professionally communicating to all parties involved.

https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=6&jid=146017&jlang=en&pp=ss

Check it out guys and gals; if your intersted, feel free to ping me here or linked in :).

Categories: Announcement
Follow

Get every new post delivered to your Inbox.

Join 241 other followers

%d bloggers like this: