So a SQL Server Transaction Log ate Your Free Space.

This weekend I came across an unusual circumstance that I thought I would share with many of those part-time SQL Server admins. I currently maintain more than a couple of SQL servers. Because SQL Server has a good maintenance program I don’t spend the money on a third party software for backup. Instead I setup the maintenance plan to create a backup every 6 hours and then push the file to a network share. For some reason or another, the network share came detached and the backups filled up the local data volume. This effectively shutdown the server. I cleared up the space, restored the mapping, and didn’t think much more about the problem. I noticed that I was getting a backup file from each database but failed to pay attention to the transaction log.
This is where my new problem that consumed my weekend started. Friday night at 7pm I got another phone call about the SQL server being out of disk space again. Again I had no space on the volume, but the space wasn’t consumed by the backups. Instead, the transaction log which is normally a couple of gigs in size had ballooned to 100GB in size. I had attached an external USB drive to push a backup of the transaction log to and tried to shrink the transaction log from the SQL Server Manager. This only give me about 3 GB of storage back, but they were quickly consumed as soon as the end users started using in their application again. I then kicked off a backup of the database and then transaction log. I now had 99% of the space free in the transaction log file, but still could not shrink the database. I had fought and fought with the database trying to get that free space back.

Finally at about 2am, running out of ideas, I deleted the transaction log file and started up the database again which effectively locked the database for a lot of people. Having migrated the database before, and knowing that a simple restore of the database could easily fix the problem, I took the most recent backup which was actually taken after end users were cut off from the server and restored the database. After the restore, I again had the same problem of a database with a 100 GB transaction log file. This time however, I for some reason threw caution to the wind and performed yet another shrink to the transaction log file. Finally, I freed up 75% of the space on the volume which allowed everything to return to normal.
Why I had to backup and restore the database before I could perform an effective shrink of the database, I do not know. If this has happened to other people, I would like to know the reason behind this.
My corrective actions include scripting a compression command on the backups to reduce their size.  I also plan on creating a trigger to notify me by email when the disk space is low, 20% is one of my favorite guide lines as far as that is concerned. I am considering running a network mapping script to reattach the volume of the server before the files are moved over so that the network volume that I monitor won’t be so easily missed with the other backup files that I file on the backup storage volume.  I don’t like using compression because of how having to decompress a file to restore it effectively adds to the lengthy process of getting the database back to working order.  Then again, having a few extra copies of the database around is also handy.

I am open to other input. I thought I would just share my wonderful late night experience with others in hope to get some improvements or perhaps help out other admins who might run into the same problem.

12 thoughts on “So a SQL Server Transaction Log ate Your Free Space.”

  1. It’s stuff like this that makes me yearn for a SAN. I feel like if I had one, I could just eliminate whole classes of problems with stuff like the network mapping being lost, especially if the server is a VM. Then again, who knows if the added complexity of a SAN wouldn’t introduce issues of its own?

    Maybe it was simply taking the DB offline and bringing it back online which allowed the shrink to happen. I’ve had issues in the past where just doing an offline/online let me fix things. Sometimes you can connections that just refuse to be closed and the DB refuses to do anything while they are open…

    J.Ja

  2. We run an Enterprise EHR system written by Allscripts. As you might imagine, this is a distributed application running across many servers terminal, web, sql server, print servers, all on dell rack mounted systems.

    I would recommend you consider Big Brother or Nagios to help you catch things before they get fubar.

  3. @Dietrich Schmitz

    That’s probably good advice. We have Novell’s Sentinel Log Manager which was can also automate many reports when tooled correctly. I need to spend some time toying with the automation. The fact of the matter is that when trying to monitor twenty plus systems, spending the money to get something to monitor them while trying to handle other tasks makes good sense.

    Novell SLM is actually overkill for what it’s being used for, but due to a competitive bid that couldn’t be passed up, I took the opportunity which hopefully will pay off in the long run.

  4. I’ve been using SCOM 2007 R2, it does the job and automatically alerts on stuff like this out of the box. Miserable setup scenario, though. We use SCOM because it’s free to us, but I feel like it if wasn’t free and integrates nicely with the rest of the Microsoft stack, I’d seriously consider a different solution.

    J.Ja

  5. Did you look at Microsoft KB 873235? How to stop the transaction log of a SQL Server database from growing unexpectedly.

  6. I had similar problems in the past. When I first began using SQL Server I was warned by a good friend that I should monitor the transaction logs. He said that running out of space on a SQL Server was a common problem that could be avoided.

    Since then I have used Maintenance Plans to backup, reorganize and shrink the databases on a schedule. They work well. There are a few tricks that you need to be aware of and you need to test the maintenance plan several times to be sure it is doing what you want it to. Monitoring it for a few weeks is also prudent.

    Something to look at if you haven’t already done so – look at the database properties. There are parameters for autoshrink. If you need some assistance, just email me.

  7. George Ou :
    Justin, didn’t one of the companies you work with have a SAN meltdown?

    It did, now that you mention it. I seem to recall it being a configuration issue, or one of those situations that the vendor swore was “impossible”.

    J.Ja

  8. I’ve seen my share of SAN meltdowns first hand, and know of other people who have dealt with “stupid human command line tricks” that accidentally formated the wrong partition. And of course the tape restore failed which doesn’t surprise me one bit which is why I swear by hard drive backups.

    Aside from the disaster issues, there are logistical issues as well whenever you do maintenance. You have to meet with every department to come up with a shutdown plan and hope that you can find a least disruptive date for everyone. This is why I hate interdependencies.

  9. SAN Technology is coming a long ways through. Take IBM’s XIV or fourteen as I like to call it. The XIV doesn’t actually format or wipe drives until it is full, It simply points to a new destination and builds new partitions. You take a snapshot and you get a new location of duplicate data. No data lost and your old data is preserved. This reduces the amount of time to create and duplicate as well as creates a lot of redundancy. I know through redundancy we add complexity and through complexity we get additional failure points that are usually unaccounted for such as cache battery failures and other goody items that people fail to take in account. One big one that I remember IBM was pushing when they sold the ESS brand storage arrays was to hook the device into more than a single UPS from two different power sources so that if one power source died, a second one could hold the system up. Little things like this are easily missed because people assume that just because the dual power supply is purely for if the power supply fails.

  10. I found that Microsoft Server 2008 R2 has a reporting “role” that will manage space usage on the hard drive. It is under the file services roles and the administrative tool is called File Server Storage Manager. It lets you set quotas and if they are hit, it can send you a report by email, etc.

Comments are closed.