Running the Numbers: System, Network, and Environment Monitoring

by Daniel V. Klein and John Sellens

Available November 2009 from USENIX.org

It started innocently enough. I ran a web server, and it was in my third floor office. The office got hot, the machine got hot, so one day I decided to install a new chassis cooling fan, which meant I needed to reboot my server. The whole process took about 30 minutes, and at the end of it, the machine would not reboot. It turns out that the disk had been running far in excess of the rated temperatures, and the drive electronics developed problems. Annoyingly, they worked fine for the months of overtemperature conditions (with brief reboots for operating system upgrades), but when the machine was shut down for 30 minutes, the electronics cooled off more, shrank, and... cracked. No disk meant no web server – and no income!

Everyone asked "didn't you have backups?" Well, of course I did – daily, weekly, and monthly! Except a few months before, I had changed something "unimportant" in my backup scripts, and never really tested them again (I just assumed that they'd work). And it turned out that I had a collection of nearly-empty backup tapes. Fortunately, the disk platters were undamaged, and for a mere $6,000 to the Drive Savers disk recovery service, I was able to recover all the data, and I was back up and running in a little over a week. So much for five sigma reliability.

So I bought a window air conditioner for the summer months, and whenever it got hot, I'd turn it on. Since I was down $12,000 in lost income and disk recovery costs, I was motivated. But I also traveled a lot for work, and I didn't want to leave the A/C on for weeks at a time if I didn't need to.

I needed a way to remotely monitor the temperature of my office, and more importantly, the temperature of my servers. That was the innocent beginning of what has become a minor obsession.

I found an inexpensive $25 kit that would allow me (through my computer's serial line) to measure temperature in up to four locations. No special software was required – you could just connect to the serial line and look at the current temperatures. And so I could see the server temperature any time I was connected to the network. But as long as I was looking at temperatures, why not log a history of data? So I wrote logging software that would record the temperatures, and then I added extra cabling and sensors, so now I was keeping track of not only the server temperature, but also the ambient room temperature, the outside temperature, and the temperature in the basement.

And as long as I was at it, I wrote graphing software that would allow me to look at the day's temperature, the week, or the month. And then I added moving graphs, so I could look at specific days in the past. I could tell you when the furnace came on and how long it ran. I could show you how the temperature dropped when it rained on a summer day. I could even show you the blips in the basement temperature that corresponded to the cycling of the dehumidifier. And most importantly, I could tell my housemate to turn on the A/C if the office got too hot. I was hooked.

I built a second system out of an old Novell "pizza box" system (the computer and disk were about 1 foot square and only 3" high), and installed that at a friend's summer camp at Lake George, NY. I built submersible sensors into the test tubes I got when I donated blood, and sealed them with marine goop. The sensors measured water temperature at the surface and at about 15' below the surface, as well as the inside and outside temperatures.

The problem was, my little monitoring kit would only support 4 temperature sensors, and my software would only support a single data collection unit. So at both my house and at the camp, I was maxed out. But where there is a will, there is a way. I researched other systems, and discovered that there was a whole variety of data collection systems, and a whole family of sensors (including temperature, humidity, wind, rain, barometric pressure, light, and switch sensors).

I wrote to one manufacturer, and told them that I had this neat monitoring package, and would like to expand it. If they'd send me a sample of their hardware, I'd support it and let them have my software. To my amazement, they said "yes!"

And when they actually sent me $200 worth of hardware, I got to work. Because the new device was Ethernet based (and not serial), I had to completely revise my software design. While I was at it, I revamped the graphing software to be more usable, and added support for different types of sensors besides just temperature. I started looking at more data – and the more you look at, the more you learn, and the more you realize you need to look at even more.

I wrote to another company. They sent me another $200 sample system. I added support for wetness sensors, rain gauges, anemometers, and wind direction. I revamped the graphing software to handle radial graphs in addition to linear ones. I got braver and wrote to a third and fourth company, and they sent me $300 and $450 samples. I added support for switch sensors and barometric pressure sensors. This was getting scary! People started writing to me asking if I'd add support for their hardware.

I got listed on HackADay. I learned about more data collectors, and now support a large number of them.

But addictions are hard to satisfy. I now look at my hot water usage with temperature sensors on the pipes. I acquired a power monitoring system (another donation in exchange for software support), and can tell you how much power each circuit in my house is drawing. I have planned a click sensor on my gas meter to measure gas consumption. I have door sensors on the garage, and look at temperature and humidity throughout my house.

So, "why?" you may ask. Well, other than a hobby gone wild, other than "because I can," there is a very good reason: Money!

I bought my turn-of-the-century house in 1980. Since then, I have added insulation everywhere I could. It cost a little, and has saved a lot. My furnace was 60 years old, so when I replaced it, I installed a 98.5% efficient furnace. It cost more, but paid for itself in 5 years. As a luxury, I replaced the window A/C with whole-house air, but I only use that when it is beastly hot. But now that I am monitoring things, I can save even more money.

Running whole-house A/C is expensive, but I discovered that by raising the temperature on the thermostat by 1 degree, I was able to reduce the A/C use by 75% (I can show you the graphs where I compare plenum temperatures to the outside air temperature and humidity). The basement dehumidifier is essential, but it just has a dial on the front labeled "low" to "high". By monitoring the basement humidity, I determined where I could reasonably set the dial to prevent mold and mildew, yet keep the operating cost down. I've moved my office from the 3rd floor down to the 2nd floor, but the 3rd floor sensors tell me when I have forgotten to close a window on a cold day. And the electrical and gas monitoring help me optimize my energy consumption while still staying comfortable.

But I'm not done yet – a true compulsive never is! I'm going to combine the temperature sensors with dampers in my ductwork, so I can better regulate the temperature in the whole house (right now I manually tweak things, but a scientific computerized approach will be much better). Is the garden getting dry? They make soil sensors! How much lightning was there during that storm last night? They make lightning sensors! Hey Alvin and Parker (my cats), are you hungry or thirsty? Why not monitor the water and food levels in the cats' bowls?

That is where this SAGE booklet comes from: our personal "obsession" with seeing how things work – because if you watch your systems, you can watch the effect your changes make, and then you can make things work better. And if you watch what effect external changes have on your operating environment, you can prevent problems. We do it "because we can," of course, but more so because it makes a difference, in terms of efficiency, cost savings, and disaster prevention.