Datacenters are always nice: nothing but servers, switches, cables and a lot of fan noise which is, surprisingly, quite soothing. The only downside is that I'm forced to stand in the "hot aisle".
The way it works is this: a datacenter is basically just a huge hall with cages that hold the servers. Back in the old days they also had massive air conditioners to cool that entire hall. Then someone got smart and figured out that they didn't have to cool the entire hall, just a small section of it, by using cold and hot aisles.
The way cold and hot aisles work is pretty simple: servers keep themselves cool by sucking in air at the front and blowing it through the inside and out the rear. So if you make sure that the front of the server is in some kind of tunnel, you only have to cool that tunnel. The servers suck in the cool air and blow it out the back. The front of the cage holding the servers is in the tunnel, and the back is out in the open. This saves a lot of energy, because much smaller air conditioners will do.
But, yeah... when you have to stand in the "hot aisle" for a longer period, it's not so great. It's at least 35° Celsius, and the humidity is very low. Quite the sauna indeed, except there's no steam and there are no naked people here (I hope).
About 3 years ago I migrated from FreeBSD to Windows 2008, because I was unhappy with the manageability. Most of this dissatisfaction with the manageability was due to my own lack of effort and interest at that time. Last year I switched back to FreeBSD with a renewed passion: this time I would make it work; and I did!
My server has been running flawlessly for almost a year on FreeBSD 8.2, and I'm very satisfied. There is, however, always room for improvement, and optimizing stuff is somewhat of a hobby of mine (though some might call it an obsessive-compulsive disorder). Recently FreeBSD 9.0 was released, with a couple of new features that really interest me. I could just upgrade to this newer version on the fly, but I've decided to drive to the datacenter and install it from scratch.
The main reason for this is that FreeBSD 9.0 has a big improvement to the file system which I'm very keen on using. However, updating a file system "on the fly" is something that I'm not willing to do just yet. So tomorrow is road-trip time. If I've got the time, I might take some pictures or maybe record some video while I'm at the datacenter.
With the new version of the operating system I'm also making some changes to the way stuff works on my server. These changes will include switching to high-performance web-server software, tightening up security even further without compromising performance, rewriting some code for the website and adding a decent CMS for my site. I might make tutorials, editorials, etc. for each of those individually though.
But first things first: Road-trip tomorrow!
Like most other system administrators, I put a lot of value in having a stable server. Unfortunately it is always possible that, for whatever reason, your server "hangs" and becomes unresponsive. One of the most common reasons is a Denial of Service attack (and sometimes buggy anti-virus software), which generates 100% CPU usage and causes your server to become unresponsive.
To prevent stuff like this from happening, something called a watchdog was invented. The basic principle is real simple: the watchdog has to be reset within X seconds, or else the system will reboot. FreeBSD supports both hardware- and software-based watchdogs. Since my server has an Intel ICHxx chipset, I logically opted for the hardware-based solution.
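To make that principle concrete, here is a toy model in shell. It is only a sketch of the "reset within X seconds or reboot" rule, not how ichwd or watchdogd actually work, and the function names are made up for illustration.

```shell
# Toy model of a watchdog timer (illustration only, not the real ichwd logic).

# watchdog_expired LAST_RESET NOW TIMEOUT
# Succeeds (exit 0) when more than TIMEOUT seconds have passed since the
# last reset, which is the moment a real watchdog would reboot the machine.
watchdog_expired() {
    last_reset=$1
    now=$2
    timeout=$3
    [ $((now - last_reset)) -gt "$timeout" ]
}

# watchdog_state LAST_RESET NOW TIMEOUT
# Prints "ok" while the timer is still being reset in time, "reboot" otherwise.
watchdog_state() {
    if watchdog_expired "$1" "$2" "$3"; then
        echo "reboot"
    else
        echo "ok"
    fi
}
```

With a hypothetical 30-second timeout, `watchdog_state 100 120 30` prints "ok", while `watchdog_state 100 131 30` prints "reboot" because nobody reset the timer for 31 seconds.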
Before making permanent changes to my kernel, with the possibility of wrecking my server, I had to determine if my server would actually support the interface. Since my server has an elevated kernel security level I first had to reboot it with level 0 security before being able to load kernel modules:
ams01# kldload ichwd
Nothing happened, the world did not implode on itself, my server did not suddenly reboot itself; this was a good sign. Fetching a list of the loaded kernel modules confirmed that the module was in fact loaded:
ams01# kldstat
Id Refs Address            Size     Name
 1    7 0xffffffff80100000 6abc20   kernel
 2    1 0xffffffff807ac000 8b8      accf_data.ko
 3    1 0xffffffff807ad000 1580     accf_http.ko
 4    1 0xffffffff807af000 3818     ichwd.ko
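The security-level detour mentioned earlier can be checked up front instead of discovered the hard way. The sketch below assumes what FreeBSD's security(7) describes, namely that kernel modules cannot be loaded or unloaded at securelevel 1 and above. The helper names are hypothetical.

```shell
# Sketch: decide whether kldload can work at a given kern.securelevel.
# Assumption: FreeBSD refuses to load or unload kernel modules at
# securelevel 1 and above, as described in security(7).

# can_load_modules LEVEL -> exit 0 if modules may be loaded at LEVEL.
can_load_modules() {
    [ "$1" -le 0 ]
}

# load_module_if_allowed NAME (hypothetical helper, FreeBSD only):
# reads the live securelevel and only then tries kldload.
load_module_if_allowed() {
    level=$(sysctl -n kern.securelevel) || return 1
    if can_load_modules "$level"; then
        kldload "$1"
    else
        echo "securelevel is $level; reboot with a lower level first" >&2
        return 1
    fi
}
```

On the server itself this would be called as `load_module_if_allowed ichwd`; the pure check `can_load_modules` can be tried anywhere.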
A quick peek in dmesg also told me that the interface was recognized and supported:
ichwd0: on isa0
ichwd0: Intel ICH9R watchdog timer (ICH9 or equivalent)
Excellent! Of course, loading a kernel module manually means that it will not be loaded anymore after the next reboot (and I still had to reboot the server to restore the kernel security level). I had two options now: either compile a new kernel with the ichwd device enabled, or tell the system to load the kernel module at boot time. I decided to go for the second option:
echo 'ichwd_load="YES"' >> /boot/loader.conf
Once I update the system to a newer release of FreeBSD, I have to compile a new kernel anyway, but for now this will do just fine. The next step was to enable the watchdog daemon that will be doing the polling:
echo 'watchdogd_enable="YES"' >> /etc/rc.conf
/etc/rc.d/watchdogd start
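One caveat with appending via `>>`: running the same echo twice leaves two identical lines in the file. A small helper can make both appends safe to repeat. This is a sketch, and `ensure_line` is a made-up name rather than a FreeBSD tool.

```shell
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already present.
# A missing FILE is treated as empty and gets created by the append.
ensure_line() {
    file=$1
    line=$2
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

The two settings from this post would then become `ensure_line /boot/loader.conf 'ichwd_load="YES"'` and `ensure_line /etc/rc.conf 'watchdogd_enable="YES"'`.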
I let the server run for a few minutes and nothing happened; which is good... it should only do something if something is wrong, after all. Since I had to reboot the server anyway to restore the kernel security level, and I wanted to see what would happen if something did go wrong, I killed the watchdogd process and waited. A few seconds later, suddenly my SSH connection was terminated. About 30 seconds later I received a text message on my phone that the server had rebooted itself.
Well well... It seems to work just fine! I sincerely hope that I never actually have to use this failsafe though.
Back in 2008, I bought a Western Digital MyBook "World" 500GB NAS. It was a decent NAS, but mine had some cooling issues which were probably related to where I kept it. It was on top of a closet, where it was pretty dusty and warm, which isn't very beneficial for hard disks. So after a while it started developing issues: it would randomly give time-outs and become unresponsive. After a reboot it would work for a while, but after a week or so it would give timeouts again.
I quickly moved all essential data to my computer's hard disk, and powered down the NAS until I had found a replacement and could copy the rest of the data as well. But I committed a cardinal sin when I powered down the NAS: I made an assumption. As we all know, assumption is the mother of all fuckups and Finagle's law will apply. I assumed that turning off the NAS via the big button on the front would completely shut it down. Unfortunately, I found out that this was not the case when I got a new storage device a while back and was ready to migrate all my data. The NAS felt quite warm, which surprised me to say the least. Apparently when you push the big "on/off" button at the front, it only powers down the little main board, but it keeps the hard disks spinning. To power down the hard disks, you have to unplug the power supply or flip a small switch on the back of the device. When I tried to boot up the NAS, it didn't do anything at all.
Crap! Now what? Did I just lose a large portion of my data? I wasn't ready to give up just yet. My geek-credibility would be at stake. Since the NAS was essentially dead, I carefully dismantled it and took the hard drive out. Because I didn't know what state the hard drive would be in, I didn't want to plug it directly into my computer's main board; after all, Finagle's law was in effect and I didn't want to blow up my main board just yet. I went online and bought a SATA/IDE to USB 2.0 adapter. Since it was a Linux-based NAS, I installed Ubuntu Linux on a USB stick and booted up my laptop.
First order of business was to determine if the hard drive would be salvageable. As soon as I plugged in the power to the hard drive I could hear a soft whir indicating that it was still spinning up. Since it would be the third SATA device (the first being my laptop's internal hard drive, and the DVD-ROM the second) connected to my laptop, it would have to be /dev/sdc that I was looking for. A quick peek in the boot log confirmed that the system had indeed detected the hard drive. Next up was to see if the hard drive would still operate, and which part I had to restore:
root@ubuntu:/# sfdisk -l /dev/sdc

Disk /dev/sdc: 60801 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          3     368     366    2939895   fd  Linux raid autodetect
/dev/sdc2        369     381      13     104422+  fd  Linux raid autodetect
/dev/sdc3        382     504     123     987997+  fd  Linux raid autodetect
/dev/sdc4        505   60800   60296  484327620   fd  Linux raid autodetect
Well, well... this is interesting. Even though the NAS only had one hard drive, Western Digital still made all the partitions part of a raid set. My best guess is that Western Digital did that because the MyBook line also featured larger models with two hard drives in it; and they would be able to use the same configuration for those. In my opinion a sloppy solution, but more about that in a moment. In any case, the device showed four partitions, of which one was roughly 480GB. This was clearly my target. Since you can't just mount a raid partition, I first had to find a way to recover the raid set:
root@ubuntu:/# mdadm --examine --scan /dev/sdc1 /dev/sdc2 /dev/sdc3 /dev/sdc4
ARRAY /dev/md4 UUID=dee2de1f:eec30950:726acf0d:54e86bff
ARRAY /dev/md3 UUID=e1e3ad6e:0753b0ae:86339bf0:ca6e9d49
ARRAY /dev/md2 UUID=51a4640a:c27e3a2e:4b454248:eeac819a
ARRAY /dev/md1 UUID=6afd8edb:48c319d6:c6d492e3:868da876
Good news so far, Linux recognized the arrays. Now we could attempt to reassemble the raid sets:
root@ubuntu:~# mdadm --assemble --scan
mdadm: /dev/md1 has been started with 1 drive (out of 2).
mdadm: /dev/md3 has been started with 1 drive (out of 2).
mdadm: /dev/md4 has been started with 1 drive (out of 2).
Of course, with a raid1 array the system would expect two drives. But since the NAS only had one hard drive, it would mean that the array would always be in a "degraded" state. This won't stop the array from working, but to me it seems rather sloppy. It also seemed strange to me that it wouldn't reassemble /dev/md2, but since /dev/md4 was my main concern and the hard drive would be scrapped anyway, I couldn't be bothered to investigate it further. Fortunately, /dev/md4 was reassembled properly and I was able to mount it and access my files:
root@ubuntu:~# mount /dev/md4 /media/nas_backup
root@ubuntu:~# cd /media/nas_backup
root@ubuntu:/media/nas_backup# ls -l
total 24
drwxr-xr-x 2 root     www-data  4096 2009-04-25 18:10 backup
drwx------ 2 root     root     16384 2002-02-28 12:30 lost+found
drwxr-xr-x 3 www-data www-data  4096 2010-05-07 05:31 PUBLIC
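For anyone repeating this rescue, two small helpers may be handy. The first picks the degraded arrays out of mdadm's assemble output in the format shown above. The second mounts an array read-only, which seems prudent on a disk that may be dying, though it is not what I did at the time. Both function names are made up, and this is a sketch rather than a tested recovery procedure.

```shell
# degraded_arrays: reads "mdadm --assemble" output on stdin and prints the
# arrays that were started with fewer drives than they expect.
degraded_arrays() {
    awk '/has been started with/ && /\(out of/ {
        have = $7 + 0      # drives actually present, e.g. "1"
        want = $NF + 0     # last field, e.g. "2)." which converts to 2
        if (have < want) print $2
    }'
}

# mount_readonly DEVICE DIR (hypothetical helper, needs root):
# mounts DEVICE on DIR without allowing any writes to it.
mount_readonly() {
    mkdir -p "$2" && mount -o ro "$1" "$2"
}
```

Usage would look like `mdadm --assemble --scan 2>&1 | degraded_arrays`, followed by `mount_readonly /dev/md4 /media/nas_backup`. The `2>&1` is there because mdadm may write these status messages to standard error.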
Copying the data took quite a bit longer than I thought. It was only 93GB, but it took well over 3 hours to copy it from the hard drive to my new storage system. A possible cause for the lack of speed was the fact that it was a USB adapter, and I'm not sure if I actually hooked it up to a USB 2.0 port on my laptop. On the other hand, the hard drive wasn't in the best state either, so that could also have been a bottleneck. It was totally worth the wait though; my data was secured.
*phew*
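A bit of arithmetic sheds some light on the slow copy described above. The sketch below uses decimal units (1 GB = 1000 MB) and the theoretical USB 1.1 full-speed rate of 12 Mbit/s, which is 1.5 MB/s. The function names are made up for illustration.

```shell
# avg_mb_per_s GB HOURS -> average rate in MB/s (decimal units: 1 GB = 1000 MB)
avg_mb_per_s() {
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (h * 3600) }'
}

# hours_at_rate GB MB_PER_S -> hours needed to copy GB at a constant rate
hours_at_rate() {
    awk -v gb="$1" -v r="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / r / 3600 }'
}
```

`avg_mb_per_s 93 3` comes out at roughly 8.6 MB/s, and since the copy took well over 3 hours the real average was lower still. At the 1.5 MB/s ceiling of USB 1.1, `hours_at_rate 93 1.5` gives more than 17 hours, so the port was almost certainly a USB 2.0 one after all. That leaves the adapter or the tired hard drive as the more likely bottleneck.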
It took some blood, sweat, tears and a lot of gasoline, but we're back on the air, and we're cruising on FreeBSD.
After postponing, delaying and deferring the issue for quite a bit of time, it was getting kind of embarrassing to put off the migration, and the worst part was that I didn't have an excuse not to do it. I had picked a date in my agenda to do the actual migration, which was on a Friday. But on Thursday I was bored, and decided to do it one day earlier. That decision may or may not have been rushed by the fact that my server was having yet another issue with the virus scanning software.
I downloaded FreeBSD-8.2-RELEASE-amd64-disc1.iso, made a final backup of my server data and got ready to make my way to the datacenter where the server is hosted. You can enter the datacenter 24/7, but they do require you to register on a website so they know who is coming. While trying to register I got an error on the website. I emailed the hosting company that I was unable to register on the website, but that I was en route and would need access to the datacenter.
When I got to the datacenter and tried to log in, the system said there was no registration for me and therefore it could not let me in. I called the hosting company's helpdesk to ask why they hadn't arranged for access. The guy on the phone said that they had fixed the problem that was preventing me from registering, and that I should be able to register now. I told him that I was already at the datacenter, and asked if he could register access for me. He told me that they're not allowed to do that, and suggested that I use my smart phone to register. I told him that I had already tried that, but the website didn't work because it redirected to some kind of status page as soon as it detected that I was using a smart phone instead of a desktop pc. After some arguing with the helpdesk about how I would get access to the server without having to drive back to my home or harass Daniel at work, the security guard of the datacenter offered me use of his private laptop to register for access. Some bro-fists were exchanged and I was finally able to go inside.
I hooked up my USB CD-ROM player to the server, and made it boot from CD... or so I thought! While trying to boot, it got stuck halfway through loading the kernel. Switching USB ports, rubbing the CD; none of it seemed to help. Man, I was pissed! But I also facepalmed, because I had neglected to check if the CD was working before driving off to the datacenter. I bro-fisted the security guard again, told him I would be back in a bit, and drove back home grumpy and hungry.
Back home I downloaded FreeBSD-8.2-RELEASE-amd64-bootonly.iso to save some time. I double and triple checked that the CD was booting and working properly. A quick bite later I was on my way back to the datacenter. I hooked up the CD-ROM player to the server again and... it got stuck halfway through loading the kernel again! Needless to say, a small mushroom cloud would have manifested itself above the datacenter. I looked around the datacenter to see if someone else was there. I got lucky; some American guy was working on a couple of servers and had a CD-ROM player with him that I was able to borrow for a few minutes. Unfortunately, it gave the same result as with my own CD-ROM player.
After cooling down a bit, I decided to bring the server home to figure out what the deal was. The brand of CD-Rs, a driver issue, a BIOS configuration issue, the ISOs being broken... It could be a lot of different things. Back at home I decided to download FreeBSD-8.2-RELEASE-amd64-memstick.img and try to boot from a USB memory stick instead, which worked perfectly the first time. Man, I was relieved! Since it was already late I decided to continue the next morning.
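One of the suspects listed above, a broken ISO, can be ruled out before burning or driving anywhere by comparing the download against the checksum published next to it. The sketch below is not something I ran at the time; the function names are made up, and the expected digest has to come from the checksum file on the download mirror.

```shell
# file_sha256 FILE -> prints the SHA-256 digest of FILE in hex.
# Uses sha256sum (GNU coreutils) when present, otherwise FreeBSD's sha256.
file_sha256() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        sha256 -q "$1"
    fi
}

# verify_sha256 FILE EXPECTED -> exit 0 only if the digests match.
verify_sha256() {
    [ "$(file_sha256 "$1")" = "$2" ]
}
```

A call would look like `verify_sha256 FreeBSD-8.2-RELEASE-amd64-memstick.img "$expected" && echo "image is intact"`, with `$expected` holding the published digest. Note that this only proves the download is intact; it says nothing about the burned disc or the drive reading it.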
The next day, everything went as planned. I installed FreeBSD on the server, did some minimal configuration so that I would at least be able to receive some email, compiled a custom kernel and drove back to the datacenter to shove the server back in the rack. The rest of the weekend I spent tweaking the configuration and debugging some PHP scripts to fix case-sensitive pathnames, etc.
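The pathname problem comes from the move itself: Windows file systems normally ignore case in file names, while FreeBSD does not, so a script that includes "header.php" breaks when the file on disk is called "Header.PHP". A helper like the one below can track down the real spelling. It is a sketch with a made-up function name and made-up file names, not the exact approach I used.

```shell
# find_ignoring_case DIR NAME
# Lists files under DIR whose name matches NAME when case is ignored.
# NAME is treated as a find pattern, so wildcards in it are honoured.
# -iname is not in POSIX, but both GNU find and FreeBSD find support it.
find_ignoring_case() {
    find "$1" -type f -iname "$2"
}
```

For example, `find_ignoring_case /usr/local/www header.php` would list every differently-cased variant of that file name under the (hypothetical) web root.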
Over the next few days or weeks I will probably need to do some minor tweaks, but right now I have everything running pretty much the way I wanted, and I couldn't be happier with it. It's so nice when everything works out the way you had it in mind.
During the "downtime" caused by the Kaspersky update, I started to browse for alternatives. One of the alternatives that caught my eye was ClamAV, an open-source virus scanner for UNIX systems, although there is also a Windows port available. As I was peeking a bit through the options and features, an idea sparked in my mind: a memory of an old love that popped up, so to speak.
I tried to dismiss the idea but it kept haunting me, and eventually I surrendered to the unspoken desire: I wanted my old love back, no matter what it takes.
In the last week of January 2011, version 8.2 of the FreeBSD operating system will be released. You might wonder why I'm mentioning this on November 2nd, but there is a reason. Basically I've got 3 months to freshen up my UNIX skills, convert my sites and services so that they can work with FreeBSD and work out some new stuff. I've installed version 8.1 on my laptop, which will serve as a staging / development template.
I've added a link in the menu to give an overview of the project status. I've done a lot of research, and all the issues that made me decide to migrate to Windows in January 2009 are no longer a problem. Maybe I was just lazy back then, or maybe I was just tired of doing the research... Whatever the real reasons were, they're a thing of the past. My love for the FreeBSD operating system is revitalized and stronger than ever. After 2 years of Windows, we're going back to FreeBSD!
Apparently something is wrong with the latest update from Kaspersky Anti-virus, because the last few days the CPU load on my server has skyrocketed to 80-100% load on average. This is caused by two worker processes from Kaspersky Anti-virus (kavfswp.exe) that take up 40-50% each. I've never had this problem before, and reinstalling the software temporarily fixes it, but as soon as it kicks in an update cycle for the anti-virus definitions, it starts all over again.
I'm not too happy with my server having high load. Aside from slowing down my websites, it also consumes more power and I don't know how happy the datacenter is with that. Technically I'm allowed to use 400mA for the server, but due to this nice CPU load bug it's been drawing 464mA. Some searching on Google only told me that in 2009 there was a similar problem. It was caused by an error in the anti-virus definitions and it was solved a week later when Kaspersky released new anti-virus definitions. I hope it's a similar issue, and that it will be fixed soon.
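To put those two numbers in perspective, here is a quick calculation. The mains voltage is not something my hosting contract states; the 230 V used in the usage note is an assumption, the result ignores power factor, and the function names are made up.

```shell
# percent_over LIMIT ACTUAL -> whole-number percentage by which ACTUAL exceeds LIMIT
percent_over() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

# milliamps_to_watts MA VOLTS -> power in watts (ignores power factor)
milliamps_to_watts() {
    awk -v ma="$1" -v v="$2" 'BEGIN { printf "%.1f\n", ma * v / 1000 }'
}
```

`percent_over 400 464` shows the server is 16 percent over its allowance. Assuming 230 V, `milliamps_to_watts 400 230` and `milliamps_to_watts 464 230` put that at roughly 92 W allowed versus about 107 W drawn.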
I could disable the anti-virus for the time being, but I don't know if that's such a good idea. Sure, I'm the only one that uploads files to it; but still... I don't like the idea of using an unprotected server. I've temporarily disabled videos till the problem is solved.
Update October 25th, 2010 - 12:19
It seems that I'm not the only one with this problem, judging by this thread on the Kaspersky support forums. Kaspersky promised to release an update that fixes the problem later on today.
Update October 25th, 2010 - 14:47
*phew* The update seems to have solved the problem.
The end of the summer is always a great time to clean up one's house or, in my case, computer. Up until now I had been using older software which, in its own right, is great software; but the newer versions had some features that just could not be ignored.
- Firstly, I've upgraded Adobe Lightroom to version 3.2, which has much better noise reduction than version 2.6, which I had been using previously. This obviously results in better image quality, which in turn makes me even more happy with my photography.
- I've also upgraded my HDR software to Photomatix Pro 3.29. I might be using this to beef up photos from time to time, if the lighting is tricky. Even with a single RAW image, Photomatix can make the lighting a lot better, though the best results are still achieved with multiple RAW files with different exposures.
- And last but certainly not least, I've upgraded from Windows Movie Maker (yeah, yeah, I know... don't laugh) to CyberLink PowerDirector 8, which gives me much better editing abilities and, in my opinion, better videos. The two time-lapse videos were made using this software, so you be the judge. As time goes by, and I start to make more videos, I will make a standardized template for the videos.
A nice update, that will give me even better image and video quality... ooh yeah
Put your brand new HTC "Snap" messenger phone in your pocket... Then when stepping out of the car brush your leg against the doorpost. What do you get?
HTC, the manufacturer of the phone, will send a courier to pick it up for repair next Friday. I hope they can get my poor baby fixed soon.
I'm all set. This next Saturday (July 25th, 2009), I will be moving my new web server from the "staging area" (read: my bedroom) to the data center in Amsterdam. Sunday (July 26th, 2009) the old server in Canada will be powered down and dismantled.
Aside from departing from the server in Canada, I will also be departing from Xoops (the CMS that I've been using for 2 years now). I've decided to write my own website code, for a couple of reasons: security, speed and size (also related to speed I guess...).
The more code you have, the slower a site is, and the more can go wrong. Xoops is a very large CMS, with a lot of functionality (most of which I don't use). If I write a minimalistic CMS myself, with just the things that I use, it should - in theory - make the site smaller, faster and more secure.
So... this Saturday my server will go online, but my website will be offline for a while until I've made a basic blog module.
Wish me luck!