Okay, the server move is done, and we’re even on the same IP we were on before we moved. The machines will still answer on the old IPs for the next couple days, but it looks like world DNS has caught up and I’m only seeing a few bytes a second over the old link.
Over the next few weeks, I’ll also take this move as an opportunity to close out some old stuff that hasn’t been updated in a million years: old domains, projects, email accounts, things like that. I don’t plan to delete or remove anything that looks like it’s been used recently, but if I do take out something you need, email me and I’ll fix it as soon as I can.
As most of you know, I’m closing my office in Tacoma and combining my work and living situations. Part of that move is relocating my main servers out of a rack that I keep to a recently built server at a colocation facility in Phoenix, AZ. I’m happy to report that the server is built, shipped, and booted at the new facility, and all of the virtual machines that were running in Phoenix already (mostly podcast distribution) have been running on the new hardware for several days.
Today, I’m moving the virtual servers in Tacoma to their new home. This means shutting down each server one by one, refreshing the copy on the new server so all the data is up to date, and starting it on the new server. The process takes about half an hour for each server, more for some, less for others. During the copy time, those services will be unavailable.
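For the curious, the order of operations above can be sketched roughly like this in Python. This is only an illustration of the sequence (stop, refresh the copy, start), not the actual tooling used for the move: the `stop`, `sync`, and `start` callables and the VM names are hypothetical stand-ins for whatever the hypervisor provides.

```python
from typing import Callable, Iterable


def migrate(
    vms: Iterable[str],
    stop: Callable[[str], None],
    sync: Callable[[str], None],
    start: Callable[[str], None],
) -> list[str]:
    """Move VMs one at a time: stop on the old host, refresh the copy, start on the new host."""
    moved = []
    for vm in vms:
        stop(vm)   # the service is unavailable from here...
        sync(vm)   # ...while the copy on the new server is brought up to date...
        start(vm)  # ...until the VM is started on the new server
        moved.append(vm)
    return moved


if __name__ == "__main__":
    log = []
    migrate(
        ["vm-a", "vm-b"],  # hypothetical VM names
        stop=lambda vm: log.append(f"stop {vm}"),
        sync=lambda vm: log.append(f"sync {vm}"),
        start=lambda vm: log.append(f"start {vm}"),
    )
    print("\n".join(log))
```

Doing the servers strictly one by one means only one service is down at any moment, at the cost of a longer overall move.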
Once this is done, I’ll be changing DNS entries to point to the new IP address, or contacting people to make that change if I cannot.
I’ve set up a dual-routing and VPN system that’ll let the new servers respond at the old and new IPs at the same time, cutting out transition outage time. It does mean that access at the OLD IP will be a bit slower and have higher latency, since all requests go back out over the network to Phoenix. But it will work as normal until we’re fully moved out of the old IPs.
If you have any questions, please contact me.
As of this moment, I’m taking snapshots of the virtual machines for import into the new server, which will be shipped to Phoenix by the end of the week.
I’ll be syncing the data parts of the virtual machines before we begin the switch-over, but to make things simpler, please keep changes to a minimum, or keep track of what changes you make and be prepared to make them again on the new server.
The server was down from about 4:00 AM until 11:30 AM, or at best up on a limited basis with sporadic service. The problem was upstream, badly defined, and is now resolved.
I’d look into it more, but we’re weeks away from being elsewhere.
The spam filter stopped running sometime overnight. I have restarted it. You’ll see a flood of the usual commercial spam, which stops now.
If you think you’re getting a lot of spam, so much so that the filters seem to have stopped running, please check the headers for X-Spam lines.
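If you’d rather not read raw headers by eye, here is a minimal sketch in Python that pulls out any header whose name starts with X-Spam from a saved message. The sample message, addresses, and header values are made up for illustration; the exact X-Spam header names you see depend on the filter, so treat those shown here as assumptions.

```python
from email import message_from_string


def spam_headers(raw_message: str) -> dict[str, str]:
    """Return every header whose name starts with X-Spam (case-insensitive)."""
    msg = message_from_string(raw_message)
    return {
        name: value
        for name, value in msg.items()
        if name.lower().startswith("x-spam")
    }


# Hypothetical sample message; header names and values are illustrative only.
SAMPLE = (
    "From: someone@example.com\n"
    "To: you@example.org\n"
    "Subject: Hello\n"
    "X-Spam-Status: No, score=0.1\n"
    "X-Spam-Checker-Version: ExampleFilter 1.0\n"
    "\n"
    "Body text.\n"
)

if __name__ == "__main__":
    found = spam_headers(SAMPLE)
    if found:
        for name, value in found.items():
            print(f"{name}: {value}")
    else:
        print("No X-Spam headers: the filter probably never saw this message.")
```

A message with no X-Spam lines at all is the useful signal here: it suggests the filter was not running when that message came in.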
I’m doing some upgrades to the mail server to make it a little easier to manage and soak up fewer resources. I’ll be doing some VM magic to make the outages last only a few seconds at a time, so there shouldn’t be any noticeable impact on delivery.
Looks like I had a strange cascading failure today that began with a switch freaking out, which took the router offline and caused the main virtualization server’s ethernet driver to crap the bed.
We’re back now.
However, the virtualization server took its time and hiccuped a bit because of LVM inconsistencies and volumes that aren’t there. Now that the system is back online, I’m going to focus on getting that taken care of, and making sure the last few sets of backups are still good. This might mean things are intermittently slow as I throw commands at LVM.
Meanwhile, it’s time for me to announce that we’re closing the Tacoma office in favor of a joint living/workspace situation. Now that we’re paying for our own rackspace in Arizona, I’ll be moving vis.nu and the rest of the stuff back there in a new server.
So, the Comcast tech is here in the building, and he says that the WA market doesn’t use modems other than the one I’ve got for business accounts, so the fix I was hoping for was a bust. He says there’s a lot of RF noise and he’s working on that, and he’ll swap out for a different modem of the same type. I was told that the routing problem we’re seeing is a firmware issue, so I don’t hold out a lot of hope for this being a good solution.
So I’m on to Plan F: Abandon the affected IP addresses and ask for my money back. I’m going to start the process of moving said services onto a different IP. Since there isn’t a physical move involved, there should be no appreciable outages (other than when the modem decides to crap the bed again).
Puget Sound Atheism and vis.nu Networks are the only affected subsystems: basically, everything brought into the corporate substrate from The Great Convergence. It’s all one set of servers now, but it still speaks on three addresses.
I’ll just move them onto the same IP as Tacoma Telematics, and all should be well. This is also a temporary solution, as there’s likely another move coming up.
It’s a situation of everything happening at once.
The hypervisor is fixed, the router is running smoothly, and I’m even making progress on my terminal server. I’ve upgraded the hypervisor’s routing tools, and they’re working more efficiently than before. But vis.nu and PSA would still stop talking to the world now and again, and flushing caches didn’t fix anything this time. I started to suspect that it was something upstream from here.
I placed a laptop outside of the rack, and it can see PSA and vis.nu when the rest of the world can’t, no problem. Next step is the CPE, so I rebooted that. Lo and behold: everything came back up. I’m on the horn with Comcast, just got handed to level 2, and I’m working on doing some long-term testing.
At this point I’m kind of hoping the problem comes back, so they’ll replace the router. But if I were a betting man, I wouldn’t put money on it.
Update 21:23: Finally got a solid answer out of Comcast after about 12 hours of fighting. A new firmware push introduced a bug where the CPE would drop the last two IP addresses on /28 networks. They’re sending a tech between 8:00 and 10:00 tomorrow to install a modem with different firmware, which should fix the problem. For now, I’ve moved the Comcast router onto my remote rebooter so that if/when this happens again tonight, I can reboot the modem without having to live at the office.
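To make “the last two IP addresses on a /28” concrete, here is a small sketch using Python’s standard `ipaddress` module. The block shown is a hypothetical one from the documentation range, not the real allocation, and I’m assuming “last two” means the last two usable host addresses rather than counting the broadcast address.

```python
import ipaddress

# Hypothetical block from the documentation range (RFC 5737), not the real one.
BLOCK = ipaddress.ip_network("203.0.113.0/28")


def last_usable(network: ipaddress.IPv4Network, count: int = 2) -> list[str]:
    """Return the last `count` usable host addresses in the network."""
    hosts = list(network.hosts())  # excludes the network and broadcast addresses
    return [str(h) for h in hosts[-count:]]


if __name__ == "__main__":
    print(BLOCK.num_addresses)  # 16 addresses in a /28
    print(last_usable(BLOCK))   # ['203.0.113.13', '203.0.113.14']
```

A /28 has 16 addresses, 14 of them usable, so anything parked on the top two is what would go dark under that reading of the bug.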
The Tacoma production server is on the upgraded packages, including the new kernel. The reason the kernel was being strange was my fault. After working some things out, the new kernel is booting fine and the VMs are running. I’m going to nap in the back room now (too bleary-eyed to drive) and then get back to it.