Saturday, 7 June 2014

Why you should never clean up.

I was running some admin tasks on the database earlier, and when it came to the largest file, the one that contains all our words of wisdom, it crashed while parsing the million-plus rows of drivel we've produced over the years.  It doesn't look like it's going to come back without some intervention, so a support call has been put in.

Over the last couple of months we've had a growing concern that we might be outgrowing our current server, and we've been discussing a move to a more capable box behind the scenes for a couple of weeks now.  If we decide to go for this, the monthly cost will increase, but I'm hopeful that we'll then be running a better machine that should see us through the next couple of years.

Watch out for updates as we progress with the decision.


UPDATE: The server was rebooted this morning after some work overnight, and it was clear that there were still some problems, so I switched off the forums, and some of you might have seen the "offline" message.  Support later switched off the server's HTTP daemon, which serves up all the web pages to the users, so you'll now get the "Unable to connect" message from your browser.  Once the next lot of repairs have been carried out, I'll do a bit of testing, and if everything works at that point, I'll bring the forum back online.  There may be a few missing posts after that, but if you hit any errors, just report them and include the actual text of the message to help us with our repairs.

I think this incident has demonstrated that, having grown a LOT over the last 12 to 18 months, we've outgrown our relatively new server, so we're already looking at replacing it just as soon as we can.  We've been deliberating over spec and pricing with Support for more than a week now, and we're getting there, but we're still ironing out a few wrinkles and trying to keep the costs as low as we can manage.

More news as and when it happens.


UPDATE: The server is now right-side-up, and I'm running some re-indexing tools to synchronise the posts and threads with each other after the database repair.  As you can imagine, that's a big task, and it'll work better with everyone out of the site, so it's still switched off :)  The biggest job is out of the way, but I'm re-synchronising everything else for completeness.


Tuesday, 3 June 2014

It's a trap!

The site has gone down, possibly related to a hard disk check we're running with Support.  We had a SMART warning about one of the disks yesterday, and they've been testing it since.  It might not be related, but they're aware of the site being down again, so we'll be back soon(ish), with luck :)

EDIT 15:32 BST: I've got FTP and Control Panel up, and it looks like the site is waking up slowly too.

EDIT 16:34 BST: I think we know why the crash happened, and now we have a couple of options of what to do about it.  One will involve several hours offline, the other will require money.  Was it ever thus?