News: Server outage
Over the weekend, our VPS supplier, 123reg, had major problems affecting hundreds, if not thousands, of servers under their control. Unfortunately, this includes one of our servers, which hosts websites and email.
This is the latest email we have received from 123reg:
I am writing to you to explain what happened to some VPS services on 16.04.16. This email is to detail what our steps have been. I am committed to open communication with all customers and would like to take this opportunity to explain in detail.
So what happened to some services?
As part of a clean-up process on the 123-reg VPS platform, a script was run at 7am on 16.04.16. This script is run to show us the number of machines active against the master database. An error in the script showed a ‘zero-records’ response from the database for some live VPS. For those customers, this created a ‘failure’ scenario, showing no VMs and effectively deleting what was on the host. As a result of our team’s investigations, we can conclude that the issues faced have resulted in some data loss for some customers. Our teams have been working, and continue to work, to restore services.
What have we done?
We have been working with an extended team of experts and have left no stone unturned. Our teams have been working long into the night to restore as much as we possibly can. We have also invested in external consultants to recover data in the best way possible.
We have recovery running on the VPS servers and some are restoring to new disks. We have also begun copying recovered VPS images to new hosts and we expect some VPS to be back up and running throughout the night and into tomorrow.
Our teams have worked for more than 24 hours and will continue to do so. No stone is being left unturned.
As the technical teams come back with updates for individual VPS we will communicate updates to customers.
For those customers with their own backup of their settings and data: if you wish to restore services yourself, you can do this by issuing a reimage command through your 123 Reg control panel. This will give you a freshly installed VPS on a new cluster, where you can restore your service.
I understand that some customers may have lost some confidence in the service that we offer. So, I want to explain what we have done to prevent this happening again. We have started an audit of all cron jobs and scripts controlling the platform, and the associated architecture, so that no script will have the ability to delete images, only suspend them. For deletion of images suspended for over 28 days, we will have a human eye to double check. A new platform will be available to customers by the end of the year, which will provide self-managed and automated snapshot backups, in addition to architecture technology to back up the whole platform, something that is not available on the current platform. I hope this goes some way to winning back your confidence.
123 Reg Brand Director
This may not make much sense to you, but ultimately there could be major data loss and we might need to rebuild the server from scratch. I have backups of the websites, so hopefully very little will be lost there. At the moment I am unsure of the email situation.
I will keep customers updated, but at the moment I only have the information above, so I am unable to tell you when the server will be back. Once it is, we will assess the damage and decide whether to rebuild with 123reg or move to another supplier. These recent unusual outages have already been detrimental to many businesses, and I have now lost confidence in 123reg to provide the service they advertise.
I apologise for the inconvenience and disruption.
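For technically minded readers, the safeguards described in the 123reg email (scripts may only suspend, never delete, and images suspended for over 28 days are deleted only after a human check) could look roughly like the sketch below. This is purely illustrative and is not 123reg's code: the names `Vps`, `reconcile` and `delete_image` are hypothetical, and the guard that refuses to act on an empty database response is my own assumption about a sensible precaution, not something the email describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set

# Grace period taken from the email: deletion only after 28 days of suspension.
SUSPENSION_GRACE = timedelta(days=28)


@dataclass
class Vps:
    """Hypothetical record for one virtual server on a host."""
    name: str
    suspended_at: Optional[datetime] = None
    deleted: bool = False


def reconcile(hosted: Iterable[Vps], db_names: Set[str], now: datetime) -> None:
    """Compare hosted servers with the database; suspend, never delete."""
    if not db_names:
        # Assumption: an empty result more likely means a failed query
        # than a platform with no customers, so do nothing at all.
        raise RuntimeError("database returned zero records; refusing to act")
    for vps in hosted:
        if vps.name not in db_names and vps.suspended_at is None:
            vps.suspended_at = now  # suspension is reversible


def delete_image(vps: Vps, now: datetime, approved_by: str) -> None:
    """Delete an image only after the grace period and a human sign-off."""
    if vps.suspended_at is None:
        raise PermissionError("only suspended servers can be deleted")
    if now - vps.suspended_at <= SUSPENSION_GRACE:
        raise PermissionError("server has not been suspended for over 28 days")
    if not approved_by:
        raise PermissionError("a named person must approve the deletion")
    vps.deleted = True
```

The point of the design is that the automated step can only make a reversible change, while the irreversible step needs both time to pass and a person to confirm it.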