Azure Space

Doug Easterbrook doug at artsman.com
Fri Jun 7 16:20:33 EDT 2019


hi Andy:

you are so right. No cloud service is immune (not even ours): we've had two 30-minute outages in 5 years. One was our colocation centre's router; the other was one of our firewalls hiccuping. So now we have 3 and 4 of everything.


Ticketmaster has been out (you pointed that out), so if you're using cloud services, you need backups. Our colocation facility has a number of different ISPs and paths to the internet...
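The fallback idea here can be sketched in a few lines: probe a priority-ordered list of endpoints and use the first one that answers. This is only an illustration; the endpoint URLs are hypothetical, and a real probe would be an HTTP health check with a short timeout rather than the stub shown.

```python
# Sketch: pick the first reachable service endpoint, in priority order.
# Endpoint URLs below are made up for illustration; probe() would normally
# do something like a GET /health with a short timeout.

ENDPOINTS = [
    "https://primary.example.com",   # main colo / cloud region
    "https://standby.example.org",   # different provider, different path in
]

def first_healthy(endpoints, probe):
    """Return the first endpoint for which probe(url) is True, else None."""
    for url in endpoints:
        if probe(url):
            return url
    return None

# Simulated outage: the primary is down, the standby answers.
down = {"https://primary.example.com"}
chosen = first_healthy(ENDPOINTS, lambda url: url not in down)
print(chosen)  # falls through to the standby endpoint
```

The same shape works whether the "endpoints" are ISP uplinks, DNS providers, or whole regions; what matters is that the second path shares no single point of failure with the first.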




Here are some Azure outages, found just by googling (Google has outages as well):
May 2, 2019 (3 hours)
https://buildazure.com/2019/05/03/may-2-2019-major-azure-outage-due-dns-migration-issue/
https://www.neowin.net/news/heres-why-microsoft-azure-faced-a-global-outage-yesterday/

Feb 2019 (data loss)
https://nakedsecurity.sophos.com/2019/02/01/dns-outage-turns-tables-on-azure-database-users/

Mar 2016 (4-hour outage), April 9 (7.5 hours)
https://jeffputz.com/blog/another-azure-outage-and-why-regional-failover-isnt-straight-forward


Sept 2018 (a day or so)
https://www.geekwire.com/2018/microsoft-releases-details-last-weeks-big-azure-outage-servers-damaged-no-data-lost/



Or, more often, from their own outage history page: https://status.azure.com/en-us/status/history


So when people say 'go AWS' (which has notoriously bad hardware) or Azure and expect it to be up 24x7, they are seriously mistaken.
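To put numbers on that expectation: even a 99.9% SLA permits nearly nine hours of downtime a year, and if your service depends on several things being up at once (ISP, DNS, the cloud platform), the allowed outage time compounds. A quick back-of-envelope, with illustrative figures:

```python
# Back-of-envelope: downtime a given availability figure permits, and the
# combined availability of a chain where every part must be up at once.

HOURS_PER_YEAR = 24 * 365  # 8760

def downtime_hours(availability):
    """Hours per year of outage permitted by a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

def serial_availability(*parts):
    """Availability of a chain where every part must be up (ISP, DNS, app...)."""
    total = 1.0
    for a in parts:
        total *= a
    return total

print(round(downtime_hours(0.999), 1))     # ~8.8 hours/year at "three nines"

# Three dependent services, each at 99.9%:
combined = serial_availability(0.999, 0.999, 0.999)
print(round(downtime_hours(combined), 1))  # ~26.3 hours/year for the chain
```

Which is why the redundancy described above (multiple ISPs, multiple paths, "3 and 4 of everything") buys more in practice than any single provider's SLA.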





Doug Easterbrook
Arts Management Systems Ltd.
mailto:doug at artsman.com
http://www.artsman.com
Phone (403) 650-1978

> On Jun 7, 2019, at 12:43 PM, Andy Hilton <andyh at totallybrilliant.com> wrote:
> 
> A backhoe took out the TicketMaster Italy service for a day last week by doing exactly as Clifford described and going through the fiber cable to the facility where their server farm lived…..
> 
> Amazingly they managed to get an alternate fiber hooked up within 4 hours - but even so !!!!
> 
> The perils are indeed everywhere….
> 
> Andy Hilton
> Totally Brilliant Software Inc
> Phone (US) : (863) 858 4000 
> Phone (UK) : 0207 193 8582
> Web : www.totallybrilliant.com <http://www.totallybrilliant.com/> 
> Helpdesk : http://totallybrilliant.kayako.com
> Email : andyh at totallybrilliant.com
> 
>> On Jun 7, 2019, at 2:44 PM, Clifford Ilkay <cilkay at gmail.com> wrote:
>> 
>> On Fri, Jun 7, 2019 at 12:58 PM Andrew Stolarz <stolarz at gmail.com> wrote:
>> 
>>> Hi Doug,
>>> 
>>> 
>>> When you ran your numbers, did you factor in the auto scaling approach /
>>> benefits of the cloud?
>>> 
>> 
>> Another question would be what happens when, not if, something malfunctions
>> at the colo facility in which you have your servers? I've experienced all
>> of the following and the list is by no means exhaustive.
>> 
>> Two disks failed at the same time in a RAID 5 set thus rendering a critical
>> server useless. Even with a solid backup, which you won't have if you're
>> not testing restoring from that backup regularly, you will be out of
>> service for at least a day.
>> 
>> I've experienced network switching equipment failures that are too numerous
>> to remember. My servers could be running perfectly but if they are not
>> accessible, they're bricks.
>> 
>> A backhoe took out the fibre optic cable going into the colo facility. We
>> were dead in the water for a day. That colo facility installed another
>> fibre optic cable from another supplier and routed the cable away from the
>> original so that a backhoe could not take out two cables with one scoop of
>> dirt. That didn't prevent the same facility from going out of business due
>> to the first fibre provider jacking up their prices by an order of
>> magnitude because the colo facility had agreed to running a certain amount
>> of traffic through that cable to offset the cost of provisioning it. With
>> the second cable, traffic was halved so the colo facility had to add new
>> customers to drive more traffic and they couldn't do that fast enough to
>> make it economically viable.
>> 
>> That same colo facility had gone out of business and we just didn't know it
>> until the power was cut to the facility for non-payment of the electricity
>> bill. The company went into receivership and if you've ever experienced
>> something like that, you'd know that the receiver takes a "shoot first, ask
>> questions later" approach. It took months to retrieve our equipment from
>> the facility. We couldn't wait months so we provisioned servers on AWS and
>> restored from backups. If we had to wait until new hardware arrived before
>> we restored service, we wouldn't have had any customers left. As it was, we
>> still lost customers who were unhappy with the couple of days that it took
>> to restore service and that after a very intense few days of non-stop work
>> to restore everything. Does it really matter that it wasn't our fault that
>> our customers were down? They were down all the same and some were unappy
>> enough about it to leave.
>> 
>> The power distribution system in a rack malfunctioned and the servers went
>> down for the better part of a day until the faulty PDU was replaced and the
>> servers that had experienced a dirty shutdown were restored to normal
>> service.
>> 
>> These are just the ones that I remember. I know we had others.
>> 
>> If you have critical services, putting your servers into one colo facility
>> is a recipe for disaster. You must be diversified geographically and if
>> it's a smaller company, being diversified by company is a must, too. It's
>> unlikely that Amazon, Microsoft, or Google are going to go out of business
>> so if you have your services hosted with one of them, as long as you have
>> geographic diversity and you are using techniques to ensure high
>> availability, you would be engaging in best practices. If you are with a
>> smaller company, they can and do go out of business so even if you have
>> geographic diversity, it's not sufficient to mitigate against that risk.
>> 
>> Regards,
>> 
>> Clifford Ilkay
>> 
>> +1 647-778-8696
>> _____________________________________________________________
>> Manage your list subscriptions at http://lists.omnis-dev.com
>> Start a new message -> mailto:omnisdev-en at lists.omnis-dev.com 
> 



