How Does Local Storage Offer High Availability?
-
@Dashrender said:
I'd argue, in the case of virtualization, redundancy is often a major player in reliability, but not the sole requirement.
I'd argue that virtualization is a red herring. It's good and we should always have it, and high availability systems have always had it (going back to the 1960s). But it's not a factor here.
Redundancy is the most common means of getting reliability, but it is definitely not the sole means.
-
@Dashrender said:
I guess the term independent in RAID is what drives @scottalanmiller's point the most. Redundant Array of Independent Drives = so at the drive level only, the drives are Independent, they are Redundant, and they are in an Array.
Wow - I've never looked at it this way before.
I think the "it must mean the data" perception probably comes from the fact that many people state that RAID is about improving reliability. But it isn't. That's a big reason that people choose it, but RAID is about increasing speed, capacity and/or reliability by using cheap Winchester drives rather than using some other drive type. It's one of the three.
So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of three instances.
If we need a cache with increased speed over a single drive and we have a five disk RAID 0, then one fails, we just go down to a four disk RAID 0. Not as fast as before, but still faster than a single drive.
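As a rough illustration, here is a back-of-the-envelope sketch in Python of why the degraded stripe is still worth keeping (the 100 MB/s per-drive figure is an assumption for illustration, not a benchmark):

```python
# Back-of-the-envelope model of RAID 0 striping: reads are spread across
# all members, so aggregate throughput scales roughly with drive count.
PER_DRIVE_MBPS = 100  # assumed per-drive throughput, illustration only

def raid0_throughput(drives: int) -> int:
    """Idealized aggregate throughput of an N-drive RAID 0 stripe."""
    return drives * PER_DRIVE_MBPS

print(raid0_throughput(5))  # 500 - the healthy five-disk cache
print(raid0_throughput(4))  # 400 - recreated as four disks after a failure
print(raid0_throughput(1))  # 100 - a single drive remains the slowest option
```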
-
@scottalanmiller said:
@Dashrender said:
I guess the term independent in RAID is what drives @scottalanmiller's point the most. Redundant Array of Independent Drives = so at the drive level only, the drives are Independent, they are Redundant, and they are in an Array.
Wow - I've never looked at it this way before.
I think the "it must mean the data" perception probably comes from the fact that many people state that RAID is about improving reliability. But it isn't. That's a big reason that people choose it, but RAID is about increasing speed, capacity, and/or reliability by using cheap Winchester drives rather than some other drive type. It's about one (or more) of the three.
So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of the three use cases.
If we need a cache with increased speed over a single drive and we have a five-disk RAID 0, then when one fails we just go down to a four-disk RAID 0. Not as fast as before, but still faster than a single drive.
That is definitely an interesting way to look at it.
-
@scottalanmiller said:
So when we look at it that way, RAID 0 has both redundancy (meaning more than one disk) AND redundancy (meaning something can fail and something else takes over) in two of the three use cases.
If we need a cache with increased speed over a single drive and we have a five-disk RAID 0, then when one fails we just go down to a four-disk RAID 0. Not as fast as before, but still faster than a single drive.
That may be so, but who would care? In your RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case - the only thing you care about with RAID 0 is the array for performance, not reliability.
-
Is it possible to have a system fail over to another system with zero actual failure?
Of course I know the answer is yes - we've seen this in videos where a laptop is watching a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops... or has a small pause, but no actual failure.
-
@Dashrender said:
That may be so, but who would care? In your RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case - the only thing you care about with RAID 0 is the array for performance, not reliability.
You are stuck on the idea that your array always carries stateful data. That's an incorrect assumption. RAID 0 arrays can be perfectly functional when degraded if they are not used for stateful data. So the redundancy remains fully useful.
-
@scottalanmiller said:
I'm confused, though. Sure, you improved reliability (I'm confused about the perception bit too), but why did this make you change your mindset versus a single reliable server? Since you didn't use a single reliable server for comparison, what changed the mindset?
I agree with Scott.
Just to keep this going, @dafyre, please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?
-
@Dashrender said:
Of course I know the answer is yes - we've seen this in videos where a laptop is watching a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops... or has a small pause, but no actual failure.
There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. IBM, HP, and Oracle all make systems that will allow you to rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.
-
@scottalanmiller said:
@Dashrender said:
Of course I know the answer is yes - we've seen this in videos where a laptop is watching a video streaming from one VM, that VM is moved/failed over to another server, and the video either never stops... or has a small pause, but no actual failure.
There can be zero pause, but the cost gets higher and higher to do that stuff. And there are other penalties. IBM, HP, and Oracle all make systems that will allow you to rip CPUs out of them while they are running. No blips. But they introduce some latency for all operations to make this possible.
Even the fact that this is possible is amazing to me.
-
@Dashrender said:
Just to keep this going, @dafyre, please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?
And it doesn't mean that the old system was "bad" - it could have just been normal.
Two HP ProLiant DL380 servers in a cluster (if the clustering is good) are way more reliable than a single ProLiant DL380.
But are two of them as reliable as a single HP Integrity SuperDome? Not likely. Those things never go down. Never. It's unheard of.
Now which is more cost-effective? Buying 100 ProLiants instead of one SuperDome, of course. Which is more powerful? The SuperDome.
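For a sense of the arithmetic behind that comparison, here is a sketch with assumed (not vendor-published) availability figures; note the formula also assumes perfect, instantaneous failover, which real clustering never quite delivers:

```python
# Series/parallel availability arithmetic with illustrative inputs.
dl380 = 0.999                    # assume ~8.8 hours of downtime/year per node
cluster = 1 - (1 - dl380) ** 2   # two nodes, either one can carry the load
print(f"One DL380:            {dl380}")
print(f"Two clustered DL380s: {cluster:.6f}")  # ~0.999999 on paper
# On paper the pair reaches "six nines", but only if the clustering layer
# itself never misbehaves; a single box engineered to near-zero failure
# sidesteps that entire class of risk.
```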
-
@wirestyle22 said:
Even the fact that this is possible is amazing to me.
Ever see an HP Integrity withstand an artillery round? There is a video of an HP Integrity doing that (easily ten years old) and another of an HP 3PAR SAN taking one (more recent; that video was actually made by @HPEStorageGuy, who is here in the community). The HP 3PAR is basically HP's "mini computer" class of storage (the same class as the HP Integrity is in servers).
In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface, of course, as the thing would have gone flying) and in both cases the system stayed up and running and didn't lose a ping.
-
@scottalanmiller said:
@Dashrender said:
Just to keep this going, @dafyre, please tell us what the old failing system looked like. Was it 10 servers, each with internal disks? What was failing?
And it doesn't mean that the old system was "bad" - it could have just been normal.
Two HP ProLiant DL380 servers in a cluster (if the clustering is good) are way more reliable than a single ProLiant DL380.
But are two of them as reliable as a single HP Integrity SuperDome? Not likely. Those things never go down. Never. It's unheard of.
Now which is more cost-effective? Buying 100 ProLiants instead of one SuperDome, of course. Which is more powerful? The SuperDome.
Can you clarify what you mean? What do they attribute the higher uptime to, versus a ProLiant, if both are configured correctly? Honest question.
-
@scottalanmiller said:
@wirestyle22 said:
Even the fact that this is possible is amazing to me.
Ever see an HP Integrity withstand an artillery round? There is a video of an HP Integrity doing that (easily ten years old) and another of an HP 3PAR SAN taking one (more recent; that video was actually made by @HPEStorageGuy, who is here in the community). The HP 3PAR is basically HP's "mini computer" class of storage (the same class as the HP Integrity is in servers).
In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface, of course, as the thing would have gone flying) and in both cases the system stayed up and running and didn't lose a ping.
That's wild. HP is doin' it right now.
-
@scottalanmiller said:
@Dashrender said:
That may be so, but who would care? In your RAID 0, if you lose any drive, all of your data is gone, so being redundant is pointless in that case - the only thing you care about with RAID 0 is the array for performance, not reliability.
You are stuck on the idea that your array always carries stateful data. That's an incorrect assumption. RAID 0 arrays can be perfectly functional when degraded if they are not used for stateful data. So the redundancy remains fully useful.
Really? The array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. Yep, me and assuming = mistake...
-
@wirestyle22 said:
Can you clarify what you mean? What do they attribute the higher uptime to, versus a ProLiant, if both are configured correctly? Honest question.
So the HPE ProLiant line is a micro-computer line based on the PC architecture. They are, just to clarify, the industry reference standard for commodity servers (generally considered the best in the business going back to the Compaq ProLiant era in the mid-1990s). They are very good, but they are "commodity". They are basically no different (more or less) than any PC you could build yourself with parts you order online (this is not totally true - there is a tonne of HPE-unique engineering, they are tested like crazy, they have custom firmware and boards, they buy better parts than are on the open market, they add some proprietary stuff like the iLO, etc.) but, more or less, these are PCs. The DL380 is the best-selling server in the world, from any vendor, in any category.
The HPE Integrity line is a mini-computer line. They are not PCs. Most of them (not all) are built on the IA64 EPIC architecture and have RAS (reliability, availability, and serviceability) features that the PC architecture does not support. For example, hot-swappable memory and CPUs are standard. Things like redundant controllers are common. The overall build and design is less about cost savings and more about never failing (or being fixable without going down). It's a truly different class of device. They are also bigger devices - you don't put one in just to run your website. But you can fit more workloads on them, making it make more sense to invest in a single device that almost never fails.
-
@wirestyle22 said:
In both cases, they fired an artillery round into the chassis of a running HP system (bolted to a surface, of course, as the thing would have gone flying) and in both cases the system stayed up and running and didn't lose a ping.
That's wild. HP is doin' it right now.
HP has been doing this stuff for decades. This isn't new technology. You can get similar systems from IBM, Oracle, and Fujitsu. Dell does not dabble in the mini and mainframe market.
From IBM this would be the i and z series (i is mini, z is mainframe). From Oracle this is the M series. Fujitsu co-designs and builds the M series for Oracle and also sells it under their own branding, whose name I don't know as it is not sold in America - you just buy the Oracle-branded ones.
-
@scottalanmiller said:
@wirestyle22 said:
Can you clarify what you mean? What do they attribute the higher uptime to, versus a ProLiant, if both are configured correctly? Honest question.
So the HPE ProLiant line is a micro-computer line based on the PC architecture. They are, just to clarify, the industry reference standard for commodity servers (generally considered the best in the business going back to the Compaq ProLiant era in the mid-1990s). They are very good, but they are "commodity". They are basically no different (more or less) than any PC you could build yourself with parts you order online (this is not totally true - there is a tonne of HPE-unique engineering, they are tested like crazy, they have custom firmware and boards, they buy better parts than are on the open market, they add some proprietary stuff like the iLO, etc.) but, more or less, these are PCs. The DL380 is the best-selling server in the world, from any vendor, in any category.
The HPE Integrity line is a mini-computer line. They are not PCs. Most of them (not all) are built on the IA64 EPIC architecture and have RAS (reliability, availability, and serviceability) features that the PC architecture does not support. For example, hot-swappable memory and CPUs are standard. Things like redundant controllers are common. The overall build and design is less about cost savings and more about never failing (or being fixable without going down). It's a truly different class of device. They are also bigger devices - you don't put one in just to run your website. But you can fit more workloads on them, making it make more sense to invest in a single device that almost never fails.
Interesting. Thank you for the information.
-
@Dashrender said:
Really? The array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. Yep, me and assuming = mistake...
Oh, there will be a blip - the array has to re-initialize. It's not transparent the way a RAID 1 failure would be. But it can be automatic and very, very fast. Few people do this, but you can, no problem. You could probably get the downtime to a couple of seconds.
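As a hypothetical sketch of what that automatic recovery could look like with Linux software RAID (device names and the follow-up steps are made up for illustration; a hardware controller would do the equivalent through its own tooling):

```python
# Hypothetical recovery for a stateless RAID 0: stop the dead array and
# recreate it from the surviving members. All device names are illustrative.
import subprocess

def recreate_raid0(md_device: str, surviving_disks: list[str]) -> None:
    subprocess.run(["mdadm", "--stop", md_device], check=True)
    subprocess.run(
        ["mdadm", "--create", md_device, "--run",  # --run skips the confirmation prompt
         "--level=0", f"--raid-devices={len(surviving_disks)}",
         *surviving_disks],
        check=True,
    )
    # From here: mkfs, remount, and let the cache re-warm itself.

# e.g. after /dev/sde dies out of a five-disk stripe:
# recreate_raid0("/dev/md0", ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sdf"])
```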
-
@scottalanmiller said:
@Dashrender said:
Really? The array will stay active in a degraded state? I had no idea - I figured the RAID controller would basically just kill the array once a drive was lost. Yep, me and assuming = mistake...
Oh, there will be a blip - the array has to re-initialize. It's not transparent the way a RAID 1 failure would be. But it can be automatic and very, very fast. Few people do this, but you can, no problem. You could probably get the downtime to a couple of seconds.
Interesting, I had no idea - what would it be good for?
-
@Dashrender said:
Interesting, I had no idea - what would it be good for?
Primarily caching or cache-like databases.
If you were doing a large proxy cache, for example, this would be a great way to handle it. Don't invest too much money where it isn't needed, and in the case of a drive loss, the delay as you flush the cache and reload isn't too tragic.
The need for this is rapidly going away because of SSDs. So I'm not about to run out and do this today, mind you. But five years ago or, far more likely, twenty years ago, this would have been absolutely viable and completely obvious if you had the use case for it. Today, meh, who needs RAID 0 for speed?
Or any kind of read-only caching system, even on a database.
Or a short-term-use database where the data isn't useful after, say, a day. On Wall St. we had a lot of systems that took trillions of transactions per day and then, at the end of the day... dropped them. With a RAID 0, you might just accept losing a few hours of data, recreate, and be underway again, because the streaming writes are more important than any potential reliability.
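To make the common thread concrete, here is a minimal sketch (hypothetical names, not any real product's API) of the property that makes all of these workloads safe on RAID 0: every entry can be re-derived from an origin, so losing the array costs a re-warm, not data:

```python
# Read-through cache whose backing store (think: the RAID 0 volume) is
# allowed to vanish entirely. Names are illustrative only.
from typing import Callable, Dict

class RebuildableCache:
    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}  # stand-in for the striped volume

    def get(self, key: str) -> bytes:
        if key not in self._store:           # miss, or post-failure cold start
            self._store[key] = self._fetch(key)
        return self._store[key]

    def on_array_failure(self) -> None:
        # The stripe died: start cold. Slower for a while, but nothing
        # stateful was lost.
        self._store.clear()
```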