
dc.contributor.advisor: Pecht, Michael [en_US]
dc.contributor.author: Elerath, Jon [en_US]
dc.date.accessioned: 2007-06-22T05:32:36Z
dc.date.available: 2007-06-22T05:32:36Z
dc.date.issued: 2007-04-10
dc.identifier.uri: http://hdl.handle.net/1903/6733
dc.description.abstract: Today's most reliable data storage systems are made of redundant arrays of inexpensive disks (RAID). The quantification of RAID system reliability is often based on models that omit critical hard disk drive failure modes, assume all failure and restoration rates are constant (exponential distributions), and assume the RAID group times to failure follow a homogeneous Poisson process (HPP). This paper presents a comprehensive reliability model that accounts for numerous failure causes for today's hard disk drives, allows proper representation of repair and restoration, and does not rely on the assumption of an HPP for the RAID group. The model does not assume hard disk drives have constant transition rates, but allows each hard disk drive "slot" in the RAID group to have its own set of distributions, closed form or user-defined. Hard disk drive (HDD) failure distributions derived from field usage are presented, showing that failure distributions are commonly non-homogeneous, frequently having increasing hazard rates from time zero. Hard disk drive failure modes and causes are presented and used to develop a model that reflects not only complete failure, but also degraded conditions due to undetected but corrupted data (latent defects). The model can represent user-defined distributions for completion of "background scrubbing" to correct (remove) corrupted data. Sequential Monte Carlo simulation is used to determine the number of double disk failures expected as a function of time. The RAID group can be any size up to 25. The results are presented as mean cumulative failure distributions for the RAID group. Results estimate that, over 10 years, the number of double disk failures can be as much as 5000 times greater than that predicted by the mean time to data loss method or Markov models when the characteristic lives of the input distributions are the same. Model results are compared to actual field data for two HDD families and two different RAID group sizes and show good correlation. Results show the rate of occurrence of failure for the RAID group may be increasing, decreasing or constant depending on the parameters used for the four input distributions. [en_US]
dc.format.extent: 1222186 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: RELIABILITY MODEL AND ASSESSMENT OF REDUNDANT ARRAYS OF INEXPENSIVE DISKS (RAID) INCORPORATING LATENT DEFECTS AND NON-HOMOGENEOUS POISSON PROCESS EVENTS. [en_US]
dc.type: Dissertation [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.contributor.department: Mechanical Engineering [en_US]
dc.subject.pqcontrolled: Engineering, Mechanical [en_US]
dc.subject.pquncontrolled: RAID [en_US]
dc.subject.pquncontrolled: reliability [en_US]
dc.subject.pquncontrolled: model [en_US]
dc.subject.pquncontrolled: NHPP [en_US]
dc.subject.pquncontrolled: ROCOF [en_US]
dc.subject.pquncontrolled: storage [en_US]
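
The abstract describes a sequential Monte Carlo simulation driven by four input distributions per drive slot. The following Python sketch is an illustration of that general idea only, not the dissertation's model. It assumes Weibull distributions for time to operational failure, time to restore, time to latent defect, and time to scrub. It treats each slot as an independent alternating renewal process, and it counts a double disk failure whenever a drive fails operationally while another slot is being restored or holds an unscrubbed latent defect. All function names and all parameter values in `BASE` are hypothetical and chosen for illustration.

```python
import random

def outage_intervals(rng, mission, up, down):
    """Alternating renewal process for one slot.

    `up` and `down` are (scale, shape) Weibull parameters, in hours.
    Returns the [start, end) intervals during which the slot is "down".
    """
    intervals = []
    t = 0.0
    while True:
        t += rng.weibullvariate(*up)           # time until the next event
        if t >= mission:
            return intervals
        end = t + rng.weibullvariate(*down)    # time to restore or scrub
        intervals.append((t, end))
        t = end                                # slot is renewed afterwards

def simulate_group(rng, n_slots, mission, op_fail, restore, latent, scrub):
    """Count double disk failures (DDFs) for one RAID group history.

    Assumption: a DDF is an operational failure that occurs while any
    other slot is under restoration or carries a latent defect.
    """
    op = [outage_intervals(rng, mission, op_fail, restore) for _ in range(n_slots)]
    ld = [outage_intervals(rng, mission, latent, scrub) for _ in range(n_slots)]
    ddf = 0
    for i in range(n_slots):
        for t, _ in op[i]:
            exposed = any(
                start < t < end
                for j in range(n_slots) if j != i
                for start, end in op[j] + ld[j]
            )
            if exposed:
                ddf += 1
    return ddf

def mean_ddf(trials, seed=0, **params):
    """Mean number of DDFs per RAID group over the mission time."""
    rng = random.Random(seed)
    return sum(simulate_group(rng, **params) for _ in range(trials)) / trials

# Hypothetical inputs: (scale in hours, shape) for each Weibull distribution.
BASE = dict(
    n_slots=8,
    mission=87_600.0,            # 10 years in hours
    op_fail=(500_000.0, 1.1),    # increasing hazard rate (shape > 1)
    restore=(12.0, 2.0),
    latent=(10_000.0, 1.0),
    scrub=(168.0, 3.0),
)

if __name__ == "__main__":
    print("mean DDFs per group:", mean_ddf(2000, seed=1, **BASE))
```

Because each slot takes its own distribution parameters through ordinary arguments, a per-slot variant would only need a list of parameter tuples in place of the shared ones. Evaluating `mean_ddf` at several mission times would trace out a mean cumulative failure curve of the kind the abstract mentions.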

