Storage Basics for Cloud Computing

Hard disks are an indispensable part of personal computers (PCs), both laptops and desktops. When we buy a computer, we check the capacity (for example, 1 TB or 500 GB) and type of its hard disk. Solid-state drives (SSDs) and hard disk drives (HDDs) are the two mainstream hard disk types today. In cloud computing, disks are also indispensable. However, unlike the disks in common PCs, the disks in cloud computing are not physical entities. Cloud users concern themselves only with disk performance and capacity, not with physical attributes. As a cloud computing engineer, however, you need to understand not only the end users' concerns but also how physical hard disks are turned into cloud disks.
Cloud disk specifications
Storage architecture in virtualization
In this architecture, physical disks are located at the bottom layer and cloud disks at the top layer. A series of operations, such as logical partitioning and file system formatting, is performed between the two layers. You need to understand these operations. In addition, you need to understand the differences between virtualized and non-virtualized storage. First, let's look at the mainstream physical disk types.

Mainstream Physical Disk Types

HDD

The world's first disk storage system, the IBM 305 RAMAC, was introduced by IBM in 1956. It weighed about a ton and stored five million six-bit characters (about 5 MB) on a stack of fifty 24-inch disks. In 1973, IBM developed a new type of HDD, the IBM 3340. This type of HDD had several coaxial metal platters coated with magnetic material, sealed in a box together with removable heads that read changes in magnetic signals from the rotating platters. The IBM 3340 introduced an advanced disk technology known as "Winchester" and is considered the predecessor of all of today's HDDs. It consisted of two 30 MB storage units: 30 MB of fixed storage and 30 MB of removable storage. Because this "30-30" configuration recalled the caliber and charge of the famous Winchester rifle, this type of disk drive was also called the "Winchester disk drive." In 1980, Seagate produced the first Winchester disk drive for PCs, which was approximately the size of a floppy disk drive of the time and had a capacity of 5 MB. Figure shows a Winchester disk drive.
Winchester disk drive
 
The read speed of the disk drive was limited by the rotation speed: increasing the rotation speed could increase the data access speed. However, the platters were in contact with the paired magnetic heads, so a high rotation speed could easily damage the disk drive. Therefore, technicians envisioned making the heads "fly" above the platters. High-speed rotation of the platters produced an airflow, so as long as the heads were properly shaped, they could fly above the platter surfaces like an airplane. In this way, the platters could rotate at a high speed without friction. This was the Winchester technology.

The magnetic heads of a Winchester disk drive are arranged on an actuator arm that moves radially across the platters without making contact with them. As the heads move relative to the platters, they sense the magnetic poles on the platter surfaces and record or change the status of those poles to complete data reads and writes. Because the heads move at a high speed relative to the platters and are extremely close to them, even a particle of dust may damage the disk. Therefore, the disk needs to be encapsulated in a sealed enclosure to maintain a clean internal environment and ensure that the heads and platters work efficiently and reliably.

A modern PC typically contains storage media such as an HDD, a CD or DVD-ROM drive, a tape drive, or a solid-state drive (SSD). HDDs are considered an irreplaceable and important storage device owing to their large capacity, low price, high read speed, and high reliability.

When we talk about a hard disk, we are usually referring to an HDD. It consists of multiple platters, a spindle assembly, a floating head assembly, a head actuator mechanism, a front drive control circuit, and interfaces.
HDD assembly

  • Platters and spindle. The platters and spindle are two closely connected parts. The platters are circular plates coated with a layer of magnetic material for recording data. The spindle is driven by the spindle motor, which drives the platters to rotate at a high speed.
  • Floating head assembly. The floating head assembly consists of read/write heads, an actuator arm, and an actuator axis. When the platters rotate at a high speed, the actuator arm pivots on the actuator axis and drives the read/write heads at its front end across the surfaces of the rotating platters. The heads sense the magnetic signals on the platters to read data, or change the magnetic properties of the coating to write data.
  • Actuator. It consists of a head actuator, a motor, and a shockproof mechanism. It actuates and precisely positions the heads so that they can read and write data on a specified track at a high speed and with high accuracy.
  • Actuator PCB. It is an amplifier circuit sealed in a shielding cavity. It processes the induction signals of the heads, adjusts the speed of the spindle motor, and drives and positions the heads.
  • Interfaces. Typically, they include power supply interfaces and data transfer interfaces. Currently, the mainstream interface types are SATA and SAS.

Platters, metal plates coated with magnetic material, are used to store data in an HDD. Platter surfaces are divided into concentric circular tracks. As the motor spins the platters rapidly, the heads above the platter surfaces are precisely positioned, and data is read and written along the tracks. When the system writes data to an HDD, a current that varies with the data content flows through the heads. The current generates a magnetic field, which changes the status of the magnetic material on the platter surfaces. The status persists after the magnetic field disappears, which means the data is saved. When the system reads data from an HDD, the heads pass over the specified area of the platters. The magnetic field on the platter surfaces causes the heads to generate an induced current or changes the coil impedance. This change is captured and processed to restore the originally written data.

Currently, SATA and SAS hard drives are commonly used and distinguished by interface type.
The following describes the two types of hard drives.
SATA hard drive
To understand SATA, you first need to know the advanced technology attachment (ATA) interface. ATA interfaces are also known as integrated drive electronics (IDE) interfaces. ATA interfaces have been in development since the 1980s, and thanks to their low price and good compatibility, they became the mainstream storage interface on the market. However, with the rapid development of technologies, their low speed hinders their application in modern computer systems.

SATA, that is, serial ATA, has essentially replaced parallel ATA interfaces. As its name implies, SATA transfers data in serial mode. One of its distinctive advantages is that SATA is faster than parallel ATA. Currently, the mainstream SATA 3.0 provides a transfer rate of 6.0 Gbit/s, which is several times higher than that of parallel ATA.
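As a quick sanity check, the usable throughput of a SATA 3.0 link can be estimated from its line rate. Assuming the 8b/10b line coding that SATA uses, 6.0 Gbit/s × 8/10 = 4.8 Gbit/s of payload bandwidth, and 4.8 Gbit/s ÷ 8 bits per byte ≈ 600 MB/s, which is the commonly quoted effective rate of SATA 3.0.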
SATA interface

SATA uses independent data and signal interfaces to transfer data. Parallel ATA (PATA) uses a 16-bit data bus and needs to transfer many additional support and control signals. Due to limitations of the manufacturing process, PATA is susceptible to noise, and its signal voltage is 5 V. In contrast, SATA uses embedded clock signals, has a stronger error correction capability, and operates at a signal voltage of only 0.5 V.

From the perspective of bus structure, SATA uses a single channel to perform point-to-point transfers. Data is transferred bit by bit in serial mode, with check bits embedded in the data stream. This transfer method ensures a high transfer rate and improves transfer reliability.

SATA uses a point-to-point architecture and supports hot swap. The SATA interface uses seven data pins and fifteen power pins. Compared with the parallel ATA interface, the SATA interface uses thinner cables that are easy to bend. A SATA cable can be up to one meter long, facilitating chassis cooling. Figure 4-6 shows a SATA hard drive.
SATA interface

SAS hard drive
Serial attached SCSI (SAS) is a serial version of the small computer system interface (SCSI). Similar to SATA, SAS was developed from the parallel SCSI technology.

SCSI is widely used in enterprise storage owing to its high performance. The SCSI interface provides three types of connectors: 50-pin, 68-pin, and 80-pin. After decades of development, the mainstream SCSI technology, Ultra 320 SCSI, supports a transfer rate of 320 MB/s.

SAS is a branch of the SCSI technology. Similar to SATA, SAS uses serial transfer to achieve higher performance. Currently, the mainstream SAS transfer rate is 6 Gbit/s. In addition, the serial technology allows thin, long cables to be used, achieving a longer connection distance and better anti-interference capability. Figure shows the front and rear views of a SAS interface.
SAS interface


SAS is backward compatible with SATA. SAS controllers can be connected to SATA hard drives, which delivers low cost and excellent flexibility to enterprises. SAS uses a point-to-point architecture. Similar to SATA, SAS does not require termination signals and does not have synchronization issues. SAS supports up to 65,536 devices, whereas SCSI supports only 8 or 16 devices.

NL-SAS hard drive
Near Line SAS (NL-SAS) is a type of hard drive between SATA and SAS. Compared with a SATA hard drive, a SAS hard drive has a higher read/write speed, a higher price, and a smaller capacity. The NL-SAS hard drive was therefore developed: it combines SAS interfaces with SATA platters.
NL-SAS

The input/output operations per second (IOPS) of an NL-SAS hard drive is approximately half that of a SAS hard drive, which delivers about 15,000 IOPS. However, the NL-SAS hard drive still outperforms the SATA hard drive. In addition, the NL-SAS hard drive offers the capacity and price of a SATA hard drive with the reliability of a SAS hard drive. Therefore, the NL-SAS hard drive is popular in the market.

SSD

The world's first SSD appeared in 1989. SSDs were extremely expensive at that time, yet their performance was far lower than that of a common HDD, so they were not widely used. However, due to their unique features such as shock resistance, low noise, and low power consumption, they were used extensively in special fields such as medical and military applications.

With the maturing of technology, improvements in the manufacturing process, and cost reductions, SSDs gained increasing popularity in the consumer field. In 2006, Samsung launched its first laptop with a 32 GB SSD. At the beginning of 2007, SanDisk launched two 32 GB SSD products. In 2011, a severe flood occurred in Thailand, and many HDD manufacturers, such as Western Digital and Seagate, were forced to close their factories there. As a result, HDD shipments dropped sharply and prices rose steeply. This greatly stimulated the demand for SSDs, ushering in the golden age of SSDs. Today, the capacity, cost, speed, and service life of SSDs have improved greatly compared with the original products. The capacity of common SSDs on the market has reached 128 GB and 256 GB, and the price per GB is only a fraction of what it once was, making SSDs affordable to common consumers. SSDs are among the most essential storage devices for ultra-thin laptops and tablets. It is foreseeable that SSDs will receive even greater attention in the next few years.

An SSD consists of a controller and memory chips. Simply put, an SSD is a hard drive composed of an array of solid-state electronic chips. The interface specifications, definitions, functions, and usage of SSDs are the same as those of common HDDs, and so are their appearance and dimensions, including 3.5-inch, 2.5-inch, and 1.8-inch form factors. Because an SSD has no rotating structure like a common HDD, it offers superior shock resistance and a wide operating temperature range (–45°C to +85°C). Therefore, it is widely used in fields such as military, vehicle-mounted, industrial control, video surveillance, network surveillance, network terminal, electric power, medical, aviation, and navigation equipment. Traditional HDDs are disk drives whose data is stored in disk sectors, whereas the common storage medium of an SSD is flash memory. SSDs are one of the major trends for hard drives in the future. Figure 4-9 shows the internal structure of an SSD.
Internal structure of an SSD


An SSD consists of a flash controller and memory chips. The flash controller coordinates the data read/write process, and the memory chips store the data. Memory chips are classified into two types by storage medium: the most common type uses flash memory chips, and the other type uses dynamic random access memory (DRAM) chips.
  • Flash-based SSDs
    The most common SSDs use a flash memory chip as the storage medium. Flash memory chips can be manufactured into various electronic products, such as SSDs, memory cards, and USB drives. These devices are small in size and easy to use. The SSDs discussed in this section are flash-based SSDs.
  • DRAM-based SSDs 
    This type of SSD uses DRAM as the storage medium. DRAM offers superior performance and a long service life and is widely used as main memory. However, DRAM retains data only while powered on; once power is lost, the stored information is lost as well. Therefore, DRAM-based SSDs require an extra power supply for protection. Currently, such SSDs are expensive and are used only in a few fields.
SSDs have the following advantages over traditional HDDs:
  • High read speed 
    As an SSD uses flash memory chips as the storage medium and has no platters or motor, no seek time is incurred when reading data, and the speed advantage is particularly evident for random reads. In addition, SSD performance is not affected by disk fragmentation.
  • Superior shock resistance 
    There are no moving mechanical parts inside an SSD, which eliminates the possibility of mechanical faults and enables SSDs to tolerate collisions, shocks, and vibrations. SSDs run properly even when they are moving fast or tilted severely, and they minimize data loss when the laptop containing them is dropped or knocked against another object.
  • No noise
    There is no mechanical motor inside an SSD, which means it is truly noise-free and silent.
  • Wider operating temperature range 
    A typical HDD can work only within an operating temperature range of 5°C to 55°C. Most SSDs can work within an operating temperature range of –10°C to 70°C, and some industrial-grade SSDs can work within an operating temperature range of –40°C to 85°C or even larger.
However, SSDs also have two disadvantages and therefore cannot fully replace HDDs. One disadvantage is the high cost. Currently, the price per GB of an SSD is about 10 times that of a traditional HDD, and large-capacity SSDs are still rare in the market. Therefore, for applications that are insensitive to data read/write speed, HDDs are still the first choice. The other disadvantage is the limited service life. Typically, high-performance flash memory can endure 10,000 to 100,000 erase cycles, and common consumer-grade flash memory 3,000 to 30,000 cycles. As manufacturing processes continue to shrink the storage cells, the maximum number of erase cycles decreases further. Typically, the SSD controller balances wear across the chips so that the storage cells are consumed evenly, thereby extending the service life.
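To make the wear-leveling idea concrete, here is a minimal sketch of a controller policy that always directs the next write to the least-worn flash block. It is purely illustrative: the class names and block counts are invented, and real SSD controllers also manage mapping tables, garbage collection, and bad blocks.

```python
# Minimal wear-leveling sketch: always erase/write the least-worn block.
# Purely illustrative; real SSD firmware is far more involved.

class FlashBlock:
    def __init__(self, block_id):
        self.block_id = block_id
        self.erase_count = 0

class WearLevelingController:
    def __init__(self, num_blocks):
        self.blocks = [FlashBlock(i) for i in range(num_blocks)]

    def pick_block_for_write(self):
        # Choose the block that has been erased the fewest times,
        # so wear is spread evenly across the whole chip.
        victim = min(self.blocks, key=lambda b: b.erase_count)
        victim.erase_count += 1
        return victim.block_id

controller = WearLevelingController(num_blocks=8)
for _ in range(20):
    controller.pick_block_for_write()
print([b.erase_count for b in controller.blocks])  # counts stay within 1 of each other
```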

SSDs, as storage media with higher read/write speeds than traditional HDDs, have received widespread attention. Unlike traditional HDDs, SSDs have no mechanical components and therefore deliver much faster access. In addition, SSDs have distinctive features such as shock resistance, small size, no noise, and low cooling requirements. Many people hope that SSDs can replace traditional HDDs and become the new generation of storage devices. However, the cost of SSDs is far higher than that of traditional HDDs, and HDD performance can satisfy a large share of requirements. Therefore, traditional HDDs and SSDs will coexist and develop together for a long time.

Table Comparison of different types of hard drives

Centralized Storage and Distributed Storage

Centralized Storage

Centralized storage means that all storage resources are centrally deployed and are provisioned over a unified interface. With centralized storage, all physical disks are centrally deployed in disk enclosures and are used to provide storage services externally through the controller. Centralized storage typically refers to disk arrays.

Based on the technical architecture, centralized storage can be categorized as SAN and NAS. SAN can be further categorized as FC SAN, IP SAN, and FCoE SAN. Currently, FC SAN and IP SAN technologies are mature, and FCoE SAN still has a long way to go to reach maturity. A disk array combines multiple physical disks into a single logical unit. Each disk array consists of one controller enclosure and multiple disk enclosures. This architecture delivers an intelligent storage space featuring high availability, high performance, and large capacity. 

SAN Storage

The storage area network (SAN), with blocks as the basic data access unit, is a dedicated high-speed storage network that is independent of the service network. SANs are implemented as fibre channel SAN (FC SAN), IP SAN, and serial attached SCSI SAN (SAS SAN). The different implementations use different communication protocols and connections to transfer data, commands, and status between servers and storage devices.

Direct attached storage (DAS) was the most widely used storage architecture before SAN was introduced, and it has been in use for nearly forty years. Early data centers used disk arrays to expand storage capacity in DAS mode. The storage devices of each server served only a single application and provided an isolated storage environment. However, such isolated storage devices are difficult to share and manage. As user data grew, the disadvantages of this expansion mode in terms of scalability and disaster recovery became increasingly obvious. SAN resolves these issues by connecting these isolated storage islands through a high-speed network. The storage devices can then be shared by multiple servers over the network, delivering remote data backup and excellent scalability. All these factors have driven the rapid development of SAN technology.

As an emerging storage solution, SAN accelerates data transfer, delivers greater flexibility, and reduces network complexity, alleviating the impact of transfer bottlenecks on the system and improving the efficiency of remote disaster recovery.

A SAN is a network architecture that consists of storage devices and system components, including servers that need to use storage resources, host bus adapters (HBAs) that connect storage devices, and FC switches.

On a SAN, all communication related to data storage takes place on an independent network that is isolated from the application network, which means that transferring data on the SAN does not affect the data network of the existing application system. Therefore, SAN improves the I/O capability of the entire network without reducing the efficiency of the original application data network, and it also delivers redundant links to the storage system and support for high availability (HA) clusters.

With the development of SAN technologies, three SAN types are made available: FC SAN, IP SAN, and SAS SAN. The following describes FC SAN and IP SAN. 

In an FC SAN, two network adapters are configured on the storage server. One is a common network interface card (NIC) that connects to the service IP network; the server interacts with clients through this NIC. The other is an HBA that connects to the FC SAN; the server communicates with the storage devices on the FC SAN through this adapter. Figure 4-10 shows the FC SAN architecture.
FC SAN architecture
IP SAN has become a popular network storage technology in recent years. Early SANs were all FC SANs, where data is transferred over fibre channel with blocks as the access unit. Because the FC protocol is incompatible with the IP protocol, customers who want to deploy an FC SAN have to purchase dedicated FC devices and components. The high price and complicated configuration put FC SAN out of reach of many small and medium-sized users, so it is mainly used for mid-range and high-end storage that requires high performance, redundancy, and availability. To popularize SANs and leverage the advantages of the SAN architecture, technicians considered combining SANs with prevailing and affordable IP networks. The result is the IP SAN, which uses the existing IP network architecture. IP SAN combines the standard TCP/IP protocol with the SCSI instruction set and implements block-level data storage over an IP network.

The difference between IP SAN and FC SAN lies in the transfer protocol and medium. Common IP SAN protocols include iSCSI, FCIP, and iFCP. iSCSI is the fastest growing protocol standard. In most cases, IP SAN refers to iSCSI-based SAN. 

An iSCSI initiator (server) and an iSCSI target (storage device) form a SAN. Figure shows the IP SAN architecture.
IP SAN architecture


IP SAN has the following advantages over FC SAN:
  • Standard access. IP SANs require only common Ethernet cards and switches to connect storage devices and servers, instead of dedicated HBAs or fibre channel switches.
  • Long transfer distance. IP SANs are available wherever IP networks exist, and IP networks are currently the most widely used networks in the world.
  • Good maintainability. Most network maintenance personnel have a good knowledge of IP networks, so IP SANs are more readily accepted than FC SANs. In addition, IP SANs can be maintained using well-developed IP network maintenance tools.
  • Easy bandwidth expansion. With the rapid development of 10 Gigabit Ethernet, the bandwidth of a single port on an Ethernet-based iSCSI IP SAN can readily be expanded to 10 Gbit/s.
These advantages reduce the total cost of ownership (TCO). For example, to build a storage system, the TCO includes purchase of disk arrays and access devices (HBAs and switches), personnel training, routine maintenance, capacity expansion, and disaster recovery capacity expansion. With the wide application of IP networks, IP SANs help customers significantly cut down the purchase cost of access devices, maintenance cost, and capacity and network expansion costs.
Table lists the comparison between IP SAN and FC SAN.

NAS

Network attached storage (NAS) is a technology that integrates distributed, independent data into a large, centralized data center that different hosts and application servers can access. NAS is file-level computer data storage connected to a computer network, providing data access to a heterogeneous group of clients. A NAS server contains storage devices, such as disk arrays, CD/DVD drives, tape drives, or portable storage media, and runs an embedded OS to share files across platforms.

The inception of NAS is closely related to the development of networks. After the inception of the ARPANET, modern network technologies developed rapidly, and users had an increasing demand for sharing data over the network. However, sharing files over the network faced many issues, such as cross-platform access and data security. Figure shows network sharing in the early stage.
Network sharing in the early stage

To resolve these issues, technicians used a dedicated computer to store a large number of shared files. The computer was connected to an existing network and allowed all users on the network to share its storage space. In this way, the early UNIX network environment evolved into a model of sharing data through file servers.

A dedicated server with large storage space that stores shared data must ensure data security and reliability. A single server needs to process access requests from multiple servers, so the I/O performance of the file server has to be optimized, and the extra overhead of a general-purpose OS is unnecessary. Therefore, the server used in this mode should run a thin OS that provides only the I/O function and should be connected to an existing network. Users on the network can then access the files on this special server just as they access files on their own workstations, meeting everyone's demand for sharing files over the network. Figure shows the TCP/IP network in the early UNIX environment.
TCP/IP network sharing

With the development of networks, there were increasing demands for sharing data among computers over the network. People wanted systems and users on a network to connect to a specific file system and access remote files on a shared computer just as they access files in a local OS. This way, they could use a virtual file set stored in a virtual location rather than on a local computer. This storage mode developed toward integration with the traditional client/server environment that supports the Windows OS, involving Windows network capabilities, proprietary protocols, and UNIX-based database servers. In the initial development phase, the Windows network consisted of a network file server that is still in use today and used a dedicated network system protocol. The following shows the early Windows file server.
Windows file server

The inception of file servers drove centralized data storage, leading to sharp growth in centralized data and service volumes. Therefore, NAS products dedicated to file sharing services were developed.

NAS typically has its own nodes on a LAN and does not require the intervention of application servers; it allows users to access file data directly over the network. In this configuration, NAS centrally manages and processes all shared files on the network and offloads that work from application or enterprise servers, reducing the TCO and maximizing customers' ROI. Simply put, a NAS is a network-connected device with a file storage function, which is why it is also called a network file storage device. It is a dedicated file data storage server that delivers centralized file storage and management and separates storage devices from servers, freeing bandwidth, improving performance, maximizing customers' ROI, and reducing the TCO.

Essentially, NAS is a storage device rather than a server. NAS is not simply a compact file server; it delivers more distinctive features than ordinary servers. Servers process services and storage devices store data. In a complete application environment, the two types of devices must be combined.

The advantage of NAS is that it can deliver file storage services quickly and cost-effectively using existing resources in the data center. Current solutions are compatible with UNIX, Linux, and Windows OSs and can be easily connected to users' TCP/IP networks. The following shows the NAS system.
NAS system

NAS should be able to store and back up large volumes of data and deliver stable, efficient data transfer services. These requirements cannot be fulfilled by hardware alone, so NAS is software-dependent. The NAS software can be divided into five modules: OS, volume manager, file system, network file sharing, and web management, as shown in Figure.
NAS architecture

A NAS device can serve files over the common internet file system (CIFS) or the network file system (NFS), or over both at the same time.

CIFS is a public and open file sharing protocol developed by Microsoft from the Server Message Block (SMB) protocol. SMB is a file sharing protocol defined by Microsoft on top of NetBIOS. Users can access data on a remote computer through CIFS. In addition, CIFS prevents read-write and write-write conflicts, so it supports multi-user access.

To enable Windows and Unix computers to share resources, and to enable Windows users to use resources on Unix computers as if they were on Windows NT servers, the best way is to install software that supports the SMB/CIFS protocol on the Unix computers. Because all mainstream OSs support CIFS, communication between computers is convenient. Samba helps Windows and Unix users achieve this goal: a CIFS server is set up to share resources, and the target computers mount the shared resources on the CIFS server into their own OSs through a simple shared mapping and use them as local file system resources.

Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a computer network much as local storage is accessed. It is designed for use among different operating systems, so its communication protocol is independent of hosts and operating systems. When users want to use remote files, they only need to use the mount command to mount the remote file system under their own local file system; there is then no difference between using remote files and local files.

The platform-independent file sharing mechanism of NFS is implemented based on the XDR/RPC protocol.

External data representation (XDR) converts data formats. Typically, XDR converts data into a unified standard format to ensure data consistency across different platforms, operating systems, and programming languages.

Remote procedure call (RPC) requests services from remote computers: a user sends a request to a remote computer over the network, and the remote computer processes the request.

NFS uses the virtual file system (VFS) mechanism to send users' remote data access requests to servers through unified file access protocols and remote procedure calls. NFS has kept evolving: since its inception, it has gone through four major versions and has been ported to almost all mainstream operating systems, becoming the de facto standard for distributed file systems. NFS was introduced in an era when networks were unstable, so it initially ran over UDP rather than the more reliable TCP. UDP works well on reliable LANs but performs poorly on unreliable WANs such as the Internet. Today, thanks to TCP enhancements, NFS over TCP delivers high reliability and good performance.
Table lists the comparison between CIFS and NFS.

RAID

In a centralized storage system, all disks are put into disk enclosures and uniformly managed by the controller enclosure. The system supports dynamic storage capacity expansion and improves fault tolerance as well as read and write performance. Such a system typically uses a technology called Redundant Arrays of Independent Disks (RAID).

There are seven basic RAID levels: RAID 0 to RAID 6. There are also common combinations of basic RAID levels, such as RAID 10 (combination of RAID 1 with RAID 0) and RAID 50 (combination of RAID 5 with RAID 0). Different RAID levels represent different storage performance, data security, and costs. This section describes only RAID 0, RAID 1, RAID 5, and RAID 6.

RAID 0
RAID 0, also known as striping, combines multiple physical disks into one logical disk and delivers the highest storage performance among all RAID levels. When data is stored, it is segmented according to the number of disks that make up the RAID 0 volume, and the segments are written to the disks in parallel, which makes RAID 0 the fastest of all levels. However, RAID 0 provides no redundancy: if any physical disk fails, all data is lost.

Theoretically, the total performance equals the performance of a single disk multiplied by the number of disks. In practice, due to the bus I/O bottleneck and other factors, RAID 0 performance does not scale as a strict multiple of the disk count: if one disk delivers 50 MB/s, a two-disk RAID 0 delivers about 96 MB/s, and a three-disk RAID 0 may deliver 130 MB/s rather than 150 MB/s. Even so, the performance improvement of a two-disk RAID 0 over a single disk is significant.

Figure shows RAID 0 with two disks, Disk 1 and Disk 2. RAID 0 divides the data (D1, D2, ...) into blocks and writes them to the two disks at the same time: D1 and D2 are stored on Disk 1 and Disk 2, respectively, and after D1 is stored, D3 is stored on Disk 1. The other data blocks are stored in the same way. In this way, the two disks can be treated as one large disk, and I/O is performed on both disks simultaneously. However, if either disk fails, all the data is lost.
RAID 0
RAID 0 delivers superior read and write performance but has no data redundancy. It is applicable to applications that can tolerate data loss or can regenerate data through other means, such as web applications and streaming media.
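To make the striping idea concrete, the following is a minimal sketch (an illustrative model with a made-up stripe size, not a real RAID implementation) that splits a byte stream into stripe units and distributes them round-robin across the member disks:

```python
# Simplified RAID 0 striping model: data is split into stripe units and
# written round-robin across the member disks. There is no redundancy,
# so losing any one disk loses the whole data set.

STRIPE_UNIT = 2  # bytes per stripe unit, so each "Dn" label below is one unit
                 # (real arrays use much larger units, e.g. 64 KB)

def raid0_write(data: bytes, num_disks: int):
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), STRIPE_UNIT):
        chunk = data[i:i + STRIPE_UNIT]
        disks[(i // STRIPE_UNIT) % num_disks].extend(chunk)
    return disks

disks = raid0_write(b"D1D2D3D4D5D6", num_disks=2)
print(disks)  # Disk 1 holds D1, D3, D5; Disk 2 holds D2, D4, D6
```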

RAID 1
RAID 1, also known as mirroring, is designed to maximize the availability and repairability of user data. RAID 1 automatically copies all data written to one disk in a RAID group to the other disk.

RAID 1 writes the same data to the mirror disk while storing it on the source disk. When the source disk fails, the mirror disk takes over its services. Because the mirror disk provides a full backup, RAID 1 delivers the best data security among all RAID levels. However, no matter how many disks are used, the available storage space equals the capacity of a single disk, so RAID 1 has the lowest disk utilization of all RAID levels.
RAID 1

Figure shows RAID 1. There are two disks, Disk 1 and Disk 2. RAID 1 stores the data (D1, D2...) in the source disk (Disk 1), and then stores the data again in Disk 2 for data backup.

RAID 1 is the most expensive storage unit among all RAID levels. However, it delivers the highest data security and availability. RAID 1 is applicable to online transaction processing (OLTP) applications with intensive read operations and other applications that require high read/write performance and reliability, for example, email, operating system, application file, and random access environment.

RAID 5
RAID 5 is the most common level in advanced RAID systems and is widely used for its balanced design of performance and data redundancy. Its full name is "independent data disks with distributed parity." RAID 5 uses parity data for error detection and correction.

Figure shows the data storage mode of RAID 5. In the figure, three disks are used as an example. P is the check value of data, and D is the actual data. RAID 5 does not back up the stored data but stores corresponding parity data on different member disks. When the data on a member disk is corrupted, the corrupted data can be recovered based on the data on other member disks. Therefore, RAID 5 is a storage solution that balances storage performance, data security, and storage costs.
RAID 5

RAID 5 is a widely used data protection solution that delivers good overall performance despite some capacity loss. It applies to I/O-intensive applications with a high read/write ratio, such as OLTP applications.
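The parity idea behind RAID 5 can be sketched with XOR. The snippet below is only an illustrative model (real controllers rotate parity across the member disks and operate on full blocks): it computes the parity of a stripe and rebuilds a lost data block from the surviving block and the parity.

```python
# RAID 5 parity sketch: the parity block of a stripe is the XOR of its
# data blocks. If one member disk fails, its block can be rebuilt by
# XORing the surviving data blocks with the parity block.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe on a 3-disk array: two data blocks plus one parity block.
d1 = b"\x10\x20\x30\x40"
d2 = b"\x0f\x0e\x0d\x0c"
parity = xor_blocks(d1, d2)

# Simulate losing the disk that holds d2 and rebuilding it.
rebuilt_d2 = xor_blocks(d1, parity)
assert rebuilt_d2 == d2
print("d2 rebuilt from d1 and parity:", rebuilt_d2.hex())
```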

RAID 6
RAID 6 is a RAID mode designed to further enhance data protection. Compared with RAID 5, RAID 6 adds a second, independent parity block, so each stripe of data has two parity blocks that are calculated in different ways. RAID 6 therefore delivers better data redundancy. However, the two parity mechanisms slow down data writes, the RAID controller design is more complicated, and the two parity areas reduce the available storage space.

Common RAID 6 implementations include PQ and DP. The two technologies obtain their verification information in different ways, but both allow an array to survive the failure of two disks.
RAID 6
Data security of RAID 6 is higher than that of RAID 5. Even if two disks in an array fail, the array can still work and recover the data on the faulty disks. However, the controller design is more complicated, the write speed is lower, and it takes longer to calculate parity and verify data correctness. Every write to a data block requires two independent parity calculations, resulting in a heavier system load. In addition, disk utilization is lower and configuration is more complicated. Therefore, RAID 6 is applicable to environments that require very high data accuracy and integrity.

Distributed Storage and Replication

Distributed storage is quite different from conventional storage. It virtualizes all available space distributed across different hosts into a single virtual device, and the data stored in this virtual storage is likewise distributed all over the storage network.
Distributed storage

As shown in Figure, the storage resources in a distributed storage system come from commodity x86 servers rather than dedicated storage devices. A distributed storage system has no controller or disk enclosures. The clients provided by the distributed storage system are responsible for identifying and managing hard drives, establishing routes, and executing I/Os.

The distributed storage client mode has both advantages and disadvantages.

In terms of capacity expansion, any x86 server with a client installed can be a part of the distributed system. Therefore, this mode delivers great scalability.

However, in addition to the applications running on the server, the client software installed on the server also consumes compute resources. When you plan a distributed storage system, you must reserve a certain amount of compute resources on each server you intend to add to the system, so this mode places some requirements on server hardware resources.

In a traditional centralized storage system, data is read and written by a limited number of controllers. In a distributed storage system, any server with a client installed can read and write data, which improves I/O speed to some extent because there is no controller bottleneck. On the other hand, the paths for reading and writing data need to be calculated repeatedly, and an excessively large number of clients adds complexity to this path calculation. This is why performance sometimes cannot be improved linearly simply by adding more clients.

To ensure high availability and security of data, the centralized storage system uses the RAID technology. RAID can be implemented by hardware and software. All hard drives in the same RAID array, regardless of software or hardware implementation, must reside on the same server (hardware RAID requires a unified RAID card, and software RAID requires a unified OS). Because the hard drives of a distributed storage system are distributed across different servers, the RAID mechanism simply cannot be used in such a system. Therefore, a replication mechanism is typically used in distributed storage systems to ensure high data reliability.

The replication mechanism keeps identical copies of data on different servers. The failure of a single server will not cause data loss. The distributed storage system combines local disks of all servers into several resource pools. Based on the resource pools, the distributed storage system delivers interfaces for creating and deleting application volumes and snapshots, and delivers volume device functions for upper-layer software, as shown in Figure.
Distributed storage architecture
In a distributed storage system, each hard drive is divided into several partitions, and each partition belongs to only one resource pool. A partition holds one copy of a piece of data. The system ensures that the multiple copies of the same data are distributed on different servers (when the number of servers is greater than the number of copies) and that those copies remain consistent. Data in the partitions is then stored as key/value pairs.
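A minimal sketch of this kind of placement is shown below. It is illustrative only (the hash, partition count, and server names are invented; this is not FusionStorage's actual algorithm): a data key is hashed to a partition, and each partition's copies are placed on different servers so that a single server failure never loses every copy.

```python
# Illustrative data placement for a distributed storage pool: a key is
# hashed to a partition, and the copies of each partition are placed on
# different servers so that one server failure never loses all copies.
import hashlib

SERVERS = ["server-1", "server-2", "server-3", "server-4"]
NUM_PARTITIONS = 16
NUM_COPIES = 2

def partition_of(key: str) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def copy_servers(partition: int) -> list:
    # Place copies on consecutive servers (modulo the server count), so the
    # copies of one partition never end up on the same server.
    return [SERVERS[(partition + i) % len(SERVERS)] for i in range(NUM_COPIES)]

key = "volume-42/block-0007"
p = partition_of(key)
print(p, copy_servers(p))  # the partition index and the servers holding its copies
```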

The distributed storage system presents volumes to the upper layer, which are easy to use. The system ensures that each hard drive holds the same number of active and standby partitions to avoid hot spots. All hard drives can serve as hot spares for the resource pools, and a resource pool supports up to hundreds of hard drives.
Distributed storage modules
Storage interface layer: delivers volumes for OSs and databases over the Small Computer System Interface (SCSI).

Storage service layer: delivers various advanced storage features, such as snapshot, linked cloning, thin provisioning, distributed cache, and backup and disaster recovery (DR).

Storage engine layer: delivers basic storage functions, including management status control, distributed data routing, strong-consistency replication, cluster self-recovery, and parallel data rebuilding.

Storage management layer: delivers the operation and maintenance (O&M) functions, such as software installation, automated configuration, online upgrade, alarm reporting, monitoring, and logging, and also delivers a portal for user operations. 

When writing data, applications can use only the storage pool provided by the distributed storage system. After an application's write request reaches the storage pool, the data is replicated into the specified number of copies (the number is set by the user), and the writes are delivered to different hard drives. The write operation is complete only after all copies have returned write-completion messages.

When applications read data, they read data from the active copies rather than all copies. When the active copies are unavailable, the applications read data from other copies. Common distributed storage products include Ceph (open source), HDFS, FusionStorage (Huawei), and vSAN (VMware).
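The following is a simplified sketch of this I/O path (an illustrative model only: it assumes fully synchronous replication and an in-memory "copy" per server): a write completes only after every copy acknowledges it, and a read is served from the active copy, falling back to another copy if the active one lacks the block.

```python
# Simplified replication flow: writes go to every copy and complete only
# after all copies acknowledge; reads are served from the active copy and
# fall back to the other copies if needed.

class ReplicatedVolume:
    def __init__(self, num_copies: int):
        self.copies = [dict() for _ in range(num_copies)]  # block_id -> data
        self.active = 0  # index of the active copy

    def write(self, block_id: int, data: bytes) -> None:
        acks = 0
        for copy in self.copies:
            copy[block_id] = data   # deliver the write to this copy
            acks += 1               # the copy acknowledges completion
        assert acks == len(self.copies)  # done only when every copy has acked

    def read(self, block_id: int) -> bytes:
        order = [self.active] + [i for i in range(len(self.copies)) if i != self.active]
        for idx in order:
            if block_id in self.copies[idx]:
                return self.copies[idx][block_id]
        raise KeyError(block_id)

vol = ReplicatedVolume(num_copies=3)
vol.write(7, b"hello")
print(vol.read(7))  # b'hello'
```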

Distributed storage has the following advantages:
Excellent performance
The distributed storage system uses an innovative architecture to organize SATA/SAS HDDs into a SAN-like storage pool that delivers higher I/O than SAN devices and excellent performance. In a distributed storage system, SSDs can replace HDDs as high-speed storage devices, and InfiniBand networks can replace GE/10GE networks to deliver higher bandwidth, fulfilling the high performance requirements for real-time processing of large volumes of data.

The distributed storage system uses stateless software engines deployed on each node, eliminating the performance bottleneck of centralized engines. Moreover, these distributed engines, deployed on standalone servers, consume far fewer CPU resources and deliver higher IOPS than centrally deployed engines do.

The system integrates computing and storage and evenly distributes cache and bandwidth to each server node. Each disk on the distributed storage system servers uses independent I/O bandwidths, preventing a large number of disks from competing for limited bandwidths between computing devices and storage devices in an independent storage system.

The distributed storage system uses some memory of each server as read cache and non-volatile dual in-line memory modules (NVDIMMs) as write cache, with the caches evenly distributed across all nodes. The total cache size on all servers is far larger than that delivered by external storage devices. Even with large-capacity, low-cost SATA hard drives, the distributed storage system can still deliver high I/O performance, improving overall performance by one to three times and providing larger effective capacity.

Global load balancing
The implementation mechanism of the distributed storage system ensures that I/O operations of upper-layer applications are evenly distributed on different hard drives of different servers, preventing partial hot spots and implementing global load balancing. The system automatically disperses data blocks on the hard disks of various servers. Data that is frequently or seldom accessed is evenly distributed on the servers, avoiding hot spots. FusionStorage employs the data fragment distribution algorithm to ensure that active and standby copies are evenly distributed to different hard disks of the servers. In this way, each hard disk contains the same number of active and standby copies. When the system capacity is expanded or reduced due to a node failure, the data reconstruction algorithm helps ensure load balancing among all nodes after system reconstruction.
Distributed SSD storage
For high-performance applications, the distributed storage system can use SSDs, which deliver higher read/write performance than traditional HDDs (SATA/SAS hard drives). PCIe SSDs deliver even higher bandwidth and I/O: a PCIe 2.0 x8 interface provides a read/write bandwidth of up to 3.0 GB/s, and with 4 KB random transfers such SSDs can reach up to 600,000 sustained random read IOPS and 220,000 sustained random write IOPS. Although SSDs deliver high read and write speeds, they have a limited write lifespan, so when SSDs are used, the distributed SSD storage system applies multiple mechanisms and measures to improve reliability.
High-performance snapshot
The distributed storage system delivers a snapshot mechanism, which allows the system to capture the state of the data written to a logical volume at a particular point in time. The snapshot can then be exported and used to restore the volume data when required. Snapshot data in the distributed storage system is managed with the distributed hash table (DHT) mechanism, so snapshots do not degrade the performance of the original volumes. The DHT technology delivers high query efficiency: for example, indexing a 2 TB hard disk in memory requires only tens of MB of memory space, and a single hash query determines whether a snapshot has been created for the disk and, if so, where the snapshot data is stored.
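As a rough model of this hash-based lookup (illustrative only; the block addresses and locations are invented and this is not the actual DHT implementation), a dictionary keyed by block address can answer in a single lookup whether a preserved snapshot copy of a block exists and where it lives:

```python
# Illustrative snapshot index: one hash lookup tells us whether a block
# has a preserved snapshot copy and, if so, where that copy is stored.
snapshot_index = {}  # block address -> location of the preserved (snapshot) data

def before_overwrite(block_addr: int, current_location: str) -> None:
    # Before a block is overwritten, remember where the original data lives
    # so the snapshot can still read it later.
    snapshot_index.setdefault(block_addr, current_location)

def snapshot_read(block_addr: int, live_location: str) -> str:
    # A single hash lookup decides whether to read the preserved copy
    # or simply the live data (unchanged since the snapshot).
    return snapshot_index.get(block_addr, live_location)

before_overwrite(0x1000, "disk2:chunk-17")
print(snapshot_read(0x1000, "disk5:chunk-88"))  # -> "disk2:chunk-17"
print(snapshot_read(0x2000, "disk3:chunk-05"))  # unchanged block -> live location
```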
High-performance linked cloning
The distributed storage system delivers the linked cloning mechanism for incremental snapshots so that multiple cloned volumes can be created for a snapshot. The data in the cloned volumes is the same as that in the snapshot. Subsequent modifications to a cloned volume do not affect the snapshot or other cloned volumes. The distributed storage system supports batch deployment of VM volumes. Hundreds of VM volumes can be created in seconds. A cloned volume can be used for creating a snapshot, restoring data from the snapshot, and cloning the volume as the base volume again.

High-speed InfiniBand (IB) network
To eliminate storage switching bottlenecks in a distributed storage environment, a distributed storage system can be deployed on an IB network designed for high-bandwidth applications.

Virtualized Storage and Non-virtualized Storage

Storage virtualization described in this section refers to virtualization in a narrow sense: if a cluster has a file system, its storage is virtualized; otherwise, it is non-virtualized. The file system can be an NFS or a virtual cluster file system. If no file system is available, the virtualization cluster needs to use logical volumes directly.

We know that physical disks reside at the bottom of the storage system, whether centralized or distributed. After RAID or replication is implemented, physical volumes are created on top of these physical disks. In most cases, the physical volumes are not directly mounted to upper-layer applications, such as OSs or virtualization systems (the case discussed in this document). The reason is that once a physical volume is mounted, all of its space is formatted by the upper-layer application. When the storage space is used up, you can add disks to expand the capacity, but you then need to reformat the physical volume, which may cause data loss. Therefore, multiple physical volumes are typically combined into a volume group, the volume group is virtualized into multiple logical volumes (LVs), and the upper-layer applications use the space of the LVs.
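The relationship can be modeled with a small sketch (hypothetical names and sizes, not a real volume manager): physical volumes are pooled into a volume group, logical volumes are carved from the pooled capacity, and adding another physical volume grows the pool without reformatting existing logical volumes.

```python
# Toy model of the physical volume -> volume group -> logical volume layering.
# Logical volumes draw capacity from the pooled space of the whole group,
# so adding another physical volume grows the pool without reformatting
# existing logical volumes.

class VolumeGroup:
    def __init__(self, physical_volume_sizes_gb):
        self.capacity_gb = sum(physical_volume_sizes_gb)
        self.allocated_gb = 0
        self.logical_volumes = {}

    def add_physical_volume(self, size_gb: int) -> None:
        self.capacity_gb += size_gb  # capacity expansion, no reformatting needed

    def create_logical_volume(self, name: str, size_gb: int) -> None:
        if self.allocated_gb + size_gb > self.capacity_gb:
            raise ValueError("not enough free space in the volume group")
        self.logical_volumes[name] = size_gb
        self.allocated_gb += size_gb

vg = VolumeGroup([600, 600])                   # two physical volumes pooled together
vg.create_logical_volume("lv_vm_images", 800)  # an LV larger than any single PV
vg.add_physical_volume(600)                    # grow the pool by adding another disk
vg.create_logical_volume("lv_backup", 700)
print(vg.capacity_gb, vg.allocated_gb)         # 1800 1500
```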

In cloud computing, the virtualization program formats the LVs. Vendors use different virtual file systems: for example, VMware uses the Virtual Machine File System (VMFS), and Huawei uses the Virtual Image Management System (VIMS). Both are high-performance cluster file systems that deliver a capacity exceeding the limit of a single system and allow multiple compute nodes to access an integrated clustered storage pool. The cluster file system ensures that no single server or application has complete control over access to the file system.

Take VIMS as an example. It is based on SAN storage, whereas FusionStorage delivers only non-virtualized storage space. FusionCompute manages VM images and configuration files on VIMS, and VIMS uses a distributed locking mechanism to ensure read/write consistency of cluster data. The minimum storage unit used by virtualization programs is the logical unit number (LUN). LUNs correspond to volumes: volumes are the managed objects inside the storage system, LUNs are their external presentation, and both are allocated from the same resource pool.

After virtualization is used, LUNs can be divided into thick LUNs and thin LUNs.

Thick LUNs are the traditional type of LUN. A thick LUN gets its full storage capacity from the storage pool as soon as it is created; that is, the LUN size equals the allocated space. Therefore, the performance of a thick LUN is relatively high and predictable. Thin LUNs, by contrast, support virtual resource allocation and are easy to create, expand, and compress.

An initial capacity allocation policy is set during the creation of thin LUNs. After thin LUNs are created, the storage system allocates an initial capacity to each LUN and retains the remaining capacity in the storage pool. When the usage of the allocated storage capacity reaches the threshold, the storage system allocates a certain amount of capacity from the storage pool to the thin LUNs. This process repeats until the thin LUNs reach the preset full capacity. Therefore, thin LUNs have higher storage capacity utilization.
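The allocation behaviour described above can be sketched as follows (a minimal model; the threshold, step size, and capacities are invented for illustration): capacity is granted from the pool in steps whenever usage crosses a threshold of the allocated space, until the preset LUN size is reached.

```python
# Thin LUN sketch: an initial slice of capacity is allocated up front,
# and more capacity is granted from the storage pool whenever usage
# crosses a threshold, until the advertised (preset) size is reached.

class ThinLun:
    def __init__(self, preset_gb, initial_gb, grow_step_gb, threshold=0.8):
        self.preset_gb = preset_gb        # capacity promised to the host
        self.allocated_gb = initial_gb    # capacity actually taken from the pool
        self.used_gb = 0
        self.grow_step_gb = grow_step_gb
        self.threshold = threshold

    def write(self, size_gb: float) -> None:
        self.used_gb += size_gb
        # Grow in steps while usage exceeds the threshold of allocated space.
        while (self.used_gb > self.allocated_gb * self.threshold
               and self.allocated_gb < self.preset_gb):
            self.allocated_gb = min(self.allocated_gb + self.grow_step_gb,
                                    self.preset_gb)

lun = ThinLun(preset_gb=100, initial_gb=10, grow_step_gb=10)
lun.write(9)   # crosses 80% of the 10 GB allocation -> grows to 20 GB allocated
lun.write(30)  # grows further in 10 GB steps, still below the 100 GB preset size
print(lun.allocated_gb, lun.used_gb)  # 50 39
```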

The differences between thick and thin LUNs are as follows:
Capacity
Thick LUNs get their full storage capacity from the storage pool as soon as they are created. Thin LUNs get storage capacity on demand: a thin LUN is allocated an initial capacity when created and then receives more capacity dynamically as needed.
Thin LUNs

Disk space reclamation
Capacity reclamation here refers to releasing the capacity of some LUNs back to the storage pool for use by other LUNs. Capacity reclamation does not apply to a thick LUN, because it gets its full capacity from the storage pool when created. Even if data in a thick LUN is deleted, the allocated capacity remains occupied by that LUN and cannot be used by other LUNs. However, if a thick LUN is deleted manually, its capacity can be reclaimed.

When data in a thin LUN is deleted, space in the thin LUN can be released. In this way, storage capacity can be used dynamically, improving the utilization rate.
Disk space reclamation


Performance
A thick LUN delivers higher performance for sequential reads/writes because it gets its full storage capacity from the beginning, but some of that capacity may be wasted. The performance of a thin LUN is hampered because background formatting is required each time the thin LUN expands its capacity. In addition, repeated capacity allocations may leave the disk storage space discontinuous, so sequential reads/writes take more time to locate data.

Application scenarios
Thick LUNs are suitable when:
− High performance is required.
− Storage space utilization is not a major concern.
− Cost is not a sensitive factor.
Thin LUNs are suitable when:
− Moderate performance is acceptable.
− Storage space utilization is a major concern.
− Cost is a sensitive factor.
− The required storage capacity is hard to predict.

In addition to virtualized clustered file systems, common file systems include NAS systems (NFS and CIFS) and OS file systems.

A file system is a hierarchical structure for organizing large numbers of files. An OS file system enables users to view data in the form of files and folders, and to copy, paste, delete, and restore data at any time. File systems use directories to organize data into hierarchical structures; directories are where file pointers are stored, and every file system maintains its directories. An operating system maintains only local directories, while a cluster maintains the shared directories formed by a NAS or clustered file system.

Common OS file system formats include FAT32 (Microsoft), NTFS (Microsoft), UFS (Unix), and EXT2/3/4 (Linux).

OS file system

A user or an application creates files or folders. These files and folders are stored in the file system. The file system maps the data of these files to file system blocks. The file system blocks correspond to logical partitions formed by the logical volumes. The logical partitions are mapped to physical partitions on the physical disks by the OS or the LVM. A physical partition resides on one or more physical disks in a physical volume.
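A toy walk-through of this mapping chain is sketched below. All names, block numbers, and locations are invented for illustration; the point is only that each layer translates addresses for the layer above it.

```python
# Toy walk-through of the mapping chain: file -> file system block ->
# logical partition (logical volume extent) -> physical location on a disk.

file_system_blocks = {"report.txt": [0, 1, 2]}        # file -> block numbers
block_to_logical = {0: ("lv0", 0), 1: ("lv0", 1), 2: ("lv0", 2)}
logical_to_physical = {("lv0", 0): ("disk1", 100),    # (LV, extent) -> (disk, area)
                       ("lv0", 1): ("disk1", 101),
                       ("lv0", 2): ("disk2", 40)}

def locate(file_name: str):
    # Follow a file down through each mapping layer to its physical location.
    for block in file_system_blocks[file_name]:
        lv_extent = block_to_logical[block]
        disk, area = logical_to_physical[lv_extent]
        yield block, lv_extent, disk, area

for mapping in locate("report.txt"):
    print(mapping)
```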

VM Disks

A VM consists of configuration files and disk files. Each VM disk corresponds to a disk file where user data is stored. 

If virtualized storage is used, all disk files are stored in the shared directory of the file system. If non-virtualized storage is used, each disk file corresponds to a LUN. From the perspective of users and guest OSs, both files and LUNs behave like common hard drives and are displayed as hard drives among the hardware resources of the system. When creating a VM, the administrator needs to create disks for the VM to store data; the disk information corresponds to several lines in the VM's configuration file.

Similar to other files, VM disk files have their own fixed formats. Table lists common VM disk formats.
Table Common VM disk formats
Each vendor can use its own tool to convert other VM disk formats to formats that can be used by its own products. For example, Huawei Rainbow can convert third-party or open-source VM disks to the VHD format.

Storage Features of Huawei Virtualization Products

Storage Architecture of Huawei Virtualization Products

FusionCompute can use the storage resources from dedicated storage devices or the local disks of hosts. Dedicated storage devices are connected to hosts through network cables or optical fibers.

FusionCompute uniformly converts storage resources into datastores. After datastores are associated with hosts, virtual disks can be created for VMs.

Storage resources that can be converted to datastores include:
  • LUNs on SAN devices, including iSCSI storage devices and FC SAN storage devices
  • File systems created on network attached storage (NAS) devices
  • Storage pools on FusionStorage Block
  • Local disks on hosts (virtualized)
In Huawei FusionCompute, these storage units are called storage devices, and physical storage media that deliver storage space for virtualization are called storage resources, as shown in Figure.
Huawei storage model


When adding storage devices to FusionCompute, observe the Huawei-defined logical architecture and determine how devices at each logical layer are added to the system. For example, storage resources need to be manually added, and storage devices can be scanned.

Before using datastores, you need to manually add storage resources. If the storage resources are IP SAN, FusionStorage, or NAS storage, you need to add storage ports for the hosts in the cluster and use these ports to communicate with the service ports of the centralized storage controllers or the management IP address of FusionStorage Manager. If the storage resources are provided by an FC SAN, you do not need to add storage ports.

After adding storage resources, you need to scan for these storage devices on the FusionCompute portal to add them as datastores.

Datastores can be virtualized or non-virtualized. You can use LUNs as datastores and connect them to VMs from the SAN without creating virtual disks. This process is called raw device mapping (RDM). This technology applies to scenarios requiring large disk space, for example, database server construction. RDM can be used only for VMs that run certain OSs. 

Characteristics of Huawei VM Disks

After adding datastores, you can create virtual disks for VMs. Customers may have various needs for using VMs, for example, they may want to share a VM disk to save more physical space. Therefore, Huawei VM disks are classified into different types based on these requirements.

Based on the sharing type, VM disks are classified as non-shared disks and shared disks.
− Non-shared: A non-shared disk can be used only by a single VM.
− Shared: A shared disk can be used by multiple VMs.
If multiple VMs that use a shared disk write data to the disk at the same time, data may be lost. Therefore, you need to use application software to control disk access permissions.

Based on the configuration mode, VM disks can be classified as common disks, thin provisioning disks, and thick provisioning lazy zeroed disks.
− Common: The system allocates disk space based on the disk capacity. During disk creation in this mode, data remaining on the physical device is zeroed out. The performance of disks in this mode is better than that of the other two modes, but creation may take longer.
− Thin provisioning: In this mode, the system allocates part of the configured disk capacity at first and allocates the remaining capacity based on the storage usage of the disk until the configured capacity is fully allocated. In this mode, datastores can be overcommitted. It is recommended that the datastore overcommit rate not exceed 50%; for example, if the total capacity is 100 GB, the allocated capacity should be less than or equal to 150 GB. If the allocated capacity is greater than the actual capacity, the disk is in thin provisioning mode.
− Thick provisioning lazy zeroed: The system allocates disk space based on the disk capacity. However, data remaining on the physical device is zeroed out only on first data write from the VM as required. In this mode, the disk creation speed is faster than that in the Common mode, and the I/O performance is between the Common and Thin provisioning modes. This configuration mode supports only virtualized local disks or virtualized SAN storage.

Based on the disk mode, VM disks can be classified as dependent disks, independent persistent disks, and independent non-persistent disks.
− Dependent: A dependent disk is included in the snapshot. Changes are written to disks immediately and permanently.
− Independent persistent: In this mode, disk changes are immediately and permanently written into the disk, which is not affected by snapshots.
− Independent non-persistent: In this mode, disk changes are discarded after the VM is stopped or restored using a snapshot.

If you select Independent persistent or Independent non-persistent, the system does not take snapshots of the data on the disk when creating a snapshot for the VM, and the VM disks are not restored when the VM snapshot is used to restore the VM. After a snapshot is taken for a VM, if disks on the VM are detached from it and not attached to any other VM, the disks will be re-attached to the VM after it is restored using the snapshot; however, the data on the disks will not be restored.

If a disk is deleted after a snapshot is created for the VM, the disk will not be attached to the VM after the VM is restored using the snapshot. Some disk attributes cannot be changed once they are set, while others can; for example, disk modes can be converted.

