Cloud Storage Defined

Cloud storage means storing data files with a public cloud provider. The customer can host their virtual machines and apps with the cloud provider too. Or, in a hybrid architecture, they can keep those in their own data center and store databases, data files, and archives in the cloud.
Cloud storage can provide lower cost, more security, and a lower risk of data loss than a company gets by building its own infrastructure. Cloud storage also helps a company meet SOX, HIPAA, e-discovery, and other legal requirements for secure, offsite storage.

Here we explain basic storage concepts:

Local and Network Drives

Storage can be block storage organized as a hierarchical file system, referenced by the traditional directory/filename address. Or it can be object storage, referenced using a URL and object ID. Or it can be Hadoop storage, referenced using the notation hdfs://(some address)/filename.
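As a rough illustration of the three addressing styles, the sketch below guesses the storage style from the shape of an address alone. The function name classify_address and the example host names are made up for this illustration; a real system would know its storage type from configuration, not from string inspection.

```python
from urllib.parse import urlparse


def classify_address(address: str) -> str:
    """Guess the storage style from the shape of the address alone.

    This is an illustrative heuristic, not how any real storage
    client decides where data lives.
    """
    scheme = urlparse(address).scheme.lower()
    if scheme == "hdfs":
        return "hadoop"        # hdfs://(some address)/filename
    if scheme in ("http", "https"):
        return "object"        # URL plus object ID or key
    return "hierarchical"      # plain directory/filename path


if __name__ == "__main__":
    # Hypothetical addresses, one per style.
    for addr in (
        "/usr/local/data/report.csv",
        "https://storage.example.com/a1b2c3",
        "hdfs://namenode.example:8020/logs/app.log",
    ):
        print(addr, "->", classify_address(addr))
```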

Data in the cloud can be stored on local devices or network storage.  It can be on magnetic or solid state disks.

Storage that is local to virtual machines is locally-attached storage, also called direct-attached storage, and is typically connected over an interface such as SCSI. In that case, the disk controller is attached to the same machine as the disk drives.

Data files are stored in data blocks. Those blocks can also be stored on a storage area network (SAN) or on network-attached storage (NAS). Vendors such as EMC, Dell, and HP sell dedicated storage hardware for this purpose. These devices are often called storage arrays.

Hadoop stores data across a network of PCs using locally-attached storage. The idea is to provide redundant storage yet do so using low cost storage.  (Hadoop writes each data block 3 times.)
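The cost of writing each block 3 times can be sketched with a little arithmetic. The code below is a toy model only: raw_capacity_needed and place_replicas are hypothetical helpers, and the round-robin placement is an assumption made for illustration. It is not Hadoop's actual placement policy.

```python
def raw_capacity_needed(logical_gb: float, replication: int = 3) -> float:
    """Raw disk space consumed when every block is written `replication` times."""
    return logical_gb * replication


def place_replicas(block_id: int, nodes: list[str], replication: int = 3) -> list[str]:
    """Pick `replication` distinct nodes for one block, round-robin.

    Toy policy for illustration; the real placement logic in Hadoop
    is more involved and is not reproduced here.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


if __name__ == "__main__":
    nodes = ["pc1", "pc2", "pc3", "pc4"]   # hypothetical low-cost machines
    print(raw_capacity_needed(100))        # 100 GB of data uses 300 GB of disk
    print(place_replicas(0, nodes))
    print(place_replicas(3, nodes))
```

Losing any one machine in this model still leaves two copies of every block, which is the point of trading extra disk space for cheap hardware.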

Data Blocks, Object Storage, and the Hierarchical File System

Data files are stored in data blocks. You can think of a block roughly as a fixed-size chunk of memory, called a page, written to disk. The computer reads and writes data in these fixed-size, contiguous units rather than one byte at a time.
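To make the block idea concrete, here is a minimal sketch of how many blocks a file occupies and how much space is left unused in its last block. The 4096-byte block size is an assumption chosen for the example; actual block sizes vary by file system and device.

```python
def blocks_needed(file_size: int, block_size: int = 4096) -> int:
    """Number of whole blocks required to hold `file_size` bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return -(-file_size // block_size)   # ceiling division


def slack_bytes(file_size: int, block_size: int = 4096) -> int:
    """Unused bytes in the last block allocated to the file."""
    return blocks_needed(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    size = 10_000                        # bytes, arbitrary example file
    print(blocks_needed(size))           # 3 blocks of 4096 bytes
    print(slack_bytes(size))             # 2288 bytes unused in the last block
```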

A file typically spans many blocks. Object storage also uses data blocks, but the files are referenced by an object ID. The logic behind that is that a directory structure can quickly get so long as to be difficult to work with, as in //customers/ford/2015/images/highres/… The object address is a shorter notation that is easier to read.

Objects are addressed by URLs like https://(server)/(objectID or key, meaning filename). Hierarchical files are stored under logical mount points like /usr/local that map to a physical device.

Storage should be abstracted and device independent. It is sometimes cumbersome to spread one logical file system across multiple devices. Object storage is easier to conceive and implement in the cloud because it is, by design, device independent. Still, most systems use the hierarchical file system.

Object storage also supports user-defined metadata. So instead of being limited to date created, date modified, and size, or building some description into the directory name, an object can carry fields such as customer, storage retention time, format, encryption algorithm, etc.
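A minimal in-memory sketch shows how user-defined metadata lets you find objects by attributes instead of by directory path. The StoredObject and ObjectStore classes and the metadata field names are hypothetical, invented for this illustration; they do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                   # object ID or key
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Toy flat key -> object store with user-defined metadata."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def find(self, **criteria: str) -> list[str]:
        """Return sorted keys whose metadata matches every criterion."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    store = ObjectStore()
    store.put("a1b2", b"...", customer="ford", format="jpeg", retention="7y")
    store.put("c3d4", b"...", customer="ford", format="pdf", retention="1y")
    store.put("e5f6", b"...", customer="acme", format="jpeg", retention="7y")
    print(store.find(customer="ford"))                 # ['a1b2', 'c3d4']
    print(store.find(customer="ford", format="jpeg"))  # ['a1b2']
```

Note that the keys stay short and flat; the descriptive information lives in the metadata rather than in a long directory name.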

Lower Operating Costs and Increased Redundancy

Cloud storage moves files onto the cloud. This can provide increased redundancy, high availability, and lower cost. That model is what a cloud provider bases their business on. This is why it has become a de facto standard in IT. A public cloud can store data in geographically distinct locations more easily than a private cloud or traditional standalone servers can.

Cloud providers can also adjust storage pricing to deliver the lowest operating cost by matching the storage type to the customer's requirements. For example, data can be stored on fast SSD (solid state) devices or regular magnetic drives. Archived data can be kept offline with some agreed time to bring it back online. That results in storage that can be as low as pence per GB. The vendor can also distinguish between data that is accessed frequently and data that is not. And some storage vendors use disks even for perpetual storage, instead of tape. That is something a private cloud would find expensive to do.
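The idea of matching storage type to requirements can be sketched as picking the cheapest tier that still meets a retrieval-time limit. Everything in this example is assumed for illustration: the tier names, the prices, and the retrieval times are invented and do not reflect any vendor's actual price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price_per_gb_month: float    # made-up price, arbitrary currency
    retrieval_seconds: float     # made-up time until data is readable


# Hypothetical tiers: fast SSD, regular magnetic disk, offline archive.
TIERS = (
    Tier("ssd", 0.10, 0.001),
    Tier("magnetic", 0.04, 0.01),
    Tier("archive", 0.005, 12 * 3600),
)


def cheapest_tier(max_wait_seconds: float, tiers: tuple[Tier, ...] = TIERS) -> Tier:
    """Cheapest tier whose retrieval time fits within the allowed wait."""
    eligible = [t for t in tiers if t.retrieval_seconds <= max_wait_seconds]
    if not eligible:
        raise ValueError("no tier is fast enough for this requirement")
    return min(eligible, key=lambda t: t.price_per_gb_month)


def monthly_cost(size_gb: float, tier: Tier) -> float:
    return size_gb * tier.price_per_gb_month


if __name__ == "__main__":
    for wait in (0.005, 1.0, 24 * 3600):
        tier = cheapest_tier(wait)
        print(f"wait <= {wait}s: {tier.name}, 1000 GB costs {monthly_cost(1000, tier):.2f}")
```

Data that can tolerate a long wait lands on the archive tier, which is how archived data ends up so much cheaper per GB than data that must be read immediately.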

A company that uses a hybrid architecture can have their applications in their own private cloud and their data in a public cloud. That data could even be spread across different cloud vendors. The cloud storage device is given an internal IP address on the company network using a VPN connection, so it becomes an extension of the company’s infrastructure. The customer stores their database files, logical files, email archives, or whatever other data both in the cloud and on their own devices. The application can use web services to retrieve object files, or mount points to retrieve logical files across the network. The second option has a latency that might be too high for certain applications across a WAN, so cloud storage in that architecture might be designed as redundant storage rather than primary storage. This latency is why it is best to keep the apps in the cloud too.


Finally, it is not necessarily true that the cloud is more secure than maintaining one’s own equipment. The cloud provider assumes no responsibility for malware, viruses, or other hacking. But they have better physical security and better protection against data loss, through replication to multiple storage devices across multiple locations and through redundant power and ISP connections.

