Given the steady and very noticeable growth of global ecommerce, it is rapidly becoming clear that disk-oriented clouds are unable to keep up with the scalability requirements of web applications. The huge number of I/O operations leaves large unprocessed datasets in the queue, thereby decreasing the overall performance of the application.
During my study of the issue, I came across research conducted by the Department of Computer Science at Stanford University. The topic of the research was RAMClouds and how this concept might reshape the future of web applications. In this article, I will provide a brief overview of the RAMCloud storage system and how it contrasts with current disk-oriented clouds.
Classical disk-oriented clouds were originally based on magnetic disk storage systems. As technologies evolved, these clouds migrated to solid-state drives and took advantage of improved algorithms and database techniques. However, the performance improvements did not keep pace as storage sizes increased.
The concept behind the RAMCloud is to keep the data in DRAM (Dynamic Random Access Memory), with a thousand times the throughput of disk-based systems and a thousand times lower access latency. The research covered the impact of RAMClouds in three critical areas: scaling large web applications, enabling data-intensive workloads at lower latency, and scalable storage (an important necessity for the modern cloud computing industry).
The Architecture of RAMClouds
In your laptop, RAM holds data bits as charged or discharged capacitors within the integrated circuit. Maintaining this data requires a continuous supply of electricity. When the supply is interrupted, all the data in the RAM is lost. RAMClouds operate on a similar principle.
The architecture of RAMClouds is really simple. In contrast to Memcached or other caching mechanisms, where hits and misses determine the data-fetch process, RAMClouds use DRAM as the permanent home of the data.
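To make the contrast above concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class names DiskBackedCache and DramStore are hypothetical, a plain dict stands in for the disk, and none of this is RAMCloud's or Memcached's actual API.

```python
class DiskBackedCache:
    """Cache-aside lookup: memory holds a subset, misses go to slow storage."""

    def __init__(self, backing_store: dict):
        self.backing_store = backing_store  # hypothetical stand-in for a disk
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store[key]  # the slow path in a real system
        self.cache[key] = value
        return value


class DramStore:
    """RAMCloud-style idea: memory is the home of the data, so no miss path."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data[key]


cache = DiskBackedCache({"user:1": "alice"})
cache.read("user:1")  # miss: falls through to the backing store
cache.read("user:1")  # hit: served from memory

store = DramStore()
store.write("user:1", "alice")
store.read("user:1")  # every read is served from memory
```

In the cache, read latency depends on whether the key happens to be in memory. In the second class there is only one path, which is the property the article attributes to RAMClouds.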
However, RAMClouds come at a cost! As mentioned earlier, RAMClouds must be protected against power outages. The paper mentions that the data should be replicated across dispersed storage centers. An important issue with this technique is the failure probability, which equals the probability that all the replicas fail at the same time. Different logging techniques can be used to keep the replicas in sync as the data changes. See the diagram below:
In order to translate the theory of RAMClouds into a viable implementation that is accessible to the masses, researchers have to surmount the following challenges:
Low Latency RPC
RPC (Remote Procedure Call) is the basic protocol that nodes use to send data to other nodes of the network without caring about network switching. Modern data centers have a three-tier switching network, and every switching point adds to the latency of the network. Thus, in order to keep the latency as low as possible, RAMClouds have to minimize processing times. The research suggests doing this by dedicating one of the processing cores to network calls only. This core also hands data to the other cores for further processing. This is one of the finest uses of multi-core architecture, where a number of requests are handled in parallel.
Furthermore, researchers are looking into optimizing the current TCP acknowledgements to get the job done with fewer network calls, and into time-outs to wipe out congestion-causing packets.
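The dedicated-core idea can be sketched as one dispatcher that does nothing but hand requests to a pool of workers. This is a rough illustration under my own assumptions, not the researchers' implementation. Python threads stand in for cores (they show the structure, not real parallel speed-up), a list stands in for the network, and names such as NUM_WORKERS and handle_request are hypothetical.

```python
import queue
import threading

NUM_WORKERS = 3  # hypothetical: stands in for the remaining cores


def handle_request(payload: int) -> int:
    """Placeholder for the real work a server would do per request."""
    return payload * 2


def dispatcher(incoming: list, work_queue: queue.Queue) -> None:
    """The only thread that touches the 'network'; it just hands work off."""
    for request_id, payload in enumerate(incoming):
        work_queue.put((request_id, payload))
    for _ in range(NUM_WORKERS):
        work_queue.put(None)  # one shutdown marker per worker


def worker(work_queue: queue.Queue, results: dict, lock: threading.Lock) -> None:
    while True:
        item = work_queue.get()
        if item is None:
            break
        request_id, payload = item
        response = handle_request(payload)
        with lock:
            results[request_id] = response


def serve(incoming: list) -> dict:
    work_queue = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(work_queue, results, lock))
        for _ in range(NUM_WORKERS)
    ]
    for thread in workers:
        thread.start()
    dispatch_thread = threading.Thread(target=dispatcher, args=(incoming, work_queue))
    dispatch_thread.start()
    dispatch_thread.join()
    for thread in workers:
        thread.join()
    return results


responses = serve([1, 2, 3, 4, 5])
```

The point of the split is that the dispatcher never blocks on request processing, so incoming calls are picked up as quickly as possible while the workers do the heavier lifting.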
Distribution and Scaling
Considering the size and architecture of RAMClouds, one has to keep in mind the threat of crashes and the continuous spawning of servers inside the pool for scaling. This brings me to the related idea of continuous and efficient data synchronization.
Everyone knows the challenges of maintaining good performance and availability during migrations of big data sets. These challenges also arise when developing and working with RAMClouds. Researchers suggest that RAM logs should be created during data migration. The data can then be migrated during idle time while the queued requests are served, and the data logs are updated once the migration is done.
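One possible reading of the log-during-migration idea is sketched below. The article does not spell out the mechanism, so everything here is my assumption: writes that arrive mid-migration are applied to the old copy and appended to an in-memory log, data is copied in small batches when the server is idle, and the log is replayed onto the new copy at the end. The class MigratingStore and its methods are hypothetical names.

```python
class MigratingStore:
    """Serves requests from the source while data moves to the target."""

    def __init__(self, source: dict):
        self.source = source
        self.target = {}
        self.log = []                       # writes seen during migration
        self.pending = list(source.keys())  # keys not yet copied
        self.migrating = True

    def write(self, key, value):
        if self.migrating:
            self.source[key] = value
            self.log.append((key, value))   # remember it for the replay
        else:
            self.target[key] = value

    def read(self, key):
        active = self.source if self.migrating else self.target
        return active[key]

    def migrate_step(self, batch_size: int = 2) -> None:
        """Copy one small batch; meant to be called when the server is idle."""
        if not self.migrating:
            return
        batch = self.pending[:batch_size]
        self.pending = self.pending[batch_size:]
        for key in batch:
            self.target[key] = self.source[key]
        if not self.pending:
            # Copy finished: replay logged writes so the target is current.
            for key, value in self.log:
                self.target[key] = value
            self.log.clear()
            self.migrating = False


store = MigratingStore({"a": 1, "b": 2, "c": 3})
store.migrate_step()   # copies "a" and "b"
store.write("a", 10)   # arrives mid-migration: applied to source and logged
store.write("d", 4)    # new key, also logged
store.migrate_step()   # copies "c", then replays the log onto the target
```

Reads keep working throughout because the source stays authoritative until the replay completes. A real system would also have to handle deletes and crashes during the move, which this sketch ignores.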
One of the main challenges in implementing the RAMCloud storage system will be the methodology for sharing it among thousands of applications of different kinds. This will result in a number of small but rapidly growing applications on a single RAMCloud. Sharing should be implemented in such a way that small applications do not suffer from the impact of resource-hungry applications. Similarly, these applications need to be able to scale up quickly to achieve high performance levels. RAMClouds need to provide different techniques for performance and resource isolation, and should implement security mechanisms that allow different pools of data to work in a shared environment.
If you want to learn more about RAMClouds, you can read this informative research paper and gain deeper insights into the RAMCloud storage system.
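As a rough illustration of resource isolation, the sketch below gives each application its own memory quota and its own key namespace. This is one simple technique I am assuming for the example, not something the research prescribes. The class SharedStore and its methods are hypothetical.

```python
class SharedStore:
    """One store shared by many applications, each with a byte quota."""

    def __init__(self):
        self.quotas = {}  # application name -> maximum bytes
        self.usage = {}   # application name -> bytes currently stored
        self.data = {}    # (application name, key) -> value

    def register(self, app: str, quota_bytes: int) -> None:
        self.quotas[app] = quota_bytes
        self.usage[app] = 0

    def resize(self, app: str, quota_bytes: int) -> None:
        """Lets a growing application scale up without touching the others."""
        self.quotas[app] = quota_bytes

    def write(self, app: str, key: str, value: bytes) -> bool:
        old_size = len(self.data.get((app, key), b""))
        new_usage = self.usage[app] - old_size + len(value)
        if new_usage > self.quotas[app]:
            return False  # rejected: only this application is affected
        self.data[(app, key)] = value
        self.usage[app] = new_usage
        return True

    def read(self, app: str, key: str) -> bytes:
        # Keys are namespaced per application, so pools cannot see each other.
        return self.data[(app, key)]


shared = SharedStore()
shared.register("small", 16)
shared.register("hungry", 8)

first = shared.write("hungry", "blob1", b"12345678")   # fits the quota
second = shared.write("hungry", "blob2", b"12345678")  # over quota, rejected
small_ok = shared.write("small", "note", b"hello")     # unaffected by "hungry"

shared.resize("hungry", 32)
third = shared.write("hungry", "blob2", b"12345678")   # fits after resizing
```

The hungry application hits its own limit without taking memory from the small one, and raising its quota is a single call. Real isolation would also need to cover CPU and network share, which are outside this sketch.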