Facebook Haystack for Photo Storage

On Facebook, more than on any other social media platform, image sharing is one of the most widely used features. Data reported by Beaver et al. indicate that by 2010, over 65 billion images had been uploaded to Facebook, making it the largest photo-sharing website in the world. As Facebook's popularity grows, image storage poses a substantial challenge for its storage architecture.


Facebook currently uses Haystack, an object storage system developed to let Facebook users share and retain photos in what is best described as a write-once, read-often, never-modified, and rarely-deleted workload (Huang et al.; Beaver et al.). Unlike the old storage system, which consisted of three tiers (upload, photo serving, and NFS storage), the Haystack ecosystem comprises three critical components: the Store, the Directory, and the Cache (Huang et al.). The Haystack architecture merges the photo serving and storage tiers and uses an HTTP-based image server that keeps photos in a generic object store referred to as a haystack (Beaver et al.). By doing so, Haystack eliminates the metadata overhead that slowed down reads in the previous infrastructure (Huang et al.). Because the metadata Haystack keeps for each photo is small enough to hold in memory, a photo can be retrieved with the fewest possible disk input/output operations, ideally a single one. Haystack is also less costly and offers higher throughput than previous systems. Finally, it is incrementally scalable, a highly sought-after feature given the ever-rising number of photo uploads (Beaver et al.; Huang et al.).
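To make the single-I/O read path concrete, here is a minimal Python sketch of the idea under greatly simplified assumptions: many photos are appended to one large volume (an in-memory `io.BytesIO` stands in for the on-disk file), and a small in-memory index maps each photo id to its offset and size. The class `HaystackVolume` and its methods are hypothetical names for illustration, not Facebook's code.

```python
import io
from dataclasses import dataclass


@dataclass
class IndexEntry:
    offset: int            # byte offset of the photo inside the volume
    size: int              # length of the photo data in bytes
    deleted: bool = False  # deletes only set this flag


class HaystackVolume:
    """Toy append-only volume: many photos share one large file and are
    located through an in-memory index instead of per-file metadata."""

    def __init__(self) -> None:
        # BytesIO stands in for the large on-disk volume file (assumption).
        self._file = io.BytesIO()
        self._index: dict[int, IndexEntry] = {}

    def write(self, photo_id: int, data: bytes) -> None:
        # Write once: photos are appended and never modified in place.
        if photo_id in self._index:
            raise ValueError(f"photo {photo_id} already stored")
        offset = self._file.seek(0, io.SEEK_END)
        self._file.write(data)
        self._index[photo_id] = IndexEntry(offset, len(data))

    def read(self, photo_id: int) -> bytes:
        # The index lookup is in memory; only one seek + read touches the volume.
        entry = self._index.get(photo_id)
        if entry is None or entry.deleted:
            raise KeyError(photo_id)
        self._file.seek(entry.offset)
        return self._file.read(entry.size)

    def delete(self, photo_id: int) -> None:
        # Rarely deleted: mark the entry; space would be reclaimed later.
        self._index[photo_id].deleted = True


vol = HaystackVolume()
vol.write(1, b"jpeg-bytes-1")
vol.write(2, b"jpeg-bytes-2")
print(vol.read(2))  # b'jpeg-bytes-2'
```

Because the index lives in memory, a read needs no filesystem metadata lookups, only one positioned read of the volume, and a delete merely sets a flag, which suits a rarely-deleted workload.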


Possible Future Solutions


One of the main future solutions to Facebook's storage challenges is flash storage, which stores and erases data electronically rather than mechanically. Because Facebook stores such a large volume of data, reading media from spinning hard disks can delay loading and uploading. Flash storage removes this delay because it has no seek time: there is no mechanical head to move, so it can serve large volumes of data faster than spinning disks. This makes it a significant performance breakthrough for large-scale data storage applications.
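A back-of-the-envelope calculation shows why removing seek time matters. The latencies below are assumed round numbers chosen for illustration (about 10 ms per random read on a spinning disk, about 0.1 ms on flash), not measured Facebook figures, and the model ignores parallelism and caching.

```python
# Assumed, illustrative per-request latencies (not measured Facebook figures).
HDD_RANDOM_READ_MS = 10.0  # seek + rotational delay on a spinning disk
SSD_RANDOM_READ_MS = 0.1   # flash has no mechanical seek


def reads_per_second(latency_ms: float) -> float:
    """Random reads one device can serve per second, one request at a time."""
    return 1000.0 / latency_ms


def seconds_for_reads(n_reads: int, latency_ms: float) -> float:
    """Time to serve n_reads one after another at the given latency."""
    return n_reads * latency_ms / 1000.0


hdd = reads_per_second(HDD_RANDOM_READ_MS)
ssd = reads_per_second(SSD_RANDOM_READ_MS)
print(f"HDD: {hdd:.0f} reads/s, SSD: {ssd:.0f} reads/s")
print(f"1M photo reads: HDD {seconds_for_reads(1_000_000, HDD_RANDOM_READ_MS):.0f} s, "
      f"SSD {seconds_for_reads(1_000_000, SSD_RANDOM_READ_MS):.0f} s")
```

Under these assumptions a single flash device serves about 100 times more random reads per second than a single disk.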


Architecture Improvements


In the previous architecture, Facebook stored its data in three tiers: upload, photo serving, and storage. Because Facebook's photo workload involves far more reads than writes, the metadata lookups this design required on every read slowed down the loading of media.
In the Haystack architecture, photo storage, serving, and upload are handled within one physical layer, and photos are stored as generic objects. This removes most per-photo metadata lookups from read operations. The functional layers of the Haystack architecture include the photo store, file system, storage, server, and object store.
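Storing photos in a generic object form means each photo and its minimal metadata travel together as one self-describing record. The sketch below is loosely modeled on the "needle" records described by Beaver et al., but its field layout and magic number are hypothetical; the real format has more fields (for example a cookie and padding), so treat this as an illustration only.

```python
import struct
import zlib

# Simplified, hypothetical needle layout: header, photo bytes, checksum footer.
NEEDLE_MAGIC = 0x4E454544        # hypothetical magic number
HEADER = struct.Struct("<IQBI")  # magic, photo key, flags, data size (17 bytes)
FOOTER = struct.Struct("<I")     # CRC32 of the photo data


def pack_needle(key: int, data: bytes, flags: int = 0) -> bytes:
    """Serialize one photo plus its minimal metadata into a single record."""
    header = HEADER.pack(NEEDLE_MAGIC, key, flags, len(data))
    return header + data + FOOTER.pack(zlib.crc32(data))


def unpack_needle(record: bytes) -> tuple[int, int, bytes]:
    """Parse a record, checking magic and checksum; return (key, flags, data)."""
    magic, key, flags, size = HEADER.unpack_from(record, 0)
    if magic != NEEDLE_MAGIC:
        raise ValueError("bad magic number")
    start = HEADER.size
    data = record[start:start + size]
    (checksum,) = FOOTER.unpack_from(record, start + size)
    if checksum != zlib.crc32(data):
        raise ValueError("checksum mismatch")
    return key, flags, data


rec = pack_needle(42, b"photo-bytes")
print(unpack_needle(rec))  # (42, 0, b'photo-bytes')
```

Because the record carries its own size and checksum, a server can read it with one positioned read and validate it without consulting any separate per-file metadata.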


Cold Storage Methods


For better performance, data that users access frequently must be kept in primary storage. As time passes and the data is accessed less often, it is moved to cold storage. To reduce cost and optimize storage resources, organizations should use a tier-based approach, which moves less frequently accessed data to cold storage.
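A tier-based policy can be sketched in a few lines. Assumptions: two in-memory dictionaries stand in for the hot (primary) and cold tiers, timestamps are passed in explicitly, and the 30-day idle threshold `COLD_AFTER_SECONDS` is a hypothetical value; a real system would tune it from access statistics.

```python
from dataclasses import dataclass, field

COLD_AFTER_SECONDS = 30 * 24 * 3600  # hypothetical: demote after 30 idle days


@dataclass
class TieredStore:
    hot: dict[str, bytes] = field(default_factory=dict)   # primary storage
    cold: dict[str, bytes] = field(default_factory=dict)  # cheaper cold tier
    last_access: dict[str, float] = field(default_factory=dict)

    def put(self, key: str, data: bytes, now: float) -> None:
        # New uploads start in the hot tier.
        self.hot[key] = data
        self.last_access[key] = now

    def get(self, key: str, now: float) -> bytes:
        # Serve from whichever tier holds the item (no promotion in this sketch).
        data = self.hot[key] if key in self.hot else self.cold[key]
        self.last_access[key] = now
        return data

    def demote_idle(self, now: float) -> list[str]:
        """Move items not accessed for COLD_AFTER_SECONDS to the cold tier."""
        moved = []
        for key in list(self.hot):
            if now - self.last_access[key] >= COLD_AFTER_SECONDS:
                self.cold[key] = self.hot.pop(key)
                moved.append(key)
        return moved
```

Running `demote_idle` periodically keeps only recently accessed photos on the expensive primary tier, while older photos remain readable from the cold tier.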


Flash Storage Issues


The advantages of flash storage are enticing, but there are trade-offs to consider. Its high performance comes at a price: in many cases flash costs more than hard drives for the same capacity. Repeatedly writing data to flash cells wears them out, which reduces their reliability and raises error rates. This can be mitigated by storing extra data alongside each channel: error-correcting codes computed over the data on that channel. The SSD uses these codes to detect errors in the flash cells. Logic in the SSD controller corrects small errors, while larger errors require more complex correction logic, either in the controller or in a driver running on the host. Thus, from the system's perspective, even large errors can be corrected.
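The principle of correcting small errors with stored parity bits can be illustrated with a Hamming(7,4) code, which adds three parity bits to every four data bits and can locate and fix one flipped bit per 7-bit codeword. This is a deliberately simple stand-in: production SSD controllers use much stronger codes (such as BCH or LDPC), and the function names below are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits; positions 1..7 are p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points to
    return [c[2], c[4], c[5], c[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate wear flipping one bit (position 5)
print(hamming74_decode(codeword))  # ([1, 0, 1, 1], 5)
```

Errors affecting more bits than the code can handle are where the stronger controller logic or the host driver described above would take over.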


Data is lost when errors are not corrected even by the host driver. An SSD failure occurs when the SSD accumulates large errors that the host corrects but the SSD itself cannot; in other words, the accumulated errors are host-correctable but not SSD-correctable. Flash storage also has a capacity limitation that keeps it from reaching the sizes of hard disk drives. Data compression and other data reduction techniques are used in flash storage to address this limitation.
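Data reduction can be illustrated by combining two common techniques: deduplication (identical blocks are stored once, keyed by a content hash) and compression. The `ReducedStore` class below is a hypothetical sketch built on Python's standard `hashlib` and `zlib` modules, not the mechanism of any particular flash product.

```python
import hashlib
import zlib


class ReducedStore:
    """Toy data-reduction layer: deduplicate identical blocks, then compress."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}  # content hash -> compressed block
        self.logical_bytes = 0               # bytes clients asked to store

    def write(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.logical_bytes += len(data)
        if digest not in self._blocks:  # a duplicate block costs no new space
            self._blocks[digest] = zlib.compress(data)
        return digest

    def read(self, digest: str) -> bytes:
        return zlib.decompress(self._blocks[digest])

    @property
    def physical_bytes(self) -> int:
        """Bytes actually occupied after deduplication and compression."""
        return sum(len(b) for b in self._blocks.values())


store = ReducedStore()
block = b"A" * 4096
h1 = store.write(block)
h2 = store.write(block)  # identical content is stored only once
print(h1 == h2, store.logical_bytes)  # True 8192
print(store.physical_bytes < store.logical_bytes)  # True
```

The savings depend heavily on the data: photos are usually already compressed (for example as JPEG), so general-purpose compression gains little on them, while redundant data benefits much more.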
