This paper presents HAC, a novel technique for managing the client cache in a distributed, persistent object storage system. HAC is a hybrid between page and object caching that combines the virtues of both while avoiding their disadvantages. It achieves the low miss penalties of a page-caching system, but is able to perform well even when locality is poor, since it can discard pages while retaining their hot objects. It realizes the potentially lower miss rates of object-caching systems, yet avoids their problems of fragmentation and high overheads. Furthermore, HAC is adaptive: when locality is good it behaves like a page-caching system, while if locality is poor it behaves like an object-caching system. It is able to adjust the amount of cache space devoted to pages dynamically so that space in the cache can be used in the way that best matches the needs of the application. The paper also presents results of experiments that indicate that HAC outperforms other object storage systemsacross a wide range of cache sizes and workloads; it performs substantially better on the expected workloads, which have low to moderate localit y. Thus we show that our hybrid, adaptive approach is the cache management technique of choice for distributed, persistent object systems.
This paper describes a new scheme for guaranteeing that transactions in a client/server system observe consistent state while they are running. The scheme is presented in conjunction with an optimistic concurrency control algorithm, but could also be used to prevent read-only transactions from conflicting with read/write transactions in a multi-version system. The scheme is lazy about the consistency it provides for running transactions and also in the way it generates the consistency information. The paper presents results of simulation experiments showing that the cost of the scheme is negligible.The scheme uses multipart timestamps to inform nodes about information they need to know. Today the utility of such schemes is limited because timestamp size is proportional to system size and therefore the schemes don't scale to very large systems. We show how to solve this problem. Our multipart timestamps are based on real rather than logical clocks; we assume clocks in the system are loosely synchronized. Clocks allow us to keep multipart timestamps small with minimal impact on performance: we remove old information that is likely to be known while retaining recent information. Only performance and not correctness is affected if clocks get out of synch.
Cooperative caching is a promising technique to avoid the increasingly formidable disk bottleneck problem in distributed storage systems; it reduces the number of disk accesses by servicing client cache misses from the caches of other clients. However, existing cooperative caching techniques do not provide adequate support for fine-grained sharing. In this paper, we describe a new storage system architecture, split caching, and a new cache coherence protocol, fragment reconstruction, that combine cooperative caching with efficient support for fine-grained sharing and transactions. We also present the results of performance studies that show that our scheme introduces little overhead over the basic cooperative caching mechanism and provides better performance when there is false sharing.
Applications of the future will need to support large numbers of clients and will require scalable storage systems that allow state to be shared reliably. Recent research in distributed file systems provides technology that increases the scalability of storage systems. But file systems only support sharing with weak consistency guarantees and can not support applications that require transactional consistency. The challenge is how to provide scalable storage systems that support transactional applications. We are developing technology for scalable transactional storage systems. Our approach combines scalable caching and coherence techniques developed in serverless file systems and DSM systems, with recovery techniques developed in traditional databases. This position paper describes the design rationale for {\em split caching}, a new scalable memory management technique for network-based transactional object storage systems, and {\em fragment reconstruction}, a new coherence protocol that supports fine-grained sharing.
This paper describes an efficient optimistic concurrency control scheme for use in distributed database systems in which objects are cached and manipulated at client machines while persistent storage and transactional support are provided by servers. The scheme provides both serializability and external consistency for committed transactions; it uses loosely synchronized clocks to achieve global serialization. It stores only a single version of each object, and avoids maintaining any concurrency control information on a per-object basis; instead, it tracks recent invalidations on a per-client basis, an approach that has low in-memory space overhead and no per-object disk overhead. In addition to its low space overheads, the scheme also performs well. The paper presents a simulation study that compares the scheme to adaptive callback locking, the best concurrency control scheme for client-server object-oriented database systems studied to date. The study shows that our scheme outperforms adaptive callback locking for low to moderate contention workloads, and scales better with the number of clients. For high contention workloads, optimism can result in a high abort rate; the scheme presented here is a first step toward a hybrid scheme that we expect to perform well across the full range of workloads.
Object-based client caching allows clients to keep more frequently accessed objects while discarding colder objects that reside on the same page. However, when these objects are modified and sent to the server, it may need to read the corresponding page from disk to install the update. These {\em installation reads} are not required with a page-based cache because whole pages are sent to the server. The relative effectiveness of the two cache techniques depends crucially on the application workload and on the object layout on pages. In a large system when many applications share data objects customizing cache management to either object-based or page-based configuration may not be optimal to any of the applications. We describe a hybrid system that permits clients to cache objects and pages. The system uses a simple cache design that combines the best of object caching and page caching. The client increases its cache hit ratio as in object-based caching. The client avoids some installation reads by sending pages to the server when possible. Using simulated workloads we explore the performance of our design and show that it can offer a significant performance improvement over both pure object caching and pure page caching on a range of workloads.