%z InProceedings %K Rosenblum91 %s golding@cis.ucsc.edu (Thu Oct 17 11:12:07 1991) %A Mendel Rosenblum %A John K. Ousterhout %y UCBCS. %T The design and implementation of a log-structured file system %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 1 15 %x This paper presents a new technique for disk storage management %x called a log-structured file system. A log-structured file system %x writes all modifications to disk sequentially in a log-like %x structure, thereby speeding up both file writing and crash %x recovery. The log is the only structure on disk; it contains %x indexing information so that files can be read back from the log %x efficiently. In order to maintain large free areas on disk for %x fast writing, we divide the log into segments and use a segment %x cleaner to compress the live information from heavily fragmented %x segments. We present a series of simulations that demonstrate the %x efficiency of a simple cleaning policy based on cost and benefit. %x We have implemented a prototype log-structured file system called %x Sprite LFS; it outperforms current Unix file systems by an order of %x magnitude for small-file writes while matching or exceeding Unix %x performance for reads and large writes. Even when the overhead for %x cleaning is included, Sprite LFS can use 70% of the disk bandwidth %x for writing, whereas Unix file systems typically can use only %x 5--10%. %k Sprite, log cleaning, crash recovery, LFS %z InProceedings %K Gifford91 %s golding@cis.ucsc.edu (Thu Oct 17 11:21:05 1991) %A David K. Gifford %A Pierre Jouvelot %A Mark A. Sheldon %A James W. O'Toole Jr %y MITLCS. %T Semantic file systems %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 16 25 %x A semantic file system is an information storage system that %x provides flexible associative access to the system's contents by %x automatically extracting attributes from files with file type %x specific transducers. Associative access is provided by a %x conservative extension to existing tree-structured file system %x protocols, and by protocols that are designed specifically for %x content based access. Compatibility with existing file system %x protocols is provided by introducing the concept of a virtual %x directory. Virtual direcotry names are interpreted as queries, and %x thus provide flexible associative access to files and directories %x in a manner compatible with existing software. Rapid %x attribute-based access to file systme contents is implemented by %x automatic extraction and indexing of key properties of file system %x objects. The automatic indexing of files and directories is called %x ``semantic'' because user programmable transducers use information %x about the semantics of updated file system objects to extract the %x properties for indexing. Experimental results from a semantic file %x system implementation support the thesis that semantic file systems %x present a more effective storage abstractoin than do traditional %x tree structured file systems for information sharing and command %x level programming. %k transducers, virtual directories, associative access, file naming %k information retrieval %z InProceedings %K Vaswani91 %s golding@cis.ucsc.edu (Thu Oct 17 11:30:04 1991) %A Raj Vaswani %A John Zahorjan %y DEPTCS., Univ. of Washington %T The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 26 40 %x In a shared memory multiprocessor with caches, executing tasks %x develop ``affinity'' to processors by filling their caches with %x data and instructions during execution. A scheduling policy that %x ignores this affinity may waste processing power by causing %x excessing cache refilling. Our work focuses on quantifying the %x effect of processor reallocation on the performance of various %x parallel applications multiprogrammed on a shared memory %x multiprocessor, and on evaluating how the magnitude of this cost %x affects the choice of scheduling policy. %z InProceedings %K Karlin91 %s golding@cis.ucsc.edu (Thu Oct 17 17:14:30 1991) %A Anna R. Karlin %y DECSRC. %A Kai Li %y DEPTCS., Princeton Univ. %A Mark S. Manasse %A Susan Owicki %y DECSRC. %T Empirical studies of competitive spinning for a shared-memory multiprocessor %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 41 55 %x This paper studies seven strategies for determining whether and how %x long to spin before blocking. Of particular interest are %x competitive strategies, for which the performance can be shown to %x be no worse than some constant factor times an optimal off-line %x strategy. Measurements of lock-waiting time distributions were %x used to compare the cost of synchronization under all the %x strategies. Additional measurements of elapsed time for some of %x the programs and strategies allowed assessment of the impact of %x synchronization strategy on overall program performance. Both %x types of measurements indicated that the standard blocking strategy %x performs poorly compared to mixed strategies. Among the mixed %x strategies studied, adaptive algorithms perform better than %x non-adaptive ones. %k blocking, semaphores, performance evaluation, adaptive strategies %k spin locks %z InProceedings %K Muller91 %s golding@cis.ucsc.edu (Thu Oct 17 17:23:14 1991) %A Keith Muller %A Joseph Pasquale %y DEPTCSE., Univ. of California San Diego %T A high performance multi-structured file system design %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 56 67 %x We present a multi-structured file system designed for high %x bandwidth I/O and fast response. Our design is based on combining %x disk caching with three different file storage structures, each %x implemented on an independent and isolated disk array. Each %x storage structure is designed to optimize a different set of file %x system access characteristics such as cache writes, directory %x searches, file attribute requests or large sequential reads/writes. %k disk access patterns, RAID, mirrored disks, disk arrays, caching %z InProceedings %K Govindan91 %s golding@cis.ucsc.edu (Thu Oct 17 17:27:15 1991) %A Ramesh Govindan %A David P. Anderson %y UCBCS. %T Scheduling and IPC mechanisms for continuous media %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 68 80 %x Next-generation workstations will have hardware support for digital %x ``continuous media'' (CM) such as audio and video. CM applications %x handle data at high rate, with strict timing requirements, and %x often in small ``chunks''. If such applications are to run %x efficiently and predictably as user-level programs, an operating %x system must provide scheduling and IPC mechanisms that reflect %x these needs. We propose two such mechanisms: split-level CPU %x scheduling of lightweight processes in multiple address spaces, and %x memory-mapped streams for data movement between address spaces. %x These techniques reduce the number of user/kernel interactions %x (system calls, signals, and preemptions). Compared with existing %x mechanisms, they can reduce scheduling and I/O overhead by a factor %x of 4 to 6. %k split-level scheduling, CM, memory-mapped streams, lightweight processes %z InProceedings %K Rangan91 %s golding@cis.ucsc.edu (Fri Oct 18 10:29:48 1991) %A P. Venkat Rangan %A Harrick M. Vin %y DEPTCSE., Univ. of California, San Diego %T Designing file systems for digital audio and video %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 81 94 %x We address the unique requirements of a multimedia file system such %x as continuous storage and retrieval of media, maintenance of %x synchronization between multiple media streams, and efficient %x manipulation of huge media objects. We presenta model that relates %x disk and device characteristics to the recording rate, and derive %x storage granularity and scattering parameters that guarantee %x continuous access. In order for the file system to support %x multiple concurrent requests, we develop admission control %x algorithms for determining whether a new request can be accepted %x without violating the real-time constraints of any of the requests. %x We have implemented a prototype multimedia file system, which %x serves as a testbed for experimenting with policies and algorithms %x for multimedia storage. We present our initial experiences with %x using the file system. %k multimedia, granularity, admission control, scattering, continuous media %z InProceedings %K Anderson91 %s golding@cis.ucsc.edu (Fri Oct 18 10:35:13 1991) %A Thomas E. Anderson %A Brian N. Bershad %A Edward D. Lazowska %A Henry M. Levy %y DEPTCSE., Univ. of Washington %T Scheduler activations: effective kernel support for the user-level management of parallelism %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 95 109 %x Threads can be supported either by the operating system kernel or %x by user-level library code in the appliation address space, but %x neither approach has been fully satisfactory. This paper addresses %x this dilemma. First, we argue that the performance of kernel %x threads is inherently worse than that of user-level threads, rather %x than this being an artifact of existing implementations; we thus %x argue that managing parallelism at the user level is essential to %x high-performance parallel computing. Next, we argue that the lack %x of system integration exhibited by user-level threads is a %x consequence of the lack of kernel support for user-level threads %x provided by contemporary multiporcessor operating systems; we thus %x argue that kernel threads or processes, as currently conceived, ar %x the wrong abstraction on which to support user-level management of %x parallelism. Finally, we describe the design, imploementation, and %x performance of a new kernel interface and user-level thread package %x that together provide the same functionality as kernel threads %x without compromising the performance and flexibility advantages of %x user-level management of parallelism. %k threads, Firefly, scheduling policy %z InProceedings %K Marsh91 %s golding@cis.ucsc.edu (Fri Oct 18 13:04:53 1991) %A Brian D. Marsh %A Michael L. Scott %A Thomas J. LeBlanc %A Evangelos P. Markatos %y CSDEPT., Univ. of Rochester %T First-class user-level threads %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 110 121 %x We describe a set of kernel mechanisms and conventions designed to %x accord first-class status to user-level threads, allowing them to %x be used in any reasonable way while leaving the details of their %x implementation to user-level code. The key features of our %x approach are (1) shared memory for asynchronous communication %x between the kernel and the user, (2) software interrupts for events %x that might require action on the part of a user-level scheduler, %x and (3) a scheduler interface convention that facilitates %x interactions in user space between dissimilar kinds of threads. We %x have incorporated these mechanisms in the Psyche parallel operating %x system, and have used them to implement several different kinds of %x user-level threads. We argue for our approach in terms of both %x flexibility and performance. %k software interrupts, scheduling, shared memory, virtual processors, Psyche %z InProceedings %K Draves91a %s golding@cis.ucsc.edu (Fri Oct 18 13:10:16 1991) %A Richard P. Draves %A Brian N. Bershad %A Richard F. Rashid %A Randall W. Dean %y School of Comp. Sci., CMU. %T Using continuations to implement thread management and communication in operating systems %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 122 136 %x We have improved the performance of the Mach 3.0 operating system %x by redesigning its internal thread and interprocess communication %x facilities to use continuations as the basis for control trasfer. %x Compared to previous versions of Mach 3.0, our new system consumes %x 85% less space per thread. Cross-address space remote procedure %x calls execute 14% faster. Exception handling runs over 60% faster. %x In addition to improving system performance, we have used %x continuations to generalize many control transfer optimizations %x that are common to operating systems, and have recast those %x optimizations in terms of a single implementation methodology. %x This paper describes our experiences with using continuations in %x the Mach operating system. %k Mach %z InProceedings %K LaRowe91 %s golding@cis.ucsc.edu (Fri Oct 18 13:15:30 1991) %A Richard P. LaRowe, Jr %A Carla Schlatter Ellis %A Laurence S. Kaplan %T The robustness of NUMA memory management %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 137 151 %x The study of operating systems level memory management policies for %x nonuniform memory access time (NUMA) shared memory multiprocessors %x is an area of active research. Previous results have suggested %x that the best policy choice often depends on the application under %x consideration, while others have reported that the best policy %x depends on the particular architecture. Since both observations %x have merit, we explore the concept of policy tuning on an %x application/architecture basis. %x We introduce a highly tunable dynamic page placement policy for %x NUMA multiprocessors, and address issues related to the tuning of %x that policy to different architectures and applications. %x Experimental data acquired from our DUnX operating system running %x on two different NUMA multiprocessors are used to evaluate their %x usefulness, importance, and ease of policy tuning. %x Our results indicate that while varying some of the parameters can %x have dramatic effects on performance, it is easy to select a set %x of default parameter settings that result in good performance for %x each of our test applications on both architectures. This apparent %x robustness of our parameterized policy raises the possibility of %x machine-independent memory management for NUMA-class machines. %k shared memory multiprocessors, page placement policies, %k architecture independence, DUnX %z InProceedings %K Carter91 %s golding@cis.ucsc.edu (Fri Oct 18 21:54:01 1991) %A John B. Carter %A John K. Bennett %A Willy Zwaenepoel %y Comp. Sys. Lab., Rice Univ. %T Implementation and performance of Munin %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 152 164 %x Munin is a distributed shared memory (DSM) system that allows %x shared memory parallel programs to be executed efficiently on %x distributed memory multiprocessors. Munin is unique among existing %x DSM systems in its use of multiple consistency protocols and in its %x use of release consistency. In Munin, shared program variables are %x annotated with their expected access pattern, and these annotations %x are then used by the runtime system to choose a consistency %x protocol best suited to that access pattern. Release consistency %x allows Munin to mask network latency and reduce the number of %x messages required to keep memory consistent. Munin's %x multi-protocol release consistency is implemented in software, %x using a delayed update queue that buffers and merges pending %x outgoing writes. A sixteen-procssor prototype of Munin is %x currently operational. We evaluate its implementation and describe %x the execution of two Munin programs that achieve performance within %x ten percent of message passing implementations of the same %x programs. Munin achieves this level of performance with only minor %x annotations to the shared memory programs. %k multiple consistency protocols, distributed shared memory %k program annotation, release consistency %z InProceedings %K Lampson91 %s golding@cis.ucsc.edu (Mon Oct 21 15:32:18 1991) %A Butler Lampson %A Mart\'in Abadi %A Michael Burrows %A Edward Wobber %y DECSRC. %T Authentication in distributed systems: theory and practice %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 165 182 %x We describe a theory of authentication and a system that implements %x it. Our theory is based on the notion of principal and a ``speaks %x for'' relation between principals. A simple principal either has a %x name or is a communication channel; a compound principal can %x express an adopted role or delegation of authority. The theory %x explains how to reason about a principal's authority byt deducing %x the other prinicpals that it can speak for; authenticating a %x channel is one important application. We use the theory to explain %x many existing and proposed mechanisms for security. In particular, %x we describe the system we have built. It passes prinicpals %x efficiently as arguments or results of remote procedure calls, and %x it handles public and shared key encryption, name lookup in a large %x name space, groups of prinicpals, loading programs, delegation, %x access control, and revocation. %k Kerberos, access control, revocation, booting, principals, delegation %k speaks for %z InProceedings %K Rodeheffer91 %s golding@cis.ucsc.edu (Mon Oct 21 15:37:16 1991) %A Thomas L. Rodeheffer %A Michael D. Schroeder %y DECSRC. %T Automatic reconfiguration in Autonet %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 183 197 %x Autonet is a switch-based local area network using 100 Mbit/s %x full-duplex point-to-point links. Crossbar switches are %x interconnected to other switches and to host controllers in an %x arbitrary pattern. Switch hardware uses the destination address in %x each packet to determine the proper outgoing link for the next step %x in the path from source to destination. Autonet automatically %x recalculates these forwarding paths in response to failures and %x additions of network components. This automatic reconfiguration %x allows the network to continue normal operation without need of %x human intervention. Reconfiguration occurs quickly enough that %x higher-level protocols are not disrupted. This paper describes the %x fault monitoring and topology acquisition mechanisms that are %x central to automatic reconfiguration in Autonet. %k point-to-point communication, topology acquisition, cut-through routing %k communication failure detection, fault tolerance %z InProceedings %K Baker91a %s golding@cis.ucsc.edu (Mon Oct 21 15:43:27 1991) %A Mary G. Baker %A John H. Hartman %A Michael D. Kupfer %A Ken W. Shirriff %A John K. Ousterhout %y UCBCS. %T Measurements of a distributed file system %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 198 212 %x We analyzed the user-level file access patterns and caching %x behavior of the Sprite distributed file system. The first part of %x our analysis repeated a study done in 1985 of the BSD Unix files %x system. We found that file throughput has increased by a factor of %x 20 to an average of 8 Kbytes per second per active user over %x 10-minute intervals, and that the use of process migration for load %x sharing increased burst rates by another factor of six. Also, many %x more very large (multi-megabyte) files are in use today than in %x 1985. The second part of our analysis measured the behavior of %x Sprite's main-memory file caches. Client-level caches average %x about 7Mbytes in size (about one-quarter to one-third of main %x memory) and filter out about 50% of the traffic between clients and %x servers. 35% of the remaining server traffic is caused by paging, %x even on workstations with large memories. We found that client %x cache consistency is needed to prevent stale data errors, but that %x it is not invoked often enough to degrade overall system %x performance. %k Sprite, file access patterns, caches, BSD Unix file system %k process migration %z InProceedings %K Kistler91a %s golding@cis.ucsc.edu (Mon Oct 21 15:49:24 1991) %A James J. Kistler %A M. Satyanarayanan %y CMU. %T Disconnected operation in the Coda file system %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 213 225 %x Disconnected operation is a mode of operation that enables a client %x to continue accessing critical data during temporary failures of a %x shared data repository. An important, though not exclusive, %x application of disconnected operation is in supporting portable %x computers. In this paper, we show that disconnected operation is %x feasible, efficient, and usable by describing its design and %x implementation in the Coda File System. The central idea behind %x our work is that caching of data, ow widely used for performance, %x can also be exploited to improve availability. %k cached data, inconsistent replication, portable computers %k optimistic replica control, hoarding files %z InProceedings %K Liskov91a %s golding@cis.ucsc.edu (Mon Oct 21 15:53:52 1991) %A Barbara Liskov %A Sanjay Ghemawat %A Robert Gruber %A Paul Johnson %A Liuba Shrira %A Michael Williams %y MITLCS. %T Replication in the Harp file system %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 226 238 %x This paper describes the design and implementation of the Harp file %x system. Harp is a replicated Unix file system accessible via the %x VFS interface. It provides highly available and reliable storage %x for files and guarantees that file operations are executed %x atomically in spite of concurrency and failures. It uses a novel %x variation of the primary copy replication technique that provides %x good performance because it allows us to trade disk accesses for %x network communication. Harp is intended to be used within a file %x service in a distributed network; in our current implementation, it %x is accessed via NFS. Preliminary performance results indicated %x that Harp provides equal or better response time and system %x capacity than an ureplicated implementaion of NFS that uses Unix %x files directly. %k primary copy replication, NFS, VFS, uninterruptable power supplies %z InProceedings %K Schmuck91 %s golding@cis.ucsc.edu (Mon Oct 21 16:01:42 1991) %A Frank Schmuck %A Jim Wyllie %y Almaden Res. Center, IBM. %T Experience with transactions in QuickSilver %C Proc. 13th SOSP. %c Asilomar, Pacific Grove, CA %p ACM. SIGOPS %D 13 Oct. 1991 %P 239 253 %x All programs in the QuickSilver distributed system behave %x atomically with respect to their updates to permanent data. %x Operating system support for transactions provides the framework %x required to support this, as well as a mechanism that unifies %x reclamation of resources after failures or normal process %x termination. This paper evaluates the use of transactions for %x these purposes in a general purpose operating system and presents %x some of the lessons learned from our experience with a complete %x running system based on transactions. Examples of how transactions %x are used in QuickSilver and measurements of their use demonstrate %x that the transaction mechanism provides an efficient and powerful %x means for solving many of the problems introduced by operating %x system extensibility and distribution. %k distributed operating systems, failure recovery, concurrency control