%z InProceedings
%K Rosenblum91
%s golding@cis.ucsc.edu (Thu Oct 17 11:12:07 1991)
%A Mendel Rosenblum
%A John K. Ousterhout
%y UCBCS.
%T The design and implementation of a log-structured file system
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 1 15
%x This paper presents a new technique for disk storage management
%x called a log-structured file system.  A log-structured file system
%x writes all modifications to disk sequentially in a log-like
%x structure, thereby speeding up both file writing and crash
%x recovery.  The log is the only structure on disk; it contains
%x indexing information so that files can be read back from the log
%x efficiently.  In order to maintain large free areas on disk for
%x fast writing, we divide the log into segments and use a segment
%x cleaner to compress the live information from heavily fragmented
%x segments.  We present a series of simulations that demonstrate the
%x efficiency of a simple cleaning policy based on cost and benefit.
%x We have implemented a prototype log-structured file system called
%x Sprite LFS; it outperforms current Unix file systems by an order of
%x magnitude for small-file writes while matching or exceeding Unix
%x performance for reads and large writes.  Even when the overhead for
%x cleaning is included, Sprite LFS can use 70% of the disk bandwidth
%x for writing, whereas Unix file systems typically can use only
%x 5--10%. 
%k Sprite, log cleaning, crash recovery, LFS

%z InProceedings
%K Gifford91
%s golding@cis.ucsc.edu (Thu Oct 17 11:21:05 1991)
%A David K. Gifford
%A Pierre Jouvelot
%A Mark A. Sheldon
%A James W. O'Toole Jr
%y MITLCS.
%T Semantic file systems
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 16 25
%x A semantic file system is an information storage system that
%x provides flexible associative access to the system's contents by
%x automatically extracting attributes from files with file type
%x specific transducers.  Associative access is provided by a
%x conservative extension to existing tree-structured file system
%x protocols, and by protocols that are designed specifically for
%x content based access.  Compatibility with existing file system
%x protocols is provided by introducing the concept of a virtual
%x directory.  Virtual direcotry names are interpreted as queries, and
%x thus provide flexible associative access to files and directories
%x in a manner compatible with existing software.  Rapid
%x attribute-based access to file systme contents is implemented by
%x automatic extraction and indexing of key properties of file system
%x objects.  The automatic indexing of files and directories is called
%x ``semantic'' because user programmable transducers use information
%x about the semantics of updated file system objects to extract the
%x properties for indexing.  Experimental results from a semantic file
%x system implementation support the thesis that semantic file systems
%x present a more effective storage abstractoin than do traditional
%x tree structured file systems for information sharing and command
%x level programming.
%k transducers, virtual directories, associative access, file naming
%k information retrieval

%z InProceedings
%K Vaswani91
%s golding@cis.ucsc.edu (Thu Oct 17 11:30:04 1991)
%A Raj Vaswani
%A John Zahorjan
%y DEPTCS., Univ. of Washington
%T The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 26 40
%x In a shared memory multiprocessor with caches, executing tasks
%x develop ``affinity'' to processors by filling their caches with
%x data and instructions during execution.  A scheduling policy that
%x ignores this affinity may waste processing power by causing
%x excessing cache refilling.  Our work focuses on quantifying the
%x effect of processor reallocation on the performance of various
%x parallel applications multiprogrammed on a shared memory
%x multiprocessor, and on evaluating how the magnitude of this cost
%x affects the choice of scheduling policy.

%z InProceedings
%K Karlin91
%s golding@cis.ucsc.edu (Thu Oct 17 17:14:30 1991)
%A Anna R. Karlin
%y DECSRC.
%A Kai Li
%y DEPTCS., Princeton Univ.
%A Mark S. Manasse
%A Susan Owicki
%y DECSRC.
%T Empirical studies of competitive spinning for a shared-memory multiprocessor
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 41 55
%x This paper studies seven strategies for determining whether and how
%x long to spin before blocking.  Of particular interest are
%x competitive strategies, for which the performance can be shown to
%x be no worse than some constant factor times an optimal off-line
%x strategy.  Measurements of lock-waiting time distributions were
%x used to compare the cost of synchronization under all the
%x strategies.  Additional measurements of elapsed time for some of
%x the programs and strategies allowed assessment of the impact of
%x synchronization strategy on overall program performance.  Both
%x types of measurements indicated that the standard blocking strategy
%x performs poorly compared to mixed strategies.  Among the mixed
%x strategies studied, adaptive algorithms perform better than
%x non-adaptive ones.
%k blocking, semaphores, performance evaluation, adaptive strategies
%k spin locks

%z InProceedings
%K Muller91
%s golding@cis.ucsc.edu (Thu Oct 17 17:23:14 1991)
%A Keith Muller
%A Joseph Pasquale
%y DEPTCSE., Univ. of California San Diego
%T A high performance multi-structured file system design
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 56 67
%x We present a multi-structured file system designed for high
%x bandwidth I/O and fast response.  Our design is based on combining
%x disk caching with three different file storage structures, each
%x implemented on an independent and isolated disk array.  Each
%x storage structure is designed to optimize a different set of file
%x system access characteristics such as cache writes, directory
%x searches, file attribute requests or large sequential reads/writes.
%k disk access patterns, RAID, mirrored disks, disk arrays, caching

%z InProceedings
%K Govindan91
%s golding@cis.ucsc.edu (Thu Oct 17 17:27:15 1991)
%A Ramesh Govindan
%A David P. Anderson
%y UCBCS.
%T Scheduling and IPC mechanisms for continuous media
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 68 80
%x Next-generation workstations will have hardware support for digital
%x ``continuous media'' (CM) such as audio and video.  CM applications
%x handle data at high rate, with strict timing requirements, and
%x often in small ``chunks''.  If such applications are to run
%x efficiently and predictably as user-level programs, an operating
%x system must provide scheduling and IPC mechanisms that reflect
%x these needs.  We propose two such mechanisms:  split-level CPU
%x scheduling of lightweight processes in multiple address spaces, and
%x memory-mapped streams for data movement between address spaces.
%x These techniques reduce the number of user/kernel interactions
%x (system calls, signals, and preemptions).  Compared with existing
%x mechanisms, they can reduce scheduling and I/O overhead by a factor
%x of 4 to 6.
%k split-level scheduling, CM, memory-mapped streams, lightweight processes

%z InProceedings
%K Rangan91
%s golding@cis.ucsc.edu (Fri Oct 18 10:29:48 1991)
%A P. Venkat Rangan
%A Harrick M. Vin
%y DEPTCSE., Univ. of California, San Diego
%T Designing file systems for digital audio and video
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 81 94
%x We address the unique requirements of a multimedia file system such
%x as continuous storage and retrieval of media, maintenance of
%x synchronization between multiple media streams, and efficient
%x manipulation of huge media objects.  We presenta model that relates
%x disk and device characteristics to the recording rate, and derive
%x storage granularity and scattering parameters that guarantee
%x continuous access.  In order for the file system to support
%x multiple concurrent requests, we develop admission control
%x algorithms for determining whether a new request can be accepted
%x without violating the real-time constraints of any of the requests.
%x We have implemented a prototype multimedia file system, which
%x serves as a testbed for experimenting with policies and algorithms
%x for multimedia storage.  We present our initial experiences with
%x using the file system.
%k multimedia, granularity, admission control, scattering, continuous media

%z InProceedings
%K Anderson91
%s golding@cis.ucsc.edu (Fri Oct 18 10:35:13 1991)
%A Thomas E. Anderson
%A Brian N. Bershad
%A Edward D. Lazowska
%A Henry M. Levy
%y DEPTCSE., Univ. of Washington
%T Scheduler activations: effective kernel support for the user-level management of parallelism
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 95 109
%x Threads can be supported either by the operating system kernel or
%x by user-level library code in the appliation address space, but
%x neither approach has been fully satisfactory.  This paper addresses
%x this dilemma.  First, we argue that the performance of kernel
%x threads is inherently worse than that of user-level threads, rather
%x than this being an artifact of existing implementations; we thus
%x argue that managing parallelism at the user level is essential to
%x high-performance parallel computing.  Next, we argue that the lack
%x of system integration exhibited by user-level threads is a
%x consequence of the lack of kernel support for user-level threads
%x provided by contemporary multiporcessor operating systems; we thus
%x argue that kernel threads or processes, as currently conceived, ar
%x the wrong abstraction on which to support user-level management of
%x parallelism.  Finally, we describe the design, imploementation, and
%x performance of a new kernel interface and user-level thread package
%x that together provide the same functionality as kernel threads
%x without compromising the performance and flexibility advantages of
%x user-level management of parallelism.
%k threads, Firefly, scheduling policy

%z InProceedings
%K Marsh91
%s golding@cis.ucsc.edu (Fri Oct 18 13:04:53 1991)
%A Brian D. Marsh
%A Michael L. Scott
%A Thomas J. LeBlanc
%A Evangelos P. Markatos
%y CSDEPT., Univ. of Rochester
%T First-class user-level threads
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 110 121
%x We describe a set of kernel mechanisms and conventions designed to
%x accord first-class status to user-level threads, allowing them to
%x be used in any reasonable way while leaving the details of their
%x implementation to user-level code.  The key features of our
%x approach are (1) shared memory for asynchronous communication
%x between the kernel and the user, (2) software interrupts for events
%x that might require action on the part of a user-level scheduler,
%x and (3) a scheduler interface convention that facilitates
%x interactions in user space between dissimilar kinds of threads.  We
%x have incorporated these mechanisms in the Psyche parallel operating
%x system, and have used them to implement several different kinds of
%x user-level threads.  We argue for our approach in terms of both
%x flexibility and performance.
%k software interrupts, scheduling, shared memory, virtual processors, Psyche

%z InProceedings
%K Draves91a
%s golding@cis.ucsc.edu (Fri Oct 18 13:10:16 1991)
%A Richard P. Draves
%A Brian N. Bershad
%A Richard F. Rashid
%A Randall W. Dean
%y School of Comp. Sci., CMU.
%T Using continuations to implement thread management and communication in operating systems
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 122 136
%x We have improved the performance of the Mach 3.0 operating system
%x by redesigning its internal thread and interprocess communication
%x facilities to use continuations as the basis for control trasfer.
%x Compared to previous versions of Mach 3.0, our new system consumes
%x 85% less space per thread.  Cross-address space remote procedure
%x calls execute 14% faster.  Exception handling runs over 60% faster.
%x In addition to improving system performance, we have used
%x continuations to generalize many control transfer optimizations
%x that are common to operating systems, and have recast those
%x optimizations in terms of a single implementation methodology.
%x This paper describes our experiences with using continuations in
%x the Mach operating system.
%k Mach

%z InProceedings
%K LaRowe91
%s golding@cis.ucsc.edu (Fri Oct 18 13:15:30 1991)
%A Richard P. LaRowe, Jr
%A Carla Schlatter Ellis
%A Laurence S. Kaplan
%T The robustness of NUMA memory management
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 137 151
%x The study of operating systems level memory management policies for
%x nonuniform memory access time (NUMA) shared memory multiprocessors
%x is an area of active research.  Previous results have suggested
%x that the best policy choice often depends on the application under
%x consideration, while others have reported that the best policy
%x depends on the particular architecture.  Since both observations
%x have merit, we explore the concept of policy tuning on an
%x application/architecture basis.
%x We introduce a highly tunable dynamic page placement policy for
%x NUMA multiprocessors, and address issues related to the tuning of
%x that policy to different architectures and applications.
%x Experimental data acquired from our DUnX operating system running
%x on two different NUMA multiprocessors are used to evaluate their
%x usefulness, importance, and ease of policy tuning.
%x Our results indicate that while varying some of the parameters can
%x have dramatic effects on performance, it is easy to select a set
%x of default parameter settings that result in good performance for
%x each of our test applications on both architectures.  This apparent
%x robustness of our parameterized policy raises the possibility of
%x machine-independent memory management for NUMA-class machines.
%k shared memory multiprocessors, page placement policies,
%k architecture independence, DUnX

%z InProceedings
%K Carter91
%s golding@cis.ucsc.edu (Fri Oct 18 21:54:01 1991)
%A John B. Carter
%A John K. Bennett
%A Willy Zwaenepoel
%y Comp. Sys. Lab., Rice Univ.
%T Implementation and performance of Munin
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 152 164
%x Munin is a distributed shared memory (DSM) system that allows
%x shared memory parallel programs to be executed efficiently on
%x distributed memory multiprocessors.  Munin is unique among existing
%x DSM systems in its use of multiple consistency protocols and in its
%x use of release consistency.  In Munin, shared program variables are
%x annotated with their expected access pattern, and these annotations
%x are then used by the runtime system to choose a consistency
%x protocol best suited to that access pattern.  Release consistency
%x allows Munin to mask network latency and reduce the number of
%x messages required to keep memory consistent.  Munin's
%x multi-protocol release consistency is implemented in software,
%x using a delayed update queue that buffers and merges pending
%x outgoing writes.  A sixteen-procssor prototype of Munin is
%x currently operational.  We evaluate its implementation and describe
%x the execution of two Munin programs that achieve performance within
%x ten percent of message passing implementations of the same
%x programs.  Munin achieves this level of performance with only minor
%x annotations to the shared memory programs.
%k multiple consistency protocols, distributed shared memory
%k program annotation, release consistency

%z InProceedings
%K Lampson91
%s golding@cis.ucsc.edu (Mon Oct 21 15:32:18 1991)
%A Butler Lampson
%A Mart\'in Abadi
%A Michael Burrows
%A Edward Wobber
%y DECSRC.
%T Authentication in distributed systems: theory and practice
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 165 182
%x We describe a theory of authentication and a system that implements
%x it.  Our theory is based on the notion of principal and a ``speaks
%x for'' relation between principals.  A simple principal either has a
%x name or is a communication channel; a compound principal can
%x express an adopted role or delegation of authority.  The theory
%x explains how to reason about a principal's authority byt deducing
%x the other prinicpals that it can speak for; authenticating a
%x channel is one important application.  We use the theory to explain
%x many existing and proposed mechanisms for security.  In particular,
%x we describe the system we have built.  It passes prinicpals
%x efficiently as arguments or results of remote procedure calls, and
%x it handles public and shared key encryption, name lookup in a large
%x name space, groups of prinicpals, loading programs, delegation,
%x access control, and revocation.
%k Kerberos, access control, revocation, booting, principals, delegation
%k speaks for 

%z InProceedings
%K Rodeheffer91
%s golding@cis.ucsc.edu (Mon Oct 21 15:37:16 1991)
%A Thomas L. Rodeheffer
%A Michael D. Schroeder
%y DECSRC.
%T Automatic reconfiguration in Autonet
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 183 197
%x Autonet is a switch-based local area network using 100 Mbit/s
%x full-duplex point-to-point links.  Crossbar switches are
%x interconnected to other switches and to host controllers in an
%x arbitrary pattern.  Switch hardware uses the destination address in
%x each packet to determine the proper outgoing link for the next step
%x in the path from source to destination.  Autonet automatically
%x recalculates these forwarding paths in response to failures and
%x additions of network components.  This automatic reconfiguration
%x allows the network to continue normal operation without need of
%x human intervention.  Reconfiguration occurs quickly enough that
%x higher-level protocols are not disrupted.  This paper describes the
%x fault monitoring and topology acquisition mechanisms that are
%x central to automatic reconfiguration in Autonet.
%k point-to-point communication, topology acquisition, cut-through routing
%k communication failure detection, fault tolerance 

%z InProceedings
%K Baker91a
%s golding@cis.ucsc.edu (Mon Oct 21 15:43:27 1991)
%A Mary G. Baker
%A John H. Hartman
%A Michael D. Kupfer
%A Ken W. Shirriff
%A John K. Ousterhout
%y UCBCS.
%T Measurements of a distributed file system
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 198 212
%x We analyzed the user-level file access patterns and caching
%x behavior of the Sprite distributed file system.  The first part of
%x our analysis repeated a study done in 1985 of the BSD Unix files
%x system.  We found that file throughput has increased by a factor of
%x 20 to an average of 8 Kbytes per second per active user over
%x 10-minute intervals, and that the use of process migration for load
%x sharing increased burst rates by another factor of six.  Also, many
%x more very large (multi-megabyte) files are in use today than in
%x 1985.  The second part of our analysis measured the behavior of
%x Sprite's main-memory file caches.  Client-level caches average
%x about 7Mbytes in size (about one-quarter to one-third of main
%x memory) and filter out about 50% of the traffic between clients and
%x servers.  35% of the remaining server traffic is caused by paging,
%x even on workstations with large memories.  We found that client
%x cache consistency is needed to prevent stale data errors, but that
%x it is not invoked often enough to degrade overall system
%x performance. 
%k Sprite, file access patterns, caches, BSD Unix file system
%k process migration

%z InProceedings
%K Kistler91a
%s golding@cis.ucsc.edu (Mon Oct 21 15:49:24 1991)
%A James J. Kistler
%A M. Satyanarayanan
%y CMU.
%T Disconnected operation in the Coda file system
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 213 225
%x Disconnected operation is a mode of operation that enables a client
%x to continue accessing critical data during temporary failures of a
%x shared data repository.  An important, though not exclusive,
%x application of disconnected operation is in supporting portable
%x computers.  In this paper, we show that disconnected operation is
%x feasible, efficient, and usable by describing its design and
%x implementation in the Coda File System.  The central idea behind
%x our work is that caching of data, ow widely used for performance,
%x can also be exploited to improve availability.
%k cached data, inconsistent replication, portable computers
%k optimistic replica control, hoarding files 

%z InProceedings
%K Liskov91a
%s golding@cis.ucsc.edu (Mon Oct 21 15:53:52 1991)
%A Barbara Liskov
%A Sanjay Ghemawat
%A Robert Gruber
%A Paul Johnson
%A Liuba Shrira
%A Michael Williams
%y MITLCS.
%T Replication in the Harp file system
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 226 238
%x This paper describes the design and implementation of the Harp file
%x system.  Harp is a replicated Unix file system accessible via the
%x VFS interface.  It provides highly available and reliable storage
%x for files and guarantees that file operations are executed
%x atomically in spite of concurrency and failures.  It uses a novel
%x variation of the primary copy replication technique that provides
%x good performance because it allows us to trade disk accesses for
%x network communication.  Harp is intended to be used within a file
%x service in a distributed network; in our current implementation, it
%x is accessed via NFS.  Preliminary performance results indicated
%x that Harp provides equal or better response time and system
%x capacity than an ureplicated implementaion of NFS that uses Unix
%x files directly.
%k primary copy replication, NFS, VFS, uninterruptable power supplies

%z InProceedings
%K Schmuck91
%s golding@cis.ucsc.edu (Mon Oct 21 16:01:42 1991)
%A Frank Schmuck
%A Jim Wyllie
%y Almaden Res. Center, IBM.
%T Experience with transactions in QuickSilver
%C Proc. 13th SOSP.
%c Asilomar, Pacific Grove, CA 
%p ACM. SIGOPS
%D 13 Oct. 1991
%P 239 253
%x All programs in the QuickSilver distributed system behave
%x atomically with respect to their updates to permanent data.
%x Operating system support for transactions provides the framework
%x required to support this, as well as a mechanism that unifies
%x reclamation of resources after failures or normal process
%x termination.  This paper evaluates the use of transactions for
%x these purposes in a general purpose operating system and presents
%x some of the lessons learned from our experience with a complete
%x running system based on transactions.  Examples of how transactions
%x are used in QuickSilver and measurements of their use demonstrate
%x that the transaction mechanism provides an efficient and powerful
%x means for solving many of the problems introduced by operating
%x system extensibility and distribution.
%k distributed operating systems, failure recovery, concurrency control