%z InProceedings
%K Burrows89
%A Michael Burrows
%A Mart{\'\i}n Abadi
%y Digital Equipment Corporation Systems Research Center, Palo Alto, CA
%A Roger Needham
%y University of Cambridge Computer Laboratory
%T A logic of authentication
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 1 13
%k security, verification, reasoning, protocols, cryptography
%k Kerberos, Andrew Secure RPC, Needham-Schroeder Public-Key, CCITT X.509
%k Otway-Rees, wide-mouthed frog (WMF), Yahalom
%x Authentication protocols are the basis of security in many
%x distributed systems, and it is therefore essential to ensure that
%x these protocols function correctly. Unfortunately, their design has
%x been extremely error-prone. Most of the protocols found in the
%x literature contain redundancies or security flaws.
%x A simple logic has allowed us to describe the beliefs of
%x trustworthy parties involved in authentication protocols and the
%x evolution of these beliefs as a consequence of communication. We
%x have been able to explain a variety of authentication protocols
%x formally, to discover subtleties and errors in them, and to suggest
%x improvements. In this paper, we present the logic and then give the
%x results of our analysis of four published protocols, chosen either
%x because of their practical importance or because they serve to
%x illustrate our method.
%z InProceedings
%K Lomas89
%A T. Mark A. Lomas
%A Li Gong
%y University of Cambridge Computer Laboratory
%A Jerome H. Saltzer
%y Massachusetts Institute of Technology, Cambridge, MA
%A Roger M. Needham
%y University of Cambridge Computer Laboratory
%T Reducing risks from poorly chosen keys
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 14 18
%k security, authentication protocols, passwords, verifiable plaintext
%x It is well known that, left to themselves, people will choose
%x passwords that can be rather easily guessed. If this is done, they
%x are usually vulnerable to an attack based on copying the content of
%x the messages forming part of an authentication protocol and
%x experimenting, e.g.~with a dictionary, offline. The most usual
%x counter to this threat is to require people to use passwords which
%x are obscure, or even to insist on the system choosing their
%x passwords for them. In this paper we show alternatively how to
%x construct an authentication protocol in which offline
%x experimentation is impracticable; any attack based on experiment
%x must involve the real authentication server and is thus open to
%x detection by the server noticing multiple attempts.
%z InProceedings
%K Bolosky89
%A William J. Bolosky
%y Department of Computer Science, University of Rochester
%A Robert P. Fitzgerald
%y IBM Thomas J. Watson Research Center
%A Michael L. Scott
%y Department of Computer Science, University of Rochester
%T Simple but effective techniques for NUMA memory management
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 19 31
%k non-uniform memory access, MP, multiprocessor, Mach, IBM ACE
%k performance measurements, false sharing, page placement
%x Multiprocessors with non-uniform memory access times introduce the
%x problem of placing data near the processes that use them, in order
%x to improve performance. We have implemented an automatic page
%x placement strategy in the Mach operating system on the IBM ACE
%x multiprocessor workstation. Our experience indicates that even very
%x simple automatic strategies can produce nearly optimal page
%x placement. It also suggests that the greatest leverage for further
%x performance improvement lies in reducing {\em false sharing}, which
%x occurs when the same page contains objects that would best be
%x placed in different memories.
%z InProceedings
%K Cox89
%A Alan L. Cox
%A Robert J. Fowler
%y Department of Computer Science, University of Rochester
%T The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with PLATINUM
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 32 44
%k MP memory management, NUMA, non-uniform memory access, coherent memory
%k BBN Butterfly Plus, OS, Mach, Cmap data structure, page replication policy
%k page placement policy
%x PLATINUM is an operating system kernel with a novel memory
%x management system for {\em Non-Uniform Memory Access} (NUMA)
%x multiprocessor architectures. This memory management system
%x implements a {\em coherent memory} abstraction. Coherent memory is
%x uniformly accessible from all processors in the system. When used
%x by applications coded with appropriate programming styles it
%x appears to be nearly as fast as local physical memory and it
%x reduces memory contention. Coherent memory makes programming NUMA
%x multiprocessors easier for the user while attaining a level of
%x performance comparable with hand-tuned programs.
%x This paper describes the design and implementation of the PLATINUM
%x memory management system, emphasizing the coherent memory. We
%x measure the cost of basic operations implementing the coherent
%x memory. We also measure the performance of a set of application
%x programs running on PLATINUM. Finally, we comment on the
%x interaction between architecture and the coherent memory system.
%x PLATINUM currently runs on the BBN Butterfly Plus(TM) Multiprocessor.
%z InProceedings
%K Srinivasan89
%A V. Srinivasan
%y University of Wisconsin
%A Jeffrey C. Mogul
%y Digital Equipment Corporation Western Research Laboratory, Palo Alto, CA
%T Spritely NFS: experiments with cache-consistency protocols
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 45 57
%k distributed file systems, file servers, file caching, Sprite
%k cache consistency protocols, performance, NFS
%x File caching is essential to good performance in a distributed
%x system, especially as processor speeds and memory sizes continue to
%x improve rapidly while disk latencies do not. Stateless-server
%x systems, such as NFS, cannot properly manage client file caches.
%x Stateful systems, such as Sprite, can use explicit cache
%x consistency protocols to improve both cache consistency and overall
%x performance.
%x By modifying NFS to use the Sprite cache consistency protocols, we
%x isolate the effects of the consistency mechanism from the other
%x features of Sprite. We find dramatic improvements on some, although
%x not all, benchmarks, suggesting that an explicit cache consistency
%x protocol is necessary for both correctness and good performance.
%z InProceedings
%K Edwards89
%A David A. Edwards
%A Martin S. McKendry
%y FileNet Corporation
%T Exploiting read-mostly workloads in the FileNet file system
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 58 70
%k distributed file systems, file servers, file caching
%k cache consistency protocols, performance, FileNet
%x Most recent studies of file system workloads have focused on loads
%x imposed by general computing. This paper introduces a
%x significantly different workload imposed by a {\em distributed
%x application system}. The FileNet system is a distributed
%x application system that supports document image processing. The
%x FileNet file system was designed to support the workload imposed by
%x this application. To characterize the {\em read-mostly} workload
%x applied to the file system and how it differs from general
%x computing environments, we present statistics gathered from live
%x production installations. We contrast these statistics with
%x previously published data for more general computing.
%x We describe the key algorithms of the file system focusing on the
%x caching approach. A {\em bimodal} client caching approach is
%x employed, to match the file modification patterns observed.
%x Different cache consistency algorithms are used depending on usage
%x patterns observed for each file. Under most conditions, files
%x cached at workstations can be accessed without contacting servers.
%x When a file is subject to frequent modification that causes
%x excessive cache consistency traffic, caching is disabled for that
%x file, and servers participate in all open and close activities.
%x The data from production sites is examined to evaluate the success
%x of the approach under its applied load. Contrasts with alternative
%x approaches are made based on this data.
%z InProceedings
%K Braunstein89
%A Andrew Braunstein
%A Mark Riley
%A John Wilkes
%y Hewlett-Packard Company
%T Improving the efficiency of UNIX file buffer caches
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 71 82
%k file caching, XMF, HP-UX, file buffer caches, mapped files, performance
%k TLB miss rates, experimental method, virtual memory hardware
%x This paper reports on the effects of using hardware virtual memory
%x assists in managing file buffer caches in UNIX. A controlled
%x experimental environment was constructed from two systems whose
%x only difference was that one of them (XMF) used the virtual memory
%x hardware to assist file buffer cache search and retrieval. An
%x extensive series of performance characterizations was used to study
%x the effects of varying the buffer cache size (from 3 Megabytes to
%x 70 MB); I/O transfer sizes (from 4 bytes to 64 KB); cache-resident
%x and non-cache-resident data; READs and WRITEs; and a range of
%x application programs.
%x The results: small READ/WRITE transfers from the cache ($\leq$1 KB)
%x were 50\% faster under XMF, while larger transfers ($\geq$8 KB)
%x were 20\% faster. Retrieving data from disk, the XMF improvement
%x was 25\% and 10\% respectively, although OPEN/CLOSE system calls
%x took slightly longer in XMF. Some individual programs ran as much
%x as 40\% faster on XMF, while an application benchmark suite showed
%x a 7--15\% improvement in overall execution time. Perhaps
%x surprisingly, XMF had fewer translation lookaside buffer misses.
%z InProceedings
%K Schroeder89
%A Michael D. Schroeder
%A Michael Burrows
%y Digital Equipment Corporation Systems Research Center, Palo Alto, CA
%T Performance of Firefly RPC
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 83 90
%k remote procedure call, DEC, VAX, nub, UDP, IP, rpc stub compilers
%k argument marshalling, MP, multiprocessor
%x In this paper, we report on the performance of the remote procedure
%x call implementation for the Firefly multiprocessor and analyze the
%x implementation to account precisely for all measured latency. From
%x the analysis and measurements, we estimate how much faster RPC
%x could be if certain improvements were made.
%x The elapsed time for an inter-machine call to a remote procedure
%x that accepts no arguments and produces no results is 2.66
%x milliseconds. The elapsed time for an RPC that has a single
%x 1440-byte result (the maximum result that will fit in a single
%x packet) is 6.35 milliseconds. Maximum inter-machine throughput of
%x application program data using RPC is 4.65 megabits/second,
%x achieved with 4 threads making parallel RPCs that return the
%x maximum sized result that fits in a single RPC result packet. CPU
%x utilization at maximum throughput is about 1.2 CPU seconds per
%x second on the calling machine and a little less on the server.
%x These measurements are for RPCs from user space on one machine to
%x user space on another, using the installed system and a 10
%x megabit/second Ethernet. The RPC packet exchange protocol is built
%x on IP/UDP, and the times include calculating and verifying UDP
%x checksums. The Fireflies used in the tests had 5 MicroVAX II
%x processors and a DEQNA Ethernet controller.
%z InProceedings
%K Hutchinson89
%A Norman C. Hutchinson
%A Larry L. Peterson
%A Mark B. Abbott
%A Sean O'Malley
%y Department of Computer Science, University of Arizona
%T RPC in the {\it x}-kernel: evaluating new design techniques
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 91 101
%k remote procedure call, performance, protocol design and implementation
%k virtual protocols, layered protocols, OS, Sprite protocols
%k uniform protocol interface
%o Source code available via anonymous ftp from cs.arizona.edu, file
%o xkernel/xkernel.tar.Z.
%x This paper reports our experiences implementing remote procedure
%x call (RPC) protocols in the {\it x}-kernel. This exercise is interesting
%x because the RPC protocols exploit two novel design techniques:
%x {\em virtual protocols} and {\em layered protocols}. These techniques
%x are made possible because the {\it x}-kernel provides an object-oriented
%x infrastructure that supports three significant features: a uniform
%x interface to all protocols, a late binding between protocol layers,
%x and a small overhead for invoking any given protocol layer. For
%x each design technique, the paper motivates the technique with a
%x concrete example, describes how it is applied to the implementation
%x of RPC protocols, and presents the results of experiments designed
%x to evaluate the technique.
%z InProceedings
%K Bershad89
%A Brian N. Bershad
%A Thomas E. Anderson
%A Edward D. Lazowska
%A Henry M. Levy
%y Department of Computer Science and Engineering, University of Washington
%T Lightweight remote procedure call
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 102 113
%k remote procedure call, rpc, LRPC, performance, dispatching
%k fine-grained protection domains, domain crossing
%k Taos OS, DEC Firefly, MicroVAX multiprocessor, MP
%x Lightweight Remote Procedure Call (LRPC) is a communication
%x facility designed and optimized for communication between
%x protection domains on the same machine.
%x In contemporary small-kernel operating systems, existing RPC
%x systems incur an unnecessarily high cost when used for the type of
%x communication that predominates --- between protection domains on
%x the same machine. This cost leads system designers to coalesce
%x weakly-related subsystems into the same protection domain, trading
%x safety for performance. By reducing the overhead of same-machine
%x communication, LRPC encourages both safety and performance.
%x LRPC combines the control transfer and communication model of
%x capability systems with the programming semantics and large-grained
%x protection model of RPC. LRPC achieves a factor of three
%x performance improvement over more traditional approaches based on
%x independent threads exchanging messages, reducing the cost of
%x same-machine communication to nearly the lower bound imposed by
%x conventional hardware.
%x LRPC has been integrated into the Taos operating system of the DEC
%x SRC Firefly multiprocessor workstation.
%z InProceedings
%K Weiser89
%A Mark Weiser
%A Alan Demers
%A Carl Hauser
%y Xerox Palo Alto Research Center, CA
%T The Portable Common Runtime approach to interoperability
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 114 122
%k structural issues, PCR, garbage collection, threads, IO
%k incremental loader, symbol binding
%k C, CommonLisp, Cedar, language runtime system
%o Source code available via anonymous ftp from arisia.xerox.com,
%o directory pub, file pcr.tar.Z.
%x Operating system abstractions do not always reach high enough for
%x direct use by a language or applications designer. The gap is
%x filled by language-specific runtime environments, which become more
%x complex for richer languages (CommonLisp needs more than C++, which
%x needs more than C). But language-specific environments inhibit
%x integrated multi-lingual programming, and also make porting hard
%x (for instance, because of operating system dependencies). To help
%x solve these problems, we have built the Portable Common Runtime
%x (PCR), a language-independent and operating-system-independent base
%x for modern languages. PCR offers four interrelated facilities:
%x storage management (including universal garbage collection), symbol
%x binding (including static and dynamic linking and loading), threads
%x (lightweight processes), and low-level I/O (including network
%x sockets). PCR is ``common'' because these facilities simultaneously
%x support programs in several languages. PCR supports C, Cedar,
%x Scheme, and CommonLisp intercalling and runs pre-existing C and
%x CommonLisp (Kyoto) binaries. PCR is ``portable'' because it uses only
%x a small set of operating system features. The PCR source code is
%x available for use by other researchers and developers.
%z InProceedings
%K Abrossimov89
%A Vadim Abrossimov
%A Marc Rozier
%y Chorus syst\`emes
%A Marc Shapiro
%y INRIA
%T Generic virtual memory management for operating system kernels
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 123 136
%k structural issues, GMI, PVM, OS, virtual memory, VM, IO, Chorus Nucleus
%k context, region, segment, Chorus/MIX, performance
%x We discuss the rationale and design of a Generic Memory management
%x Interface, for a family of scalable operating systems. It consists
%x of a general interface for managing virtual memory, independently
%x of the underlying hardware architecture (e.g. paged versus
%x segmented memory), and independently of the operating system kernel
%x in which it is to be integrated. In particular, this interface
%x provides abstractions for support of a single, consistent cache for
%x both mapped objects and explicit I/O, and control of data caching
%x in real memory. Data management policies are delegated to external
%x managers.
%x A portable implementation of the Generic Memory management
%x Interface for paged architectures, the Paged Virtual Memory
%x manager, is detailed. The PVM uses the novel history object
%x technique for efficient deferred copying. The GMI is used by the
%x Chorus Nucleus, in particular to support a distributed version of
%x Unix. Performance measurements compare favorably with other
%x systems.
%z InProceedings
%K Rosenburg89
%A Brian S. Rosenburg
%y IBM Thomas J. Watson Research Center
%T Low-synchronization translation lookaside buffer consistency in large-scale shared-memory multiprocessors
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 137 146
%k multiprocessors, MP, TLB consistency, coherence, Mach, IBM RP3, performance
%x Operating systems for most current shared-memory multiprocessors
%x must maintain translation lookaside buffer (TLB) consistency across
%x processors. A processor that changes a shared page table must
%x flush outdated mapping information from its own TLB, and it must
%x force the other processors using the page table to do so as well.
%x Published algorithms for maintaining TLB consistency on some
%x popular commercial multiprocessors incur excessively high
%x synchronization costs. We present an efficient TLB consistency
%x algorithm that can be implemented on multiprocessors that include a
%x small set of reasonable architectural features. This algorithm has
%x been incorporated in a version of the MACH operating system
%x developed for the IBM Research Parallel Processor prototype (RP3).
%z InProceedings
%K Chase89
%A Jeffrey S. Chase
%A Franz G. Amador
%A Edward D. Lazowska
%A Henry M. Levy
%A Richard J. Littlefield
%y Department of Computer Science and Engineering, University of Washington
%T The Amber system: parallel programming on a network of multiprocessors
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 147 158
%k multiprocessors, MP, distributed object-based language, rpc
%k distributed object memory, object migration
%x This paper describes a programming system called Amber that permits
%x a single application program to use a homogeneous network of
%x computers in a uniform way, making the network appear to the
%x application as an integrated multiprocessor. Amber is specifically
%x designed for high performance in the case where each node in the
%x network is a shared-memory multiprocessor.
%x Amber shows that support for loosely-coupled multiprocessing can be
%x efficiently realized using an object-based programming model.
%x Amber programs execute in a uniform network-wide object space, with
%x memory coherence maintained at the object level. Careful data
%x placement and consistency control are essential for reducing
%x communication overhead in a loosely-coupled system. Amber
%x programmers use object migration primitives to control the location
%x of data and processing.
%z InProceedings
%K Tucker89
%A Andrew Tucker
%A Anoop Gupta
%y Department of Computer Science, Stanford University
%T Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 159 166
%k multiprocessors, MP, gang scheduling, performance
%x Shared-memory multiprocessors are frequently used in a time-sharing
%x style with multiple parallel applications executing at the same
%x time. In such an environment, where the machine load is
%x continuously varying, the question arises of how an application
%x should maximize its performance while being fair to other users of
%x the system. In this paper, we address this issue. We first show
%x that if the number of runnable processes belonging to a parallel
%x application significantly exceeds the effective number of physical
%x processors executing it, its performance can be significantly
%x degraded. We then propose a way of controlling the number of
%x runnable processes associated with an application dynamically, to
%x ensure good performance. The optimal number of runnable processes
%x for each application is determined by a centralized server and
%x applications dynamically suspend or resume processes in order to
%x match that number. A preliminary implementation of the proposed
%x scheme is now running on the Encore Multimax and we show how it
%x helps improve the performance of several applications. In some
%x cases the improvement is more than a factor of two. We also discuss
%x implications of the proposed scheme for multiprocessor schedulers,
%x and how the scheme should interface with parallel programming
%x languages.
%z InProceedings
%K Barkley89
%A R. E. Barkley
%y AT\&T Bell Laboratories, Summit, NJ
%A T. Paul Lee
%y AT\&T Bell Laboratories, Holmdel, NJ
%T A lazy buddy system bounded by two coalescing delays per class
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 167 176
%k performance, dynamic memory management, DELAY-2
%x The watermark-based lazy buddy system for dynamic memory management
%x uses lazy coalescing rules controlled by watermark parameters to
%x achieve low operational costs. The correctness of the
%x watermark-based lazy buddy system is shown by defining a space of
%x legal states called the lazy space and proving that the
%x watermark-based lazy coalescing rules always keep the memory state
%x within that space. In this paper we describe a different lazy
%x coalescing policy, called the DELAY-2 algorithm, that focuses
%x directly on keeping the memory state within the lazy space. The
%x resulting implementation is simpler, and experimental data shows it
%x to be up to 12\% faster than the watermark-based buddy system and
%x about 33\% faster than the standard buddy system. Inexpensive
%x operations make the DELAY-2 algorithm attractive as a memory
%x manager for an operating system.
%x The watermark-based lazy buddy policy offers fine control over the
%x coalescing policy of the buddy system. However, applications such
%x as the UNIX System kernel memory manager do not need such fine
%x control. For these applications, the DELAY-2 buddy system provides
%x an efficient memory manager with low operational costs and low
%x request blocking probability. In the DELAY-2 buddy system, the
%x worst-case time for a free operation is bounded by two coalescing
%x delays per class, and when all blocks are returned to the system,
%x the system memory is coalesced back to its original state. This
%x ensures that the memory space can be completely shared.
%z InProceedings
%K Duchamp89
%A Dan Duchamp
%y Computer Science Department, Columbia University
%T Analysis of transaction management performance
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 177 190
%k performance, dbms, database, Camelot, Mach, non-blocking commit protocols
%k logging, multicasting
%x There is currently much interest in incorporating transactions into
%x both operating systems and general-purpose programming languages.
%x This paper provides a detailed examination of the design and
%x performance of the transaction manager of the Camelot system.
%x Camelot is a transaction facility that provides a rich model of
%x transactions intended to support a wide variety of general-purpose
%x applications. The transaction manager's principal function is to
%x execute the protocols that ensure atomicity.
%x The conclusions of this study are: a simple optimization to
%x two-phase commit reduces logging activity of distributed
%x transactions; non-blocking commit is practical for some
%x applications; multithreaded design improves throughput provided
%x that log batching is used; multicasting reduces the variance of
%x distributed commit protocols in a LAN environment; and the
%x performance of transaction mechanisms such as Camelot depends
%x heavily upon kernel performance.
%z InProceedings
%K Massalin89
%A Henry Massalin
%A Calton Pu
%y Department of Computer Science, Columbia University
%T Threads and input/output in the Synthesis kernel
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 191 201
%k performance, OS kernel, dynamic code generation, code synthesis
%k fine-grain scheduling, optimistic synchronization, fast context switch
%k fast interrupt handling, music, digital signal processing, compact disc
%k Qua, Quamachine
%x The Synthesis operating system kernel combines several techniques
%x to provide high performance, including kernel code synthesis,
%x fine-grain scheduling, and optimistic synchronization. Kernel code
%x synthesis reduces the execution path for frequently used kernel
%x calls. Optimistic synchronization increases concurrency within the
%x kernel. Their combination results in significant performance
%x improvement over traditional operating system implementations.
%x Using hardware and software emulating a SUN 3/160 running SUNOS,
%x Synthesis achieves several times to several dozen times speedup for
%x UNIX kernel calls and context switch times of 21 microseconds or
%x faster.
%z InProceedings
%K Gray89
%A Cary G. Gray
%A David R. Cheriton
%y Computer Science Department, Stanford University
%T Leases: an efficient fault-tolerant mechanism for distributed file cache consistency
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 202 210
%k time-based distributed coherency, distributed file systems, V
%k performance, lease
%x Caching introduces the overhead and complexity of ensuring
%x consistency, reducing some of its performance benefits. In a
%x distributed system, caching must deal with the additional
%x complications of communication and host failures.
%x {\em Leases} are proposed as a time-based mechanism that provides
%x efficient consistent access to cached data in distributed systems.
%x Non-Byzantine failures affect performance, not correctness, with
%x their effect minimized by short leases. An analytic model and an
%x evaluation for file access in the V system show that leases of
%x short duration provide good performance. The impact of leases on
%x performance grows more significant in systems of larger scale and
%x higher processor performance.
%z InProceedings
%K Fleisch89
%A Brett D. Fleisch
%A Gerald J. Popek
%y University of California, Los Angeles
%T Mirage: a coherent distributed shared memory design
%C Proceedings of the 12th ACM Symposium on Operating System Principles
%c Litchfield Park, AZ, 3--6 December 1989
%J Operating Systems Review
%V 23
%N 5
%D December 1989
%P 211 223
%k time-based distributed coherency, DSM, page coherence, cache consistency
%x Shared memory is an effective and efficient paradigm for
%x interprocess communication. We are concerned with software that
%x makes use of shared memory in a single site system and its
%x extension to a multimachine environment. Here we describe the
%x design of a {\em distributed shared memory} (DSM) system called Mirage
%x developed at UCLA. Mirage provides a form of network transparency
%x to make network boundaries invisible for shared memory and is
%x upward compatible with an existing interface to shared memory. We
%x present the rationale behind our design decisions and important
%x details of the implementation. Mirage's basic performance is
%x examined by component timings, a worst case application, and a
%x ``representative'' application. In some instances of page contention,
%x the tuning parameter in our design improves application throughput.
%x In other cases, thrashing is reduced and overall system performance
%x improved using our tuning parameter.