From 0c004859f5e55004e2409113d937d2f14fe8e2e0 Mon Sep 17 00:00:00 2001
From: Reid Spencer Warning: This document is a work in progress. This document describes the requirements, design, and implementation
- details of LLVM's System Library. The library is composed of the header files
- in llvm/include/llvm/System and the source files in
- llvm/lib/System. The goal of this library is to completely shield
- LLVM from the variations in operating system interfaces. By centralizing
- LLVM's use of operating system interfaces, we make it possible for the LLVM
- tool chain and runtime libraries to be more easily ported to new platforms
- since (theoretically) only llvm/lib/System needs to be ported. This
- library also unclutters the rest of LLVM from #ifdef use and special
- cases for specific operating systems. Such uses are replaced with simple calls
- to the interfaces provided in llvm/include/llvm/System. This document provides some details on LLVM's System Library, located in
+ the source at lib/System and include/llvm/System. The
+ library's purpose is to shield LLVM from the differences between operating
+ systems for the few services LLVM needs from the operating system. Much of
+ LLVM is written using portability features of standard C++. However, in a few
+ areas, system dependent facilities are needed and the System Library is the
+ wrapper around those system calls. By centralizing LLVM's use of operating system interfaces, we make it
+ possible for the LLVM tool chain and runtime libraries to be more easily
+ ported to new platforms since (theoretically) only lib/System needs
+ to be ported. This library also unclutters the rest of LLVM from #ifdef use
+ and special cases for specific operating systems. Such uses are replaced
+ with simple calls to the interfaces provided in include/llvm/System.
+ Note that the System Library is not intended to be a complete operating
+ system wrapper (such as the Adaptive Communications Environment (ACE) or
+ Apache Portable Runtime (APR)), but only provides the functionality necessary
+ to support LLVM.
The System Library was written by Reid Spencer who formulated the
- design based on similar original work as part of the eXtensible Programming
- System (XPS). The System library's requirements are aimed at shielding LLVM from the
- variations in operating system interfaces. The following sections define the
- requirements needed to fulfill this objective. Of necessity, these requirements
- must be strictly followed in order to ensure the library's goal is reached. In order to keep LLVM portable, LLVM developers should adhere to a set of
+ portability rules associated with the System Library. Adherence to these rules
+ should help the System Library achieve its goal of shielding LLVM from the
+ variations in operating system interfaces and doing so efficiently. The
+ following sections define the rules needed to fulfill this objective. The library must shield LLVM from all system libraries. To obtain
- system level functionality, LLVM must #include "llvm/System/Thing.h"
- and nothing else. This means that Thing.h cannot expose any system
- header files. This protects LLVM from accidentally using system specific
- functionality except through the lib/System interface. Specifically this
- means that header files like "unistd.h", "windows.h", "stdio.h", and
- "string.h" are verbotten outside the implementation of lib/System.
+ Except in lib/System, no LLVM source code should directly
+ #include a system header. Care has been taken to remove all such
+ #includes from LLVM while lib/System was being
+ developed. Specifically this means that header files like "unistd.h",
+ "windows.h", "stdio.h", and "string.h" are forbidden to be included by LLVM
+ source code outside the implementation of lib/System. To obtain system-dependent functionality, existing interfaces to the system
+ found in include/llvm/System should be used. If an appropriate
+ interface is not available, it should be added to include/llvm/System
+ and implemented in lib/System for all supported platforms. The System Library must shield LLVM from all system headers. To
+ obtain system level functionality, LLVM source must
+ #include "llvm/System/Thing.h" and nothing else. This means that
+ Thing.h cannot expose any system header files. This protects LLVM
+ from accidentally using system specific functionality and only allows it
+ via the lib/System interface. The standard C headers (the ones beginning with "c") are allowed
+ to be exposed through the lib/System interface. These headers and
+ the things they declare are considered to be platform agnostic. LLVM source
+ files may include them directly or obtain their inclusion through
+ lib/System interfaces. The standard C++ headers from the standard C++ library and
+ standard template library may be exposed through the lib/System
+ interface. These headers and the things they declare are considered to be
+ platform agnostic. LLVM source files may include them or obtain their
+ inclusion through lib/System interfaces. The entry points specified in the interface of lib/System must be aimed at
+ completing some reasonably high level task needed by LLVM. We do not want to
+ simply wrap each operating system call. It would be preferable to wrap several
+ operating system calls that are always used in conjunction with one another by
+ LLVM. For example, consider what is needed to execute a program, wait for it to
+ complete, and return its result code. On Unix, this involves the following
+ operating system calls: getenv, fork, execve, and wait. The
+ correct thing for lib/System to provide is a function, say
+ ExecuteProgramAndWait, that implements the functionality completely.
+ What we don't want is wrappers for the operating system calls involved. There must not be a one-to-one relationship between operating
+ system calls and the System library's interface. Any such interface function
+ will be suspicious. There must be no functionality specified in the interface of lib/System
+ that isn't actually used by LLVM. We're not writing a general purpose
+ operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM
+ doesn't need much. This design goal aims to keep the lib/System interface
+ small and understandable which should foster its actual use and adoption. The implementation of a function for a given platform must be written
+ exactly once. This implies that it must be possible to apply a function's
+ implementation to multiple operating systems if those operating systems can
+ share the same implementation. This rule applies to the set of operating
+ systems supported for a given class of operating system (e.g. Unix, Win32).
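The ExecuteProgramAndWait example mentioned above could be sketched for the Unix class roughly as follows. This is only an illustration, not the actual lib/System code: the signature is an assumption, `execvp` is used so that the PATH lookup replaces the explicit `getenv` plus `execve` pair, and `waitpid` stands in for `wait`. Hard errors are thrown as a std::string, in line with the error handling rule described in this document.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sketch (Unix only): run Program with Args, wait for it to
// finish, and return its exit code. The name comes from the text above; the
// signature is an assumption made for illustration.
int ExecuteProgramAndWait(const std::string &Program,
                          const std::vector<std::string> &Args) {
  // Build the argv array: program name, arguments, terminating null.
  std::vector<char *> Argv;
  Argv.push_back(const_cast<char *>(Program.c_str()));
  for (std::size_t i = 0; i != Args.size(); ++i)
    Argv.push_back(const_cast<char *>(Args[i].c_str()));
  Argv.push_back(0);

  pid_t Child = fork();
  if (Child < 0)
    throw std::string("fork failed: ") + std::strerror(errno);

  if (Child == 0) {
    // In the child: replace the process image, searching PATH for Program.
    execvp(Program.c_str(), &Argv[0]);
    _exit(127); // Only reached if execvp failed.
  }

  // In the parent: wait for the child, retrying if a signal interrupts us.
  int Status = 0;
  while (waitpid(Child, &Status, 0) < 0) {
    if (errno != EINTR)
      throw std::string("waitpid failed: ") + std::strerror(errno);
  }

  if (WIFEXITED(Status))
    return WEXITSTATUS(Status);
  throw std::string("program terminated abnormally: ") + Program;
}
```

The caller sees one function for the whole task, and none of `fork`, `execvp`, or `waitpid` leaks into the interface.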
The standard C headers (the ones beginning with "c") are allowed
- to be exposed through the lib/System interface. These headers and the things
- they declare are considered to be platform agnostic. LLVM source files may
- include them or obtain their inclusion through lib/System interfaces. The standard C++ headers from the standard C++ library and
- standard template library are allowed to be exposed through the lib/System
- interface. These headers and the things they declare are considered to be
- platform agnostic. LLVM source files may include them or obtain their
- inclusion through lib/System interfaces. The System Library interfaces can be called quite frequently by LLVM. In
+ order to make those calls as efficient as possible, we discourage the use of
+ virtual methods. There is no need to use inheritance for implementation
+ differences; it just adds complexity. The #include mechanism works
+ just fine.
@@ -52,68 +39,140 @@
-
For example, the stat system call is notorious for having - variations in the data it provides. lib/System must not declare stat - nor allow it to be declared. Instead it should provide its own interface to - discovering information about files and directories. Those interfaces may be - implemented in terms of stat but that is strictly an implementation - detail.
+ variations in the data it provides. lib/System must not declare + stat nor allow it to be declared. Instead it should provide its own + interface to discovering information about files and directories. Those + interfaces may be implemented in terms of stat but that is strictly + an implementation detail. The interface provided by the System Library must + be implemented on all platforms (even those without stat). @@ -140,6 +200,45 @@ of data that might not exist on all platforms. + + +Operating system interfaces will generally provide error results for every + little thing that could go wrong. In almost all cases, you can divide these + error results into two groups: normal/good/soft and abnormal/bad/hard. That + is, some of the errors are simply information like "file not found", + "insufficient privileges", etc. while other errors are much harder like + "out of space", "bad disk sector", or "system call interrupted". We'll call + the first group "soft" errors and the second group "hard" + errors.
+
lib/System must always attempt to minimize soft errors and always just + throw a std::string on hard errors. This is a design requirement because the + minimization of soft errors can affect the granularity and the nature of the + interface. In general, if you find that you're wanting to throw soft errors, + you must review the granularity of the interface because it is likely you're + trying to implement something that is too low level. The rule of thumb is to + provide interface functions that can't fail, except when faced with + hard errors.
+For a trivial example, suppose we wanted to add an "OpenFileForWriting" + function. For many operating systems, if the file doesn't exist, attempting + to open the file will produce an error. However, lib/System should not + simply throw that error if it occurs because it's a soft error. The problem + is that the interface function, OpenFileForWriting, is too low level. It should + be OpenOrCreateFileForWriting. In the case of the soft "doesn't exist" error, + this function would just create it and then open it for writing.
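A minimal sketch of the OpenOrCreateFileForWriting idea, written with standard C++ streams rather than system calls so that it stays self-contained. The return type and the choice of append mode (to avoid truncating an existing file) are assumptions made for illustration. For simplicity, any failure that remains after the "doesn't exist" case has been absorbed is treated as a hard error and thrown as a std::string.

```cpp
#include <fstream>
#include <string>

// Hypothetical sketch: the name comes from the text above, the signature
// is assumed. A missing file is not reported to the caller; it is created.
std::ofstream OpenOrCreateFileForWriting(const std::string &Path) {
  // std::ios::app creates the file if it does not exist and keeps the
  // existing contents if it does, so the soft error never surfaces.
  std::ofstream Out(Path.c_str(), std::ios::out | std::ios::app);
  if (!Out.is_open())
    throw std::string("cannot open or create '") + Path + "' for writing";
  return Out;
}
```

Callers then need no "does the file exist?" branch of their own, which is the point of the rule.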
+This design principle needs to be maintained in lib/System because it + avoids the propagation of soft error handling throughout the rest of LLVM. + Hard errors will generally just cause a termination for an LLVM tool so don't + be bashful about throwing them.
+Rules of thumb:
+The implementation of a function for a given platform must be written - exactly once. This implies that it must be possible to apply a function's - implementation to multiple operating systems if those operating systems can - share the same implementation.
-In order to fulfill the requirements of the system library, strict design - objectives must be maintained in the library as it evolves. The goal here - is to provide interfaces to operating system concepts (files, memory maps, - sockets, signals, locking, etc) efficiently and in such a way that the - remainder of LLVM is completely operating system agnostic.
+Implementations of the System Library interface are separated by their + general class of operating system. Currently only Unix and Win32 classes are + defined but more could be added for other operating system classifications. + To distinguish which implementation to compile, the code in lib/System uses + the LLVM_ON_UNIX and LLVM_ON_WIN32 #defines provided via configure through the + llvm/Config/config.h file. Each source file in lib/System, after implementing + the generic (operating system independent) functionality, needs to include the + correct implementation using a set of #if defined(LLVM_ON_XYZ) + directives. For example, if we had lib/System/File.cpp, we'd expect to see in + that file:
+
+    #if defined(LLVM_ON_UNIX)
+    #include "Unix/File.cpp"
+    #endif
+    #if defined(LLVM_ON_WIN32)
+    #include "Win32/File.cpp"
+    #endif
+
The implementation in lib/System/Unix/File.cpp should handle all Unix + variants. The implementation in lib/System/Win32/File.cpp should handle all + Win32 variants. What this does is quickly differentiate the basic class of + operating system that will provide the implementation. The specific details + for a given platform must still be determined through the use of + #ifdef.
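The separation described above can be condensed into a single file for illustration. In this sketch the namespace, the function name, and the derivation of LLVM_ON_UNIX and LLVM_ON_WIN32 from `_WIN32` are all assumptions. In the real tree the macros come from llvm/Config/config.h, and each platform section would live in its own file under Unix/ or Win32/. Note that the interface is an ordinary non-virtual function, and the implementation is selected entirely at compile time.

```cpp
// Stand-in for llvm/Config/config.h (assumption: derive the class of
// operating system from the compiler's predefined _WIN32 macro).
#if defined(_WIN32)
#define LLVM_ON_WIN32 1
#else
#define LLVM_ON_UNIX 1
#endif

// Interface, as it might appear in a header under include/llvm/System.
// The namespace and function are hypothetical.
namespace sys_sketch {
// Returns the character that separates path components.
char PathSeparator();
}

// Generic (operating system independent) code would go here.

// Platform section. In lib/System this would be:
//   #include "Unix/Path.cpp"  or  #include "Win32/Path.cpp"
#if defined(LLVM_ON_UNIX)
char sys_sketch::PathSeparator() { return '/'; }
#endif
#if defined(LLVM_ON_WIN32)
char sys_sketch::PathSeparator() { return '\\'; }
#endif
```

Exactly one definition is compiled for a given platform, so no virtual dispatch or inheritance is involved.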
There must be no functionality specified in the interface of lib/System - that isn't actually used by LLVM. We're not writing a general purpose - operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM - doesn't need much. This design goal aims to keep the lib/System interface - small and understandable which should foster its actual use and adoption.
-The entry points specified in the interface of lib/System must be aimed at - completing some reasonably high level task needed by LLVM. We do not want to - simply wrap each operating system call. It would be preferable to wrap several - operating system calls that are always used in conjunction with one another by - LLVM.
-For example, consider what is needed to execute a program, wait for it to - complete, and return its result code. On Unix, this involves the following - operating system calls: getenv, fork, execve, and wait. The - correct thing for lib/System to provide is a function, say - ExecuteProgramAndWait, that implements the functionality completely. - what we don't want is wrappers for the operating system calls involved.
-There must not be a one-to-one relationship between operating - system calls and the System library's interface. Any such interface function - will be suspicious.
-Operating system interfaces will generally provide errors results for every - little thing that could go wrong. In almost all cases, you can divide these - error results into two groups: normal/good/soft and abnormal/bad/hard. That - is, some of the errors are simply information like "file not found", - "insufficient privileges", etc. while other errors are much harder like - "out of space", "bad disk sector", or "system call interrupted". Well call the - first group "soft" errors and the second group "hard" errors.
-
lib/System must always attempt to minimize soft errors and always just - throw a std::string on hard errors. This is a design requirement because the - minimization of soft errors can affect the granularity and the nature of the - interface. In general, if you find that you're wanting to throw soft errors, - you must review the granularity of the interface because it is likely you're - trying to implement something that is too low level. The rule of thumb is to - provide interface functions that "can't" fail, except when faced with hard - errors.
-For a trivial example, suppose we wanted to add an "OpenFileForWriting" - function. For many operating systems, if the file doesn't exist, attempting - to open the file will produce an error. However, lib/System should not - simply throw that error if it occurs because its a soft error. The problem - is that the interface function, OpenFileForWriting is too low level. It should - be OpenOrCreateFileForWriting. In the case of the soft "doesn't exist" error, - this function would just create it and then open it for writing.
-This design principle needs to be maintained in lib/System because it - avoids the propagation of soft error handling throughout the rest of LLVM. - Hard errors will generally just cause a termination for an LLVM tool so don't - be bashful about throwing them.
-Rules of thumb:
--Notes: -10. The implementation of a lib/System interface can vary drastically between - platforms. That's okay as long as the end result of the interface function is - the same. For example, a function to create a directory is pretty straight - forward on all operating system. System V IPC on the other hand isn't even - supported on all platforms. Instead of "supporting" System V IPC, lib/System - should provide an interface to the basic concept of inter-process - communications. The implementations might use System V IPC if that was - available or named pipes, or whatever gets the job done effectively for a - given operating system. - -11. Implementations are separated first by the general class of operating system - as provided by the configure script's $build variable. This variable is used - to create a link from $BUILD_OBJ_ROOT/lib/System/platform to a directory in - $BUILD_SRC_ROOT/lib/System directory with the same name as the $build - variable. This provides a retargetable include mechanism. By using the link's - name (platform) we can actually include the operating specific - implementation. For example, support $build is "Darwin" for MacOS X. If we - place: - #include "platform/File.cpp" - into a a file in lib/System, it will actually include - lib/System/Darwin/File.cpp. What this does is quickly differentiate the basic - class of operating system that will provide the implementation. - -12. Implementation files in lib/System need may only do two things: (1) define - functions and data that is *TRULY* generic (completely platform agnostic) and - (2) #include the platform specific implementation with: - - #include "platform/Impl.cpp" - - where Impl is the name of the implementation files. - -13. Platform specific implementation files (platform/Impl.cpp) may only #include - other Impl.cpp files found in directories under lib/System. 
The order of - inclusion is very important (from most generic to most specific) so that we - don't inadvertently place an implementation in the wrong place. For example, - consider a fictitious implementation file named DoIt.cpp. Here's how the - #includes should work for a Linux platform - - lib/System/DoIt.cpp - #include "platform/DoIt.cpp" // platform specific impl. of Doit - DoIt - - lib/System/Linux/DoIt.cpp // impl that works on all Linux - #include "../Unix/DoIt.cpp" // generic Unix impl. of DoIt - #include "../Unix/SUS/DoIt.cpp // SUS specific impl. of DoIt - #include "../Unix/SUS/v3/DoIt.cpp // SUSv3 specific impl. of DoIt - - Note that the #includes in lib/System/Linux/DoIt.cpp are all optional but - should be used where the implementation of some functionality can be shared - across some set of Unix variants. We don't want to duplicate code across - variants if their implementation could be shared. --
no public data
-onlyprimitive typed private/protected data
-data size is "right" for platform, not max of all platforms
-each class corresponds to O/S concept
-To be written.
-To be written.
-To be written.
-To be written.
-To be written.
+The implementation of a lib/System interface can vary drastically between + platforms. That's okay as long as the end result of the interface function + is the same. For example, a function to create a directory is pretty + straightforward on all operating systems. System V IPC, on the other hand, isn't even + supported on all platforms. Instead of "supporting" System V IPC, lib/System + should provide an interface to the basic concept of inter-process + communications. The implementations might use System V IPC if that is + available, or named pipes, or whatever gets the job done effectively for a + given operating system. In all cases, the interface and the implementation + must be semantically consistent.
In order to provide different implementations of the lib/System interface - for different platforms, it is necessary for the library to "sense" which - operating system is being compiled for and conditionally compile only the - applicable parts of the library. While several operating system wrapper - libraries (e.g. APR, ACE) choose to use #ifdef preprocessor statements in - combination with autoconf variable (HAVE_* family), lib/System chooses an - alternate strategy.
-
To put it succinctly, the lib/System strategy has traded "#ifdef hell" for - "#include hell". That is, a given implementation file defines one or more - functions for a particular operating system variant. The functions defined in - that file have no #ifdef's to disambiguate the platform since the file is only - compiled on one kind of platform. While this leads to the same function being - implemented differently in different files, it is our contention that this - leads to better maintenance and easier portability.
-For example, consider a function having different implementations on a - variety of platforms. Many wrapper libraries choose to deal with the different - implementations by using #ifdef, like this:
-
-    void SomeFunction(void) {
-    #if defined __LINUX
-      // .. Linux implementation
-    #elif defined __WIN32
-      // .. Win32 implementation
-    #elif defined __SunOS
-      // .. SunOS implementation
-    #else
-    #warning "Don't know how to implement SomeFunction on this platform"
-    #endif
-    }
-
The problem with this is that its very messy to read, especially as the - number of operating systems and their variants grow. The above example is - actually tame compared to what can happen when the implementation depends on - specific flavors and versions of the operating system. In that case you end up - with multiple levels of nested #if statements. This is what we mean by "#ifdef - hell".
-To avoid the situation above, we've chosen to locate all functions for a - given implementation file for a specific operating system into one place. This - has the following advantages:
-
So, given that we have decided to use #include instead of #if to provide - platform specific implementations, there are actually three ways we can go - about doing this. None of them are perfect, but we believe we've chosen the - lesser of the three evils. Given that there is a variable named $OS which - names the platform for which we must build, here's a summary of the three - approaches we could use to determine the correct directory:
-Let's look at the pitfalls of each approach.
-In approach #1, we end up with some confusion as to what gets included. - Suppose we have lib/System/File.cpp that includes just File.cpp to get the - platform specific part of the implementation. In this case, the include - directive with the <> syntax will include the right file but the include - directive with the "" syntax will recursively include the same file, - lib/System/File.cpp. In the case of #include <File.cpp>, the -I options - to the compiler are searched first so it works. But in the #include "File.cpp" - case, the current directory is searched first. Furthermore, in both cases, - neither include directive documents which File.cpp is getting included.
-In approach #2, we have the problem of needing to reconfigure repeatedly. - Developer's generally hate that and we don't want lib/System to be a thorn in - everyone's side because it will constantly need updating as operating systems - change and as new operating systems are added. The problem occurs when a new - implementation file is added to the library. First of all, you have to add a - file with the .in suffix, then you have to add that file name to the list of - configurable files in the autoconf/configure.ac file, then you have to run - AutoRegen.sh to rebuild the configure script, then you have to run the - configure script. This is deemed to be a pretty large hassle.
-In approach #3, we have the problem that not all platforms support links. - Fortunately the autoconf macro used to create the link can compensate for - this. If a link can't be made, the configure script will copy the correct - directory from $BUILD_SRC_DIR to $BUILD_OBJ_DIR under the new name. The only - problem with this is that if a copy is made, the copy doesn't get updated if - the programmer adds or modifies files in the $BUILD_SRC_DIR. A reconfigure or - manual copying is needed to get things to compile.
-
The approach we have taken in lib/System is #3. Here's why:
-
The linux implementation of the system library will always be the - reference implementation. This means that (a) the concepts defined by the - linux must be identically replicated in the other implementations and (b) the - linux implementation must always be complete (provide implementations for all - concepts).
-