Linux STREAMS (LiS)
LiS Driver/Kernel Interface (DKI)
Introduction
In the Linux kernel, much of the interface between drivers and other kernel modules and the core kernel services, such as memory allocation and synchronization primitives, is implemented in macros and inline functions declared in kernel header files. This technique was used (probably) out of considerations of efficiency (defined as execution speed) and a consideration that there were no version problems with such constructs because one could always recompile one's drivers in the context of the new kernel. The only "kernel primitives" compatibility that has been attempted from one kernel release to the next is source code compatibility.
The real world of paying customers is quite different. And, as it happens, the world of paying customers seems to impinge upon LiS considerably. In this world, the customers do not want to rebuild the kernel. They don't want to build the kernel at all. They want to install a distribution with a binary kernel that was configured only at install time. They then want to install add-on binary packages, and they expect these packages to operate correctly with their kernel. When these add-on packages consist of STREAMS based protocol drivers, LiS is usually the only piece of code that is recompiled from source upon installation into the customer's environment. The STREAMS drivers themselves are typically distributed in binary and linked in with LiS. The resulting module is then typically loaded using "modprobe" or some equivalent command.
In these circumstances it is highly desirable for LiS to "buffer" the interface between the STREAMS drivers and the kernel environment. This allows the STREAMS driver writers to deliver smaller binary packages to their customers and minimizes the number of different versions of those packages that must be maintained by the STREAMS driver writers.
Ideally, LiS would be able to present a uniform DKI that would support one version of a user's STREAMS driver across all versions of the Linux kernel. This ultimate goal is probably not achievable, but it is possible to insulate STREAMS drivers from the Linux kernel to a considerable extent. This is possible in part due to the implied DKI of a STREAMS driver. A STREAMS driver most likely will confine itself to the SVR4 types of DKI calls which have syntax and semantics that do not change over time. The main challenges come from the use of constructs, such as PCI configuration and interrupt service routines, that go outside the SVR4 DKI and must use services of the Linux kernel more-or-less directly. In general, LiS attempts to replace inline functions and macros with actual subroutine calls to perform kernel operations. This allows the STREAMS driver to be compiled once with references to these routines, with the routines themselves being compiled in the context of the specific kernel version at package installation time. Thus, the STREAMS drivers do not have to be sensitive to differences in kernel versions.
Operating System Interface Routines
In order to use this interface, you include the header files that you would normally include to use the kernel functions, and then include <sys/osif.h> after all of the kernel include files. This allows for the redefinition of the names. The kernel functions provided via <sys/osif.h> are as follows, grouped by type of function.
PCI BIOS Interface
    #if LINUX_VERSION_CODE < 0x020100       /* 2.0 kernel */
    unsigned long pcibios_init(unsigned long memory_start, unsigned long memory_end) ;
    #else                                   /* 2.1 or 2.2 kernel */
    void pcibios_init(void) ;
    #endif
    int pcibios_find_class(unsigned int class_code, unsigned short index, unsigned char *bus, unsigned char *dev_fn) ;
    int pcibios_find_device(unsigned short vendor, unsigned short dev_id, unsigned short index, unsigned char *bus, unsigned char *dev_fn) ;
    int pcibios_read_config_byte(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned char *val) ;
    int pcibios_read_config_word(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned short *val) ;
    int pcibios_read_config_dword(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned int *val) ;
    int pcibios_write_config_byte(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned char val) ;
    int pcibios_write_config_word(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned short val) ;
    int pcibios_write_config_dword(unsigned char bus, unsigned char dev_fn, unsigned char where, unsigned int val) ;
    const char *pcibios_strerror(int error) ;
PCI Interface
These routines constitute the PCI interface as implemented in the 2.2 series of kernels. Please note that these are filtered calls to the operating system and still depend directly upon the kernel structure "struct pci_dev".
LiS provides a more abstract interface to PCI that does not depend upon the direct definition of kernel structures. The LiS PCI interface is to be preferred since it provides more insulation against changes in the kernel.
    struct pci_dev *pci_find_device(unsigned int vendor, unsigned int device, struct pci_dev *from);
    struct pci_dev *pci_find_class(unsigned int class, struct pci_dev *from);
    struct pci_dev *pci_find_slot(unsigned int bus, unsigned int devfn);
    int pci_read_config_byte(struct pci_dev *dev, u8 where, u8 *val);
    int pci_read_config_word(struct pci_dev *dev, u8 where, u16 *val);
    int pci_read_config_dword(struct pci_dev *dev, u8 where, u32 *val);
    int pci_write_config_byte(struct pci_dev *dev, u8 where, u8 val);
    int pci_write_config_word(struct pci_dev *dev, u8 where, u16 val);
    int pci_write_config_dword(struct pci_dev *dev, u8 where, u32 val);
    void pci_set_master(struct pci_dev *dev);
IRQ Interface
These are the routines that are used to attach and detach interrupt service routines to hardware interrupts.
    int request_irq(unsigned int irq, void (*handler)(int, void *, void *), unsigned long flags, const char *device, void *dev_id) ;
    void free_irq(unsigned int irq, void *dev_id) ;
    void disable_irq(unsigned int irq) ;
    void enable_irq(unsigned int irq) ;
I/O Memory Mapping
These are the routines that are typically used to map PCI bus or physical addresses to CPU virtual addresses. LiS includes some backward compatibility here to older kernel versions.
    void *ioremap_nocache(unsigned long offset, unsigned long size) ;
    void iounmap(void *addr) ;
    void *vremap(unsigned long offset, unsigned long size) ;
    unsigned long virt_to_phys(volatile void *addr) ;
    void *phys_to_virt(unsigned long addr) ;
I/O Port Access
These are the routines that allow a driver to register I/O ports.
    int check_region(unsigned int from, unsigned int extent) ;
    void request_region(unsigned int from, unsigned int extent, const char *name) ;
    void release_region(unsigned int from, unsigned int extent) ;
Memory Allocation
    void *kmalloc(size_t nbytes, int type) ;
    void kfree(const void *ptr) ;
    void *vmalloc(unsigned long size);
    void vfree(void *ptr) ;
DMA Routines
    int request_dma(unsigned int dma_nr, const char *device_id) ;
    void free_dma(unsigned int dma_nr) ;
Delay Routines
This is the routine that simply spins the CPU for a given number of microseconds. LiS also redefines the symbol "jiffies" to a subroutine call to help insulate STREAMS drivers from changes in the way the kernel keeps track of time. Remember, the redefinition is accomplished using C language defines, so the following declarations describe the effective usage of these symbols, not their literal definition.
    void udelay(long micro_secs) ;
    unsigned long jiffies ;
Printing Routines
    int printk(const char *fmt, ...) ;
    int sprintf(char *bfr, const char *fmt, ...) ;
    int vsprintf(char *bfr, const char *fmt, va_list args) ;
Timer Routines
    void add_timer(struct timer_list * timer);
    int del_timer(struct timer_list * timer);
The following routine converts time in microseconds to system "ticks". The "ticks" value is suitable for use with the timeout routine. Note that if the micro_sec parameter is less than the number of microseconds in a system tick then the routine returns zero.
    unsigned lis_usectohz(unsigned micro_sec);
The following routine is an LiS abstraction of the C library routine gettimeofday. Note the absence of the time zone parameter.
    void lis_gettimeofday(struct timeval *tv);
The following two kernel routines are called via the LiS osif.c code.
    void do_gettimeofday( struct timeval *tp ) ;
    void do_settimeofday( struct timeval *tp ) ;
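The zero-return case of the microsecond-to-tick conversion is easy to trip over, so here is a small user-space sketch of the arithmetic. It is not the LiS source: the function names are hypothetical, the tick rate is passed as a parameter (the real routine would use the kernel's HZ value), and plain truncating division is an assumption that merely matches the behavior described above.

```c
#include <assert.h>

/* Hypothetical stand-in for lis_usectohz(); hz is passed explicitly
 * because the real value depends on how the kernel was built. */
unsigned usec_to_ticks(unsigned micro_sec, unsigned hz)
{
    unsigned usec_per_tick;

    assert(hz > 0 && hz <= 1000000u);
    usec_per_tick = 1000000u / hz;      /* 10000 us per tick at hz = 100 */
    return micro_sec / usec_per_tick;   /* truncates: under one tick gives 0 */
}

/* Variant that rounds up, so any nonzero delay lasts at least one tick. */
unsigned usec_to_ticks_round_up(unsigned micro_sec, unsigned hz)
{
    unsigned usec_per_tick;

    assert(hz > 0 && hz <= 1000000u);
    usec_per_tick = 1000000u / hz;
    return micro_sec / usec_per_tick + (micro_sec % usec_per_tick != 0);
}
```

A driver that passes the converted value to a timeout routine may want the rounded-up form, since a timeout of zero ticks is rarely what was intended.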
Sleep and Wakeup Routines
    void sleep_on(OSIF_WAIT_Q_ARG) ;
    void interruptible_sleep_on(OSIF_WAIT_Q_ARG) ;
    void wake_up(OSIF_WAIT_Q_ARG) ;
    void wake_up_interruptible(OSIF_WAIT_Q_ARG) ;
Thread Creation
A STREAMS driver in LiS can create kernel threads if it so chooses. The following routine simplifies this task. It consolidates all of the kernel manipulations involved with the creation of a kernel thread into one place, thus removing references to these kernel functions from STREAMS driver code.
Prototype
    pid_t lis_thread_start(int (*fcn)(void *), void *arg, const char *name) ;
    int lis_thread_stop(pid_t pid) ;
Arguments
Operation
lis_thread_start creates a new thread, performs some operations prior to entering the fcn, and then calls fcn which acts as the "main" routine for the thread. The arg parameter is passed to fcn.
Before fcn is entered, the newly created thread will have shed all user space files and mapped memory. Thus, it is a kernel-only thread. All signals are still enabled. Note that when the kernel goes down for reboot all processes are first sent a SIGTERM. Once those have been processed, all processes are then sent a SIGKILL. It is the implementor's choice which of these it pays attention to in order to exit prior to a reboot.
The fcn is entered with the "big kernel lock" NOT held, just as it would be for calling the "kernel_thread" function directly. On 2.2 kernels, the fcn should get this lock so that it can utilize kernel services safely.
The user's fcn returns a value when it exits and that value is returned to the kernel. It is not clear that anything actually pays any attention to this returned value. In particular, it is not visible to the thread that started the new thread.
lis_thread_start itself returns the process id of the new thread, or a negative error number. This value can be used to kill the thread.
lis_thread_stop kills a thread started by lis_thread_start. It returns 0 for success or a negative error number for failure.
Major/Minor Device Numbering (dev_t)
In STREAMS the dev_t structure is used to combine a major device number and a minor device number into a single integer length quantity. The Linux kernel restricts these numbers to the range 0 to 255 (8-bit values). LiS provides a typedef for dev_t that results in an unsigned integer quantity. The Linux kernel, on the other hand, defines a structure of type kdev_t that it uses internally for this same purpose. The kdev_t type is an actual structure, and not an integer.
Furthermore, the kernel defines a routine named makedevice that generates one of these structures, given the major and minor numbers as integers. It appears that eventually the kernel will use this structure exclusively and will expand the numbering space for both major and minor device numbers. This has caused LiS to use non-standard nomenclature for handling its dev_t structures, since the routine makedevice is not compatible with the LiS dev_t structure. LiS now provides the following operations on dev_t structures. Most of these functions are provided by macros, so the following are "virtual" prototypes.
The sample drivers that come with LiS now use these constructs to manipulate device structures and can serve as examples for their usage.
LiS Memory Allocation
LiS provides for several different styles of memory allocation, all of them insulated from the Linux kernel. These routines allow your driver to allocate memory in several different ways while still maintaining compatibility with different versions of the Linux kernel, with no driver recompilation required. To use the LiS memory allocation routines include the file <sys/lismem.h> in your STREAMS driver source code.
LiS "malloc" and "free" Equivalents
The first group of memory allocation routines are the routines that play the role of "malloc" and "free." These routines keep a master linked list of all allocated memory areas. This list can be printed out via an ioctl to LiS. Each allocated area is tagged with the file name and line number of the code that caused it to be allocated. Each area contains a guard word at the front and back to enable the allocator to detect "off by one" accesses outside the allocated area.
LiS uses this allocator internally for allocating queues, messages and other internal data structures. This would be the allocator of choice for STREAMS drivers to use to allocate instance structures.
Memory allocated in this manner is ultimately allocated by the kernel routine "kmalloc". As such, it is not guaranteed to be DMA-able (in the old style), or to occupy physically contiguous memory locations. See below for routines that can be used to allocate these types of memory areas. The routines are as follows:
    void *ALLOC(int nbytes) ;
    void *ALLOCF(int nbytes, char *tag) ;
    void FREE(void *ptr) ;
The ALLOC and FREE routines are analogous to "malloc" and "free". The ALLOCF routine includes a character string which is prepended to the file name stored as the location from which the allocation occurred. It can serve as a tag for the type of memory being allocated.
Usage examples:
    ptr = ALLOC(456) ;
    FREE(ptr) ;
    ptr = ALLOCF(578, "Instance: ") ;
    FREE(ptr) ;
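To make the guard-word idea concrete, the following is a minimal user-space sketch built on the C library's malloc. It is an illustration only and not the LiS allocator: the names guarded_alloc, guarded_check and guarded_free, the header layout and the guard pattern are all hypothetical, and the file/line tagging and master list described above are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Guard pattern with 0xA5 in every byte (arbitrary choice for this sketch). */
#define GUARD_WORD ((size_t)-1 / 255 * 0xA5)

/* Header stored directly in front of the user area. The guard is the
 * last member so that it sits immediately before the first user byte. */
struct guard_hdr {
    size_t nbytes;        /* size of the user area */
    size_t front_guard;   /* must still equal GUARD_WORD when checked */
};

void *guarded_alloc(size_t nbytes)
{
    size_t guard = GUARD_WORD;
    struct guard_hdr *hdr = malloc(sizeof(*hdr) + nbytes + sizeof(guard));

    if (hdr == NULL)
        return NULL;
    hdr->nbytes = nbytes;
    hdr->front_guard = guard;
    /* The back guard may be unaligned, so it is copied byte-wise. */
    memcpy((char *)(hdr + 1) + nbytes, &guard, sizeof(guard));
    return hdr + 1;
}

/* Returns 1 if both guards are intact, 0 if either was overwritten. */
int guarded_check(const void *ptr)
{
    const struct guard_hdr *hdr = (const struct guard_hdr *)ptr - 1;
    size_t back;

    memcpy(&back, (const char *)ptr + hdr->nbytes, sizeof(back));
    return hdr->front_guard == GUARD_WORD && back == GUARD_WORD;
}

void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((struct guard_hdr *)ptr - 1);
}
```

A write one byte past either end of the user area lands in a guard word, so a later call to guarded_check reports the damage even though the write itself went unnoticed.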
LiS Kernel Memory Allocators
These routines use the LiS malloc/free internal routines to allow for more flexibility in the options used when calling the kernel allocator. These routines all lead to a call on "kmalloc" with appropriate options.
It is worth noting that the numerical value of the constants used in calling the kernel's "kmalloc" routine changed between the 2.2 and 2.4 versions of the kernel. Thus, drivers which called the kernel's "kmalloc" directly have to be recompiled to run in a 2.4 kernel. STREAMS drivers using the memory allocation interface defined here could run without modification and without a recompilation on both kernels, assuming that the drivers otherwise did not use any direct kernel functions.
    void *lis_alloc_atomic(int nbytes) ;
    void *lis_alloc_kernel(int nbytes) ;
    void *lis_alloc_dma(int nbytes) ;
    void *lis_free_mem(void *mem_area) ;
These routines pass the allocation options GFP_ATOMIC, GFP_KERNEL, and GFP_DMA, respectively, to "kmalloc" when allocating the memory. LiS takes care of passing the proper values to the kernel routine so that driver code can remain portable. The routine lis_free_mem returns a NULL pointer for the convenience of the caller.
The kernel's kmalloc is restricted as to the number of bytes that it will allocate. The LiS routines do not have this restriction. If the number of requested bytes is larger than 16K the LiS allocation routines will call the page allocator to allocate the memory. The lis_free_mem routine knows whether to free pages or to use the kernel's kfree routine.
Usage Examples:
    ptr = lis_alloc_kernel(sizeof(structure)) ;
    ptr = lis_free_mem(ptr) ;    /* returns NULL pointer */
LiS Page Allocator
These routines allow a STREAMS driver to allocate memory directly from the kernel's page allocator. Memory allocated in this manner occupies physically contiguous locations and is suitable for use with bus master DMA PCI devices.
Unlike the kernel's page allocator, the size that is specified when calling the LiS page allocator is in bytes, not "order", or other encoding of page size. LiS calculates the number of pages based upon the requested size. Also, LiS does not require you to pass the size of the area when freeing the page. The routines are as follows:
    void *lis_get_free_pages(int nbytes) ;
    void *lis_free_pages(void *ptr) ;
The lis_free_pages routine returns a NULL pointer for the convenience of the caller.
Usage Examples:
    ptr = lis_get_free_pages(1024*kbytes) ;
    ptr = lis_free_pages(ptr) ;
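The byte-count-to-pages calculation that LiS performs on the caller's behalf can be sketched as follows. This is a guess at the arithmetic, not the LiS code: the helper names are hypothetical and the 4096-byte page size is an assumption (the real value depends on the architecture). The "order" is the power-of-two page count that the kernel's page allocator expects.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Number of whole pages needed to hold nbytes. */
size_t pages_for(size_t nbytes)
{
    return (nbytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest order such that (1 << order) pages cover nbytes. */
unsigned order_for(size_t nbytes)
{
    size_t pages = pages_for(nbytes);
    unsigned order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Because the page count is rounded up to a power of two, a request just over a boundary (for example 4097 bytes) consumes twice the memory of a request just under it.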
LiS PCI Interface
In order to assist in the portability of STREAMS drivers across different versions of the Linux kernel, LiS provides an abstraction of the PCI configuration interface. It defines a data structure that is used to describe a PCI device and a set of routines that perform operations on PCI configuration space. Using these abstractions, a STREAMS driver can be portable from the 2.2 kernel to the 2.4 kernel with no recompilation required. The LiS structures completely hide the kernel data structures and PCI configuration space operations from the STREAMS driver. To use this interface include the file <sys/lispci.h> in your STREAMS driver source code.
The LiS PCI Device Structure
This structure is distinct from a similar structure which is defined by the Linux kernel, but which differs significantly between the 2.2 and 2.4 kernels. The LiS version of this structure is oriented towards providing just enough information to allow a driver to operate the PCI device, without being concerned about the details of PCI bus topology. This structure is used to return information to the STREAMS driver concerning devices that meet certain criteria, such as device class or manufacturer device identification.
    #define LIS_PCI_MEM_CNT 12              /* # mem addrs */
    typedef struct lis_pci_dev
    {
        unsigned        bus ;               /* bus number */
        unsigned        dev_fcn ;           /* device/function code */
        unsigned        vendor ;            /* vendor id */
        unsigned        device ;            /* device id */
        unsigned        class ;             /* class type */
        unsigned        hdr_type ;          /* PCI header type */
        unsigned        irq ;               /* IRQ number */
        unsigned long   mem_addrs[LIS_PCI_MEM_CNT] ;
        void           *user_ptr ;          /* private for user */
    } lis_pci_dev_t ;
The bus field contains the bus number on which the device is located. LiS obtains this information from the kernel.
The dev_fcn field contains an encoding of the device number on the bus and the function number within the device that this particular structure pertains to. The pair bus and dev_fcn uniquely identifies a device in the PCI subsystem. Devices can be searched for on the PCI bus by bus number and dev_fcn value (see below). Given a dev_fcn value, a pair of macros will extract the "device" portion and the "function number" portion from it.
Given a device number and a function number, the following macro will synthesize a dev_fcn value suitable for use in searching the bus.
The class field contains the class code associated with the device. Devices can be searched for on the PCI bus by class code (see below). The hdr_type field gives the type information for the PCI configuration space header. The irq field gives the IRQ number that is assigned to this device. This is the number that is used to attach an interrupt service routine to the device. The mem_addrs field contains a list of addresses associated with the device. These are raw PCI bus addresses and are not mapped into the address space of the processor. Empty slots contain the value zero.
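For readers unfamiliar with the device/function encoding, the sketch below shows the conventional PCI layout, in which the low three bits hold the function number and the next five bits hold the device (slot) number. The macro names here are hypothetical and are not the ones LiS defines in <sys/lispci.h>; whether LiS uses exactly this layout is an assumption based on the standard PCI convention.

```c
#include <assert.h>

/* Hypothetical macros using the conventional PCI devfn layout:
 * bits 7..3 = device (slot) number, bits 2..0 = function number. */
#define SKETCH_MK_DEV_FCN(dev, fcn)  ((((dev) & 0x1fu) << 3) | ((fcn) & 0x07u))
#define SKETCH_DEV(dev_fcn)          (((dev_fcn) >> 3) & 0x1fu)
#define SKETCH_FCN(dev_fcn)          ((dev_fcn) & 0x07u)

/* Round-trip helper: rebuild a dev_fcn value from its two parts. */
unsigned sketch_normalize(unsigned dev_fcn)
{
    return SKETCH_MK_DEV_FCN(SKETCH_DEV(dev_fcn), SKETCH_FCN(dev_fcn));
}
```

With this layout, device 3 function 1 encodes as 0x19, and the largest valid value (device 31, function 7) is 0xff.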
LiS PCI Search Routines
These routines allow the STREAMS driver to find devices on the PCI bus and obtain a pointer to the lis_pci_dev_t structure for the device.
    lis_pci_dev_t *lis_pci_find_device(unsigned vendor, unsigned device, lis_pci_dev_t *previous_struct) ;
Find the device by vendor identification and vendor device identification. By passing in the pointer to the previous structure returned it is possible to find all devices of a given type. The routine returns NULL if there are no (more) devices for the given vendor and device identifiers. Usage example:
    lis_pci_dev_t *pcip = NULL ;
    while ((pcip = lis_pci_find_device(0x109e, 0x8474, pcip)) != NULL)
    {
        /* pcip points to a unique device from this vendor */
    }
    lis_pci_dev_t *lis_pci_find_class(unsigned class, lis_pci_dev_t *previous_struct) ;
Find the device by class. The usage is similar to lis_pci_find_device in that you can use a pointer to loop through all devices of a given class. The function returns NULL if there are no (more) devices of the given class.
    lis_pci_dev_t *lis_pci_find_slot(unsigned bus, unsigned dev_fcn) ;
Find the device by slot number. If you know the bus number (zero for most simple Intel PC systems) and the dev_fcn, you can obtain the PCI configuration information for that particular "slot". Use the LIS_MK_DEV_FCN macro to synthesize the dev_fcn value from the "device" (slot) number and the function number. The function returns NULL if there is no device in that slot. Note that this routine only returns one structure since it is not meaningful to process a list of devices for the same slot.
LiS PCI Configuration Space Routines
The following routines are used to read and write PCI configuration space for a particular device. Configuration space can be accessed by byte, word (16 bit) or dword (32 bit). Each routine takes a pointer to an lis_pci_dev_t structure as an argument. It also takes an index value which is the byte offset from the base of the configuration space for the device at which the given byte/word/dword is to be read or written.
Care should be exercised when writing to configuration space since many of these values are determined by the PCI BIOS at system boot time.
The lis_pci_set_master routine sets the "bus master DMA" bit for the given device. This is used for devices that perform bus master DMA. The routines are as follows:
    int lis_pci_read_config_byte(lis_pci_dev_t *dev, unsigned index, unsigned char *rtn_val);
    int lis_pci_read_config_word(lis_pci_dev_t *dev, unsigned index, unsigned short *rtn_val);
    int lis_pci_read_config_dword(lis_pci_dev_t *dev, unsigned index, unsigned long *rtn_val);
    int lis_pci_write_config_byte(lis_pci_dev_t *dev, unsigned index, unsigned char val);
    int lis_pci_write_config_word(lis_pci_dev_t *dev, unsigned index, unsigned short val);
    int lis_pci_write_config_dword(lis_pci_dev_t *dev, unsigned index, unsigned long val);
    void lis_pci_set_master(lis_pci_dev_t *dev);
LiS PCI DMA Routines
These routines are used to allocate memory suitable for use with PCI bus master DMA devices or to map page-allocated memory for those purposes. In order to understand what these routines do, please refer to the file /usr/src/linux/Documentation/DMA-mapping.txt in a fairly recent 2.4 kernel source tree. The kernel provides more functionality than is provided in LiS, so there are more routines documented there than are found in this interface. You can use these routines in 2.2 kernels but the functions performed are simply approximations of the 2.4 semantics and may not work in all cases.
Note that the LiS routines have simplified the kernel interface involving "DMA handles" in such a way as to make these constructs easier to use and less error prone.
The following routines are used to allocate memory which the hardware keeps consistent between CPU access and DMA access.
    void *lis_pci_alloc_consistent(lis_pci_dev_t *dev, size_t size, lis_dma_addr_t *dma_handle);
    void *lis_pci_free_consistent(lis_dma_addr_t *dma_handle);
The following routines are used to obtain a DMA address from a returned DMA handle. You need to know whether your hardware environment is using 32-bit or 64-bit DMA addresses.
    u32 lis_pci_dma_handle_to_32(lis_dma_addr_t *dma_handle);
    u64 lis_pci_dma_handle_to_64(lis_dma_addr_t *dma_handle);
The following routines are used to map page-allocated memory for DMA purposes. The direction indicator of LIS_SYNC_FOR_CPU means that you intend to use the memory for DMA transfers into memory. The direction indicator of LIS_SYNC_FOR_DMA means that you intend to use the memory for DMA transfers out of memory. If the DMA operation goes both ways then use LIS_SYNC_FOR_BOTH.
    void lis_pci_map_single(lis_pci_dev_t *dev, void *ptr, size_t size, lis_dma_addr_t *dma_handle, int direction);
    void *lis_pci_unmap_single(lis_dma_addr_t *dma_handle);
The direction indicators are as follows:
    LIS_SYNC_FOR_CPU
    LIS_SYNC_FOR_DMA
    LIS_SYNC_FOR_BOTH
With mapped memory, i.e., non-consistent memory, you need to synchronize the memory whenever the CPU writes into it and the DMA needs to read it, or when the DMA has written into it and the CPU needs to read it. The following routine is used for that purpose.
    void lis_pci_dma_sync_single(lis_dma_addr_t *dma_handle, size_t size, int direction);
The following routines can be used at driver initialization time to discover and control the addressing boundary restrictions of a device.
    int lis_pci_dma_supported(lis_pci_dev_t *dev, u64 mask);
    int lis_pci_set_dma_mask(lis_pci_dev_t *dev, u64 mask);
LiS Atomic Functions
LiS provides for atomic integers implemented in a portable fashion. To declare an LiS portable atomic integer use the following declaration syntax:
    lis_atomic_t myatom ;
LiS then provides the following operations on variables of this type.
    void lis_atomic_set(lis_atomic_t *atomic_addr, int valu) ;
    int lis_atomic_read(lis_atomic_t *atomic_addr) ;
    void lis_atomic_add(lis_atomic_t *atomic_addr, int amt) ;
    void lis_atomic_sub(lis_atomic_t *atomic_addr, int amt) ;
    void lis_atomic_inc(lis_atomic_t *atomic_addr) ;
    void lis_atomic_dec(lis_atomic_t *atomic_addr) ;
    int lis_atomic_dec_and_test(lis_atomic_t *atomic_addr) ;
Of these, only lis_atomic_dec_and_test needs any explanation. This routine performs an atomic_dec on the variable and returns true if the counter reached zero via that decrement operation. Note that by the time the routine returns some other CPU with access to the same variable may have changed its value. So the return reports only on the instantaneous value of the variable.
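A typical use of a decrement-and-test primitive is reference counting, where exactly one caller must notice that the count reached zero. The sketch below models that pattern in user space with C11 <stdatomic.h>. The sketch_ names and struct instance are hypothetical stand-ins and not the LiS definitions; a driver would call the lis_atomic_ routines listed above instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-ins for the LiS atomic type and routines. */
typedef atomic_int sketch_atomic_t;

void sketch_atomic_set(sketch_atomic_t *a, int v) { atomic_store(a, v); }
int  sketch_atomic_read(sketch_atomic_t *a)       { return atomic_load(a); }
void sketch_atomic_inc(sketch_atomic_t *a)        { atomic_fetch_add(a, 1); }

/* True only for the caller whose decrement took the value to zero. */
int sketch_atomic_dec_and_test(sketch_atomic_t *a)
{
    return atomic_fetch_sub(a, 1) == 1;   /* fetch_sub returns the old value */
}

struct instance {
    sketch_atomic_t refs;   /* number of outstanding references */
    int released;           /* set when the last reference is dropped */
};

void instance_put(struct instance *ip)
{
    if (sketch_atomic_dec_and_test(&ip->refs))
        ip->released = 1;   /* reached by exactly one caller */
}
```

The decision is taken from the value returned by the decrement itself rather than from a separate read afterwards, which is what keeps the test safe when several CPUs drop references at the same time.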
LiS Locks
LiS provides an abstraction and an insulated interface to the Linux kernel for spin locks, interrupt disabling and semaphores. If you use this interface in your STREAMS driver you can utilize these kernel services on different versions of the Linux kernel without the necessity of recompiling your driver for each version of the kernel.
The LiS locks are especially useful in consideration of Linux kernels compiled with and without the SMP option set. The spin locks and semaphores of the Linux kernel are implemented using external inline functions. These functions are coded in assembly language and generate different sequences of instructions depending upon the compile time setting of the SMP option. Spin locks and semaphores compiled with the SMP option unset will not function properly on a multi-CPU system running an SMP kernel.
The LiS locks mechanism solves this problem by abstracting the locking primitives into actual subroutines, not inlines, defined within LiS. Since LiS is compiled from source code when it is installed the subroutines in LiS have the correct setting of SMP for the locking primitives. This allows the STREAMS driver code to be compiled once and the object code reused for multiple installations with varying options.
The following sections document the spin locks, interrupt disabling and semaphore mechanisms offered by LiS. To use these mechanisms include the file <sys/lislocks.h> in your STREAMS driver source code.
In choosing the appropriate type of lock to use, one must bear in mind that STREAMS drivers are not allowed to "sleep" in "put" and "service" procedures, only in "open" and "close" routines. That means that spin locks are the mutual exclusion mechanism of choice for "put" and "service" procedures. It is reasonable to use sleeping semaphores in "open" and "close" routines.
The simple interrupt exclusion mechanism can be used to exclude only interrupt routine execution for a section of code.
However, this mechanism does not exclude other "put" or "service" procedures that may be executed on other CPUs. This may not be much of a consideration since LiS acquires a lock in the queue structure before executing the "put" or "service" procedure pointed to by that queue. However, it could happen that the "read put/service" and "write put/service" procedures get executed simultaneously since there are two different locks in the STREAMS queues, one in the read queue and one in the write queue. In this case, the STREAMS driver code would need to use spin locks to protect data structures shared between the read and write "put" or "service" procedures.
LiS Spin Locks
LiS provides an implementation of spin locks that utilizes the Linux kernel's spin lock mechanism to perform the actual locking functions. The LiS implementation adds features to the kernel spin locks such as the following:
For these reasons I highly recommend that STREAMS drivers use the LiS spin lock implementation in place of the direct kernel spin locks.
The portability aspect of LiS spin locks cannot be overemphasized. Different Linux kernel compile-time options can lead to a proliferation of STREAMS driver code versions, or the necessity of always compiling the driver from source when it is installed. LiS spin locks allow a STREAMS driver to be compiled independently of kernel options with only the binary needed at driver installation time.
To declare a spin lock, use the typedef lis_spin_lock_t, as in the following:
    lis_spin_lock_t mylock ;
LiS spin locks must be initialized before they are used. There is one initialization routine no matter which style of locking you intend to use.
    void lis_spin_lock_init(lis_spin_lock_t *lock, const char *name) ;
This routine initializes the spin lock and associates an ASCII string name with it. The pointer name is saved in the lock structure for later use in printing out the lock trace table. It is the caller's responsibility to ensure that the name resides in memory that will persist for the duration of the existence of the lock.
You can also use dynamically allocated spin locks. This technique allows your STREAMS driver to be completely immune from changes in kernel version regarding the size of a spin lock since your driver only has to store a pointer to the allocated lock. The allocation and deallocation routines are as follows.
    lis_spin_lock_t *lis_spin_lock_alloc(const char *name);
    lis_spin_lock_t *lis_spin_lock_free(lis_spin_lock_t *lock, const char *name);
The allocation function returns a pointer to the spin lock, or NULL if the memory could not be allocated. The free function returns a NULL pointer for the convenience of the caller.
For further information on spin locks, see the section on debugging spin locks.
To lock and unlock a spinlock, use any of the following pairs of routines. If you use the first routine to lock the spin lock then be sure to use its companion unlock routine. For nesting considerations, see below.
    void lis_spin_lock(lis_spin_lock_t *lock) ;
    void lis_spin_unlock(lis_spin_lock_t *lock) ;
    int lis_spin_trylock(lis_spin_lock_t *lock) ;
These routines are to be called only from background processing to lock and unlock a spin lock. The trylock routine locks the spin lock if it is available, returning "true", or leaves it unlocked if it is unavailable, returning "false". Background processing means any STREAMS driver processing that does not occur at interrupt time.
These routines lock the lock but do not exclude interrupt routines from execution. Thus, your interrupt service routine can still be called whether or not your driver is holding a spin lock that was locked with one of these routines.
You can nest pairs of calls to these routines from the same thread of execution. See below for more information on lock nesting.
Usage example:
    lis_spin_lock(&mylock) ;
    ...
    lis_spin_unlock(&mylock) ;
    void lis_spin_lock_irq(lis_spin_lock_t *lock) ;
    void lis_spin_unlock_irq(lis_spin_lock_t *lock);
This pair of routines locks the spin lock with interrupts disabled for the duration of the holding of the lock. The routine lis_spin_unlock_irq re-enables interrupts after unlocking the lock.
You can use this technique to exclude interrupt routine execution. However, it is not advisable for interrupt routines themselves, or any routines called from an interrupt routine, to use this mechanism since the unlock primitive unconditionally enables interrupts, which may not be desirable from inside an interrupt routine.
These routines may be used in nested fashion. Only the outermost unlock routine will actually enable interrupts. See below for more information about lock nesting.
Usage example:
    lis_spin_lock_irq(&mylock) ;
    ...
    lis_spin_unlock_irq(&mylock) ;
    void lis_spin_lock_irqsave(lis_spin_lock_t *lock, int *flags) ;
    void lis_spin_unlock_irqrestore(lis_spin_lock_t *lock, int *flags) ;

This pair of routines is similar to the "spin_lock_irq" routines in that the locking routine disables interrupts. However, it saves the interrupt state in the integer argument whose pointer is passed to the locking routine. The unlock routine then restores the interrupt state after unlocking the lock. These routines are suitable for use by routines that are called both from interrupt level and from background. They also have the effect, when used in an interrupt routine, of excluding multiple execution of an interrupt routine on multiple CPUs in an SMP system. These routines may be used in nested fashion. Only the outermost unlock routine will actually restore the interrupt state. See below for more information about lock nesting.

Usage example:

    lis_spin_lock_t mylock ;
    int flags ;

    lis_spin_lock_irqsave(&mylock, &flags) ;
    ...
    lis_spin_unlock_irqrestore(&mylock, &flags) ;

Note that the unlock routine is passed the address of the flags just as in calling the lock routine.
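As a sketch of the irqsave pair in use, the following fragment updates a counter that is shared between interrupt and background code. The lis_* definitions in it are user-space stand-ins (assumptions, not the LiS implementation) so that the fragment compiles outside a kernel; a real driver would include <sys/lislocks.h>, initialize the lock first, and drop the stand-ins. The names stats_lock, rx_count, count_rx and irqs_enabled are hypothetical.

```c
#include <assert.h>

/* Stand-ins (assumptions) for the LiS definitions so the sketch builds in
 * user space. In a driver, remove them and include <sys/lislocks.h>. */
typedef struct { int held; } lis_spin_lock_t;
static int irq_enabled = 1;              /* models the CPU interrupt flag */

static void lis_spin_lock_irqsave(lis_spin_lock_t *lock, int *flags)
{
    *flags = irq_enabled;                /* save the interrupt state */
    irq_enabled = 0;                     /* disable interrupts */
    lock->held++;
}

static void lis_spin_unlock_irqrestore(lis_spin_lock_t *lock, int *flags)
{
    assert(lock->held > 0);
    if (--lock->held == 0)               /* only the outermost unlock restores */
        irq_enabled = *flags;
}

static lis_spin_lock_t stats_lock;       /* hypothetical driver lock */
static long rx_count;                    /* hypothetical data shared with the ISR */

/* Callable from interrupt level or from background. */
long count_rx(long n)
{
    int flags;
    long total;

    lis_spin_lock_irqsave(&stats_lock, &flags);
    rx_count += n;
    total = rx_count;
    lis_spin_unlock_irqrestore(&stats_lock, &flags);   /* same &flags */
    return total;
}

/* Exposes the modelled interrupt flag for inspection. */
int irqs_enabled(void)
{
    return irq_enabled;
}
```

Because the interrupt state is saved and restored rather than forced on, count_rx leaves interrupts exactly as it found them, which is what makes the pair suitable for code reached from both contexts.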
Lock Nesting

LiS spin locks can be locked and unlocked in nested fashion. When doing so, it is always best to use the same pair of lock and unlock routines at all levels of nesting for the same lock. Mixing different types of locking can lead to unexpected results and non-portable behavior.

LiS allows a single thread to lock spin locks in nested fashion. That is, the second and subsequent calls to the lock routine from a single thread will not spin on the lock because of finding it in a locked state from the first call. Also, every unlock call except the last one, the one that balances the first locking call, does not unlock the lock. Only the outermost unlock call causes the lock to be unlocked. If the nesting is via lis_spin_lock_irq, then only the outermost unlock call enables interrupts. If the nesting is via lis_spin_lock_irqsave, then only the outermost unlock call restores the interrupt state.

When two or more threads attempt to lock a spin lock "simultaneously" only one thread is allowed to proceed at a time. The other threads "spin", that is, the CPUs executing the other threads are executing a loop that tests the lock repeatedly until it becomes available. Consequently, it is advisable to use locks to protect the execution of fairly short pieces of code if there is any likelihood of contention for the lock. While one thread is holding the lock, other CPUs may be idling waiting for it.

In the context of locking, "simultaneously" means any time from the moment of the first thread locking the spin lock until that thread unlocks the lock. If another thread attempts to lock the spin lock at any point in that interval then it will "spin."

When multiple threads use multiple spin locks to protect multiple resources, it is always a good idea if all threads execute "lock" operations on the multiple spin locks in the same order. It is also highly recommended that they execute "unlock" operations in the exact reverse order as the "lock" operations. This avoids so-called "deadly embrace" situations in which process A acquires spin lock A, process B acquires spin lock B, and then process A waits on B while process B waits on A.
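The nesting rule can be pictured with a small user-space model that tracks an owner and a nesting depth. This is an illustration only and makes no claim about how LiS implements lis_spin_lock_t; the type model_lock_t and both routines are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a nestable lock: an owner id plus a nesting depth. */
typedef struct {
    int owner;   /* 0 = unowned, otherwise the id of the owning thread */
    int depth;   /* number of unbalanced lock calls made by the owner */
} model_lock_t;

/* Returns 1 if the caller now holds the lock, 0 if it would have to spin. */
int model_lock(model_lock_t *l, int thread)
{
    if (l->owner != 0 && l->owner != thread)
        return 0;                 /* another thread owns it: caller spins */
    l->owner = thread;
    l->depth++;                   /* a nested call by the owner just counts */
    return 1;
}

/* Returns 1 if this call released the lock (the outermost unlock), else 0. */
int model_unlock(model_lock_t *l, int thread)
{
    assert(l->owner == thread && l->depth > 0);
    if (--l->depth > 0)
        return 0;                 /* inner unlock: the lock stays held */
    l->owner = 0;
    return 1;
}
```

In this model a second thread is refused from the first lock call until the owner's outermost unlock, which is the interval the text calls "simultaneously".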
LiS Read/Write Locks

LiS offers an abstraction of the kernel's read/write locks. The LiS abstractions allow STREAMS drivers to use these locks without concern for changes that occur from one version of the kernel to the next.

A read/write lock is declared as a special data object of type lis_rw_lock_t. There are two types of routines to manipulate these locks. One set operates on the lock as a "read" lock. The other set operates on the lock as a "write" lock. There can be multiple threads owning the lock in read mode. There can only be one thread that owns the lock in write mode. Furthermore, in order to acquire the lock in write mode, all the owners of the read mode lock must give it up.

The locks are used in the obvious way. If you only need to read the protected structure you use the read lock routine. If you need to change the structure you use the write lock routine. Note that once you have a read lock you must give it up in order to get the same lock as a write lock.

The lock manipulation routines also allow for "regular", "irq" and "irqsave" manipulations of the read/write locks, just as with spin locks.

You must initialize your lock before using it, just as with spin locks. And in parallel to spin locks LiS provides two initialization routines. One operates directly on the read/write lock, and the other allocates memory dynamically for the lock. You can deallocate the dynamically allocated lock by calling the "free" routine.

The following is a listing of the read/write lock routines in LiS. The prototypes are in the file <sys/lislocks.h>.
    void lis_rw_read_lock(lis_rw_lock_t *lock) ;
    void lis_rw_write_lock(lis_rw_lock_t *lock) ;
    void lis_rw_read_unlock(lis_rw_lock_t *lock) ;
    void lis_rw_write_unlock(lis_rw_lock_t *lock) ;
    void lis_rw_read_lock_irq(lis_rw_lock_t *lock) ;
    void lis_rw_write_lock_irq(lis_rw_lock_t *lock) ;
    void lis_rw_read_unlock_irq(lis_rw_lock_t *lock) ;
    void lis_rw_write_unlock_irq(lis_rw_lock_t *lock) ;
    void lis_rw_read_lock_irqsave(lis_rw_lock_t *lock, int *flags) ;
    void lis_rw_write_lock_irqsave(lis_rw_lock_t *lock, int *flags) ;
    void lis_rw_read_unlock_irqrestore(lis_rw_lock_t *lock, int *flags) ;
    void lis_rw_write_unlock_irqrestore(lis_rw_lock_t *lock, int *flags) ;
    void lis_rw_lock_init(lis_rw_lock_t *lock, const char *name) ;
    lis_rw_lock_t *lis_rw_lock_alloc(const char *name) ;
    lis_rw_lock_t *lis_rw_lock_free(lis_rw_lock_t *lock, const char *name) ;
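A minimal sketch of the read-versus-write choice might look as follows. The four lis_rw_* routines are user-space stand-ins (assumptions) that check the ownership rules with assertions instead of spinning; in a driver they come from <sys/lislocks.h> and the lock must be initialized first. The names cfg_lock, cfg_mtu, get_mtu and set_mtu are hypothetical.

```c
#include <assert.h>

/* User-space stand-ins (assumptions) for the LiS read/write lock routines.
 * They assert the ownership rules rather than spin, so they only suit a
 * single-threaded illustration. */
typedef struct { int readers; int writer; } lis_rw_lock_t;

static void lis_rw_read_lock(lis_rw_lock_t *l)
{
    assert(!l->writer);                 /* no reader while a writer owns it */
    l->readers++;                       /* many readers may own it at once */
}

static void lis_rw_read_unlock(lis_rw_lock_t *l)
{
    assert(l->readers > 0);
    l->readers--;
}

static void lis_rw_write_lock(lis_rw_lock_t *l)
{
    assert(!l->writer && l->readers == 0);  /* all readers must have left */
    l->writer = 1;
}

static void lis_rw_write_unlock(lis_rw_lock_t *l)
{
    assert(l->writer);
    l->writer = 0;
}

static lis_rw_lock_t cfg_lock;          /* hypothetical lock guarding cfg_mtu */
static int cfg_mtu = 1500;              /* hypothetical protected setting */

/* Readers only inspect the data, so they take the lock in read mode. */
int get_mtu(void)
{
    int mtu;

    lis_rw_read_lock(&cfg_lock);
    mtu = cfg_mtu;
    lis_rw_read_unlock(&cfg_lock);
    return mtu;
}

/* Writers change the data, so they take the lock in write mode. */
void set_mtu(int mtu)
{
    lis_rw_write_lock(&cfg_lock);
    cfg_mtu = mtu;
    lis_rw_write_unlock(&cfg_lock);
}
```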
LiS Interrupt Enable/Disable

LiS provides primitives for enabling and disabling interrupts modelled after the SVR4 SPL mechanism. There is one routine that is used to disable interrupts and another one for enabling interrupts. The routines are as follows:

    int lis_splstr(void) ;
    void lis_splx(int x) ;

The lis_splstr routine is used to disable interrupts. It returns a value that must be passed to lis_splx when it is desired to restore the interrupt level to its previous state. These two routines are implemented using the primitives lis_spin_lock_irqsave and lis_spin_unlock_irqrestore.

These routines can be used from background code ("put" and "service" procedures, or "open" and "close" routines), or from interrupt level. LiS itself uses these routines to protect STREAMS structures from ill-timed modification by interrupt routines. Many LiS utility routines, such as putq, getq and qenable, call these routines within themselves.

It is safe, and occurs frequently, to use these routines in a nested fashion. When using these routines in a nested fashion be sure that the value returned by the call to lis_splstr at level n is the value passed back to lis_splx at level n. The nesting rules for these routines are otherwise the same as for the pair lis_spin_lock_irqsave and lis_spin_unlock_irqrestore.

Usage examples:

    int x, y ;

    x = lis_splstr() ;
    ...
    y = lis_splstr() ;
    ...
    lis_splx(y) ;
    ...
    lis_splx(x) ;

For further information on these routines see the section on debugging spin locks.
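The level-matching rule for nested calls can be illustrated with a helper that protects itself and is then called from a routine that has already disabled interrupts. The two lis_spl* routines below are user-space stand-ins (assumptions) that model the interrupt level as a single integer; the names irq_level, queue_len, bump_queue and nested_demo are hypothetical.

```c
#include <assert.h>

/* Stand-ins (assumptions): 1 = interrupts enabled, 0 = disabled. */
static int irq_level = 1;

static int lis_splstr(void)
{
    int old = irq_level;   /* the value the caller must hand back to lis_splx */
    irq_level = 0;
    return old;
}

static void lis_splx(int x)
{
    assert(x == 0 || x == 1);
    irq_level = x;
}

static int queue_len;      /* hypothetical data shared with an interrupt routine */

/* Inner helper: protects itself, so it works with or without a caller that
 * has already disabled interrupts. Returns the level seen after its splx. */
static int bump_queue(void)
{
    int y = lis_splstr();

    queue_len++;
    lis_splx(y);           /* restores the caller's level, not "enabled" */
    return irq_level;
}

/* Outer routine: nests the helper's pair inside its own.
 * Returns 1 if interrupts stayed off inside and came back on afterwards. */
int nested_demo(void)
{
    int x = lis_splstr();
    int inside = bump_queue();

    lis_splx(x);
    return inside == 0 && irq_level == 1;
}
```

Passing each level's own return value back to lis_splx is what keeps the inner pair from re-enabling interrupts too early.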
LiS Semaphores

LiS provides an implementation of semaphores that is built upon the Linux kernel's semaphores. The LiS implementation adds features to the kernel semaphores such as the following:
For these reasons I highly recommend that STREAMS drivers use the LiS semaphore implementation in place of the direct kernel semaphores. The portability aspect of LiS semaphores cannot be overemphasized. Different Linux kernel compile-time options can lead to a proliferation of STREAMS driver code versions, or the necessity of always compiling the driver from source when it is installed. LiS semaphores allow a STREAMS driver to be compiled independently of kernel options with only the binary needed at driver installation time. To declare an LiS semaphore, use a declaration similar to the following: lis_semaphore_t mysem ; LiS semaphores must be initialized before they are used. Use the following routine to initialize a declared semaphore. void lis_sem_init(lis_semaphore_t *,int); If you initialize the semaphore to 0, then the first "down" operation on the semaphore will wait. If you initialize it to 1, then the first "down" operation will not wait. If you initialize it to n, then the first n "down" operations will not wait. You can also allocate semaphores dynamically using the following routine. lis_semaphore_t *lis_sem_alloc(int); This routine uses the kernel's memory allocator to allocate space for the semaphore. The lis_sem_destroy routine will deallocate it for you. The advantage of using this routine is that your STREAMS driver only has to have a pointer to the semaphore, not a semaphore structure itself. This adds an extra level of protection of your driver from kernel version considerations. You can use the semaphore value to manage a pool of resources by initializing a semaphore to the number of items in the resource and having a driver open routine perform a "down" operation on the semaphore. This causes the open operations to be queued until the resource is available. LiS semaphores should be explicitly destroyed when they are no longer needed, typically from your STREAMS driver close routine. This operation is accomplished via the following routine. 
lis_semaphore_t *lis_sem_destroy(lis_semaphore_t *,int); This routine returns a NULL pointer for the convenience of the caller. For further information on semaphores, see the section on debugging semaphores.
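The effect of the initial count, and its use for managing a pool of resources, can be sketched with a small counting model. This is not the LiS semaphore structure; model_sem_t and its routines are hypothetical, and the "try" routine reports whether a "down" would wait instead of actually sleeping.

```c
#include <assert.h>

/* Hypothetical model of a counting semaphore; not the LiS structure. */
typedef struct { int count; } model_sem_t;

/* An initial value of n lets the first n "down" operations proceed. */
void model_sem_init(model_sem_t *s, int n)
{
    assert(n >= 0);
    s->count = n;
}

/* Returns 1 if a "down" would proceed at once, 0 if it would have to wait. */
int model_try_down(model_sem_t *s)
{
    if (s->count <= 0)
        return 0;
    s->count--;
    return 1;
}

/* Releases one unit, allowing one more "down" to proceed. */
void model_up(model_sem_t *s)
{
    s->count++;
}
```

With the count initialized to the number of items in a pool, each open consumes one unit and each close returns one, so an open beyond the pool size is the first one that waits.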
The following two routines are used to acquire and release a semaphore.

    int lis_down(lis_semaphore_t *sem) ;
    void lis_up(lis_semaphore_t *sem) ;

The routine lis_down returns 0 for success and a negative error code for failure. The caller has not acquired the semaphore unless the routine returns zero. One reason for a negative return could be that the calling task was signalled while waiting for the semaphore to become available. If this has occurred the return code will be set to -EINTR.

Semaphores cannot be used in nested fashion. Care must be exercised that a single thread only performs one "down" operation on a given semaphore.

When multiple threads use multiple semaphores to protect multiple resources, it is always a good idea if all threads execute "down" operations on the multiple semaphores in the same order. It is also highly recommended that they execute "up" operations in the exact reverse order as the "down" operations. This avoids so-called "deadly embrace" situations in which process A acquires semaphore A, process B acquires semaphore B, and then process A waits on B while process B waits on A.

Semaphores should be used only in STREAMS driver "open" and "close" routines. STREAMS driver "put" and "service" procedures are not allowed to sleep. They should use spin locks instead of semaphores.

Usage example:

    if (lis_down(&mysem) == 0)
    {
        ...
        lis_up(&mysem) ;
    }

Debugging Spin Locks

LiS spin lock structures contain fields that assist in the debugging of spin-lock related problems. The LiS spin lock structure contains the following fields.
If a thread owns the lock then its value of the current task pointer will be in taskp. If there is no other thread spinning on the lock, and if the lock has not been acquired in a nested fashion, then the spinner and owner fields will indicate the same file and line number. If the spinner and owner fields are different, taskp is non-NULL, and the thread that most recently called one of the lis_spin_lock routines is not the task that owns the lock, then that other thread is spinning on the lock.

By examination of the lock you can see which task owns the lock and where in the code it was acquired. This is often enough information to figure out why a deadlock is occurring.

A "deadly embrace" occurs when two threads each need to acquire two spin locks but they acquire them in the opposite order from each other. Under circumstances of contention each thread owns the lock that the other is spinning on and will not release the lock until it acquires the other lock. Thus, both threads spin forever.

Note that the LiS splstr and splx functions are written in terms of LiS spin locks. LiS does not use these routines internally. They are provided to the user for backwards compatibility. However, it is important to know that these routines are spin locks in disguise. This means that the order of use of these functions mixed in with explicit spin lock manipulations may also lead to deadly embraces.

An effective technique for troubleshooting these kinds of problems is to use the two-machine kernel debugger, kgdb. With this setup you can break into the target machine and look at memory using high level debugging techniques, including printing out of structures. Using kgdb you can find out where each CPU is executing, look at the corresponding source code lines, observe the locks that are involved, and then print out the lis_spin_lock_t structures for the specific locks.
Oftentimes the information contained in the two locks will immediately reveal the nature of the deadly embrace. It is also possible to have LiS trace all lock and semaphore operations. One of the LiS debug bits enables this function. To set this debug bit use the following command.
This causes LiS to make entries in a global trace buffer named lis_spl_track. The global pointer lis_spl_track_ptr indicates the next location in the table into which an entry is to be placed, which means that it points to the oldest entry in the buffer. Entries in the buffer are of type spl_track_t. The fields of this structure are as follows.
The trace buffer contains 4096 of these entries, maintained in a circular fashion. By printing out these entries you can see the history of lock manipulation within LiS. The command streams -p causes LiS to print out this table from within the kernel. The resulting output can be found in /var/log/messages (typically). However, in practice the system is usually hung when you need this information so you end up printing it from within the debugger.

Debugging Semaphores

LiS semaphore structures contain fields that assist in the debugging of semaphore related problems. The LiS semaphore structure contains the following fields.
If the taskp field is non-NULL then the semaphore is owned by the task so indicated. If it is NULL then the semaphore is unowned. The upper fields show where the semaphore was last released. If the downer and owner fields both indicate the same file and line number then that is an indication that the semaphore was acquired at that location in the program. If they are different, and if the taskp is non-NULL, that is an indication that there is a task waiting on the semaphore at the downer location. The owner fields show where the semaphore was acquired. Bear in mind that semaphore acquisitions do not nest as is the case with spin locks. Therefore, if the same thread calls lis_down without calling lis_up on the same semaphore then the thread will be deadlocked. The downer and owner fields will usually offer a clue to this type of deadlock. You can also use the LiS lock trace buffer mechanism to assist in debugging semaphore usage.
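When reading the lock trace buffer from a debugger it helps to remember where the oldest entry lives. The following user-space model of a 4096-entry circular buffer shows why the next-entry pointer also marks the oldest entry once the buffer has wrapped. The entry layout is hypothetical (the real spl_track_t fields are not reproduced here), as are the names track, track_ptr, track_add and track_oldest_seq.

```c
#include <assert.h>

#define TRACK_SIZE 4096                    /* entry count stated in the text */

/* Hypothetical entry layout, reduced to a sequence number. */
typedef struct { int seq; } track_entry_t;

static track_entry_t track[TRACK_SIZE];
static track_entry_t *track_ptr = track;   /* next slot to be written */
static long track_total;                   /* entries recorded so far */

/* Records one entry and advances the pointer, wrapping at the end. */
void track_add(int seq)
{
    track_ptr->seq = seq;
    if (++track_ptr == track + TRACK_SIZE)
        track_ptr = track;
    track_total++;
}

/* Returns the sequence number of the oldest entry still in the buffer. */
int track_oldest_seq(void)
{
    assert(track_total > 0);
    if (track_total < TRACK_SIZE)
        return track[0].seq;               /* not yet wrapped: slot 0 is oldest */
    return track_ptr->seq;                 /* wrapped: next slot is the oldest */
}
```

Reading forward from the pointer, wrapping at the end of the table, therefore yields the entries in chronological order.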
STREAMS Utility Routines

The following routines are available to LiS STREAMS drivers. These are standard AT&T SVR4 utility routines. They (hopefully) have the same semantics in LiS as they do in SVR4 STREAMS. These routines are presented here in alphabetical order with no description. Please refer to the AT&T SVR4 STREAMS documentation for the descriptions of these routines.

Flushing Queue Bands

A special note on flushing queue bands is in order. The rules for flushing queues are a bit complex, so we wish to review them here in some detail.

First some definitions and some things that affect all queue flushing.

The term "data message" in the context of queue flushing means messages of type M_DATA, M_PROTO, M_PCPROTO or M_DELAY. All other message types are considered "non-data messages". You may find it less than intuitive that M_PCPROTO is considered a "data message".

The term "ordinary message" in the context of queue flushing means messages of type M_DATA, M_PROTO, M_BREAK, M_CTL, M_DELAY, M_IOCTL, M_PASSFP, M_RSE, M_SETOPTS or M_SIG. Please note that M_PCPROTO is not on this list.

The flag argument of FLUSHDATA means that only "data messages" are to be flushed. The flag argument of FLUSHALL means that "all" messages are to be flushed. As we shall see, in flushing queue bands whether a message gets flushed or not depends upon what the meaning of the word "all" is.

First, let's take the case of the routine flushq(q,flag). If flag is set to FLUSHDATA then all "data messages" in the entire queue, including all queue bands, are flushed. If the flag is set to FLUSHALL then the entire queue is flushed.

The case of the routine flushband(q,band,flag) is more complicated. If the band argument is zero then special rules apply. In this case, only "ordinary" messages are flushed from the queue. The value of the flag parameter does not influence the operation. In Solaris STREAMS this behavior does not occur. They flush either "data messages" or "all" messages on band zero.
Comments in the Solaris 8 source code indicate that the author of the flush code was somewhat confused on this point.

If the band argument is non-zero then the specific band of the queue is flushed in a manner similar to that of flushq. That is, the flag argument of FLUSHDATA means just flush "data messages" and the value of FLUSHALL means flush "all" messages from the specific band.

One further item needs some attention. Whenever an M_PCPROTO (or other "high priority") message is inserted into a STREAMS queue it is queued ahead of all messages in any queue band. This means that an M_PCPROTO cannot be directed to a queue band. It also means that flushband can never flush an M_PCPROTO, or any other "high priority" message from the queue. In order to flush M_PCPROTOs you must call flushq and flush the entire queue of either "data messages" or "all" messages.

Utility Prototypes

A-D

    int adjmsg(mblk_t *mp, int length);
    struct msgb *allocb(int size, unsigned int priority);
    queue_t *backq(queue_t *q);
    int bcanput(queue_t *q, unsigned char band);
    int bcanputnext(queue_t *q, unsigned char band);
    void bcopy(void *src, void *dst, int nbytes) ;
    int bufcall(unsigned size, int priority, void (*function)(long), long arg);
    void bzero(void *addr, int nbytes) ;
    int canput(queue_t *q);
    int canputnext(queue_t *q);
    void cmn_err(int err_lvl, char *fmt, ...) ;
    mblk_t *copyb(mblk_t *mp);
    mblk_t *copymsg(mblk_t *mp);
    #define datamsg(type) -- true if msg->b_datap->db_type is data
    mblk_t *dupb(mblk_t *mp);
    mblk_t *dupmsg(mblk_t *mp);

E-K

    void enableok(queue_t *q);
    mblk_t *esballoc(unsigned char *base, int size, int priority, frtn_t *freeinfo);
    int esbbcall(int priority, void (*function)(long), long arg);
    void flushband(queue_t *q, unsigned char band, int flag);
    void flushq(queue_t *q, int flag);
    void freeb(mblk_t *bp);
    void freemsg(mblk_t *mp);
    int getmajor(dev_t dev) ;
    int getminor(dev_t dev) ;
    mblk_t *getq(queue_t *q);
    int insq(queue_t *q, mblk_t *emp, mblk_t *mp);
    void *kmem_alloc(int siz, int wait_code);
    void *kmem_zalloc(int siz, int wait_code);
    void kmem_free(void *ptr,int siz);

L-P

    void linkb(mblk_t *mp1, mblk_t *mp2);
    int msgdsize(mblk_t *mp);
    mblk_t *msgpullup(mblk_t *mp, int length);
    int msgsize(mblk_t *mp);
    void noenable(queue_t *q);
    queue_t *OTHERQ(queue_t *q);
    int pullupmsg(mblk_t *mp, int length);
    int putbq(queue_t *q, mblk_t *mp);
    int putctl(queue_t *q, int type);
    int putctl1(queue_t *q, int type, int param);
    void putnext(queue_t *q, mblk_t *mp);
    int putnextctl(queue_t *q, int type);
    int putnextctl1(queue_t *q, int type, int param);
    int putq(queue_t *q, mblk_t *mp);

Q-S

    void qenable(queue_t *q);
    void qreply(queue_t *q, mblk_t *mp);
    int qsize(queue_t *q);
    void qprocsoff(queue_t *rdq) ;
    void qprocson(queue_t *rdq) ;
    queue_t *RD(queue_t *q);
    queue_t *WR(queue_t *q);
    queue_t *OTHERQ(queue_t *q);
    mblk_t *rmvb(mblk_t *mp, mblk_t *bp);
    void rmvq(queue_t *q, mblk_t *mp);
    int SAMESTR(queue_t *q);
    int strqget(queue_t *q, qfields_t what, unsigned char band, long *val);
    int strqset(queue_t *q, qfields_t what, unsigned char band, long val);

T-Z

    int testb(int size, unsigned int priority);
    #define HZ -- ticks per second
    typedef void timo_fcn_t(caddr_t arg) ;
    toid_t timeout(timo_fcn_t *timo_fcn, caddr_t arg, long ticks);
    toid_t lis_untimeout(toid_t id) ;
    void unbufcall(int bcid);
    mblk_t *unlinkb(mblk_t *mp);
    int untimeout(int id) ;
    queue_t *WR(queue_t *q);
    int xmsgsize(mblk_t *mp);
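The flushing rules given under Flushing Queue Bands can be summarized as two decision functions. This is a reading of the prose, not LiS source code: the message classes and flag names are hypothetical stand-ins for the real M_* and FLUSH* values, and the sketch assumes that a band-zero flushband call only touches messages queued in band zero.

```c
#include <assert.h>

/* Hypothetical message classes; a driver would test db_type against M_*. */
enum msg_class {
    CLS_DATA_ORDINARY,   /* M_DATA, M_PROTO, M_DELAY: "data" and "ordinary" */
    CLS_PCPROTO,         /* M_PCPROTO: "data" but high priority */
    CLS_CTL_ORDINARY,    /* e.g. M_CTL, M_IOCTL, M_SIG: "ordinary", not "data" */
    CLS_OTHER_HIPRI      /* any other high priority message */
};

/* Hypothetical stand-ins for FLUSHDATA and FLUSHALL. */
enum flush_flag { MODEL_FLUSHDATA, MODEL_FLUSHALL };

static int is_data(enum msg_class c)
{
    return c == CLS_DATA_ORDINARY || c == CLS_PCPROTO;
}

static int is_ordinary(enum msg_class c)
{
    return c == CLS_DATA_ORDINARY || c == CLS_CTL_ORDINARY;
}

static int is_hipri(enum msg_class c)
{
    return c == CLS_PCPROTO || c == CLS_OTHER_HIPRI;
}

/* flushq(q, flag): would a message of class c be flushed? */
int flushq_hits(enum msg_class c, enum flush_flag flag)
{
    return flag == MODEL_FLUSHALL || is_data(c);
}

/* flushband(q, band, flag): would a message of class c, queued in band
 * msg_band, be flushed? High priority messages sit ahead of every band. */
int flushband_hits(enum msg_class c, int msg_band, int band,
                   enum flush_flag flag)
{
    assert(band >= 0 && msg_band >= 0);
    if (is_hipri(c) || msg_band != band)
        return 0;                       /* flushband never reaches these */
    if (band == 0)
        return is_ordinary(c);          /* the flag is ignored for band 0 */
    return flag == MODEL_FLUSHALL || is_data(c);
}
```

Under this reading the only way for an M_PCPROTO to be flushed is through flushq_hits, which matches the closing remark of the flushing discussion.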
System Calls from within the Kernel

LiS provides STREAMS drivers with a few system calls that can be made from within the kernel. These calls are intended to allow STREAMS drivers to manage their device special files through which the drivers are accessed. For example, by using the lis_mknod function a dynamically loaded driver can register itself with LiS, obtain a major device number and make its "/dev" entries at module load time. Using the lis_unlink function it can remove these "/dev" entries when the module unloads.

The semantics of the following routines are exactly the same as the user level routines of the same names without the "lis_" prefix. This is so because these routines are really just wrappers on a kernel system call. We list the function prototypes here but leave the detailed documentation to "man pages" and other documentation.

The following function prototypes exist in the file <sys/dki.h>.

    int lis_mknod(char *name, int mode, dev_t dev) ;
    int lis_unlink(char *name) ;
    int lis_mount(char *dev_name, char *dir_name, char *fstype, unsigned long rwflag, void *data) ;
    int lis_umount(char *file, int flags) ;