Linux STREAMS (LiS)


LiS Driver/Kernel Interface (DKI)

Contents

Introduction

Operating System Interface Routines
    PCI BIOS Interface
    PCI Interface
    IRQ Interface
    I/O Memory Mapping
    I/O Port Access
    Memory Allocation
    DMA Routines
    Delay Routines
    Printing Routines
    Timer Routines
    Sleep and Wakeup Routines
    Thread Creation

LiS Memory Allocation
    LiS "malloc" and "free" Equivalents
    LiS Kernel Memory Allocators
    LiS Page Allocator

LiS PCI Interface
    The LiS PCI Device Structure
    LiS PCI Search Routines
    LiS PCI Configuration Space Routines

LiS Atomic Functions

LiS Locks
    LiS Spin Locks
    Lock Nesting
    LiS Interrupt Enable/Disable
    LiS Semaphores
    Debugging Spin Locks
    Debugging Semaphores

STREAMS Utility Routines
    Flushing Queue Bands


Introduction


Linux STREAMS (LiS) provides for an interface between STREAMS drivers and the surrounding kernel environment.  This interface has grown over time and is likely to expand in the future.

In the Linux kernel, much of the interface between drivers (and other kernel modules) and the core kernel services, such as memory allocation and synchronization primitives, is implemented as macros and inline functions declared in kernel header files.  This technique was probably chosen for efficiency (defined as execution speed), on the assumption that such constructs pose no version problems because one can always recompile one's drivers in the context of the new kernel.  The only compatibility of "kernel primitives" that has been attempted from one kernel release to the next is source code compatibility.

The real world of paying customers is quite different.  And, as it happens, the world of paying customers seems to impinge upon LiS considerably.

In this world, the customers do not want to rebuild the kernel.  They don't want to build the kernel at all.  They want to install a distribution with a binary kernel that was configured only at install time.  They then want to install add-on binary packages, and they expect these packages to operate correctly with their kernel.

When these add-on packages consist of STREAMS based protocol drivers, LiS is usually the only piece of code that is recompiled from source upon installation into the customer's environment.  The STREAMS drivers themselves are typically distributed in binary and linked in with LiS.  The resulting module is then typically loaded using "modprobe" or some equivalent command.

In these circumstances it is highly desirable for LiS to "buffer" the interface between the STREAMS drivers and the kernel environment.  This allows the STREAMS driver writers to deliver smaller binary packages to their customers and minimizes the number of different versions of those packages that must be maintained by the STREAMS driver writers.  Ideally, LiS would be able to present a uniform DKI that would support one version of a user's STREAMS driver across all versions of the Linux kernel.

This ultimate goal is probably not achievable, but it is possible to insulate STREAMS drivers from the Linux kernel to a considerable extent.  This is possible in part due to the implied DKI of a STREAMS driver.  A STREAMS driver most likely will confine itself to the SVR4 types of DKI calls which have syntax and semantics that do not change over time.  The main challenges come from the use of constructs, such as PCI configuration and interrupt service routines, that go outside the SVR4 DKI and must use services of the Linux kernel more-or-less directly.

In general, LiS attempts to replace inline functions and macros with actual subroutine calls to perform kernel operations.  This allows the STREAMS driver to be compiled once with references to these routines, with the routines themselves being compiled in the context of the specific kernel version at package installation time.  Thus, the STREAMS drivers do not have to be sensitive to differences in kernel versions.

Back to Contents


Operating System Interface Routines


In the file <sys/osif.h>, LiS provides insulation routines for a number of commonly used kernel functions.  These functions are used with their Linux kernel names, but those names are redefined in <sys/osif.h> to be subroutine calls on functions that are actually defined in the file osif.c within LiS.  The osif.c file is compiled at LiS installation time and is sensitive to kernel version information.

In order to use this interface, you include the header files that you would normally include to use the kernel functions, and then include <sys/osif.h> after all of the kernel include files.  This allows for the redefinition of the names.
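The redefinition technique can be sketched in ordinary user-space C.  All of the names below (ex_kernel_delay, ex_lis_delay, ex_wrapper_calls, ex_driver_wait) are hypothetical stand-ins and are not the actual contents of <sys/osif.h> or osif.c; the sketch only illustrates how a define placed after the kernel declarations reroutes driver calls through a wrapper subroutine.

```c
/* Stand-in for a kernel inline whose body may change between kernel
 * versions (hypothetical; not a real kernel function). */
static inline long ex_kernel_delay(long usecs)
{
    return usecs;                    /* pretend we waited this long */
}

/* Counts calls that pass through the wrapper, for demonstration only. */
static int ex_wrapper_calls = 0;

/* Stand-in for a wrapper that would be compiled inside LiS against the
 * installed kernel's headers (hypothetical name). */
long ex_lis_delay(long usecs)
{
    ex_wrapper_calls++;
    return ex_kernel_delay(usecs);   /* the only place the inline expands */
}

/* Stand-in for the redefinition in the insulating header.  It must come
 * after the "kernel" declarations above so that it overrides the name. */
#define ex_kernel_delay ex_lis_delay

/* Driver code keeps using the kernel name, but after preprocessing it
 * contains only a call to the wrapper. */
long ex_driver_wait(void)
{
    return ex_kernel_delay(50);
}
```

This is why the include order matters: if the redefining header came first, the kernel headers would themselves be rewritten by the define.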

The kernel functions provided via <sys/osif.h> are as follows, grouped by type of function.


PCI BIOS Interface


These are routines that utilize or simulate the original PCI BIOS interface of the 2.0 series of kernels.  The names of these routines are changed via defines.  Use them as if the prototypes were as follows.  You can use these routines on 2.2 kernels even though they represent the 2.0 style of interface.
 

#if LINUX_VERSION_CODE < 0x020100                        /* 2.0 kernel */ 
unsigned long pcibios_init(unsigned long memory_start, 
                          unsigned long memory_end) ; 
#else                                                    /* 2.1 or 2.2 kernel */ 
void pcibios_init(void) ; 
#endif 
int pcibios_find_class(unsigned int   class_code, 
                      unsigned short index, 
                      unsigned char *bus, 
                      unsigned char *dev_fn) ; 
int pcibios_find_device(unsigned short vendor, 
                       unsigned short dev_id, 
                       unsigned short index, 
                       unsigned char *bus, 
                       unsigned char *dev_fn) ; 
int pcibios_read_config_byte(unsigned char  bus, 
                            unsigned char  dev_fn, 
                            unsigned char  where, 
                            unsigned char *val) ; 
int pcibios_read_config_word(unsigned char   bus, 
                             unsigned char   dev_fn, 
                             unsigned char   where, 
                             unsigned short *val) ; 
int pcibios_read_config_dword(unsigned char  bus, 
                             unsigned char  dev_fn, 
                             unsigned char  where, 
                             unsigned int  *val) ; 
int pcibios_write_config_byte(unsigned char  bus, 
                             unsigned char  dev_fn, 
                             unsigned char  where, 
                             unsigned char  val) ; 
int pcibios_write_config_word(unsigned char   bus,          
                             unsigned char   dev_fn, 
                             unsigned char   where, 
                             unsigned short  val) ; 
int pcibios_write_config_dword(unsigned char  bus, 
                              unsigned char  dev_fn, 
                              unsigned char  where, 
                              unsigned int   val) ; 
const char *pcibios_strerror(int error) ;
Back to Contents


PCI Interface

These routines constitute the PCI interface as implemented in the 2.2 series of kernels.  Please note that these are filtered calls to the operating system and still depend directly upon the kernel structure "struct pci_dev".  LiS provides a more abstract interface to PCI that does not depend upon the direct definition of kernel structures.  The LiS PCI interface is to be preferred since it provides more insulation against changes in the kernel.
 

struct pci_dev  *pci_find_device(unsigned int vendor,          
                                unsigned int device, 
                                struct pci_dev *from); 
struct pci_dev  *pci_find_class(unsigned int class, struct pci_dev *from); 
struct pci_dev  *pci_find_slot(unsigned int bus, unsigned int devfn); 
int     pci_read_config_byte(struct pci_dev *dev, u8 where, u8 *val); 
int     pci_read_config_word(struct pci_dev *dev, u8 where, u16 *val); 
int     pci_read_config_dword(struct pci_dev *dev, u8 where, u32 *val); 
int     pci_write_config_byte(struct pci_dev *dev, u8 where, u8 val); 
int     pci_write_config_word(struct pci_dev *dev, u8 where, u16 val); 
int     pci_write_config_dword(struct pci_dev *dev, u8 where, u32 val); 
void    pci_set_master(struct pci_dev *dev);
Back to Contents


IRQ Interface

These are the routines that are used to attach and detach interrupt service routines to hardware interrupts.

int  request_irq(unsigned int  irq, 
                void        (*handler)(int, void *, void *), 
                unsigned long flags, 
                const char   *device, 
                void         *dev_id) ; 
void free_irq(unsigned int irq, void *dev_id) ; 
void disable_irq(unsigned int irq) ; 
void enable_irq(unsigned int irq) ;
Back to Contents


I/O Memory Mapping

These are the routines that are typically used to map PCI bus or physical addresses to CPU virtual addresses.  LiS includes some backward compatibility here to older kernel versions.

void         *ioremap_nocache(unsigned long offset, unsigned long size) ; 
void          iounmap(void *addr) ; 
void         *vremap(unsigned long offset, unsigned long size) ; 
unsigned long virt_to_phys(volatile void *addr) ; 
void         *phys_to_virt(unsigned long addr) ;
Back to Contents


I/O Port Access

These are the routines that allow a driver to register I/O ports.

int  check_region(unsigned int from, unsigned int extent) ; 
void request_region(unsigned int from, 
                   unsigned int extent, 
                   const char  *name) ; 
void release_region(unsigned int from, unsigned int extent) ;
Back to Contents


Memory Allocation


These are the kernel routines that can be used to allocate memory.  LiS also has a more insulated abstraction for kernel memory allocation.  It is recommended that you use the LiS memory allocator versions rather than the direct kernel versions.

void *kmalloc(size_t nbytes, int type) ; 
void  kfree(const void *ptr) ; 
void *vmalloc(unsigned long size); 
void  vfree(void *ptr) ;
Back to Contents


DMA Routines


These are the routines that are used to allocate a main-board old-style DMA channel for use by your driver.  These are not much used anymore.

int  request_dma(unsigned int dma_nr, const char *device_id)          ; 
void free_dma(unsigned int dma_nr) ;
Back to Contents


Delay Routines

This is the routine that simply spins the CPU for a given number of microseconds.  LiS also redefines the symbol "jiffies" to a subroutine call to help insulate STREAMS drivers from changes in the way the kernel keeps track of time.  Remember, the redefinition is accomplished using C language defines, so the following declarations describe the effective usage of these symbols, not their literal definition.

void udelay(long micro_secs) ; 
unsigned long jiffies ;
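
As a minimal sketch of how a symbol that looks like a variable can be turned into a subroutine call by a define, consider the following user-space fragment.  The names ex_ticks, ex_get_jiffies, ex_jiffies and ex_deadline_passed are hypothetical and do not appear in LiS; the real define and accessor may differ.

```c
/* Hypothetical tick counter kept privately by the insulating layer. */
static unsigned long ex_ticks = 1000;

/* Hypothetical accessor; the real routine would read the kernel's counter
 * using whatever method the installed kernel requires. */
unsigned long ex_get_jiffies(void)
{
    return ex_ticks;
}

/* Stand-in for the define: the bare name expands to a subroutine call. */
#define ex_jiffies ex_get_jiffies()

/* Driver code reads the symbol as if it were a variable. */
int ex_deadline_passed(unsigned long deadline)
{
    return (long)(ex_jiffies - deadline) >= 0;   /* wrap-safe comparison */
}
```

One consequence of this style of redefinition is that the symbol can only be read; driver code cannot assign to it or take its address.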
Back to Contents


Printing Routines


These are the most commonly used printf-like routines in the kernel.  STREAMS drivers would be more portable if they used the cmn_err routine instead of printk.

int printk(const char *fmt, ...) ;
int sprintf(char *bfr, const char *fmt, ...) ;
int vsprintf(char *bfr, const char *fmt, va_list args) ; 

  Back to Contents


Timer Routines


These are the routines that start and stop kernel timers.  STREAMS drivers would be more portable if they used the standard "timeout" routine.

void add_timer(struct timer_list * timer); 
int  del_timer(struct timer_list * timer);

The following routine is an LiS abstraction of the C library routine gettimeofday. Note the absence of the time zone parameter.

void lis_gettimeofday(struct timeval *tv); 

Back to Contents


Sleep and Wakeup Routines


These are the kernel routines for sleeping using wait queues.  STREAMS drivers should not be using these since only "open" and "close" routines are allowed to sleep, and for those cases, LiS semaphores would provide better insulation from the kernel.  STREAMS "put" and "service" routines should use LiS spin locks for mutual exclusion.

void sleep_on(OSIF_WAIT_Q_ARG) ; 
void interruptible_sleep_on(OSIF_WAIT_Q_ARG) ; 
void wake_up(OSIF_WAIT_Q_ARG) ; 
void wake_up_interruptible(OSIF_WAIT_Q_ARG) ;

Back to Contents

Thread Creation

A STREAMS driver in LiS can create kernel threads if it so chooses. The following routine simplifies this task. It consolidates all of the kernel manipulations involved with the creation of a kernel thread into one place, thus removing references to these kernel functions from STREAMS driver code.

Prototype

int lis_thread_start(int (*fcn)(void *), void *arg, char *name) ;

Arguments

 
fcn

The function that is to be used as the entry point for the thread.

 
arg

The argument passed to the function.

 
name

An ASCII name associated with the thread. This name should be less than 16 characters in length. It will be the name of the thread that displays in a ps listing.

Operation

lis_thread_start creates a new thread, performs some operations prior to entering the fcn, and then calls fcn which acts as the "main" routine for the thread. The arg parameter is passed to fcn.

Before fcn is entered, the newly created thread will have shed all user space files and mapped memory. Thus, it is a kernel-only thread.

All signals are still enabled. Note that when the kernel goes down for reboot all processes are first sent a SIGTERM. Once those have been processed, all processes are then sent a SIGKILL. It is the implementor's choice which of these it pays attention to in order to exit prior to a reboot.

The fcn is entered with the "big kernel lock" NOT held, just as it would be for calling the "kernel_thread" function directly. On 2.2 kernels, the fcn should get this lock so that it can utilize kernel services safely.

The user's fcn returns a value when it exits and that value is returned to the kernel. It is not clear that anything actually pays any attention to this returned value. In particular, it is not visible to the thread that started the new thread.

Back to Contents


LiS Memory Allocation

LiS provides for several different styles of memory allocation, all of them insulated from the Linux kernel.  These routines allow your driver to allocate memory in several different ways while still maintaining compatibility with different versions of the Linux kernel, with no driver recompilation required.

To use the LiS memory allocation routines include the file <sys/lismem.h> in your STREAMS driver source code.
 

LiS "malloc" and "free" Equivalents

The first group of memory allocation routines are the routines that play the role of "malloc" and "free."  These routines keep a master linked list of all allocated memory areas.  This list can be printed out via an ioctl to LiS.  Each allocated area is tagged with the file name and line number of the code that caused it to be allocated.  Each area contains a guard word at the front and back to enable the allocator to detect "off by one" accesses outside the allocated area.

LiS uses this allocator internally for allocating queues, messages and other internal data structures.  This would be the allocator of choice for STREAMS drivers to use to allocate instance structures.

Memory allocated in this manner is ultimately allocated by the kernel routine "kmalloc".  As such, it is not guaranteed to be DMA-able (in the old style), or to occupy physically contiguous memory locations.  See below for routines that can be used to allocate these types of memory areas.

The routines are as follows:

void *ALLOC(int nbytes) ; 
void *ALLOCF(int nbytes, char *tag) ; 
void  FREE(void *ptr) ;

The ALLOC and FREE routines are analogous to "malloc" and "free".  The ALLOCF routine includes a character string which is prepended to the file name stored as the location from which the allocation occurred.  It can serve as a tag for the type of memory being allocated.

Usage examples:

ptr = ALLOC(456) ; 
FREE(ptr) ; 
ptr = ALLOCF(578, "Instance: ") ; 
FREE(ptr) ;
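
The bookkeeping described above (guard words, allocation-site tags and a master list) can be illustrated with a small user-space sketch built on the C library's malloc.  This is not the LiS implementation: the names ex_hdr_t, ex_alloc, ex_free, EX_ALLOC and the guard pattern are all hypothetical, and the check on free is only one plausible place to test the guards.

```c
#include <stdlib.h>
#include <string.h>

#define EX_GUARD 0xA5A5A5A5u           /* arbitrary guard pattern */

/* Header placed in front of every allocated area.  The guard is the last
 * member so that it sits directly before the caller's data.  A production
 * allocator would also pad the header to preserve maximum alignment. */
typedef struct ex_hdr {
    struct ex_hdr *next;               /* master list of live areas */
    const char    *file;               /* allocation site: file name */
    size_t         nbytes;             /* size requested by the caller */
    int            line;               /* allocation site: line number */
    unsigned       front_guard;        /* guard word before user data */
} ex_hdr_t;

static ex_hdr_t *ex_live = NULL;       /* head of the master list */

void *ex_alloc(size_t nbytes, const char *file, int line)
{
    unsigned  guard = EX_GUARD;
    ex_hdr_t *h = malloc(sizeof(*h) + nbytes + sizeof(guard));

    if (h == NULL)
        return NULL;
    h->file = file;
    h->line = line;
    h->nbytes = nbytes;
    h->front_guard = EX_GUARD;
    memcpy((char *)(h + 1) + nbytes, &guard, sizeof(guard)); /* back guard */
    h->next = ex_live;                 /* link into the master list */
    ex_live = h;
    return h + 1;
}

/* Returns 0 if both guards are intact, -1 if either was overwritten. */
int ex_free(void *ptr)
{
    ex_hdr_t  *h = (ex_hdr_t *)ptr - 1;
    ex_hdr_t **pp;
    unsigned   back;
    int        rc;

    memcpy(&back, (char *)ptr + h->nbytes, sizeof(back));
    rc = (h->front_guard == EX_GUARD && back == EX_GUARD) ? 0 : -1;
    for (pp = &ex_live; *pp != NULL; pp = &(*pp)->next)
        if (*pp == h) { *pp = h->next; break; }   /* unlink from list */
    free(h);
    return rc;
}

/* Macro form records the caller's file name and line number. */
#define EX_ALLOC(n)  ex_alloc((n), __FILE__, __LINE__)
```

Writing one byte past the end of an 8-byte area lands in the back guard, so the corresponding ex_free call reports the damage.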
Back to Contents



 

LiS Kernel Memory Allocators

These routines use the LiS malloc/free internal routines to allow for more flexibility in the options used when calling the kernel allocator.  These routines all lead to a call on "kmalloc" with appropriate options.  It is worth noting that the numerical value of the constants used in calling the kernel's "kmalloc" routine changed between the 2.2 and 2.4 versions of the kernel.  Thus, drivers which called the kernel's "kmalloc" directly have to be recompiled to run in a 2.4 kernel.  STREAMS drivers using the memory allocation interface defined here could run without modification and without a recompilation on both kernels, assuming that the drivers otherwise did not use any direct kernel functions.

void    *lis_alloc_atomic(int nbytes) ;          
void    *lis_alloc_kernel(int nbytes) ; 
void    *lis_alloc_dma(int nbytes) ; 
void    *lis_free_mem(void *mem_area) ;

These routines pass the allocation options GFP_ATOMIC, GFP_KERNEL, and GFP_DMA, respectively, to "kmalloc" when allocating the memory.  LiS takes care of passing the proper values to the kernel routine so that driver code can remain portable.

The routine lis_free_mem returns a NULL pointer for the convenience of the caller.

Usage Examples:

ptr = lis_alloc_kernel(sizeof(structure)) ; 
ptr = lis_free_mem(ptr) ;                     /* returns NULL pointer */


Back to Contents


LiS Page Allocator

These routines allow a STREAMS driver to allocate memory directly from the kernel's page allocator.  Memory allocated in this manner occupies physically contiguous locations and is suitable for use with bus master DMA PCI devices.

Unlike the kernel's page allocator, the size that is specified when calling the LiS page allocator is in bytes, not "order", or other encoding of page size.  LiS calculates the number of pages based upon the requested size.

Also, LiS does not require you to pass the size of the area when freeing the page.

The routines are as follows:

void    *lis_get_free_pages(int nbytes) ;          
void    *lis_free_pages(void *ptr) ;

The lis_free_pages routine returns a NULL pointer for the convenience of the caller.

Usage Examples:

ptr = lis_get_free_pages(1024*kbytes) ; 
ptr = lis_free_pages(ptr) ;
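
The conversion from a byte count to a page count and then to an "order" might look like the following sketch.  The 4 KB page size and the names ex_pages_for and ex_order_for are assumptions made for illustration; LiS performs its own calculation using the installed kernel's page size.

```c
#include <stddef.h>

#define EX_PAGE_SIZE 4096u   /* assumption: 4 KB pages, as on i386 */

/* Number of whole pages needed to hold nbytes. */
size_t ex_pages_for(size_t nbytes)
{
    return (nbytes + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
}

/* Smallest "order" such that 2^order pages cover nbytes. */
int ex_order_for(size_t nbytes)
{
    size_t pages = ex_pages_for(nbytes);
    int    order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Because the order is a power of two, a request for three pages' worth of bytes occupies four pages; callers sizing large buffers may want to keep that rounding in mind.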


Back to Contents


LiS PCI Interface

In order to assist in the portability of STREAMS drivers across different versions of the Linux kernel, LiS provides an abstraction of the PCI configuration interface.  It defines a data structure that is used to describe a PCI device and a set of routines that perform operations on PCI configuration space.

Using these abstractions, a STREAMS driver can be portable from the 2.2 kernel to the 2.4 kernel with no recompilation required.  The LiS structures completely hide the kernel data structures and PCI configuration space operations from the STREAMS driver.

To use this interface include the file <sys/lispci.h> in your STREAMS driver source code.

Back to Contents


The LiS PCI Device Structure

This structure is distinct from a similar structure which is defined by the Linux kernel, but which differs significantly between the 2.2 and 2.4 kernels.  The LiS version of this structure is oriented towards providing just enough information to allow a driver to operate the PCI device, without being concerned about the details of PCI bus topology.

This structure is used to return information to the STREAMS driver concerning devices that meet certain criteria, such as device class or manufacturer device identification.

#define LIS_PCI_MEM_CNT                  12       /* # mem addrs */ 
typedef struct lis_pci_dev 
{ 
    unsigned                     bus ;            /* bus number */ 
    unsigned                     dev_fcn ;        /* device/function code */ 
    unsigned                     vendor ;         /* vendor id */ 
    unsigned                     device ;         /* device id */ 
    unsigned                     class ;          /* class type */ 
    unsigned                     hdr_type ;       /* PCI header type */ 
    unsigned                     irq ;            /* IRQ number */
    unsigned long                mem_addrs[LIS_PCI_MEM_CNT] ;
    void                        *user_ptr ;       /* private for user */ 
} lis_pci_dev_t ; 

The bus field contains the bus number on which the device is located.  LiS obtains this information from the kernel.

The dev_fcn field contains an encoding of the device number on the bus and the function number within the device that this particular structure pertains to.  The pair bus and dev_fcn uniquely identifies a device in the PCI subsystem.  Devices can be searched for on the PCI bus by bus number and dev_fcn value (see below).

Given a dev_fcn value, a pair of macros will extract the "device" portion and the "function number" portion from it.

 
 
#define LIS_PCI_DEV(devfcn)         /* extracts the "device" portion */
#define LIS_PCI_FCN(devfcn)         /* extracts the "function number" portion */

Given a device number and a function number, the following macro will synthesize a dev_fcn value suitable for use in searching the bus.

#define LIS_MK_DEV_FCN(dev,fcn)     /* puts dev and fcn together */
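
For reference, the conventional PCI encoding packs a 5-bit device number and a 3-bit function number into one byte.  The sketch below uses that convention under hypothetical EX_ names; it is an assumption that the LiS macros use the same layout, so driver code should always go through the LiS macros rather than shifting bits itself.

```c
/* Conventional PCI packing: bits 7..3 = device, bits 2..0 = function. */
#define EX_MK_DEV_FCN(dev, fcn)  ((((dev) & 0x1f) << 3) | ((fcn) & 0x07))
#define EX_PCI_DEV(devfcn)       (((devfcn) >> 3) & 0x1f)
#define EX_PCI_FCN(devfcn)       ((devfcn) & 0x07)

/* Returns 1 if packing and then unpacking preserves both values. */
int ex_devfcn_roundtrip(unsigned dev, unsigned fcn)
{
    unsigned devfcn = EX_MK_DEV_FCN(dev, fcn);

    return EX_PCI_DEV(devfcn) == dev && EX_PCI_FCN(devfcn) == fcn;
}
```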


The vendor and device fields contain the vendor id (manufacturer code) and the vendor's device identifier for the device.  Devices can be searched for on the PCI bus by vendor and device identifier (see below).

The class field contains the class code associated with the device.  Devices can be searched for on the PCI bus by class code (see below).

The hdr_type field gives the type information for the PCI configuration space header.

The irq field gives the IRQ number that is assigned to this device.  This is the number that is used to attach an interrupt service routine to the device.

The mem_addrs field contains a list of addresses associated with the device.  These are raw PCI bus addresses and are not mapped into the address space of the processor.  Empty slots contain the value zero.

Back to Contents


LiS PCI Search Routines

These routines allow the STREAMS driver to find devices on the PCI bus and obtain a pointer to the lis_pci_dev_t structure for the device.


lis_pci_dev_t   *lis_pci_find_device(unsigned vendor, unsigned device, 
                                    lis_pci_dev_t *previous_struct) ; 

Find the device by vendor identification and vendor device identification.  By passing in the pointer to the previous structure returned it is possible to find all devices of a given type.

The routine returns NULL if there are no (more) devices for the given vendor and device identifiers.

Usage example:


lis_pci_dev_t    *pcip = NULL ;
while ((pcip = lis_pci_find_device(0x109e, 0x8474, pcip)) != NULL)
{
     /* pcip points to a unique device from this vendor */
} 

lis_pci_dev_t   *lis_pci_find_class(unsigned class, 
                                    lis_pci_dev_t *previous_struct) ; 

Find the device by class.  The usage is similar to lis_pci_find_device in that you can use a pointer to loop through all devices of a given class.

The function returns NULL if there are no (more) devices of the given class.


lis_pci_dev_t   *lis_pci_find_slot(unsigned bus, unsigned dev_fcn) ; 

Find the device by slot number.  If you know the bus number (zero for most simple Intel PC systems) and the dev_fcn, you can obtain the PCI configuration information for that particular "slot".  Use the LIS_MK_DEV_FCN macro to synthesize the dev_fcn value from the "device" (slot) number and the function number.

The function returns NULL if there is no device in that slot.

Note that this routine only returns one structure since it is not meaningful to process a list of devices for the same slot.
 

Back to Contents


LiS PCI Configuration Space Routines

The following routines are used to read and write PCI configuration space for a particular device.  Configuration space can be accessed by byte, word (16 bit) or dword (32 bit).

Each routine takes a pointer to an lis_pci_dev_t structure as an argument.  It also takes an index value which is the byte offset from the base of the configuration space for the device at which the given byte/word/dword is to be read or written.

Care should be exercised when writing to configuration space since many of these values are determined by the PCI BIOS at system boot time.

The lis_pci_set_master routine sets the "bus master DMA" bit for the given device.  This is used for devices that perform bus master DMA.

The routines are as follows:

int    lis_pci_read_config_byte(lis_pci_dev_t *dev, 
                               unsigned       index, 
                               unsigned char *rtn_val);          
int    lis_pci_read_config_word(lis_pci_dev_t  *dev, 
                               unsigned        index, 
                               unsigned short *rtn_val); 
int    lis_pci_read_config_dword(lis_pci_dev_t *dev, 
                                unsigned        index, 
                                unsigned long  *rtn_val); 
int    lis_pci_write_config_byte(lis_pci_dev_t *dev, 
                                unsigned        index, 
                                unsigned char   val); 
int    lis_pci_write_config_word(lis_pci_dev_t  *dev, 
                                unsigned         index,            
                                unsigned short   val); 
int    lis_pci_write_config_dword(lis_pci_dev_t *dev, 
                                 unsigned         index,            
                                 unsigned long    val); 
void   lis_pci_set_master(lis_pci_dev_t *dev); 
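
To make the meaning of the index argument concrete, the following user-space sketch simulates a device's 256-byte configuration space as an array and performs 16-bit accesses at byte offsets.  The register offsets (vendor id at 0x00, device id at 0x02, command at 0x04) and the bus master enable bit (bit 2 of the command register) come from the standard PCI configuration header.  Everything prefixed ex_ or EX_ is hypothetical, and the 0 / -1 return convention is an assumption of the sketch, not a statement about the LiS routines.

```c
#include <stdint.h>

#define EX_PCI_VENDOR_ID   0x00    /* 16-bit register */
#define EX_PCI_DEVICE_ID   0x02    /* 16-bit register */
#define EX_PCI_COMMAND     0x04    /* 16-bit register */
#define EX_PCI_CMD_MASTER  0x0004  /* bus master enable bit */

typedef struct { uint8_t cfg[256]; } ex_dev_t;   /* simulated device */

/* Read a little-endian 16-bit value at byte offset index. */
int ex_read_config_word(ex_dev_t *dev, unsigned index, uint16_t *rtn_val)
{
    if (index > sizeof(dev->cfg) - 2)
        return -1;
    *rtn_val = (uint16_t)(dev->cfg[index] | (dev->cfg[index + 1] << 8));
    return 0;
}

/* Write a little-endian 16-bit value at byte offset index. */
int ex_write_config_word(ex_dev_t *dev, unsigned index, uint16_t val)
{
    if (index > sizeof(dev->cfg) - 2)
        return -1;
    dev->cfg[index]     = (uint8_t)(val & 0xff);
    dev->cfg[index + 1] = (uint8_t)(val >> 8);
    return 0;
}

/* Read-modify-write of the command register to set the bus master bit. */
void ex_set_master(ex_dev_t *dev)
{
    uint16_t cmd;

    if (ex_read_config_word(dev, EX_PCI_COMMAND, &cmd) == 0)
        ex_write_config_word(dev, EX_PCI_COMMAND,
                             (uint16_t)(cmd | EX_PCI_CMD_MASTER));
}
```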
Back to Contents


LiS Atomic Functions

LiS provides for atomic integers implemented in a portable fashion.  To declare an LiS portable atomic integer use the following declaration syntax:

lis_atomic_t        myatom ; 

LiS then provides the following operations on variables of this type.

void    lis_atomic_set(lis_atomic_t *atomic_addr, int valu) ; 
int     lis_atomic_read(lis_atomic_t *atomic_addr) ; 
void    lis_atomic_add(lis_atomic_t *atomic_addr, int amt) ; 
void    lis_atomic_sub(lis_atomic_t *atomic_addr, int amt) ; 
void    lis_atomic_inc(lis_atomic_t *atomic_addr) ;          
void    lis_atomic_dec(lis_atomic_t *atomic_addr) ;          
int     lis_atomic_dec_and_test(lis_atomic_t *atomic_addr) ; 

Of these, only lis_atomic_dec_and_test needs any explanation.  This routine performs an atomic_dec on the variable and returns true if the counter reached zero via that decrement operation.  Note that by the time the routine returns some other CPU with access to the same variable may have changed its value.  So the return reports only on the instantaneous value of the variable.
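
A typical use of a decrement-and-test primitive is reference counting, where exactly one caller observes the transition to zero and performs the cleanup.  The sketch below shows the pattern in user space with C11 atomics standing in for lis_atomic_t; the ex_ names are hypothetical and the code does not call any LiS routine.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int refs;     /* stand-in for an lis_atomic_t counter */
    int        freed;    /* set when the last reference is dropped */
} ex_obj_t;

/* Atomically decrement and report whether this call reached zero. */
int ex_dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;
}

void ex_obj_init(ex_obj_t *o)
{
    atomic_init(&o->refs, 1);        /* creator holds one reference */
    o->freed = 0;
}

void ex_obj_hold(ex_obj_t *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Only the caller that sees "true" may tear the object down. */
void ex_obj_release(ex_obj_t *o)
{
    if (ex_dec_and_test(&o->refs))
        o->freed = 1;    /* a real driver would free the structure here */
}
```

The pattern is safe despite the "instantaneous value" caveat because a count that has reached zero has no remaining holders that could legitimately raise it again.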

Back to Contents


LiS Locks

LiS provides an abstraction and an insulated interface to the Linux kernel for spin locks, interrupt disabling and semaphores.  If you use this interface in your STREAMS driver you can utilize these kernel services on different versions of the Linux kernel without the necessity of recompiling your driver for each version of the kernel.

The LiS locks are especially useful in consideration of Linux kernels compiled with and without the SMP option set.  The spin locks and semaphores of the Linux kernel are implemented using external inline functions.  These functions are coded in assembly language and generate different sequences of instructions depending upon the compile time setting of the SMP option.  Spin locks and semaphores compiled with the SMP option unset will not function properly on a multi-CPU system running an SMP kernel.

The LiS locks mechanism solves this problem by abstracting the locking primitives into actual subroutines, not inlines, defined within LiS.  Since LiS is compiled from source code when it is installed the subroutines in LiS have the correct setting of SMP for the locking primitives.  This allows the STREAMS driver code to be compiled once and the object code reused for multiple installations with varying options.

The following sections document the spin locks, interrupt disabling and semaphore mechanisms offered by LiS.  To use these mechanisms include the file <sys/lislocks.h> in your STREAMS driver source code.

In choosing the appropriate type of lock to use, one must bear in mind that STREAMS drivers are not allowed to "sleep" in "put" and "service" procedures, only in "open" and "close" routines.  That means that spin locks are the mutual exclusion mechanism of choice for "put" and "service" procedures.  It is reasonable to use sleeping semaphores in "open" and "close" routines.

The simple interrupt exclusion mechanism can be used to exclude only interrupt routine execution for a section of code.  However, this mechanism does not exclude other "put" or "service" procedures that may be executed on other CPUs.  This may not be much of a consideration since LiS acquires a lock in the queue structure before executing the "put" or "service" procedure pointed to by that queue.

However, it could happen that the "read put/service" and "write put/service" procedures get executed simultaneously since there are two different locks in the STREAMS queues, one in the read queue and one in the write queue.  In this case, the STREAMS driver code would need to use spin locks to protect data structures shared between the read and write "put" or "service" procedures.

Back to Contents


LiS Spin Locks

LiS provides an implementation of spin locks that utilizes the Linux kernel's spin lock mechanism to perform the actual locking functions.  The LiS implementation adds features to the kernel spin locks such as the following:

  • LiS spin locks are nestable.  The same thread can acquire the same lock and release it in nested fashion.
  • LiS spin locks are more debuggable.  The LiS lock structure contains an ASCII name for the lock which makes it easier to identify in debugging situations.
  • LiS maintains a lock trace table.  A debugging option for LiS causes it to log all spin lock operations to a trace table which can be printed out via an option to the streams command.
  • LiS spin locks are portable.  A STREAMS driver can utilize the same LiS lock mechanism across different versions of the Linux kernel.  This pushes the kernel differences into LiS and out of the STREAMS driver code.
  • LiS spin locks are documented.  You don't have to read the kernel source code to figure out how to use them.

For these reasons I highly recommend that STREAMS drivers use the LiS spin lock implementation in place of the direct kernel spin locks.  The portability aspect of LiS spin locks cannot be overemphasized.  Different Linux kernel compile-time options can lead to a proliferation of STREAMS driver code versions, or the necessity of always compiling the driver from source when it is installed.  LiS spin locks allow a STREAMS driver to be compiled independently of kernel options with only the binary needed at driver installation time.

To declare a spin lock, use the typedef lis_spin_lock_t, as in the following:

lis_spin_lock_t    mylock ;

LiS spin locks must be initialized before they are used.  There is one initialization routine no matter which style of locking you intend to use.

void    lis_spin_lock_init(lis_spin_lock_t *lock, const char *name) ; 

This routine initializes the spin lock and associates an ASCII string name with it.  The pointer name is saved in the lock structure for later use in printing out the lock trace table.  It is the caller's responsibility to ensure that the name resides in memory that will persist for the duration of the existence of the lock.

For further information on spin locks, see the section on debugging spin locks.

Back to Contents


To lock and unlock a spin lock, use any of the following pairs of routines.  Whichever routine you use to lock the spin lock, be sure to use its companion unlock routine.  For nesting considerations, see below.

void    lis_spin_lock(lis_spin_lock_t *lock) ;
void    lis_spin_unlock(lis_spin_lock_t *lock) ;
int     lis_spin_trylock(lis_spin_lock_t *lock) ;

These routines are to be called only from background processing to lock and unlock a spin lock.  The trylock routine never spins: if the lock is available it acquires the lock and returns "true"; if the lock is unavailable it returns "false" and the caller does not hold the lock.

Background processing means any STREAMS driver processing that does not occur at interrupt time.  These routines lock the lock but do not exclude interrupt routines from execution.  Thus, your interrupt service routine can still be called whether or not your driver is holding a spin lock that was locked with one of these routines.

You can nest pairs of calls to these routines from the same thread of execution.  See below for more information on lock nesting.

Usage example:
 

lis_spin_lock(&mylock) ;
...
lis_spin_unlock(&mylock) ;
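
The try-lock routine suits background code that would rather skip optional work than spin.  The following is a minimal sketch of that pattern, not LiS code.  The lock type and the three lis_ routines are replaced by simple single-threaded stand-ins so that the fragment compiles outside the kernel; in a real driver they come from the LiS headers.  The names stats_lock, stats_count, stats_init and stats_try_bump are hypothetical.

```c
#include <assert.h>

/* Stand-ins for the LiS definitions (assumptions, for illustration only). */
typedef struct { int held; const char *name; } lis_spin_lock_t;

void lis_spin_lock_init(lis_spin_lock_t *lock, const char *name)
{
    lock->held = 0;
    lock->name = name;          /* only the pointer is saved, not a copy */
}

int lis_spin_trylock(lis_spin_lock_t *lock)
{
    if (lock->held)
        return 0;               /* unavailable: the caller does not own it */
    lock->held = 1;
    return 1;                   /* acquired */
}

void lis_spin_unlock(lis_spin_lock_t *lock)
{
    assert(lock->held);         /* unlocking a free lock is a driver bug */
    lock->held = 0;
}

/* Hypothetical driver state protected by the lock. */
lis_spin_lock_t stats_lock;
long            stats_count;

void stats_init(void)
{
    /* A string literal has static storage, so it outlives the lock. */
    lis_spin_lock_init(&stats_lock, "stats-lock");
    stats_count = 0;
}

/* Returns 1 if the counter was updated, 0 if the lock was busy. */
int stats_try_bump(void)
{
    if (!lis_spin_trylock(&stats_lock))
        return 0;               /* skip the update rather than spin */
    stats_count++;
    lis_spin_unlock(&stats_lock);
    return 1;
}
```

Note that the unlock call appears only on the path where the try-lock returned "true".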


void    lis_spin_lock_irq(lis_spin_lock_t *lock) ;
void    lis_spin_unlock_irq(lis_spin_lock_t *lock);

This pair of routines locks the spin lock with interrupts disabled for the duration of the holding of the lock.  The routine lis_spin_unlock_irq re-enables interrupts after unlocking the lock.

You can use this technique to exclude interrupt routine execution.  However, it is not advisable for interrupt routines themselves, or any routines called from an interrupt routine, to use this mechanism since the unlock primitive unconditionally enables interrupts, which may not be desirable from inside an interrupt routine.

These routines may be used in nested fashion.  Only the outermost unlock routine will actually enable interrupts.  See below for more information about lock nesting.

Usage example:

lis_spin_lock_irq(&mylock) ;
...
lis_spin_unlock_irq(&mylock) ;


void    lis_spin_lock_irqsave(lis_spin_lock_t *lock, int *flags) ;
void    lis_spin_unlock_irqrestore(lis_spin_lock_t *lock,int *flags) ;

This pair of routines is similar to the "spin_lock_irq" routines in that the locking routine disables interrupts.  However, it saves the interrupt state in the integer argument whose pointer is passed to the locking routine.  The unlock routine then restores the interrupt state after unlocking the lock.

These routines are suitable for use by routines that are called both from interrupt level and from background.  They also have the effect, when used in an interrupt routine, of excluding multiple execution of an interrupt routine on multiple CPUs in an SMP system.

These routines may be used in nested fashion.  Only the outermost unlock routine will actually restore the interrupt state.  See below for more information about lock nesting.

Usage example:

lis_spin_lock_t    mylock ;
int                flags ;
lis_spin_lock_irqsave(&mylock, &flags) ;
...
lis_spin_unlock_irqrestore(&mylock, &flags) ;

Note that the unlock routine is passed the address of the flags just as in calling the lock routine.
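
To illustrate why the save/restore pair suits routines called from both interrupt level and background, here is a small single-CPU model.  It is a sketch under stated assumptions: the lock type, the two lis_ routines and the irqs_enabled variable are stand-ins that only imitate the behavior described above, and ev_lock, ev_pending and ev_post are hypothetical names.

```c
#include <assert.h>

/* Single-CPU stand-ins for the LiS definitions (assumptions). */
typedef struct { int depth; } lis_spin_lock_t;

int irqs_enabled = 1;           /* models the CPU interrupt-enable state */

void lis_spin_lock_irqsave(lis_spin_lock_t *lock, int *flags)
{
    *flags = irqs_enabled;      /* remember the caller's interrupt state */
    irqs_enabled = 0;
    lock->depth++;
}

void lis_spin_unlock_irqrestore(lis_spin_lock_t *lock, int *flags)
{
    assert(lock->depth > 0);
    lock->depth--;
    irqs_enabled = *flags;      /* put back whatever was saved */
}

/* Hypothetical event counter shared by an ISR and background code. */
lis_spin_lock_t ev_lock;
unsigned        ev_pending;

/* Callable from either context: the interrupt state is preserved. */
void ev_post(void)
{
    int flags;

    lis_spin_lock_irqsave(&ev_lock, &flags);
    ev_pending++;
    lis_spin_unlock_irqrestore(&ev_lock, &flags);
}
```

In the model, calling ev_post with interrupts already disabled (as in an interrupt routine) leaves them disabled on return, whereas the lis_spin_unlock_irq routine would have enabled them unconditionally.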



Lock Nesting

LiS spin locks can be locked and unlocked in nested fashion.  When doing so, it is always best to use the same pair of lock and unlock routines at all levels of nesting for the same lock.  Mixing different types of locking can lead to unexpected results and non-portable behavior.

LiS allows a single thread to lock spin locks in nested fashion.  That is, the second and subsequent calls to the lock routine from a single thread will not spin on the lock because of finding it in a locked state from the first call.  Also, every unlock call except the last one, the one that balances the first locking call, does not unlock the lock.  Only the outermost unlock call causes the lock to be unlocked.

If the nesting is via lis_spin_lock_irq, then only the outermost unlock call enables interrupts.  If the nesting is via lis_spin_lock_irqsave, then only the outermost unlock call restores the interrupt state.
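
The nesting rule can be pictured with a small model.  This is not the LiS implementation; it is a hypothetical single-threaded simulation in which threads are represented by non-zero integer ids, and the names nest_lock_t, nest_lock and nest_unlock are invented for the illustration.

```c
#include <assert.h>

/* Hypothetical model of a nestable lock; owner 0 means "free". */
typedef struct { int owner; int depth; } nest_lock_t;

/* Returns 1 if thread `tid` now holds the lock, 0 if it would spin. */
int nest_lock(nest_lock_t *l, int tid)
{
    if (l->depth > 0 && l->owner != tid)
        return 0;               /* held by another thread */
    l->owner = tid;
    l->depth++;                 /* first or nested acquisition */
    return 1;
}

/* Returns 1 if this call actually released the lock, else 0. */
int nest_unlock(nest_lock_t *l, int tid)
{
    assert(l->depth > 0 && l->owner == tid);
    if (--l->depth > 0)
        return 0;               /* inner unlock: the lock stays held */
    l->owner = 0;               /* outermost unlock: the lock is free */
    return 1;
}
```

With this model, a second nest_lock call by the owning thread succeeds at once, and another thread can acquire the lock only after the owner's unlock calls balance its lock calls.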

When two or more threads attempt to lock a spin lock "simultaneously" only one thread is allowed to proceed at a time.  The other threads "spin", that is, the CPUs executing the other threads are executing a loop that tests the lock repeatedly until it becomes available.  Consequently, it is advisable to use locks to protect the execution of fairly short pieces of code if there is any likelihood of contention for the lock.  While one thread is holding the lock, other CPUs may be idling waiting for it.

In the context of locking, "simultaneously" means any time from the moment of the first thread locking the spin lock until that thread unlocks the lock.  If another thread attempts to lock the spin lock at any point in that interval then it will "spin."

When multiple threads use multiple spin locks to protect multiple resources, it is always a good idea if all threads execute "lock" operations on the multiple spin locks in the same order.  It is also highly recommended that they execute "unlock" operations in the exact reverse order as the "lock" operations.  This avoids so-called "deadly embrace" situations in which process A acquires spin lock A, process B acquires spin lock B, and then process A waits on B while process B waits on A.
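
The ordering rule might look like the following in practice.  This is a sketch only: the lock type and the lis_ routines are single-threaded stand-ins so the fragment compiles outside the kernel, and rx_lock, tx_lock and the two move routines are hypothetical.  Both routines take rx_lock first and tx_lock second, and release them in the reverse order, regardless of which resource is the source of the transfer.

```c
#include <assert.h>

/* Stand-ins for the LiS definitions (assumptions). */
typedef struct { int held; const char *name; } lis_spin_lock_t;

void lis_spin_lock(lis_spin_lock_t *lock)
{
    assert(!lock->held);        /* the real routine would spin here */
    lock->held = 1;
}

void lis_spin_unlock(lis_spin_lock_t *lock)
{
    assert(lock->held);
    lock->held = 0;
}

/* Hypothetical locks guarding two counters.  Rule: rx before tx. */
lis_spin_lock_t rx_lock = { 0, "rx-lock" };
lis_spin_lock_t tx_lock = { 0, "tx-lock" };
int rx_bytes, tx_bytes;

void move_rx_to_tx(int n)
{
    lis_spin_lock(&rx_lock);
    lis_spin_lock(&tx_lock);
    rx_bytes -= n;
    tx_bytes += n;
    lis_spin_unlock(&tx_lock);  /* reverse order of locking */
    lis_spin_unlock(&rx_lock);
}

void move_tx_to_rx(int n)
{
    lis_spin_lock(&rx_lock);    /* still rx first, although tx is the source */
    lis_spin_lock(&tx_lock);
    tx_bytes -= n;
    rx_bytes += n;
    lis_spin_unlock(&tx_lock);
    lis_spin_unlock(&rx_lock);
}
```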
 


LiS Interrupt Enable/Disable

LiS provides primitives for enabling and disabling interrupts modelled after the SVR4 SPL mechanism.  There is one routine that is used to disable interrupts and another one for enabling interrupts.  The routines are as follows:

int     lis_splstr(void) ;
void    lis_splx(int x) ;

The lis_splstr routine is used to disable interrupts.  It returns a value that must be passed to lis_splx when it is desired to restore the interrupt level to its previous state.  These two routines are implemented using the primitives lis_spin_lock_irqsave and lis_spin_unlock_irqrestore.

These routines can be used from background code ("put" and "service" procedures, or "open" and "close" routines), or from interrupt level.  LiS itself uses these routines to protect STREAMS structures from ill-timed modification by interrupt routines.  Many LiS utility routines, such as putq, getq and qenable, call these routines within themselves.

It is safe, and occurs frequently, to use these routines in a nested fashion.  When using these routines in a nested fashion be sure that the value returned by the call to lis_splstr at level n is the value passed back to lis_splx at level n.  The nesting rules for these routines are otherwise the same as for the pair lis_spin_lock_irqsave and lis_spin_unlock_irqrestore.

Usage examples:

int        x, y ;
x = lis_splstr() ;
...
y = lis_splstr() ;
...
lis_splx(y) ;
...
lis_splx(x) ;
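
Nesting most often arises when one routine that uses these primitives calls another that also uses them.  The sketch below models that case on a single CPU.  The bodies of lis_splstr and lis_splx here are stand-ins written for the illustration (the real routines are built on LiS spin locks, as noted above), and irqs_enabled, spl_depth, q_len, q_add and q_add_two are hypothetical names.

```c
#include <assert.h>

int irqs_enabled = 1;           /* models the interrupt-enable state */
int spl_depth;                  /* current nesting depth in the model */

/* Stand-in: disable interrupts and return the previous state. */
int lis_splstr(void)
{
    int prev = irqs_enabled;

    irqs_enabled = 0;
    spl_depth++;
    return prev;
}

/* Stand-in: restore the state returned by the matching lis_splstr. */
void lis_splx(int x)
{
    assert(spl_depth > 0);
    spl_depth--;
    irqs_enabled = x;
}

int q_len;                      /* hypothetical protected counter */

void q_add(void)
{
    int y = lis_splstr();       /* level n+1 keeps its own saved value */

    q_len++;
    lis_splx(y);
}

void q_add_two(void)
{
    int x = lis_splstr();       /* level n */

    q_add();
    assert(!irqs_enabled);      /* the inner lis_splx did not re-enable */
    q_add();
    lis_splx(x);                /* outermost call restores the state */
}
```

Keeping the saved value in a local variable at each level is what guarantees that the value returned at level n is the one passed back at level n.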

For further information on these routines see the section on debugging spin locks.




LiS Semaphores

LiS provides an implementation of semaphores that is built upon the Linux kernel's semaphores.  The LiS implementation adds features to the kernel semaphores such as the following:

  • LiS semaphores are more debuggable.  The LiS semaphore structure contains fields that save the file name and line number of the semaphore owner.  This makes it easier to debug drivers which utilize semaphores.
  • LiS semaphores retain error information.  When a "down" operation fails, LiS saves the error number in the semaphore structure for post mortem analysis.
  • LiS semaphores are portable.  A STREAMS driver can utilize the same LiS semaphore mechanism across different versions of the Linux kernel.  This pushes the kernel differences into LiS and out of the STREAMS driver code.
  • LiS semaphores are documented.  You don't have to read the kernel source code to figure out how to use them.

For these reasons I highly recommend that STREAMS drivers use the LiS semaphore implementation in place of the direct kernel semaphores.  The portability aspect of LiS semaphores cannot be overemphasized.  Different Linux kernel compile-time options can lead to a proliferation of STREAMS driver code versions, or the necessity of always compiling the driver from source when it is installed.  LiS semaphores allow a STREAMS driver to be compiled independently of kernel options with only the binary needed at driver installation time.

To declare an LiS semaphore, use a declaration similar to the following:

lis_semaphore_t    mysem ;

LiS semaphores must be initialized before they are used.  Use the following routine to initialize a declared semaphore.

void     lis_sem_init(lis_semaphore_t *,int);

If you initialize the semaphore to 0, then the first "down" operation on the semaphore will wait.  If you initialize it to 1, then the first "down" operation will not wait.  If you initialize it to n, then the first n "down" operations will not wait.

You can use the semaphore value to manage a pool of resources by initializing a semaphore to the number of items in the pool and having a driver open routine perform a "down" operation on the semaphore.  This causes open operations to be queued until an item is available.

For further information on semaphores, see the section on debugging semaphores.



The following two routines are used to acquire and release a semaphore.

int      lis_down(lis_semaphore_t *sem) ;
void     lis_up(lis_semaphore_t *sem) ;

The routine lis_down returns 0 for success and a negative error code for failure.  The caller has not acquired the semaphore unless the routine returns zero.

One reason for a negative return could be that the calling task was signalled while waiting for the semaphore to become available.  If this has occurred the return code will be set to -EINTR.

Semaphores cannot be used in nested fashion.  Care must be exercised that a single thread performs only one "down" operation on a given semaphore before performing the matching "up" operation.

When multiple threads use multiple semaphores to protect multiple resources, it is always a good idea if all threads execute "down" operations on the multiple semaphores in the same order.  It is also highly recommended that they execute "up" operations in the exact reverse order as the "down" operations.  This avoids so-called "deadly embrace" situations in which process A acquires semaphore A, process B acquires semaphore B, and then process A waits on B while process B waits on A.

Semaphores should be used only in STREAMS driver "open" and "close" routines.  STREAMS driver "put" and "service" procedures are not allowed to sleep.  They should use spin locks instead of semaphores.

Usage example:

if (lis_down(&mysem) == 0)
{
    ...
    lis_up(&mysem) ;
}
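
A slightly fuller sketch shows the error path, in which the negative return code is passed back to the caller and lis_up is not called.  The semaphore type and the three lis_ routines below are stand-ins that never sleep; the fail_with field is an invention of this model used to simulate a signal arriving during the wait.  The names cfg_sem, cfg_users and cfg_open are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the LiS definitions (assumptions). */
typedef struct { int count; int fail_with; } lis_semaphore_t;

void lis_sem_init(lis_semaphore_t *sem, int count)
{
    sem->count = count;
    sem->fail_with = 0;
}

int lis_down(lis_semaphore_t *sem)
{
    if (sem->fail_with)
        return sem->fail_with;  /* e.g. -EINTR: semaphore not acquired */
    assert(sem->count > 0);     /* the real routine would sleep here */
    sem->count--;
    return 0;
}

void lis_up(lis_semaphore_t *sem)
{
    sem->count++;
}

lis_semaphore_t cfg_sem;        /* hypothetical: guards cfg_users */
int             cfg_users;

/* Hypothetical open-time helper.  Returns 0 or a negative error code. */
int cfg_open(void)
{
    int err = lis_down(&cfg_sem);

    if (err < 0)
        return err;             /* not acquired, so do not call lis_up */
    cfg_users++;
    lis_up(&cfg_sem);
    return 0;
}
```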
 

Debugging Spin Locks

LiS spin lock structures contain fields that assist in the debugging of spin-lock related problems. The LiS spin lock structure contains the following fields.

  Field                         Description
  spin_lock_mem                 An opaque memory area that contains the kernel's spin lock structure.
  name                          Pointer to an ASCII name for the lock. This allows one to readily identify the function of the lock (assuming that it is aptly named).
  taskp                         A (void *) which is really a (struct task_struct *) pointer. It points to the task that originally acquired the lock, or is NULL if no task has acquired the lock.
  spinner_file, spinner_line    File and line number of the most recent call to one of the lis_spin_lock functions. This tells which line of code most recently tried to get the lock.
  owner_file, owner_line        File and line number of the call to one of the lis_spin_lock functions that first acquired the lock. These fields are set at the same time as the taskp field.
  unlocker_file, unlocker_line  File and line number of the call to one of the lis_spin_unlock functions that performed the final unlock on the lock, thus making it available for another thread. These fields are set at the same time as the taskp field is set to NULL.

If a thread owns the lock then its value of the current task pointer will be in taskp. If there is no other thread spinning on the lock, and if the lock has not been acquired in a nested fashion, then the spinner and owner fields will indicate the same file and line number.

If the taskp is non-NULL and the spinner and owner fields are different, then the most recent call to one of the lis_spin_lock routines was not the call that acquired the lock. If that most recent caller is a different thread from the task that owns the lock, then that other thread is spinning on the lock. By examination of the lock you can see which task owns the lock and where in the code it was acquired. This is often enough information to figure out why a deadlock is occurring.
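
The reading of these fields can be written down as a small decision routine.  This is an illustrative sketch, not part of LiS: the structure below copies only the debugging fields from the table, the C types of those fields are assumptions, and lock_dbg_t, lock_view and lock_classify are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Debugging fields from the table; the C types are assumptions. */
typedef struct {
    const char *name;
    void       *taskp;
    const char *spinner_file;   int spinner_line;
    const char *owner_file;     int owner_line;
    const char *unlocker_file;  int unlocker_line;
} lock_dbg_t;

enum lock_view {
    LOCK_FREE,                  /* taskp is NULL */
    LOCK_HELD,                  /* owned; spinner and owner agree */
    LOCK_HELD_SECOND_CALLER     /* owned; nested call or a spinner */
};

enum lock_view lock_classify(const lock_dbg_t *l)
{
    if (l->taskp == NULL)
        return LOCK_FREE;

    /* An owned lock has had both sets of fields filled in. */
    assert(l->owner_file != NULL && l->spinner_file != NULL);

    if (l->spinner_line == l->owner_line &&
        strcmp(l->spinner_file, l->owner_file) == 0)
        return LOCK_HELD;

    /* Either the owner locked again in nested fashion, or another
     * thread is spinning; the fields alone cannot distinguish them. */
    return LOCK_HELD_SECOND_CALLER;
}
```

In the last case, compare the thread executing at the spinner location with taskp to decide between nesting and contention.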

A "deadly embrace" occurs when two threads each need to acquire two spin locks but they acquire them in the opposite order from each other. Under circumstances of contention each process owns the lock that the other is spinning on and will not release the lock until it acquires the other lock. Thus, both threads spin forever.

Note that the LiS splstr and splx functions are written in terms of LiS spin locks. LiS does not use these routines internally. They are provided to the user for backwards compatibility. However, it is important to know that these routines are spin locks in disguise. This means that the order of use of these functions mixed in with explicit spin lock manipulations may also lead to deadly embraces.

An effective technique for troubleshooting these kinds of problems is to use the two-machine kernel debugger, kgdb. With this setup you can break into the target machine and look at memory using high level debugging techniques, including printing out of structures. Using kgdb you can find out where each CPU is executing, look at the corresponding source code lines, observe the locks that are involved, and then print out the lis_spin_lock_t structures for the specific locks. Oftentimes the information contained in the two locks will immediately reveal the nature of the deadly embrace.

It is also possible to have LiS trace all lock and semaphore operations. One of the LiS debug bits enables this function. To set this debug bit use the following command.

streams -d0x80000

This causes LiS to make entries in a global trace buffer named lis_spl_track. The global pointer lis_spl_track_ptr indicates the next location in the table into which an entry is to be placed, which means that it points to the oldest entry in the buffer. Entries in the buffer are of type spl_track_t.

The fields of this structure are as follows.

  Field       Description
  type        The type of entry, as follows.
                Value  Meaning
                1      splstr
                2      splx
                3      spin lock
                4      spin unlock
                5      semaphore down
                6      semaphore up
  cpu         The cpu number of the processor which made this entry.
  addr        The address of the spin lock or semaphore involved in the operation.
  tskp        The task pointer for the task that made this entry.
  state       Nesting value for spin locks, count field of the semaphore.
  file, line  File and line number of the call to the LiS locking or semaphore routine that caused this entry to be made.

The trace buffer contains 4096 of these entries, maintained in a circular fashion. By printing out these entries you can see the history of lock manipulation within LiS. The command streams -p causes LiS to print out this table from within the kernel. The resulting output can be found in /var/log/messages (typically). However, in practice the system is usually hung when you need this information so you end up printing it from within the debugger.
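
When reading the table from a debugger or a memory dump, the entries need to be visited starting at the slot the pointer designates and wrapping around the end of the buffer.  The following sketch shows that traversal.  It is not LiS code: the structure lists the fields from the table with assumed C types, the treatment of a zero type as "never written" is an assumption, and track_type_name and track_order are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Fields named in the table; the C types are assumptions. */
typedef struct {
    int         type;   /* 1..6 as tabulated; 0 assumed to mean unused */
    int         cpu;
    void       *addr;
    void       *tskp;
    int         state;
    const char *file;
    int         line;
} spl_track_t;

/* Map the type value to the meaning given in the table. */
const char *track_type_name(int type)
{
    static const char *const names[] = {
        "?", "splstr", "splx", "spin lock", "spin unlock",
        "semaphore down", "semaphore up"
    };
    return (type >= 1 && type <= 6) ? names[type] : "?";
}

/* Collect the used entries of buf[0..n-1] in oldest-to-newest order.
 * `next` is the slot that will be written next, which is therefore
 * the oldest entry.  `out` must have room for n pointers.
 * Returns the number of entries stored in `out`. */
size_t track_order(const spl_track_t *buf, size_t n,
                   const spl_track_t *next, const spl_track_t **out)
{
    size_t start = (size_t)(next - buf);
    size_t i, count = 0;

    assert(n > 0 && start < n);
    for (i = 0; i < n; i++) {
        const spl_track_t *e = &buf[(start + i) % n];

        if (e->type != 0)       /* skip slots never written */
            out[count++] = e;
    }
    return count;
}
```

For the real table, n would be 4096 and next would be the value of lis_spl_track_ptr.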



Debugging Semaphores

LiS semaphore structures contain fields that assist in the debugging of semaphore related problems. The LiS semaphore structure contains the following fields.

  Field                     Description
  sem_mem                   An opaque memory area that contains the kernel's semaphore structure.
  taskp                     A (void *) which is really a (struct task_struct *) pointer. It points to the task that most recently acquired the semaphore, or is NULL if no task has acquired the semaphore. The taskp is set to NULL just prior to calling the kernel's up routine on the semaphore. Thus it stays NULL if no other task is pending on the semaphore.
  downer_file, downer_line  File and line number of the most recent call to the lis_down function. This tells which line of code most recently tried to get the semaphore.
  owner_file, owner_line    File and line number of the call to the lis_down function that acquired the semaphore. These fields are set at the same time as the taskp field.
  upper_file, upper_line    File and line number of the call to the lis_up function. These fields are set at the same time as the taskp field is set to NULL.

If the taskp field is non-NULL then the semaphore is owned by the task so indicated. If it is NULL then the semaphore is unowned. The upper fields show where the semaphore was last released.

If the downer and owner fields both indicate the same file and line number then that is an indication that the semaphore was acquired at that location in the program. If they are different, and if the taskp is non-NULL, that is an indication that there is a task waiting on the semaphore at the downer location. The owner fields show where the semaphore was acquired.

Bear in mind that semaphore acquisitions do not nest, as spin lock acquisitions do. Therefore, if a thread that already holds a semaphore calls lis_down on it again without first calling lis_up, the thread will be deadlocked. The downer and owner fields will usually offer a clue to this type of deadlock.

You can also use the LiS lock trace buffer mechanism to assist in debugging semaphore usage.



STREAMS Utility Routines

The following routines are available to LiS STREAMS drivers.  These are standard AT&T SVR4 utility routines.  They (hopefully) have the same semantics in LiS as they do in SVR4 STREAMS.

These routines are presented here in alphabetical order with no description.  Please refer to the AT&T SVR4 STREAMS documentation for the descriptions of these routines.


Flushing Queue Bands

A special note on flushing queue bands is in order. The rules for flushing queues are a bit complex, so we wish to review them here in some detail.

First some definitions and some things that affect all queue flushing. The term "data message" in the context of queue flushing means messages of type M_DATA, M_PROTO, M_PCPROTO or M_DELAY. All other message types are considered "non-data messages". You may find it less than intuitive that M_PCPROTO is considered a "data message".

The term "ordinary message" in the context of queue flushing means messages of type M_DATA, M_PROTO, M_BREAK, M_CTL, M_DELAY, M_IOCTL, M_PASSFP, M_RSE, M_SETOPTS or M_SIG. Please note that M_PCPROTO is not on this list.

The flag argument of FLUSHDATA means that only "data messages" are to be flushed. The flag argument of FLUSHALL means that "all" messages are to be flushed. As we shall see, in flushing queue bands whether a message gets flushed or not depends upon what the meaning of the word "all" is.

First, let's take the case of the routine flushq(q,flag). If flag is set to FLUSHDATA then all "data messages" in the entire queue, including all queue bands, are flushed. If the flag is set to FLUSHALL then the entire queue is flushed.

The case of the routine flushband(q,band,flag) is more complicated.

If the band argument is zero then special rules apply. In this case, only "ordinary" messages are flushed from the queue. The value of the flag parameter does not influence the operation. Solaris STREAMS does not behave this way; it flushes either "data messages" or "all" messages on band zero. Comments in the Solaris 8 source code indicate that the author of the flush code was somewhat confused on this point.

If the band argument is non-zero then the specific band of the queue is flushed in a manner similar to that of flushq. That is, the flag argument of FLUSHDATA means just flush "data messages" and the value of FLUSHALL means flush "all" messages from the specific band.

One further item needs some attention. Whenever an M_PCPROTO (or other "high priority") message is inserted into a STREAMS queue it is queued ahead of all messages in any queue band. This means that an M_PCPROTO cannot be directed to a queue band. It also means that flushband can never flush an M_PCPROTO, or any other "high priority" message from the queue. In order to flush M_PCPROTOs you must call flushq and flush the entire queue of either "data messages" or "all" messages.
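
The rules above can be summarized as two predicates that say whether a given message would be removed by a given call.  This is a model written for illustration, not the LiS source.  The numeric values of the message types and flags are placeholders, not the values from the STREAMS headers.  The model treats every message that is not "ordinary" as "high priority", and it reads the band-zero rule as applying to ordinary messages whose band is zero; both are assumptions.  The names msg_t, flushq_would_flush and flushband_would_flush are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real ones come from the STREAMS headers. */
enum { M_DATA, M_PROTO, M_BREAK, M_CTL, M_DELAY, M_IOCTL, M_PASSFP,
       M_RSE, M_SETOPTS, M_SIG, M_PCPROTO, M_FLUSH };
enum { FLUSHDATA, FLUSHALL };

typedef struct { int type; unsigned char band; } msg_t;

/* "Data message" in the flushing sense; includes M_PCPROTO. */
int is_data_msg(int type)
{
    return type == M_DATA || type == M_PROTO ||
           type == M_PCPROTO || type == M_DELAY;
}

/* "Ordinary message" in the flushing sense; excludes M_PCPROTO. */
int is_ordinary_msg(int type)
{
    switch (type) {
    case M_DATA: case M_PROTO: case M_BREAK: case M_CTL: case M_DELAY:
    case M_IOCTL: case M_PASSFP: case M_RSE: case M_SETOPTS: case M_SIG:
        return 1;
    default:
        return 0;
    }
}

/* flushq(q, flag): acts on the entire queue, all bands included. */
int flushq_would_flush(const msg_t *m, int flag)
{
    assert(m != NULL);
    return flag == FLUSHALL || is_data_msg(m->type);
}

/* flushband(q, band, flag): acts on one band only. */
int flushband_would_flush(const msg_t *m, unsigned char band, int flag)
{
    assert(m != NULL);
    if (!is_ordinary_msg(m->type))
        return 0;               /* high priority: belongs to no band */
    if (m->band != band)
        return 0;
    if (band == 0)
        return 1;               /* band zero: flag is ignored */
    return flag == FLUSHALL || is_data_msg(m->type);
}
```

In this model an M_PCPROTO message is removed by flushq with either flag but by no call to flushband, which matches the last paragraph above.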



int          adjmsg(mblk_t *mp, int length);
struct msgb *allocb(int size, unsigned int priority); 

queue_t *backq(queue_t *q);
int      bcanput(queue_t *q, unsigned char band);
int      bcanputnext(queue_t *q, unsigned char band);
void     bcopy(void *src, void *dst, int nbytes) ;
int      bufcall(unsigned size, int priority, void (*function)(long), long arg); 
void     bzero(void *addr, int nbytes) ; 

int     canput(queue_t *q);
int     canputnext(queue_t *q);
void    cmn_err(int err_lvl, char *fmt, ...) ;       
mblk_t *copyb(mblk_t *mp);
mblk_t *copymsg(mblk_t *mp); 

#define datamsg(type)   -- true if msg->b_datap->db_type is data
mblk_t *dupb(mblk_t *mp);
mblk_t *dupmsg(mblk_t *mp); 

void    enableok(queue_t *q); 
mblk_t *esballoc(unsigned char *base, int size, int priority, frtn_t *freeinfo); 
int     esbbcall(int priority, void (*function)(long), long arg);

void    flushband(queue_t *q, unsigned char band, int flag);
void    flushq(queue_t *q, int flag);
void    freeb(mblk_t *bp);
void    freemsg(mblk_t *mp);

int     getmajor(dev_t dev) ; 
int     getminor(dev_t dev) ; 
mblk_t *getq(queue_t *q);

int  insq(queue_t *q, mblk_t *emp, mblk_t *mp); 

void   *kmem_alloc(int siz, int wait_code);
void   *kmem_zalloc(int siz, int wait_code);
void    kmem_free(void *ptr,int siz);

void    linkb(mblk_t *mp1, mblk_t *mp2);

int     msgdsize(mblk_t *mp); 
mblk_t *msgpullup(mblk_t *mp, int length); 
int     msgsize(mblk_t *mp); 

void noenable(queue_t *q); 

queue_t *OTHERQ(queue_t *q); 

int  pullupmsg(mblk_t *mp, int length); 
int  putbq(queue_t *q, mblk_t *mp); 
int  putctl(queue_t *q, int type); 
int  putctl1(queue_t *q, int type, int param); 
void putnext(queue_t *q, mblk_t *mp); 
int  putnextctl(queue_t *q, int type); 
int  putnextctl1(queue_t *q, int type, int param); 
int  putq(queue_t *q, mblk_t *mp); 

void qenable(queue_t *q); 
void qreply(queue_t *q, mblk_t *mp); 
int  qsize(queue_t *q); 
void qprocsoff(queue_t *rdq) ; 
void qprocson(queue_t *rdq) ; 

queue_t *RD(queue_t *q);
mblk_t  *rmvb(mblk_t *mp, mblk_t *bp); 
void     rmvq(queue_t *q, mblk_t *mp); 

int  SAMESTR(queue_t *q); 
int  strqget(queue_t *q, qfields_t what, unsigned char band, long *val); 
int  strqset(queue_t *q, qfields_t what, unsigned char band, long val); 

int  testb(int size, unsigned int priority); 

#define HZ      -- ticks per second 
typedef void    timo_fcn_t(caddr_t arg) ; 
toid_t          timeout(timo_fcn_t *timo_fcn, caddr_t arg, long ticks);
toid_t          lis_untimeout(toid_t id) ;        

void    unbufcall(int bcid); 
mblk_t *unlinkb(mblk_t *mp); 
int     untimeout(int id) ; 

queue_t *WR(queue_t *q); 

int  xmsgsize(mblk_t *mp); 
                
