Pipe, Fork and Exec and Related Topics

exec* is the only way to execute programs. fork is the only way to create a new process. The only exception is for the system boot (kernel start).

Exec

exec replaces the instruction and data segments by those inferred from the indicated file and starts the process running. The system data segment is unaltered. So the PID, current working directory and file descriptors are unaltered. Open files remain open except if fcntl(2) has been used to set the close-on-exec flag. Note that unflushed I/O is lost since the malloc arena which contains stdio buffers is in the data segment and is lost. Ignored signals stay that way while all others are reset to the default. (See signal notes.) The effective user or group id of the process may change if the file has set-user-id or set-group-id mode. For example, /usr/bin/ps is effectively run as superuser even if the owner of the shell process that exec'ed it is a non-privileged user.

execve(2) is the system call. The other members of the exec family do some processing prior to making the system call. Usually execlp(2) or execvp(2) is used.


    int execvp(const char *filename, char *const argv[ ]);

    int execlp(const char *filename, const char *argv0, const char *argv1, ..., NULL);

Here p means look for executable filename on user's PATH if filename does not contain a /, v means argument vector, and l means argument list. The argv parameters set the process's argument list. The contents of the argv style array pointed to by char **environ are passed to the process as the process's environment. For example, ls -l could be exec'ed with either execlp("ls","ls","-l",NULL) or execvp("ls", argv) assuming argv[0]="ls", argv[1]="-l", and argv[2]=NULL.

A successful exec will not return. It typically fails because the file does not exist or is of the wrong type. Typical code is:

    execvp( ... );

    /* exec failed. Put code to handle failure here. */

Fork

fork(2) creates a child. fork( ) returns -1 if it fails (no child), 0 if the process is the child and the child's pid if the process is the parent. It looks tricky and it is, because the code must handle two processes.

The child is an exact copy of the parent process except that it has a different PID and PPID and its accumulated resource usages are zero. It has its own text, data and system data segments. This implies that a malloc( ) in the child does not affect the parent's malloc arena. The child has its own file descriptors, but each descriptor refers to the same open file table entry as the parent's. This is very useful for pipes. However, since reads and writes are done via the same file pointer in the open file table, the child's and parent's reads or writes on the same descriptor will be interleaved unless these processes reopen the descriptors on new files.

For stdio functions, the child and parent have separate buffers (in their separate malloc arenas), and the input or output blocks will be mixed when the buffers are flushed. The parent should therefore flush its stdio buffers before calling fork( ). Otherwise, both the child and the parent will have buffers holding a copy of the unflushed data, which could result in duplicate copies on the I/O device. Alternatively, make sure the child does no I/O before it execs; if the exec fails, use _exit( ) rather than exit( ) to prevent flushing. In any case, the parent and child should make sure that active file descriptors which are not used for communication are closed or reopened on other files.

    pid_t pid;

    pid = fork();
    if (pid == -1) {
        /* fork error - cannot create child */
        perror("Demo program");
        exit(1);
    }
    else if (pid == 0) {
        /* code for child */
        /* probably a few statements then an execvp() */
        _exit(1);  /* the execvp() failed - a failed child
                      should not flush parent files */
    }
    else {
        /* code for parent */
        printf("Pid of latest child is %ld\n", (long) pid);
        /* more code */
        exit(0);
    }

Usually the child will do a few things and then execvp( ). Making a copy of a process only to have it replaced shortly afterwards seemed wasteful, so BSD Unix introduced a kludge, vfork( ), which makes the parent wait until the child does an exec, and in the meantime the child uses the parent's address space. Most modern UNIX systems have paging, which makes vfork( ) largely unnecessary: pages are marked copy-on-write, and since the child will write few or no pages before the exec, fork( ) achieves most of the savings without the kludge.

Pipe

Having declared int pfildes[2]; the system call

    pipe(pfildes)

creates a pipe or returns -1 on error. pfildes[0] is the descriptor for the "read from" pipe end. pfildes[1] is the descriptor for the "write to" pipe end.

[1] >-pipe-> [0]

One implementation for pipes is as a circular buffer with a read and a write pointer that cannot cross. On SYS V (but not BSD) pipes are implemented as "streams" and are therefore bidirectional: you can read and write on each end and could interchange 0 and 1.

Related Functions

close(fildes) deletes fildes and closes the underlying file if this is the last reference to the file (and if the device driver allows it).
newfildes = dup(fildes) duplicates the fildes entry at the lowest available descriptor, which is returned as newfildes.
newfildes = fcntl(fildes, F_DUPFD, min) is like dup but newfildes is greater than or equal to min.
dup2(fildes, newfildes) closes newfildes if it is open and not equal to fildes, then duplicates the fildes entry at newfildes.

The shell's 4>&7 is dup2(7,4).

To illustrate pipes we implement cat a | /usr/bin/wc making /usr/bin/wc the parent.

cat a [1] >-pipe-> [0] /usr/bin/wc

    if (pipe(pfildes) == -1)    {perror("demo"); exit(1);}
    if ((pid = fork()) == -1)   {perror("demo"); exit(1);}
    else if (pid == 0) {        /* child:  "cat a"                      */
          close(pfildes[0]);    /* close read end of pipe               */
          dup2(pfildes[1],1);   /* make 1 same as write-to end of pipe  */
          close(pfildes[1]);    /* close excess fildes                  */
          execlp("cat","cat","a",NULL);
          perror("demo");       /* still around?  exec failed           */
          _exit(1);             /* no flush                             */ 
        }
    else {                      /* parent:  "/usr/bin/wc"               */
          close(pfildes[1]);    /* close write end of pipe              */
          dup2(pfildes[0],0);   /* make 0 same as read-from end of pipe */
          close(pfildes[0]);    /* close excess fildes                  */
          execlp("/usr/bin/wc","wc",NULL);
          perror("demo");       /* still around?  exec failed           */
          exit(1);	        /* parent flushes                       */
        }

Stdio and File Descriptors

The stdio functions are based on system calls. They buffer I/O and enforce type checking.

fileno(FILE *fp) returns the file descriptor associated with the fopen'ed fp. fopen( ) creates an I/O buffer and FILE structure after calling:

    openedfildes = open("filename", flags, access_mode)

fopen's flags translate into open's flags as follows:

"r" becomes O_RDONLY,

"w" becomes O_WRONLY | O_CREAT | O_TRUNC,

"a" becomes O_WRONLY | O_CREAT | O_APPEND, and

"r+" becomes O_RDWR.

Check open(2) for additional flags. WRONLY means write, RDONLY read, RDWR read and write, CREAT create if the file does not exist, TRUNC truncate. O_EXCL, used together with O_CREAT, makes open fail if the file already exists; this prevents accidental overwrites and is handy for debugging. access_mode is typically 0666, the octal for permissions rw-rw-rw-, which will then be masked by the process umask. If the umask is 0077, the result is rw-------, which is the permissions the file will be created with. Use chmod(2) to override.

scanf, getchar, [printf, putchar,] etc. read [write] from the buffer, but once empty [full] the system call read(2) [write(2)] is made to read [write] up to a specified number of bytes from [to] the file. Obviously stdio function calls should not ordinarily be mixed with system I/O calls.

Redirection

To implement >> outfile, for example, use:

    fildes = open("outfile", O_WRONLY|O_CREAT|O_APPEND, 0666);
    if (fildes == -1) { /* bad file name */ perror("demo"); exit(1);}
    dup2(fildes,1);   /* copy fildes to 1 */
    close(fildes);    /* conserve file descriptors by closing excess one */

Process ID Get or Set

PIDs are of type pid_t (a signed integer type, often int or long). Use: #include <sys/types.h> and #include <unistd.h>


    pid = getpid()      /* gives PID of current process               */


    ppid = getppid()    /* gives PPID (parent PID) of current process */

To facilitate signalling, processes are organized into process groups. Every process is a member of a process group, which is identified by the PGID (process group id). If the PGID is the same as the PID then the process is called a process group leader. So the PGID of a process is the PID of its leader. The PGID does not change on a fork( ) or exec( ). Often, the PGID will be the PID of the closest ancestor interactive shell, which is the leader.

Get or set the PGID with the system calls:


    pgid = getpgid(of_pid)      /* get process group id for of_pid */


    setpgid(of_pid, to_pgid)    /* move of_pid into group to_pgid  */


    setpgid(of_pid, 0)          /* make of_pid a leader            */

If of_pid is 0 in the above then the current process's id is substituted.

Termination Status and Waiting for Children

When a process exits, for example, with exit(13) or with return 13 from main, a value is returned to the operating system. Likewise, when a process is terminated or stopped by a signal, for example, if a user types ^C or ^Z or if there is a division by zero, the signal number is returned to the operating system. Use the macros provided in <sys/wait.h> (see wstat(5)) to test or process the termination status. Typically 16 bits are used as follows, but you should use the macros to keep your code portable:

                             bits 15-8       bit 7   bits 6-0        Access macros
    exit(n) or return n:     n               0       0000000         WIFEXITED WEXITSTATUS
    signal (core dump):      00000000        1       signal number   WIFSIGNALED WTERMSIG
    signal (no core dump):   00000000        0       signal number   WIFSIGNALED WTERMSIG
    stop/suspend signals:    signal number   0       1111111         WIFSTOPPED WSTOPSIG

The best call for getting the status of a child is waitpid(2):


    childpid = waitpid( pid, & status , style)

childpid is the pid whose status is being returned.
If pid is positive then wait for the child with that PID, if -1 then wait for any child, if 0 then wait for any child in the same process group, and if less than -1 then wait for any child in the process group -pid.
status is an int but note the address operator &.
style is 0 to block until some child terminates, WNOHANG to poll and return immediately, or WUNTRACED to also report children that have stopped. The flags may be ORed together.

When a child dies, it disappears except for the PID and status. The child is said to become a zombie, and appears as <defunct> on ps -ef listings. Zombies only disappear once they have been waited on, so every process should ultimately be waited on by somebody. A user whose processes do not wait on their zombies could easily run out of process slots, so that fork will fail. If the parent of a child dies, then the orphaned child is adopted not by its grandparent but instead by init, by the simple device of setting the child's PPID to 1. init is careful to wait on all its children including zombies.

File and I/O Control Functions


fcntl(2) - file control.
stat(2) - file info (i-node).
ioctl(2) - I/O control: disk labels, files, mag tape, sockets, terminals.

tcget*, tcset* - strongly recommended replacements for most terminal related ioctl. See termios(3).
See the example on setting a terminal to raw mode. Another example follows:

    /* get window size */
    #include <stdio.h>
    #include <sys/ioctl.h>  /* ioctl(), TIOCGWINSZ on many systems */
    #include <termios.h>    /* termio(7I) */

    int main(void) {
      struct winsize wsz;

      if (ioctl(1, TIOCGWINSZ, &wsz) == -1) {   /* Get window size. */
        perror("TIOCGWINSZ");                   /* e.g. stdout is not a terminal */
        return 1;
      }
      printf("Window has %d rows, %d cols\n", wsz.ws_row, wsz.ws_col);
      wsz.ws_row = 10;
      wsz.ws_col = 90;
      /* Attempt to set window size. This sends SIGWINCH (ignored by default)
         to the terminal's process group, which must catch the signal and
         change the window. */
      if (ioctl(1, TIOCSWINSZ, &wsz) == -1)
        perror("TIOCSWINSZ");
      return 0;
    }

Problems With Multiple Processes

Deadlock: If a child and a parent both need exclusive use of a pair of files, and one opens and locks the first file while the other opens and locks the second, then each waits forever for the file the other holds and neither will ever complete. A similar situation can occur if both are reading and writing (say to each other).

Race: The outcome depends on who gets there first, that is, which process gets the most CPU cycles. Consider the following innocent looking code:

    ppid = getpid(); /* ppid is the parent id */
    pid = fork();
    if (pid == -1) { perror("demo"); exit(1); }
    if (pid > 0)     /* parent */ exit(0);
    else { /* child */
           if (ppid == getppid())
                printf("parent alive\n");  /* if child gets CPU */
           else printf("parent dead\n");   /* if parent gets CPU */
         }

Job Control

A job is an executing pipeline list. Job control refers to the following features:

(1) A job can be stopped, for possible later restart.
(2) A job can be restarted in the background; the job runs until I/O with the terminal is required, at which point it is automatically stopped.
(3) A job can be restarted in the foreground; the job runs with I/O passing freely to the terminal.

Shells ksh and csh but not sh have job control. The user requests (1) by typing ^Z while the job is in the foreground. (2) is requested by typing bg job identifier. A job can also be run in the background initially by terminating the pipeline list with an &. (3) is requested by typing fg job identifier. Usually the command stty tostop is put in the .profile file. This prevents background jobs from writing to the terminal, which ordinarily they are allowed to do.

System Facilities for Implementing Job Control

To facilitate signalling, processes are organized into process groups. See above.

Process groups are organized into sessions (basically logins) which are capable of having controlling terminals. Each session is identified by a session ID. A process may get its session ID with getsid( ) or, subject to restrictions, become a session leader with setsid( ).

Although the terminal is a (device) file and not a process, a terminal has an associated PGID. A process is allowed to read from the terminal only if its process group is the same as the terminal's PGID; writes are restricted in the same way when tostop is set. First open a file descriptor for the terminal:


    ttyd = open("/dev/tty", O_RDWR); /* portable: ctermid(NULL) is "/dev/tty" */

Then get or set the terminal group with termios functions:


    pgid = tcgetpgrp(ttyd);

       
    tcsetpgrp(ttyd, to_pgid);        /* but first ignore SIGTTOU - see below */

or with ioctl on older systems:


    ioctl(ttyd,TIOCGPGRP, &pgid);   /* note a pointer & is passed */


    ioctl(ttyd,TIOCSPGRP, &to_pgid);

There are four kinds of stop signals: SIGTSTP, SIGSTOP (an uncatchable SIGTSTP), SIGTTOU and SIGTTIN, and a continue signal: SIGCONT. Usually these signals are sent to process groups instead of individual processes. When a process not in the terminal's group attempts to read from the terminal, all members of that process's group are sent the SIGTTIN signal; the default action on SIGTTIN is to suspend the process. SIGTTOU is used similarly for write attempts. SIGTSTP and SIGSTOP also suspend a process. The SIGCONT signal restarts a process (but, of course, if the terminal's group is not changed, a SIGCONT after a SIGTTIN results in another SIGTTIN).

kill(2) and killpg(2) are used to send these signals to processes and process groups. kill( ) is usually used for both since by convention, a negative PID is taken to mean the process group with PGID equal to -PID.

waitpid(2) is used to wait for children in a specific process group, looking also for children that became stopped since the last report and returning immediately:


    waitpid( -pgid, &status, WUNTRACED | WNOHANG );

Implementing Job Control

The shell sh or a noninteractive shell, such as one invoked by system(3S), popen(3S), a makefile or emacs's compile feature, does not change process groups. That way whenever one process stops, all others will stop too. From now on we consider interactive shells only.

The shell maintains a table of jobs. Before every prompt for a command, the shell does a

    waitpid(-1, &status, WNOHANG | WUNTRACED)

to check if any of its children have changed status. Status changes are reported to the user and the table of jobs is updated.

The shell executes a pipeline list by forking off a child to handle the list. This child immediately makes itself a process group leader by changing its PGID to its own PID. To avoid race conditions the shell should also call setpgid( ) to change the process group of the child. Then the child and any of its descendants can be signaled as a unit. No descendant will leave this group (excepting perhaps a new interactive subshell).

If the shell wishes to run this child in the foreground, then it will change the terminal's process group to that of the child and then do a waitpid( ) with WUNTRACED on the child's pid. The child now has control of the terminal and the shell disappears as far as the user is concerned. A ^Z, if not caught by the child, will stop the child (and the group) and the shell's waitpid( ) will return. The shell determines that a stop was sent to the child by looking at the status value. At this point the child still has control of the terminal. The shell uses a tcsetpgrp( ) to get the terminal back, but here is a tricky detail: since the shell is not in the terminal's group, the tcsetpgrp( ) will cause the shell to be sent a SIGTTOU signal. In order to avoid being stopped by this signal, the shell must make sure that it is ignoring (SIG_IGN) SIGTTOU before doing the tcsetpgrp( ). After recording the child as a suspended job in the shell's table of jobs and resetting SIGTTOU, the shell proceeds, probably by prompting for the next command.

If the shell wishes to run the child in the background, then no waitpid( ) is done and the terminal's process group remains that of the shell. The entire child process group will be sent a SIGTTIN or SIGTTOU and become suspended if any descendant attempts I/O on the terminal. If no attempt is made, the child runs to completion and turns into a zombie until the shell finally does a waitpid( ) for it.

If the shell receives a fg command referring to a background child, a SIGTSTP is sent to the child's process group, the terminal is given to the child via a tcsetpgrp( ), a SIGCONT is sent to the child's process group, and then the shell does a waitpid( ) for this child.

If a bg command is received for a currently suspended job, then a SIGCONT is sent to the child's process group and the shell proceeds without doing a waitpid( ).

Last update: 2001 April 28