Pipe, Fork, Exec and Related Topics | 
The exec* family is the only way to
execute programs, and fork is the only way to
create a new process.  The only exception is
system boot (kernel start).
| Exec | 
exec replaces the
instruction and data segments with those loaded from the indicated
file and starts the process running.
The system data segment is unaltered, so the PID, current working
directory and file descriptors are unchanged.
Open files remain open unless
fcntl(2) has been used to set
the close-on-exec flag.  Note that unflushed
stdio output is lost, since the stdio buffers live in the malloc arena, which is part of the data
segment that exec discards.  Ignored signals stay ignored, while all others are reset
to the default.  (See signal notes.)
The effective user or group id of the
process may change if the file has set-user-id or set-group-id mode.
For example, on systems where
/usr/bin/ps is installed set-user-id root, it runs
as superuser even if the owner of the shell process
that exec'ed it is a non-privileged user.
execve(2) is the system call.  
The other members of the exec family do some
processing prior to making the system call.  Usually 
execlp(2) 
or execvp(2) is used.
    int execvp(const char *filename, char *const argv[]);
    int execlp(const char *filename, const char *arg0, ... /* , (char *)NULL */);
p means
look for the executable filename
on the user's PATH if
filename does not contain a /,
v means argument
vector, and l means argument list.
The argv parameters set the new program's
argument list.  The contents of the argv-style array pointed to by the global variable
char **environ are passed to the new program
as its environment.
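For example, ls -l could
be exec'ed with
either execlp("ls","ls","-l",(char *)NULL)
or execvp("ls", argv) assuming
argv[0]="ls",
argv[1]="-l", and
argv[2]=NULL.
The cast matters with execlp because a bare NULL passed through a variable argument list is not guaranteed to be a null pointer.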
A successful exec will not return.  
It typically fails because the file does not
exist or is of the wrong type.  Typical code is:
    execvp( ... );
    /* exec failed. Put code to handle failure here. */
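A minimal sketch of that failure path follows. It assumes a helper named exec_command, which is hypothetical and not part of any library. The helper returns errno so the caller can see why the exec failed, for example ENOENT when the file does not exist or EACCES when it is not executable.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* exec_command is a hypothetical helper, not a library function.
   It replaces the current process image with argv[0] (PATH is searched
   when argv[0] contains no '/').  A successful exec never returns, so
   reaching the code after execvp means it failed; the errno value
   saying why (e.g. ENOENT, EACCES) is returned to the caller. */
int exec_command(char *const argv[])
{
    execvp(argv[0], argv);
    int err = errno;        /* save it before perror() can change errno */
    perror(argv[0]);        /* report the failure on stderr */
    return err;
}
```

A freshly forked child would typically build char *args[] = {"ls", "-l", NULL}, call exec_command(args), and then call _exit(1) if it returns.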
| Fork | 
fork(2) creates a child.  
fork( ) returns -1 if it fails (no child), 
0 if the process is the child and the
child's pid if the process is the parent.  
It looks tricky and it is, because the code
must handle two processes.
The child is an exact copy of the parent process
except that it has a different PID and PPID, and its accumulated resource usage is zero.
It has
its own text, data and system data segments.  
This implies that a malloc( ) in
the child does not affect the parent's malloc arena.  
The child has its own file descriptors but
the descriptor refers to the same open file table entry.  This
is very useful for pipes.  
However, since reads and
writes are
done via the same
file offset in the open file table entry,
the child's and parent's reads and
writes
on the same descriptor will be interleaved
unless these processes reopen the
descriptors on new files.
For stdio functions, the child and parent have
separate buffers (in their separate malloc arenas)
and the input or output blocks
will be mixed when the buffers are flushed.
Stdio buffers should therefore be flushed before a fork( ),
which is the parent's responsibility.  Otherwise,
both the child and the parent will have buffers with a copy
of the unflushed data, which could result in duplicate copies on the
I/O device.  Alternatively, make sure the child does no stdio I/O before it
exec's;
if the exec fails,
use _exit( ) rather than
exit( ) to prevent flushing.  In any case, the parent and
child should make sure that active file descriptors which are not
used for communication are closed or reopened on other files.
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid;

        fflush(NULL);               /* parent flushes stdio buffers before fork */
        pid = fork();
        if (pid == -1) {
            /* fork error - cannot create child */
            perror("Demo program");
            exit(1);
        }
        else if (pid == 0) {
            /* code for child */
            /* probably a few statements then an execvp() */
            _exit(1);               /* the execvp() failed - a failed child
                                       should not flush parent files */
        }
        else {                      /* code for parent */
            printf("Pid of latest child is %d\n", (int) pid);
            /* more code */
            exit(0);
        }
    }
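The stdio warning above can be checked directly. This sketch writes one line through stdio to a tmpfile(3) file, forks, and then measures the file size. The function name is made up for illustration. The byte counts assume a fully buffered stream, which is normal for a file, and a child that does nothing but exit.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations (fileno) */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function.  Writes "hello\n" (6 bytes) through stdio,
   forks, and returns the final file size, or -1 on error.  The line sits
   in fp's buffer, and fork copies that buffer, so unless it was flushed
   first, a child calling exit() writes a second copy. */
long bytes_written_after_fork(int flush_before_fork, int child_calls_exit)
{
    FILE *fp = tmpfile();                 /* anonymous file, fully buffered */
    if (fp == NULL)
        return -1;
    fputs("hello\n", fp);                 /* still in the buffer */
    if (flush_before_fork)
        fflush(fp);                       /* the parent's responsibility */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {                       /* child: no I/O of its own */
        if (child_calls_exit)
            exit(0);                      /* flushes the child's copy of the buffer */
        _exit(0);                         /* discards it */
    }
    waitpid(pid, NULL, 0);
    fflush(fp);                           /* parent flushes its own copy */

    struct stat sb;
    long size = (fstat(fileno(fp), &sb) == 0) ? (long) sb.st_size : -1;
    fclose(fp);
    return size;
}
```

Both copies land in the file rather than overwriting each other because the parent and child share one file offset, as described above.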
Usually the child will do a few things and then 
execvp( ).  Making a copy of a
process only to have it replaced shortly seemed wasteful 
so BSD Unix introduced a kludge, vfork( ), 
which makes the parent wait until the child does an exec, and
in the meantime the child uses the parent's space.  Most modern UNIX systems have
paging systems which make vfork( ) unnecessary.  Pages are marked
Copy-On-Write: parent and child share them, and a page is copied only when one
of them writes to it.  Since the child will write few or no pages before the
exec,
fork( ) achieves
most of the savings without the kludge.
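Here is a small sketch, assuming nothing beyond POSIX fork( ) and waitpid( ). It shows that the child's writes to a static variable and to a malloc'ed buffer never reach the parent. Whether the system copies eagerly or uses copy-on-write only changes when the copying happens, not what the parent sees. The function name is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 100;   /* lives in the data segment */

/* Hypothetical demo function.  The child overwrites a static variable and
   a malloc'ed int; the parent's copies must be unchanged.  Returns 1 if
   the parent still sees its original values, 0 if not, -1 on error. */
int parent_unaffected_by_child_writes(void)
{
    int *heap = malloc(sizeof *heap);
    if (heap == NULL)
        return -1;
    *heap = 7;

    pid_t pid = fork();
    if (pid == -1) {
        free(heap);
        return -1;
    }
    if (pid == 0) {                 /* child: write to both pages */
        counter = 0;
        *heap = 0;
        _exit(counter + *heap);     /* status 0: the child saw its own writes */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        free(heap);
        return -1;
    }
    int unaffected = counter == 100 && *heap == 7
                     && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    free(heap);
    return unaffected;
}
```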
| Pipe | 
With int pfildes[2], the system call
pipe(pfildes) creates a pipe, returning 0 on success or -1 on error.
pfildes[0] is the descriptor 
for the ``read from'' pipe end.
pfildes[1] is the descriptor 
for the ``write to'' pipe end.
| [1] | >-pipe-> | [0] | 
One implementation for pipes is a circular buffer with a read pointer and a write pointer that cannot cross. On SYS V (but not BSD), pipes are implemented as ``streams'' and are therefore bidirectional: you can read and write on each end and could interchange 0 and 1. POSIX leaves this unspecified, so portable code should read only from pfildes[0] and write only to pfildes[1].
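As a minimal sketch, a single process can write into pfildes[1] and read the same bytes back from pfildes[0]. The name pipe_round_trip is hypothetical. The sketch assumes the message fits in the pipe's buffer. Otherwise the write would block, because nobody is reading yet.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical demo function.  Sends msg through a fresh pipe and reads
   it back into buf (bufsize must be at least 1), NUL-terminated.
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, char *buf, size_t bufsize)
{
    int pfildes[2];
    if (pipe(pfildes) == -1)
        return -1;

    size_t len = strlen(msg);
    /* Assumes len fits in the pipe's buffer, so this does not block. */
    if (write(pfildes[1], msg, len) != (ssize_t) len) {
        close(pfildes[0]);
        close(pfildes[1]);
        return -1;
    }
    close(pfildes[1]);        /* last write end closed: reader will see EOF */

    ssize_t total = 0, n = 0;
    while ((size_t) total < bufsize - 1) {
        n = read(pfildes[0], buf + total, bufsize - 1 - total);
        if (n <= 0)           /* 0 means EOF, -1 means error */
            break;
        total += n;
    }
    close(pfildes[0]);
    if (n == -1)
        return -1;
    buf[total] = '\0';
    return total;
}
```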
| Related Functions | 
close(fildes)    
deletes fildes and closes the underlying file if this
is the last reference to the file (and if the device driver allows it).
newfildes =
dup(fildes)
duplicates fildes into the lowest available
descriptor, newfildes.
newfildes =
fcntl(fildes,
F_DUPFD,min) is like
dup but
newfildes is greater than
or equal to min.
dup2(fildes,newfildes)
closes newfildes
if it is open and not equal to fildes,
then duplicates fildes
into newfildes.
The shell redirection 4>&7 is
dup2(7,4).
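The three duplication rules can be exercised side by side. In this sketch the struct and function names are hypothetical. The values min and target are arbitrary numbers chosen by the caller: 20 and 25 in the test.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical record of one descriptor duplicated three ways. */
struct dup_results {
    int via_dup;     /* dup(fd): lowest unused descriptor number      */
    int via_fcntl;   /* fcntl(fd, F_DUPFD, min): lowest unused >= min */
    int via_dup2;    /* dup2(fd, target): exactly target              */
};

/* Hypothetical demo function; each field is -1 if that call failed. */
struct dup_results duplicate_three_ways(int fd, int min, int target)
{
    struct dup_results r;
    r.via_dup   = dup(fd);
    r.via_fcntl = fcntl(fd, F_DUPFD, min);
    r.via_dup2  = dup2(fd, target);   /* closes target first if it is open */
    return r;
}
```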
To illustrate pipes we implement cat a | /usr/bin/wc making 
/usr/bin/wc the parent.
cat a | [1] | >-pipe-> | [0] | /usr/bin/wc | 
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int pfildes[2];
        pid_t pid;

        if (pipe(pfildes) == -1)    {perror("demo"); exit(1);}
        if ((pid = fork()) == -1)   {perror("demo"); exit(1);}
        else if (pid == 0) {        /* child:  "cat a"                      */
            close(pfildes[0]);      /* close read end of pipe               */
            dup2(pfildes[1],1);     /* make 1 same as write-to end of pipe  */
            close(pfildes[1]);      /* close excess fildes                  */
            execlp("cat","cat","a",(char *)NULL);
            perror("demo");         /* still around?  exec failed           */
            _exit(1);               /* no flush                             */
        }
        else {                      /* parent:  "/usr/bin/wc"               */
            close(pfildes[1]);      /* close write end of pipe              */
            dup2(pfildes[0],0);     /* make 0 same as read-from end of pipe */
            close(pfildes[0]);      /* close excess fildes                  */
            execlp("/usr/bin/wc","wc",(char *)NULL);
            perror("demo");         /* still around?  exec failed           */
            exit(1);                /* parent flushes                       */
        }
    }
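In the example above, the parent becomes wc, so nothing is left to wait for the pipeline. A shell instead forks once for each side of the pipe and then waits for both. The following sketch shows that variant. The names run_pipeline and spawn are hypothetical, and both commands are looked up on PATH.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that runs argv with stdin/stdout
   replaced by in_fd/out_fd (when not -1).  Both pipe ends are closed in
   the child after dup2, as in the example above. */
static pid_t spawn(char *const argv[], int in_fd, int out_fd, int pfildes[2])
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                      /* parent, or -1 on error */
    if (in_fd != -1)  dup2(in_fd, 0);    /* make 0 the read end  */
    if (out_fd != -1) dup2(out_fd, 1);   /* make 1 the write end */
    close(pfildes[0]);                   /* close excess fildes  */
    close(pfildes[1]);
    execvp(argv[0], argv);
    perror(argv[0]);                     /* still around? exec failed */
    _exit(127);                          /* no flush; 127 is the shell's "not found" code */
}

/* Hypothetical helper: run "left | right", wait for both, and return
   right's exit status, or -1 on error. */
int run_pipeline(char *const left[], char *const right[])
{
    int pfildes[2];
    if (pipe(pfildes) == -1)
        return -1;
    pid_t lpid = spawn(left, -1, pfildes[1], pfildes);
    pid_t rpid = spawn(right, pfildes[0], -1, pfildes);
    close(pfildes[0]);   /* the parent must close both ends, or right never sees EOF */
    close(pfildes[1]);

    int status, rstatus = -1;
    if (lpid > 0)
        waitpid(lpid, &status, 0);
    if (rpid > 0 && waitpid(rpid, &status, 0) == rpid && WIFEXITED(status))
        rstatus = WEXITSTATUS(status);
    return rstatus;
}
```

For example, with left = {"echo", "hi", NULL} and right = {"grep", "-q", "hi", NULL}, run_pipeline returns grep's exit status.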
| Stdio and File Descriptors | 
fileno(FILE *fp) returns the file descriptor associated with the
fopen'ed stream
fp.
fopen( ) creates an I/O buffer 
and FILE structure after calling:
    openedfildes = open("filename", flags, access_mode)
fopen's mode strings translate into
open's flags as follows:
"r" becomes 
O_RDONLY,
"w" becomes  
O_WRONLY | O_CREAT | O_TRUNC,
"a" becomes  
O_WRONLY | O_CREAT | O_APPEND, and
"r+" becomes 
O_RDWR.
See open(2) for additional flags.
O_WRONLY means write only,
O_RDONLY read only,
O_RDWR read and write,
O_CREAT create if it does not exist,
O_TRUNC truncate to zero length, and
O_APPEND write at the end of the file.
For debugging, add O_EXCL (together with O_CREAT),
which makes open fail if the file already exists and so prevents overwriting it.
access_mode is typically
0666,
the octal for permissions rw-rw-rw-,
which is then masked by the process umask.
If the umask is 0077, the result is
rw-------,
the permissions the file
will be created with.
Use chmod(2) afterwards to set bits the umask removed.
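The umask arithmetic can be checked with a short sketch. The requested mode 0666 is ANDed with the complement of the umask, so 0077 leaves 0600 and 0022 leaves 0644. The name created_permissions is hypothetical. It uses fopen "w"'s flags plus O_EXCL, so the path must not already exist.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo function: create path with mode 0666 under a
   temporary umask and return the permission bits the file actually
   received, or -1 on error.  O_EXCL makes it refuse to overwrite. */
int created_permissions(const char *path, mode_t mask)
{
    mode_t old = umask(mask);                  /* set umask, remember the old one */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, 0666);
    umask(old);                                /* restore the caller's umask */
    if (fd == -1)
        return -1;

    struct stat sb;
    int perms = (fstat(fd, &sb) == 0) ? (int) (sb.st_mode & 0777) : -1;
    close(fd);
    return perms;
}
```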
scanf,
getchar,
[printf,
putchar,] etc.
read from [write to] the buffer, but once it is
empty [full] the system call
read(2)
[write(2)] is made to
read [write] up to a
specified number of bytes from [to] the file.  Because data can wait in the buffer,
stdio function calls should not ordinarily be mixed with system I/O calls on the same file.
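To see why mixing is risky, this sketch writes through stdio and then through write(2) on the same file. The function name mixed_io_result is hypothetical. The stdio text is still in the buffer when write( ) runs, so the file ends up in the opposite order from the calls. The order shown assumes a fully buffered stream, which is what tmpfile(3) normally returns.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations (fileno) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo function: mix a buffered fputs() with an unbuffered
   write(2) on one file and copy the resulting contents into out
   (NUL-terminated).  Returns 0 on success, -1 on error. */
int mixed_io_result(char *out, size_t outsize)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs("stdio ", fp);                          /* waits in the buffer */
    if (write(fileno(fp), "syscall ", 8) != 8) {  /* goes straight to the file */
        fclose(fp);
        return -1;
    }
    fflush(fp);                                   /* buffered text comes second */

    rewind(fp);                                   /* read the whole file back */
    size_t n = fread(out, 1, outsize - 1, fp);
    out[n] = '\0';
    fclose(fp);
    return 0;
}
```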
| Redirection | 
To implement the redirection >> outfile, for example, use:
    fildes = open("outfile", O_WRONLY|O_CREAT|O_APPEND, 0666);
    if (fildes == -1) { /* bad file name */ perror("demo"); exit(1);}
    dup2(fildes,1);   /* copy fildes to 1 */
    close(fildes);    /* conserve file descriptors by closing excess one */
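The same steps can be wrapped in a helper for any descriptor. The name redirect_append is hypothetical, and target_fd 1 gives >> outfile. The fragment above skips one detail. If target_fd was closed, open( ) may return that very number, and the final close( ) would then undo the redirection. The sketch guards against that case.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: make target_fd refer to path opened for
   appending, like the shell's "n>> path".  Returns 0 or -1. */
int redirect_append(const char *path, int target_fd)
{
    int fildes = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fildes == -1) {
        perror(path);                     /* bad file name */
        return -1;
    }
    if (fildes != target_fd) {            /* open may already have picked target_fd */
        if (dup2(fildes, target_fd) == -1) {
            perror("dup2");
            close(fildes);
            return -1;
        }
        close(fildes);                    /* conserve descriptors */
    }
    return 0;
}
```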
| Process ID Get or Set | 
Process IDs have type pid_t (a signed integer type).  Use:
    #include <sys/types.h>
    #include <unistd.h>
    pid = getpid();     /* gives PID of current process               */
    ppid = getppid();   /* gives PPID (parent PID) of current process */
To facilitate signalling, processes are organized into process groups.
Every process is a member of a process group, which is identified by the PGID
(process group id).  If the 
PGID is the same as the PID then the process is 
called a process group leader.  
So the PGID of a process is the PID of its leader.
The PGID does not change on a fork( ) or 
exec( ).  
Often, the PGID
will be the PID of the closest ancestor interactive shell, which is 
the leader.
Get or set the PGID with the system calls:
    pgid = getpgid(of_pid);      /* get process group id for of_pid */
    setpgid(of_pid, to_pgid);    /* move of_pid into group to_pgid  */
    setpgid(of_pid, 0);          /* make of_pid a leader            */
If of_pid is 0 in the above,
then the current process's id is substituted.
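Here is a short sketch that forks a child, makes it a group leader, and checks the result with getpgid( ). The function name is hypothetical. It calls setpgid( ) from both the parent and the child. This is the same defensive pattern the shell uses in Implementing Job Control below.

```c
#define _XOPEN_SOURCE 700   /* expose getpgid() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function.  A pipe keeps the child alive until the
   parent has checked.  Returns 1 if the child's PGID equals its PID,
   0 if not, -1 on error. */
int child_becomes_group_leader(void)
{
    int hold[2];
    if (pipe(hold) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(hold[0]);
        close(hold[1]);
        return -1;
    }
    if (pid == 0) {                          /* child */
        char c;
        close(hold[1]);
        setpgid(0, 0);                       /* make myself a leader */
        ssize_t n = read(hold[0], &c, 1);    /* returns 0 when the parent closes */
        _exit(n == 0 ? 0 : 1);
    }
    close(hold[0]);
    setpgid(pid, pid);                       /* parent does it too: no race */
    int ok = getpgid(pid) == pid && getpgrp() != pid;
    close(hold[1]);                          /* release the child */
    waitpid(pid, NULL, 0);
    return ok;
}
```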
| Termination Status and Waiting for Children | 
When a process terminates with exit(13) or with
return 13 from
main, a value is returned to the operating system.
Also, when a process terminates or stops
due to a signal, for example, if a user
types ^C or
^Z or if there is a division
by zero, then the signal number is returned to the operating system.  Use the
macros provided in <sys/wait.h>
(see wstat(5)) to test and decode the termination status.
Typically 16 bits are used as follows but 
you should use the macros to make your
code portable:
| | bits 15-8 | bit 7 | bits 6-0 | Access macros |
| exit(n) or return n: | n | 0 | 0000000 | WIFEXITED  WEXITSTATUS |
| signal (core dump): | 00000000 | 1 | signal number | WIFSIGNALED  WTERMSIG |
| signal (no core dump): | 00000000 | 0 | signal number | WIFSIGNALED  WTERMSIG |
| stop/suspend signals: | signal number | 0 | 1111111 | WIFSTOPPED  WSTOPSIG |
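Here is a minimal sketch of decoding a status with the macros rather than the bit layout. The helper describe_status is hypothetical. WCOREDUMP is left out because it is not available on every system.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: describe a wait()/waitpid() status in buf and
   return the exit code, terminating signal or stop signal found there
   (-1 if none of the macros matched). */
int describe_status(int status, char *buf, size_t size)
{
    if (WIFEXITED(status)) {
        snprintf(buf, size, "exited with status %d", WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        snprintf(buf, size, "killed by signal %d", WTERMSIG(status));
        return WTERMSIG(status);
    }
    if (WIFSTOPPED(status)) {
        snprintf(buf, size, "stopped by signal %d", WSTOPSIG(status));
        return WSTOPSIG(status);
    }
    snprintf(buf, size, "unknown status 0x%x", (unsigned) status);
    return -1;
}
```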
To wait for a child and collect its status, use waitpid(2):
    childpid = waitpid(pid, &status, style);
childpid is the PID of the child whose status is being returned.  If pid is
            positive then wait for the child with that PID, if
            -1       then wait for any child, if
            0        then wait for any child in the same process group, and if
            < -1     then wait for any child in process group -pid.
status is an int,
but note the address operator &.  style is
0 to block until a child changes state,
WNOHANG to poll and return immediately (waitpid then returns 0 if no child has changed state), or
WUNTRACED to also report children that stopped; the flags can be combined with |.
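To contrast WNOHANG with a blocking wait, this sketch keeps a child blocked on a pipe and polls it once. It then releases the child and waits normally. The function name is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function.  Returns 1 if the WNOHANG poll returned 0
   (child still running) and the blocking waitpid returned the child's
   PID with exit status 0; 0 otherwise; -1 on error. */
int poll_then_wait(void)
{
    int gate[2];
    if (pipe(gate) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (pid == 0) {                                  /* child */
        char c;
        close(gate[1]);
        ssize_t n = read(gate[0], &c, 1);            /* 0 once the parent closes */
        _exit(n == 0 ? 0 : 1);
    }
    close(gate[0]);

    int status;
    pid_t polled = waitpid(pid, &status, WNOHANG);   /* child cannot have exited yet */
    close(gate[1]);                                  /* let the child finish */
    pid_t waited = waitpid(pid, &status, 0);         /* block until it does */
    return polled == 0 && waited == pid
           && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```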
When a child terminates, it disappears except for its PID and termination status.  The child is
said to become a
zombie,
and appears as <defunct>
in ps -ef listings.
Zombies only disappear once they have been waited on,
so every process should be
waited on by somebody ultimately.
A user whose processes do not wait on their
zombies can easily run out of process slots, so that fork fails.
If the parent of a
child dies, then the orphaned child is adopted not by its grandparent but
instead by init by using the simple device 
of setting the child's PPID to 1.
init is careful to wait on all its children including zombies.
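A common way to keep zombies from accumulating is to collect every finished child without blocking. The helper reap_zombies is hypothetical and written for illustration. A shell might call something like it before each prompt.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already terminated,
   without blocking.  waitpid returns 0 when children remain but none
   has finished, and -1 (ECHILD) when there are no children at all;
   both end the loop.  Returns how many children were reaped. */
int reap_zombies(void)
{
    int reaped = 0, status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```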
| File and I/O Control Functions | 
fcntl(2) - file control.
stat(2)  - file info (i-node).
ioctl(2) - I/O control:  disk labels, files, mag tape, sockets, terminals.
tcget*, 
tcset* - strongly recommended replacements for 
most terminal related ioctl. 
See termios(3).
  /* get window size */
    #include <stdio.h>
    #include <sys/ioctl.h>   /* ioctl(); struct winsize on Linux and BSD */
    #include <termios.h>     /* termio(7I) */

    int main(void)
    {
        struct winsize wsz;

        if (ioctl(1, TIOCGWINSZ, (void *) &wsz) == -1) {   /* Get window size. */
            perror("TIOCGWINSZ");   /* e.g. standard output is not a terminal */
            return 1;
        }
        printf("Window has %d rows, %d cols\n", wsz.ws_row, wsz.ws_col);
        wsz.ws_row = 10;
        wsz.ws_col = 90;
        ioctl(1, TIOCSWINSZ, (void *) &wsz);   /* Attempt to set window size. */
        /* This sends SIGWINCH (ignored by default) to the terminal's
           foreground process group, which must catch the signal and
           change the window. */
        return 0;
    }
| Problems With Multiple Processes | 
Race: The outcome depends on who gets there first, that is, which process the scheduler runs first. Consider the following innocent-looking code:
    ppid = getpid(); /* ppid is the parent id */
    if (fork() > 0)  /* parent */ exit(0);
    else { /* child */ 
           if (ppid == getppid()) 
                printf("parent alive");  /* if child gets CPU */
           else printf("parent dead");   /* if parent gets CPU */
         }
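One way to remove this particular race is to make the parent block on a pipe until the child has done its check. The sketch below assumes the parent can afford to wait, and its function name is hypothetical. The point is that ordering comes from explicit synchronization, not from which process gets the CPU first.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function.  The child checks getppid() and reports
   over a pipe; the parent is blocked in read() until then, so it is
   guaranteed to be alive during the check.  Returns 1 if the child saw
   its parent alive, 0 if not, -1 on error. */
int child_sees_parent_alive(void)
{
    int report[2];
    if (pipe(report) == -1)
        return -1;
    pid_t ppid = getpid();                      /* ppid is the parent id */
    pid_t pid = fork();
    if (pid == -1) {
        close(report[0]);
        close(report[1]);
        return -1;
    }
    if (pid == 0) {                             /* child */
        char alive = (getppid() == ppid) ? '1' : '0';
        ssize_t n = write(report[1], &alive, 1);
        _exit(n == 1 ? 0 : 1);
    }
    close(report[1]);
    char answer = '0';
    ssize_t n = read(report[0], &answer, 1);    /* parent waits here */
    close(report[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? answer == '1' : -1;
}
```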
             		   
| Job Control | 
Suspending a job (1) is requested by typing
^Z while the job is in the foreground.
Running a job in the background (2) is requested by typing bg job identifier.
A job can also be run in the background initially by
terminating the pipeline list with an &.
Running a job in the foreground (3) is requested by typing fg job identifier.
Usually the command stty tostop
is put in the .profile file.
This stops background jobs that try to write to the terminal, which otherwise
they are allowed to do.
| System Facilities for Implementing Job Control | 
Process groups are organized into sessions (basically logins)
which are capable of having controlling terminals.  
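Each session is identified by a session ID.
A process can get the session ID of a process with
getsid( ), or create a new session and become its leader with
setsid( ), which fails if the caller is already a process group leader.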
Although the terminal is a (device) file and not a process, a terminal has an associated PGID, its foreground process group. A process is allowed to read from the terminal only if its process group is the same as the terminal's PGID; writing from other groups is blocked too when stty tostop is set. First open a file descriptor for the terminal:
    ttyd = open("/dev/tty",O_RDWR,0700); /* portable: ctermid(NULL) is "/dev/tty" */
termios functions:
    pgid = tcgetpgrp(ttyd);
       
    tcsetpgrp(ttyd, to_pgid);        /* but first ignore SIGTTOU - see below */
ioctl on older systems:
    ioctl(ttyd,TIOCGPGRP, &pgid);   /* note a pointer & is passed */
    ioctl(ttyd,TIOCSPGRP, &to_pgid); 
There are four kinds of stop signals: 
SIGTSTP, 
SIGSTOP (an uncatchable
SIGTSTP), 
SIGTTOU and 
SIGTTIN and a continue signal: 
SIGCONT.  Usually these
signals are sent to process groups instead of individual processes.  When a
process not in the terminal's group attempts to read from the terminal,
all
members of that process's group are sent the
SIGTTIN signal; the default action on
SIGTTIN is to suspend the process.
SIGTTOU is used similarly for write attempts (when stty tostop is set) and for calls to tcsetpgrp( ).
SIGTSTP and 
SIGSTOP also suspend a process.  The 
SIGCONT signal restarts a process 
(but, of course, if the terminal's group is not changed a
SIGCONT after a 
SIGTTIN results in another 
SIGTTIN).
kill(2) and 
killpg(2) 
are used to send these signals to processes and process groups.  
kill( ) is usually used for both since by convention, 
a negative PID
is taken to mean the process group with PGID equal to -PID.
waitpid(2) is used to wait for children in a specific process group, 
looking also for children that became stopped since the last report
and returning immediately:
    waitpid( -pgid, &status, WUNTRACED | WNOHANG ); 
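Here is a sketch of these calls working together. It assumes no terminal is involved and uses SIGSTOP instead of SIGTSTP, so the child cannot catch the stop. The function name is hypothetical. SIGKILL ends the child, so the sketch cannot hang even if SIGTERM happened to be ignored.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function: put a child in its own process group,
   stop the group with a negative PID, observe the stop with
   waitpid(-pgid, ..., WUNTRACED), continue it, then kill and reap it.
   Returns 1 if each step was observed, 0 if not, -1 on error. */
int stop_and_continue_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);
        for (;;)
            pause();                      /* wait for signals forever */
    }
    setpgid(pid, pid);                    /* set from both sides, shell-style */
    pid_t pgid = pid;

    int status, ok = 1;
    kill(-pgid, SIGSTOP);                 /* negative PID: the whole group */
    if (waitpid(-pgid, &status, WUNTRACED) != pid
        || !WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP)
        ok = 0;
    kill(-pgid, SIGCONT);                 /* resume the group */
    kill(-pgid, SIGKILL);                 /* then end it */
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        ok = 0;
    return ok;
}
```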
| Implementing Job Control | 
A non-interactive shell, such as one run by
system(3S) or
popen(3S), from a makefile or
from emacs's
compile feature, does not change
process groups.  That way whenever one process stops, all others will stop too.
From now on we consider interactive shells only.
The shell maintains a table of jobs. Before every prompt for a command, the shell does a
waitpid(-1, &status, WNOHANG | WUNTRACED)
to learn, without blocking, which jobs have terminated or stopped since the last prompt.
The shell executes a pipeline list by forking off a child to handle the list. This child immediately makes itself a process group leader by changing its PGID to its own PID. To avoid race conditions, the shell should also call setpgid( ) for the child, since either process may run first. Then the child and any of its descendants can be signaled as a unit. No descendant will leave this group (except perhaps a new interactive subshell).
If the shell wishes to run this child in the foreground, then 
it will change the terminal's process group to that of
the child and then do a
waitpid( ) on the child's pid. 
The child now has control of the terminal and the shell disappears
as far as the user is concerned.  A 
^Z, if not caught by the child, will stop
the child (and the group) and the shell's waitpid( ) will return.  
The shell
determines that a stop was sent to the
child by looking at the status value.  At
this point the child still has control of the terminal.  The shell uses
tcsetpgrp( ) to get the terminal back,
but here is a tricky detail:  since the
shell is not in the terminal's group, calling
tcsetpgrp( ) causes the shell to be sent a
SIGTTOU signal.
In order to prevent being stopped by this signal, the shell must make
sure that it has ignored (SIG_IGN) signal
SIGTTOU
before doing the
tcsetpgrp( ).  After recording the child as a
suspended job in the shell's table
of jobs and resetting SIGTTOU, the shell proceeds,
probably prompting for the next command.
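The ignore, tcsetpgrp( ), restore sequence described above might look like the sketch below. The name give_terminal_to is hypothetical. ttyd is assumed to be a descriptor for the controlling terminal, such as the one opened from /dev/tty earlier. The sketch restores the saved disposition rather than forcing SIG_DFL, which keeps whatever the shell had installed.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make pgid the foreground process group of the
   terminal ttyd.  SIGTTOU is ignored around tcsetpgrp() so a caller
   outside the foreground group is not stopped, then the previous
   disposition is restored.  Returns 0 on success, -1 on error. */
int give_terminal_to(int ttyd, pid_t pgid)
{
    struct sigaction ignore, saved;
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;

    if (sigaction(SIGTTOU, &ignore, &saved) == -1)
        return -1;
    int result = tcsetpgrp(ttyd, pgid);
    sigaction(SIGTTOU, &saved, NULL);     /* "resetting SIGTTOU" */
    return result;
}
```

The shell would call give_terminal_to(ttyd, getpgrp()) to take the terminal back after a job stops. It would call the same helper with the job's PGID to put that job in the foreground.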
If the shell wishes to run the child in the background, then no 
waitpid( ) is
done and the terminal's process group remains that of the shell.  
The entire child process group will be sent a
SIGTTIN or
SIGTTOU and become suspended if
any descendant attempts to read from (or, with stty tostop, write to) the terminal.
If no attempt is made the child
runs to completion and turns into a zombie until the shell finally
does a waitpid( ) for it.
If the shell receives a fg command referring to a
background child,
a SIGTSTP is sent to the child's process group,
the terminal is given to the child via tcsetpgrp( ),
a SIGCONT is sent to the child's process group,
and then the shell does a waitpid( ) for this child.
If a bg command is received for a currently 
suspended job, then a SIGCONT is
sent to the child's process group and the shell proceeds without doing a
waitpid( ).
Last update: 2001 April 28