Pipe, Fork, Exec and Related Topics |
exec*
is the only way to
execute programs. fork
is the only way to
create a new process. The only exception is for
the system boot (kernel start).
Exec |
exec
replaces the
instruction and data segments by those inferred from the indicated
file and starts the process running.
The system data segment is unaltered. So the PID, current working
directory and file descriptors are unaltered.
Open files remain open except if
fcntl(2)
has been used to set
the close-on-exec
flag. Note that unflushed
I/O is lost since the malloc arena which contains stdio buffers is in the data
segment and is lost. Ignored signals stay that way while all others are reset
to the default. (See signal notes.)
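The close-on-exec flag mentioned above can be set as in the following minimal sketch. The helper name set_close_on_exec is hypothetical; only fcntl(2) with F_GETFD, F_SETFD and FD_CLOEXEC is assumed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): mark fd close-on-exec so a
   later exec*() does not hand it to the new program.
   Returns 0 on success, -1 on error. */
int set_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);      /* read the current descriptor flags */
    if (flags == -1)
        return -1;
    /* keep any other flags, add FD_CLOEXEC */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Reading the flags first and OR-ing in FD_CLOEXEC keeps any other descriptor flags the system may define.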
The effective user or group id of the
process may change if the file has set-user-id or set-group-id mode.
For example,
/usr/bin/ps
is effectively run
as superuser even if the owner of the shell process
that exec
'ed it is a non-privileged user.
execve(2)
is the system call.
The other members of the exec
family do some
processing prior to making the system call. Usually
execlp(3)
or execvp(3)
is used.
execvp
(char *filename, char *argv[ ]);
execlp
(char *filename, char *argv0, char *argv1, ..., NULL);
p
means
look for executable filename
on user's PATH
if
filename
does not contain a /,
v
means argument
vector, and l
means argument list.
The argv
parameters set the process's
argument list. The contents of the argv style array pointed to by
char **environ
are passed to the process
as the process's environment.
For example, ls -l
could
be exec
'ed with
either execlp("ls","ls","-l",NULL)
or execvp("ls", argv)
assuming
argv[0]="ls"
,
argv[1]="-l"
, and
argv[2]=NULL
.
A successful exec
will not return.
It typically fails because the file does not
exist or is of the wrong type. Typical code is:
execvp( ... );
/* exec failed. Put code to handle failure here. */
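One way to fill in the failure branch is sketched below. The helper name exec_or_report is hypothetical; it relies only on the fact that execvp() returns solely on failure, with errno set.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): try to exec argv[0].
   It only returns if the exec failed, and then returns the errno
   value left by execvp(). */
int exec_or_report(char *const argv[])
{
    execvp(argv[0], argv);   /* on success this call never returns */
    int saved = errno;       /* save errno before perror() can change it */
    perror(argv[0]);         /* report why the exec failed */
    return saved;
}
```

In a forked child the caller would normally follow a failed exec with _exit( ) rather than returning into the parent's code path.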
Fork |
fork(2)
creates a child.
fork( )
returns -1 if it fails (no child),
0 if the process is the child and the
child's pid if the process is the parent.
It looks tricky and it is, because the code
must handle two processes.
The child is an exact copy of the parent process
except that it has a different PID and PPID, and its accumulated resource usages are zero.
It has
its own text, data and system data segments.
This implies that a malloc( )
in
the child does not affect the parent's malloc arena.
The child has its own file descriptors but
the descriptor refers to the same open file table entry. This
is very useful for pipes.
However, since read
's or
write
's are
done via the same
file pointer in the open file table,
the child's and parent's read
's or
write
's
on the same descriptor will be mixed or interleaved
unless these processes reopen the
descriptors on new files.
For stdio functions, the child and parent have
separate buffers (in their separate malloc arenas)
and the input or output blocks
will be mixed when the buffers are flushed.
Obviously, stdio buffers should be flushed before a fork( )
which would be the parent's responsibility. Otherwise,
both the child and the parent will have buffers with a copy
of the unflushed data, which could result in duplicate copies on the
I/O device. Alternatively, make sure the child does no I/O before it
exec
's;
if it fails to exec
use _exit( )
not
exit( )
to prevent flushing. In any case, the parent and
child should make sure that active file descriptors which are not
used for communication are closed or reopened on other files.
pid_t pid;
pid = fork();
if (pid == -1) {
/* fork error - cannot create child */
perror("Demo program");
exit(1);
}
else if (pid == 0) {
/* code for child */
/* probably a few statements then an execvp() */
_exit(1); /* the execvp() failed - a failed child
should not flush parent files */
}
else { /* code for parent */
printf("Pid of latest child is %ld\n", (long) pid); /* pid_t width varies, so cast */
/* more code */
exit(0);
}
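The advice about flushing stdio buffers before a fork( ) can be checked with the sketch below. The function lines_after_fork and its flush_first parameter are hypothetical. The child deliberately calls exit( ) instead of _exit( ) so that its copy of the buffer is flushed, and a tmpfile( ) stream is assumed to be fully buffered.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (not from the notes): write one line through stdio,
   fork, let the child exit(), and count the lines that reach the file.
   Returns the line count, or -1 on error. */
int lines_after_fork(int flush_first)
{
    FILE *fp = tmpfile();          /* temporary file, assumed fully buffered */
    if (fp == NULL)
        return -1;
    fputs("hello\n", fp);          /* the line sits in the stdio buffer */
    if (flush_first)
        fflush(fp);                /* parent empties its buffer before fork */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0)
        exit(0);                   /* child: exit() flushes its inherited buffers */

    waitpid(pid, NULL, 0);         /* reap the child */
    fflush(fp);                    /* parent writes whatever it still holds */
    rewind(fp);

    int c, lines = 0;
    while ((c = fgetc(fp)) != EOF) /* count the lines actually written */
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

On a typical system lines_after_fork(1) should report one line and lines_after_fork(0) two, because without the flush both processes write their own copy of the buffered data through the shared file pointer.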
Usually the child will do a few things and then
execvp( )
. Making a copy of a
process only to have it replaced shortly seemed wasteful
so BSD Unix introduced a kludge, vfork( )
,
which makes the parent wait until the child does an exec
, and
in the meantime the child uses the parent's space. Most modern UNIX systems have
paging systems which make vfork( )
unnecessary. Pages are marked
Copy-On-Write
and since the child will write few or no pages before the
exec
,
fork( )
achieves
most of the savings but without the kludge.
Pipe |
int pfildes[2];
the system call
pipe
(pfildes)
creates a pipe and returns 0, or returns -1 on error.
pfildes[0]
is the descriptor
for the ``read from'' pipe end.
pfildes[1]
is the descriptor
for the ``write to'' pipe end.
[1] | >-pipe-> | [0] |
One implementation of pipes is a circular buffer with a read pointer and a write pointer that cannot cross. On System V (but not BSD), pipes are implemented as ``streams'' and are therefore bidirectional: you can read and write on each end and could interchange 0 and 1. Code that must run on both should treat a pipe as one-way.
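A minimal single-process sketch of the two pipe ends follows. The helper name pipe_roundtrip is hypothetical, and the message is assumed to be short enough to fit in the pipe's buffer so that the write does not block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): send msg through a fresh pipe
   and read it back. msg must be short enough to fit in the pipe buffer.
   Returns the number of bytes read into buf, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize)
{
    int pfildes[2];
    if (pipe(pfildes) == -1)
        return -1;
    ssize_t n = write(pfildes[1], msg, strlen(msg)); /* [1] is the write-to end */
    close(pfildes[1]);               /* no writers left: reader sees EOF after the data */
    if (n != -1)
        n = read(pfildes[0], buf, bufsize);          /* [0] is the read-from end */
    close(pfildes[0]);
    return n;
}
```

Closing the write end as soon as it is no longer needed matters in real pipelines too, since a reader only sees end-of-file once every copy of the write descriptor is closed.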
Related Functions |
close
(fildes)
deletes fildes
and closes the underlying file if this
is the last reference to the file (and if the device driver allows it).
newfildes =
dup
(fildes)
duplicates the fildes
entry at the lowest available
entry: newfildes
.
newfildes =
fcntl
(fildes,
F_DUPFD
,min)
is like
dup
but
newfildes
is greater than
or equal to min
.
dup2
(fildes,newfildes)
closes newfildes
if not equal to fildes
,
then duplicates fildes
entry at newfildes
.
In shell notation, 4>&7
is
dup2(7,4)
.
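Because a duplicated descriptor refers to the same open file table entry, the two descriptors share one file pointer. The sketch below illustrates this. The function name offset_seen_by_dup and the /tmp template are assumptions made for the example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo (not from the notes): write 5 bytes through one
   descriptor, then ask a dup()'ed copy where its file pointer is.
   Returns that offset, or -1 on error. */
off_t offset_seen_by_dup(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";   /* assumed writable temp location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                          /* file vanishes once descriptors close */

    if (write(fd, "12345", 5) != 5) {
        close(fd);
        return -1;
    }
    int copy = dup(fd);                    /* lowest available descriptor */
    if (copy == -1) {
        close(fd);
        return -1;
    }
    off_t pos = lseek(copy, 0, SEEK_CUR);  /* query the copy's file pointer */
    close(copy);
    close(fd);
    return pos;
}
```

The copy reports offset 5 even though nothing was written through it, which is the same sharing that makes parent and child I/O interleave after a fork( ).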
To illustrate pipes we implement cat a | /usr/bin/wc
making
/usr/bin/wc
the parent.
cat a | [1] | >-pipe-> | [0] | /usr/bin/wc |
if (pipe(pfildes) == -1) {perror("demo"); exit(1);}
if ((pid = fork()) == -1) {perror("demo"); exit(1);}
else if (pid == 0) { /* child: "cat a" */
close(pfildes[0]); /* close read end of pipe */
dup2(pfildes[1],1); /* make 1 same as write-to end of pipe */
close(pfildes[1]); /* close excess fildes */
execlp("cat","cat","a",NULL);
perror("demo"); /* still around? exec failed */
_exit(1); /* no flush */
}
else { /* parent: "/usr/bin/wc" */
close(pfildes[1]); /* close write end of pipe */
dup2(pfildes[0],0); /* make 0 same as read-from end of pipe */
close(pfildes[0]); /* close excess fildes */
execlp("/usr/bin/wc","wc",NULL);
perror("demo"); /* still around? exec failed */
exit(1); /* parent flushes */
}
Stdio and File Descriptors |
fileno
(FILE *fp)
returns the file descriptor associated with the
fopen
'ed
fp
.
fopen( )
creates an I/O buffer
and FILE
structure after calling:
openedfildes = open("filename", flags, access_mode)
fopen
's flags translate into
open
's flags as follows:
"r"
becomes
O_RDONLY
,
"w"
becomes
O_WRONLY | O_CREAT | O_TRUNC
,
"a"
becomes
O_WRONLY | O_CREAT | O_APPEND
, and
"r+"
becomes
O_RDWR
.
See open(2)
for additional flags.
WRONLY
means write,
RDONLY
read,
RDWR
read and write,
CREAT
create if does not exist,
TRUNC
truncate.
For debugging use O_EXCL
together with O_CREAT, which makes open fail if the file already exists and so prevents overwriting.
access_mode
is typically
0666
,
the octal for permissions rw-rw-rw-
,
which will then be masked by the process umask.
If the umask is 0077, the result is
rw-------
(octal 0600), which are the permissions the file
will be created with.
Use chmod(2)
to override.
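The masking of access_mode by the umask can be observed with the sketch below. The function name mode_after_umask and the /tmp path are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo (not from the notes): create a file with access_mode
   0666 under the given umask and return the permission bits it actually
   received, or -1 on error. */
int mode_after_umask(mode_t mask)
{
    char path[64];
    snprintf(path, sizeof path, "/tmp/umask_demo_%ld", (long) getpid());
    unlink(path);                        /* ignore the error if it is absent */

    mode_t old = umask(mask);            /* install the mask under test */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);                          /* restore the caller's umask */
    if (fd == -1)
        return -1;

    struct stat st;
    int result = (fstat(fd, &st) == -1) ? -1 : (int) (st.st_mode & 0777);
    close(fd);
    unlink(path);
    return result;
}
```

With a umask of 0077 the function is expected to return 0600, matching the example in the text.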
scanf
,
getchar
,
[printf
,
putchar
,] etc.
read [write] from the buffer, but once
empty [full] the system call
read(2)
[write(2)
] is made to
read [write] up to a
specified number of bytes from [to] the file. Obviously stdio function calls
should not ordinarily be mixed with system I/O calls.
Redirection |
To implement the shell redirection >> outfile
, for example, use:
fildes = open("outfile", O_WRONLY|O_CREAT|O_APPEND, 0666);
if (fildes == -1) { /* bad file name */ perror("demo"); exit(1);}
dup2(fildes,1); /* copy fildes to 1 */
close(fildes); /* conserve file descriptors by closing excess one */
Process ID Get or Set |
Process IDs have type pid_t
(a signed integer type whose width varies by system). Use:
#include <sys/types.h>
#include <unistd.h>
pid = getpid(); /* gives PID of current process */
ppid = getppid(); /* gives PPID (parent PID) of current process */
To facilitate signalling, processes are organized into process groups.
Every process is a member of a process group, which is identified by the PGID
(process group id). If the
PGID is the same as the PID then the process is
called a process group leader.
So the PGID of a process is the PID of its leader.
The PGID does not change on a fork( )
or
exec( )
.
Often, the PGID
will be the PID of the closest ancestor interactive shell, which is
the leader.
Get or set the PGID with the system calls:
pgid = getpgid(of_pid); /* get process group id for of_pid */
setpgid(of_pid, to_pgid); /* move of_pid into group to_pgid */
setpgid(of_pid, 0); /* make of_pid a leader */
If of_pid
is 0 in the above
then the current process's id is substituted.
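The calls above can be exercised with the following sketch, in which a child makes itself a process group leader and the parent checks the result. The function name child_leads_own_group is hypothetical. The parent repeats the setpgid( ) call so that the check does not depend on which process runs first.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (not from the notes): returns 1 if the child ended up
   leading its own process group, 0 if not, -1 if fork() failed. */
int child_leads_own_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* child: PGID becomes its own PID */
        for (;;)
            pause();            /* wait until the parent kills us */
    }
    setpgid(pid, pid);          /* parent: same change, harmless if already done */
    int is_leader = (getpgid(pid) == pid);
    kill(pid, SIGKILL);         /* clean up the demo child */
    waitpid(pid, NULL, 0);      /* and reap it */
    return is_leader;
}
```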
Termination Status and Waiting for Children |
When a process terminates with exit(13)
or with
return 13
from
main, a value is returned to the operating system.
Also when a process stops
due to a signal, for example, if a user
types ^C
or
^Z
or if there is a division
by zero, then the signal number is returned to the operating system. Use the
macros provided in <sys/wait.h>
(see wstat(5)
) to test or process termination status.
Typically 16 bits are used as follows but
you should use the macros to make your
code portable:
Cause | bits 15-8 | bit 7 | bits 6-0 | Access macros |
exit(n) or return n: | n | 0 | 0000000 | WIFEXITED, WEXITSTATUS |
signal (core dump): | 00000000 | 1 | signal number | WIFSIGNALED, WTERMSIG |
signal (no core dump): | 00000000 | 0 | signal number | WIFSIGNALED, WTERMSIG |
stop/suspend signals: | signal number | 0 | 1111111 | WIFSTOPPED, WSTOPSIG |
To wait for a child, use waitpid(2)
:
childpid = waitpid(pid, &status, style);
childpid
is the pid whose status is being returned.
If pid
is positive then wait for the child with that PID, if
-1 then wait for any child, if
0 then wait for any child in the same process group, and if
less than -1 then wait for any child in process group -pid
.
status
is an int
but note the address operator &
.
style
is
0
to block until some child terminates,
WNOHANG
to poll and return immediately, or
WUNTRACED
to also report children that stopped.
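The status macros and waitpid( ) can be combined as in this sketch. The function reap_child and its return convention (exit status, or the negated signal number) are hypothetical choices made for the example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): fork a child that either exits
   with `code` (when sig == 0) or kills itself with signal `sig`.
   Returns the exit status, the negated signal number, or -1000 on error. */
int reap_child(int code, int sig)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1000;
    if (pid == 0) {
        if (sig != 0)
            kill(getpid(), sig);     /* die from the signal */
        _exit(code);                 /* otherwise terminate normally */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)  /* block until this child ends */
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);      /* the n of exit(n) */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);        /* terminated by a signal */
    return -1000;
}
```

For instance, reap_child(13, 0) corresponds to the exit(13) case in the table, while reap_child(0, SIGKILL) exercises the signal rows.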
When a child dies, it disappears except for the PID and status. The child is
said to become a
zombie,
and appears as <defunct>
on ps -ef
listings.
Zombies only disappear once they have been waited on,
so every process should be
waited on by somebody ultimately.
A user whose processes do not wait on their
zombies could easily run out of processes so fork
will fail.
If the parent of a
child dies, then the orphaned child is adopted not by its grandparent but
instead by init
by using the simple device
of setting the child's PPID to 1.
init
is careful to wait on all its children including zombies.
File and I/O Control Functions |
fcntl
(2)
- file control.
stat
(2)
- file info (i-node).
ioctl
(2)
- I/O control: disk labels, files, mag tape, sockets, terminals.
tcget*
,
tcset*
- strongly recommended replacements for
most terminal related ioctl
.
See termios(3)
.
/* get window size */
#include <termios.h> /* termio(7I) */
#include <sys/ioctl.h> /* ioctl(); also TIOCGWINSZ and struct winsize on Linux and BSD */
#include <stdio.h>
int main(void) {
struct winsize wsz;
if (ioctl(1, TIOCGWINSZ, (void *) &wsz) == -1) { /* Get window size. */
perror("TIOCGWINSZ"); /* e.g. stdout is not a terminal */
return 1; }
printf("Window has %d rows, %d cols\n", wsz.ws_row, wsz.ws_col);
wsz.ws_row = 10;
wsz.ws_col = 90;
ioctl(1, TIOCSWINSZ, (void *) &wsz); /* Attempt to set window size. */
/* This sends SIGWINCH (ignored by default) to process group
which must catch signal and change the window. */
return 0; }
Problems With Multiple Processes |
Race: The outcome depends on who gets there first, that is, which process gets the most CPU cycles. Consider the following innocent looking code:
ppid = getpid(); /* ppid is the parent id */
if (fork() > 0) /* parent */ exit(0);
else { /* child */
if (ppid == getppid())
printf("parent alive"); /* if child gets CPU */
else printf("parent dead"); /* if parent gets CPU */
}
Job Control |
^Z
while the job is in the foreground.
(2) is requested by typing bg job identifier
.
A job can also be run in the background initially by
terminating the pipeline list with an &
.
(3) is requested by typing fg job identifier
.
Usually the command stty tostop
is put in the .profile
file.
This prevents background jobs from writing to the terminal, which ordinarily
they are allowed to do.
System Facilities for Implementing Job Control |
Process groups are organized into sessions (basically logins)
which are capable of having controlling terminals.
Session IDs are used.
A process may get or become the session leader
subject to restrictions by using:
getsid( )
,
setsid( )
.
Although the terminal is a (device) file and not a process, a terminal has an associated PGID. A process is allowed to read or write to the terminal only if its process group is the same as the terminal's PGID. First open a file descriptor for the terminal:
ttyd = open("/dev/tty",O_RDWR,0700); /* portable: ctermid(NULL) is "/dev/tty" */
termios
functions:
pgid = tcgetpgrp
(ttyd);
tcsetpgrp
(ttyd, to_pgid); /* but first ignore SIGTTOU - see below */
ioctl
on older systems:
ioctl(ttyd,TIOCGPGRP, &pgid); /* note a pointer & is passed */
ioctl(ttyd,TIOCSPGRP, &to_pgid);
There are four kinds of stop signals:
SIGTSTP
,
SIGSTOP
(an uncatchable
SIGTSTP
),
SIGTTOU
and
SIGTTIN
and a continue signal:
SIGCONT
. Usually these
signals are sent to process groups instead of individual processes. When a
process not in the terminal's group attempts to read from the terminal
all
members of that process's group are sent the
SIGTTIN
signal; the default action on
SIGTTIN
is to suspend the process.
SIGTTOU
is used similarly for write attempts.
SIGTSTP
and
SIGSTOP
also suspend a process. The
SIGCONT
signal restarts a process
(but, of course, if the terminal's group is not changed a
SIGCONT
after a
SIGTTIN
results in another
SIGTTIN
).
kill(2)
and
killpg(2)
are used to send these signals to processes and process groups.
kill( )
is usually used for both since by convention,
a negative PID
is taken to mean the process group with PGID equal to -PID.
waitpid(2)
is used to wait for children in a specific process group,
looking also for children that became stopped since the last report
and returning immediately:
waitpid( -pgid, &status, WUNTRACED | WNOHANG );
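The negative-PID convention and WUNTRACED can be tried out with the sketch below. The function name stop_child_group is hypothetical, and the waitpid( ) call here blocks (no WNOHANG) so that the demonstration is deterministic.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (not from the notes): put a child in its own process
   group, stop the whole group, and report the stop signal seen by
   waitpid(). Returns the signal number, or -1 on error. */
int stop_child_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);               /* child leads a new group */
        for (;;)
            pause();
    }
    setpgid(pid, pid);               /* parent does the same to avoid a race */

    kill(-pid, SIGSTOP);             /* negative PID: signal the whole group */

    int status, stopsig = -1;
    if (waitpid(-pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        stopsig = WSTOPSIG(status);  /* which signal stopped the child */

    kill(-pid, SIGKILL);             /* clean up the stopped group */
    waitpid(pid, NULL, 0);
    return stopsig;
}
```

A shell would normally add WNOHANG, as in the call above, so that checking on its jobs never blocks the prompt.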
Implementing Job Control |
system(3S)
,
popen(3S)
, from a makefile or
from emacs
's
compile feature, does not change
process groups. That way whenever one process stops, all others will stop too.
From now on we consider interactive shells only.
The shell maintains a table of jobs. Before every prompt for a command, the shell does a
waitpid(-1, &status, WNOHANG | WUNTRACED)
The shell executes a pipeline list by forking off a child to handle the list. This child immediately makes itself a process group leader by changing its PGID to its own PID. To avoid race conditions the shell should also make the same call to change the process group of the child, so that the group exists whichever process runs first. Then the child and any of its descendants can be signaled as a unit. No descendant will leave this group (excepting perhaps a new interactive subshell).
If the shell wishes to run this child in the foreground, then
it will change the terminal's process group to that of
the child and then do a
waitpid( )
on the child's pid.
The child now has control of the terminal and the shell disappears
as far as the user is concerned. A
^Z
, if not caught by the child, will stop
the child (and the group) and the shell's waitpid( )
will return.
The shell
determines that a stop was sent to the
child by looking at the status value. At
this point the child still has control of the terminal. The shell uses a
tcsetpgrp( )
to get the terminal back,
but here is a tricky detail: since the
shell is not in the terminal's group, the shell's call to
tcsetpgrp( )
will cause it to be sent a
SIGTTOU
signal.
In order to prevent being stopped by this signal, the shell must make
sure that it has ignored (SIG_IGN
) signal
SIGTTOU
before doing the
tcsetpgrp( )
. After recording the child as a
suspended job in the shell's table
of jobs and resetting SIGTTOU
, the shell proceeds
probably prompting for the next command.
If the shell wishes to run the child in the background, then no
waitpid( )
is
done and the terminal's process group remains that of the shell.
The entire child process group will be sent a
SIGTTIN
or
SIGTTOU
and become suspended if
any descendant attempts I/O on the terminal.
If no attempt is made the child
runs to completion and turns into a zombie until the shell finally
does a waitpid( )
for it.
If the shell receives a fg
command referring to a
background child,
a SIGTSTP
is sent to the child's process group,
the terminal is given to the child via a tcsetpgrp( )
,
a SIGCONT
is sent to the child's process group,
and then the shell does a waitpid( )
for this child.
If a bg
command is received for a currently
suspended job, then a SIGCONT
is
sent to the child's process group and the shell proceeds without doing a
waitpid( )
.
Last update: 2001 April 28