The Bourne shell sh and the Bash shell bash

 
Scripts
Make scripts executable with chmod u+x. The first line should be
#! /bin/sh
or
#! /bin/bash
or whatever the path to the interpreter is (the space after #! is optional, and on many systems at most one argument to the interpreter is allowed).
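For example, a minimal script (the file name hello and the message are arbitrary):

#! /bin/bash
## hello: print a greeting
echo "Hello, $LOGNAME"

Make it executable and run it with: chmod u+x hello ; ./hello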
 
Comments and Line Continuation
Everything from an unquoted # (at the start of a word) to the end of the line is a comment. It is helpful to use a single # for temporarily commenting out code and a double ## to indicate ``actual'' comments.

\ followed by a newline is a logical line continuation (except in comments).
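For example, the following two physical lines form one logical line:

echo This sentence is \
continued on the next line   ## prints: This sentence is continued on the next line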

 
Null Command
Symbol : denotes the null, do nothing, command. Do not use it like a comment since:
: hi ; trouble
actually runs command trouble since the ; ends the command.
 
Pipelines
To send the output (on stdout) of command1 as input (on stdin) of command2 use a pipeline:
command1 | command2
Similarly for multiple pipes.

To debug a pipeline use tee:

command1 | tee debugfile | command2
which records a copy of all data going through the pipe to debugfile.

The exit status is the exit status of the last (rightmost) command in a pipe. For example,

true | true | false has exit status of failure.
true | non-existent-program has exit status of failure.
non-existent-program | goodprogram has exit status depending on when goodprogram gets the ``broken pipe'' signal, and what it does with it.
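To act on a pipeline's status in a script, test $? (the status of the rightmost command) right after the pipeline. A minimal sketch; datafile and outfile are placeholder names:

sort datafile | uniq > outfile
if [ $? != 0 ] ; then
    echo >&2 "uniq (the last command in the pipe) failed"
fi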
 
Variables
A variable name is a string of alphanumerics or _ not starting with a digit. There are various special variables whose names are numbers or single characters. See characters significant to shell.

To assign to a variable use: VARIABLE=value 
If value contains spaces or other characters significant to the shell then quote value with ', " or \ as appropriate:

YourHOME=" Your home is $HOME "

Do not put spaces around the =   :

TRPRG=tr   ## correct assignment
TRPRG= tr   ## runs command tr, with variable TRPRG set to null string
TRPRG = tr   ## runs command TRPRG, with arguments = and tr
TRPRG =tr   ## runs command TRPRG, with argument =tr

$VARIABLE gives the value of VARIABLE. Use ${VARIABLE} to avoid ambiguities. For example, if FILE=output and you want to generate string outputlog then you need to use ${FILE}log not $FILElog because $FILElog evaluates variable FILElog not FILE. There are many useful modifiers, for example, ${TERM:-unknown} expands to the value of TERM, except it expands to unknown if TERM is not set or is the null string. ${TERM:=unknown} is similar except in addition it assigns the value in the latter case.
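A short sketch of these two modifiers (run it in a throw-away shell, since it unsets TERM):

unset TERM
echo ${TERM:-unknown}   ## prints unknown; TERM is still unset
echo ${TERM:=unknown}   ## prints unknown and also assigns TERM=unknown
echo $TERM              ## now prints unknown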

Note $$ is useful for generating temporary file names, since the process id $$ is unique among the processes running at any one time: TMPFILE=/var/tmp/data1.$$

 
Command Substitution
` ` means to replace by the output of the enclosed command (with trailing newlines removed). The symbol ` is the back quote, usually located away from the other two quotes on the keyboard. In Korn and bash shells, the nestable $( ) can be used instead. For example,
FILELIST=$( ls $HOME )   ## $FILELIST is list of all files in home
LOGFILE=printerlog.`date +%y%m%d`   ## extension by dates like 960925
To change to the directory in which the currently executing script is located use:
cd $( dirname $0 )
For instance, if the script was invoked as ../cmd/foo then $( dirname $0 ) is ../cmd
 
Quoting
To remove the significance, if any, to the shell of a single character, use a \ just ahead of the character. For an entire string, use enclosing single quotes ', which amounts to putting \ in front of each character. Double quotes can also be used, except that variable substitution ($) and command substitution are still done, and the leading \ is removed for each of: \`   \"   \$   \\
When using the value of a variable, double quotes are often desirable and recommended. "$VARIABLE" expands the variable but prevents the shell from interpreting the value.

For example, assume the only files in HOME=/homenew that match *.o are: a.o b.o c.o
Each of the following find commands removes these files (after asking):

find $HOME -name '*.o' -exec rm -i {} \;
find $HOME -name '*.o' -exec rm -i {} ';'
find "$HOME" -name '*.o' -exec rm -i {} \;
The last of these is the safest. Each find command above has 8 arguments:
/homenew -name *.o -exec rm -i {} ;

Contrast this with each of the following, where find executes but is faulty:

find $HOME -name *.o -exec rm -i {} ;
has 9 arguments because * and ; mean something to the shell and so are replaced:
/homenew -name a.o b.o c.o -exec rm -i {}
find "$HOME -name *.o -exec rm -i {} \;"
has one argument because spaces and \ are quoted inside double quotes:
/homenew -name *.o -exec rm -i {} \;
find '$HOME -name *.o -exec rm -i {} \;'
has one argument because everything is quoted:
$HOME -name *.o -exec rm -i {} \;

To get a double quote into a double quoted string use: "a single \" string". You cannot get a single quote into a single quoted string. Instead use "rarely ' needed" or 'rarely '\'' needed'

Note: "$@" is the sole exceptional to the above rules. "$@" means "$1" "$2" ... which is the typical intention when processing command arguments. "$*" follows the rules and gives "$1 $2 ..." which is much less useful.

 
Command Sequencing (or Command Lists)
cmd1 && cmd2   ## Do cmd1 then do cmd2 only if cmd1 succeeds (exit status 0).
cmd1 || cmd2   ## Do cmd1 then do cmd2 only if cmd1 fails.
cmd1 ; cmd2   ## Do cmd1. When cmd1 is done, do cmd2.
  ## A newline is similar to a ";"
cmd1 &   ## Do cmd1, but do not wait for it to finish (``background'' it).

These associate from left to right with precedence:

highest:  { } ( )
high:  |
medium:  && ||
low:  ; & newline

For example,

a|b|c   ||   d|e   &       f   &&   g   ||   h|i ;
has precedence as implied by the spaces and means the following. Start pipe a|b|c. If c fails (exit status not 0), then start pipe d|e. Because of the &, do not wait for any of this to complete. Instead after starting a|b|c, also start f. If f succeeds start g. Since we associate left to right, h|i will be run only if f && g fails, that is, if either f or g fails. Wait for f && g || h|i to complete.

&& is typically used in situations where it is essential not to run the second command if the first failed (to avoid data corruption). || is used where you want to do some error recovery or notification. However, an if is more versatile in most situations:

if [ $? = 0 ] ; then
    commands for success
else
    commands for failure
fi
 
Control Structures
The for control has form:
for VARIABLE in List_of_values
do
   A command list probably containing $VARIABLE somewhere.
done
The command list is executed once for each value in List_of_values, with VARIABLE set to each value in turn. The in clause can be omitted, in which case in "$@" (the command arguments) is the default.

Some examples are:

for filename in `ls`
do
    echo $filename:
    tail $filename
done ## Gives the tail of each file in current directory

for i in 6 5 4 3 2 1 ; do echo $i ; done ; echo BLAST OFF!
## countdown. Note the position of the semicolons; they are required.

for arg
do
    echo Next command argument is: "$arg"
done

The while control has form:

while head_command_list
do
   body_command_list
done
Execute the head, body, head, body, ... but terminate as soon as the exit status from the head is failure. Sometimes termination on success is desired, for which use until instead of while. For example:
while echo y
do
    : do nothing. Command list for a "for" or "while" cannot be empty.
done   ## Generate an endless number of "y"s. Yes!

The following example uses a read command. For each line read, the words (stripped of surrounding white space) are stored in the successive variables. The last variable will have all remaining words if any (without stripping of interword white space). For example if the current input line is:

A BIG     BAD WOLF
with possible white space at the beginning and end of the line, then after the read we will have: var1=A   var2=BIG   rest="BAD WOLF"
while read var1 var2 rest   ## reads line by line from stdin until EOF
do
    A command list probably having $var1, $var2, and $rest
done
A read without variable names will store the line in variable REPLY instead.
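For example, to report the first two fields of each line of a whitespace-separated data file (gradesfile is a placeholder name):

while read name score rest
do
    echo "$name scored $score"
done < gradesfile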

In for and while loops, break terminates the loop and continue terminates the rest of the current body.

The case control has form:

case "$VARIABLE" in
    file_patterns1) command_list1 ;; ## Note double semicolon
    file_patterns2) command_list2 ;;
    ...
esac
The command list corresponding to the first matching pattern, if any, is executed. *) is a catch all. [See the argument processing examples.]
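A concrete sketch (the option names are illustrative):

case "$1" in
    -v)  echo version 1.0 ;;
    -*)  echo >&2 "unknown option: $1" ; exit 1 ;;
    *)   echo "processing file $1" ;;
esac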

The if control has form:

if command_list_test1
then command_list_do1 ## not optional
elif command_list_test2 ## optional
then command_list_do2 ## required when elif appears
...
else command_list_do_n ## optional
fi
with meaning as one might expect. The elif-then and else clauses are optional. The exit status of the if and elif command lists determines which then or else clause to execute; only one of the latter will be executed. Note that because fi is required, there is no ``dangling else'' problem. The command_list_test is typically a test command (see below).
 
Test   [ ]
test is a condition testing utility, typically used in if statements. [ ] is a synonym for test. The Korn shell and bash also have a more powerful test as a builtin command: [[ ]]. An exit status of success (0) indicates true. This is the opposite of the C language, where an integer 0 indicates false.

For example, there are 6 ways to test if file foobar is a readable file:

if test -r foobar
    then ...
if test -r foobar ; then ...
if [ -r foobar ]
    then ...
if [ -r foobar ] ; then ...
if [[ -r foobar ]]
    then ...
if [[ -r foobar ]] ; then ...

Warning (Pitfall): In all cases test must end in a ; or newline or some other legal command termination such as && since then is not recognized as a keyword except at the beginning of a command. Since [ is merely a synonym for test, spaces are required around it. The only optional space is the one just ahead of the semicolon. [ reads better than test so use [.

With test you can, for example:

  1. make file checks: -r is readable, -w is writable, -x is executable, -f exists and is ordinary, -d is directory
  2. compare arithmetic expressions: -eq for equal, -le for less or equal, etc.
  3. compare strings with "string" = "pattern" or "string" != "pattern" but note that spaces are required and the quotes are for safety. For test and [ no pattern wild cards are available, just strings. Because of this limitation, if you are not using the Korn shell and you need pattern matching, use the case construct instead. Some examples are:
    if [ "$TERM" != sun ] ; then ## commands for non-sun terminals
    if [[ "$EDITOR" = *emacs ]] ; then ## if editor is emacs, xemacs, ...
    if [[ "$PAGER" = "" ]] ; then ## PAGER not set. Note all " are required.
  4. The above can be combined using not (!), and (-a), or (-o), and grouping with \( and \) (the parentheses must be escaped from the shell). In the [[ ]] test of bash and ksh, use && for and and || for or instead.
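For example, both of the following test that $FILE (a placeholder variable) names a readable ordinary file:

if [ -f "$FILE" -a -r "$FILE" ] ; then echo "$FILE is usable" ; fi
if [[ -f "$FILE" && -r "$FILE" ]] ; then echo "$FILE is usable" ; fi   ## bash/ksh form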
 
File Redirection
File redirection forces reads from stdin to come from a different file than the current one. Initially reads usually come from the terminal. Similarly for writes. Recall stdin, stdout, stderr are given the values 0, 1 and 2, called file descriptors. Redirections only affect the commands they modify.

Expression:   Meaning:
> file        stdout to file (writing)
>> file       stdout appended to end of file (``seek'' to end before every write)
< file        stdin from file (reading)
2> file       stderr to file
4<> file      descriptor 4 opened for read and write on file (a disk file probably)
Note that spaces are optional, > is really 1>, < is really 0< and no space is allowed between digits and < and >.

A typical use is:

sort < inputFile > outputFile 2> errorFile
The commands: command < infile and cat infile | command are similar but the first one uses one fewer process. The command >z opens file z for output (truncating it), runs no other command, then closes z. The net result is an empty file z.

``Here documents'' are introduced with <<. Input comes from following lines in the script. The word after the << determines the extent. For example:

command <<EOF
some lines of input
EOF
## This line is outside the here document.
## Make sure EOF is flush left and not mistyped.

Note that the shell will expand $VARIABLES and command substitutions in the here document. Suppress this behaviour by using <<\word, for example, <<\EOF above. When leading tabs are not important, use <<-word, which makes the shell strip leading tabs (but not blanks!):

command <<-EOF ## lines below are tab indented
    tab indented lines
    more pretty lines - next line also tab indented
    EOF

Redirection and duping are done sequentially from left to right within each command:

wc infile >za >zb >zc ## za, zb are emptied and word count is in zc
 
File Duping
n>&m or n<&m makes descriptor n be whatever m currently is, where n and m are digits. To avoid confusing yourself, note the first number always refers to the affected descriptor, and &m can be thought of as indirection.

For example, the standard idiom for outputting errors (to stderr) is:

echo >&2 Warning: Conflicting options.
To combine output and errors into a file w:
sort >w 2>&1
whereas the following is wrong: the 2>&1 is done first, while stdout is still the terminal, so errors go to the terminal and only the normal output goes to the file.
sort 2>&1 >w
To ignore errors:
sort >w 2>/dev/null

n>&- closes descriptor n; writes attempted on it afterwards fail with errors.
n>/dev/null leaves n open, but throws away everything that is written. This is usually the better choice.

The redirections are done only within each command and from outside in. Consider:

{ echo hi | wc -illegal -options 2>&1 ; } 2>errorfile >testfile
If one were to read left to right, one would expect the errors from wc to appear on the terminal, but this is not the case. The redirections outside the {   } take effect first, so stdout is testfile. The 2>&1 takes the errors to the stdout of the {   }, that is, testfile. So all output including errors is sent to testfile.
 
exec
exec replaces the current process with a new one. For example, exec /bin/bash replaces the current shell with a bash shell.

exec with just file redirections rearranges the file descriptors. The redirections affect all subsequent commands.
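For example, to send all of a script's remaining output to a log file (logfile is a placeholder name):

exec >logfile 2>&1   ## stdout and stderr of every later command go to logfile
echo "this line ends up in logfile"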

 
trap
Signals can be handled very easily using the trap command.
trap 'commands to execute on signal' signal_numbers
trap '' signal_numbers   ## to ignore signals
trap signal_numbers   ## to reset to default actions
Signal 0 signifies an ordinary exit.
trap '/usr/bin/rm *.o ' 0 ## removes all object files on ordinary exit.
trap '/usr/bin/rm *.o ; exit 1'   1 2 15 ## removes on signals 1, 2 or 15.

An application should, if possible, not have code to handle signals, since such code is difficult to write well and not very portable. Instead, consider using a shell wrapper to handle signals.
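A sketch of such a wrapper, assuming a program realprogram whose temporary output must be cleaned up (both names are placeholders):

#! /bin/sh
TMPFILE=/var/tmp/wrap.$$
trap '/bin/rm -f $TMPFILE' 0    ## clean up on ordinary exit (signal 0)
trap 'exit 1' 1 2 15            ## on signals 1, 2, 15: exit, which fires the 0 trap
realprogram "$@" > $TMPFILE
## ... use $TMPFILE here ...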

 
Argument Handling
The recommended way to handle argument processing is with getopts. Commands shift and set are useful in this regard. shift n discards $1 through $n and moves $n+1,... down to $1,....
set -- zero_or_more_arguments
will set $1 to first argument, if any, $2 to second, etc., unset the rest, and adjust $#. Bourne shell sh (but not bash) has the misfeature that set -- does nothing. To clear arguments in sh use: set -- "" ; shift
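For example, a getopts loop handling -v and -o file (the option letters are illustrative):

while getopts vo: opt
do
    case "$opt" in
        v)  VERBOSE=1 ;;
        o)  OUTFILE="$OPTARG" ;;
        \?) echo >&2 "usage: $0 [-v] [-o file] file..." ; exit 2 ;;
    esac
done
shift `expr $OPTIND - 1`   ## discard the parsed options; $1 is now the first operand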
 
The Environment
On creation every process is given an environment which is a copy of its parent's environment. The process can use and alter its environment, but since it is merely a copy, this does not affect the parent's environment in any way. The process's future children will get the altered environment.

Environment VARIABLES are by convention always UPPERCASED. Use command env to see the environment. Typically the environment is used to configure, set preferences, or pass small amounts of data but the environment cannot exceed some small size (5K on some systems) so conserve space.

Some key variables are:

PATH which is exec's search path,
MANPATH which is man's search path,
and other similar path variables: the values are colon separated lists of directories to search.
TERM, TERMCAP are for terminal settings.
MAIL is the mail directory.
CC is the name of C compiler (cc, gcc are common choices).
HOME and LOGNAME are the user's home directory and login name.
PS1, PS2, and so on are the prompt strings.

To make a VARIABLE part of the environment, that is, exportable to future children use:

export VARIABLE ; VARIABLE=value
export VARIABLE=value   ## bash and ksh
setenv VARIABLE value   ## csh
Note that statements like:
VAR1=value1 VAR2=value2 command
put VAR1 and VAR2 in the environment of command.
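For example, to see that children only receive exported variables:

GREETING=hello
bash -c 'echo child sees: $GREETING'   ## prints nothing after the colon: not exported
export GREETING
bash -c 'echo child sees: $GREETING'   ## prints: child sees: hello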
 
Subshells and Sourcing
(   ) indicates that the enclosed commands are to be run in a subshell, that is, a shell which is a child of the current shell. Almost all commands are implicitly executed in a subshell, so explicit subshells are mostly just useful for commands that are not, namely builtins and functions. Subshells are useful to:
  1. Temporarily change directories or variables:
    ( cd /home/foo/bar ; command1 )
    ( PATH=/usr/bin:/usr/local/bin ; command1 )
    PATH=/usr/bin:/usr/local/bin command1 ## easier

  2. Copy whole directories across file systems without using ftp by using tar, which packages and unpackages file trees:
    ( cd source_dir && tar cf - . ) | ( cd destination_dir && tar xpf - )

  3. Insert data in streams. [See the canned_in example.]

Suppose you have commands in a file .configrc that are to affect the environment of the current shell, for example, cd /home/project or TERM=vt100. No script or executable command can accomplish this, since a child only has a copy of the parent shell's environment. Instead, source (command ``.'') the file, which amounts to reading commands from the file and then executing them in the current environment:

. .configrc
For example, when you log in to bash, it in effect does:
. .bash_profile ;   if [ "$ENV" != "" ] ; then . $ENV ; fi
Each time you start a subshell, it does:
        if [ "$ENV" != "" ] ; then . $ENV ; fi
Typically ENV is set to be .kshrc in your .profile file.
     
The Processing Loop
• Prompt.
• Read input.
• Parse input in the following order:
  • token splitting (recognizing: >>   ;   if   etc.)
  • alias substitution
  • ~ substitution (~user becomes the user's home directory, typically /home/user)
  • command substitution (a subshell is started to handle this)
  • parameter expansion ($VARIABLE replaced)
  • file wildcard expansion
  • blank interpretation (splitting the command string into $0 $1 $2 ...)
  • quote processing (removing quotes)
  • file redirection
• Execute commands (using fork, exec, pipe, dup and other system calls).
• Report status changes: if any children stopped or if backgrounded ones exited.
     
Evaluation Order and Eval
The parsing order above matters. For example, the command tset -s sun prints out commands to correctly set TERM and TERMCAP for SUN terminals. This command must be run in the current shell. Yet $( tset -s sun ) fails, because the output from tset is not substituted until after token splitting has already occurred. The output could be sent to a file and then the file could be sourced, but this is clumsy. The solution is to use the builtin command eval, which reads its arguments as input to the shell and executes them in the current shell:
eval $( tset -s sun )
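The same pattern applies to other programs that print shell commands on stdout; for example, ssh-agent prints assignments for its environment variables:

eval `ssh-agent`   ## sets SSH_AUTH_SOCK and SSH_AGENT_PID in the current shell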
     
Functions
Functions are very similar to scripts. Arguments are $1, $2, etc. Functions must be defined before the first reference. Unlike most other commands, functions execute in the current process, not a subshell, so they can affect the environment. To define a function use the syntax:
functionname( ) {
    commands for function
}
To invoke it, just use the name of the function: functionname arg1 arg2 ... as you would for any other command.
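For example, a small function for error messages (the name warn is arbitrary):

warn( ) {
    echo >&2 "Warning: $*"   ## $* here means the function's own arguments
}
warn disk nearly full on /var   ## prints to stderr: Warning: disk nearly full on /var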
     
More Builtins
cd changes directory.
set configures the shell; use set -x to debug a script.
umask sets the file permission (creation) mask.
ulimit sets limits on resources.
     
Additional Features of bash
Besides the differences mentioned above, bash offers:
• Command line editing and file name completion.
• Job control: backgrounding and foregrounding jobs.
• typeset to alter the behaviour of variables: integer, local, etc.
• printf gives formatted printing.
• Aliases: use these only to give a short name for an executable file; use functions for anything more elaborate.
• Arithmetic: evaluate integer arithmetic expressions with $(( )) (see the sketch after this list).
• Arrays.
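Sketches of the last two features (bash syntax):

i=7
echo $(( i * 3 + 1 ))     ## integer arithmetic: prints 22
colors=(red green blue)   ## array assignment
echo ${colors[1]}         ## indexing starts at 0: prints green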
An alternate shell, ksh, also offers many of these features. Csh and tcsh have some of these features but are quite inferior for scripting.

Note that advanced features require a capable operating system. Thus job control is not available when the operating system is incapable of handling the needed signals or is incapable of multitasking.

-Copyright 1996, 1999, 2000, 2009 by W. Holzmann

Last update: 2009 January 22