File I/O; Files and Directories - Overview

Reference: Details are in Chapters 3 and 4, Advanced Programming in the UNIX Environment, by W. Richard Stevens.

Many of the following library functions and system calls take either a pathname (an absolute or relative filename: absolute if it starts with /, otherwise relative to the current working directory) or a file descriptor (called fd or fildes). Where similar functions exist for both, such as stat( ) and fstat( ), the file descriptor version has the prefix "f". A version with the prefix "l", such as lstat( ), does not follow symbolic links, and so acts on the symbolic link itself instead of the file it points to.

open - open a file for I/O, returning the file descriptor

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
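Here flags is exactly one of O_RDONLY, O_WRONLY or O_RDWR, which may be bitwise OR'ed (|) with zero or more of O_CREAT | O_EXCL | O_TRUNC | O_APPEND. The mode is used only with O_CREAT; it takes the same access permission bits (see the masks below) that stat( ) reports in st_mode, and any bits set in the process's umask are cleared from it. On error, open( ) and creat( ) return -1 and set errno.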
creat( ) is open( ) with flags O_WRONLY | O_CREAT | O_TRUNC.
Note: fopen( ) is different because it additionally sets up an STDIO buffer and I/O is buffered.
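A minimal sketch of open( ) with O_CREAT | O_EXCL, using a hypothetical helper named create_new_file( ): unlike creat( ), which truncates an existing file, this fails with errno set to EEXIST if the name is already taken. The 0644 mode is an arbitrary choice for illustration and is further reduced by the process's umask.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a NEW file and write msg into it.
 * O_EXCL together with O_CREAT makes open() fail with EEXIST if the
 * name already exists, so an existing file is never truncated.
 * Returns 0 on success, -1 on error (errno is set). */
int create_new_file(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;                      /* e.g. errno == EEXIST */

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    if (n == -1 || (size_t)n != len) {  /* treat a short write as failure */
        int saved = (n == -1) ? errno : EIO;
        close(fd);                      /* close() may change errno ... */
        errno = saved;                  /* ... so restore the real cause */
        return -1;
    }
    return close(fd);
}
```

A first call such as create_new_file("/tmp/out.txt", "hello\n") succeeds and a second identical call returns -1 with EEXIST. Because the existence check and the creation happen in one system call, no other process can create the file in between, which a separate stat( ) followed by creat( ) would not guarantee.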

close - close file descriptor

int close(int fd);
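In simple situations this function closes a file. What it actually does is deallocate the file descriptor. Several descriptors can refer to the same open file description (for example after dup( ) or fork( )); only when all of them have been closed is the open file description freed. Even then the file itself stays open if other open file descriptions refer to it, for example because another process opened it separately.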
Note: fclose( ) is different because it additionally flushes an STDIO buffer.
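To illustrate the difference between a descriptor and an open file description, here is a sketch using dup( ) (not covered above), which returns a second descriptor for the same open file description. The helper name write_after_closing_original( ) is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: close() releases a descriptor, not the open file.
 * dup() makes a second descriptor for the SAME open file description,
 * so after close(fd) the file is still open through fd2.
 * Returns the number of bytes written through fd2, or -1 on error. */
ssize_t write_after_closing_original(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    int fd2 = dup(fd);          /* second descriptor, shared description */
    close(fd);                  /* frees only the descriptor number fd   */
    if (fd2 == -1)
        return -1;

    ssize_t n = write(fd2, "still open\n", 11);  /* description still alive */
    close(fd2);                 /* last descriptor: description freed now */
    return n;
}
```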

lseek - reposition read/write file offset

off_t lseek(int fd, off_t offset, int whence);
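The offset is relative to the beginning (whence=SEEK_SET), the current position (whence=SEEK_CUR) or the end (whence=SEEK_END) of the file. lseek( ) returns the resulting offset measured from the beginning of the file, or (off_t)-1 on error.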
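As a small sketch (the helper file_size_by_lseek( ) is hypothetical), lseek( ) with whence=SEEK_END and offset 0 yields the file size, because the returned offset is always measured from the beginning of the file. In practice fstat( ) and its st_size member, described below, are the more usual way to get the size.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: find a file's size by seeking to its end.
 * lseek() returns the new offset from the start of the file, so
 * SEEK_END with offset 0 gives the size in bytes.
 * Returns the size, or -1 on error. */
off_t file_size_by_lseek(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```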

read, write - file I/O

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
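read( ) reads up to count bytes from fd into buf; write( ) writes up to count bytes from buf to fd. Both return the number of bytes actually transferred, which may be less than count (read( ) returns 0 at end of file), or -1 on error. There are also readv( ) and writev( ), which do scatter and gather I/O respectively, using a vector of buffers; this is very useful for network I/O, where data is packet based.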
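Because read( ) and write( ) may transfer fewer bytes than requested (for example on pipes and sockets, or when a signal arrives), robust code loops. The following sketch, with hypothetical helper names, shows a full-write loop, a copy loop that stops when read( ) returns 0, and a two-buffer gather write using writev( ) and struct iovec from <sys/uio.h>.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write all len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        p += n;                     /* advance past what was written */
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd until read() reports EOF (0). */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) == -1)
            return -1;
    }
}

/* Gather write: header and body leave in ONE writev() call.
 * (A short write is still possible; this sketch does not retry.) */
ssize_t write_header_body(int fd, const char *hdr, size_t hlen,
                          const char *body, size_t blen)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;   /* iov_base is void *, not const */
    iov[0].iov_len  = hlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = blen;
    return writev(fd, iov, 2);
}
```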

fcntl, ioctl - file descriptor and I/O device control

int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
int ioctl(int d, int request, ...);
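fcntl( ) is the usual way to inspect or change the flags of an already open descriptor. A minimal sketch with hypothetical helper names, using the F_GETFL and F_SETFL commands and the O_ACCMODE mask:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK for an existing descriptor.
 * F_GETFL reads the file status flags and F_SETFL writes them back;
 * reading first preserves flags such as O_APPEND.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* The access mode is a small field, not a single bit:
 * mask with O_ACCMODE and compare against O_RDONLY/O_WRONLY/O_RDWR. */
int is_write_only(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_ACCMODE) == O_WRONLY;
}
```

Reading the flags before setting them matters: F_SETFL replaces the settable status flags, so passing O_NONBLOCK alone would clear a flag such as O_APPEND.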

stat, fstat, lstat - get file status

int stat(const char *file_name, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);

buf is a pointer to a struct stat for which space must be set aside:

          struct stat                    /* May vary - may have more members */
          {
              dev_t         st_dev;      /* device number                   */
              dev_t         st_rdev;     /* device ID (if special file)     */
              ino_t         st_ino;      /* inode (file serial number)      */
              unsigned long st_blksize;  /* blocksize for filesystem I/O    */
              unsigned long st_blocks;   /* number of blocks allocated      */
              /***   MEMBERS OF MOST INTEREST:          Used by "ls"      ***/ 
              mode_t        st_mode;     /* file type & mode (permissions)  */
              nlink_t       st_nlink;    /* number of hard links            */
              uid_t         st_uid;      /* user ID of owner                */
              gid_t         st_gid;      /* group ID of owner               */
              off_t         st_size;     /* total size, in bytes            */
              time_t        st_atime;    /* time of last Access             */
              time_t        st_mtime;    /* time of last Modification       */
              time_t        st_ctime;    /* time of last file status Change */
          };

Note that file names are NOT part of struct stat because they are stored in directories, not with the file! A file can have many names. For more details about this, about inodes (st_ino) and about hard links (st_nlink), see the next section.
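The "l" prefix convention from the overview shows up when lstat( ) and stat( ) are called on the same name. A minimal sketch with a hypothetical helper is_link_to_regular_file( ); as described above, the caller supplies the struct stat storage.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look at a pathname with and without following
 * a symbolic link.
 * Returns 1 if path is a symbolic link whose target is a regular file,
 * 0 otherwise (including a dangling link), -1 if lstat() fails. */
int is_link_to_regular_file(const char *path)
{
    struct stat lsb, sb;             /* space for the results */

    if (lstat(path, &lsb) == -1)     /* describes the link itself */
        return -1;
    if (!S_ISLNK(lsb.st_mode))
        return 0;
    if (stat(path, &sb) == -1)       /* follows the link; fails if dangling */
        return 0;
    return S_ISREG(sb.st_mode) ? 1 : 0;
}
```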

The following can be applied to struct stat's st_mode member:

       File type macros (Figure 4.1):
           S_ISREG(m)  regular file?
           S_ISDIR(m)  directory?
           S_ISCHR(m)  character device?
           S_ISBLK(m)  block device?
           S_ISFIFO(m) fifo?
           S_ISLNK(m)  symbolic link?
           S_ISSOCK(m) socket?

       File type masks and special mode bits (not in book; see the man page for stat):
           S_IFMT   0170000 file type bitfields
           S_IFSOCK 0140000 socket               (network pipe-like object)
           S_IFLNK  0120000 symbolic link        (has name of another file)
           S_IFREG  0100000 regular file
           S_IFBLK  0060000 block device         (disk device)
           S_IFDIR  0040000 directory
           S_IFCHR  0020000 character device     (tty, printer, etc.)
           S_IFIFO  0010000 fifo                 (pipe)
           S_ISUID  0004000 set UID bit (prg gains permissions of owner)
           S_ISGID  0002000 set GID bit (prg gains permissions of group)
           S_ISVTX  0001000 sticky bit (world write dirs: owner only rm file)

       Masks for access permissions - mode (Figure 4.4):
           S_IRWXU  0000700 user (file owner) has rwx permission
           S_IRUSR  0000400 user has read permission
           S_IREAD          (obsolete synonym for S_IRUSR)
           S_IWUSR  0000200 user has write permission
           S_IWRITE         (obsolete synonym for S_IWUSR)
           S_IXUSR  0000100 user has execute permission
           S_IEXEC          (obsolete synonym for S_IXUSR)
           S_IRWXG  0000070 group has read, write and execute permission
           S_IRGRP  0000040 group has read permission
           S_IWGRP  0000020 group has write permission
           S_IXGRP  0000010 group has execute permission
           S_IRWXO  0000007 others have read, write and execute permission
           S_IROTH  0000004 others have read permission
           S_IWOTH  0000002 others have write permission
           S_IXOTH  0000001 others have execute permission

Many of the following functions have names corresponding to utilities. The utilities are often just front ends to these: they process their arguments, frequently allowing a wider range of options, and then call these functions.

chmod, fchmod - change mode (file permissions)

int chmod(const char *pathname, mode_t mode);
int fchmod(int fd, mode_t mode);

umask - set file creation mask

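mode_t umask(mode_t mask);
umask( ) sets the process's file mode creation mask and returns the previous mask; it cannot fail. Permission bits set in the mask are cleared from the mode passed to open( ), creat( ) and mkdir( ).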
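A sketch of the umask effect, using a hypothetical helper mode_after_umask( ): with a mask of 022, a file created with mode 0666 ends up with 0644, because bits set in the mask are cleared. The sketch restores the caller's previous mask, which umask( ) returns. On systems with ACLs, a default ACL on the directory can override the umask.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create path with mode 0666 under a temporary
 * umask of 022 and return the permission bits the file actually got
 * (0666 & ~022 == 0644). Returns -1 on error. */
int mode_after_umask(const char *path)
{
    mode_t old = umask(022);            /* umask() returns the old mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);                         /* restore the caller's mask */
    if (fd == -1)
        return -1;

    struct stat sb;
    int rc = fstat(fd, &sb);
    close(fd);
    return rc == -1 ? -1 : (int)(sb.st_mode & 0777);
}
```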

chown, fchown, lchown - change owner/group of file

int chown(const char *pathname, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *pathname, uid_t owner, gid_t group);

utime - change access or modification time of file

int utime(const char *pathname, struct utimbuf *buf);
See Figure 4.14 for more details.
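A minimal sketch of utime( ) with a hypothetical helper set_file_times( ), similar in spirit to "touch -t". struct utimbuf (from <utime.h>) has the members actime and modtime; passing NULL instead of a struct sets both times to the current time.

```c
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical helper: set both the access and the modification time
 * of path to t (seconds since the Epoch). The kernel sets st_ctime to
 * the current time as a side effect.
 * Returns 0 on success, -1 on error. */
int set_file_times(const char *path, time_t t)
{
    struct utimbuf times;
    times.actime  = t;      /* becomes st_atime */
    times.modtime = t;      /* becomes st_mtime */
    return utime(path, &times);
}
```

Newer systems also provide utimes( ) and utimensat( ) with finer time resolution, and recent POSIX editions mark utime( ) as obsolescent.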

symlink, readlink - make or get value of symbolic link

int symlink(const char *oldpath, const char *newpath);
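ssize_t readlink(const char *pathname, char *buf, size_t bufsiz);
symlink( ) creates the name newpath as a symbolic link containing the text oldpath, which need not exist. readlink( ) places the link's text in buf without a terminating '\0' and returns the number of bytes placed, or -1 on error.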
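Because readlink( ) neither adds a terminating '\0' nor reports truncation of a long link, callers usually wrap it. A sketch with a hypothetical helper read_link_string( ), which treats a completely filled buffer as possible truncation:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a symbolic link's text into buf as a C
 * string. One byte is reserved for '\0'; if readlink() fills the rest
 * completely, the text may have been cut off, so report an error.
 * Returns the length of the link text, or -1 on error/truncation. */
ssize_t read_link_string(const char *path, char *buf, size_t size)
{
    if (size == 0)
        return -1;
    ssize_t n = readlink(path, buf, size - 1);
    if (n == -1 || (size_t)n == size - 1)   /* error, or possibly truncated */
        return -1;
    buf[n] = '\0';
    return n;
}
```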

remove, unlink, rmdir - remove name

int remove(const char *pathname);
int unlink(const char *pathname);
int rmdir(const char *pathname);
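remove( ) behaves like unlink( ) for a file and like rmdir( ) for a directory. rmdir( ) removes an empty directory. unlink( ) removes the name from a directory and decrements the inode's link count; the inode and data blocks are freed only when that count reaches zero (this was the last hard link) and no process still has the file open. For more details about inodes and hard links see the next section.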
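A practical consequence of this rule is that a file which is still open survives unlink( ) until it is closed, a common way to get an anonymous temporary file. A sketch with a hypothetical helper nlink_after_unlink( ):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: remove the only name of a file while it is open.
 * The directory entry disappears at once, but the inode and data blocks
 * are freed only when the last descriptor is closed.
 * Returns the link count seen through the descriptor after unlink()
 * (expected 0), or -1 on error. */
long nlink_after_unlink(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    if (unlink(path) == -1) {          /* name removed; file still open */
        close(fd);
        return -1;
    }

    struct stat sb;
    long result = -1;
    if (write(fd, "x", 1) == 1 &&      /* the nameless file is still usable */
        fstat(fd, &sb) == 0)
        result = (long)sb.st_nlink;
    close(fd);                         /* now the storage can be reclaimed */
    return result;
}
```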

mkdir - make a directory

int mkdir(const char *pathname, mode_t mode);

chdir, fchdir - change working directory (cd in shells)

int chdir(const char *pathname);
int fchdir(int fd);
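fchdir( ) is handy for returning to a saved directory. A sketch with a hypothetical helper absolute_dir_path( ) that records the current directory as an open descriptor, changes into another directory, asks getcwd( ) for its absolute name, and then changes back:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: find the absolute path of directory dir by
 * visiting it with chdir() and calling getcwd(), then return to the
 * original working directory with fchdir().
 * Returns 0 on success, -1 on error. */
int absolute_dir_path(const char *dir, char *buf, size_t size)
{
    int saved = open(".", O_RDONLY);    /* remember where we are */
    if (saved == -1)
        return -1;

    int rc = -1;
    if (chdir(dir) == 0 && getcwd(buf, size) != NULL)
        rc = 0;

    if (fchdir(saved) == -1)            /* always try to go back */
        rc = -1;
    close(saved);
    return rc;
}
```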

opendir, readdir, rewinddir, closedir - directory scanning functions

DIR *opendir(const char *name);
struct dirent *readdir(DIR *dir);
void rewinddir(DIR *dir);
int closedir(DIR *dir);
First call opendir( ) to get the DIR * pointer. Then read each directory entry in turn using readdir( ), which returns NULL at the end of the directory. If desired, reset to the beginning with rewinddir( ). When done, call closedir( ). See 4.21 for an extensive example. Also see page 4 for a simple ls example.
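The same sequence as a minimal sketch with a hypothetical helper list_dir( ) (not the book's example). Entries come back in directory order, not sorted, and "." and ".." are skipped as "ls -A" does.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the names in directory path and return
 * how many there were, not counting "." and "..".
 * Returns -1 if the directory cannot be opened. */
int list_dir(const char *path)
{
    DIR *dp = opendir(path);
    if (dp == NULL)
        return -1;

    int count = 0;
    struct dirent *ep;
    while ((ep = readdir(dp)) != NULL) {     /* NULL: end (or error) */
        if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
            continue;
        printf("%s\n", ep->d_name);
        count++;
    }
    closedir(dp);
    return count;
}
```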

mount, umount - mount or unmount a file system

mount( ); umount( );
These system calls and their utility namesakes are important. Associated with them is a file, /etc/fstab or /etc/vfstab depending on the system, which specifies mount points and mount options. Typically each physical device, usually a disk, holds one or more file systems; each can be imagined as a file tree rooted at its own /. Mounting means logically attaching one of these file systems at a directory, called the mount point, in the main / tree. This hides the hardware implementation of the file tree from users and gives them a clean view of the file system as a single logical tree under /, instead of the forest of / trees that it actually is. For example, if a disk file system containing the file /bin/xemacs is mounted at /usr/local, then a reference by a process to /usr/local/bin/xemacs is translated by the kernel into a reference to /bin/xemacs on that disk's file system. On many systems /usr/local, /home, /var and a number of other directories are mount points.
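Ordinary programs can observe mounting through st_dev in struct stat, since crossing into a different file system changes the device number. A heuristic sketch with a hypothetical helper is_mount_point( ) that compares a directory with its parent ".."; it cannot detect a file system mounted onto a directory of the same file system (for example some bind mounts), so treat it as an illustration only.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical heuristic: dir is a mount point if "dir/.." lies on a
 * different device, or if dir is its own parent (the root "/").
 * Returns 1 (mount point), 0 (not), or -1 on error. */
int is_mount_point(const char *dir)
{
    char parent[4096];
    struct stat self, up;

    int n = snprintf(parent, sizeof parent, "%s/..", dir);
    if (n < 0 || (size_t)n >= sizeof parent)
        return -1;                        /* path too long */
    if (stat(dir, &self) == -1 || stat(parent, &up) == -1)
        return -1;
    if (self.st_dev != up.st_dev)
        return 1;                         /* parent is on another device */
    return self.st_ino == up.st_ino;      /* "/" is its own parent */
}
```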
 
The UNIX file system - Inodes and Hard Links
Unix simultaneously supports many types of file systems through a vnode/vfs interface (v=virtual, fs=file system). The main disk-based file system is ufs (=UNIX fs, derived from the BSD ffs=fast fs) on mainline UNIX systems, s5fs (=System V fs) on very old UNIX systems, and ext2 (=second extended fs) on Linux. In addition there is support for NFS (=network fs), RFS (=remote fs), tmpfs (=temporary fs: a memory or "ram disk" fs), specfs (=special fs: devices), proc fs (a pseudo fs at /proc which provides an interface to the kernel), dos-fat-vfat-pc fs (dos=disk operating system, fat=file allocation table), and numerous other file systems including logging, distributed and experimental ones.

A UFS file system (on a disk) consists of a boot block (for booting the machine), super blocks (holding file system information, including free and allocated blocks), inode blocks (file information and indices to data blocks; i-node=index-node) and data blocks. See Figures 4.7-4.9 for diagrams of the s5fs structure. Directories are files whose inode has a type flag telling the system that their data blocks are to be interpreted as lists of (name, inode number) pairs.

A file consists of name entries in directories, one inode and some data blocks, as follows. The file has an inode number that is unique within its file system. There are one or more (name, inode number) pairs in various directory data blocks (all on the same file system). Each such entry is said to be a hard link to (to point to) the identified inode. An inode (index node) holds the information for the file, such as file type, mode, owner, group, size and times, together with indices to the data blocks. For example:


$ ls -idl /etc/rc.d/init.d/x* /etc/rc.d/rc*.d/*xdm* /etc  ## -i shows inodes

 326186 -rwxr-xr-x   1    root root  1082 May 10  1998 /etc/rc.d/init.d/xntpd
 326190 -rwxr-xr-x   2    root root   391 Dec 31 10:44 /etc/rc.d/init.d/xdm
 326190 -rwxr-xr-x   2    root root   391 Dec 31 10:44 /etc/rc.d/rc3.d/S98xdm
   8193 drwxr-xr-x  27    root root  3072 Mar  6 10:49 /etc
^^^^^^^             ^^               ^^^^              ^^^^^^^^^^^^^^^^^^^^^^ 
INODE-NUMBER      link-count         Size              Name  
shows that the inode numbered 326190 (a file of size 391 bytes) has two name entries, because its link count is 2: xdm in /etc/rc.d/init.d and S98xdm in /etc/rc.d/rc3.d.
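The xdm/S98xdm situation can be reproduced with link( ), the hard-link counterpart of symlink( ) (not listed above). A sketch with a hypothetical checker check_hard_link( ) that confirms both names share one inode and that st_nlink went up by one:

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: give an existing file a second name with link().
 * Only a new directory entry is made; both names then refer to the
 * same inode, whose link count rises by one.
 * Returns 1 if the checks hold, 0 if not, -1 on error. */
int check_hard_link(const char *existing, const char *newname)
{
    struct stat before, a, b;

    if (stat(existing, &before) == -1)
        return -1;
    if (link(existing, newname) == -1)      /* e.g. EEXIST, EXDEV */
        return -1;
    if (stat(existing, &a) == -1 || stat(newname, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino &&
           a.st_nlink == before.st_nlink + 1;
}
```

Because a hard link is just another (name, inode number) pair, link( ) fails with EXDEV if newname would be on a different file system.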

On traditional UNIX file systems such as ufs and ext2, a directory's link count is two more than the number of its subdirectories. Using the directory /etc above (inode number 8193, link count 27) as an example, inode 8193 is listed in the data block of / as "etc", in the data block of /etc itself as ".", and in the data block of each subdirectory, such as /etc/rc.d, as ".."; so /etc has 27 - 2 = 25 subdirectories. Notice that if just the parent directory's data block becomes corrupt, the name "etc" is lost, but the tree rooted at /etc (whose name is now unknown) can still be recovered by file system checking routines, which reconnect it at /lost+found/8193.

Last update: 2001 April 28