KWIC Index

University of Lethbridge - Mathematics & Computer Science

Problem #8 KWIC Index

A KWIC (Key Word In Context) index is often used to list a series of subject titles (magazine articles, books, etc.). Such an index is created by using each word of each title as a sort key, so every title appears once for each word it contains. (This means that a title can be located even if only one word in the title is known.) For example, given the input titles:

Green Sleeves
Time Was Lost

the corresponding KWIC index is:

                               Green Sleeves
                      Time Was Lost
                         Green Sleeves
                               Time Was Lost
                          Time Was Lost

Write a program to accept lines of input, and output those lines in a KWIC index as shown above, including the alignment of the key words in a single column. For simplicity, assume that a word is defined as any consecutive sequence of upper- and lowercase letters. You may assume that no word has more than 20 characters and that no line has more than 80 characters. However, there is no limit to the number of lines, nor to the number of words on a line, other than that imposed by the above restrictions. Only alphabetic characters are used in the sort; for example, a comma in a title would simply appear in the KWIC index as the next character of the title, and would be neither a factor in the sort nor treated as a special word.
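
One possible approach is sketched below in Python. It is an illustration under stated assumptions, not a reference solution. The name kwic_index and its column parameter are hypothetical. The statement does not say whether the sort is case-sensitive, how ties between equal key words are broken, or which column the key words line up in. The sketch assumes a case-insensitive sort, ties kept in input order, and the narrowest column that fits every entry unless a wider one is requested.

```python
import re

# A word is any run of ASCII letters, as defined in the problem statement.
WORD = re.compile(r"[A-Za-z]+")


def kwic_index(titles, column=None):
    """Return the KWIC lines for `titles` with key words aligned at `column`.

    Assumptions (not fixed by the problem statement): key words are compared
    case-insensitively, equal key words keep their input order, and `column`
    defaults to the smallest value that fits every entry.
    """
    entries = []  # (sort key, offset of key word within title, title)
    for title in titles:
        for match in WORD.finditer(title):
            entries.append((match.group().lower(), match.start(), title))

    # list.sort is stable, so sorting on the key word alone keeps input
    # order among entries whose key words compare equal.
    entries.sort(key=lambda entry: entry[0])

    # The alignment column can never be smaller than the largest offset.
    widest = max((offset for _, offset, _ in entries), default=0)
    column = widest if column is None else max(column, widest)

    # Pad each title on the left so its key word starts at `column`.
    return [" " * (column - offset) + title for _, offset, title in entries]


if __name__ == "__main__":
    for line in kwic_index(["Green Sleeves", "Time Was Lost"]):
        print(line)
```

For the two sample titles this yields five lines ordered by the key words Green, Lost, Sleeves, Time, Was. Punctuation such as a comma stays in the printed title but never matches the word pattern, so it takes no part in the sort.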

The problem data is stored in a text file called kwic.dat. The output should be written to the file kwic.out.
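
The file handling might be wired up as in the following Python sketch. The function name run and its transform parameter are hypothetical: transform stands for whatever routine turns the list of input titles into the list of index lines. The sketch assumes the whole input fits in memory and that each output line ends with a newline.

```python
from pathlib import Path


def run(transform, src="kwic.dat", dst="kwic.out"):
    """Read titles from `src`, apply `transform`, write the result to `dst`.

    `transform` is a placeholder for the indexing routine: it takes a list
    of title strings and returns an iterable of output lines.
    """
    # splitlines() drops the line terminators, so titles arrive clean.
    titles = Path(src).read_text().splitlines()
    lines = list(transform(titles))
    # Terminate every output line, including the last one.
    Path(dst).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

Passing the indexing routine in as an argument keeps the file names from the statement in one place and lets the routine itself be tested on plain lists of strings.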

Original problem: Rocky Mountain Region ACM contest, 1989