Register Machines

Wed, Jan 25, 2017 5 min read Compiler, Programming

An emulator for Register Machine

Our goal is to implement an emulator for a register machine using flex and bison. In the process we will illustrate

a function that assigns numbers to programs,
the von Neumann machine,
and explain how to write self modifying code.

Recall that a register machine has an unlimited number of registers and the following three instructions.

 HALT, stops the machine

 INC r j, increments the contents of register r, and moves to instruction number j.

 DEB r i j, If contents of register r is > 0 then decrement the register and move to instruction i, else move to instruction j.

Consider the ADD program below. It adds the contents of register 1 to contents of register 2.

 1. DEB 1 2 3
 2. INC 2 1
 3. HALT

State of such a program can be described by the contents of the registers. Execution of ADD 4 1 will result in the following states

  Register 1, Register 2 
  4         , 1
  3         , 1
  3         , 2
  2         , 2
  2         , 3
  1         , 3
  1         , 4
  0         , 4
  0         , 5

Each instruction in the RML program would be represented internally as a sequence of four numbers. The complete program would be stored in an array of integers called program. The program is stored starting at index 0. Below is the ADD program and its internal representation. The line numbers in the program are indexed starting at 0.

 DEB 1 1 2
 INC 2 0
 HALT

3 2 1 2 2 1 0 0 1 0 0 0

We can now infer that all programs are just numbers. Our encoding function is not the most convenient to reason with but it works. Functions that assign unique numbers to programs were first used by Gödel to prove the incompleteness theorems.

Grammar

We will specify the grammar (rml.y) and the lexical units (rml.l) for RML. The grammar rules are described using variables and tokens HALT, DEB, INC, NUM. We then use the tools, bison and flex to generate our emulator. Let us begin with the grammar. The start variable is prog in the grammar below. Associated with each rule is an action part within { }. The action part contains the C code which is executed when the RHS is matched. We use the code in the action part to build the internal representation of the RML program provided as input. Grammars with actions are called attribute grammars and can be used perform source to source translations.

// save this piece of code in rml.y
%{
#include <stdio.h>
#define YYSTYPE int
int regs[26];
int program[512];
int ctr = 0;
%}


%token NUM HALT INC DEB

%start prog
%%

prog: 
    stmt | stmt prog

stmt:
    HALT { program[ctr] =  $1 ; ctr+= 4; }      |           
    INC NUM NUM { 
            program[ctr] = $1; program[ctr+1] = $2;
            program[ctr+2] = $3 ; ctr+=4 ; }    |
    DEB NUM NUM NUM { 
            program[ctr] = $1; program[ctr+1] = $2;
            program[ctr+2] = $3 ; program[ctr+3]= $4; ctr+=4; } 

%%
int yywrap() {}             // add to get rid of -lfl dependency

int yyerror(char *s) {
  printf("%s\n", s);
}

void prnRegs() {
    printf("\n %d %d %d\n", regs[0], regs[1], regs[2]);
}


int main ()
{
  int i = 0 ;
  for(i = 0; i < 26; i++)  {
    regs[i] = 0;
    program[i] = 0 ;
  }
  regs[1] = 4; regs[2] = 5;

  yyparse ();

  for(i = 0; i < 12; i++) 
    printf("%d ", program[i]);


// Fetch and Exectue Loop
// First instruction is stored a index 0 in the program array

int PC = 0;             // program counter initialized

while (program[PC]!=1)          // HALT is not encountered

    switch (program[PC] ) {

        case    2:
            regs[program[PC+1]] += 1;   // INC the register
            PC = 4*program[PC+2] ;      // set PC to the next instruction.
                prnRegs();          // print the registers
    
        case    3:
            if (regs[program[PC+1]] > 0) {      // contents in REG > 0
                regs[program[PC+1]] -= 1;   // DEC the contents
                PC = 4*program[PC+2] ;      // set PC to the next instruction
                prnRegs();
            }   
            else {
                PC = 4*program[PC+3] ;      // Branch to the next instruction
                prnRegs();
            }
    }
}

}
// end rml.y

Note that the registers are cleared to 0, at the start of the program. The while loop in the program above, implements the von Neumann stored program machine. As the program is stored as data, it is possible for us to modify the program inside the while loop.

Lexer

Flex stores the value of the token in a variable called yylval. We set the type of yylval as int, good enough for our purpose no need to use union. NUM token has a value which is obtained by converting the matched text (stored in yytext) to an int. Other tokens have value 1,2,3 as shown below.

// start of rml.l
%{

#include "y.tab.h"
#include <stdio.h>
#define YYSTYPE int
int yylval;
%}

%%

[0123456789]+       { yylval = atoi(yytext);
              printf("NUM %d", yylval);
              return NUM;}
HALT            { yylval = 1; 
                  printf("HALT");
                return HALT; }
INC         { yylval = 2; 
                  printf("INC");
                return INC ;}
DEB         { yylval = 3; 
                  printf("DEB");
                return DEB ;}

%%

Save the two programs in rml.y and rml.l respectively. Save the following in Makefile.

// Makefile

rml: rml.y rml.l
    bison -d rml.y
    flex  -o rml.lex.c rml.l
    gcc  -o rml rml.lex.c rml.tab.c

Compile the code using make. Save the add program to add.rml and test your emulator using >./rml < add.rml Write new programs and test.

Possible enhancements

We set the values of registers 1 and 2 to 4 and 5 using a direct assignment. It is instructive to modify the grammar to read the contents of the registers as part of the input RML program, do try it.
Read the program from a file as opposed from stdin
Implement a nicer REPL loop.
Add variables and assignment statements to the language.

Register Machines

An emulator for Register Machine

Grammar

Lexer

Possible enhancements

Further reading