LennyGonz/Pascal-Compiler

Pascal Compiler

This is a Pascal compiler written in Python 2.7.

Running this Compiler

Enter the command: python run.py

Inside the run.py file there are 5 examples:

  1. array example
  2. assignment example
  3. for-loop example
  4. if-statement example
  5. while-loop example
The array example runs by default. To run a different example, open run.py, comment out the array example, and uncomment exactly ONE of the other commands.

Resources used for this project:

Compiler Theory

Parsing Methods

How does an interpreter/compiler work?

A very simple form of a compiler/interpreter:

Source File ==> Scanner ==> Lexer ==> Parser ==> Interpreter/Code Generator

  1. Source File: This is the program that is read by the interpreter/compiler. This is the text that needs to be compiled or interpreted.

  2. Scanner: This is the first module in a compiler/interpreter.

    • The job of a scanner is to read the source file, one character at a time.
    • It also keeps track of which line number and character is currently being read.
    • A typical scanner can be instructed to move backwards and forwards through the source file.
      • Why do we need to move backwards?
    • For now assume that each time the scanner is called:
      • it returns the next character in the file
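The scanner described above can be sketched in a few lines of Python. This is only an illustration, not the code used in this repository: the class name `Scanner` and the methods `next_char` and `back_up` are assumed names chosen for the example.

```python
class Scanner(object):
    """Reads source text one character at a time and tracks the position."""

    def __init__(self, text):
        self.text = text
        self.pos = 0    # index of the next character to hand out
        self.line = 1   # 1-based line number of that character
        self.col = 1    # 1-based column of that character

    def next_char(self):
        """Return the next character, or None at the end of the source."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line += 1
            self.col = 1
        else:
            self.col += 1
        return ch

    def back_up(self):
        """Step back one character so that it is returned again next time."""
        if self.pos == 0:
            return
        self.pos -= 1
        if self.text[self.pos] == "\n":
            self.line -= 1
            # Column is the distance from the previous newline (or the start).
            self.col = self.pos - self.text.rfind("\n", 0, self.pos)
        else:
            self.col -= 1
```

Keeping `line` and `col` inside the scanner means any later stage can ask it where an error happened without tracking positions itself.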
  3. Lexer: This module breaks the source file into chunks (called tokens). It calls the scanner to get characters one at a time and organizes them into:

    • tokens
    • token types

For example, consider these two statements:

cx = cy + 324;
print "value of cx is ", cx;

A lexer would break it like this:

cx                 --> Identifier       (variable)
=                  --> Symbol           (assignment operator)
cy                 --> Identifier       (variable)
+                  --> Symbol           (addition operator)
324                --> Numeric Constant (integer)
;                  --> Symbol           (end of statement)
print              --> Identifier       (keyword)
"value of cx is "  --> String Constant  (string)
,                  --> Symbol           (string concatenation operator)
cx                 --> Identifier       (variable)
  • The lexer calls the scanner, which hands it one character at a time.
  • The lexer then groups those characters together and identifies them as tokens for the language parser (the next stage).
  • In other words, tokens are characters grouped into meaningful units.
  • The lexer also identifies the type of each token:
    • variable vs keyword
    • assignment operator vs addition operator vs string concatenation operator, etc.
  • Occasionally, the lexer has to tell the scanner to back up.
    • Consider a language that has operators that may be more than one character long
      • For example ! vs !=
      • < vs <=
      • + vs ++
  • Suppose the lexer needs to determine whether an operator is < or <=. After reading the '<', it asks the scanner for another character.
  • If the next character is '=', it changes the token to "<=" and passes it to the parser.
  • If not, it tells the scanner to back up one character so that the character is read again later, and it passes the '<' to the parser.
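The lookahead-and-back-up behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not this project's lexer: the token type names, the `KEYWORDS` and `TWO_CHAR` tables, and the tiny stand-in `Scanner` are all invented for the example.

```python
class Scanner(object):
    """Minimal stand-in scanner: hands out characters and can back up."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_char(self):
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def back_up(self):
        self.pos -= 1


KEYWORDS = {"print"}                          # assumed keyword list
TWO_CHAR = {"<": "=", "!": "=", "+": "+"}     # first char -> char that extends it


def read_while(scanner, first, accept):
    """Collect characters while accept(ch) holds, then give back the reject."""
    chars = [first]
    while True:
        ch = scanner.next_char()
        if ch is None:
            break
        if not accept(ch):
            scanner.back_up()   # read one character too far: return it
            break
        chars.append(ch)
    return "".join(chars)


def tokenize(text):
    """Return a list of (token_type, token_text) pairs."""
    scanner = Scanner(text)
    tokens = []
    while True:
        ch = scanner.next_char()
        if ch is None:
            return tokens
        if ch.isspace():
            continue
        if ch.isalpha():
            word = read_while(scanner, ch, lambda c: c.isalnum())
            kind = "keyword" if word in KEYWORDS else "identifier"
            tokens.append((kind, word))
        elif ch.isdigit():
            digits = read_while(scanner, ch, lambda c: c.isdigit())
            tokens.append(("number", digits))
        elif ch == '"':
            chars = []
            ch = scanner.next_char()
            while ch is not None and ch != '"':
                chars.append(ch)
                ch = scanner.next_char()
            tokens.append(("string", "".join(chars)))
        elif ch in TWO_CHAR:
            nxt = scanner.next_char()
            if nxt == TWO_CHAR[ch]:
                tokens.append(("symbol", ch + nxt))   # e.g. "<="
            else:
                if nxt is not None:
                    scanner.back_up()                 # not ours: give it back
                tokens.append(("symbol", ch))
        else:
            tokens.append(("symbol", ch))
```

With these assumptions, `tokenize("a<=b<c")` yields the symbols `<=` and `<` as separate tokens, and `tokenize("cx = cy + 324;")` yields the same six tokens as the first statement in the table above.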
  4. Parser: This is the part of the compiler that really understands the syntax of the language.
  • It calls the lexer to get tokens and processes the tokens according to the syntax of the language.
  • For example, taking the example from the lexer above, a hypothetical interaction between the lexer and parser could go like this:

Parser: Give me the next token
Lexer : Next token is "cx" which is a variable.
Parser: Ok, I have "cx" as a declared integer variable. Give me next token
Lexer : Next token is "=", the assignment operator.
Parser: Ok, the program wants me to assign something to "cx". Next token
------> Lexer : The next token is "cy" which is a variable.
------> Parser: Ok, I know "cy" is an integer variable. Next token please
------> Lexer : The next token is '+', which is an addition operator.
------> Parser: Ok, so I need to add something to the value in "cy". Next token please.
--------------> Lexer: The next token is "324", which is an integer.
--------------> Parser: Ok, both "cy" and "324" are integers, so I can add them. Next token please:
--------------> Lexer: The next token is ";" which is end of statement.
------> Parser: Ok, I will evaluate "cy + 324" and get the answer
Parser: I'll take the answer from "cy + 324" and assign it to "cx"


In the section above, the indentation shows a subprocess that the parser enters to evaluate "cy + 324". This gives an idea of how the parser operates.

  • Also note that the parser is checking types and syntax rules (for instance, it checked whether cy and 324 were both integer types before adding them).
  • If the parser gets a token that it was not expecting, it will stop processing and complain to the user about an error.
  • The scanner holds the current line number and character, so the parser can tell the user approximately where the error occurred.
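The dialogue above can be approximated by a small recursive parser that evaluates as it goes. This is a hedged sketch, not the parser in this repository: it assumes tokens arrive as `(type, text, line)` tuples, that declared integer variables live in a plain dictionary, and that `Parser`, `ParseError`, and the method names are hypothetical.

```python
class ParseError(Exception):
    """Raised when the parser meets a token it was not expecting."""


class Parser(object):
    def __init__(self, tokens, variables):
        self.tokens = tokens         # list of (type, text, line) tuples
        self.index = 0               # position of the next token to hand out
        self.variables = variables   # declared integer variables: name -> value

    def next_token(self):
        if self.index >= len(self.tokens):
            raise ParseError("unexpected end of input")
        token = self.tokens[self.index]
        self.index += 1
        return token

    def peek_is(self, kind, text):
        """Check the upcoming token without consuming it."""
        if self.index >= len(self.tokens):
            return False
        token = self.tokens[self.index]
        return token[0] == kind and token[1] == text

    def expect(self, kind, text=None):
        """Consume a token and complain if it is not the expected one."""
        token = self.next_token()
        if token[0] != kind or (text is not None and token[1] != text):
            raise ParseError("line %d: unexpected %r" % (token[2], token[1]))
        return token

    def parse_assignment(self):
        """identifier '=' sum ';'"""
        _, name, line = self.expect("identifier")
        if name not in self.variables:
            raise ParseError("line %d: undeclared variable %r" % (line, name))
        self.expect("symbol", "=")
        value = self.parse_sum()     # the indented subprocess in the dialogue
        self.expect("symbol", ";")
        self.variables[name] = value

    def parse_sum(self):
        """operand ('+' operand)*"""
        total = self.parse_operand()
        while self.peek_is("symbol", "+"):
            self.next_token()
            total += self.parse_operand()
        return total

    def parse_operand(self):
        """Only integer constants and declared integer variables are allowed."""
        kind, text, line = self.next_token()
        if kind == "number":
            return int(text)
        if kind == "identifier":
            if text not in self.variables:
                raise ParseError("line %d: undeclared variable %r" % (line, text))
            return self.variables[text]
        raise ParseError("line %d: cannot add %s %r" % (line, kind, text))
```

Because this sketch evaluates each statement as soon as it is parsed, it behaves like an interpreter: parsing `cx = cy + 324;` with `cy` set to 10 leaves 334 in `cx`.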
  5. Interpreter/Code Generator: This is the part that actually takes the action that is specified by a program statement.
  • In some cases, this is actually part of the parser (especially for interpreters)
    • The parser interprets and takes action directly
  • In other cases, the parser converts the statements into byte-code
  • In the case of a compiler, it then hands them to the Code Generator to convert into machine code instructions
  • If you want a compiler for a different CPU or architecture, you only need to swap in a new code generator that translates the byte-code into machine code for the new CPU
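To make the split between byte-code and back end concrete, here is a small sketch in which the same byte-code for `cx = cy + 324` is either interpreted directly or translated for a target. Everything in it is an assumption made for illustration: the instruction names (`LOAD`, `PUSH`, `ADD`, `STORE`), the template tables, and the two "CPUs" are made up and do not describe this project's byte-code or any real instruction set.

```python
# Hypothetical byte-code a parser might emit for: cx = cy + 324
BYTECODE = [("LOAD", "cy"), ("PUSH", 324), ("ADD",), ("STORE", "cx")]


def run(bytecode, variables):
    """Back end 1: interpret the byte-code on a small stack machine."""
    stack = []
    for instr in bytecode:
        op = instr[0]
        if op == "LOAD":
            stack.append(variables[instr[1]])
        elif op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            variables[instr[1]] = stack.pop()
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    return variables


def generate(bytecode, templates):
    """Back end 2: translate each instruction using a per-target template."""
    return [templates[instr[0]].format(*instr[1:]) for instr in bytecode]


# Two made-up targets: only the template table changes between them.
CPU_A = {"LOAD": "push [{0}]", "PUSH": "push {0}", "ADD": "add", "STORE": "pop [{0}]"}
CPU_B = {"LOAD": "LDS {0}", "PUSH": "LDI {0}", "ADD": "ADDS", "STORE": "STS {0}"}
```

The scanner, lexer, and parser that produced `BYTECODE` do not change when the target does; `generate(BYTECODE, CPU_A)` and `generate(BYTECODE, CPU_B)` differ only in the table they are given.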
