-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathDOCUMENTATION.txt
More file actions
149 lines (147 loc) · 8.36 KB
/
DOCUMENTATION.txt
File metadata and controls
149 lines (147 loc) · 8.36 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
Overall Design:
Tokenizer:
1. Loads source file by passing file name to `Tokenizer` constructor.
2. Upon user calling `NextToken` fetches the next token:
i. Reads one or more (if needed) characters from the file stream.
ii. If the character denotes a symbol sets the current token to the type
of symbol and a the string covered by the token in the source file.
iii. If the character is alphanumeric the class will classify it by:
- Lowercase -> `NextReservedToken` (since reserved tokens are the only
tokens that start with a lowercase character)
- Uppercase -> `NextIdentifier` (since identifier tokens are the only
tokens that start with an uppercase character)
- Numeric -> `NextInteger` (since numeric tokens are the only
tokens that start with a numeric character)
ix. If the character hasn't matched and is not whitespace, newline, or
and end-of-file the tokenizer will throw an error.
x. `NextReservedToken` / `NextIdentifier` / `NextInteger` process the
file stream until all consecutive alphanumeric characters are consumed.
These functions will throw any errors for tokens that are invalid in
the CORE language (Ex: Integer/Identifier length greater than 8).
3. Uses the `currentToken` getter to fetch the most recently consumed
token.
4. Uses `Token->getType()` to capture the number representation of all the
tokens from the source file. If the source file was erroneous,
the code will not output the number representation and an error will
instead be outputted. If the tokenizer was successful in reading all
tokens, the number representation will be printed to the program's
output.
Parser:
1. Assigns to the AST the value of a constructed program.
2. The initializer of a `Prog` is passed the parser.
3. The parsing methods are contained in all of the Nodes constructors.
4. When an error is reached it is propagated up to the constructor.
Interpreter:
1. Works similar to how the parser recursively descends.
2. The AST.Execute() method may be used to execute a parsed AST.
3. The AST will execute the program (Prog.Execute) which will recursively
descend the constructed abstract syntax tree calling Execute at each node
and handling logic for statements.
4. The ASTContext class houses the symbol table that holds all identifiers
and their current values.
5. All programs must initialize a variable at all paths in the program
before the identifier may be used. This is enforced at the Parser level.
As such, the errors that occur at the Interpreter level are limited to
overflow/underflow and invalid integer input.
Class Structure:
- Class structure for the Parser & Interpreter can be found in the
documenting header files.
- Token.h:
- TokenType (enum): This enum list is organized by the predetermined
numbering of tokens. I.e: program, begin, end...
- Token (class): A class for holding token information.
- Private Members:
- Type (TokenType): The type of token the `Data` represents.
- Data (std::string): The underlying data behind the token.
- LineNumber (int): The line number of this token.
- ColumnNumber (int): The column number of this token.
- Public Members:
- `Constructor` (Token): Creates an empty token.
- Getters for various members (defined above):
- getType (() -> TokenType)
- getData (() -> std::string)
- getLineNumber (() -> unsigned)
- getColumnNumber (() -> unsigned)
- setToken((TokenType, std::string) -> void): Just a simple setter for
the token properties.
- setLocation((unsigned L, unsigned C) -> void): Just a simple setter
for the token's location in the source file. L = line number; C =
column number.
- Tokenizer.h/cpp:
- Static Private Members:
- IdentifierMaxLength (static const unsigned): The maximum length of a
valid identifier in CORE.
- IntegerMaxLength (static const unsigned): The maximum length of a valid
integer in CORE.
- Private Members:
- CurrentToken (Token): The most recent token. Upon initialization this
is the undefined token type: `TokenType::undefined`.
- FileStream (std::ifstream): The file stream to be opened upon
initialization and closed on destruction.
- LineNumber (unsigned): The line number that the tokenizer is
processing. Incremented on line breaks. Line 1 is considered the first
line. Counting starts from 1. Not 0.
- ColumnNumber (unsigned): The column number that the tokenizer is
processing. Resets on line breaks. Increments on character
consumption. Column 1 is considered the first column. Counting starts
from 1. Not 0.
- NextIdentifier ((std::string) -> unsigned): Processes the next
identifier token from the file stream. The passed in `tokenString`
must contain a single uppercase character. This function processes up
to (but not including) a character that is non-alphanumeric. Identifier
Format = [A-Z]+[0-9]*
This function will throw if:
- Any of the characters after the initial character are non-uppercase
Ex: ABc123.
- Any of the characters after the initial character and all uppercase
characters are non-digits. Ex: ABC123X.
- The resulting identifier exceeds `IdentifierMaxLength`
Returns: The number of characters consumed. AKA The resultant token's
length.
- NextReservedToken ((std::string) -> unsigned): Processes the next
reserved token from the file stream. The passed in `tokenString` must
contain a single lowercase character. Reserved Token Format = [a-z]+
and must match one of the predefined tokens. This function will throw
if:
- Any of the characters after the initial character are non-lowercase
Ex: ABc123.
- The result matches none of the languages identifiers.
Returns: The number of characters consumed. AKA The resultant
token's length.
- NextInteger ((std::string) -> unsigned): Processes the next integer
token from the file stream. The passed in `tokenString` must contain a
single numeric character. This function processes up to (but not
including) a character that is non-alphanumeric.
Identifier Format = [0-9]+
This function will throw if:
- Any of the characters after the initial character are non-numeric
Ex: 123c, 123ABC, 1x4, 45e8.
- The resulting identifier exceeds `IntegerMaxLength`.
Returns: The number of characters consumed. AKA The resultant
token's length.
- consumeToken (() -> void): Internal method that retrieves the next
token from the file stream. Is called by public member `NextToken` and
utilizes: `NextIdentifier`, `NextInteger`, and `NextIdentifier`.
- Public Members
- `Constructor` ((std::string) -> Tokenizer): The only constructor for
the Tokenizer. Takes in a file path to a CORE source file and opens a
file stream with the initial token set to an undefined token.
- `Destructor` (() -> virtual): Destructor just closes the file stream.
- lineNumber (() -> unsigned): Returns the current line number of the
scanner.
- columnNumber (() -> unsigned): Returns the current line number of the
scanner.
- currentToken (() -> Token): Getter for the current token.
- NextToken (() -> void): Retrieves the next token from the file stream.
That token can then subsequently be read by calling
`Tokenizer::currentToken`. Handles error outputting (specifically line
numbers).
- isEOF (() -> bool): Whether the tokenizer (and file stream) has reached
the end of token output. End of file.
Testing / Bugs:
- Code was tested batch running against various valid and invalid CORE files
and comparing the output to given calculations of the resultant token
number stream; ensuring that code that was erroneous failed to produce a
token stream and halted with a reasonable error description.
- As of this latest version no outstanding bugs are present.
- For Parser project integration tests were introduced.