This is our implementation for the course project of CSC3170, 2022 Fall, CUHK(SZ). For details of the project, you can refer to project-description.md. In this project, we will utilize what we learned in the lectures and tutorials in the course, and implement either one of the following major jobs:
- Application with Database System(s)
- Implementation of a Database System
Our team consists of the following members, listed in the table below (the team leader is shown in the first row, and is marked with 🚩 behind his/her name):
| Student ID | Student Name | GitHub Account (in Email) | GitHub Username |
|---|---|---|---|
| 120090194 | 桂驰 🚩 | 120090194@link.cuhk.edu.cn | @Penguin-N |
| 120090521 | 庞嘉扬 | 120090521@link.cuhk.edu.cn | @PJYasuna |
| 120090117 | 赵思远 | 120090117@link.cuhk.edu.cn | @ZhaoSiyuan120090117 |
| 120090671 | 从御政 | 120090671@link.cuhk.edu.cn | @Paligi |
| 120090026 | 孙思鹏 | 120090026@link.cuhk.edu.cn | @Sun8731 |
| 121040084 | 钟嘉乐 | 121040084@link.cuhk.edu.cn | @MikeZhong21 |
After thorough discussion, our team made the choice and the specification information is listed below:
- Our option choice is: Option 3
- For the progess summary, please refer to Project-Specification.md
- Build a new directory.
- Put source/direcotry, source/makefile, source/testing/files together. For example, Put db61b,makefile,Shooter.db,WroldCupGroups.db these four files together to test the data from the worldcup.
- Run "make" to compile.
- Run "java db61b.Main" to run the DBMS.
This project aims to implement a relational database management system. The DBMS consists of tables with row(s) and column(s), and query language to retrieve relevant information from the database. The project reflects our understanding of the low-level working mechanism of the database management system. Our implementation includes creation, deletion, updating of tables (more specifically, insertion of rows and etc.), and selection of desired information by stating conditions. Other than that, the DBMS will evaluate the SQL commands and return relevant messages if errors occur for the purpose of reminding. For ease of implementation, the "join" statement can be viewed as natural join by default. The implementation will also include the retrieval of information from multiple tables. To test whether the implemented DBMS works properly and correctly or not, test cases will be provided for the enhancement of our DBMS and better user understanding. Additionally, this project will extend some additional methods in order to re-implement Assignment 2 such as group by, order by and etc.
We have finished the basic implementation of the database management system. i.e, we have passed the tests provided by CS61B and meet its requirement. We intend to realize some new functions in the following days, like order by, delete, etc. We finished the program on 12/10.
As "Project Skeleton" demonstrates, the project is divided into ten classes. Some important classes will be introduced. The main class serves as the entry point and top-level control for the project. It is responsible for managing the overall flow of the program and coordinating the actions of the other classes. The command interpreter class is responsible for handling user input and communicating with the database class. It receives commands from the user and passes them on to the database class for processing. The database class contains tables, which are represented by the table class. The table class is responsible for performing various database operations, such as inserting, updating, and deleting rows, as well as interacting with files to read and write data. The row and column classes represent individual elements within the table, such as a specific row or column in a spreadsheet. These classes are used to store and manipulate data within the table. The jtable and jTableImg classes are used for the visualization of table.
Notice: In the section "statement structure", we use Backus–Naur Form (BNF) for the ease of understanding our version of SQL language structure.
Statement Structure: create table <name> <table definition>;
Notice: <table definition> ::= ( <column name>+, ) | as <select clause>
In the function “tabledefinition”, the program will first use array list to store the column titles (column names). Then, create and initialize a new table object containing the column titles. If the statement contains “as”, the program will execute the function “selectclause” and select the corresponding information from the table to form a new table. Finally, put the new table and its name into the database.
Statement Structure: insert into <table name> values <literal>+, ;
First, the program will add the literals to the value array list. Then, create a new row to hold these values(literals) (More specifically, transform the value array list to the type Row). Finally, add the new row to the corresponding table.
Statement Structure: remove from <table name> <condition clause>;
“removeRowStatement”: the function is similar to the “select” except that it will delete rows satisfying the conditions of the specific table.
Statement Structure: select <column name>+, from <table name>+, <condition clause>;
First, the program creates three array lists to store selected column titles (two array lists to obtain distinct column title and prevent duplicated column title) and the name of aggregate functions if there is any.
Then, by using the functions provided by the tokenizer class to check if there are any aggregate functions (“avg”, “sum”, “min”, “max”, “count”), if there is aggregate function(s) in the statement, do the corresponding operation(s). At the same time, the program will store the selected column name(s) in the array list. Then the program will obtain targeted table(s) after the “from” statement.
If the statement contains “where”, “group by”, “having”, “order by”, the program will do the corresponding operation(s) until it ends with “;”.
Statement Structure: column_minus <table name>: <column name1> and <column name2> to <column name3>;
Notice: <column name3> ::= <column name1> - <column name2>
First, the program will check if the first token of the string is "column_minus". Then, the program will get the table name (suppose the size of the table is m*n), two column names in the table, and the result column name after the minus operation. After that, the program will call the function "columnMinusCluase" to do the minus operation. If the values of the two columns are not digits, the program will raise the error reminder. Then, the program will iterate each row in the table so that it can combine the original values of the row with the minus result value to form a new row. All the new rows together form a new table (size of the new table: m*(n+1)). Finally, the new table will be printed.
Statement Structure: column_plus <table name>: <column name1> and <column name2> to <column name3>;
Notice: <column name3> ::= <column name1> + <column name2>
First, the program will check if the first token of the string is "column_plus". Then, the program will get the table name (suppose the size of the table is m*n), two column names in the table, and the result column name after the plus operation. After that, the program will call the function "columnPlusCluase" to do the plus operation. If the values of the two columns are not digits, the program will raise the error reminder. Then, the program will iterate each row in the table so that it can combine the original values of the row with the plus result value to form a new row. All the new rows together form a new table (size of the new table: m*(n+1)). Finally, the new table will be printed.
First, use a string list to record the type of aggregate functions for each column. Then assume there is no aggregate function and do a similar process until after group by. Then for each column, do the corresponding operations.
Statement Structure: where <column name> <relation> <column name> | constant
In the class “Condition”, the program will get three variables: column1, relation, and column2 or constant. The relation is one of the symbols: "<, >, =, <=, >=, !=". And then the program will filter the data matching the conditions.
Statement Structure: select <column name>+, from <table name> group by <column name>;
Notice: In most cases, select multiple columns but group by one column often accompany with aggregate function(s).
First, the program checks if the next token in the input is the string "group" and if the following token is the string "by". It then calls a function called name() to get the name of the column to group the rows by.
Then the program checks that the column obtained from name() is a "normal" aggregate type and that all other return columns are not "normal" aggregate types. If either of these conditions is not met, an error is thrown with the message "Group by is error format!".
After that, the program creates a new Column object with the name of the group column and the select table as arguments. It then creates a new LinkedHashMap called groups to store the different groups.
Finally, the code iterates over each row in the select table and for each row, it gets the value in the group column. It then checks if there is already a group with that value in the groups map. If there is not, it creates a new Table object with the column titles and adds the row to it. If there is already a group with that value, it adds the row to the existing group's table.
Statement Structure: select <column name>+, from <table name> group by <column name> having <condition clause>;
The program creates a new table called having_table using the Table class and initializes it with the column titles from a table called group_table. It then creates an ArrayList of Condition objects called having_conditions and another ArrayList of String objects called column_titles_.
The program iterates through the column titles in group_table and adds them to column_titles_. After that, it calls a method called conditionClause and passes in group_table as an argument, and assigns the return value to having_conditions.
Finally, it calls the select method on group_table and passes in column_titles_ and having_conditions as arguments, and assigns the return value to having_table. It then assigns having_table to group_table.
Statement Structure: select <column name>+, from <table name>+, order by <column name>;
The program iterates through the table n times (n is the number of the rows) to find the maximum or minimum (if the statement includes "desc" after order by <column name>) row in each iteration, then add the row to a new table and remove the maximum or minimum (if the statement includes "desc" after order by <column name>) row from the table.
We use the jtable library to visualize the table. The different countries will be shown with their corresponding national flags. To implement this, we name the flag picture files with their country names. Thus, we can relate each country to its flag picture.
- Read the whole introduction carefully
- Perfect each small program
- Overall debugging modification
Suppose group by x, where x is the column name.
- First, deal with a simple situation: only aggregate functions without group by
- Then, according to column x, divide the table into several sub-tables. Then deal with each sub-table (To deal with each sub-table, group by can be ignored since the values of column x are the same. Therefore, using the step 1 approach to dealing with it)
- Finally, integrate the results of each sub-table into a new table
Use one string list to record the aggregate function type of each column. Then for each column, do the corresponding operation.
Learn the knowledge about GitHub, and divide the task into small tasks. Then each teammate does one of the small tasks. finally, integrate what we have done.
Every student in our group gives plays to their strengths and helps each other. Every student tried his best. It is difficult to list every tiny task and individual contributions. So we're going to omit it. Thanks for your understanding.
