From 25111ab9f185ddbd6ab20baa0ceac4fb327e2469 Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Wed, 24 Jan 2024 14:25:45 -0800 Subject: [PATCH 1/6] Update README.md --- README.md | 37 +++++++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ec112b4..080b95a 100644 --- a/README.md +++ b/README.md @@ -38,17 +38,38 @@ This repository contains the implementation of algorithms for detection and prev #### Step 1: Configure Database -1. Create database in MySQL ([Export files](https://drive.google.com/drive/folders/1CiCXU08zWgzI2VUKp1vEcadBkTJA6Lbb?usp=sharing), enabling database index and creating hash indexes can improve performance) +1. Create database in MySQL ([Export files](https://drive.google.com/drive/folders/1CiCXU08zWgzI2VUKp1vEcadBkTJA6Lbb?usp=sharing), enabling database index and creating hash indexes can improve performance) + + +**2024 Update** +**You'll need to change line 77 of `src/main/java/edu/policy/dbms/MySQLConnectionManagerDBCP.java` from** +``` +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); +``` +**to** +``` +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); +``` + 2. Update corresponding database info (*username, password, server and port number*) in the `mysql.properties` file under `resources/credentials/` directory. #### Step 2: Prepare Testcases -Use the test script **testcase_gen_tax.py** or **testcase_gen_hospital.py** to generate testcases on Tax or Hospital dataset. The generated testcases will be automatically placed under `testdata/testcases/` directory. +Use the test script **testcase_gen_tax.py** or **testcase_gen_hospital.py** to generate testcases on Tax or Hospital dataset. The generated testcases will be automatically placed under `testdata/testcases/` directory. 
+ + +**2024 Update** +**The testcases folder is stored in the parent folder of this GitHub repo, you'll need to remember this as the current code looks in the wrong file location.** + > python testcase_gen_tax.py > > python testcase_gen_hospital.py + +**2024 Update** +**You'll need to edit the files generated by these python scripts the .json files contain values called `databaseName` which need to match the value that you used in step 1.** + For the scalability experiment (described in the extended version of our paper), we enable the binning-then-merging (*btm*) mode. Testcases can be generated using script **testcase_gen_hospital_scalability.py**. Our codebase is forward-compatible with test cases without btm mode. @@ -59,6 +80,18 @@ Our codebase is forward-compatible with test cases without btm mode. Under the working directory (`Tattle-Tale/`), use the following commands to install required dependencies and execute the program. **Experimental setting**: requiring at least 64 GB RAM [if not possible for limited computing environment, use *btm* mode (as in the scalability experiment) to reduce the memory requirement] + + +**2024 Update** +**You'll need to change line 25 of `src/main/java/edu/policy/execution/Experiment.java` from** +``` +private static final File testCaseDir = new File(System.getProperty("user.dir") + "/testdata/testcases"); +``` +**to** +``` +private static final File testCaseDir = new File(); +``` + > mvn clean install > From eb16b125f3416a8452f79bcfde0b0b507294bb06 Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Wed, 24 Jan 2024 14:52:35 -0800 Subject: [PATCH 2/6] Create Documentation.md This documents various issues I found, and more importantly how to deal with them. 
--- Documentation.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 Documentation.md diff --git a/Documentation.md b/Documentation.md new file mode 100644 index 0000000..1c5b5ba --- /dev/null +++ b/Documentation.md @@ -0,0 +1,9 @@ +Issues encountered while trying to run Tattle-Tale GitHub repo, on both Windows 11 OS Build 22621.3007 and Ubuntu 20.04 LTS. +The first issue is that the python scripts for generating testcases store their contents in the parent folder of the GitHub repo. This creates a breaking change in the Experiment.java class as the class looks for the testcases folder inside of the GitHub repo so the pathing fails. +Issues encountered while trying to run Tattle-Tale GitHub repo, on both Windows 11 OS Build 22621.3007. +The way that a connection to the MySQL database is established has a problem because when the connection is tested in `src/main/java/edu/policy/dbms/MySQLConnectionManagerDBCP.java` at line 77 +``` +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); +``` +when it looks for a database named `mysql` this causes the runtime execution to hard crash because it can't connect. Additionally even if the database name exists in the database there will still be hard crashes during the runtime. This is because the python test scripts instantiate the name of a data base and set it to a value, i.e. hospitaldb, and for each test run the program then tries to use that database name which if the name isn't in the running MySQL server then the program fails to connect. Either way someone pulling the repo has to manually make these changes to get the repo to execute without crashing. +There is an issue where the readme doesn't fully explain the amount of either tables or permissions that the default `Kirby` user needs to complete the experiment. If someone, i.e. 
like I did, just gives the default user the ability to `SELECT` from tables in the database when the repo runs there is an issue where it tries to locate the table `temp` but when it can't find this table it will continue. This creates confusing behavior because the program can execute but it will always find the exact same cuesets for each algorithm, full den and k-percentile. I haven't quite fully debugged this part yet but I hope to have it finished before Monday. \ No newline at end of file From 79aa59f02360355f5e8d63a87daeef6ac3a76e79 Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Mon, 29 Jan 2024 20:12:01 -0800 Subject: [PATCH 3/6] Additions to md files There were some things I hadn't yet tested regarding the permission settings for the user named in the mysql.properties file. I also found a new issue that made the program crash once the permissions had been raised, so I noted it in Documentation.md. --- Documentation.md | 3 ++- README.md | 7 ++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/Documentation.md b/Documentation.md index 1c5b5ba..e18d696 100644 --- a/Documentation.md +++ b/Documentation.md @@ -6,4 +6,5 @@ The way that a connection to the MySQL database is established has a problem bec dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); ``` when it looks for a database named `mysql` this causes the runtime execution to hard crash because it can't connect. Additionally even if the database name exists in the database there will still be hard crashes during the runtime. This is because the python test scripts instantiate the name of a data base and set it to a value, i.e. hospitaldb, and for each test run the program then tries to use that database name which if the name isn't in the running MySQL server then the program fails to connect. Either way someone pulling the repo has to manually make these changes to get the repo to execute without crashing. 
-There is an issue where the readme doesn't fully explain the amount of either tables or permissions that the default `Kirby` user needs to complete the experiment. If someone, i.e. like I did, just gives the default user the ability to `SELECT` from tables in the database when the repo runs there is an issue where it tries to locate the table `temp` but when it can't find this table it will continue. This creates confusing behavior because the program can execute but it will always find the exact same cuesets for each algorithm, full den and k-percentile. I haven't quite fully debugged this part yet but I hope to have it finished before Monday. \ No newline at end of file +There is an issue where the readme doesn't fully explain the amount of either tables or permissions that the default `Kirby` user needs to complete the experiment. If someone, i.e. like I did, just gives the default user the ability to `SELECT` from tables in the database when the repo runs there is an issue where it tries to locate the table `temp` but when it can't find this table it will continue. This creates confusing behavior because the program can execute but it will always find the exact same cuesets for each algorithm, full den and k-percentile. I haven't quite fully debugged this part yet but I hope to have it finished before Monday. +It seems that these issues can be partially fixed by giving all privileges to the Kirby user. However I ran into a problem that I have yet to fix where the program runs out of heap memory. I tried to solve this using the settings in IntelliJ to increase the available memory but the program crashed before it reached the memory limit, about 10 GB. 
\ No newline at end of file diff --git a/README.md b/README.md index 080b95a..a165dd6 100644 --- a/README.md +++ b/README.md @@ -48,10 +48,15 @@ dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); ``` **to** ``` -dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); ``` 2. Update corresponding database info (*username, password, server and port number*) in the `mysql.properties` file under `resources/credentials/` directory. + + +**2024 Update** +**In addition you need to change the user privileges for the default account named in the mysql.properties file to have all privileges on the database value you set in step 1.** + #### Step 2: Prepare Testcases From 2190684ffbdfa88138da5b8e9d9e52f2f608bf85 Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Thu, 1 Feb 2024 22:32:30 -0800 Subject: [PATCH 4/6] Updated Documentation.md The documentation is now more up to date and should cover all the issues I encountered. --- Documentation.md | 29 ++++++++++++++++++++++++++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/Documentation.md b/Documentation.md index e18d696..d5b1232 100644 --- a/Documentation.md +++ b/Documentation.md @@ -1,10 +1,33 @@ Issues encountered while trying to run Tattle-Tale GitHub repo, on both Windows 11 OS Build 22621.3007 and Ubuntu 20.04 LTS. The first issue is that the python scripts for generating testcases store their contents in the parent folder of the GitHub repo. This creates a breaking change in the Experiment.java class as the class looks for the testcases folder inside of the GitHub repo so the pathing fails. -Issues encountered while trying to run Tattle-Tale GitHub repo, on both Windows 11 OS Build 22621.3007. +Issues encountered while trying to run Tattle-Tale GitHub repo, on Windows 11 OS Build 22621.3007. 
The way that a connection to the MySQL database is established has a problem because when the connection is tested in `src/main/java/edu/policy/dbms/MySQLConnectionManagerDBCP.java` at line 77 ``` dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); ``` -when it looks for a database named `mysql` this causes the runtime execution to hard crash because it can't connect. Additionally even if the database name exists in the database there will still be hard crashes during the runtime. This is because the python test scripts instantiate the name of a data base and set it to a value, i.e. hospitaldb, and for each test run the program then tries to use that database name which if the name isn't in the running MySQL server then the program fails to connect. Either way someone pulling the repo has to manually make these changes to get the repo to execute without crashing. +when it looks for a database named `mysql` this causes the runtime execution to hard crash because it can't connect. Additionally even if the database name exists in the database there will still be hard crashes during the runtime. This is because the python test scripts instantiate the name of a data base and set it to a value, i.e. hospitaldb, and for each test run the program then tries to use that database name which if the name isn't in the running MySQL server then the program fails to connect. Either way someone pulling the repo has to manually make these changes to get the repo to execute without crashing. I think I had this same issue when I initially tried to use the tax mysql script to There is an issue where the readme doesn't fully explain the amount of either tables or permissions that the default `Kirby` user needs to complete the experiment. If someone, i.e. 
like I did, just gives the default user the ability to `SELECT` from tables in the database when the repo runs there is an issue where it tries to locate the table `temp` but when it can't find this table it will continue. This creates confusing behavior because the program can execute but it will always find the exact same cuesets for each algorithm, full den and k-percentile. I haven't quite fully debugged this part yet but I hope to have it finished before Monday. -It seems that these issues can be partially fixed by giving all privileges to the Kirby user. However I ran into a problem that I have yet to fix where the program runs out of heap memory. I tried to solve this using the settings in IntelliJ to increase the available memory but the program crashed before it reached the memory limit, about 10 GB. \ No newline at end of file +It seems that these issues can be partially fixed by giving all privileges to the Kirby user. However I ran into a problem that I have yet to fix where the program runs out of heap memory. I tried to solve this using the settings in IntelliJ to increase the available memory but the program crashed before it reached the memory limit, about 10 GB. +I tried using the btm script to deal with the memory heap issues but they still happened even when I gave IntelliJ 22 GB of available memory. +Current Fixes: +1. Change line 77 of `src/main/java/edu/policy/dbms/MySQLConnectionManagerDBCP.java` from** +``` +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); +``` +to +``` +dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); +``` +2. In addition you need to change the user privileges for the default account named in the mysql.properties file to have all privileges on the database value you set in step 1. + +3. 
The testcases folder is stored in the parent folder of this GitHub repo, if you run the python script from the main repo folder, you'll need to remember this as the current code looks in the wrong file location. + +4. You'll need to edit the files generated by these python scripts the .json files contain values called `databaseName` which need to match the value that you used in step 1. + +5. You'll need to change line 25 of `src/main/java/edu/policy/execution/Experiment.java` from +``` +private static final File testCaseDir = new File(System.getProperty("user.dir") + "/testdata/testcases"); +``` +to +``` +private static final File testCaseDir = new File(); \ No newline at end of file From e3b14445afda5f31508681c98d1b487c8f2c1c61 Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Fri, 2 Feb 2024 18:41:56 -0800 Subject: [PATCH 5/6] Updates to Documentation I updated the Documentation and Readme so that it should be possible to run any of the testcases generated with the testscripts. --- Documentation.md | 6 +++-- README.md | 64 ++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 63 insertions(+), 7 deletions(-) diff --git a/Documentation.md b/Documentation.md index d5b1232..f193843 100644 --- a/Documentation.md +++ b/Documentation.md @@ -20,7 +20,7 @@ dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, ``` 2. In addition you need to change the user privileges for the default account named in the mysql.properties file to have all privileges on the database value you set in step 1. -3. The testcases folder is stored in the parent folder of this GitHub repo, if you run the python script from the main repo folder, you'll need to remember this as the current code looks in the wrong file location. +3. Change line 278 of your python 4. You'll need to edit the files generated by these python scripts the .json files contain values called `databaseName` which need to match the value that you used in step 1. 
@@ -30,4 +30,6 @@ private static final File testCaseDir = new File(System.getProperty("user.dir") ``` to ``` -private static final File testCaseDir = new File(); \ No newline at end of file +private static final File testCaseDir = new File(); + +6. I heard from the developer that some of these settings could be fixed by running the testscripts in the testscript folder and by editing the testscripts by changing what they name the database. I also learned that the taxes table shouldn't have issues with the heap size so that should give us a way to move forward. \ No newline at end of file diff --git a/README.md b/README.md index a165dd6..7a527c6 100644 --- a/README.md +++ b/README.md @@ -50,6 +50,7 @@ dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); ``` dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); ``` +**Alternatively you could create two databases named hospitaldb and taxdb to store the hospital and taxes tables respectively.** 2. Update corresponding database info (*username, password, server and port number*) in the `mysql.properties` file under `resources/credentials/` directory. @@ -64,17 +65,70 @@ Use the test script **testcase_gen_tax.py** or **testcase_gen_hospital.py** to g **2024 Update** -**The testcases folder is stored in the parent folder of this GitHub repo, you'll need to remember this as the current code looks in the wrong file location.** +**Be sure to run the testscripts from the testscript folder in the GitHub repo. 
You will also need to change the python testscript to set the correct database for the java classes to connect to.** +**If you're using** +      **testcase_gen_tax.py** +      **change line 13 from** +``` +DCFileName = "/testdata/taxdb_constraints.txt" +``` +      **to** +``` +DCFileName = "/testdata/taxdb_constraints_noPBD.txt" +``` +      **and line 278 from** +``` +database_name = "taxdb" +``` +      **to** +``` +database_name = "" +``` +      **testcase_gen_tax_btm.py** +      **change line 235 from** +``` +database_name = "taxdb" +``` +      **to** +``` +database_name = "" +``` +      **testcase_gen_hospital.py** +      **change line 280 from** +``` +database_name = "hospitaldb" +``` +      **to** +``` +database_name = "" +``` +      **Additionally you need to create a table called hospital10k in your given database. This table can be a duplicate of your hospital table but with only 10,000 of the hospital table records. This will work if you `DELETE FROM hospital10k WHERE tid > 10000`. I have not tested if any combination of 10,000 hospital table records are also compatible. In addition you must set up the jvm environment to have sufficient space on the jvm and the heap. 
You will need to allocate over 10 GB (exact size not confirmed) of heap space to complete this test.** + +      **testcase_gen_hospital_btm.py** +      **change line 236 from** +``` +database_name = "hospitaldb" +``` +      **to** +``` +database_name = "" +``` +      **testcase_gen_hospital_scalability.py** +      **change line 192 from** +``` +database_name = "hospitaldb" +``` +      **to** +``` +database_name = "" +``` + > python testcase_gen_tax.py > > python testcase_gen_hospital.py - -**2024 Update** -**You'll need to edit the files generated by these python scripts the .json files contain values called `databaseName` which need to match the value that you used in step 1.** - For the scalability experiment (described in the extended version of our paper), we enable the binning-then-merging (*btm*) mode. Testcases can be generated using script **testcase_gen_hospital_scalability.py**. Our codebase is forward-compatible with test cases without btm mode. From b727efa6fb74dc77150c94756af75e355a6f069b Mon Sep 17 00:00:00 2001 From: Plexyplox Date: Fri, 2 Feb 2024 18:44:47 -0800 Subject: [PATCH 6/6] Update README.md There was an oopsie. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7a527c6..ea8fe9a 100644 --- a/README.md +++ b/README.md @@ -50,7 +50,7 @@ dataSource.setUrl(String.format("jdbc:mysql://%s:%s/mysql", SERVER, PORT)); ``` dataSource.setUrl(String.format("jdbc:mysql://%s:%s/", SERVER, PORT)); ``` -**Alternatively you could create two databases named hospitaldb and taxdb to store the hospital and taxes tables respectively.** +**Alternatively you could create two databases named hospitaldb and taxdb to store the hospital and taxes tables respectively.** 2. Update corresponding database info (*username, password, server and port number*) in the `mysql.properties` file under `resources/credentials/` directory.
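
One workaround described in this series is making the `databaseName` values in the generated testcase `.json` files match the database created in Step 1. A minimal sketch of automating that edit follows. It assumes only that the generated files are valid JSON with `databaseName` keys somewhere in their structure; the function names, the `testdata/testcases` location and the name `my_database` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def set_database_name(node, new_name):
    """Recursively set every `databaseName` key in parsed JSON.

    Returns the number of values that were actually changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "databaseName":
                if value != new_name:
                    node[key] = new_name  # overwrite in place, dict size is unchanged
                    changed += 1
            else:
                changed += set_database_name(value, new_name)
    elif isinstance(node, list):
        for item in node:
            changed += set_database_name(item, new_name)
    return changed


def patch_testcases(testcase_dir, new_name):
    """Rewrite each .json file under testcase_dir in place.

    Returns a dict mapping file path to the number of values changed.
    """
    results = {}
    for path in sorted(Path(testcase_dir).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        count = set_database_name(data, new_name)
        if count:
            path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        results[str(path)] = count
    return results


if __name__ == "__main__":
    # Hypothetical directory and database name; adjust both to your setup.
    for file_name, count in patch_testcases("testdata/testcases", "my_database").items():
        print(f"{file_name}: {count} value(s) updated")
```

Walking the parsed structure recursively avoids depending on the exact layout of the generated files, which this series does not show. Files that already carry the desired name are left untouched.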