The Rice Comp413 class (2016-2017) implementation of HDFS. (This will eventually be put under an open source license, which one TBD).

Rice-Comp413-2016/RDFS

Rice-HDFS

The current plan is to store all our code in this one repo, with separate directories for nameNode, dataNode, and code needed by both (such as various protocols). We'll add more as we need them.

Check the wiki for documentation!

Development

  1. Install Virtualbox. Works with 5.1.
  2. Install Vagrant. Works with 1.8.5.
  3. Clone the repo: git clone https://github.com/Rice-Comp413-2016/Rice-HDFS.git
  4. cd Rice-HDFS
  5. vagrant up (takes 17 minutes from scratch for me)
    • I (Stu) had to "sudo" these commands
    • Make sure to do this from the repo directory (otherwise it asks for vagrant install)
  6. vagrant ssh.
  7. You should be in the development environment. Things to know:
    • The username is vagrant and the password is vagrant.
    • The machine has 1G of memory allocated. Change Vagrantfile if you need more.
    • The folder /home/vagrant/rdfs is synced from here (here being the location of this readme), meaning that all edits you make to files under the project are immediately reflected in the dev machine.
    • Hadoop binaries such as hdfs are on the PATH.
    • Google protobuf 3.0 is installed; you can run protoc to generate C++ headers from .proto specifications.
    • If you need external HTTP access, the machine is bound to the address 33.33.33.33.

Building

sudo apt-get install libboost-all-dev
sudo apt-get install libasio-dev 

mkdir build
cd build

cmake ..
make

You will see a sample executable placed in build/rice-namenode/namenode. The compiled protocols are in build/proto.

Testing

The Google Test framework is now included in the development environment. You may need to run vagrant destroy and then vagrant up to install it. Tests should be placed in the /home/vagrant/rdfs/test directory. After creating a new test file, modify the CMakeLists.txt file to create an executable that runs those tests. The file tests/run-all/run-all-tests.cc creates an executable that runs all tests; if you create a new test executable, modify this file to add yours. The test directory currently contains a file, tests.cc, with a sample test. You can run it by executing

cmake CMakeLists.txt
make
./runTests

in the test/ directory. A beginner's guide to using Google Test is located here

A Git hook has been added at rdfs/test/pre-commit. It's a shell script that builds and runs the unit tests. To use it, copy the file to rdfs/.git/hooks. The tests will then run before each commit, and a failure will halt the commit. If this is too restrictive, renaming the file to pre-push will do the same thing only when you try to push.

Namenode: Run the namenode executable from build/rice-namenode. Then run something like hdfs dfs -fs hdfs://localhost:port/ -mkdir foo, where port is the port used by the namenode (it prints the port it uses on startup).

Datanode: Run the datanode executable from build/rice-datanode. Then run something like hdfs dfsadmin -shutdownDatanode hdfs://localhost:port/, where port is the port used by the datanode (it prints the port it uses on startup).

If you want to do a quick end-to-end test, try the following, which copies a local file into the file system and then cats it:

  1. Pull the code and build (as explained above).
  2. Run zookeeper (from ~, it’s sudo zookeeper/bin/zkServer.sh start). This will run in the background.
  3. Run namenode (rdfs/build/rice-namenode/namenode). This will run in the foreground.
  4. Run datanode (rdfs/build/rice-datanode/datanode). This will run in the foreground.
  5. Create a file with hdfs dfs -fs hdfs://localhost:5351 -copyFromLocal localFile /filename
  6. Try to cat that file with hdfs dfs -fs hdfs://localhost:5351 -cat /filename

Mocking

Using Google Mock is up to you. If you do use it, use it in conjunction with Google Test.

Google Mock is not a testing framework, but a framework for writing C++ mock classes. A mock class is a simplified version of a real class, created to aid with testing. Google Mock does, however, automatically verify the expectations set on its mock objects.

The typical flow is:

  1. Import the Google Mock names you need to use. All Google Mock names are in the testing namespace unless they are macros or otherwise noted.

  2. Create the mock objects.

  3. Optionally, set the default actions of the mock objects.

  4. Set your expectations on the mock objects (How will they be called? What will they do?).

  5. Exercise code that uses the mock objects; if necessary, check the result using Google Test assertions.

  6. When a mock object is destructed, Google Mock automatically verifies that all expectations on it have been satisfied.
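
To make the flow above concrete, here is a minimal hand-written mock in plain C++. It deliberately does not use the Google Mock API; it only sketches the bookkeeping (canned return values, call counts, a final verification) that Google Mock generates for you. Every name in it (BlockStore, MockBlockStore, readTwoBlocks, exampleFlow) is hypothetical and invented for illustration; none of them comes from the RDFS code base.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface, invented for this example (not part of RDFS).
class BlockStore {
public:
    virtual ~BlockStore() = default;
    virtual std::string readBlock(int blockId) = 0;
};

// Hand-written mock: returns canned data and counts calls per block id.
class MockBlockStore : public BlockStore {
public:
    // Step 3: default action, used for ids that have no canned value.
    void setDefault(const std::string& data) { defaultData_ = data; }

    // Step 4: expect readBlock(blockId) exactly `times` times, returning `data`.
    void expectRead(int blockId, const std::string& data, int times) {
        canned_[blockId] = data;
        expected_[blockId] = times;
    }

    std::string readBlock(int blockId) override {
        ++actual_[blockId];
        auto it = canned_.find(blockId);
        return it != canned_.end() ? it->second : defaultData_;
    }

    // Step 6: Google Mock performs the equivalent check when the mock is
    // destructed; here it is an explicit call so the result can be inspected.
    bool verify() const {
        for (const auto& e : expected_) {
            auto it = actual_.find(e.first);
            int calls = (it == actual_.end()) ? 0 : it->second;
            if (calls != e.second) return false;
        }
        return true;
    }

private:
    std::string defaultData_;
    std::map<int, std::string> canned_;
    std::map<int, int> expected_;
    std::map<int, int> actual_;
};

// Code under test: concatenates two blocks read through the interface.
std::string readTwoBlocks(BlockStore& store, int first, int second) {
    return store.readBlock(first) + store.readBlock(second);
}

// Steps 2 to 6 of the flow, in order.
bool exampleFlow() {
    MockBlockStore mock;                          // 2. create the mock object
    mock.setDefault("");                          // 3. set a default action
    mock.expectRead(1, "foo", 1);                 // 4. set expectations
    mock.expectRead(2, "bar", 1);
    std::string out = readTwoBlocks(mock, 1, 2);  // 5. exercise the code
    assert(out == "foobar");
    return mock.verify();                         // 6. verify expectations
}
```

With Google Mock itself, the mock class, the expectations, and the verification step are written with the framework's own macros instead of by hand; see the documentation listed below for the real syntax.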

You should read through the Google Mock documentation, located at /googletest/googlemock/docs/, before using it:

  • ForDummies -- start here if you are new to Google Mock.
  • CheatSheet -- a quick reference.
  • CookBook -- recipes for doing various tasks using Google Mock.
