Home
#Welcome
Welcome to the remap wiki! Here you'll be able to explore the design of remap, see some examples and find out how to install and get started quickly.
The design is located here: Design
More design notes of a more technical nature
A couple of example implementations
Customizing remap is possible.
##Help wanted
This project has come to a stage where it's important to get new people on board. If you want to help out, here are some open general tasks to work on:
- Improve the design and plumbing
- Improve reliability, task failure handling and reporting
- Improve installation process
- UI design for a dashboard app (nodejs + whatever framework)
- General UI work: logo, look & feel, icons, interaction analysis
- Development of some new examples for map/reduce or vertex
- Reliability testing in larger clusters
##Community
You can subscribe to a mailing list over at freelists.org:
http://www.freelists.org/list/remap
##Install dependencies
Remap has a minimal set of dependencies. Most are available on Ubuntu 14.10 through apt.
- python3.4
- nanomsg
- python3-pybonjour
This is all pure Python code, so you should be able to run it on Windows, Mac OS X or Linux. On Linux and Windows you need to have avahi or a Bonjour equivalent installed. On OS X this just works.
Ubuntu 14.10
> sudo apt-get install libnanomsg0 python3.4
If nanomsg is not available (14.04?), try:
> cd /tmp
> git clone https://github.com/nanomsg/nanomsg.git
> cd nanomsg
> ./autogen.sh
> ./configure
> make
> sudo make install
Then install the following python packages:
> sudo apt-get install python3-pip
> sudo python3.4 -m pip install nanomsg flask flask-simple-api
> cd /tmp
> git clone https://github.com/depl0y/pybonjour-python3.git
> cd pybonjour-python3
> sudo python3.4 setup.py install
Now your machine is ready to run the remap code.
OSX
Download a Python 3.4 package from the Python website. It gets installed into its own environment, because OS X ships with Python 2.7. Remap does not work with Python 2.7.
https://www.python.org/downloads/
Install the .pkg file. You should now be able to run python in a shell if you make some changes:
> unset PYTHONPATH
> ./python3.4
To get the other packages, download the latest version of nanomsg from here:
http://nanomsg.org/download.html
> mkdir ~/build
> cd ~/build
> mkdir nanomsg
> cd nanomsg
> tar -zxvf ~/Downloads/nanomsg-0.5-beta.tar.gz
> cd nanomsg-0.5
> ./configure
> make
> sudo make install
Then install some python packages:
> unset PYTHONPATH
> git clone https://github.com/tonysimpson/nanomsg-python.git
> cd nanomsg-python/
> sudo python3.4 setup.py build_ext --include-dirs /usr/local/include/ --library-dirs=/usr/local/lib/
> python3.4 setup.py install
> cd ~/build
> git clone https://github.com/depl0y/pybonjour-python3.git
> cd pybonjour-python3
> python3.4 setup.py install
> pip3.4 install flask flask-simple-api
Choose a directory in which to create your remap directory, then:
> git clone https://github.com/gtoonstra/remap.git
> cd remap
> ls
Let's call this directory remap_git_dir for future reference.
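Before moving on, it can help to confirm that the interpreter and packages installed above are actually visible to Python. The following is a minimal sketch, not part of remap itself. The import names `nanomsg`, `pybonjour` and `flask` are assumptions derived from the package names in the install steps, and the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the packages installed above; adjust if they differ.
REQUIRED_MODULES = ["nanomsg", "pybonjour", "flask"]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_python_34(version_info=sys.version_info):
    """Return True when the major and minor version are 3.4."""
    return tuple(version_info[:2]) == (3, 4)


def report():
    """Print a short summary and return the list of missing modules."""
    if not is_python_34():
        print("warning: running Python %d.%d, not 3.4" % tuple(sys.version_info[:2]))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing modules: " + ", ".join(missing))
    else:
        print("all required modules found")
    return missing
```

Save this to a file, add a call to `report()` at the end, and run it with `python3.4`. It only checks that the modules can be located, not that the native nanomsg library loads correctly.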
##Quick test run
A quick test run can be done in 3 steps:
Set up an environment. Remap uses a specific directory layout (see design) where files are kept. There's a script available that sets up this environment to save you the trouble and time.
From within the "remap" directory you ended up in during the previous step (the root git directory):
> python3.4 prepare_env.py
This script is only compatible with python3.4; other versions won't work.
The script asks you for another location where you can set up this sample directory tree. Of course, the root of that new directory must be writable by the user. Type a directory of your choice, hit enter, confirm with 'y' and the system will be set up for you.
Let's call this remap_data_dir.
Start 3 daemons
Start three shells and go into the remap git directory. Then in each shell, start a different daemon process.
First shell:
> cd <remap_git_dir>
> cd daemons
> cd broker
> python3.4 broker_daemon.py <remap_data_dir>
Second shell:
> cd <remap_git_dir>
> cd daemons
> cd node
> python3.4 node_daemon.py <remap_data_dir>
Third shell:
> cd <remap_git_dir>
> cd daemons
> cd initiator
> python3.4 initiator.py <remap_data_dir>
We're now ready to run the first test.
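As an alternative to opening three shells by hand, the same three commands could be launched from one small script. This is only a sketch under the assumptions of this page: the directory and script names are the ones listed above, while `daemon_commands` and `start_daemons` are hypothetical helpers that do not ship with remap.

```python
import os
import subprocess
import sys

# (subdirectory under daemons/, script name), as listed in the steps above.
DAEMONS = [
    ("broker", "broker_daemon.py"),
    ("node", "node_daemon.py"),
    ("initiator", "initiator.py"),
]


def daemon_commands(remap_git_dir, remap_data_dir, python=sys.executable):
    """Return a list of (working_directory, argv) pairs, one per daemon."""
    commands = []
    for subdir, script in DAEMONS:
        cwd = os.path.join(remap_git_dir, "daemons", subdir)
        commands.append((cwd, [python, script, remap_data_dir]))
    return commands


def start_daemons(remap_git_dir, remap_data_dir):
    """Start each daemon from its own directory and return the Popen handles."""
    processes = []
    for cwd, argv in daemon_commands(remap_git_dir, remap_data_dir):
        processes.append(subprocess.Popen(argv, cwd=cwd))
    return processes
```

Calling `start_daemons("<remap_git_dir>", "<remap_data_dir>")` from a python3.4 session starts all three processes with the current interpreter. Their log output will be interleaved in a single terminal, so separate shells remain the easier option when you want to watch the node daemon on its own.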
Start an app on remap
From the environment you created in step 1, start a shell. Then go into the 'test' directory:
> cd <remap_data_dir>
> cd test
Keep the console with the "node_daemon" visible, because it will show the log messages.
Run any of the test files in there. The node daemon shows how it receives the request to run an app, creates a 'processing core' for each map or reduce action, and processes the run.
Verification
This information is outdated and new scripts are being created. You should now start jobs through a REST API that the initiator exposes.
Using a REST API client like "Advanced Rest Client" in Chrome, submit the following json data to the URL:
POST: http://127.0.0.1:5000/jobs/start
{ "priority": 5, "app":"wordcount", "parallellism": 4, "inputdir":"gutenberg", "type":"mapper" }
You can reduce that job by starting a reducer. You need the jobid for that:
{ "priority": 5, "app":"wordcount", "parallellism": 4, "type":"reducer", "jobid":"<whatever-the-job-id-was-for-the-mapper>", "outputdir":"wordcount-results" }
Check the data and job directories for the generated results.
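The two requests above could also be submitted from a Python script instead of a browser-based REST client. The sketch below uses only the standard library. The URL and JSON keys (including the `parallellism` spelling) are copied from the examples above. That the reducer is posted to the same URL, and that the initiator expects a JSON content type, are assumptions. The response format is not documented on this page, so the raw text is returned as-is. All function names are hypothetical.

```python
import json
import urllib.request

# URL from the example above; the initiator must be running locally.
START_URL = "http://127.0.0.1:5000/jobs/start"


def mapper_job(app, inputdir, priority=5, parallellism=4):
    """Build the JSON body for a mapper job, using the keys shown above."""
    return {
        "priority": priority,
        "app": app,
        "parallellism": parallellism,
        "inputdir": inputdir,
        "type": "mapper",
    }


def reducer_job(app, jobid, outputdir, priority=5, parallellism=4):
    """Build the JSON body for a reducer job tied to an earlier mapper job."""
    return {
        "priority": priority,
        "app": app,
        "parallellism": parallellism,
        "type": "reducer",
        "jobid": jobid,
        "outputdir": outputdir,
    }


def build_request(job, url=START_URL):
    """Wrap a job dict in a POST request (JSON content type is an assumption)."""
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(job, url=START_URL):
    """Send the job and return the raw response text, whatever its format."""
    with urllib.request.urlopen(build_request(job, url)) as response:
        return response.read().decode("utf-8")
```

For example, `print(submit(mapper_job("wordcount", "gutenberg")))` would send the mapper request. You would then take the job id from whatever the initiator reports and pass it to `reducer_job("wordcount", jobid, "wordcount-results")`.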
I'm very interested in hearing results from people who ran remap on a couple of nodes, their experiences, the performance. Want to work together on a project? Drop me a line!