run a mapreduce on a filesystem hierarchy
jabrcx/fsmr
fsmr runs a mapreduce whose inputs are every entity (files, directories, and
so on) in a filesystem hierarchy. It combines two excellent pieces of
software -- libcircle/libdftw[1-3] (for the map) and libmrmpi[4] (for the
reduce). It is distributed using MPI and is intended to run on clusters whose
storage scales well under such workloads.
How do dftw and mrmpi complement each other in fsmr?
* dftw has no built-in way to combine and process output; mrmpi provides
  the data structures and reduce algorithm for doing so
* mrmpi's built-in file tree walking is a serial step performed before the
  map, and it is not designed to scale to hundreds of millions of files as
  dftw is
* dftw's map more easily accommodates callbacks on all filesystem entities,
  including directories -- it has the same interface as ftw(3)
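
For readers unfamiliar with the ftw(3) callback interface mentioned above,
here is a minimal serial sketch using POSIX ftw() itself. It is not fsmr or
libdftw code; the names count_entity and count_tree and the counters are
hypothetical and exist only for this illustration. The point is that the
callback is invoked for every entity, directories included.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical counters for this sketch; not part of fsmr or libdftw. */
static long n_nondirs, n_dirs, n_other;

/* Callback with the ftw(3) signature: called once per entity in the tree. */
int count_entity(const char *fpath, const struct stat *sb, int typeflag)
{
    (void)fpath;
    (void)sb;
    switch (typeflag) {
    case FTW_F: n_nondirs++; break;  /* non-directory entity */
    case FTW_D: n_dirs++;    break;  /* directory */
    default:    n_other++;   break;  /* unreadable dir, failed stat, ... */
    }
    return 0;  /* returning 0 tells ftw() to keep walking */
}

/* Walk the tree under root serially; returns what ftw() returned. */
int count_tree(const char *root)
{
    n_nondirs = n_dirs = n_other = 0;
    int rc = ftw(root, count_entity, 16);  /* use at most 16 open fds */
    if (rc == 0)
        printf("%ld non-directories, %ld directories, %ld other\n",
               n_nondirs, n_dirs, n_other);
    return rc;
}
```

Calling count_tree("/some/path") from a main() prints the three counts for
that hierarchy. A distributed walker with the same callback shape would let
the same function body serve as the map step.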
To build libfsmr.so, install libdftw and mrmpi, then run `make' in this
directory. See examples/fsmr.du_by_owner/example.c for a simple example of
how to write a map() and reduce() and call fsmr. To run it, export
TEST_ARGS=/some/path and run `make test' in the examples/fsmr.du_by_owner
directory.
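
To give a feel for what a "disk usage by owner" map and reduce compute, here
is a serial, single-process sketch that uses only POSIX nftw() and qsort().
It does not use the fsmr, libdftw, or mrmpi APIs, and it is not the code in
example.c; struct kv, map_entity, reduce_by_owner, and du_by_owner are
hypothetical names. It also assumes "usage" means the sum of st_size, which
may differ from what the real example reports.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical key-value pair for this sketch: owner uid -> bytes. */
struct kv { uid_t uid; long long bytes; };

static struct kv *pairs;     /* pairs emitted by the map step */
static size_t n_pairs, cap;

/* "map": called once per filesystem entity; emits (owner, size). */
int map_entity(const char *fpath, const struct stat *sb, int typeflag,
               struct FTW *ftwbuf)
{
    (void)fpath;
    (void)ftwbuf;
    if (typeflag == FTW_NS)
        return 0;                        /* stat failed; nothing to emit */
    if (n_pairs == cap) {                /* grow the output buffer */
        size_t ncap = cap ? cap * 2 : 1024;
        struct kv *p = realloc(pairs, ncap * sizeof *p);
        if (p == NULL)
            return -1;                   /* nonzero stops the walk */
        pairs = p;
        cap = ncap;
    }
    pairs[n_pairs].uid = sb->st_uid;
    pairs[n_pairs].bytes = (long long)sb->st_size;
    n_pairs++;
    return 0;
}

static int by_uid(const void *a, const void *b)
{
    uid_t x = ((const struct kv *)a)->uid, y = ((const struct kv *)b)->uid;
    return (x > y) - (x < y);
}

/* "reduce": group pairs by key (sort), then sum each group in place.
 * Returns the number of distinct owners left at the front of kv. */
size_t reduce_by_owner(struct kv *kv, size_t n)
{
    if (n == 0)
        return 0;
    qsort(kv, n, sizeof *kv, by_uid);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (kv[i].uid == kv[out].uid)
            kv[out].bytes += kv[i].bytes;
        else
            kv[++out] = kv[i];
    }
    return out + 1;
}

/* Walk root without following symlinks, then print "uid<TAB>bytes". */
int du_by_owner(const char *root)
{
    n_pairs = 0;
    if (nftw(root, map_entity, 16, FTW_PHYS) != 0)
        return -1;
    size_t owners = reduce_by_owner(pairs, n_pairs);
    for (size_t i = 0; i < owners; i++)
        printf("%lu\t%lld\n", (unsigned long)pairs[i].uid, pairs[i].bytes);
    return 0;
}
```

In the distributed setting the walk is spread across MPI ranks and the
grouping and summing are handled by the reduce library, but the shape of the
computation (emit one pair per entity, then combine pairs that share a key)
is the same.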
[1] http://dl.acm.org/citation.cfm?id=2389114
[2] https://github.com/hpc/libcircle
[3] https://github.com/hpc/libdftw
[4] http://mapreduce.sandia.gov/