A framework for multi-threaded pipeline processing
This software performs multi-threaded pipeline processing on sequences of texts or objects using "ChamberLang." First, let's look at the following simple example.
Read:file="./input" > inputdata
Write:file="./output" < inputdata
Please save this code as ./example-code, and prepare a text file named ./input with several lines.
To run this script:
$ mt-chamber.py --threads 2 ./example-code
A file named ./output with the same contents as ./input will then be generated.
You can interpret this script as follows:
The Read command reads the lines of ./input one by one and outputs them to the variable inputdata. The Write command writes the contents of inputdata to ./output one by one.
Note that Write begins before Read has finished:
Read passes each line on immediately after reading it.
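The streaming behaviour described above can be illustrated with a small producer/consumer sketch in plain Python. This is not mt-chamber's implementation; the names `read_lines`, `write_lines`, `run_pipeline`, and the `_DONE` sentinel are hypothetical and only show how a bounded queue lets the writer start before the reader is done.

```python
import queue
import threading

# Sentinel marking the end of the stream (an assumption of this sketch).
_DONE = object()


def read_lines(lines, out_q):
    """Producer: pass each line on as soon as it has been read."""
    for line in lines:
        out_q.put(line)  # blocks while the bounded queue is full
    out_q.put(_DONE)


def write_lines(in_q, sink):
    """Consumer: start writing as soon as the first line arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        sink.append(item)


def run_pipeline(lines, maxsize=4):
    """Run reader and writer concurrently and return what was written."""
    q = queue.Queue(maxsize=maxsize)
    sink = []
    reader = threading.Thread(target=read_lines, args=(lines, q))
    writer = threading.Thread(target=write_lines, args=(q, sink))
    reader.start()
    writer.start()
    reader.join()
    writer.join()
    return sink


if __name__ == "__main__":
    print(run_pipeline(["first", "second", "third"]))
```

Because the queue is bounded, the reader can only run a few items ahead of the writer, which is roughly the role the document assigns to the queues between commands.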
You can specify the following arguments to mt-chamber.py:
- --threads: Number of jobs.
- --unsrt-limit: Affects the size of the queues used to transfer data between processes. It should be sufficiently large relative to --threads.
- --prompt: Prompt mode. Displays an interactive screen while the script runs, to show progress and to debug the script.
- FILE: The script file to run. If not set, the script is read from standard input. In prompt mode, standard input is used for the prompt, so FILE must be specified.
A line of a ChamberLang script generally consists of a command name, options, and input and output variables.
If a line ends with a backslash \, it is concatenated with the next line.
If a line contains #, the rest of the line after that symbol is treated as a comment.
Command:option:option... < input variables > output variables
Unlike UNIX shell commands, a command is not limited to a single input and a single output: some commands receive and emit multiple data streams. I/O variable names may contain alphanumeric characters and underscores, and must not start with a digit.
Options are specified in the form OptionName=value, and multiple options are separated by :.
If a value is omitted, it is interpreted as True.
String values must be quoted with " " or ' '.
Here is an example that uses the LengthCleaner command included in plugins:
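To make the statement syntax concrete, here is a minimal parsing sketch for a single line. It is not the parser shipped with mt-chamber: `parse_statement` is a hypothetical helper, it assumes that `#`, `<`, and `>` never appear inside quoted values, it leaves unquoted values as raw strings, and it does not handle backslash line continuation.

```python
import re

# I/O variable names: alphanumerics and underscore, not starting with a digit.
_VARIABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Statement layout: head, then optional "< inputs", then optional "> outputs".
_STATEMENT = re.compile(
    r"^(?P<head>[^<>]*?)(?:<(?P<ins>[^<>]*))?(?:>(?P<outs>[^<>]*))?$"
)
# Pieces of the head separated by ':', leaving quoted values intact.
_PIECE = re.compile(r"""(?:[^:"']|"[^"]*"|'[^']*')+""")


def parse_statement(line):
    """Parse one statement into a dict, or return None for an empty line."""
    # Assumption: '#' always starts a comment, even inside quotes.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    match = _STATEMENT.match(line)
    if match is None:
        raise ValueError("malformed statement: " + line)
    pieces = _PIECE.findall(match.group("head").strip())
    if not pieces:
        raise ValueError("missing command name: " + line)
    command, options = pieces[0], {}
    for piece in pieces[1:]:
        name, sep, value = piece.partition("=")
        if not sep:
            options[name] = True  # omitted value means True
        elif len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            options[name] = value[1:-1]  # quoted string value
        else:
            options[name] = value  # left as a raw string in this sketch
    inputs = (match.group("ins") or "").split()
    outputs = (match.group("outs") or "").split()
    for variable in inputs + outputs:
        if not _VARIABLE.match(variable):
            raise ValueError("invalid variable name: " + variable)
    return {"command": command, "options": options,
            "inputs": inputs, "outputs": outputs}


if __name__ == "__main__":
    print(parse_statement('Read:file="./input" > inputdata'))
```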
# Read files
Read:file="./en.tok" > en_tok
Read:file="./ja.tok" > ja_tok
# Clean lines by number of words
LengthCleaner:maxlen1=80:maxlen2=80 < en_tok ja_tok > en_clean ja_clean
# Write files
Write:file="./en.clean" < en_clean
Write:file="./ja.clean" < ja_clean
This example reads two files, ./en.tok and ./ja.tok, whose lines correspond to each other; if either line of a pair is longer than 80 words, the pair is removed.
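The filtering rule can be expressed as a plain Python function. This is only a sketch of the rule as described here, not the LengthCleaner plugin itself; `clean_by_length` is a hypothetical name, and counting words by splitting on whitespace is an assumption that fits tokenized input.

```python
def clean_by_length(pairs, maxlen1=80, maxlen2=80):
    """Yield only the pairs in which neither line exceeds its word limit."""
    for line1, line2 in pairs:
        # Assumption: words are separated by whitespace (tokenized text).
        if len(line1.split()) <= maxlen1 and len(line2.split()) <= maxlen2:
            yield line1, line2


if __name__ == "__main__":
    sample = [("a short line", "short"), ("one two three four", "ok")]
    # With a limit of 3 words on the first side, the second pair is dropped.
    print(list(clean_by_length(sample, maxlen1=3, maxlen2=3)))
```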
Using *, you can specify the number of threads for each command individually.
# Next command will be run on three threads regardless of --threads argument
LengthCleaner *3 < en_tok ja_tok > en_clean ja_clean
Writing many options directly in the script reduces its readability.
To avoid this problem, you can use the Alias statement.
Alias assigns a short name to a long statement.
Alias MyCleaner LengthCleaner:maxlen1=80:maxlen2=80
MyCleaner < en_tok ja_tok > en_clean ja_clean
Aliases are expanded before the script is interpreted.
For details on each command, please refer to the Command reference.
Next, let's look at how to define commands using Python.
To define a new command, place a Python file at plugins/CommandName.py.
In this file, define a Command class as follows:
class Command:
    # Settings variable
    InputSize = 1
    OutputSize = 1
    MultiThreadable = True
    ShareResources = False
    def __init__(self, [threads], options...):
        ::::
    def routine(self, [thread_id], instream):
        ::::
    def hook_prompt(self, statement):
        ::::
    def kill(self):
        ::::
    def __del__(self):
        ::::

- InputSize: The size of the input tuple.
- OutputSize: The size of the output tuple.
- MultiThreadable: If True, this command runs on multiple threads. If the command is difficult to run on multiple threads, as with a file reader or writer, set this to False.
- ShareResources: If True, resources are shared by all threads. If you want to create an instance for each thread, set this to False.
In the Command class, you must define at least the routine function. You can additionally define the other functions as needed.
- __init__: Called when an instance of the Command class is created. The number of instances depends on MultiThreadable and ShareResources: if MultiThreadable is False or ShareResources is True, a single instance is created; otherwise, one instance is created per thread. In options..., you can define options as normal arguments. If both MultiThreadable and ShareResources are True, this function takes a threads argument containing the number of threads; otherwise, it does not.
- routine: Called when the command receives data. instream is a tuple of input data, and the function returns the output data as a tuple. If it returns None instead of a tuple, the command finishes and notifies the other commands. If both MultiThreadable and ShareResources are True, this function takes a thread_id argument; otherwise, it does not.
- hook_prompt: Called when a command is entered in prompt mode. statement is a list consisting of the command and its arguments.
- kill: Called when the kill command is entered in prompt mode.
- __del__: Called when the script finishes and the instance is discarded.
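As an illustration of the rules above, here is a minimal sketch of a plugin. The command name AddPrefix (that is, a file plugins/AddPrefix.py) and its `prefix` option are hypothetical and not part of the distribution. Since MultiThreadable is True and ShareResources is False, one instance would be created per thread, so neither `threads` nor `thread_id` appears in the signatures.

```python
class Command:
    # One input variable and one output variable.
    InputSize = 1
    OutputSize = 1
    # Stateless per-line work, so it can run on several threads,
    # with a separate instance for each thread.
    MultiThreadable = True
    ShareResources = False

    def __init__(self, prefix=""):
        # 'prefix' is a hypothetical option, received as a normal argument.
        self.prefix = prefix

    def routine(self, instream):
        # instream is a tuple of input data; the result is also a tuple.
        (line,) = instream
        return (self.prefix + line,)


if __name__ == "__main__":
    command = Command(prefix="> ")
    print(command.routine(("hello",)))
```

Assuming the file name maps to the command name as described, a script line for this plugin would presumably look like `AddPrefix:prefix="> " < inputdata > prefixed`.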
InputSize and OutputSize can also be defined as functions.
class Command:
    def InputSize(self, size):
        ::::
        raise Exception(...)
        ::::
    ::::
If it is defined as a function, it receives the actual number of variables given in the script as the size argument.
If size is incorrect, the function should raise an exception.
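A command that accepts a variable number of inputs might use this mechanism as sketched below. The command (a hypothetical Join that concatenates its inputs) and its `sep` option are invented for illustration. The source does not say whether the function's return value is used, so this sketch assumes it only needs to raise on an invalid size.

```python
class Command:
    OutputSize = 1
    MultiThreadable = True
    ShareResources = False

    def InputSize(self, size):
        # 'size' is the number of input variables actually given in the script.
        # Assumption: returning normally means the size is acceptable.
        if size < 2:
            raise Exception(
                "Join needs at least two input variables, got %d" % size
            )

    def __init__(self, sep="\t"):
        # 'sep' is a hypothetical option.
        self.sep = sep

    def routine(self, instream):
        # Join all input values into a single output value.
        return (self.sep.join(instream),)


if __name__ == "__main__":
    command = Command(sep=" ||| ")
    command.InputSize(2)  # accepted: no exception
    print(command.routine(("hello", "world")))
```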
example/example-script is a larger example, and the plugins directory contains many example plugins. Please refer to them.
If mt-chamber.py is run with --prompt, it shows an interactive interface.
In this mode, you can enter commands after the >>> prompt.
The following commands are available by default.
- watch: Shows the variables specified by Watch in the script.
  Usage: watch [name...]
  Arguments:
  - name...: Names of the watches you want to show. If empty, all watches are shown.
- pause: Pauses the script.
  Usage: pause
- start: Resumes the paused script.
  Usage: start
- exit: If the script has finished, closes the prompt. You can also do this with CTRL+D.
  Usage: exit
- kill: Forcibly stops the script.
  Usage: kill