Stop writing a separate parser script for every website. With Parsify, a single script of a few lines plus a configuration file is enough to adapt your parser to different websites.
pip install parsify
Make sure you have your configuration file (usually handbook.json) ready.
import parsify as pf
# Create Parsify engine
ngn = pf.Engine(handbook='handbook.json')
# Run a single step
# Provide the step name as an argument
# The step must belong to Engine.current_parser
# The step must not use any "dynamic_variables" when run on its own with this method
# By default, Engine.current_parser is the first parser in the Handbook
step_result = ngn.stepshot(step='get_products')
# print(step_result)
# Parse a single website (must be configured in "handbook.json")
# Provide scope name as an argument
scope_result = ngn.scopeshot(parser='example.com')
# print(scope_result)
# Run all the parsers that are configured in "handbook.json"
final_result = ngn.parse()
# print(final_result)
- The Handbook file should start with a "parser" key, the value of which is an array of parsers.
- Each parser in the array should have two keys:
  - "scope" - String: Name of the parser. Usually the website name, e.g. "example.com".
  - "steps" - Array: Steps to parse.
- Each step should have at least the following fields:
  - "name" - String: Unique name of the step. This field makes it possible to access this step's results and dynamic variables in subsequent steps (if needed).
  - "chain_id" - Integer: Steps with the same chain id are executed as a sequence of steps on every iteration.
  - "url" - String: Target URL of the request(s) for the current step.
  - "method" - String: Request method for the current step.
  - "output_path" - String: Path of the result data in the response. Use dots if it is nested; for example, if the needed result is in response -> "data" -> "products", "output_path" should be "data.products".
  - "output" - Dictionary:
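The structure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Parsify's own code: the URL and step values are made up, the helper names `missing_step_fields` and `resolve_output_path` are hypothetical, and the "output" entry is left as an empty placeholder because its contents are not specified here.

```python
import json

# Fields each step should have, taken from the list above.
REQUIRED_STEP_FIELDS = ("name", "chain_id", "url", "method", "output_path", "output")

# Hypothetical handbook with one parser and one step; all values are examples.
handbook = {
    "parser": [
        {
            "scope": "example.com",
            "steps": [
                {
                    "name": "get_products",
                    "chain_id": 1,
                    "url": "https://example.com/api/products",
                    "method": "GET",
                    "output_path": "data.products",
                    "output": {},  # placeholder: contents are not specified here
                }
            ],
        }
    ]
}


def missing_step_fields(step):
    """Return the required field names that are absent from a step."""
    return [field for field in REQUIRED_STEP_FIELDS if field not in step]


def resolve_output_path(response, output_path):
    """Follow a dot-separated path through nested dictionaries."""
    current = response
    for key in output_path.split("."):
        current = current[key]
    return current


# Text that could be saved as handbook.json.
handbook_json = json.dumps(handbook, indent=2)

# A made-up response shaped like the "data.products" example above.
sample_response = {"data": {"products": [{"id": 1}, {"id": 2}]}}
products = resolve_output_path(sample_response, "data.products")
```

Running `missing_step_fields` over every step before handing the file to the engine is one way to catch an incomplete handbook early. How Parsify itself validates the file or resolves "output_path" may differ.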
Distributed under the MIT License. See LICENSE file for more information.
Luka Sosiashvili - @lukasanukvari - luksosiashvili@gmail.com
Project Link: https://github.com/lukasanukvari/parsify
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.