GitHub - ad54/integrate_crawler_django: integrate crawler developed in scrapy with django

Hello Guys,

In this tutorial we are going to learn how to integrate a crawler developed in Scrapy with Django.

Crawler (backend) : Scrapy
Frontend : Django
Database : MySQL

We are going to make an app that takes a keyword from the user and extracts the quotes related to that keyword from this site.

First we are going to make a crawler in Scrapy that extracts the data and stores it in the database. The database used is MySQL. If you are not familiar with Scrapy crawlers, please visit https://docs.scrapy.org/en/latest/intro/tutorial.html
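The spider itself is not shown in this readme. As a rough sketch of the "store it in database" step, an item pipeline could look like the one below. This is an assumption, not the code from this repo: the class name, the table columns and the item fields are hypothetical, and sqlite3 stands in for MySQL so the snippet runs without a server. When a crawl is started through scrapyd, the job id reaches the spider as the '_job' argument, which by default becomes an attribute on the spider.

```python
import sqlite3


class QuotesStorePipeline:
    """Hypothetical Scrapy item pipeline: saves each quote together with its job id."""

    def __init__(self, connection=None):
        # sqlite3 is only a stand-in here; the tutorial itself uses MySQL.
        if connection is None:
            connection = sqlite3.connect(":memory:")
        self.con = connection
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS quotes (job_id TEXT, keyword TEXT, quote TEXT)"
        )

    def process_item(self, item, spider):
        # scrapyd hands the job id to the spider as the '_job' argument.
        job_id = getattr(spider, "_job", None)
        # Placeholders keep the values out of the SQL string itself.
        self.con.execute(
            "INSERT INTO quotes (job_id, keyword, quote) VALUES (?, ?, ?)",
            (job_id, item.get("keyword"), item.get("quote")),
        )
        self.con.commit()
        return item
```

Because every row carries the job id, the Django side can later select only the rows that belong to one request.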

Now we will create a Django project using the command

django-admin startproject quotes_scrape

Then we will create an app named 'request_handler'

cd quotes_scrape
python manage.py startapp request_handler

Add the name of the app, 'request_handler', to the 'INSTALLED_APPS' list in settings.py.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'request_handler'
]

Go to the views.py file in the 'request_handler' directory. Create a view for entering the keyword. For that we will create an HTML file (template) and render it.

from django.shortcuts import render

def home(request):
    return render(request, "index.html")

Now we will create a second view that handles the request, fetches the data and sends a JSON response. We pass the keyword from Django to Scrapy. Scheduling the crawl gives us a job_id; the same job_id is used in Scrapy to identify the request and in Django to get that request's data from the database.

import datetime
from time import sleep

import pymysql
from django.http import JsonResponse
from scrapyd_api import ScrapydAPI


def handle_request(request):
    # get keyword from input
    keyword = request.GET.get('keyword')

    # create an object of scrapyd API
    scrapyd = ScrapydAPI("http://localhost:6800")
    request_time = datetime.datetime.now()

    # schedule the crawl; scrapyd returns the job id
    job_id = scrapyd.schedule(project='quotes_scrape', spider='quotes_crawler', keyword=keyword,
                              request_time=request_time)
    # parameterized query: the driver fills in %s with the job id
    qry = "select * from quotes where job_id = %s"

    job_status = "running"

    values = []

    # check for job status
    while job_status != "finished":
        job_status = scrapyd.job_status(project='quotes_scrape', job_id=job_id)
        if job_status == 'finished':

            # database connection
            con = pymysql.connect(host="localhost", user="root", passwd="Arihant", db="quotes_scrape",
                                  cursorclass=pymysql.cursors.DictCursor)
            try:
                cursor = con.cursor()
                # get the records of this particular job from the database
                cursor.execute(qry, (job_id,))
                values = cursor.fetchall()
            finally:
                # always release the connection, even if the query fails
                con.close()
        else:
            sleep(1)

    return JsonResponse(data=values, safe=False)
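One thing to keep in mind with the loop above: if the job never reaches 'finished' (for example the spider crashes or scrapyd does not know the job), the view keeps waiting. A minimal sketch of a bounded wait is shown here. The helper name 'wait_for_job' and its parameters are hypothetical and not part of this repo; it only needs a function that returns the current status string.

```python
import time


def wait_for_job(get_status, timeout=60.0, interval=1.0):
    """Poll get_status() until it returns 'finished' or the timeout passes.

    Returns True if the job finished in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "finished":
            return True
        # wait a little before asking again
        time.sleep(interval)
    return False
```

In the view it could be called as wait_for_job(lambda: scrapyd.job_status(project='quotes_scrape', job_id=job_id)), returning an error response when the result is False.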

To deploy our crawler we will use 'scrapyd'. For installation you can refer to https://scrapyd.readthedocs.io/en/stable/install.html#installing-scrapyd-generic-way

pip install scrapyd

If you get an error on Windows like 'scrapyd-deploy is not recognized', you can install scrapyd-client:

pip install git+https://github.com/scrapy/scrapyd-client

Now edit the crawler's config file, scrapy.cfg: uncomment the line below [deploy:local].
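To double check the edit, the config can be read back with Python's configparser. The file contents in this sketch are an assumption (a typical scrapy.cfg with a 'url' line under the deploy section), so compare them with your own file; the helper name 'deploy_targets' is hypothetical.

```python
import configparser

# Assumed contents of scrapy.cfg after the edit; check against your own file.
SCRAPY_CFG = """
[settings]
default = quotes_scrape.settings

[deploy:local]
url = http://localhost:6800/
project = quotes_scrape
"""


def deploy_targets(cfg_text):
    """Return {target_name: url} for every [deploy:<name>] section that has a url."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        # a line that is still commented out with '#' is ignored by the parser
        if section.startswith("deploy:") and parser.has_option(section, "url"):
            targets[section.split(":", 1)[1]] = parser.get(section, "url")
    return targets


print(deploy_targets(SCRAPY_CFG))  # {'local': 'http://localhost:6800/'}
```

If the result is empty, the url line is most likely still commented out.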

Start scrapyd using the command below (this is the Windows form; on other systems run 'scrapyd' directly)

start scrapyd

You can check that it is running by opening http://localhost:6800/ in the browser.
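The same check can be done from Python, which is handy before scheduling a crawl from Django. This is a small sketch using only the standard library. It relies on scrapyd's 'daemonstatus.json' endpoint; the function name 'scrapyd_is_up' is hypothetical.

```python
import json
from urllib.request import urlopen


def scrapyd_is_up(base_url="http://localhost:6800", timeout=3):
    """Return True if scrapyd answers daemonstatus.json with status 'ok'."""
    try:
        with urlopen(f"{base_url}/daemonstatus.json", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # connection refused, timeout, HTTP error or a non-JSON answer
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

Calling scrapyd_is_up() returns False when nothing is listening on the port, so the view can report a clear error instead of failing inside the scheduling call.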

Deploy the spider. Use the target name from the config file after scrapyd-deploy, e.g. 'local' for [deploy:local]

scrapyd-deploy local

Go to the home page and enter a keyword.

After the crawl completes, you will get the result in JSON format.

Thank you,

Happy Coding
