.. image:: https://img.shields.io/pypi/v/scrapy-djangoitem.svg
   :target: https://pypi.python.org/pypi/scrapy-djangoitem
   :alt: PyPI Version

.. image:: https://img.shields.io/travis/scrapy-plugins/scrapy-djangoitem/master.svg
   :target: http://travis-ci.org/scrapy-plugins/scrapy-djangoitem
   :alt: Build Status

scrapy-djangoitem is an extension that allows you to define
`Scrapy items <http://doc.scrapy.org/en/latest/topics/items.html>`_ using existing
`Django models <https://docs.djangoproject.com/en/latest/topics/db/models/>`_.

This utility provides a new class, named ``DjangoItem``, that you can use as a
regular Scrapy item and link to a Django model through its ``django_model``
attribute. Start using it right away by importing it from this package::

    from scrapy_djangoitem import DjangoItem

Python 2.7 and Python 3.4/3.5 are supported. For Python 3 you need
Scrapy v1.1 or above.

Latest tested Django version is

Install the package with pip::

    pip install scrapy-djangoitem

``DjangoItem`` is a class of item that gets its field definitions from a
Django model: you simply create a ``DjangoItem`` and specify which Django
model it relates to.

Besides defining the model fields on your item, ``DjangoItem`` provides a
method to create and populate a Django model instance with the item data.

``DjangoItem`` works much like ModelForms in Django: you create a subclass
and define its ``django_model`` attribute to be a valid Django model. With this
you will get an item with a field for each Django model field.

In addition, you can define fields that aren't present in the model, and even
override fields that are present in the model by defining them in the item.

Let's see some examples:

Creating a Django model for the examples::

    from django.db import models

    class Person(models.Model):
        name = models.CharField(max_length=255)
        age = models.IntegerField()

Defining a basic ``DjangoItem``::

    from scrapy_djangoitem import DjangoItem

    class PersonItem(DjangoItem):
        django_model = Person

``DjangoItem`` works just like Scrapy items::

    >>> p = PersonItem()
    >>> p['name'] = 'John'
    >>> p['age'] = '22'

To obtain the Django model from the item, we call the extra method
``save()`` of the ``DjangoItem``::

    >>> person = p.save()
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> person.id
    1

The model is already saved when we call ``save()``. We can prevent this by
calling it with ``commit=False``, which returns an unsaved model::

    >>> person = p.save(commit=False)
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> person.id is None
    True

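In a Scrapy project, ``save()`` is typically called from an item pipeline. The following is a minimal sketch, not part of this package: the class name ``DjangoSavePipeline`` is hypothetical, and it assumes every item passing through is a ``DjangoItem``. It would also need to be enabled through the project's ``ITEM_PIPELINES`` setting.

```python
class DjangoSavePipeline:
    """Hypothetical Scrapy item pipeline that persists each DjangoItem."""

    def process_item(self, item, spider):
        # save() builds the Django model instance, writes it to the
        # database and returns it; the return value is not needed here.
        item.save()
        # Returning the item lets any later pipelines keep processing it.
        return item
```
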
As said before, we can add other fields to the item::

    import scrapy
    from scrapy_djangoitem import DjangoItem

    class PersonItem(DjangoItem):
        django_model = Person
        sex = scrapy.Field()

::

    p = PersonItem()
    p['name'] = 'John'
    p['age'] = '22'
    p['sex'] = 'M'

And we can override the fields of the model with our own::

    class PersonItem(DjangoItem):
        django_model = Person
        name = scrapy.Field(default='No Name')

This is useful to provide properties to the field, like a default or any other
property that your project uses. Those additional fields won't be taken into
account when calling ``save()``.

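The effect on extra fields can be pictured with a small, library-independent sketch. It is not the package's implementation: ``model_kwargs`` and its arguments are hypothetical names, and a plain dict stands in for the item. The assumption illustrated is that only keys matching a model field name are forwarded to the model, while everything else stays on the item.

```python
def model_kwargs(item_data, model_field_names):
    """Keep only the item values that correspond to model fields."""
    return {
        name: value
        for name, value in item_data.items()
        if name in model_field_names
    }


item_data = {'name': 'John', 'age': '22', 'sex': 'M'}
# 'sex' was declared only on the item, so it is filtered out here.
print(model_kwargs(item_data, {'name', 'age'}))
```
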
``DjangoItem`` is a rather convenient way to integrate Scrapy projects with Django
models, but bear in mind that the Django ORM may not scale well if you scrape a lot
of items (i.e. millions) with Scrapy. This is because a relational backend is
often not a good choice for write-intensive applications (such as a web
crawler), especially if the database is highly normalized and has many indices.

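If the ORM is kept, one possible way to reduce per-item write overhead is to buffer unsaved instances and insert them in batches. The sketch below is an illustration under assumptions, not a feature of this package: ``BulkSavePipeline`` and ``batch_size`` are hypothetical names, it assumes all items share one ``django_model``, and it relies on Django's ``bulk_create()``, which does not call the model's ``save()`` method or send save signals.

```python
class BulkSavePipeline:
    """Hypothetical pipeline that writes model instances in batches."""

    batch_size = 500  # arbitrary illustrative value

    def __init__(self):
        self.buffer = []
        self.model = None

    def process_item(self, item, spider):
        # commit=False returns a populated but unsaved model instance.
        self.buffer.append(item.save(commit=False))
        self.model = item.django_model
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item

    def close_spider(self, spider):
        # Write whatever is left when the crawl ends.
        self.flush()

    def flush(self):
        if self.buffer:
            self.model.objects.bulk_create(self.buffer)
            self.buffer = []
```

Whether batching helps enough depends on the database and schema; for very large crawls a different storage backend may still be the better choice.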
To use the Django models outside the Django application you need to set up the
``DJANGO_SETTINGS_MODULE`` environment variable and, in most cases, modify the
``PYTHONPATH`` environment variable to be able to import the settings module.

There are many ways to do this depending on your use case and preferences.
One of the simplest is described below.

Suppose your Django project is named ``mysite``, is located in the path
``/home/projects/mysite`` and you have created an app ``myapp`` with the model
``Person``. That means your directory structure is something like this::

    /home/projects/mysite
    ├── manage.py
    ├── myapp
    │   ├── __init__.py
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    └── mysite
        ├── __init__.py
        ├── settings.py
        ├── urls.py
        └── wsgi.py

Then you need to add ``/home/projects/mysite`` to the ``PYTHONPATH``
environment variable and set the environment variable
``DJANGO_SETTINGS_MODULE`` to ``mysite.settings``. That can be done in your
Scrapy settings file by adding the lines below::

    import sys
    sys.path.append('/home/projects/mysite')

    import os
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

Notice that we modify the ``sys.path`` variable instead of the ``PYTHONPATH``
environment variable, as we are already within the Python runtime. If everything
is right, you should be able to start the ``scrapy shell`` command and import
the model ``Person`` (i.e. ``from myapp.models import Person``).

Starting with Django 1.8 you also have to explicitly set up Django if using
it outside a ``manage.py`` context (see
`Django Docs <https://docs.djangoproject.com/en/1.8/intro/tutorial01/#playing-with-the-api>`_)::

    import django
    django.setup()

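The steps above can be gathered into one helper, for instance at the top of the Scrapy settings file. This is a sketch only: ``configure_django`` and its parameters are hypothetical names, and the ``setup`` flag exists just so the path and environment handling can be exercised without Django installed.

```python
import os
import sys


def configure_django(project_path, settings_module, setup=True):
    """Make a Django project importable and initialise Django."""
    # Avoid adding the same directory twice if the settings are reloaded.
    if project_path not in sys.path:
        sys.path.append(project_path)
    os.environ['DJANGO_SETTINGS_MODULE'] = settings_module
    if setup:
        # Imported lazily so the function can be called with setup=False
        # on a system where Django is not available.
        import django
        django.setup()


# Example call, using the layout described earlier:
# configure_django('/home/projects/mysite', 'mysite.settings')
```
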
The test suite from the ``tests`` directory can be run using ``tox`` by running::

    tox

...using the configuration in ``tox.ini``. The Python interpreters
used have to be installed locally on the system.