pypi i pyCGA


An Open Computational Genomics Analysis platform for big data processing and analysis in genomics

by opencb

Version: 1.3.0. License: Apache Software License.

.. contents::


  • This Python package makes use of the exhaustive RESTful Web service API that has been implemented for the OpenCGA_ database.

  • It provides easy access to OpenCGA, an open-source project that aims to provide a Big Data storage engine and analysis framework for genomic scale data analysis of hundreds of terabytes or even petabytes.

  • More information about this project is available in the `OpenCGA Wiki`_.



PyCGA can be cloned to your local machine by running the following in your terminal::

   $ git clone

Once you have downloaded the project, you can install the library::

   $ cd opencga/tree/develop/opencga-client/src/main/python
   $ python setup.py install


Getting started

The first step is to set up the OpenCGA server configuration:

.. code-block:: python

    >>> configuration = {
    ...     "version": "v1",
    ...     "rest": {
    ...         "hosts": [""]
    ...     }
    ... }

The configuration can also be stored in a JSON or YAML file:

.. code-block:: python

    >>> configuration = '/path/to/config/opencga_configuration.json'
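
As a minimal sketch of the file-based option, the configuration dictionary could be written to a JSON file with the standard ``json`` module, and the resulting path then used as the configuration. The file location and the placeholder host below are assumptions for illustration only; the host must be replaced with the address of a real OpenCGA server.

```python
import json
import os
import tempfile

# Same structure as the configuration dictionary; the host is a placeholder.
configuration = {
    "version": "v1",
    "rest": {
        "hosts": ["<your-opencga-host>"]
    }
}

# Hypothetical location for the configuration file.
config_path = os.path.join(tempfile.mkdtemp(), "opencga_configuration.json")

with open(config_path, "w") as handle:
    json.dump(configuration, handle, indent=4)

# Read the file back to confirm that it round-trips unchanged.
with open(config_path) as handle:
    loaded = json.load(handle)
```

The value of ``config_path`` is what would be assigned to ``configuration`` when the file-based form is used.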

The second step is to import the module and initialize the OpenCGAClient. Configuration, user and password must be specified:

.. code-block:: python

    >>> from pyCGA.opencgarestclients import OpenCGAClient
    >>> oc = OpenCGAClient(configuration=configuration, user='user_example', pwd='pass_example')

To avoid writing the user and password in a script, a session ID can be used instead:

.. code-block:: python

    >>> from pyCGA.opencgarestclients import OpenCGAClient
    >>> oc = OpenCGAClient(configuration=configuration, user='user_example', pwd='pass_example')  # Remove after getting session id
    >>> print(oc.session_id)  # Remove after getting session id
    >>> oc = OpenCGAClient(configuration=configuration, session_id='I4MG3fXJIZARl1LhwZ')
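
Another way to keep credentials out of a script is to read them from environment variables. The sketch below only builds the keyword arguments (``configuration``, ``user``, ``pwd``, ``session_id``) that the examples above pass to ``OpenCGAClient``. The helper ``client_kwargs`` and the variable names ``OPENCGA_SESSION_ID``, ``OPENCGA_USER`` and ``OPENCGA_PASSWORD`` are hypothetical and not part of pyCGA.

```python
import os


def client_kwargs(configuration, environ=None):
    """Build keyword arguments for OpenCGAClient from environment variables.

    The environment variable names used here are hypothetical.
    """
    environ = os.environ if environ is None else environ
    kwargs = {"configuration": configuration}
    session_id = environ.get("OPENCGA_SESSION_ID")
    if session_id:
        # Prefer the session ID when one is available.
        kwargs["session_id"] = session_id
    else:
        # Fall back to user and password; raises KeyError if either is unset.
        kwargs["user"] = environ["OPENCGA_USER"]
        kwargs["pwd"] = environ["OPENCGA_PASSWORD"]
    return kwargs


# A plain dictionary stands in for os.environ in this example.
example = client_kwargs(
    "/path/to/config/opencga_configuration.json",
    {"OPENCGA_SESSION_ID": "I4MG3fXJIZARl1LhwZ"},
)
# The client would then be created with: oc = OpenCGAClient(**example)
```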

The next step is to create the specific client for the data we want to query:

.. code-block:: python

    >>> samples = oc.samples()  # Query for samples
    >>> files = oc.files()  # Query for files
    >>> cohorts = oc.cohorts()  # Query for cohorts

Now you can start querying the OpenCGA RESTful service by providing a query ID:

.. code-block:: python

    >>> sample_search = samples.search(study='study1', name='sample1').get()
    >>> print(sample_search)
    [{'acl': [{'member': '@gel', u'permissions': ['VIEW', 'VIEW_ANNOTATIONS']}...

Responses are retrieved as JSON formatted data. Therefore, fields can be queried by key:

.. code-block:: python

    >>> creation_date = samples.search(study='study1', name='sample1').get()[0]['creationDate']
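
Once a field has been extracted by key, it can be processed like any other Python value. The sketch below uses a hand-written stand-in for a response rather than real server output, and assumes that ``creationDate`` follows a ``YYYYMMDDHHMMSS`` layout, which the example values in this document suggest but do not confirm.

```python
from datetime import datetime

# Hand-written stand-in for the list returned by .get(); not real server output.
response = [{"name": "sample1", "creationDate": "20170204122738"}]

# Access the field by key, as in the example above.
creation_date = response[0]["creationDate"]

# Assumed layout: year, month, day, hour, minute, second.
parsed = datetime.strptime(creation_date, "%Y%m%d%H%M%S")
print(parsed.isoformat())  # prints 2017-02-04T12:27:38
```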

First-level fields in the JSON output can be accessed as attributes:

.. code-block:: python

    >>> creation_date = samples.search(study='study1', name='sample1').get().creationDate

    >>> annotation = cohorts.search(study='study1', name='cohort1').get().annotationSets
    >>> print(annotation[0]['annotations'][0]['value']['sex'])
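
To illustrate how attribute-style access to first-level fields can work in general, here is a minimal sketch of a list subclass that collects a field from every result. ``ResultList`` is a hypothetical class written for this example; it is not pyCGA's implementation, and the real client may behave differently (for instance, when only one result is returned).

```python
class ResultList(list):
    """Hypothetical list of result dictionaries with attribute-style access."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return [item[key] for item in self]
        except (KeyError, TypeError):
            raise AttributeError(key)


# Hand-written records standing in for a response.
results = ResultList([
    {"name": "sample1", "creationDate": "20170204122738"},
    {"name": "sample2", "creationDate": "20170204123049"},
])

print(results.creationDate)  # prints ['20170204122738', '20170204123049']
```

Nested levels remain ordinary dictionaries and lists in this sketch, so they are still accessed by key or index.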

Regular expressions are allowed in some fields. This is especially useful when searching by name:

.. code-block:: python

    >>> cohort_name = cohorts.search(study='study1', name='~LP3000506-DNA_J01').get().name
    >>> print(cohort_name)
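
The ``~`` prefix marks the value as a pattern rather than an exact name. The matching itself happens on the server, so the following is only a client-side sketch of the idea, assuming that a leading ``~`` means "search for this regular expression anywhere in the value". The ``matches`` helper and the second and third cohort names are invented for illustration.

```python
import re


def matches(value, pattern):
    """Return True if value satisfies pattern; a leading '~' means regex search."""
    if pattern.startswith("~"):
        return re.search(pattern[1:], value) is not None
    return value == pattern


# Only the first name comes from the example above; the others are made up.
names = ["LP3000506-DNA_J01", "LP3000506-DNA_J01_v2", "LP3000999-DNA_A01"]

regex_hits = [name for name in names if matches(name, "~LP3000506-DNA_J01")]
exact_hits = [name for name in names if matches(name, "LP3000506-DNA_J01")]
```

Here ``regex_hits`` contains the first two names, while ``exact_hits`` contains only the first.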

Data can be accessed by specifying comma-separated IDs or a list of IDs:

.. code-block:: python

    >>> creation_date = samples.search(study='study1', name='sample1').get()[0]['creationDate']

    >>> creation_date = samples.search(study='study1', name='sample1').get()[1]['creationDate']

    >>> creation_date = samples.search(study='study1', name='sample1,sample2').get().creationDate
    >>> print(creation_date)
    ["20170204122738", "20170204123049"]
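
Since both forms are accepted, a script that builds its ID list dynamically may find it convenient to normalize everything to a single comma-separated string first. The helper below is a hypothetical convenience function, not part of pyCGA, and whether the client performs a similar conversion internally is an assumption.

```python
def as_id_string(ids):
    """Accept a comma-separated string or an iterable of IDs; return one string."""
    if isinstance(ids, str):
        return ids
    return ",".join(str(item) for item in ids)


from_list = as_id_string(["sample1", "sample2"])
from_string = as_id_string("sample1,sample2")
```

Both calls produce ``'sample1,sample2'``, which could then be passed as the ``name`` argument in the queries above.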

Optional filters and extra options can be added as key-value parameters (value can be a comma-separated string or a list):

.. code-block:: python

    >>> # e.g. "exclude" parameter
    >>> attributes = files.search(study='study1', name='~sample', bioformat='VARIANT', status='READY', exclude='attributes').get().attributes
    >>> print(attributes)
    [{}, {}, {}, {}, {}, {}, {}, {}]

    >>> # e.g. "limit" parameter
    >>> found_files = files.search(study='study1', name='~sample', bioformat='VARIANT', status='READY', limit=1).get()
    >>> print(len(found_files))

The "analysis_variant" endpoint deserves special mention because it returns an iterator:

.. code-block:: python

    >>> variant_iterator = oc.analysis_variant.query(pag_size=100, data={'studies': 'study1', 'gene': 'BRCA2'}, limit=1)
    >>> for variant in variant_iterator:
    ...     print(variant.get().type)
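
An iterator is a natural fit for variant queries because results can be fetched in pages instead of all at once. The sketch below shows the general paging technique with a fake in-memory data source; ``iterate_pages``, ``fake_fetch`` and the ``skip``/``limit`` arguments are hypothetical and do not describe how pyCGA implements its iterator.

```python
def iterate_pages(fetch_page, page_size, limit=None):
    """Yield items one by one, requesting them from fetch_page in batches."""
    skip = 0
    yielded = 0
    while limit is None or yielded < limit:
        page = fetch_page(skip=skip, limit=page_size)
        if not page:
            break  # No more data available.
        for item in page:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        skip += len(page)


# Stand-in for a server: 250 fake variants served in slices.
FAKE_VARIANTS = [{"id": i, "type": "SNV"} for i in range(250)]


def fake_fetch(skip, limit):
    return FAKE_VARIANTS[skip:skip + limit]


all_variants = list(iterate_pages(fake_fetch, page_size=100))
first_five = list(iterate_pages(fake_fetch, page_size=100, limit=5))
```

With this design the caller never holds more than one page of raw results at a time unless it chooses to collect them, as ``list(...)`` does here.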

What can I ask for?

The best way to find out which data can be retrieved for each client is to check either the `RESTful web services`_ section of the OpenCGA Wiki or the `OpenCGA web services`_.

.. _OpenCGA:
.. _OpenCGA Wiki:
.. _RESTful web services:
.. _OpenCGA web services:
