
databot

A fast Python dataflow programming framework for data pipeline work (web crawlers, machine learning, quantitative trading, etc.).

by kkyon

Version 0.1.8, License: BSD

Databot
=======

* Data-driven programming framework
* Parallel execution in coroutines
* Type- and content-based route functions

Installing
----------

Install and update using pip::

    pip install -U databot

Documentation
-------------

http://databot.readthedocs.io

Discussion
----------

https://groups.google.com/forum/#!forum/databotpy

What's data-driven programming?
--------------------------------

All functions are connected by pipes (queues) and communicate by passing data.

When data arrives, the function is called and its result is passed on through the pipe.

Think of a Unix shell pipeline: ``ls | grep | sed``.
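To make the idea concrete, here is a minimal sketch in plain ``asyncio`` (not Databot's API) of functions connected by queues, where each function runs as soon as data arrives, much like ``ls | grep | sed``. The names ``stage``, ``run_pipeline`` and the ``_DONE`` sentinel are hypothetical helpers chosen for illustration.

```python
import asyncio
from typing import Any, Callable, Iterable, List

_DONE = object()  # hypothetical sentinel marking the end of the stream


async def stage(func: Callable[[Any], Any], inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Call func on every item that arrives and pass non-None results downstream."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # propagate end-of-stream to the next stage
            return
        result = func(item)
        if result is not None:  # in this sketch, None means "drop the item", like grep
            await outbox.put(result)


async def run_pipeline(source: Iterable[Any], *funcs: Callable[[Any], Any]) -> List[Any]:
    # one queue between each pair of neighbouring stages
    queues = [asyncio.Queue() for _ in range(len(funcs) + 1)]
    tasks = [asyncio.create_task(stage(f, queues[i], queues[i + 1])) for i, f in enumerate(funcs)]
    for item in source:
        await queues[0].put(item)
    await queues[0].put(_DONE)
    results = []
    while (item := await queues[-1].get()) is not _DONE:
        results.append(item)
    await asyncio.gather(*tasks)
    return results


files = ["main.py", "README.rst", "utils.py", "setup.cfg"]
out = asyncio.run(run_pipeline(
    files,
    lambda name: name if name.endswith(".py") else None,  # like grep '\.py$'
    lambda name: name.replace(".py", ".txt"),            # like sed 's/\.py/.txt/'
))
print(out)  # ['main.txt', 'utils.txt']
```

Returning ``None`` to drop an item is a convention of this sketch only; each stage never needs to know what comes before or after it, which is the decoupling the section describes.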

Benefits:

#. Decouples data from functionality.
#. Makes functions easy to reuse.

Databot provides pipes and routes, which make data-driven programming and building powerful data flows easier.

Databot is...
-------------

* **Simple**

Databot is easy to use and maintain, does not need configuration files, and handles ``asyncio`` and parallel computation for you.

Here is one of the simple applications you can build:

Load the price of Bitcoin every 2 seconds. A more advanced price aggregator example can be found `here <https://github.com/kkyon/databot/tree/master/examples>`_.

.. code-block:: python

    from databot import Pipe, Timer, BotFrame, HttpLoader


    def main():
        Pipe(
            Timer(delay=2),  # send a timer event into the pipe every 2 seconds
            "http://api.coindesk.com/v1/bpi/currentprice.json",  # send this URL into the pipe on each timer event
            HttpLoader(),  # fetch the URL and emit the HTTP response
            lambda r: r.json['bpi']['USD']['rate_float'],  # extract the USD rate from the JSON body
            print,  # print the rate
        )

        BotFrame.render('simple_bitcoin_price')  # render the flow graph as an image
        BotFrame.run()


    if __name__ == '__main__':
        main()
The flow graph generated by Databot for this example:

.. image:: https://github.com/kkyon/databot/raw/master/examples/simple_bitcoin_price.png
   :align: center
   :width: 400
   :alt: simple_bitcoin_price

* **Fast**

Nodes run in parallel, so pipelines perform well when processing streaming data.
* **Visualization**

With the render function, ``BotFrame.render('bitcoin_arbitrage')``, Databot renders the data flow network into a Graphviz image. Example:

https://github.com/kkyon/databot/blob/master/docs/bitcoin_arbitrage.png
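Databot's own renderer is not reproduced here. As a rough sketch of what rendering a flow graph involves, the hypothetical helper ``pipeline_to_dot`` below turns an ordered list of stage names into Graphviz DOT source; the Graphviz ``dot`` tool can then produce an image, e.g. ``dot -Tpng flow.dot -o flow.png``. The stage labels are illustrative.

```python
from typing import List


def pipeline_to_dot(name: str, stages: List[str]) -> str:
    """Return Graphviz DOT source for a linear pipeline of named stages."""
    lines = [f'digraph "{name}" {{', "    rankdir=LR;"]  # lay the graph out left to right
    for i, label in enumerate(stages):
        safe = label.replace('"', '\\"')  # escape double quotes so labels stay valid DOT
        lines.append(f'    n{i} [shape=box, label="{safe}"];')
    for i in range(len(stages) - 1):
        lines.append(f"    n{i} -> n{i + 1};")  # data flows from each stage to the next
    lines.append("}")
    return "\n".join(lines)


dot = pipeline_to_dot("simple_bitcoin_price", ["Timer", "url", "HttpLoader", "parse", "print"])
print(dot)
```

A network with routes would need extra edges for each branch; this sketch only covers a linear pipe.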

* **Replay-able**

With replay mode enabled (``config.replay_mode=True``), when an exception is raised at step N you don't need to rerun steps 1 through N. Databot replays the data from the nearest completed node, usually step N-1, which saves a lot of time during development.
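The replay idea can be illustrated with a small, hypothetical checkpointing runner; this is not Databot's implementation and does not use its ``config`` object. Each step's output is cached once the step completes, so after a failure at step N a rerun starts from the cached output of step N-1 instead of recomputing the earlier steps.

```python
from typing import Any, Callable, Dict, List


class ReplayRunner:
    """Hypothetical runner: executes steps in order and caches each completed step's output."""

    def __init__(self, steps: List[Callable[[Any], Any]]) -> None:
        self.steps = steps
        self.cache: Dict[int, Any] = {}  # step index -> output of that step
        self.calls = [0] * len(steps)    # how many times each step actually ran

    def run(self, data: Any) -> Any:
        # resume right after the last step whose output is cached
        start = max(self.cache) + 1 if self.cache else 0
        if start > 0:
            data = self.cache[start - 1]  # replay from the nearest completed step
        for i in range(start, len(self.steps)):
            self.calls[i] += 1
            data = self.steps[i](data)
            self.cache[i] = data
        return data


attempts = {"n": 0}


def flaky(x: int) -> int:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("bug at step 3")  # simulated failure on the first run
    return x * 2


runner = ReplayRunner([lambda x: x + 1, lambda x: x * 10, flaky])
try:
    runner.run(1)
except RuntimeError:
    pass  # "fix the bug", then rerun
result = runner.run(1)
print(result, runner.calls)  # 40 [1, 1, 2]
```

Steps 1 and 2 ran only once; only the failed step was repeated. A real system would also persist the cache between process runs and invalidate it when the input changes.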

More about Databot and data-driven programming
----------------------------------------------

Data-driven programming is a programming paradigm which describes the data to be matched and the processing required rather than defining a sequence of steps to be taken. Standard examples of data-driven languages are the text-processing languages sed and AWK, where the data is a sequence of lines in an input stream. Data-driven programming is typically applied to streams of structured data for filtering, transforming, aggregating (such as computing statistics), or calling other programs.
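In the spirit of AWK, where a program is a list of ``pattern { action }`` rules applied to every input line, a data-driven program in Python can be sketched as a table of (predicate, action) pairs. The rule set and the ``process`` helper below are purely illustrative.

```python
import re
from typing import Callable, List, Tuple

Rule = Tuple[Callable[[str], bool], Callable[[str], str]]

# Each rule pairs a pattern (which data to match) with an action (how to process it).
RULES: List[Rule] = [
    (lambda line: line.startswith("#"), lambda line: ""),  # discard comment lines
    (lambda line: re.match(r"^\d+$", line) is not None, lambda line: f"number:{int(line) * 2}"),
    (lambda line: True, str.upper),  # default rule that matches every line
]


def process(lines: List[str], rules: List[Rule]) -> List[str]:
    out = []
    for line in lines:
        for matches, action in rules:
            if matches(line):
                result = action(line)
                if result:  # an empty result means "discard"
                    out.append(result)
                break  # first matching rule wins (AWK, by contrast, runs every matching rule)
    return out


print(process(["# header", "21", "hello"], RULES))  # ['number:42', 'HELLO']
```

The program describes what to match and what to do, not the order of operations; adding a behaviour means adding a rule rather than editing control flow.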

Databot implements data-driven programming (DDP) with a few basic concepts:

* **Pipe**: the main stream of the program. All other units work inside a pipe.
* **Node**: a unit of processing logic, driven by data. Custom functions work as nodes. Built-in nodes include:

  * ``Loop``: works like a ``for`` loop.
  * ``Timer``: sends messages into the pipe on a timer, controlled by the ``delay`` and ``max_time`` parameters.
  * ``HttpLoader``: fetches a URL and returns the HTTP response.
  * MySQL query/insert: for querying and inserting into MySQL.
  * File read/write: for file I/O.

* **Route**: used to build a complex data flow network rather than a single main pipe. Routes can be nested inside other routes, which makes this a powerful concept. Built-in routes include:

  * ``Branch``: duplicates data from the parent pipe into a branch.
  * ``Return``: duplicates data from the parent pipe and returns the branch's final result to the parent pipe.
  * ``Filter``: drops data from the pipe if it does not match a condition.
  * ``Fork``: duplicates data to many branches.
  * ``Join``: duplicates data to many branches and returns their results to the pipe.
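The exact signatures of Databot's route classes are not reproduced here. As a rough, synchronous model of what each route does to the data stream, based only on the descriptions above, routes can be written as plain functions over a list of items. All function names are hypothetical, and details such as whether the main pipe keeps the original item after ``Fork`` are assumptions of this sketch.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def branch(items: List[Any], side: Stage, sink: List[Any]) -> List[Any]:
    """Branch: copy each item into a side path; the main pipe keeps the original item."""
    for item in items:
        sink.append(side(item))
    return list(items)


def return_route(items: List[Any], side: Stage) -> List[Any]:
    """Return: run each item through a side path and put its final result back in the main pipe."""
    return [side(item) for item in items]


def filter_route(items: List[Any], keep: Callable[[Any], bool]) -> List[Any]:
    """Filter: drop items that do not match the condition."""
    return [item for item in items if keep(item)]


def fork(items: List[Any], sides: List[Stage], sinks: List[List[Any]]) -> List[Any]:
    """Fork: copy each item to every branch (assumed: the main pipe keeps the original)."""
    for item in items:
        for side, sink in zip(sides, sinks):
            sink.append(side(item))
    return list(items)


def join(items: List[Any], sides: List[Stage]) -> List[Any]:
    """Join: copy each item to every branch and return all branch results to the main pipe."""
    return [side(item) for item in items for side in sides]


side_out: List[Any] = []
main = branch([1, 2, 3], lambda x: x * 100, side_out)  # main == [1, 2, 3], side_out == [100, 200, 300]
main = filter_route(main, lambda x: x % 2 == 1)         # [1, 3]
main = join(main, [lambda x: x + 1, lambda x: -x])       # [2, -1, 4, -3]
print(main, side_out)
```

In Databot these routes run asynchronously over queues and can be nested; the list model only shows which data each route duplicates, drops, or returns.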

All units (Pipe, Node, Route) communicate via queues and perform parallel computation in coroutines. This is abstracted so that Databot can be used with only limited knowledge of asyncio.
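To see why coroutine-based nodes help with I/O-bound work such as HTTP loading, here is a plain ``asyncio`` sketch (not Databot code) in which one node handles several items concurrently. ``fake_fetch`` is a hypothetical stand-in for a network call that simply sleeps. Note that coroutines overlap waiting time on a single thread; they do not run CPU-bound code in parallel.

```python
import asyncio
import time
from typing import List


async def fake_fetch(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for an HTTP request: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def node(urls: List[str]) -> List[str]:
    # schedule all requests at once; the event loop interleaves their waits
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"http://example.com/{i}" for i in range(5)]
start = time.perf_counter()
responses = asyncio.run(node(urls))
elapsed = time.perf_counter() - start
# the five 0.2 s waits overlap, so this takes roughly 0.2 s rather than 1.0 s
print(len(responses), round(elapsed, 1))
```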

The following diagrams illustrate the basic routes:

* Branch: https://github.com/kkyon/databot/blob/master/docs/databot_branch.jpg
* Fork: https://github.com/kkyon/databot/blob/master/docs/databot_fork.jpg
* Join: https://github.com/kkyon/databot/blob/master/docs/databot_join.jpg
* Return: https://github.com/kkyon/databot/blob/master/docs/databot_return.jpg

