gSpan is an algorithm for mining frequent subgraphs.
This program supports undirected graphs and produces the same results as gboost on the dataset graphdata/graph.data.
As of 2016-10-29, gboost does not support directed graphs. This program implements gSpan for directed graphs. More specifically, it can mine frequent directed subgraphs that have at least one node that can reach the other nodes in the subgraph. Correctness is not guaranteed, however, because the author has not tested it extensively. Several runs on the datasets graphdata/graph.data.directed.1 and graph.data.simple.5 revealed no faults.
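The condition on directed subgraphs can be illustrated with a small check. This is a minimal sketch, not code from this project: it assumes the condition means that some node can reach every other node along directed edges, and the function names `reachable_from` and `has_root` as well as the vertex-list and edge-list representation are hypothetical.

```python
from collections import deque


def reachable_from(adjacency, start):
    """Return the set of vertices reachable from `start` (including itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def has_root(vertices, edges):
    """Return True if at least one vertex can reach all other vertices.

    `vertices` is an iterable of vertex ids and `edges` is an iterable of
    (source, target) pairs; both are assumptions made for this sketch.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    vertex_set = set(vertices)
    # Breadth-first search from every vertex; stop at the first "root".
    return any(reachable_from(adjacency, v) >= vertex_set for v in vertex_set)


# Vertex 0 reaches both 1 and 2, so this subgraph satisfies the condition.
print(has_root([0, 1, 2], [(0, 1), (0, 2)]))
# No vertex reaches all the others here (0 and 1 both only point at 2).
print(has_root([0, 1, 2], [(0, 2), (1, 2)]))
```

Under this reading, the first graph would be a candidate pattern and the second would not.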
This program supports both Python 2 and Python 3.
Install this project using pip:
pip install gspan-mining
Alternatively, to run from source, first clone the project:
git clone https://github.com/betterenvi/gSpan.git
cd gSpan
You can optionally install this project as a third-party library so that you can run it under any path.
python setup.py install
The command is:
python -m gspan_mining [-s min_support] [-n num_graph] [-l min_num_vertices] [-u max_num_vertices] [-d True/False] [-v True/False] [-p True/False] [-w True/False] [-h] database_file_name
python -m gspan_mining -s 5000 ./graphdata/graph.data
python -m gspan_mining -s 5000 -p True ./graphdata/graph.data
python -m gspan_mining -s 5000 -d True ./graphdata/graph.data
python -m gspan_mining -h
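The same command line can be assembled from a Python script. The following is a minimal sketch that assumes the package is installed in the current interpreter; the helper names `build_gspan_command` and `run_gspan` are hypothetical and not part of this project. It only passes through the options shown in the usage line above; consult `python -m gspan_mining -h` for what each option does.

```python
import subprocess
import sys


def build_gspan_command(database_file, min_support, **flags):
    """Build the argument list for `python -m gspan_mining`.

    `flags` maps single-letter option names from the usage line
    (for example d, p, v, w) to their values.
    """
    command = [sys.executable, "-m", "gspan_mining", "-s", str(min_support)]
    for name, value in sorted(flags.items()):
        command += ["-" + name, str(value)]
    command.append(database_file)
    return command


def run_gspan(database_file, min_support, **flags):
    """Run the miner as a subprocess; raises CalledProcessError on failure."""
    subprocess.check_call(build_gspan_command(database_file, min_support, **flags))


command = build_gspan_command("./graphdata/graph.data", 5000, d=True)
# Show the arguments that follow the interpreter path.
print(" ".join(command[1:]))
```

Calling `run_gspan("./graphdata/graph.data", 5000, d=True)` would then execute the command.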
|Min support|Number of frequent subgraphs|Time|
|---|---|---|
|1000|455|3 m 49 s|
|600|1235|7 m 29 s|
|400|2710|12 m 53 s|
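The numbers above grow as the minimum support drops because support is the number of graphs in the database that contain a pattern, so a lower threshold admits more patterns. The sketch below illustrates this for the simplest case only, patterns consisting of a single labeled edge in undirected graphs. It is not the gSpan algorithm and not this project's code: the function name `frequent_edges` and the representation of a graph as a list of (vertex label, edge label, vertex label) triples are assumptions made for illustration.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return single-edge patterns contained in at least `min_support` graphs."""
    support = Counter()
    for graph in graphs:
        # Use a set so each pattern is counted at most once per graph.
        patterns = set()
        for u_label, edge_label, v_label in graph:
            # Undirected edge: normalize the order of the endpoint labels.
            first, second = sorted((u_label, v_label))
            patterns.add((first, edge_label, second))
        support.update(patterns)
    return {p: count for p, count in support.items() if count >= min_support}


# A toy database of three graphs (made-up data for illustration).
graphs = [
    [("C", "single", "C"), ("C", "double", "O")],
    [("O", "double", "C"), ("C", "single", "N")],
    [("C", "single", "C"), ("C", "single", "C")],
]

print(len(frequent_edges(graphs, 2)))  # patterns found in at least 2 graphs
print(len(frequent_edges(graphs, 1)))  # a lower threshold admits more patterns
```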
gSpan: Graph-Based Substructure Pattern Mining, by X. Yan and J. Han. Proc. of the 2002 Int. Conf. on Data Mining (ICDM'02).
One C++ implementation of gSpan.