22

Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!

Edamame

7 Answers

8

I am actively maintaining an efficient implementation of both PrefixSpan and BIDE in Python 3, supporting mining both frequent and top-k (closed) sequential patterns.

Zephyr
Chuancong Gao
5

The only Python package I've found is on GitHub.

It includes an implementation of BIDE, but the code is not maintained.

Stephen Rauch
yossico
2

SPMF sounds like a useful library for pattern mining.

Zephyr
2

Since none of the existing solutions were satisfactory for me, I created my own Python wrapper for SPMF (the Java library mentioned in other answers here).

Stephen Rauch
2

To complement some of the great answers/libraries:

Seq2Pat: Sequence-to-Pattern Generation Library might be relevant to your case.

The library is written in Cython to take advantage of a fast C++ backend with a high-level Python interface. It supports constraint-based frequent sequential pattern mining.

Here is an example that shows how to mine a sequence database while respecting an average constraint for the prices of the patterns found.

# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute

# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]])

# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2], [1, 3, 3], [4, 5, 2, 1]])

# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)

# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)

Notice that sequences can be of different lengths, and you can add or drop other Attributes and Constraints. The items in a sequence can be strings, as in the example, or integers.

The underlying algorithm uses Multi-valued Decision Diagrams, and in particular, the state-of-the-art algorithm from AAAI 2019.
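To make the average constraint concrete, here is a rough brute-force check in plain Python on the same data. It is not the decision-diagram algorithm and does not use Seq2Pat at all. The function `constrained_patterns` and its parameters are hypothetical names for this illustration, and it assumes that a pattern counts once per sequence if at least one of its occurrences satisfies the constraint, and that only patterns of two or more items are reported.

```python
from itertools import combinations

sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
prices = [[5, 5, 3, 8, 2], [1, 3, 3], [4, 5, 2, 1]]


def constrained_patterns(sequences, values, low, high, min_frequency, min_length=2):
    """Brute force (exponential, for tiny inputs only).

    Assumption: a sequence supports a pattern if at least one occurrence of
    the pattern has an average attribute value within [low, high].
    """
    support = {}
    for seq, vals in zip(sequences, values):
        satisfied = set()  # patterns with a satisfying occurrence in this sequence
        for length in range(min_length, len(seq) + 1):
            # Every increasing tuple of positions is one subsequence occurrence.
            for positions in combinations(range(len(seq)), length):
                avg = sum(vals[p] for p in positions) / length
                if low <= avg <= high:
                    satisfied.add(tuple(seq[p] for p in positions))
        for pattern in satisfied:
            support[pattern] = support.get(pattern, 0) + 1
    # Keep only patterns supported by enough sequences.
    return {p: n for p, n in support.items() if n >= min_frequency}


result = constrained_patterns(sequences, prices, low=3, high=4, min_frequency=2)
print(result)
```

Under these assumptions the only surviving pattern is A-D, which agrees with the comment in the example above: its occurrences average 3.5 in the first sequence and 3 in the third.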

Hope this helps!

Disclaimer: I am a member of the research collaboration between Fidelity & CMU on the Seq2Pat Library.

skadio
1

Have you considered writing it yourself? There is probably no up-to-date, maintained library right now.

Check this out, it covers the basics. PrefixSpan and closed/maximal patterns are actually not that hard to implement.
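To give a sense of how little code the basic version needs, here is a minimal PrefixSpan sketch in plain Python. It assumes each sequence is a list of single items (no itemsets), and the function name `prefixspan` and its arguments are made up for this illustration. It is not taken from any of the libraries mentioned in this thread, and it does no closed or maximal filtering.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: one entry per
        # sequence containing the prefix, pointing just past its leftmost match.
        first_pos = defaultdict(dict)  # item -> {sequence index: first position}
        for seq_idx, start in projected:
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                # Record only the first occurrence of each item per sequence.
                first_pos[seq[pos]].setdefault(seq_idx, pos)
        for item, occurrences in first_pos.items():
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # Recurse on the database projected onto the extended prefix.
                mine(pattern, [(i, p + 1) for i, p in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


db = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Support here is the number of sequences that contain the pattern as a (not necessarily contiguous) subsequence. Closed patterns could be obtained afterwards by dropping every pattern that has a super-pattern with the same support, although dedicated algorithms such as BIDE avoid generating them in the first place.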

Zephyr
HonzaB
0

I've used fim's fpgrowth function in the past and it worked well. It's kind of a pain to install on Windows machines, however. It seems to be hosted on an academic website, so I'm not sure whether the code gets many updates over time...

Jed