Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!
7 Answers
I am actively maintaining an efficient implementation of both PrefixSpan and BIDE in Python 3, supporting mining both frequent and top-k (closed) sequential patterns.
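For readers unfamiliar with the technique, here is a minimal pure-Python sketch of the PrefixSpan idea (grow a prefix one item at a time and recurse on the projected suffixes). It is only an illustration: the function name `prefixspan` and its signature are made up for this example and are not the API of the package above, and it handles neither closed patterns (BIDE) nor top-k mining.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Return (support, pattern) pairs for all frequent sequential patterns.

    Hypothetical helper for illustration; sequences are lists of single items.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start of the remaining suffix)
        occurrences = defaultdict(list)
        for seq_idx, start in projected:
            seen = set()
            for pos in range(start, len(db[seq_idx])):
                item = db[seq_idx][pos]
                if item not in seen:
                    # Only the first occurrence per sequence counts toward
                    # support; the suffix after it contains all later ones.
                    seen.add(item)
                    occurrences[item].append((seq_idx, pos + 1))
        for item, new_projected in sorted(occurrences.items()):
            if len(new_projected) >= min_support:
                pattern = prefix + [item]
                results.append((len(new_projected), pattern))
                mine(pattern, new_projected)

    mine([], [(i, 0) for i in range(len(db))])
    return results


db = [["A", "A", "B", "A", "D"],
      ["C", "B", "A"],
      ["C", "A", "C", "D"]]

for support, pattern in prefixspan(db, min_support=2):
    print(support, pattern)
```

A maintained library will be much faster and more complete; the sketch is just meant to show what "frequent sequential pattern" means in practice.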
The only Python package I've found is on GitHub.
It includes an implementation of BIDE, but the code is not maintained.
Since none of the existing solutions were satisfactory for me, I created my own Python Wrapper for SPMF (the Java library mentioned in other answers here).
To complement some of the great answers/libraries:
Seq2Pat: Sequence-to-Pattern Generation Library might be relevant to your case.
The library is written in Cython to take advantage of a fast C++ backend with a high-level Python interface. It supports constraint-based frequent sequential pattern mining.
Here is an example that shows how to mine a sequence database while respecting an average constraint for the prices of the patterns found.
# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute

# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"],
                             ["C", "B", "A"],
                             ["C", "A", "C", "D"]])

# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2],
                          [1, 3, 3],
                          [4, 5, 2, 1]])

# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)

# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)
Notice that sequences can be of different lengths, and you can add/drop other Attributes and Constraints. The sequences can be any string, as in the example, or integers.
The underlying algorithm uses Multi-valued Decision Diagrams, and in particular, the state-of-the-art algorithm from AAAI 2019.
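To make the average constraint in the example concrete, here is a brute-force sketch in plain Python. It is not the MDD algorithm and not Seq2Pat's code. It rests on two assumptions: a sequence supports a pattern if at least one of its occurrences has an average price within the bounds, and only patterns of length two or more are reported. The helper name `constrained_patterns` is made up for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def constrained_patterns(sequences, values, low, high,
                         min_frequency, min_length=2):
    """Brute-force constrained mining (hypothetical helper, exponential cost).

    Assumption: a sequence supports a pattern when at least one occurrence
    of the pattern has an attribute average within [low, high].
    """
    supporters = defaultdict(set)
    for seq_idx, (seq, vals) in enumerate(zip(sequences, values)):
        for length in range(min_length, len(seq) + 1):
            # Every increasing choice of positions is one occurrence
            # of some subsequence pattern.
            for positions in combinations(range(len(seq)), length):
                average = sum(vals[p] for p in positions) / length
                if low <= average <= high:
                    pattern = tuple(seq[p] for p in positions)
                    supporters[pattern].add(seq_idx)
    return {pattern: len(idxs) for pattern, idxs in supporters.items()
            if len(idxs) >= min_frequency}


sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]
prices = [[5, 5, 3, 8, 2],
          [1, 3, 3],
          [4, 5, 2, 1]]

print(constrained_patterns(sequences, prices, low=3, high=4, min_frequency=2))
```

Under these assumptions the only pattern that survives on the toy data is A-D, which agrees with the comment in the example above. Enumerating every subsequence only works for tiny inputs, which is exactly the cost the decision-diagram approach is designed to avoid.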
Hope this helps!
Disclaimer: I am a member of the research collaboration between Fidelity & CMU on the Seq2Pat Library.