I'm working on an AI model to predict dependency links between tasks in industrial planning schedules, based on historical project data. I have two tables:

Task Table (15 sheets, one sheet = one schedule)
| ID activity | Name of activity | Equipment Type | Start Date | End Date |
|---|---|---|---|---|
| ZZ0001/001 | TRAVAUX A COORDONNER | COLONNE | 04/01/2011 08:00 | 04/01/2011 08:00 |
| ZZ0001/002 | POSE ECHAFAUDAGE EXTERNE | COLONNE | 04/06/2012 08:00 | 10/08/2012 17:00 |
| ZZ0001/003 | DECALORIFUGEAGE PARTIEL | COLONNE | 10/09/2012 08:00 | 10/09/2012 17:00 |
Dependency Table (15 sheets, one sheet = one schedule)
| ID task | ID successor | Link Type |
|---|---|---|
| ZZ0001/002 | ZZ0001/003 | FS |
| ZZ0001/002 | ZZ0001/006 | FS |
| ZZ0001/003 | ZZ0001/006 | SS |
Each sheet has 300 to 17k tasks. Activity IDs are unique, and the dataset is imbalanced (some equipment types appear 100× more often than others).
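For reference, this is roughly how I load everything and check the imbalance (file names here are placeholders):

```python
import pandas as pd

# Placeholder file names; each workbook has 15 sheets (one per schedule)
tasks = pd.read_excel("tasks.xlsx", sheet_name=None)        # {sheet name: DataFrame}
deps = pd.read_excel("dependencies.xlsx", sheet_name=None)

# Stack all schedules to see how skewed the equipment types are
all_tasks = pd.concat(tasks.values(), ignore_index=True)
print(all_tasks["Equipment Type"].value_counts())
```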
Goal: Given a new list of tasks (typically filtered by EquipmentType), I want the model to suggest likely dependencies between them (and, eventually, the LinkType), learned from historical patterns in the existing data.
What I've tried:
- Decision Trees
- Basic neural networks: an MLP on BERT embeddings, and a GNN (SEAL)
Schematic Code
Data preparation (sketch after this list):
- Load the Excel sheets
- Encode activity names with BERT
- Encode equipment type with a OneHotEncoder
- Combine: [BERT | OneHot] → torch.tensor feature vector
- Build a graph G: each node = one task with its feature vector, no edges at inference time
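A simplified sketch of the encoding steps, reusing `all_tasks` from the loading snippet above; the multilingual checkpoint and the mean pooling are illustrative choices on my side:

```python
import torch
from sklearn.preprocessing import OneHotEncoder
from transformers import AutoTokenizer, AutoModel

# Multilingual checkpoint because the activity names are in French (illustrative)
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

@torch.no_grad()
def embed_names(names, batch_size=64):
    """Mean-pooled BERT embeddings for a list of activity names."""
    out = []
    for i in range(0, len(names), batch_size):
        enc = tok(names[i:i + batch_size], padding=True,
                  truncation=True, return_tensors="pt")
        hid = bert(**enc).last_hidden_state            # (B, T, 768)
        mask = enc["attention_mask"].unsqueeze(-1)     # (B, T, 1)
        out.append((hid * mask).sum(1) / mask.sum(1))  # mean over real tokens
    return torch.cat(out)

# Fit the one-hot encoder on ALL sheets so the type vectors stay aligned
ohe = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
ohe.fit(all_tasks[["Equipment Type"]])

def node_features(df):
    """One row per task: [BERT name embedding | one-hot equipment type]."""
    name_emb = embed_names(df["Name of activity"].tolist())
    type_oh = torch.tensor(ohe.transform(df[["Equipment Type"]]), dtype=torch.float)
    return torch.cat([name_emb, type_oh], dim=1)
```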
Training the SEAL model (model sketch after this list). For each schedule in the training set:
- Extract the real edges (u → v) as positive pairs
- Generate negative pairs (same equipment type, no link)
- Build the 2-hop enclosing subgraph around each pair
- Apply DRNL node labeling
- Store a PyG Data(x, edge_index, drnl, label) object
- Train the GNN: SEALGNN = GINConv on [features + DRNL] → GlobalPool → MLP → Sigmoid
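A condensed version of the model (hidden sizes and the DRNL cap are illustrative, not my exact values):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_mean_pool

class SEALGNN(nn.Module):
    """GIN over [node features | DRNL embedding] on each enclosing subgraph,
    global pooling, then an MLP head that outputs a link probability."""
    def __init__(self, feat_dim, max_drnl=10, hidden=64):
        super().__init__()
        self.drnl_emb = nn.Embedding(max_drnl + 1, hidden)
        self.conv1 = GINConv(nn.Sequential(
            nn.Linear(feat_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden)))
        self.conv2 = GINConv(nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden)))
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, data):
        # data.drnl: DRNL label of each node in the enclosing subgraph
        drnl = data.drnl.clamp(max=self.drnl_emb.num_embeddings - 1)
        x = torch.cat([data.x, self.drnl_emb(drnl)], dim=1)
        x = self.conv1(x, data.edge_index).relu()
        x = self.conv2(x, data.edge_index)
        x = global_mean_pool(x, data.batch)   # one vector per subgraph/pair
        return torch.sigmoid(self.head(x)).view(-1)
```

It is trained with binary cross-entropy against the stored label; at inference the idea is to score candidate pairs (same equipment type) from the new task list and keep the highest-scoring ones.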
Problems encountered:
- Random or irrelevant links
- Models predicting dependencies between nearly all task pairs
- No logical flow learned from the historical data

I'm pretty sure I am not pre-processing the data correctly, as I'm not sure how to treat the task names so that the model can recognize the recurring patterns.
My Question: Would it make sense to frame this as a graph problem and use Graph Neural Networks (GNNs)? Or is there a better ML or statistical approach for modeling and predicting dependencies between tasks in this kind of scenario?
I'm open to advice on model architecture or data pre-processing strategies that might improve performance. Note that I work on Google Colab Pro and have access to an A100 GPU as well as TPUs.