Questions tagged [mlops]

21 questions
9 votes, 2 answers

MLOps for beginner

I have one year of experience in ML and have so far been using Jupyter notebooks to build static models, do some analysis, and present my results to my bosses, since it was all a POC. Now we would like to scale the solution so that it runs automatically and is able to…
8 votes, 1 answer

MLflow real world experience

Can someone provide a summary of real-world experience deploying MLflow? We have a few ML models (e.g., LightGBM, TensorFlow 2) and want to avoid frameworks like SageMaker due to a customer requirement, so we are looking into various…
David293836
8 votes, 1 answer

How to Combat Data Drift

I have customer demographic data that includes columns such as age, the first half of the postcode, occupation (drawn from a defined list of possible occupations), and more. Each month I get a new batch of 1,000 rows of this type of data (which is not…
7 votes, 1 answer

What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?

As a beginner in MLOps, I was overwhelmed by some confusing definitions. As far as I understand, for a classifier or regressor y = f(X): covariate shift is a change in the distribution of the independent variables (X), label shift…
4 votes, 2 answers

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$…
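The excerpt defines $\widehat{y}$ by thresholding the score $f(x)$ at $t$. A minimal sketch of how the observed TPR and FPR follow from that rule, assuming binary labels in {0, 1} and precomputed scores $f(x)$; the function names `predict` and `tpr_fpr` are illustrative, not from the question.

```python
from typing import Sequence


def predict(scores: Sequence[float], t: float) -> list[int]:
    # Thresholding rule from the question: y_hat = 1 if f(x) >= t, else 0.
    return [1 if s >= t else 0 for s in scores]


def tpr_fpr(y_true: Sequence[int], scores: Sequence[float], t: float) -> tuple[float, float]:
    # Assumes labels are 0/1 and scores[i] = f(x_i) for the i-th example.
    y_hat = predict(scores, t)
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    # TPR = TP / (TP + FN), FPR = FP / (FP + TN); NaN when a class is absent.
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical toy data: at t = 0.5, TPR = 2/3 and FPR = 1/3.
    print(tpr_fpr([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.35, 0.8, 0.7, 0.1], 0.5))
```

Because both rates are ratios over a fixed set of positives or negatives, the observed values are estimates whose precision depends on how many positives and negatives the evaluation batch contains.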
3 votes, 1 answer

Which ML models can be saved using HDF5?

I've read that the HDF5 format can be used to save machine learning models. However, when using a trained CNNClassifier instance from sktime:
import h5py
from sktime.classification.deep_learning.cnn import CNNClassifier
cnn = CNNClassifier(n_epochs=100,…
2 votes, 1 answer

Data preprocessing framework/library alternatives

I am currently working on some Python machine learning projects that will soon be deployed to production, so our team wants to do this the most "correct" way, following MLOps principles. Specifically, I am currently…
1 vote, 3 answers

Prompt Ops Alternatives

What are the main alternatives for prompt ops nowadays? By prompt ops, I mean a comprehensive solution for tracking prompt engineering experiments and registering prompts at different stages, similar to what I would do with an ML model in a model…
1 vote, 0 answers

Integrating MLFlow and SageMaker for a More Robust ML Model Deployment Pipeline

I'm seeking advice on enhancing the deployment pipeline of a machine learning model that's accessed via a FastAPI service in production. My goal is to replace the existing setup with a more robust and efficient system that includes built-in model…
1 vote, 1 answer

Should I apply the same data transformations in production for my classification model's inference steps?

I am now moving my best classification model to production and am currently testing it. Should I use the same scaler() I used in training for inference in production? Also, what should I do if I used SMOTE during training? Should I also apply…
1 vote, 1 answer

Training a CNN in production on new data

How should I approach training a convolutional neural network in production on new data when I detect model performance degradation due to data or concept drift? Resources like this one and this one lead me to conclude that I need to fine-tune the…
1 vote, 1 answer

Sustain learning separately - continuous learning

This question seeks suggestions on how to architect a continuous learning approach in a distributed manner. Let me explain the situation: in my classification problem, the number of classes can grow large over time, as…
Sandeep Bhutani
1 vote, 0 answers

How is model scheduling set up in practice?

I have worked on various machine learning models so far, but never on the deployment phase of an ML project. I have used Apache Airflow a little and know that it is a tool for scheduling DAGs, but I have never set up such scheduling on…
lazarea
1 vote, 2 answers

Automate Clustering predictions and RFM metrics

We did a POC for customer segmentation and followed the approach below:
a) extract data from the source system (SAP BusinessObjects)
b) use a Python Jupyter notebook to manipulate, merge, and group the data (multiple CSV files)
c) cluster based on some…
The Great
0 votes, 0 answers

Is my idea of a Feature Store wrong?

Cross-posted on Reddit ML. Should a feature store be part of an enterprise data catalog? To me, a feature store seems to be a highly niche data catalog that lacks many of the benefits of an enterprise data catalog / data discovery tool. My…