
I am trying to create a prediction API using Keras that loads the model, runs the prediction, and exits. But Python's initialization time is about 3-5 seconds, so each request takes around 5 seconds to return a prediction, irrespective of the number of input rows (predictions).

Is there any way to keep the model loaded and stream the input data to it to get predictions, i.e. a pre-loaded model reachable through a socket or a port?

Similar to the OpenOffice document converter:

\program\soffice.exe -accept="socket,host=127.0.0.1,port=8100;urp;" -headless -nofirststartwizard -nologo

Keras Prediction Code

#!/usr/bin/env python3.6
import sys
import pandas as pd
from keras.models import load_model

model = load_model('model.h5')                 # ~3-5 s: this load dominates the runtime
X = pd.read_csv(sys.argv[1]).values            # input features from the CSV given as argv[1]
prediction = model.predict(X)
pd.DataFrame(prediction).to_json(sys.argv[2])  # write predictions to the JSON path in argv[2]

The script is called as

python3.6 predict.py input_scaled.csv output_scaled.json

The prediction times are as follows:

# rows   time
1        4.76 secs
10       4.49 secs
50       5.37 secs
5000     5.46 secs
50000    12.7 secs
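
To see how that total splits between model load and prediction, the two stages can be timed separately. A minimal sketch of the same script with timers added:

#!/usr/bin/env python3.6
import sys
import time
import pandas as pd
from keras.models import load_model

t0 = time.time()
model = load_model('model.h5')       # expected to account for most of the runtime
t1 = time.time()
X = pd.read_csv(sys.argv[1]).values
prediction = model.predict(X)
t2 = time.time()
print("load: %.2fs  predict: %.2fs" % (t1 - t0, t2 - t1))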

2 Answers


I was able to get this to work without Flask or Django, just using Python's default http.server:

from http.server import BaseHTTPRequestHandler, HTTPServer
import logging
import pandas as pd
from keras.models import load_model
from urllib.parse import urlparse

# Load the model once at server startup; requests then skip the 3-5 s init
model = load_model('model.h5')

class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        # Parse ?input=...&output=... from the query string
        query = urlparse(self.path).query
        params = dict(qc.split("=") for qc in query.split("&"))
        X = pd.read_csv(params["input"]).values
        prediction = model.predict(X)
        pd.DataFrame(prediction).to_json(params["output"])
        self._set_response()
        self.wfile.write("Processed".encode('utf-8'))

def run(server_class=HTTPServer, handler_class=S, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')

if __name__ == '__main__':
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()

Start the server with

python3.6 predict_server.py 8000

and call the API like

http://localhost:8000/?input=predict_scaled.csv&output=prediction.json
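
For example, a minimal client sketch using only the standard library (note that input and output here are file paths as the server sees them):

import urllib.request

url = ("http://localhost:8000/"
       "?input=predict_scaled.csv&output=prediction.json")
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode('utf-8'))   # prints "Processed"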

The easiest way I can think of is to create a Flask app that loads the model once and has an endpoint where you can send your data as requests to the already-loaded model.

A rough skeleton of the service would look like this:

import io

from flask import Flask, request
import pandas as pd
from keras.models import load_model

app = Flask(__name__)

@app.route('/')
def index():
    return ''

@app.route('/predict/', methods=['GET', 'POST'])
def predict():
    # Parse the CSV payload from the request body
    X = pd.read_csv(io.BytesIO(request.get_data())).values
    prediction = model.predict(X)
    return pd.DataFrame(prediction).to_json()

if __name__ == "__main__":
    model = load_model('model.h5')  # load once, before serving requests
    app.run()

Then you could make an HTTP request to localhost:5000/predict/ from another script, and it would return your predictions, which you could then save or process however you want.
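
A minimal sketch of such a client, assuming the service above is running locally and the requests package is installed (the file names are just placeholders):

import requests

with open('input_scaled.csv', 'rb') as f:
    resp = requests.post('http://localhost:5000/predict/', data=f.read())

with open('prediction.json', 'w') as out:
    out.write(resp.text)   # the returned JSON predictions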

Djib2011