MLeap usage in streaming scenario (perf issue) #633

Hi,

We are using MLeap to perform model inference within Apache Accumulo. Since the Accumulo iterator framework exposes a streaming API (i.e. rows are processed one at a time rather than in batches), we'd like to reuse as many of the objects required by MLeap as possible.

We managed to create a single "infinite" input data frame and then produce a single result data frame from which we pull the data iteratively. It works, but unfortunately it results in a memory leak. I looked through the stack, but wasn't able to figure out at which point objects are being kept in memory.

The current code works, but doesn't perform well because we have to call transformer.transform(this.mleapDataFrame) for every single row.
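
To illustrate the call pattern, here is a simplified sketch of what happens per row. It is not the actual integration code and does not use the real MLeap classes: `ReusableFrame` and `CountingTransformer` are hypothetical stand-ins for the reused input data frame and the MLeap transformer, and the summing "model" is just a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PerRowInferenceSketch {

    /** Hypothetical stand-in for the reused input data frame: holds one mutable row. */
    static final class ReusableFrame {
        double[] currentRow;
    }

    /** Hypothetical stand-in for the model transformer; counts transform invocations. */
    static final class CountingTransformer {
        private final Function<double[], Double> model;
        int calls = 0;

        CountingTransformer(Function<double[], Double> model) {
            this.model = model;
        }

        double transform(ReusableFrame frame) {
            calls++;
            return model.apply(frame.currentRow);
        }
    }

    /** Scores rows by overwriting the shared frame and calling transform once per row. */
    static List<Double> scoreRowByRow(CountingTransformer transformer, List<double[]> rows) {
        ReusableFrame frame = new ReusableFrame(); // allocated once, reused for every row
        List<Double> predictions = new ArrayList<>();
        for (double[] row : rows) {
            frame.currentRow = row;                          // swap in the next row
            predictions.add(transformer.transform(frame));   // one transform call per row
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Placeholder model: sums the first two features of a row.
        CountingTransformer t = new CountingTransformer(r -> r[0] + r[1]);
        List<double[]> rows = Arrays.asList(new double[]{1, 2}, new double[]{3, 4});
        System.out.println(scoreRowByRow(t, rows) + " in " + t.calls + " transform calls");
    }
}
```

The frame object is reused across rows, but the number of transform calls still grows linearly with the number of rows, which is the overhead described above.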

Integration code can be found here: https://github.com/microsoft/masc/blob/master/connector/iterator/src/main/java/com/microsoft/accumulo/spark/processors/AvroRowMLeap.java#L337

Any advice appreciated.

Markus
