Skip to content

MLeap usage in streaming scenario (perf issue) #633

Closed
@eisber

Description

@eisber

Hi,

We are using MLeap to perform model inference within Apache Accumulo. Since the Accumulo iterator framework exposes a streaming API (e.g. process row by row vs batch) we'd like to re-use as much of the objects required by MLeap.

We managed to create single "infinite" input dataframe and then produce a single result data frame from which we pull the data iteratively. It works, but unnfortunately this results in a memory leak. I was looking through the stack, but wasn't able to figure out at which point things are kept in memory.

The code works, but isn't performing that well as we have to call transformer.transform(this.mleapDataFrame) for every single row.

Integration code can be found here: https://github.com/microsoft/masc/blob/master/connector/iterator/src/main/java/com/microsoft/accumulo/spark/processors/AvroRowMLeap.java#L337

Any advise appreciated.

Markus

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions