feat: add datafusion scalar UDF examples #9841
base: main
*prior_columns,
bounded_image_extraction(
    col(frame_column),
    col(f"{boxes_2d_path}:Position2D"),
Very related to: #9837
I don't think we want to depend on hard-coded column names here.
Would be great if we could build a helper with a signature like:
resolve_component_selector(df.schema(), boxes_2d_path, rr.archetypes.Boxes2D.centers)
with some magic that uses rr.archetypes.Boxes2D.centers to resolve the full triple (archetype, component, field), and then scans the schema to find the column that's the best match.
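A rough sketch of what such a helper might look like, to make the idea concrete. This is not an existing API. To stay independent of both libraries, it takes a plain list of column names and the triple as three strings rather than `df.schema()` and `rr.archetypes.Boxes2D.centers`. It assumes columns are named `<entity_path>:<selector>`, as in the `col(f"{boxes_2d_path}:Position2D")` line above. The scoring rule and the example column names are invented for illustration.

```python
from typing import Iterable, Optional


def resolve_component_selector(
    column_names: Iterable[str],
    entity_path: str,
    archetype: str,
    component: str,
    field: str,
) -> Optional[str]:
    """Return the column under `entity_path` that best matches the triple, or None.

    Hypothetical scoring: count how many of (archetype, component, field)
    appear among the colon-separated parts that follow the entity path.
    """
    prefix = f"{entity_path}:"
    wanted = {archetype, component, field}
    best_name, best_score = None, 0
    for name in column_names:
        if not name.startswith(prefix):
            continue  # column belongs to a different entity
        parts = set(name[len(prefix):].split(":"))
        score = len(wanted & parts)
        if score > best_score:  # strict ">" keeps the first column on ties
            best_name, best_score = name, score
    return best_name


# Illustrative column names only; a real schema may look different.
columns = ["/video:VideoFrameReference", "/boxes:Position2D", "/boxes:HalfSize2D"]
match = resolve_component_selector(columns, "/boxes", "Boxes2D", "Position2D", "centers")
```

Matching on any element of the triple rather than on one fixed string means the caller would keep working if the column naming scheme changes, which is the point of avoiding the hard-coded name. A real version would presumably derive the three strings from the archetype descriptor and read the names from the DataFrame schema.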
857bace to 4528429
… a rrd that already exists instead of the manually extracted one.
This PR must remain in draft until upstream DataFusion version 48 is released.
Overview
This PR adds two DataFusion scalar UDFs as the first building blocks of our analysis toolkit.
- BoundedImageExtractionUdf takes a video blob, a frame ID, and 2D bounding boxes, and extracts the images within the bounding boxes.
- DepthImageToPointCloudUdf takes a depth image and a pinhole and extracts a point cloud.
Also provided are wrapper functions that take a DataFusion DataFrame and user-friendly entity paths and perform the required UDF manipulations.
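For readers unfamiliar with the two operations, here is a conceptual sketch in plain Python. It is not the code in this PR. It assumes boxes are given as a center plus half-size in pixel coordinates, and that the point cloud comes from standard pinhole back-projection with focal lengths `fx`, `fy` and principal point `cx`, `cy`. The function names, the `depth_scale` parameter, and the choice to skip non-positive depths are all illustrative assumptions.

```python
import math


def crop_box(image, center, half_size):
    """Crop a row-major image (list of rows) to an axis-aligned box.

    The box is clamped to the image bounds, so a box that extends past
    the edge yields a smaller crop instead of raising.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    (cx, cy), (hx, hy) = center, half_size
    x0, x1 = max(0, math.floor(cx - hx)), min(width, math.ceil(cx + hx))
    y0, y1 = max(0, math.floor(cy - hy)), min(height, math.ceil(cy + hy))
    return [row[x0:x1] for row in image[y0:y1]]


def depth_image_to_points(depth, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project a depth image into 3D points with the pinhole model.

    For pixel (u, v) with depth z: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Non-positive depths are treated as "no measurement" (an assumption).
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            z = d * depth_scale
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# A 4x4 image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
crop = crop_box(image, center=(2, 2), half_size=(1, 1))

depth = [[0.0, 2.0], [2.0, 0.0]]
points = depth_image_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In the UDFs the same per-row logic would run over Arrow arrays rather than Python lists, which is why the wrapper functions only need entity paths to find the right input columns.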
There are two example Jupyter notebooks demonstrating this using data from our standard examples.
Running the examples
First you will need to download the associated RRD for the demo files indicated in each notebook.
This requires a version of datafusion-python that is not yet released. You can pull from this branch and perform a local build. After setting up the uv environment per the readme in that repo, build the package. Then, in your rerun repo, you will need to install the wheel you built. Your exact .whl file will likely be different.
With that installed, you should be able to run the example jupyter notebooks.