Python model serverless workflow doesn't accept environment libraries #1009

Open
dustinvannoy-db opened this issue May 2, 2025 · 0 comments
Labels: bug (Something isn't working)

Comments

@dustinvannoy-db

Describe the bug

I am testing Python models submitted to serverless job compute, trying two submission methods: workflow_job and serverless_cluster. I need to add a library dependency, but I cannot get it to work with workflow_job.

Steps To Reproduce

Model code that attempts to use a library that needs to be installed from PyPI while running on serverless job compute.

from faker import Faker

def model(dbt, session):
    # Run on serverless job compute via a one-time workflow,
    # with faker declared as an environment dependency.
    dbt.config(
        submission_method='workflow_job',
        environment_key="my_env",
        environment_dependencies=["faker==37.0.2"])

    my_sql_model_df = dbt.ref("CustomerIncremental")

    fake = Faker()
    print(fake.name())

    final_df = my_sql_model_df.selectExpr("*").limit(100)

    return final_df

Response:
Runtime Error in model DimCustomer3 (Databricks/models/main/python/DimCustomer3.py)
Python model failed with traceback as:
(Note that the line number here does not match the line number in your code due to dbt templating)
ModuleNotFoundError: No module named 'faker'
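
(Not part of the original run, just a diagnostic sketch.) To confirm the dependency is genuinely absent from the serverless task's Python environment, rather than installed under a different version, the model can log the installed distributions using only the standard library before importing faker:

import importlib.util
from importlib import metadata

def model(dbt, session):
    dbt.config(
        submission_method='workflow_job',
        environment_key="my_env",
        environment_dependencies=["faker==37.0.2"])

    # Log what is actually installed in the environment the task runs in.
    installed = sorted(d.metadata["Name"] or "" for d in metadata.distributions())
    print("faker importable:", importlib.util.find_spec("faker") is not None)
    print("installed distributions:", installed)

    return dbt.ref("CustomerIncremental").limit(1)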

Expected behavior

If I use the same code with the serverless_cluster submission method, it works. The workflow_job method should behave the same way.

from faker import Faker

def model(dbt, session):
    # Same model, but submitted with the serverless_cluster method;
    # here the environment dependencies are installed as expected.
    dbt.config(
        submission_method='serverless_cluster',
        environment_key="my_env",
        environment_dependencies=["faker==37.0.2"])

    my_sql_model_df = dbt.ref("CustomerIncremental")

    fake = Faker()
    print(fake.name())

    final_df = my_sql_model_df.selectExpr("*").limit(100)

    return final_df

Response:
04:18:26 Finished running 1 table model in 0 hours 1 minutes and 55.52 seconds (115.52s).
04:18:26
04:18:26 Completed successfully
04:18:26
04:18:26 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Screenshots and log output


serverless_cluster working environment:
[screenshot]

workflow_job no environment shown:
[screenshot]
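
My assumption about the mechanism (not verified against dbt-databricks internals): on serverless job compute, PyPI packages are attached through the job's environments spec rather than cluster libraries, so the workflow_job submission would need to create the job with something like the settings below; the screenshot above suggests this part is missing. Field names follow the Databricks Jobs API serverless environment spec; the task name and notebook path are placeholders.

# Hypothetical fragment of the Jobs API settings the workflow_job submission
# would need to emit for the environment to apply to the serverless task.
job_settings = {
    "tasks": [
        {
            "task_key": "dbt_python_model",            # placeholder task name
            "notebook_task": {"notebook_path": "..."}, # placeholder path
            "environment_key": "my_env",               # references the entry below
        }
    ],
    "environments": [
        {
            "environment_key": "my_env",
            "spec": {
                "client": "1",
                "dependencies": ["faker==37.0.2"],
            },
        }
    ],
}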

System information

The output of dbt --version:

Core:
  - installed: 1.9.3
  - latest:    1.9.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.9.7 - Update available!
  - spark:      1.8.0 - Update available!

The operating system you're using:
macOS Sequoia 15.4

The output of python --version:
Python 3.10.17


@dustinvannoy-db added the bug label on May 2, 2025