Update W&B exporter #21

SiddhantSadangi · 2025-06-13T15:44:31Z

Description

Added console logs and file support.
Updated namespace of hardware metrics to runtime from system.
Updated script to use only CLI arguments.
Replaced . in system metric names with / to match the Neptune metric namespace format.

Any expected test failures?

Add a [X] to relevant checklist items

❔ This change

adds a new feature
fixes breaking code
is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

Refactored code (sourcery)
Tested code locally
Precommit installed and run before pushing changes
Added code to GitHub tests (notebooks, scripts)
Updated GitHub README
Updated the projects overview page on Notion

🧪 Test Configuration

OS:
Python version:
Neptune version:
Affected libraries with version:

Summary by Sourcery

Enhance the W&B to Neptune migration script by adding file and console log support, converting interactive prompts to CLI arguments, refining metric namespaces, and improving logging, error handling, and configuration options.

New Features:

Add support for copying console output, source code, requirements, checkpoints, and other files to Neptune
Fully migrate script inputs to CLI arguments including entity, workspace, projects, worker count, log level, and timeout
Introduce a timeout mechanism for run and project migrations

Enhancements:

Switch hardware metric namespace from system to runtime and replace dots with slashes in metric names
Default --num-workers to 1 and expose --log-level for configurable logging verbosity
Insert detailed debug and info logs throughout the migration process and upgrade log formatting
Aggregate failures for runs and projects and report them at the end

Documentation:

Add a changelog section, update usage instructions to CLI-based workflow, and expand documentation of file and metric mappings in README

…xporter

… handling. Updated README for clarity on usage and troubleshooting steps.

sourcery-ai · 2025-06-13T15:44:41Z

Reviewer's Guide

This PR refactors the W&B-to-Neptune migration script by replacing interactive prompts with a full argparse-driven CLI, enhancing logging configuration and error handling (including timeouts and aggregated failure reporting), adding support for uploading console logs and various run-related files to Neptune, migrating metric namespaces and separators to the new runtime/ format, and refreshing documentation to reflect these updates.

Sequence diagram for file upload and logging during run migration

sequenceDiagram
    participant Script
    participant W&B_Run as W&B Run
    participant Neptune_Run as Neptune Run
    participant FileSystem as File System

    Script->>W&B_Run: List files()
    loop For each file
        W&B_Run-->>Script: File metadata
        Script->>FileSystem: Download file
        FileSystem-->>Script: File path
        alt output.log
            Script->>Neptune_Run: log_string_series(runtime/stdout)
        else code/*
            Script->>Neptune_Run: assign_files(source_code/files/<filename>)
        else requirements.txt
            Script->>Neptune_Run: assign_files(source_code/requirements.txt)
        else checkpoint/ckpt
            Script->>Neptune_Run: assign_files(checkpoints/<filename>)
        else other
            Script->>Neptune_Run: assign_files(files/<filename>)
        end
    end

Class diagram for updated file handling and logging in migration script

classDiagram
    class Run {
        +log_configs(dict)
        +add_tags(list, group_tags)
        +log_string_series(dict, step)
        +assign_files(dict)
        +log_metrics(dict, step, timestamp)
    }
    class wandb_run {
        +id
        +name
        +project
        +notes
        +_attrs
        +summary
        +scan_history()
        +history(stream, pandas)
        +files()
        +tags
        +group
    }
    class File {
        +source
        +mime_type
    }
    class MigrationScript {
        +copy_run(wandb_run, wandb_project_name)
        +copy_config(neptune_run, wandb_run)
        +copy_summary(neptune_run, wandb_run)
        +copy_metrics(neptune_run, wandb_run)
        +copy_system_metrics(neptune_run, wandb_run)
        +copy_console_output(neptune_run, download_path)
        +copy_source_code(neptune_run, download_path, filename)
        +copy_requirements(neptune_run, download_path)
        +copy_other_files(neptune_run, download_path, filename, namespace)
        +copy_files(neptune_run, wandb_run, download_folder)
    }
    MigrationScript --> Run
    MigrationScript --> wandb_run
    MigrationScript --> File

Class diagram for CLI argument parsing and logging setup

classDiagram
    class argparse.ArgumentParser {
        +add_argument(...)
        +parse_args()
    }
    class Logger {
        +debug(msg)
        +info(msg)
        +error(msg)
        +setLevel(level)
    }
    class MigrationScript {
        +setup_logging(log_level)
    }
    MigrationScript --> Logger
    MigrationScript --> argparse.ArgumentParser

File-Level Changes

Change	Details	Files
Swapped interactive prompts for a comprehensive argparse-based CLI	Removed manual input functions Introduced argparse arguments for entity, workspace, projects, worker count, log level, and timeout Validated required arguments and raised errors if missing	`utils/migration_tools/from_wandb/wandb_to_neptune.py`
Enhanced logging configuration and suppression	Replaced hard-coded INFO level with dynamic log_level parameter Improved log message formatting with function names and line numbers Silenced noisy third-party loggers in a loop	`utils/migration_tools/from_wandb/wandb_to_neptune.py`
Migrated metric namespace and naming conventions	Changed hardware/system metrics namespace from `system` to `runtime` Replaced `.` in metric names with `/` to match Neptune conventions	`utils/migration_tools/from_wandb/wandb_to_neptune.py` `utils/migration_tools/from_wandb/README.md`
Added support for uploading console logs and files	Introduced EXCLUDED_PATHS and File type from neptune_scale.types Implemented functions to copy console output, source code, requirements, checkpoints, and other files Integrated new file-upload functions into the parallel copy flow	`utils/migration_tools/from_wandb/wandb_to_neptune.py`
Improved concurrency control and error handling	Added timeout parameter to ThreadPoolExecutor futures Collected and reported FAILED_RUNS and FAILED_PROJECTS lists Logged and printed aggregated failures with non-zero exit code on project errors	`utils/migration_tools/from_wandb/wandb_to_neptune.py`
Refreshed documentation with changelog and usage details	Expanded README with versioned changelog Documented all CLI arguments with examples Updated metadata mapping to include new runtime fields and file types	`utils/migration_tools/from_wandb/README.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

…ls on default `num-workers` setting for the W&B to Neptune migration script.

…le-examples into ss/wandb_exporter

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes and found some issues that need to be addressed.

Avoid relying on a global tmpdirname inside copy_files—pass the download directory explicitly to keep the function pure and testable.
Using threadsafe_change_directory in a multithreaded context can lead to unpredictable cwd changes across threads—prefer uploading files by absolute path instead.
Consider cleaning up the temporary directory after completion (or using a context manager) and exiting with a non-zero code on failures to make automation more robust.

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- Avoid relying on a global `tmpdirname` inside `copy_files`—pass the download directory explicitly to keep the function pure and testable.
- Using `threadsafe_change_directory` in a multithreaded context can lead to unpredictable cwd changes across threads—prefer uploading files by absolute path instead.
- Consider cleaning up the temporary directory after completion (or using a context manager) and exiting with a non-zero code on failures to make automation more robust.

## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:268` </location>
<code_context>
-    #         filename.replace("code/", ""), wait=True
-    #     )
-    raise NotImplementedError("Source code logging is not implemented in Neptune yet")
+    with threadsafe_change_directory(os.path.join(download_path.replace(filename, ""), "code")):
+        filename = filename.replace("code/", "")
+        neptune_run.assign_files(
</code_context>

<issue_to_address>
Use `os.path.dirname` instead of `replace` for directories

Using string replacement on paths can lead to errors if the filename appears elsewhere in the path. `os.path.dirname(download_path)` is a safer and more reliable way to get the parent directory.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    with threadsafe_change_directory(os.path.join(download_path.replace(filename, ""), "code")):
        filename = filename.replace("code/", "")
        neptune_run.assign_files(
            {f"source_code/files/{filename}": File(source=filename, mime_type="text/plain")}
        )
=======
    with threadsafe_change_directory(os.path.join(os.path.dirname(download_path), "code")):
        filename = filename.replace("code/", "")
        neptune_run.assign_files(
            {f"source_code/files/{filename}": File(source=filename, mime_type="text/plain")}
        )
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:282` </location>
<code_context>
-    # with threadsafe_change_directory(download_path.replace(filename, "")):
-    #     neptune_run[namespace].upload_files(filename, wait=True)
-    raise NotImplementedError("File logging is not implemented in Neptune yet")
+    with threadsafe_change_directory(download_path.replace(filename, "")):
+        neptune_run.assign_files({f"{namespace}/{filename}": File(source=filename)})

</code_context>

<issue_to_address>
Use `os.path.dirname` to determine the directory

`os.path.dirname(download_path)` is more reliable than using `replace(filename, "")` for extracting the parent directory.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
-    with threadsafe_change_directory(download_path.replace(filename, "")):
-        neptune_run.assign_files({f"{namespace}/{filename}": File(source=filename)})
=======
+    with threadsafe_change_directory(os.path.dirname(download_path)):
+        neptune_run.assign_files({f"{namespace}/{filename}": File(source=filename)})
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:395` </location>
<code_context>
-
-    logger, log_filename = setup_logging()
+    wandb_entity = args.wandb_entity
+    neptune_workspace = args.neptune_workspace
+    num_workers = args.num_workers
+    timeout = args.timeout
</code_context>

<issue_to_address>
Validate that Neptune workspace is provided

Add a check to ensure `neptune_workspace` is set, and raise an error if not, to prevent downstream failures.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    wandb_entity = args.wandb_entity
    neptune_workspace = args.neptune_workspace
    num_workers = args.num_workers
    timeout = args.timeout

    logger, log_filename = setup_logging(args.log_level)
=======
    wandb_entity = args.wandb_entity
    neptune_workspace = args.neptune_workspace
    num_workers = args.num_workers
    timeout = args.timeout

    if not neptune_workspace:
        raise ValueError("Neptune workspace must be provided via --neptune-workspace to avoid downstream failures.")

    logger, log_filename = setup_logging(args.log_level)
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

utils/migration_tools/from_wandb/README.md

…ved logging and error handling in the migration script to ensure smoother operation and clearer feedback during execution.

SiddhantSadangi · 2025-06-16T10:11:43Z

@sourcery-ai review

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:148` </location>
<code_context>
+                        neptune_run.log_configs({f"wandb/{attr}": getattr(wandb_run, attr)})
+                except TypeError:
+                    pass
+                except Exception as e:
+                    logger.error(
+                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}' due to exception:\n{e}"
</code_context>

<issue_to_address>
Use `logger.exception` for full traceback

`logger.exception` will log the stack trace, making it easier to debug unexpected errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
                except Exception as e:
                    logger.error(
                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}' due to exception:\n{e}"
                    )
=======
                except Exception as e:
                    logger.exception(
                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}' due to exception:\n{e}"
                    )
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes - here's some feedback:

Rather than scattering module‐level globals like neptune_workspace, timeout, and num_workers, consider bundling them into a single config or argument object and passing it explicitly to functions for better clarity and testability.
Instead of creating a timestamped directory in the current working directory, you could use Python’s tempfile.TemporaryDirectory to avoid clutter and ensure automatic cleanup.
Accessing the protected attribute neptune_run._project to build the run URL is brittle—switch to a public API or cache the project name separately to prevent breakage if internals change.

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- Rather than scattering module‐level globals like `neptune_workspace`, `timeout`, and `num_workers`, consider bundling them into a single config or argument object and passing it explicitly to functions for better clarity and testability.
- Instead of creating a timestamped directory in the current working directory, you could use Python’s tempfile.TemporaryDirectory to avoid clutter and ensure automatic cleanup.
- Accessing the protected attribute `neptune_run._project` to build the run URL is brittle—switch to a public API or cache the project name separately to prevent breakage if internals change.

## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:347` </location>
<code_context>
             try:
-                future.result()
+                future.result(timeout=timeout)
+            except TimeoutError:
+                logger.error(f"Timeout copying run {wandb_run.project}/{wandb_run.name}")
+                FAILED_RUNS.append((wandb_run.project, wandb_run.name, "Timeout"))
</code_context>

<issue_to_address>
Catch the correct TimeoutError for futures

Use concurrent.futures.TimeoutError instead of the built-in TimeoutError to correctly catch timeouts from future.result(timeout=...).
</issue_to_address>

### Comment 2
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:254` </location>
<code_context>
-    #     for line in f:
-    #         neptune_run["monitoring/stdout"].append(line)
-    raise NotImplementedError("Console logging is not implemented in Neptune yet")
+    with open(download_path) as f:
+        for step, line in enumerate(f):
+            neptune_run.log_string_series({"runtime/stdout": line}, step=step)
</code_context>

<issue_to_address>
Explicitly declare encoding when opening files

Not specifying an encoding may cause errors with non-UTF8 files. Use open(download_path, 'r', encoding='utf-8', errors='ignore') or another suitable encoding.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    with open(download_path) as f:
        for step, line in enumerate(f):
            neptune_run.log_string_series({"runtime/stdout": line}, step=step)
=======
    with open(download_path, 'r', encoding='utf-8', errors='ignore') as f:
        for step, line in enumerate(f):
            neptune_run.log_string_series({"runtime/stdout": line}, step=step)
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `utils/migration_tools/from_wandb/README.md:58` </location>
<code_context>
+
+### Note
+- If they don't exist yet, Neptune projects corresponding to the W&B projects will be created as [*private*][docs-project-access] with project description set to *Exported from <W&B project URL>*. To change the project visibility or description, use the web app or update the `create_project()` function in `copy_project()`.  
+- A `tmp_<timestamp>` directory is created in the current working directory to store the files and logs. Ensure that the current working directory is writable, and that there is enough disk space. You can delete this directory after the script has finished running and the required sanity checks have been performed.

 ## Metadata mapping from W&B to Neptune
</code_context>

<issue_to_address>
Clarify log file location.

Please specify whether `wandb_to_neptune_<timestamp>.log` is located inside the `tmp_<timestamp>` directory or elsewhere.
</issue_to_address>

### Comment 4
<location> `utils/migration_tools/from_wandb/README.md:57` </location>
<code_context>
+```
+
+### Note
+- If they don't exist yet, Neptune projects corresponding to the W&B projects will be created as [*private*][docs-project-access] with project description set to *Exported from <W&B project URL>*. To change the project visibility or description, use the web app or update the `create_project()` function in `copy_project()`.  
+- A `tmp_<timestamp>` directory is created in the current working directory to store the files and logs. Ensure that the current working directory is writable, and that there is enough disk space. You can delete this directory after the script has finished running and the required sanity checks have been performed.

</code_context>

<issue_to_address>
Consider rephrasing guidance on modifying script functions.

Rephrase to clarify that modifying `create_project()` is an advanced customization, not the standard approach, to avoid confusion for typical users.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
- If they don't exist yet, Neptune projects corresponding to the W&B projects will be created as [*private*][docs-project-access] with project description set to *Exported from <W&B project URL>*. To change the project visibility or description, use the web app or update the `create_project()` function in `copy_project()`.  
=======
- If they don't exist yet, Neptune projects corresponding to the W&B projects will be created as [*private*][docs-project-access] with project description set to *Exported from <W&B project URL>*. To change the project visibility or description, use the web app. Advanced users may also customize the `create_project()` function in `copy_project()`, but this is not required for typical usage.  
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

utils/migration_tools/from_wandb/README.md

utils/migration_tools/from_wandb/wandb_to_neptune.py

…irectory usage. Enhance `copy_console_output` function in migration script to log console output to Neptune runs.

SiddhantSadangi · 2025-06-16T10:30:01Z

@sourcery-ai review

Copilot

Pull Request Overview

The PR modernizes the W&B → Neptune migration script by adding full file and console log support, converting interactive prompts to CLI arguments, and standardizing metric namespaces and logging.

Implements downloading and uploading of run files (logs, source code, requirements, checkpoints, etc.).
Switches configuration entirely to CLI flags with default values and enhanced logging.
Renames system metric namespace from system.* to runtime/* and updates matching documentation.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
utils/migration_tools/from_wandb/wandb_to_neptune.py	Add CLI parsing, file and console log copying, update metric namespace and logging
utils/migration_tools/from_wandb/README.md	Document new features, changelog, usage instructions, and CLI arguments

Comments suppressed due to low confidence (5)

utils/migration_tools/from_wandb/wandb_to_neptune.py:331

Consider renaming the local list FAILED_RUNS to failed_runs to follow standard snake_case naming for mutable variables.

    FAILED_RUNS = []

utils/migration_tools/from_wandb/wandb_to_neptune.py:438

This local variable FAILED_PROJECTS should be renamed to failed_projects to adhere to snake_case conventions.

    FAILED_PROJECTS = []

utils/migration_tools/from_wandb/wandb_to_neptune.py:164

Accessing the private attribute _project may break if the internal API changes. Consider using a public property or method on neptune_run to construct the target URL.

                    f"Copied {wandb_run.url} to https://scale.neptune.ai/{neptune_run._project}/runs/table?viewId=standard-view&query=(%60sys%2Fname%60%3Astring%20MATCHES%20%22{wandb_run.name}%22)&experimentsOnly=false&runsLineage=FULL&lbViewUnpacked=true"

utils/migration_tools/from_wandb/wandb_to_neptune.py:254

New file-reading and line-by-line logging logic has been introduced here. It would help maintain reliability if unit or integration tests covered copy_console_output handling of missing or malformed log files.

    with open(download_path, encoding="utf-8", errors="ignore") as f:

utils/migration_tools/from_wandb/wandb_to_neptune.py:58

[nitpick] Consider adding a brief docstring explaining the expected format of log_level values and the return tuple contents (logger, log_filename).

def setup_logging(log_level: str) -> tuple[logging.Logger, str]:

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes - here's some feedback:

Consider using a tempfile.TemporaryDirectory (or cleaning up tmpdirname in a finally block) so that the temporary folder is removed after migration instead of littering the working directory.
Rather than relying on globals like neptune_workspace, wandb_entity, and timeout inside copy_project, pass them explicitly as function parameters to improve clarity and testability.
The EXCLUDED_PATHS set mixes prefixes and exact filenames (e.g. config.yaml); to avoid accidental partial matches you might normalize paths and check equality for files while keeping prefix checks for directories.

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- Consider using a tempfile.TemporaryDirectory (or cleaning up `tmpdirname` in a finally block) so that the temporary folder is removed after migration instead of littering the working directory.
- Rather than relying on globals like `neptune_workspace`, `wandb_entity`, and `timeout` inside `copy_project`, pass them explicitly as function parameters to improve clarity and testability.
- The EXCLUDED_PATHS set mixes prefixes and exact filenames (e.g. `config.yaml`); to avoid accidental partial matches you might normalize paths and check equality for files while keeping prefix checks for directories.

## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:425` </location>
<code_context>
-
-    if selected_projects == "":
-        selected_projects = wandb_project_names
+    if args.projects:
+        selected_projects = [p.strip() for p in args.projects.split(",") if p.strip()]
     else:
-        selected_projects = [project.strip() for project in selected_projects.split(",")]
+        selected_projects = wandb_project_names

     if not_found := set(selected_projects) - set(wandb_project_names):
</code_context>

<issue_to_address>
No check for empty selected_projects after filtering.

Consider exiting or prompting the user if selected_projects is empty to prevent unnecessary execution.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    if args.projects:
        selected_projects = [p.strip() for p in args.projects.split(",") if p.strip()]
    else:
        selected_projects = wandb_project_names

    if not_found := set(selected_projects) - set(wandb_project_names):
        print(f"Projects not found: {not_found}")
    print(f"Copying {len(selected_projects)} projects: {selected_projects}\n")
    logger.info(f"Copying {len(selected_projects)} projects: {selected_projects}")
=======
    if args.projects:
        selected_projects = [p.strip() for p in args.projects.split(",") if p.strip()]
    else:
        selected_projects = wandb_project_names

    if not selected_projects:
        print("No valid projects selected after filtering. Exiting.")
        exit(1)

    if not_found := set(selected_projects) - set(wandb_project_names):
        print(f"Projects not found: {not_found}")
    print(f"Copying {len(selected_projects)} projects: {selected_projects}\n")
    logger.info(f"Copying {len(selected_projects)} projects: {selected_projects}")
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes - here's some feedback:

Refactor copy_run into smaller subfunctions for attribute handling, metrics copying, and file uploads to reduce complexity and improve readability.
Avoid relying on module-level globals (e.g., neptune_workspace, timeout) inside core functions by passing them explicitly as parameters to improve modularity and testability.
Consider batching or streaming console log lines rather than logging each line individually to reduce API calls and improve performance when handling large log files.

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- Refactor `copy_run` into smaller subfunctions for attribute handling, metrics copying, and file uploads to reduce complexity and improve readability.
- Avoid relying on module-level globals (e.g., `neptune_workspace`, `timeout`) inside core functions by passing them explicitly as parameters to improve modularity and testability.
- Consider batching or streaming console log lines rather than logging each line individually to reduce API calls and improve performance when handling large log files.

## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:283` </location>
<code_context>
+
+
+def copy_files(neptune_run: Run, wandb_run, download_folder: str) -> None:  # type: ignore
+    for file in wandb_run.files():
+        if file.size and not any(file.name.startswith(path) for path in EXCLUDED_PATHS):
+            download_path = os.path.join(download_folder, file.name)
+            try:
+                logger.debug(
+                    f"Downloading file {file.name} for run {wandb_run.name} to {download_path}"
+                )
+                file.download(root=download_folder, replace=True, exist_ok=True)
+                if file.name == "output.log":
</code_context>

<issue_to_address>
No handling for file name collisions in copy_files.

Files with identical names from different subdirectories may overwrite each other in the download_folder. Please implement logic to preserve directory structure or handle name collisions to prevent data loss.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

sourcery-ai

Hey @SiddhantSadangi - I've reviewed your changes - here's some feedback:

Remove unused helper functions and commented-out code (e.g., the threadsafe_change_directory context manager) to reduce clutter and maintainability burden.
Add cleanup logic to delete the temporary directory after migration completes to prevent leftover files from accumulating on disk.
Rename local mutable lists FAILED_RUNS and FAILED_PROJECTS to lowercase (e.g., failed_runs, failed_projects) to follow Python naming conventions for non-constant variables.

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- Remove unused helper functions and commented-out code (e.g., the threadsafe_change_directory context manager) to reduce clutter and maintainability burden.
- Add cleanup logic to delete the temporary directory after migration completes to prevent leftover files from accumulating on disk.
- Rename local mutable lists FAILED_RUNS and FAILED_PROJECTS to lowercase (e.g., failed_runs, failed_projects) to follow Python naming conventions for non-constant variables.

## Individual Comments

### Comment 1
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:347` </location>
<code_context>
             try:
-                future.result()
+                future.result(timeout=timeout)
+            except TimeoutError:
+                logger.error(f"Timeout copying run {wandb_run.project}/{wandb_run.name}")
+                FAILED_RUNS.append((wandb_run.project, wandb_run.name, "Timeout"))
</code_context>

<issue_to_address>
Ambiguous TimeoutError catch

`future.result(timeout=timeout)` raises `concurrent.futures.TimeoutError`, not the built-in `TimeoutError`. Please import and catch `concurrent.futures.TimeoutError` explicitly.
</issue_to_address>

### Comment 2
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:46` </location>
<code_context>
+EXCLUDED_PATHS = {"artifact/", "config.yaml", "media/", "wandb-"}
+

 def exc_handler(exctype, value, tb):
     logger.exception("".join(traceback.format_exception(exctype, value, tb)))
</code_context>

<issue_to_address>
exc_handler never registered

Set `sys.excepthook = exc_handler` after defining the handler to ensure it processes uncaught exceptions.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
def exc_handler(exctype, value, tb):
    logger.exception("".join(traceback.format_exception(exctype, value, tb)))
=======
def exc_handler(exctype, value, tb):
    logger.exception("".join(traceback.format_exception(exctype, value, tb)))

import sys
sys.excepthook = exc_handler
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `utils/migration_tools/from_wandb/wandb_to_neptune.py:149` </location>
<code_context>
+                except TypeError:
+                    pass
+                except Exception as e:
+                    logger.error(
+                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}' due to exception:\n{e}"
+                    )
</code_context>

<issue_to_address>
Use logger.exception to capture traceback

Using `logger.exception` will automatically include the stack trace, making debugging easier.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
                except Exception as e:
                    logger.error(
                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}' due to exception:\n{e}"
                    )
=======
                except Exception:
                    logger.exception(
                        f"[{copy_run.__name__}]\tFailed to copy '{attr}' from W&B run '{wandb_run.name}'"
                    )
>>>>>>> REPLACE

</suggested_fix>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

utils/migration_tools/from_wandb/wandb_to_neptune.py

SiddhantSadangi added 3 commits June 2, 2025 14:20

feat: (WIP) Added file, fileseries, and stringseries support to W&B e…

c9577fa

…xporter

Added file and console logs support to wandb migration script

8e9ea6d

Enhance wandb migration script with improved logging and CLI argument…

2f72672

… handling. Updated README for clarity on usage and troubleshooting steps.

SiddhantSadangi requested a review from a team as a code owner June 13, 2025 15:44

SiddhantSadangi changed the title ~~Ss/wandb_exporter~~ Update W&B exporter Jun 13, 2025

SiddhantSadangi self-assigned this Jun 13, 2025

SiddhantSadangi requested a review from Copilot June 13, 2025 15:44

SiddhantSadangi added the enhancement New feature or request label Jun 13, 2025

Merge branch 'main' into ss/wandb_exporter

7a1e0ef

This comment was marked as outdated.

Sign in to view

SiddhantSadangi added 2 commits June 13, 2025 17:47

Update README.md for v0.2.0: Adjusted changelog dates and added detai…

ce5a2fc

…ls on default `num-workers` setting for the W&B to Neptune migration script.

Merge branch 'ss/wandb_exporter' of https://github.com/neptune-ai/sca…

683e957

…le-examples into ss/wandb_exporter

sourcery-ai bot reviewed Jun 13, 2025

View reviewed changes

utils/migration_tools/from_wandb/wandb_to_neptune.py Outdated Show resolved Hide resolved

utils/migration_tools/from_wandb/wandb_to_neptune.py Outdated Show resolved Hide resolved

utils/migration_tools/from_wandb/wandb_to_neptune.py Outdated Show resolved Hide resolved

szaganek reviewed Jun 16, 2025

View reviewed changes

Update README and migration script for W&B to Neptune transfer. Impro…

38665a5

…ved logging and error handling in the migration script to ensure smoother operation and clearer feedback during execution.

SiddhantSadangi requested review from szaganek and Copilot June 16, 2025 10:07

This comment was marked as outdated.

Sign in to view

updated README

1fded26

This comment was marked as outdated.

Sign in to view

sourcery-ai bot reviewed Jun 16, 2025

View reviewed changes

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

sourcery-ai bot reviewed Jun 16, 2025

View reviewed changes

Update README for clarity on Neptune project creation and temporary d…

b6a2480

…irectory usage. Enhance `copy_console_output` function in migration script to log console output to Neptune runs.

SiddhantSadangi requested a review from Copilot June 16, 2025 10:29

Copilot AI reviewed Jun 16, 2025

View reviewed changes

sourcery-ai bot reviewed Jun 16, 2025

View reviewed changes

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

sourcery-ai bot reviewed Jun 16, 2025

View reviewed changes

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

utils/migration_tools/from_wandb/wandb_to_neptune.py Show resolved Hide resolved

sourcery-ai bot reviewed Jun 16, 2025

View reviewed changes

szaganek previously approved these changes Jun 16, 2025

View reviewed changes

Updated the URL logging format for better readability.

55fe48f

SiddhantSadangi dismissed szaganek’s stale review via 55fe48f June 16, 2025 11:57

SiddhantSadangi merged commit e856870 into main Jun 16, 2025
7 checks passed

SiddhantSadangi deleted the ss/wandb_exporter branch June 16, 2025 11:59

Update W&B exporter #21

Update W&B exporter #21

Uh oh!

Conversation

SiddhantSadangi commented Jun 13, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

❔ This change

✔️ Pre-merge checklist

🧪 Test Configuration

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for file upload and logging during run migration

Class diagram for updated file handling and logging in migration script

Class diagram for CLI argument parsing and logging setup

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

This comment was marked as outdated.

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

SiddhantSadangi commented Jun 16, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SiddhantSadangi commented Jun 16, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

SiddhantSadangi commented Jun 13, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Jun 13, 2025 •

edited

Loading