feat: auto-downgrading revision #4678

Open: 121 commits to merge into base branch main

Conversation

korolenkowork
Contributor

@korolenkowork korolenkowork commented May 1, 2025

Closes #4665

📑 Description

This PR introduces automatic database downgrading support when rolling back KeepHQ to a previous version.

✅ Checks

  • Main logic completed
  • I have updated the documentation as required
  • New logic covered by test
  • E2E is written
  • All the tests have passed

ℹ️ Additional Information

  • From this commit onward, all migration files will be copied to the SECRET_MANAGER_DIRECTORY to enable safe database downgrades during rollback scenarios.
  • A new environment variable ALLOW_DB_DOWNGRADE has been added to explicitly allow or prevent database downgrades, reducing the risk of accidental downgrades.
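
A minimal sketch of how an opt-in flag like this might be read. The helper name `downgrade_allowed` is hypothetical, and the case-insensitive, whitespace-tolerant comparison is an assumption rather than the PR's confirmed behavior:

```python
import os


def downgrade_allowed(env=None):
    """Return True only when ALLOW_DB_DOWNGRADE is explicitly set to "true".

    Hypothetical helper for illustration; not taken from the PR.
    """
    env = os.environ if env is None else env
    # Assumption: unset means "false", and the comparison ignores case
    # and surrounding whitespace.
    value = env.get("ALLOW_DB_DOWNGRADE", "false")
    return value.strip().lower() == "true"
```

Defaulting to "false" means a rollback never touches the schema unless an operator has opted in.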

vercel bot commented May 1, 2025

@EnotShow is attempting to deploy a commit to the KeepHQ Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Feature A new feature labels May 1, 2025
@korolenkowork korolenkowork marked this pull request as draft May 1, 2025 21:23
@shahargl
Member

shahargl commented May 2, 2025

hey @EnotShow - thanks for this PR!

as this PR introduces some major changes to how Keep works, I thought I would add some e2e tests for it (mainly install Keep, do some migration, then downgrade). wdyt?

Member

@shahargl shahargl left a comment

see comments

codecov bot commented May 3, 2025

Codecov Report

Attention: Patch coverage is 8.45070% with 65 lines in your changes missing coverage. Please review.

Project coverage is 46.03%. Comparing base (1ae49bf) to head (f1328ef).
Report is 3 commits behind head on main.

Files with missing lines        Patch %   Lines
keep/api/core/db_on_start.py    8.45%     65 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4678      +/-   ##
==========================================
- Coverage   46.19%   46.03%   -0.16%     
==========================================
  Files         165      165              
  Lines       17495    17564      +69     
==========================================
+ Hits         8081     8086       +5     
- Misses       9414     9478      +64     

vercel bot commented May 19, 2025

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 19, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels May 19, 2025
@korolenkowork
Contributor Author

Hey @shahargl! Thx for your review! I made all the required changes, so it's ready for re-review :)

@korolenkowork korolenkowork requested a review from shahargl May 19, 2025 13:37
@shahargl
Member

@EnotShow let the tests pass, and if they do, I'll review

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 26, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels May 29, 2025
@korolenkowork
Contributor Author

Hey @shahargl, it's resolved!

I also added a separate Docker Compose file for E2E migrations testing, which attaches a dedicated volume, because accessing the same volume from different workflows seemed to cause concurrent map writes, like this.

cursor bot commented Jun 14, 2025

🚨 BugBot couldn't run

Pull requests from forked repositories are not yet supported (requestId: serverGenReqId_48ddba26-49ed-4dcf-a98c-b6051afed68d).

@cursor cursor bot left a comment

Bug: Migration Copying Fails on Missing Directories

The copy_migrations function has several flaws: it attempts to list directory contents (local_migrations_path and source_versions_path) without ensuring their existence, which can lead to FileNotFoundError if os.makedirs fails or source_versions_path is missing. Furthermore, its cleanup logic for local_migrations_path only removes files and symlinks, leaving stale subdirectories that can cause conflicts during subsequent migration copying.

keep/api/core/db_on_start.py#L183-L204

# Ensure destination exists
try:
    os.makedirs(local_migrations_path, exist_ok=True)
except Exception as e:
    logger.error(f"Failed to create local migrations folder with error: {e}")
# Clear previous versioned migrations to ensure only migrations relevant to the current version are present
for filename in os.listdir(local_migrations_path):
    file_path = os.path.join(local_migrations_path, filename)
    if os.path.isfile(file_path) or os.path.islink(file_path):
        os.remove(file_path)
# Alembic needs the full migration history to safely perform a downgrade to earlier versions
# Copy new migrations
for item in os.listdir(source_versions_path):
    src = os.path.join(source_versions_path, item)
    dst = os.path.join(local_migrations_path, item)
    if os.path.isdir(src):
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        shutil.copy(src, dst)
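
One way the two flaws described above could be addressed is sketched below. This is an illustration only, not the fix applied in the PR: `copy_migrations_safely` is a hypothetical name, and it assumes that a missing source directory or a failed `os.makedirs` should abort rather than be logged and ignored.

```python
import os
import shutil


def copy_migrations_safely(source_versions_path, local_migrations_path):
    """Mirror source_versions_path into local_migrations_path.

    Hypothetical helper. Returns the number of top-level entries copied.
    """
    # Fail early and explicitly if there is nothing to copy from.
    if not os.path.isdir(source_versions_path):
        raise FileNotFoundError(
            f"Migration source not found: {source_versions_path}"
        )
    # Let a creation failure propagate instead of falling through to listdir.
    os.makedirs(local_migrations_path, exist_ok=True)

    # Remove every stale entry, including real subdirectories.
    for name in os.listdir(local_migrations_path):
        path = os.path.join(local_migrations_path, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)  # regular files and symlinks

    copied = 0
    for name in os.listdir(source_versions_path):
        src = os.path.join(source_versions_path, name)
        dst = os.path.join(local_migrations_path, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy(src, dst)
        copied += 1
    return copied
```

Because the destination is emptied first, `shutil.copytree` no longer needs `dirs_exist_ok=True`, and leftover subdirectories from an earlier version cannot mix with the new copy.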

Bug: Migration Path Naming Mismatch

Environment variable MIGRATION_PATH (singular) is used in the Docker Compose files for Postgres, but the application expects MIGRATIONS_PATH (plural). This inconsistency prevents the configured /tmp/migrations path from being recognized, causing the default /tmp/keep/migrations to be used instead.

tests/e2e_tests/docker-compose-e2e-postgres-migrations.yml#L40-L41

- SECRET_MANAGER_DIRECTORY=/app
- MIGRATION_PATH=/tmp/migrations

tests/e2e_tests/docker-compose-e2e-postgres.yml#L41-L42

- SECRET_MANAGER_DIRECTORY=/app
- MIGRATION_PATH=/tmp/migrations

tests/e2e_tests/docker-compose-e2e-mysql.yml#L44-L45

- SECRET_MANAGER_DIRECTORY=/app
- MIGRATIONS_PATH=/tmp/migrations
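
The effect of the mismatch can be reproduced with a small sketch. It assumes the application reads the variable with a plain dictionary-style lookup and the default named in the report; `resolve_migrations_path` is a hypothetical stand-in for whatever the application actually does.

```python
DEFAULT_MIGRATIONS_PATH = "/tmp/keep/migrations"


def resolve_migrations_path(env):
    """Hypothetical lookup: only the plural name MIGRATIONS_PATH is consulted."""
    return env.get("MIGRATIONS_PATH", DEFAULT_MIGRATIONS_PATH)


# Environment as set in the Postgres compose files (singular name).
misnamed = {"MIGRATION_PATH": "/tmp/migrations"}
# Environment as set in the MySQL compose file (plural name).
correct = {"MIGRATIONS_PATH": "/tmp/migrations"}

print(resolve_migrations_path(misnamed))  # falls back to the default
print(resolve_migrations_path(correct))   # picks up the configured path
```

The misnamed variable is ignored without any error, so the fallback to the default path can easily go unnoticed.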

Bug: Database Migration Errors and Backup Failures

The migrate_db function has two issues:

  1. It incorrectly assumes any alembic.command.upgrade failure implies a database rollback, leading to an unwarranted downgrade attempt. Upgrade failures can stem from various causes (e.g., connection issues, permissions, migration errors), and attempting a downgrade in such cases is incorrect and risks data loss.
  2. If the database schema is already up-to-date, the function returns early without calling copy_migrations. This prevents the current version's migration files from being backed up locally, making it impossible to downgrade to this specific version in the future.

keep/api/core/db_on_start.py#L268-L286

# If the current revision is the same as the expected revision, we don't need to run migrations
if current_revision and expected_revision and current_revision == expected_revision:
    logger.info("Database schema is up-to-date!")
    return None
logger.warning(f"Database schema ({current_revision}) doesn't match application version ({expected_revision})")
logger.info("Running migrations...")
try:
    alembic.command.upgrade(config, "head")
except Exception as e:
    logger.error(f"{e} it's seems like Keep was rolled back to a previous version")
    if not os.getenv("ALLOW_DB_DOWNGRADE", "false") == "true":
        logger.error(f"ALLOW_DB_DOWNGRADE is not set to true, but the database schema ({current_revision}) doesn't match application version ({expected_revision})")
        raise RuntimeError("Database downgrade is not allowed")
    logger.info("Downgrading database schema...")
    downgrade_db(config, expected_revision, local_migrations_path, app_migrations_path)
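
A sketch of decision logic that separates the cases named in the two points above. It is not the PR's implementation: `plan_migration` and its parameters are hypothetical, and it rests on the assumption that a database revision missing from the running application's migration set is what signals a rollback, rather than any upgrade exception.

```python
def plan_migration(current_revision, expected_revision, app_revisions, allow_downgrade):
    """Decide what to do on startup. Hypothetical helper for illustration.

    app_revisions: set of revision IDs shipped with the running application.
    Returns "backup_only", "upgrade", or "downgrade".
    """
    if expected_revision is not None and current_revision == expected_revision:
        # Schema is current, but migration files should still be backed up
        # so this version can be downgraded to later.
        return "backup_only"
    if current_revision is None or current_revision in app_revisions:
        # Database is empty or behind. An upgrade failure here is a real
        # error (connection, permissions, bad migration), not a rollback.
        return "upgrade"
    # The database is at a revision this application does not know:
    # the application was most likely rolled back.
    if not allow_downgrade:
        raise RuntimeError("Database downgrade is not allowed")
    return "downgrade"
```

With this split, a caller would run the migration backup for all three outcomes and would let upgrade exceptions propagate instead of treating them as a downgrade trigger.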

Bug: Missing Step ID Causes Workflow Failure

The workflow attempts to reference steps.get_revision.outputs.revision, but the step intended to produce this output lacks the id: get_revision.

.github/workflows/run-migrations-e2e-tests.yml#L334-L336

    echo "Tests completed!"
env:
  WORKFLOW_REVISION: ${{ steps.get_revision.outputs.revision }}

.github/workflows/run-migrations-e2e-tests.yml#L478-L480

    poetry run pytest -v tests/e2e_tests/test_end_to_end_db_auth.py -n 4 --dist=loadfile
env:
  WORKFLOW_REVISION: ${{ steps.get_revision.outputs.revision }}
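
This kind of mismatch can be caught with a rough text-based check, sketched below. `missing_step_ids` is a hypothetical helper. It scans the raw workflow text with regular expressions instead of parsing the YAML, and it does not scope IDs per job, so treat it as a heuristic.

```python
import re


def missing_step_ids(workflow_text):
    """Return step IDs referenced as steps.<id>.outputs but never declared.

    Hypothetical, regex-based heuristic; it does not parse YAML.
    """
    # Lines such as "id: get_revision" or "- id: get_revision".
    declared = set(
        re.findall(r"^\s*(?:-\s+)?id:\s*([\w-]+)\s*$", workflow_text, flags=re.MULTILINE)
    )
    # Expressions such as "steps.get_revision.outputs.revision".
    referenced = set(re.findall(r"steps\.([\w-]+)\.outputs", workflow_text))
    return sorted(referenced - declared)
```

A non-empty result lists the `id:` values that would need to be added to the steps producing those outputs.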

@tuantran0910
Contributor

Hi @shahargl, @talboren, can we get this PR merged? I think it would be a good feature, and we are planning to deploy Keep in our production environment.

Labels: Documentation (Improvements or additions to documentation), Feature (A new feature), size:XL (This PR changes 500-999 lines, ignoring generated files.)
Linked issue (may be closed when this pull request is merged): [➕ Feature]: Implement "auto downgrades"
3 participants