
"FATAL: Failed to spend" error during initial sync #283

Open
talebi opened this issue Mar 24, 2025 · 31 comments
Labels
observation required (Dev unable to reproduce situation described by user)

Comments

@talebi

talebi commented Mar 24, 2025

I have been using v1.10 on a Windows machine for a while with no issues until a power failure caused the "db in inconsistent state" error. Then I had to resync, but every time I try I get an error at a different stage, like:

[2025-03-24 22:49:36.037] <Controller> Processed height: 256000, 28.8%, 113.8 blocks/sec, 27904.8 txs/sec, 72246.2 addrs/sec
[2025-03-24 22:49:46.235] <Controller> Processed height: 257000, 28.9%, 98.0 blocks/sec, 31380.6 txs/sec, 74520.3 addrs/sec
[2025-03-24 22:49:54.599] <Controller> Processed height: 258000, 29.0%, 119.6 blocks/sec, 28980.2 txs/sec, 78229.2 addrs/sec
[2025-03-24 22:50:03.201] <Controller> FATAL: Failed to spend: f63add26fae45e935ab8a2bedaf1af40437c121e049df582c46569145c817476:1 (spending txid: 91735f64d2eeac80e67376d1a2bfe7a462469610aa87698bd108fd0c8b6cb04b)

The database is now likely in an inconsistent state. To recover, you will need to delete the datadir and do a full resynch. Sorry!

[2025-03-24 22:50:03.202] Stopping Stats HTTP Servers ...
[2025-03-24 22:50:03.202] Stopping Controller ...
[2025-03-24 22:50:03.236] Stopping BitcoinDMgr ...
[2025-03-24 22:50:03.239] Closing storage ...
[2025-03-24 22:50:04.068] Shutdown complete

I am using a quick config file and at this stage am totally clueless as to what to do.
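The FATAL line appears to name the output (`txid:vout`) that Fulcrum could not find when it tried to apply the spending transaction. Because later comments compare which transaction fails across runs, it can help to pull these fields out of a log. Below is a minimal Python sketch, assuming the exact line format shown in the log above; `parse_failed_spend` and `FailedSpend` are hypothetical helpers, not part of Fulcrum, and the pattern may need adjusting for other versions.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the FATAL line in the log above (assumed format;
# wording may differ between Fulcrum versions).
FAILED_SPEND_RE = re.compile(
    r"FATAL: Failed to spend: (?P<txid>[0-9a-f]{64}):(?P<vout>\d+) "
    r"\(spending txid: (?P<spender>[0-9a-f]{64})\)"
)


class FailedSpend(NamedTuple):
    txid: str     # transaction that created the output that could not be found
    vout: int     # index of that output within the transaction
    spender: str  # transaction that tried to spend it


def parse_failed_spend(line: str) -> Optional[FailedSpend]:
    """Return the outpoint named in a 'Failed to spend' line, or None."""
    m = FAILED_SPEND_RE.search(line)
    if m is None:
        return None
    return FailedSpend(m["txid"], int(m["vout"]), m["spender"])


if __name__ == "__main__":
    line = ("[2025-03-24 22:50:03.201] <Controller> FATAL: Failed to spend: "
            "f63add26fae45e935ab8a2bedaf1af40437c121e049df582c46569145c817476:1 "
            "(spending txid: 91735f64d2eeac80e67376d1a2bfe7a462469610aa87698bd108fd0c8b6cb04b)")
    print(parse_failed_spend(line))
```

Running this over the logs of several failed attempts shows quickly whether the failure always lands on the same outpoint or moves around from run to run.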

@cculianu
Owner

cculianu commented Mar 24, 2025

Oh wow. On Windows, huh? I wonder if something in the new RocksDB in 1.11+ is somehow broken on Windows.

I did a few full synchs on Windows on small testnets and they worked OK. Are you on the Bitcoin CIA version, or on Bitcoin Cash, or what?

@cculianu
Owner

Also, you aren't using the utxo-cache option, right?

@talebi
Author

talebi commented Mar 24, 2025

I use it on mainnet. I have tried both with and without utxo-cache, with the same results.

@cculianu
Owner

cculianu commented Mar 24, 2025

> I use it on the mainnet. have tried both with and without utxo-cache, with same results

Mainnet what? BTC (Bitcoin CIA) or BCH (Bitcoin Cash) or LTC (Litecoin CIA)? Fulcrum supports all 3.

@talebi
Author

talebi commented Mar 24, 2025

Sorry, I meant mainnet on Bitcoin Core.

@cculianu
Owner

cculianu commented Mar 24, 2025

OK, I'll set up my dev machine here on Windows to do a full sync using the Bitcoin CIA version. Did the problem start happening on 1.11 and above, or what? Which version should I use? Also, if you go back to 1.10, does it "just work"?

@talebi
Author

talebi commented Mar 24, 2025

Thanks. The problem started on 1.10 (after the resync due to the power failure); I then upgraded to the latest 1.12, but the same thing happens. The furthest I got was around 56%, but generally the error happens around the 20-30% mark.

@talebi
Author

talebi commented Mar 24, 2025

Also, here are the first few lines of the log (if it helps):

PS C:\Fulcrum-1.12.0-win64> .\Fulcrum.exe .\fulcrum-quick-config.conf
[2025-03-24 22:37:52.566] Enabled JSON parser: simdjson
[2025-03-24 22:37:52.566] simdjson implementations:
[2025-03-24 22:37:52.567]     haswell: Intel/AMD AVX2  [supported]
[2025-03-24 22:37:52.567]     westmere: Intel/AMD SSE4.2  [supported]
[2025-03-24 22:37:52.567]     fallback: Generic fallback implementation  [supported]
[2025-03-24 22:37:52.567] active implementation: haswell
[2025-03-24 22:37:52.568] jemalloc: version 5.3.0-0-g54eaed1
[2025-03-24 22:37:52.568] Qt: version 5.15.13
[2025-03-24 22:37:52.568] rocksdb: version 9.2.1-08f9322
[2025-03-24 22:37:52.568] simdjson: version 0.6.0
[2025-03-24 22:37:52.568] ssl: OpenSSL 3.3.0 9 Apr 2024
[2025-03-24 22:37:52.568] zmq: libzmq version: 4.3.3, cppzmq version: 4.10.0
[2025-03-24 22:37:52.568] UPnP: miniupnpc 2.3.0 (API version: 19)
[2025-03-24 22:37:52.568] Fulcrum 1.12.0 (Release 7743cca) - Mon Mar 24, 2025 22:37:52.568 AUS Eastern Daylight Time - starting up ...
[2025-03-24 22:37:52.569] Loading database ...
[2025-03-24 22:37:54.717] DB memory: 512.00 MiB
[2025-03-24 22:37:54.717] Verifying headers ...
[2025-03-24 22:37:54.717] DB version: v3
[2025-03-24 22:37:54.717] BitcoinDMgr: starting 3 bitcoin RPC clients ...
[2025-03-24 22:37:54.718] BitcoinDMgr: started ok
[2025-03-24 22:37:54.718] Stats HTTP: starting 1 server ...
[2025-03-24 22:37:54.718] Starting listener service for HttpSrv 127.0.0.1:8080 ...
[2025-03-24 22:37:54.721] Service started, listening for connections on 127.0.0.1:8080
[2025-03-24 22:37:54.722] <BitcoinDMgr> Coin: BTC
[2025-03-24 22:37:54.830] <Controller> Chain: main
[2025-03-24 22:37:54.830] <Controller> Block height 889225, downloading new blocks ...
[2025-03-24 22:37:54.830] <Controller> utxo-cache: Not enabled
[2025-03-24 22:37:55.060] <Controller> Processed height: 1000, 0.1%, 4390.4 blocks/sec, 4473.7 txs/sec, 4587.7 addrs/sec

@cculianu
Owner

cculianu commented Mar 24, 2025

Thanks for the log.

> thanks. so the problem started on 1.10 (after resync due to power failure) but then upgraded to the latest 1.12 but the same happens. the max I got to was around 56% but generally the error happens around 20-30% mark

This is very disconcerting. My working hypothesis is that it indicates some strange low-level DB error with RocksDB. I will do a full synch today on ciacoin + Windows and let you know.

@cculianu
Owner

Well, I set it up and am running it on Windows. I'm on Windows 10, connected to Bitcoin Core (bitcoind) 27.0. It's synching just fine so far; it's at block height 567,000 and counting.

@cculianu cculianu added the "observation required" label (Dev unable to reproduce situation described by user) Mar 25, 2025
@cculianu
Owner

So far I'm at block 665,000 and it's still working ok.

@cculianu
Owner

Block 720k. No errors so far.

I think this will likely just sync successfully. Can you tell me more about what you are doing? It is mysterious to me that you would consistently get these errors.

Also, is your database in some directory on a strange filesystem (like a network share or something)?

@talebi
Author

talebi commented Mar 26, 2025

I have a fairly decent machine (Intel i7 14th gen, 64 GB memory) running Windows 11. The database is on the C: drive (so no strange share or anything). Below is my quick config, if it helps:

datadir = C:\FulcrumData\mainnet
bitcoind = 127.0.0.1:8332
rpcuser = someuser
rpcpassword = someP@ss
tcp = 0.0.0.0:50001
peering = false
announce = false
admin = 8000  # <-- 1.2.3.4:8000 notation also accepted here
donation = bitcoincash:qplw0d304x9fshz420lkvys2jxup38m9symky6k028
stats = 8080  # <-- 1.2.3.4:8080 notation also accepted here
utxo_cache = 0

Thanks for checking.
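The quick config above appears to be a plain `key = value` file in which `#` starts a trailing comment. A minimal Python sketch for reading such a file, for example to double-check which `datadir` and `utxo_cache` values a run actually used before wiping the datadir and resyncing. `parse_quick_config` is a hypothetical helper, not Fulcrum's own config loader, and it may treat edge cases (quoting, a `#` inside a password, repeated keys) differently.

```python
from typing import Dict

# Subset of the config posted above; a raw string keeps the Windows
# backslashes intact.
QUICK_CONFIG = r"""
datadir = C:\FulcrumData\mainnet
bitcoind = 127.0.0.1:8332
tcp = 0.0.0.0:50001
admin = 8000  # <-- 1.2.3.4:8000 notation also accepted here
stats = 8080  # <-- 1.2.3.4:8080 notation also accepted here
utxo_cache = 0
"""


def parse_quick_config(text: str) -> Dict[str, str]:
    """Parse `key = value` lines into a dict, ignoring blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if "=" not in line:
            continue  # blank line or not a setting
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    cfg = parse_quick_config(QUICK_CONFIG)
    print(cfg["datadir"])
    # Assumption: 0 means disabled; the startup log above reports
    # "utxo-cache: Not enabled" for this config.
    print("utxo_cache enabled:", cfg.get("utxo_cache", "0") != "0")
```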

@cculianu
Owner

Hmm... and it's NTFS and not FAT32 or something like that, right?

I haven't run Windows 11 even once. I'm on Windows 10. Jeez, I hope that's not it...

@talebi
Author

talebi commented Mar 26, 2025

Yes, it's NTFS.
I'm just trying to remove a few apps I recently installed to see if that helps. I will report back if I succeed.

@cculianu
Owner

cculianu commented Mar 26, 2025

I wonder if antivirus apps are somehow screwing with the files in the datadir. Maybe see if you can exclude Fulcrum's datadir from antivirus real-time scanning?

@cculianu
Owner

Well, the sync finished here just fine on my Windows 10 box. No issues. I sincerely hope you get to the bottom of this, but since I cannot reproduce it I really can't help.

I don't think it's a problem with Fulcrum itself... but who knows.

@talebi
Author

talebi commented Mar 26, 2025

I have temporarily disabled Windows Defender and it hasn't broken so far. I'm not sure whether that helped or removing the recent apps did, but so far so good (74%).

@talebi
Author

talebi commented Apr 7, 2025

Just to update: after 2 weeks of struggling (disabling Defender, removing programs, etc.) I still get the error. I've decided to move on to other options. Thanks for all your help, much appreciated.

@cculianu
Owner

cculianu commented Apr 7, 2025

Well, if you ever figure it out, do let me know... Very unusual. I've never seen anything like it.

@Sroose

Sroose commented Apr 16, 2025

I can confirm I see the same on v1.12 on Ubuntu.

Fulcrum was running stably for a few months on 1.11, but my disk filled up, so I had to remove the database and start completely fresh with 1.12.

  1. First attempt was with utxo_cache = 3000 (I have plenty of RAM); this failed after 2.5 hours:
    [2025-04-16 15:01:38.499] <Controller> FATAL: batch merge fail for block height 560230: Corruption: block checksum mismatch: stored(context removed) = 1234764856, computed = 1055331840, type = 4

  2. Second attempt was with utxo_cache = 1000; this failed after 3 minutes (at block height 221000):
    [2025-04-16 15:08:24.099] <Controller> FATAL: Failed to spend: 03f338a8479cd0f94952da693157a533a51440af9232ab8a386e4e48d5851019:1 (spending txid: 467a1c2cb7f8433cd243772e99593f1d47d2f41e8dd80e81b64edabedcfbd462)
    The database is now likely in an inconsistent state. To recover, you will need to delete the datadir and do a full resynch. Sorry!

  3. Third attempt was with utxo_cache disabled, like I used to have it. However, it again crashed after 5 minutes at the same tx:
    [2025-04-16 15:15:55.910] <Controller> FATAL: Failed to spend: 03f338a8479cd0f94952da693157a533a51440af9232ab8a386e4e48d5851019:1 (spending txid: 467a1c2cb7f8433cd243772e99593f1d47d2f41e8dd80e81b64edabedcfbd462)
    The database is now likely in an inconsistent state. To recover, you will need to delete the datadir and do a full resynch. Sorry!

  4. I tried again and it is still the same; now it consistently crashes at this location.

After each attempt I started with a completely empty .fulcrum data folder. The tx it fails at is a very old one; if there were a Bitcoin Core data issue, I would assume it would not have gotten past that point the first time either. So it seems a bit random, while at the same time it's now the same tx every time...

Note that I have not performed any library/Ubuntu updates in a while that could explain this change.

@cculianu
Owner

Are you running in Docker or anything like that? Also, what FS are you using on Ubuntu?

@Sroose

Sroose commented Apr 16, 2025

No Docker, just the plain precompiled binaries.
The FS is ext4 (rw,noatime) on a Samsung NVMe SSD with LUKS.

BTW, I reverted to 1.11 and it now fails at the same tx.

Update:

bitcoin-cli getrawtransaction 03f338a8479cd0f94952da693157a533a51440af9232ab8a386e4e48d5851019 true
No such mempool or blockchain transaction. Use gettransaction for wallet transactions.

Seems like the bitcoin db is now probably the culprit :'(
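The `bitcoin-cli getrawtransaction <txid> true` check above can also be scripted against bitcoind's JSON-RPC interface, for instance to test the txids from several failed runs in one go. A minimal Python sketch, assuming bitcoind listens on 127.0.0.1:8332 and accepts the same rpcuser/rpcpassword that Fulcrum uses; `build_getrawtransaction_request` and `call_rpc` are hypothetical helpers. With `txindex=1`, a confirmed transaction should normally be found, so an error here suggests a problem in the node's own data rather than in Fulcrum's.

```python
import base64
import json
import urllib.error
import urllib.request


def build_getrawtransaction_request(txid: str, url: str, user: str,
                                    password: str) -> urllib.request.Request:
    """Build a JSON-RPC request equivalent to
    `bitcoin-cli getrawtransaction <txid> true`."""
    payload = {
        "jsonrpc": "1.0",
        "id": "fulcrum-debug",
        "method": "getrawtransaction",
        "params": [txid, True],  # True asks for the decoded (verbose) form
    }
    # bitcoind's RPC server uses HTTP Basic auth with rpcuser/rpcpassword.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


def call_rpc(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON reply."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # RPC-level errors (e.g. an unknown txid) come back with a non-2xx
        # HTTP status but still carry a JSON body with an "error" field.
        return json.load(err)
```

Usage would look like `call_rpc(build_getrawtransaction_request(txid, "http://127.0.0.1:8332/", user, password))`; a reply whose `"error"` field is not null corresponds to the "No such mempool or blockchain transaction" message shown above.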

@cculianu
Owner

Interesting as fuck. And it's always that tx. I really need to investigate this. Wow.

@cculianu
Owner

Is -txindex enabled on the bitcoind side (i.e., txindex=1 in bitcoin.conf)?

@cculianu
Owner

To be honest, I actually suspect some form of corruption on the bitcoind side that somehow propagates forward to Fulcrum. This is quite a pickle!

@Sroose

Sroose commented Apr 16, 2025

Yes, I have txindex on. Note that I've been using this setup for years now. This year I've had some more bad luck with corruption, though. Maybe I should try rebuilding the Bitcoin Core DB from scratch as well.

@cculianu
Copy link
Owner

As you may very well know, you can try rebuilding with -reindex-chainstate (but that can take a day).

I suspect something is fishy with the blocks, but I cannot be sure.

@Sroose

Sroose commented Apr 24, 2025

The cat is out of the bag: some of my memory modules apparently produce bit flips, resulting in the corruption of both the Bitcoin and Fulcrum databases.
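A single flipped bit could plausibly produce both kinds of failure reported in this thread: a checksum over stored bytes stops matching (like the RocksDB "block checksum mismatch"), and a lookup keyed on the damaged bytes no longer finds its entry (like "Failed to spend"). A toy Python illustration: `zlib.crc32` only stands in for whatever checksum RocksDB is configured with, and the dict keyed by `(txid, vout)` is a hypothetical stand-in, not Fulcrum's actual storage layout.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


txid = bytes.fromhex(
    "03f338a8479cd0f94952da693157a533a51440af9232ab8a386e4e48d5851019")
# Toy index keyed by outpoint (hypothetical; not Fulcrum's schema).
utxo_set = {(txid, 1): "unspent output"}

corrupted = flip_bit(txid, 77)  # one bit flipped, e.g. by faulty RAM

# 1) The checksum over the bytes no longer matches (CRC-32 catches every
#    single-bit error).
print(zlib.crc32(txid) != zlib.crc32(corrupted))  # True

# 2) A lookup using the corrupted bytes misses the real entry.
print((corrupted, 1) in utxo_set)                 # False
```

The point is only that neither error message by itself tells you whether the bad bit sat in bitcoind's block files, in Fulcrum's database, or in memory while data was being processed.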

@cculianu
Copy link
Owner

Oh man, glad it wasn't this software.

But sorry to hear you gotta buy new RAM.

How did you figure it out? Did you run memtest86 or other software to test the RAM?

@Sroose

Sroose commented Apr 24, 2025

Yes, memtest86 (this required rebooting and booting from a memtest USB stick).
But for whoever is in the same boat and wants to test live: stress-ng --vm 2 --vm-bytes 80% -t 5m
I got errors quite quickly from this live test as well, so I guess the memory was really messed up.

Good that it was definitely not your software 👍
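Memory testers such as memtest86 work, among other techniques, by writing known bit patterns to RAM and reading them back, flagging any cell whose value changed. The Python sketch below only illustrates that idea: it cannot pick physical addresses, bypass CPU caches, or cover most of the installed RAM, so it is no substitute for memtest86 or stress-ng and will likely miss real faults. `pattern_check` is a hypothetical helper.

```python
def pattern_check(size_bytes: int = 16 * 1024 * 1024, passes: int = 1) -> int:
    """Write each test pattern across a buffer, read it back, and return
    the total number of bytes that did not hold the value just written."""
    # Alternating-bit and all-zero/all-one patterns, as classic RAM tests use.
    patterns = (0x55, 0xAA, 0x00, 0xFF)
    buf = bytearray(size_bytes)
    mismatches = 0
    for _ in range(passes):
        for p in patterns:
            buf[:] = bytes([p]) * size_bytes         # write phase
            mismatches += size_bytes - buf.count(p)  # read-back phase
    return mismatches


if __name__ == "__main__":
    print("mismatched bytes:", pattern_check())
```

A non-zero result would be a strong hint of bad RAM, but a zero result proves very little; the boot-time memtest86 run is what settled it in this thread.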

Projects
None yet
Development

No branches or pull requests

3 participants