[Bug] node stalls syncing Error: Err.UNKNOWN_UNSPENT #17797
It starts to sync from a good database backup, then fails:

```shell
while true; do sleep 10; chia show -s | grep Status; done
Current Blockchain Status: Syncing 5134139/5134139 (0 behind).
```
We are thinking that somehow your coin_record table got corrupted and lost a coin. Best bet would be to get a new database by downloading from torrent or syncing from 0. It's failing because block 5134793 needs this coin to be unspent; that's why you can't sync past it.
I have no clue how this could have happened; it was always syncing. Getting a complete database is unacceptable. If such issues are not fixed, the project will lose participants: every time a bug corrupts the db, pull 180GB of data? This is not serious. There must be a mechanism to fix broken databases, or blockchain pruning must be implemented. Think about it: at which database size does the project become unmanageable for most users? 100GB, 1TB, 10TB? Can the db be fixed by e.g. vacuuming it only up to a certain sequence number (before the corruption point)?
Don't you run another node? You can simply copy the db from one machine to another. You can also do backups from time to time. Data corruption can always happen, especially on consumer hardware.
Nope, I don't run another. It is exactly not so simple to "copy it from another machine". In my view, such projects are started without planning ahead: "oh, just use SQLite". If it were a real blockchain, it would be possible to just dd it and discard the corrupted tail (like MPEG, for instance). The db size makes it unmanageable long-term for most users if it keeps corrupting. So many users complain about chia not syncing if you google it. Maybe it is time to think about a real solution instead of having to start from 0, eh? What about a DB validation tool that can revive a non-syncing db? Can't be that difficult to write if you know the database format, I would say; I didn't look into the details. P.S. I upgraded to the latest version and it is not even trying to sync (never showing a syncing attempt).
Can you write a SQL script that will discard the last X blocks from the DB? Are there any sequence numbers...
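For what it's worth, a rollback script along those lines could look like the sketch below. Heavy caveats: the table and column names (`full_blocks.height`, `coin_record.confirmed_index`, `coin_record.spent_index`, 0 meaning unspent) are my assumptions about the v2 schema, not verified against the real database format, and the cut-point height is hypothetical. Don't run anything like this without a backup.

```python
import sqlite3

# Hypothetical cut point, chosen just before the block that fails validation.
ROLLBACK_HEIGHT = 5_134_000

def rollback(conn: sqlite3.Connection, height: int) -> None:
    """Discard everything above `height` (assumed v2-style schema, see above)."""
    with conn:  # single transaction: all three statements commit or none do
        # Drop blocks above the cut point.
        conn.execute("DELETE FROM full_blocks WHERE height > ?", (height,))
        # Drop coins that were created above the cut point.
        conn.execute("DELETE FROM coin_record WHERE confirmed_index > ?", (height,))
        # Coins spent above the cut point become unspent again (0 = unspent, assumed).
        conn.execute(
            "UPDATE coin_record SET spent_index = 0 WHERE spent_index > ?", (height,)
        )
```

Note that, if memory serves, newer chia releases also ship a `chia db validate` subcommand that scans the blockchain DB for some inconsistencies; that would be worth trying before attempting surgery like this.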
E.g. let's take a simple example of 3 blocks: block1 [coin1 coin2 coin3]. Let's assume (without me knowing the details of the chia format) that block1 is only still there because coin1 has no later spending transaction, but is otherwise completely useless. Would it be possible for the protocol to automatically send coin1 to the same address it belongs to and then discard block1? No future transaction would ever be able to reference it, since all coins in that block would have been moved into later transactions. If I'm not mistaken, Nexellia uses something similar to keep the chain size reasonably small. Solutions DO exist. I'm sure you can find one.
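The prunability test in that proposal (a block is discardable once every coin it created has been spent) can be written as a single query. This is purely illustrative: it uses the same assumed, simplified `coin_record` layout as above (`confirmed_index` = creating block, `spent_index` = 0 means unspent), not the real chia schema or protocol.

```python
import sqlite3

def prunable_blocks(conn: sqlite3.Connection) -> list[int]:
    """Heights where every coin created in the block has already been spent."""
    rows = conn.execute(
        """
        SELECT confirmed_index
        FROM coin_record
        GROUP BY confirmed_index
        -- a block is prunable when it created zero still-unspent coins
        HAVING SUM(spent_index = 0) = 0
        ORDER BY confirmed_index
        """
    ).fetchall()
    return [height for (height,) in rows]
```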
The fastest workaround is to download the DB from a torrent. Some error or edge case or corruption may have lost your coin. Unfortunately we haven't seen this issue from others so it may be impossible to reproduce. If you are able to reproduce it reliably, please let us know. Any issue we can reproduce should be fixable. |
I ran into this as well on my full node today:
=> To me it looks like updating first to 1.8.3 and then to 2.0.0 crippled my database. It was syncing before I decided to upgrade to the v2 format.
P.S. I was running 1.6.x for a while before I did that upgrade.
Do you still have the v1 DB? Is this reproducible when you convert the DB from v1 to v2? If so that is really interesting and we should be able to fix the issue. A coin could be getting dropped during the migration. |
eming, can you ask your node for the additions for block 1729601?

```shell
chia rpc full_node get_additions_and_removals '{"header_hash": "0x5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae"}'
```

Let's see if it finds the coin 18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561 (https://alltheblocks.net/chia/coin/18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561). We are trying to determine whether you have the block correctly but it just wasn't putting the coin into coin_records.
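One could also check for that coin directly in the local DB, bypassing the RPC. A hedged sketch: the table name `coin_record` appears in this thread, but the assumption that `coin_name` is keyed by the raw 32-byte coin id (rather than a hex string) is mine and may not match the real schema.

```python
import sqlite3

# The coin the maintainers are asking about (from the thread above).
COIN_ID = "18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561"

def coin_exists(conn: sqlite3.Connection, coin_id_hex: str) -> bool:
    """True if the coin id is present in coin_record (assumed raw-bytes key)."""
    row = conn.execute(
        "SELECT 1 FROM coin_record WHERE coin_name = ?",
        (bytes.fromhex(coin_id_hex),),
    ).fetchone()
    return row is not None
```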
It prints:

```shell
chia rpc full_node get_additions_and_removals '{"header_hash": "0x5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae"}'
```
I'm afraid I deleted the v1 one. Maybe the v1 got corrupted when chia shut down, and I then migrated an improperly shut-down database? Should I have vacuumed it first, maybe? Anyway, this happened to me several times in the past (when the DB was far smaller; now I'm unable to start from 0). It should be addressed ASAP with journaled transactions or similar; it just can't be that so much data gets corrupted because of e.g. a power outage taking the PC down.
We use WAL for sqlite, but that doesn't mean a drive can't corrupt things. I'm a bit concerned that you can't get information for block https://alltheblocks.net/chia/block/5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae, as that is at a million or so and you are at 5 million, and also that you say you have had problems like this in the past. I am wondering if the drive hosting your chia database is having issues, or maybe the controller or connection.
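As a quick sanity check for suspected drive trouble, SQLite has a built-in corruption scan. A minimal sketch of the two PRAGMAs mentioned here: `journal_mode=WAL` is what survives power loss, and `integrity_check` scans the file for low-level damage. Note its limits: it detects file-level corruption (broken pages, bad indexes), not a logically missing row such as a lost coin.

```python
import sqlite3

def check_db(conn: sqlite3.Connection) -> str:
    """Run sqlite's built-in file-level corruption scan; returns 'ok' if clean."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0]

def enable_wal(conn: sqlite3.Connection) -> str:
    """Switch to write-ahead logging; returns the journal mode actually in effect."""
    return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```

(On an in-memory database `journal_mode` stays `memory`; on a file-backed one it becomes `wal`.)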
So you are saying the DB was corrupted at an earlier stage, but it only manifests now because someone moved that coin from the past, is that correct? In that case I wonder why the protocol is so inconsistent; there should be a consistent stream of blocks. I didn't look into the details, but it still looks to me like giving this much control to SQLite may be one of the design errors here. In a real blockchain one should be able to just "chop it off" at block X and start syncing from there. But this looks more like a SQL blob.
There are tables for the coin store, and when reorgs happen these need to be processed to account for changes in the blockchain, so the DB is constantly changing and unfortunately can't be chopped off. We are wondering if there are issues with these changes being made because of drive problems, only because you've had multiple issues in the past. WAL is supposed to prevent corruption due to power outages etc., but it still assumes a working disk. We think this is a DB issue; if you replace it, things should get working again. Apologies for the problems. Please let us know if you continue to have issues, especially if it involves a different drive.
This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days. |
This issue was automatically closed because it has been flagged as stale, and subsequently passed 7 days with no further activity from the submitter or watchers. |
What happened?
Node doesn't sync. When I replace the database with a known-good database it starts syncing some 200-3000 blocks but then fails again:
```
2024-03-27T17:26:41.240 full_node chia.full_node.full_node: ERROR Error: Err.UNKNOWN_UNSPENT, Invalid block from peer: PeerInfo(_ip=IPv4Address('58.183.125.25'), _port=8444)
2024-03-27T17:26:41.439 full_node full_node_server : WARNING Banning 58.183.125.25 for 600 seconds
2024-03-27T17:26:41.442 full_node chia.full_node.full_node: ERROR sync from fork point failed err: Failed to validate block batch 5134792 to 5134824
2024-03-27T17:27:58.593 full_node chia.consensus.block_body_validation: ERROR Err.UNKNOWN_UNSPENT: COIN ID: 0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c NPC RESULT
```
Happens with the latest, 2.2.1, and 2.0.1 versions; tested them all.
Version
2.0.1
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output