Potential issues

Parameters:

  • TRANSACTION_RETENTION_TIME: the time after which the data of processed transactions can be removed

  • CURSOR_CLAIM_TIMEOUT: the maximum time we are willing to block acceptance of deposit transactions on a single network (after the timeout, the same page starts being processed again by an available worker)
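
As a rough illustration, the two parameters could be wired into the validator as shown below; the environment variable names, units, and default values are assumptions for the sketch, not the real configuration.

```typescript
// Illustrative configuration sketch; variable names, units and defaults are
// assumptions, not the actual deployment values.
const TRANSACTION_RETENTION_TIME_MS = Number(
  process.env.TRANSACTION_RETENTION_TIME_MS ?? 24 * 60 * 60 * 1000, // assumed default: 24 h
);

const CURSOR_CLAIM_TIMEOUT_MS = Number(
  process.env.CURSOR_CLAIM_TIMEOUT_MS ?? 10 * 60 * 1000, // assumed default: 10 min
);
```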

Validator didn't finish scrolling the deposits

Failing to process a batch of deposit transactions from the scroller means the scrolling cursor is not advanced. The remaining data is not lost and will eventually be picked up again (see Cursor not released) by the next validator, or by the same validator in its next job cycle. This can lead to creating identical transactions in the database, which is discussed in Transaction is duplicated when being processed and Transaction is duplicated when finished.
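
A minimal sketch of one job cycle, assuming hypothetical claimCursor/fetchBatch/advanceCursor functions; the names and data shapes are illustrative, not the real interfaces.

```typescript
// Sketch only: the Cursor/Deposit shapes and the injected functions are
// illustrative, not the real interfaces.
interface Cursor { network: string; position: string }
interface Deposit { id: string }

async function runScrollCycle(
  claimCursor: (network: string) => Promise<Cursor | null>,
  fetchBatch: (cursor: Cursor) => Promise<{ deposits: Deposit[]; nextPosition: string }>,
  processDeposit: (deposit: Deposit) => Promise<void>,
  advanceCursor: (cursor: Cursor, nextPosition: string) => Promise<void>,
  network: string,
): Promise<void> {
  const cursor = await claimCursor(network);
  if (!cursor) return; // another validator currently holds the cursor

  try {
    const { deposits, nextPosition } = await fetchBatch(cursor);
    for (const deposit of deposits) {
      await processDeposit(deposit); // may create release transactions (deduplicated by deposit id)
    }
    // The cursor advances only after the whole batch succeeds; any failure above
    // leaves it in place, so the same page is re-processed in a later cycle.
    await advanceCursor(cursor, nextPosition);
  } catch {
    // Leave the cursor as-is; CURSOR_CLAIM_TIMEOUT eventually releases it
    // (see "Cursor not released").
  }
}
```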

Transaction is duplicated when being processed

Transactions in progress are always stored in the database, and the repository makes sure that duplicate release transactions (deduplicated by the id of the relevant deposit transaction) are not created.
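
One way a repository can enforce this is a unique constraint on the deposit transaction id combined with an idempotent insert. The sketch below assumes a PostgreSQL-style release_transactions table and the node-postgres client; the schema and names are illustrative, not the actual implementation.

```typescript
// Sketch only: assumes a PostgreSQL table with a UNIQUE constraint on
// deposit_tx_id and the node-postgres client.
import { Pool } from "pg";

const pool = new Pool();

async function createReleaseTransaction(
  depositTxId: string,
  payload: unknown,
): Promise<boolean> {
  // The unique constraint makes concurrent inserts idempotent: a second insert
  // for the same deposit does nothing instead of creating a duplicate.
  const result = await pool.query(
    `INSERT INTO release_transactions (deposit_tx_id, payload, state)
     VALUES ($1, $2, 'in_progress')
     ON CONFLICT (deposit_tx_id) DO NOTHING`,
    [depositTxId, JSON.stringify(payload)],
  );
  return (result.rowCount ?? 0) > 0; // false => a release tx for this deposit already exists
}
```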

Transaction is duplicated when finished

When the release transaction is performed on the chain, it is still stored in the transactions queue with the finished state. Finished transactions are eventually removed from the database after TRANSACTION_RETENTION_TIME. The validator does not allow creating a transaction once TRANSACTION_RETENTION_TIME has elapsed since the scrolling cursor was claimed. If that time has already passed, the validator continues the loop (the same as if it had errored), leaving the scrolling cursor locked.
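
A minimal sketch of that guard, assuming the validator keeps the timestamp at which it claimed the cursor; the names and the concrete retention value are illustrative assumptions.

```typescript
// Sketch only: the retention value and the way the claim timestamp is obtained
// are assumptions.
const TRANSACTION_RETENTION_TIME_MS = 24 * 60 * 60 * 1000; // assumed value

function mayStillCreateTransactions(cursorClaimedAt: Date, now: Date = new Date()): boolean {
  // Once TRANSACTION_RETENTION_TIME has elapsed since the cursor was claimed, a
  // finished duplicate may already have been purged from the database, so the
  // validator stops creating transactions and continues the loop instead.
  return now.getTime() - cursorClaimedAt.getTime() < TRANSACTION_RETENTION_TIME_MS;
}
```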

Cursor not released

If a validator errors before releasing the cursor, or the processing of the transaction takes too long, the cursor remains locked. After the safe timeout CURSOR_CLAIM_TIMEOUT, the cursor is considered released and other validators can claim it again. The errored validator does not update the cursor if CURSOR_CLAIM_TIMEOUT has passed since it claimed the cursor.
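
A sketch of claiming the cursor with such a timeout, assuming a PostgreSQL-style scroll_cursors table with a claimed_at column; the schema, client, and timeout value are assumptions.

```typescript
// Sketch only: assumes a scroll_cursors table (network, position, claimed_at)
// and the node-postgres client; the timeout value is an assumption.
import { Pool } from "pg";

const pool = new Pool();
const CURSOR_CLAIM_TIMEOUT_MS = 10 * 60 * 1000; // assumed value

async function claimCursor(
  network: string,
): Promise<{ position: string; claimedAt: Date } | null> {
  // The cursor can be claimed if it is unclaimed or its previous claim expired.
  const result = await pool.query(
    `UPDATE scroll_cursors
        SET claimed_at = NOW()
      WHERE network = $1
        AND (claimed_at IS NULL OR claimed_at < NOW() - $2::interval)
      RETURNING position, claimed_at`,
    [network, `${CURSOR_CLAIM_TIMEOUT_MS} milliseconds`],
  );
  if ((result.rowCount ?? 0) === 0) return null; // still held by another validator
  // The claiming validator must also stop touching the cursor once this timeout
  // has passed, since another validator may have claimed it in the meantime.
  return { position: result.rows[0].position, claimedAt: result.rows[0].claimed_at };
}
```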

Updated before saving

Read + update steps in a concurrent environment can result in lost updates (A reads, B reads, B updates, A updates -- B's update is lost). This means a validator can fetch the transaction to be signed and try to sign it, and then either its update of the transaction in the database is overwritten by a concurrent validator's write, or the signature update fails (when the write is conditioned on the state being unchanged since the read). If the failure occurs or the update is lost, the worker will try re-signing in the next round, which corrects the database state.
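
A sketch of the conditional write, reusing the illustrative release_transactions table from above; the update is a no-op unless the state is still the one that was read.

```typescript
// Sketch only: assumes the illustrative release_transactions table above and
// the node-postgres client.
import { Pool } from "pg";

const pool = new Pool();

async function saveSignature(
  txId: string,
  expectedState: string,
  signature: string,
): Promise<boolean> {
  const result = await pool.query(
    `UPDATE release_transactions
        SET signature = $3, state = 'signed'
      WHERE id = $1 AND state = $2`, // no-op if a concurrent validator changed the state first
    [txId, expectedState, signature],
  );
  // false => the write was skipped; the worker re-signs in the next round,
  // which converges the database to a correct state.
  return (result.rowCount ?? 0) > 0;
}
```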

Transaction is sent again

Due to the unreliable state (see Marking as completed fails), there must be a step to prevent duplicate transaction execution.

Solana

The Multisig program does not allow multiple executions of the same transaction and fails with "The given transaction has already been executed".

EVM

All release transactions are created and executed through Safe Global, which does not allow duplicate transaction executions; duplicate submissions fail with "Already known".
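
A sketch of how the executor could treat these errors as a successful (already performed) execution rather than a failure; detecting duplicates by matching the quoted error messages is an illustrative simplification, since the real clients may expose structured error codes instead.

```typescript
// Sketch only: duplicate detection by error message text is a simplification.
async function executeIdempotently(
  execute: () => Promise<unknown>,
): Promise<"executed" | "already-executed"> {
  try {
    await execute();
    return "executed";
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (
      message.includes("The given transaction has already been executed") || // Solana Multisig
      message.includes("Already known") // EVM / Safe Global
    ) {
      // A previous attempt (possibly by another validator) already executed the
      // transaction; the caller corrects the database state instead of failing
      // (see "Marking as completed fails").
      return "already-executed";
    }
    throw err;
  }
}
```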

Transaction is rejected

If the transaction is rejected, it is moved to the dead letter queue. If this process fails, the next validator will perform the same operation.
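
A sketch of that move as a single database transaction, so a failure leaves the row untouched and the next validator simply repeats it; the table names and the pg client are illustrative assumptions.

```typescript
// Sketch only: assumes illustrative release_transactions and dead_letter_queue
// tables and the node-postgres client.
import { Pool } from "pg";

const pool = new Pool();

async function moveToDeadLetterQueue(txId: string, reason: string): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query(
      `INSERT INTO dead_letter_queue (tx_id, reason)
       SELECT id, $2 FROM release_transactions WHERE id = $1
       ON CONFLICT (tx_id) DO NOTHING`, // idempotent if a previous attempt partially succeeded
      [txId, reason],
    );
    await client.query(`DELETE FROM release_transactions WHERE id = $1`, [txId]);
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err; // the next validator will perform the same move
  } finally {
    client.release();
  }
}
```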

Marking as completed fails

The on-chain transaction execution and the corresponding database change cannot be performed in a transaction-safe manner, so we must assume this step can fail. The execution needs to handle duplicates (see Transaction is sent again), and if a duplicate is detected, the state of the transaction needs to be corrected.
