
Failing to update localMessageState results in unwanted message reordering #2693

Open
chan-reclip opened this issue Jul 5, 2023 · 3 comments
Labels
🔎 Investigating: This issue is currently being investigated (Not reproducible yet) · 🌐 SDK: StreamChat (LLC): Tasks related to the StreamChat LLC SDK

Comments


chan-reclip commented Jul 5, 2023

NB: This bug report is based on my speculation; I don't have repro steps or concrete proof that what I describe below is exactly what happened.

We've recently encountered this issue again, so I had a look at the code to debug possible causes.
While browsing through the call stack, I found suspicious code in MessageRepository.swift.

If database.write (L53) fails for whatever reason, we get into the error-handling code (L58-61), which basically calls markMessageAsFailedToSend to update localMessageState to .sendingFailed.

// Change the message state to `.sending` and then proceed with the actual sending
self?.database.write({
    let messageDTO = $0.message(id: messageId)
    messageDTO?.localMessageState = .sending
}, completion: { error in
    if let error = error {
        log.error("Error changing localMessageState message with id \(messageId) to `sending`: \(error)")
        self?.markMessageAsFailedToSend(id: messageId) {
            completion(.failure(.failedToSendMessage))
        }
        return
    }

I see two issues here:

  1. The logic to update localMessageState works only when the old state is .sending. Given that we failed to update the state to .sending in the first place, it won't fire (a possible fix is sketched after this list):

     if dto?.localMessageState == .sending {
         dto?.localMessageState = .sendingFailed
     }

  2. Even if the old-state check were somehow fixed, I'm not sure the update to .sendingFailed can succeed, because it relies on the same database.write operation that just failed.
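If that reading is right, one possible mitigation for the first point (my sketch, not the SDK's actual fix) is to relax the old-state check so that a message still sitting in .pendingSend can be marked as failed. The write/session API below is copied from the snippets above; the relaxed guard is my assumption:

private func markMessageAsFailedToSend(id: MessageId, completion: @escaping () -> Void) {
    database.write({ session in
        let dto = session.message(id: id)
        // Assumption: also treat `.pendingSend` as failure-eligible, so a message
        // whose transition to `.sending` never committed is not silently retried.
        if dto?.localMessageState == .sending || dto?.localMessageState == .pendingSend {
            dto?.localMessageState = .sendingFailed
        }
    }, completion: { error in
        if let error = error {
            log.error("Error marking message \(id) as `sendingFailed`: \(error)")
        }
        completion()
    })
}

Of course, this still shares the second concern: if the database itself is in a bad state, this second write can fail just like the first one did.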

Anyway, regardless of this issue, MessageSendingQueue removes the request once it has been tried.

self?.messageRepository.sendMessage(with: request.messageId) { _ in
    self?.removeRequestAndContinue(request)
}

The above code ignores the result (_) because, I believe, it assumes sendMessage will either succeed or update localMessageState to .sendingFailed. Unfortunately, that assumption seems to be wrong.

If there are more requests in the send queue, those could hit the backend earlier than the failed request above. Because the failed message's localMessageState stays .pendingSend, it will be retried later, which eventually delays the message and thus reorders it.
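For illustration, one way the queue could defend against this is to stop ignoring the result and force the terminal state on failure. This is only a sketch; whether markMessageAsFailedToSend is (or should be) reachable from MessageSendingQueue is my assumption:

self?.messageRepository.sendMessage(with: request.messageId) { result in
    switch result {
    case .success:
        self?.removeRequestAndContinue(request)
    case .failure:
        // Assumption: explicitly push the message into `.sendingFailed` here,
        // instead of trusting sendMessage to have done it, so the message
        // cannot linger in `.pendingSend` and be resent after newer messages.
        self?.messageRepository.markMessageAsFailedToSend(id: request.messageId) {
            self?.removeRequestAndContinue(request)
        }
    }
}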

chan-reclip (Author) commented

cc @jfru

polqf (Contributor) commented Jul 5, 2023

Hi @chan-reclip,

Thank you for the report. We'll have a look into it.

A couple of comments:

  1. What evidence do you have to support that you reach MessageRepository L57 with an error? Is it logs, or is it an assumption in this case? I'm asking because, based on the comment, I'm not sure whether you actually saw it happen. I think this would be the most important thing for us to try to help you 😄
  2. If there is any evidence, could you tie it to a particular event happening in your app?

Regardless of the previous:

Even if the old-state check were somehow fixed, I'm not sure the update to .sendingFailed can succeed, because it relies on the same database.write operation that just failed

If the write fails, chances are that either the message does not exist, it has been deleted, or the DB is in a bad state. I am mostly inclined towards the last, as there is no throwing function called in that process that could justify an error being reported in L57.

regardless of this issue, MessageSendingQueue removes the request once it has been tried

Because the failed message's localMessageState stays .pendingSend, it will be retried later, which eventually delays the message and thus reorders it

Based on the assumption I stated above, it seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it. That is why we ignore the error and remove the message from the queue.

In any case, we are going to have a deeper look into it whenever possible.

nuno-vieira added the 🔎 Investigating (Not reproducible yet) and 🌐 SDK: StreamChat (LLC) labels on Jul 5, 2023
chan-reclip (Author) commented

Hi @polqf,

Thanks for the prompt response.

What evidence do you have to support that you reach MessageRepository L57 with an error? Is it logs, or is it an assumption in this case? I'm asking because, based on the comment, I'm not sure whether you actually saw it happen. I think this would be the most important thing for us to try to help you 😄

Totally agreed, but unfortunately I couldn't get any logs from the original reporter. 😞
As I mentioned at the top of the description, it's purely based on my speculation. Judging from my understanding of the related code, if we ever reach L57, the issue can be reproduced.

If the write fails, chances are that either the message does not exist, it has been deleted, or the DB is in a bad state. I am mostly inclined towards the last, as there is no throwing function called in that process that could justify an error being reported in L57.

Yes, I also think the message should exist, as we check for it right before the write.

guard let dto = self?.database.backgroundReadOnlyContext.message(id: messageId) else {
    log.error("Trying to send a message with id \(messageId) but the message was deleted.")
    completion(.failure(.messageDoesNotExist))
    return
}

Based on the assumption I stated above, it seems "safe" to assume that if the message was not retrieved in the previous step, there is no way to recover from it.

It's "safe" if message was not retrieved. But haven't you said you're mostly inclined to DB being in a bad state?

In any case, we are going to have a deeper look into it whenever possible.

I'll be waiting for it.

Cheers!
