Hi, In our project we have Redis synchronization issues since several months. Mainly we have not al

Alberto Reyer
Alberto Reyer Posts: 690 🪐 - Explorer

Hi,

In our project we have had Redis synchronization issues for several months. Mainly, not all of the data in a *_storage table ends up in Redis.
To see whether all data is synchronized to Redis, I’ve built a tool that compares the row count of a *_storage table with the respective key count in Redis. With this tool it became pretty obvious that we have a synchronization problem.
I’ve already run several experiments and searched the logs, but I couldn’t find anything that would explain the missing entries.
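A count comparison like the one described above could be sketched roughly as follows. This is not the tool from the post, only a minimal illustration: the function names and the key pattern are hypothetical, the table row count is assumed to come from a separate `SELECT COUNT(*)` query, and `client` is assumed to be any object that offers redis-py's `scan_iter(match=...)` method.

```python
from typing import Dict, Iterable, Protocol


class KeyScanner(Protocol):
    """Anything exposing a redis-py style scan_iter(match=...) method."""

    def scan_iter(self, match: str) -> Iterable[object]: ...


def count_redis_keys(client: KeyScanner, pattern: str) -> int:
    """Count keys matching a glob pattern by iterating SCAN results."""
    return sum(1 for _ in client.scan_iter(match=pattern))


def compare_counts(table_rows: int, client: KeyScanner, pattern: str) -> Dict[str, int]:
    """Compare a *_storage row count with the number of matching Redis keys.

    table_rows is assumed to be the result of a COUNT(*) query on the table;
    pattern is a hypothetical key pattern for the same resource.
    """
    redis_keys = count_redis_keys(client, pattern)
    return {
        "table_rows": table_rows,
        "redis_keys": redis_keys,
        "missing": table_rows - redis_keys,
    }
```

Against a real instance, `client` would be a `redis.Redis(...)` connection. Iterating with SCAN avoids blocking the server the way a single `KEYS` call can on a large keyspace.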
What I’ve validated so far:

  • Redis: maxmemory = 0
  • Redis: maxmemory-policy: noeviction

    • Writing 100000 entries with random strings (writing was completely successful)
    • Manually comparing counts with redis-cli vs. the DB

    Did you have similar experiences in any of your Spryker projects, or do you have a hint about what else we could look into?

One thing that caught our attention in this regard: after we set up a test Kubernetes cluster with a managed Postgres database in Azure, but still a self-hosted Redis, this issue did not come up again. But I couldn’t find an issue with loading the data from the database either.

Comments

  • Alberto Reyer
    Alberto Reyer Posts: 690 🪐 - Explorer

    We’ve also checked that RabbitMQ is not the bottleneck and has no issues. So we are pretty confident that the problem can only be in Redis or in the way data is written to Redis.

  • Hi Alberto, did you have a look into the spy_queue_process database table?

  • Alberto Reyer
    Alberto Reyer Posts: 690 🪐 - Explorer

    @UK5EG6PBM In which regard? There are several entries in spy_queue_process and our events get processed. There is also a lot of data in our Redis, but not all of the data that is in the related storage table.

  • That database table should be empty once the queue runs out of messages. If you have remaining entries in that table it could be a hint that something is wrong

  • Alberto Reyer
    Alberto Reyer Posts: 690 🪐 - Explorer

    Ok, but this isn’t our problem. Depending on when I look into the table, it’s sometimes empty, and we see in RabbitMQ that all queues are processed successfully, but data is still missing in Redis.

  • There are circumstances when Redis doesn't behave like you would expect. It might happen that Redis doesn't accept writes while it's persisting data to disk. This, however, should be visible in the logs somehow.
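One way to look for persistence trouble besides the logs is the `persistence` section of Redis' `INFO` output. The sketch below is only an illustration of that idea: the function name and the warning texts are made up for this example, and `info` is assumed to be the dictionary that redis-py returns from `client.info("persistence")`. One documented case is that, with `stop-writes-on-bgsave-error` enabled, Redis rejects writes after a failed RDB snapshot; whether that applies to the setup discussed here is not known.

```python
from typing import Dict, List


def persistence_warnings(info: Dict[str, object]) -> List[str]:
    """Return human-readable warnings derived from INFO persistence fields."""
    warnings: List[str] = []

    # A failed background save can make Redis refuse writes when
    # stop-writes-on-bgsave-error is enabled.
    if info.get("rdb_last_bgsave_status", "ok") != "ok":
        warnings.append("last RDB background save failed")

    # A failed append-only-file write points at disk problems.
    if info.get("aof_last_write_status", "ok") != "ok":
        warnings.append("last AOF write failed")

    # A non-zero delayed fsync counter suggests the disk cannot keep up.
    if int(info.get("aof_delayed_fsync", 0)) > 0:
        warnings.append("AOF fsync has been delayed")

    return warnings
```

An empty list only means these particular fields look healthy; it does not rule out other causes.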

  • Until now, we were always able to identify the causes of synchronization issues: network, limited pod resources, various Kubernetes-related problems, queue issues, ...

  • Maybe not related, but we had an issue with CMS pages that had no store relation and were (silently) skipped.

  • Stanislav Matveyev
    Stanislav Matveyev Sprykee Posts: 211 🧑🏻‍🚀 - Cadet
    edited December 2019

    One more possible reason (if you have custom publishers) is a race condition between write and delete Redis messages.

    When your publisher DELETEs from a *_storage table, it generates delete-Redis-record jobs in RabbitMQ. If the publisher then INSERTs new records with the same keys that were just removed, it generates write-Redis-record jobs in RabbitMQ.

    Because RabbitMQ jobs can be processed asynchronously, it can happen that one worker WRITEs a key and the next worker DELETEs it.

    As a result, the records exist in the DB, but not all of them exist in Redis.
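The effect of such a reordering can be shown with a small simulation. This is a toy model, not Spryker or RabbitMQ code: the job tuples, the key name and the `apply_jobs` helper are hypothetical, and a plain dictionary stands in for Redis.

```python
from typing import Dict, List, Tuple

# (operation, key, value); a hypothetical shape for a sync job
Job = Tuple[str, str, str]


def apply_jobs(jobs: List[Job]) -> Dict[str, str]:
    """Apply write/delete jobs to an in-memory store in the given order."""
    store: Dict[str, str] = {}
    for operation, key, value in jobs:
        if operation == "write":
            store[key] = value
        elif operation == "delete":
            store.pop(key, None)
    return store


# The publisher deletes the old record first, then writes the new one.
published: List[Job] = [
    ("delete", "product:1", ""),
    ("write", "product:1", "new payload"),
]

# Processed in publishing order, the key survives.
in_order = apply_jobs(published)

# If workers handle the jobs in the opposite order, the key is gone,
# although the row still exists in the *_storage table.
reordered = apply_jobs(list(reversed(published)))
```

Here `in_order` still contains `product:1`, while `reordered` is empty, which matches the symptom of rows in the DB without a counterpart in Redis.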

  • Ehsan Zanjani
    Ehsan Zanjani Head of Solution Architecture @ Spryker Posts: 113 🧑🏻‍🚀 - Cadet

    Hi @UL6DGRULR,
    if events are processed successfully, you might see some messages in the sync.storage.* queues. These sync queues are then consumed so that Redis gets updated.

  • Alberto Reyer
    Alberto Reyer Posts: 690 🪐 - Explorer

    Thx for all the answers. We could already rule out RabbitMQ as the point of failure, as our sync-all tool writes directly into Redis without involving RabbitMQ.

    @UK5EG6PBM The hint that Redis sometimes does not accept writes is the most promising one for me, but sadly there was nothing related in the logs, even when running Redis with the debug log level. So far we have identified that in our production cluster the disk becomes pretty slow under a lot of writes. The throughput of Azure disks is a bad joke, it’s even slower than USB 1.0. So this might be the issue.

  • Good luck on your bug hunt 🍀