What are the Slack Archives?

It’s a history of our time together in the Slack Community! There’s a ton of knowledge in here, so feel free to search through the archives for a possible answer to your question.

Because this space is not active, you won’t be able to create a new post or comment here. If you have a question or want to start a discussion about something, head over to our categories and pick one to post in! You can always refer back to a post from Slack Archives if needed; just copy the link to use it as a reference.

Hello, do you know if it’s possible/working to speed up publish queue processing by having a worker

UPWG9AYH2 Posts: 509 🧑🏻‍🚀 - Cadet
edited May 2022 in Help

Hello,
do you know if it’s possible (and safe) to speed up publish queue processing by using a worker count > 1? I assume this may fail, since data integrity in Elasticsearch/Redis depends on the order of the publish messages. Any experience here? Any other ideas to speed up the processing?
Best regards

Comments

  • Alberto Reyer Posts: 690 🪐 - Explorer

    We have run several workers in parallel in a project; it wasn't an issue there, but we used a different search and exported all products to the external search after a full product import. So normally, by the time the search index was rebuilt, Redis was already filled and all events were processed.
    If you can accept that a product might be found through search while its PDP leads to a 404 for a moment, you might even be fine with parallel processing in a scenario where you rely on Elasticsearch. If not, a lookup for all existing products on a search result page can help: if one of the products is not found in Redis, it does not get linked to the PDP (maybe show a "not available" text instead of the "View Product" button).
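
    Roughly like this (just a sketch: the helper name and storage-key schema are made up; only the storage client's get() is the real Spryker API):

    ```php
    <?php

    use Spryker\Client\Storage\StorageClientInterface;

    /**
     * Hypothetical helper: only link search hits whose storage entry already
     * exists in Redis, so a not-yet-published product can't lead to a 404 PDP.
     */
    function filterLinkableProducts(StorageClientInterface $storageClient, array $searchHits, string $locale): array
    {
        foreach ($searchHits as &$hit) {
            // The key schema is an assumption; align it with your storage keys.
            $storageKey = sprintf('product_abstract:%s:%d', $locale, $hit['id_product_abstract']);

            // get() returns null while the publish queue has not written the
            // entry yet; render "not available" instead of a PDP link then.
            $hit['isLinkable'] = $storageClient->get($storageKey) !== null;
        }

        return $searchHits;
    }
    ```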

  • UK7KBE2JW Posts: 463 🧑🏻‍🚀 - Cadet

    We use 3 workers with 3 processes each in production and have no problems.

  • UPWG9AYH2 Posts: 509 🧑🏻‍🚀 - Cadet

    Okay, sounds interesting, but I have a scenario like this in mind: there are 2 messages in the publish queue … the first says the user changed the price to 100, the second says he then changed it to 50 … the first worker grabs the first message, the second worker the second … somehow, the second worker is faster, because the first one is busier with other items … wouldn't that mean the price in storage is now 100 again, although 50 was intended? 🤔
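
    Something like this is what I mean (purely illustrative; the payload shape is invented):

    ```php
    <?php

    // Purely illustrative: IF the messages carried the price value itself,
    // two parallel workers could apply them out of order.
    $message1 = ['sku' => 'ABC', 'price' => 100]; // older change
    $message2 = ['sku' => 'ABC', 'price' => 50];  // newer change

    $storage = [];

    // Worker B is faster and writes the newer value first ...
    $storage[$message2['sku']] = $message2['price']; // 50
    // ... then the slower worker A overwrites it with the stale value.
    $storage[$message1['sku']] = $message1['price']; // 100, although 50 was intended
    ```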

  • UPWG9AYH2 Posts: 509 🧑🏻‍🚀 - Cadet

    Or is there any timestamp comparison used here?

  • Alberto Reyer Posts: 690 🪐 - Explorer

    In theory that might happen. In our use cases, prices didn't change that often. If you have to deal with price updates every few seconds for the same product, sure, this will become a problem, but even then it is "only" a display problem, as the cart calculation uses the DB price anyway.

    It also depends on which queue you look at. The publish (event) queue only includes an id for the entity that should be published and loads the price from the DB anyway, so even if the price in the DB has changed a second time in the meantime, the last updated price will be used for publishing (see the sketch below). You could argue that the second publish would then be obsolete, but it also doesn't hurt.

    If you look at the sync queues: those also only include the id of the entry from the related storage table that should be synced, so if the publish event is already processed, you should be fine.
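
    As a rough sketch of what such an id-only message means for ordering (payload simplified; $storageClient is just a stand-in for the storage writer, and the query call follows Propel's generated-finder pattern):

    ```php
    <?php

    use Orm\Zed\PriceProduct\Persistence\SpyPriceProductQuery;

    // Simplified publish message: it references the entity by id only;
    // the changed price value itself is not part of the payload.
    $publishMessage = [
        'event' => 'Entity.spy_price_product.update',
        'id' => 4711,
    ];

    // Whichever worker picks this up, and in whatever order, it loads the
    // *current* row from the DB, so the last saved price always wins.
    $priceProductEntity = SpyPriceProductQuery::create()
        ->findOneByIdPriceProduct($publishMessage['id']);

    // $storageClient stands in for whatever writes the projection to Redis.
    $storageClient->set(
        sprintf('price_product:%d', $publishMessage['id']),
        json_encode($priceProductEntity->toArray())
    );
    ```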

    Another thing you could consider, if you want to optimize performance, is to avoid using queue:worker:start and instead use queue:task:start for each individual queue.
    Because this requires a separate worker for each queue, it might not be feasible for all queues.
    But you could also offload a single queue to its own worker by using queue:task:start <queue> and adapt \Spryker\Zed\Queue\Business\Worker\Worker::executeOperation to filter out the queues that have their own worker, so that queue:worker:start only runs the queues without a separate one.
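
    Roughly like this on project level (just a sketch: check the executeOperation signature and property names in your Queue module version before copying, and wire the class in via your project's QueueBusinessFactory):

    ```php
    <?php

    namespace Pyz\Zed\Queue\Business\Worker;

    use Spryker\Zed\Queue\Business\Worker\Worker as SprykerWorker;

    class Worker extends SprykerWorker
    {
        /**
         * Queues that run via their own `queue:task:start <queue>` worker
         * and must be skipped by the generic `queue:worker:start`.
         *
         * @var array<string>
         */
        protected const DEDICATED_QUEUES = ['publish'];

        /**
         * Sketch only: the signature differs between Queue module versions,
         * so mirror the core method you are actually overriding.
         */
        protected function executeOperation(string $command): array
        {
            // Remove the dedicated queues before the parent spawns
            // child processes for every configured queue.
            $this->queueNames = array_diff($this->queueNames, static::DEDICATED_QUEUES);

            return parent::executeOperation($command);
        }
    }
    ```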

  • UPWG9AYH2 Posts: 509 🧑🏻‍🚀 - Cadet

    There were some helpful tips that I will take into account. Thank you so much for your insights 🙂 I think we will not have such heavy price update logic. We had some headaches with the DB id generating mechanism in the past: as long as a worker has pulled a new id (for sales, for example) but not yet saved it to the DB, another worker will pull the same id, and only one of them is able to save it … this was very annoying to track down^^ so I just wanna avoid some headaches this time^^

  • Alberto Reyer Posts: 690 🪐 - Explorer

    For those cases you could change the id field to a uuid and generate the uuid in the application. With UUID v4 there is a very small theoretical chance that two processes generate the exact same id, but in practice this is very unlikely to happen. It can also be handled by implementing some sort of retry mechanism during the save (if the uuid already exists, just generate a new one and try again; that several uuids collide at the same time is very, very unlikely). A sketch of the retry follows below.

    The downside is that you can't order by id anymore, but to order by the last/first created entry you can use a created_at (timestamp) field instead.
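
    A minimal sketch of that retry idea, assuming ramsey/uuid and a Propel entity with a unique uuid column (the entity and exception handling are simplified):

    ```php
    <?php

    use Propel\Runtime\Exception\PropelException;
    use Ramsey\Uuid\Uuid;

    /**
     * Sketch: generate a UUID v4 in the application and, in the very unlikely
     * case of a collision on the unique uuid column, retry with a fresh one.
     *
     * @param mixed $entity any Propel entity with a unique `uuid` column
     */
    function saveWithUuid($entity, int $maxAttempts = 3): void
    {
        for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
            $entity->setUuid(Uuid::uuid4()->toString());

            try {
                $entity->save();

                return;
            } catch (PropelException $exception) {
                // Simplification: treat the failure as a unique-constraint
                // violation; real code should inspect the exception first.
                if ($attempt === $maxAttempts) {
                    throw $exception;
                }
            }
        }
    }
    ```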