Hello everyone,
We are encountering an issue when trying to write a large amount of data to RabbitMQ using the sync:data command.
We call the command from an installer that has a timeout of 10h, and the sync:data process ends up being killed by that timeout. We noticed a “stall” behaviour in the application: nothing happens on the database or RabbitMQ side, yet the sync process is still running.
After investigating we noticed 2 things happening:
• An exception is logged from php-amqplib (https://github.com/php-amqplib/php-amqplib/blob/9429243609cd40c2afab1d8bafc115188f8dc2f9/PhpAmqpLib/Wire/IO/StreamIO.php#L275), which means fwrite returned a result !== false but with a timeout flagged in the stream metadata.
• After the exception is logged, the application does not exit until it is killed by the main process timeout.
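
To make the first point concrete, here is a minimal sketch of the condition being described: fwrite returns a byte count rather than false, while the stream metadata reports a timeout. The helper name writeOrFail is hypothetical and is not part of php-amqplib; the sketch only uses PHP's built-in fwrite and stream_get_meta_data.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-amqplib): writes $data to $stream and
 * throws if fwrite() fails outright or if the stream reports a timeout.
 *
 * @param resource $stream
 */
function writeOrFail($stream, string $data): int
{
    $written = fwrite($stream, $data);

    if ($written === false) {
        throw new RuntimeException('Write failed');
    }

    // fwrite() can return a byte count while the stream metadata says the
    // operation timed out; that is the situation described in the post.
    $meta = stream_get_meta_data($stream);
    if (!empty($meta['timed_out'])) {
        throw new RuntimeException('Write timed out');
    }

    return $written;
}

// Usage with an in-memory stream, which does not time out.
$stream = fopen('php://memory', 'w+');
$bytes = writeOrFail($stream, 'hello');
fclose($stream);
```

On a real socket the timeout would come from stream_set_timeout or from the connection's read/write timeout setting, so the second check is the one that would fire under slow writes.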
Our questions are:
• Is there any way of getting rid of the timeout, or is there some sort of retry mechanism in Spryker in case writing to RabbitMQ fails?
• Do you have any idea why the process is not exiting when the exception occurs?
This issue is happening only on production; we have not been able to reproduce it on the dev machines yet. We hope someone else has encountered this and managed to fix it.
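
For illustration only, the kind of retry mechanism asked about could look like the sketch below. It is a generic retry with exponential backoff, not an existing Spryker or php-amqplib feature; the function name retryWithBackoff and its parameters are hypothetical. A real implementation would probably also need to reopen the connection or channel before retrying a write that timed out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: runs $operation up to $maxAttempts times, sleeping
 * with exponential backoff between failed attempts.
 *
 * @return mixed whatever $operation returns on success
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 100)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (RuntimeException $exception) {
            if ($attempt >= $maxAttempts) {
                // Give up and rethrow so the caller can exit instead of hanging.
                throw $exception;
            }
            // Delay doubles on each attempt: base, 2x base, 4x base, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}

// Usage: a stand-in for a publish call that fails twice, then succeeds.
$calls = 0;
$result = retryWithBackoff(function () use (&$calls): string {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('simulated write timeout');
    }

    return 'published';
}, 5, 1);
```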
Thank you!
Comments
-
@UL65CH0MC something you encountered as well, maybe?
0 -
Yes Thomas. It’s just the same behaviour that we had.
0 -
Correct me if I am wrong, but in your case you managed to work around this by setting both EXPORT_CHUNK_SIZE and STORAGE_SYNC_CHUNK_SIZE to 20K?
0 -
Yes. For our env and big data, 20k is not a large amount. But at 50k+ there is this “stall” and exceptions from the rmq lib.
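
As a sketch of what that workaround might look like in a project's config_default.php: the constant names below are assumptions based on the Spryker Synchronization module and may not match the EXPORT_CHUNK_SIZE / STORAGE_SYNC_CHUNK_SIZE names used in this thread, so check them against your installed version before relying on them.

```php
<?php

// Assumed constant names; verify them in your version of spryker/synchronization.
use Spryker\Shared\Synchronization\SynchronizationConstants;

// Keep the export chunk size and the storage sync chunk size identical,
// at the 20K value reported to work in this thread.
$config[SynchronizationConstants::EXPORT_MESSAGE_CHUNK_SIZE] = 20000;
$config[SynchronizationConstants::DEFAULT_SYNC_STORAGE_QUEUE_MESSAGE_CHUNK_SIZE] = 20000;
```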
0 -
Thank you for your answers. We will try to change the chunk size and see what happens.
0 -
Hi Giovanni, what kind of stall behavior have you seen?
0 -
Hi Lucian,
I just saw that queue:worker took a very long time (more than 20 min) and no process was running on the server: no data was written to the DB, no events were consumed... nothing. The worker was in a "stall" with the sync.storage.product queue (it was the only queue running, but nothing happened). After some minutes, ca. 15 min, I got the first error from amq, and ca. 1-2 min later a lot of exceptions about a closed channel from amq (all unacked events triggered an exception).
I have also attached some screenshots from yesterday...
0 -
@tom.lehner Do you have any idea why the default value for AMQP_STREAM_CONNECTION_READ_WRITE_TIMEOUT is 130? (https://github.com/spryker/rabbit-mq/blob/05b18242a564897a153ffc0f13e9493e37ec764c/src/Spryker/Client/RabbitMq/RabbitMqConfig.php#L23) The default socket read_write timeout is 60, so I’m wondering why the value is 130 on the Spryker side.
0 -
@UNTTWV4JK Can't tell you off the top of my head. Do you suspect this to be a cause of the problems?
0 -
If you check the method from amqplib, you’ll see that this line is reached only when the write takes too long but fwrite itself does not fail. I suspect that RabbitMQ flow control comes into play and delays the write (https://www.rabbitmq.com/flow-control.html), which can hit the timeout limit on the client side, and that is when the timeout exception occurs. I don’t know yet why the application stalls after the exception occurs.
0 -
I also saw this read_write_timeout myself yesterday... 130 looks strange to me too... I suppose it is related to the worker timeout being set to 1 min..
But I would not change this value without knowing which side effects can happen..
There is also a keep_alive setting that could maybe speed up the process a little, but I am also not sure what it does... What is missing here is documentation from Spryker about the configuration possibilities of some processes/services..
0 -
We have a holiday event today, but I asked whether there is a subject matter expert available that we can pull into this thread.
0 -
ops... Corona @tom.lehner 😉
0 -
Don't worry, it's safe and virtual.
0 -
🙂 Good.. 2 weeks without you and Spryker would not be cool 🙂
0 -
We are trying to understand what is happening so that we can make an informed decision about the production configuration.
0 -
@UNTTWV4JK Have you already tried to lower the chunk_size? Do you still have the same problem or not?
Can I ask how your chunk_size for storage is configured?
How many events? I can say that in my case I had ca. 2.7M events in sync.storage.product; with a chunk of 50K I had the problem, with a chunk of 20K not anymore... And what I see is that it works fine when the export chunk size is configured to be the same as the storage chunk size (in my case also the sync search chunk size).
0 -
Not yet, but it is on our list.
0 -
I would try this asap... everything will work fine, and in our case it was also faster...
I think a solution from Spryker can take a long time in this case... it is not simple to reproduce..