What are the Slack Archives?

It’s a history of our time together in the Slack Community! There’s a ton of knowledge in here, so feel free to search through the archives for a possible answer to your question.

Because this space is not active, you won’t be able to create a new post or comment here. If you have a question or want to start a discussion about something, head over to our categories and pick one to post in! You can always refer back to a post from Slack Archives if needed; just copy the link to use it as a reference..

Hey at all! In it is recomm

UPPB2H525
UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet

Hey at all! In https://documentation.spryker.com/docs/dataimporter-speed-optimization it is recommended to use common table expressions (CTE) for inserting or updating big amounts of data via data importer and there is a hint to check examples in the demo shops. However, i can not find anything in the demo shops regarding this.
Is there some more documentation for using cte in importers?

Comments

  • Alberto Reyer
    Alberto Reyer Lead Spryker Solution Architect / Technical Director Posts: 690 🪐 - Explorer

    Here is an example: https://github.com/spryker-shop/suite/blob/master/src/Pyz/Zed/DataImport/Business/Model/ProductPrice/Writer/Sql/ProductPriceSql.php

    But they are not really optimized, instead of having a subset for update and one for insert and upsert (on conflict (...) do update) in combination with COALESECE(table.id_field, nextval(id_sequence)) will give your double the speed in most cases were your CTE execeds a few KB (depends on postgres settings), because if the CTE is bigger than the configured amount it will be persisted to disc in between which slows down the whole process).

  • UPPB2H525
    UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet

    Perfect! Thank you very much!

  • UPPB2H525
    UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet
  • Alberto Reyer
    Alberto Reyer Lead Spryker Solution Architect / Technical Director Posts: 690 🪐 - Explorer

    @UPPB2H525 Feel free to write me a PN (or even better ask here), I did a lot of work in this area in our project to speed up the import process (~300k articles, 1200 categories, 4 category trees, including all data are imported in <4h since we optimized the importers).

    @UJN2JRU4F Might be able to provide you the video capture of one of the user group meetups last year were I gave a few insides how we improved our importers.

  • need to check whether we already cleaned it up

  • UPPB2H525
    UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet

    @UL6DGRULR wow, thank you i appreciate this very much! I will get back to you if I face any problems!

  • USZ0XG6SK
    USZ0XG6SK Posts: 111 🧑🏻‍🚀 - Cadet

    waiting for this video for quite some time now 😉

  • giovanni.piemontese
    giovanni.piemontese Technical Lead @ Löffelhardt Spryker Solution Partner Posts: 871 🧑🏻‍🚀 - Cadet

    Hi, we have import for ~2,8Mio Product Abstract, ~2,8Mio Product Concrete, ~2,8Mio SalesUnit, ~4Mio Product Image, ~2,8Mio Product Quantity, ~1000 Categories, ~3Mio Product Stock etc etc... and with CTE (PostgreSql) we get a whole import in DB in circa. 6Std (Insert, Update in less than 30 mins 😉 ).

    There is a lot of things to optimize to get this time.
    And for all regarding the events that importer have to trigger (only if necessary, otherwise the shop is always on P+S that has no sense).

    If u need some tips or help just write/ask here (not PN -> some information should be shared with everyone). I will try to provide u any help that u need.

  • giovanni.piemontese
    giovanni.piemontese Technical Lead @ Löffelhardt Spryker Solution Partner Posts: 871 🧑🏻‍🚀 - Cadet
    because if the CTE is bigger than the configured amount it will be persisted to disc in between which slows down the whole process).
    

    @UL6DGRULR how did u calculated this value? or better how can i check this case? Thank u.

  • Alberto Reyer
    Alberto Reyer Lead Spryker Solution Architect / Technical Director Posts: 690 🪐 - Explorer

    We didn’t calculated it, in a previous job one of our BI team told me that big CTE’s comes with that cost, so I just tried with this simple trick if it will speed up the importer and it did.
    https://hakibenita.com/be-careful-with-cte-in-postgre-sql

    But it seems that this has changed with Postgres >= 11 (this is also new to me)

  • giovanni.piemontese
    giovanni.piemontese Technical Lead @ Löffelhardt Spryker Solution Partner Posts: 871 🧑🏻‍🚀 - Cadet

    Thank you for the reply. Ok we have psql 12 and as seen in >11 they have smth optimised

  • UPPB2H525
    UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet

    Hey @UL6DGRULR! Do you have any experience regarding CTE with MariaDB 10.5 in Spryker?

  • Alberto Reyer
    Alberto Reyer Lead Spryker Solution Architect / Technical Director Posts: 690 🪐 - Explorer

    Sorry I missed your question as I was on vacation. But sadly I haven’t had any contact with CTE’s on MariaDB so far as we use Postgres in our project.

  • UPPB2H525
    UPPB2H525 Posts: 51 🧑🏻‍🚀 - Cadet

    Thank you