Hello, I have this philosophical conundrum: I need to iterate (rather I am fixing a code of someone

U02H2U2LQ0L
U02H2U2LQ0L Posts: 9 🧑🏻‍🚀 - Cadet
edited February 2022 in Help

Hello, I have this philosophical conundrum:

I need to iterate through all abstract products (or rather, I am fixing someone else's code that does) and periodically collect their data into a file. I need price information, image set information and so on. Currently it iterates through ProductAbstractManager, which executes functions like loadLocalizedAttributes that make additional database calls; read observers are called as well, which also make additional database calls; and on top of that, an event is triggered every time an abstract product is read. This sounds like quite a lot of overhead. I thought that maybe everything necessary could be fetched in one single query in a repository, with the necessary joins? It is more work to do it this way, but maybe it could save a lot of processing time? I wonder what your stance is on things like this.
I mean, obviously doing it with one single query is faster and less memory intensive (important in this current task); however, I am interested in knowing what the Spryker way is for problems like this, i.e. how it is generally done in Spryker projects. Also, relying on ProductAbstractManager is pretty convenient, since those ReadObservers call the necessary facades to expand the product abstract transfer, where this additional logic is executed.

Also, could anyone explain the usefulness of triggering read events? I have never encountered a case where it would be useful to know that a row in the database has been read.
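
For readers following along, the trade-off described in the question can be sketched outside of Spryker. The example below is only an illustration in Python with an in-memory SQLite database: the tables `product_abstract`, `price` and `image`, their columns, and both function names are hypothetical and are not the Spryker schema or API. It contrasts the per-product lookups (one extra query per piece of data) with a single joined query that returns the same rows.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_abstract (id INTEGER PRIMARY KEY, sku TEXT NOT NULL);
        CREATE TABLE price (fk_product_abstract INTEGER, gross_amount INTEGER);
        CREATE TABLE image (fk_product_abstract INTEGER, url TEXT);
        INSERT INTO product_abstract VALUES (1, 'A-1'), (2, 'A-2'), (3, 'A-3');
        INSERT INTO price VALUES (1, 1999), (2, 2499);
        INSERT INTO image VALUES (1, 'a1-front.jpg'), (1, 'a1-back.jpg'), (2, 'a2.jpg');
    """)
    return conn


def export_rows_per_product(conn):
    """One query for the product list, plus two more queries for every product."""
    rows = []
    products = conn.execute("SELECT id, sku FROM product_abstract ORDER BY id").fetchall()
    for product_id, sku in products:
        price = conn.execute(
            "SELECT gross_amount FROM price WHERE fk_product_abstract = ?",
            (product_id,),
        ).fetchone()
        images = conn.execute(
            "SELECT url FROM image WHERE fk_product_abstract = ?",
            (product_id,),
        ).fetchall()
        rows.append((sku, price[0] if price else None, sorted(url for (url,) in images)))
    return rows


def export_rows_single_query(conn):
    """Fetch the same data with one joined, grouped query."""
    cursor = conn.execute("""
        SELECT pa.sku, p.gross_amount, GROUP_CONCAT(i.url, '|')
        FROM product_abstract pa
        LEFT JOIN price p ON p.fk_product_abstract = pa.id
        LEFT JOIN image i ON i.fk_product_abstract = pa.id
        GROUP BY pa.id, pa.sku, p.gross_amount
        ORDER BY pa.id
    """)
    # GROUP_CONCAT yields NULL when a product has no images; sort because
    # SQLite does not guarantee the concatenation order.
    return [
        (sku, gross, sorted(urls.split("|")) if urls else [])
        for sku, gross, urls in cursor
    ]
```

Both functions return the same list of `(sku, gross_amount, image_urls)` tuples; the difference is that the first issues 1 + 2N queries for N products, while the second issues one. The joined version is also the one that couples the export to the price and image tables directly.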

Comments

  • U01BZ7Q3XRV
    U01BZ7Q3XRV Posts: 148 🧑🏻‍🚀 - Cadet

    It depends a bit on the data that has to be gathered, I think, but in general I would rather create a new table for the contents of the file and fill it by listening to product change events. Or (depending on the data you need) maybe read it from the product storage client instead of the DB. Or a combination of both. With the additional table you could also do something like a cronjob that updates a bunch of products per run (to save memory per run) instead of, or in addition to, the event-based approach.

  • Alberto Reyer
    Alberto Reyer Posts: 690 🪐 - Explorer

    We had similar requirements for building shopping feeds for Google and other platforms. We ended up using a separate table: SQL queries aggregate the data into it, and the shopping feeds are then produced by reading from the DB in batches and writing each batch to the related file.
    We do this for all products once per night, as the query that aggregates the data, based on common table expressions (CTE) and upserts, is very fast for ~350k product concretes (~2 minutes).
    The downside is that the CTE we use is very complicated, especially the part that parses and transforms product attributes, and it only works in Postgres. With MySQL/MariaDB there might be issues, because it uses functionality that only exists in Postgres.

  • U01T075RRHD
    U01T075RRHD Posts: 118 🧑🏻‍🚀 - Cadet
    edited February 2022

    Hi Paulius, IMHO, the multiple queries to collect all product data are necessary to avoid coupling. Some parts of product information (e.g. images) are not necessarily essential and thus dependencies like this in Spryker are usually decoupled using inversion of control mechanisms like plugin stacks.

  • U01T075RRHD
    U01T075RRHD Posts: 118 🧑🏻‍🚀 - Cadet
    edited February 2022

    The *_storage database tables are a specific view on the system state for a specific use-case (Yves). This is similar to the projections mechanism you find in event sourcing. Now, sometimes it can be sufficient to re-use the already aggregated data from the storage tables but it would be a safer bet to create your own projection for your specific use-case (like already suggested).

  • U02H2U2LQ0L
    U02H2U2LQ0L Posts: 9 🧑🏻‍🚀 - Cadet

    Thanks. Yeah, yesterday I realized that the best way would be to aggregate the data on a per-item basis and store it in a storage table, then retrieve it and do whatever I need with the data. Previously, the code tried to query all the products at once, process them, and store the result in a single database row. Of course, this kind of approach doesn't work for even half of the products, so I'm doing something similar to what you suggested.
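
The approach the thread converges on (a dedicated aggregate table, filled with a CTE plus upsert a batch at a time, then read back in batches to write the file) could look roughly like the sketch below. It is an assumption-laden illustration, not the query from the comments: it uses Python with in-memory SQLite instead of Postgres, and the `product_feed` table, the simplified `product_abstract` and `price` tables, and all function names are hypothetical.

```python
import csv
import sqlite3


def build_demo_db():
    """Hypothetical source tables plus the aggregate table used for the export."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_abstract (id INTEGER PRIMARY KEY, sku TEXT NOT NULL);
        CREATE TABLE price (fk_product_abstract INTEGER, gross_amount INTEGER);
        CREATE TABLE product_feed (
            fk_product_abstract INTEGER PRIMARY KEY,
            sku TEXT NOT NULL,
            gross_amount INTEGER
        );
        INSERT INTO product_abstract VALUES
            (1, 'A-1'), (2, 'A-2'), (3, 'A-3'), (4, 'A-4'), (5, 'A-5');
        INSERT INTO price VALUES (1, 1000), (2, 2000), (3, 3000), (4, 4000), (5, 5000);
    """)
    return conn


def refresh_feed_batch(conn, after_id, batch_size):
    """Upsert one batch of products with id > after_id.

    Returns the last id handled, or None when no products are left,
    so a cronjob could persist the value and continue on its next run.
    """
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM product_abstract WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, batch_size),
    )]
    if not ids:
        return None
    conn.execute("""
        WITH batch AS (
            SELECT pa.id, pa.sku, p.gross_amount
            FROM product_abstract pa
            LEFT JOIN price p ON p.fk_product_abstract = pa.id
            WHERE pa.id BETWEEN ? AND ?
        )
        INSERT INTO product_feed (fk_product_abstract, sku, gross_amount)
        SELECT id, sku, gross_amount FROM batch WHERE true
        ON CONFLICT (fk_product_abstract) DO UPDATE SET
            sku = excluded.sku,
            gross_amount = excluded.gross_amount
    """, (ids[0], ids[-1]))
    conn.commit()
    return ids[-1]


def refresh_all(conn, batch_size):
    """Run batches until every product has been aggregated."""
    last_id = 0
    while last_id is not None:
        last_id = refresh_feed_batch(conn, last_id, batch_size)


def write_feed(conn, out, batch_size):
    """Stream the aggregate table to a CSV file object, batch_size rows at a time."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sku", "gross_amount"])
    cursor = conn.execute(
        "SELECT sku, gross_amount FROM product_feed ORDER BY fk_product_abstract"
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
```

Calling `refresh_all(conn, 2)` and then `write_feed(conn, some_file, 2)` keeps only a couple of rows in memory at any time. Because each product is its own row in `product_feed`, a single changed product can be refreshed again without rebuilding the whole export, which is the per-item aggregation described in the last comment.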