Hello, in this tutorial, the import of data using a queue instead of a file is described. But it see

UPWG9AYH2 · November 2021

Hello,
in this tutorial, the import of data using a queue instead of a file is described. But it seems this only describes the usage of the data importers located in the data-import module. How do I use a queue-based import along with plugin-based importers like category data?

Best regards

https://docs.spryker.com/docs/scos/dev/tutorials-and-howtos/advanced-tutorials/tutori[…]ng-a-default-data-importer-with-the-queue-data-importer.html

Andriy Netseplyayev · November 2021

Hey Ingo, what exactly do you want to achieve?
Maybe it would also be worth checking this article: https://docs.spryker.com/docs/scos/dev/data-import/202108.0/importing-data-with-the-queue-data-importer.html

UPWG9AYH2 · November 2021

Hi Andriy. I want to switch from file-based import to queue-based import and followed the tutorials. But to me it seems the queue approach, at least as described, does not work for the other import modules like category-data-import, since they are loaded/executed via plugin … maybe I'm missing something; I haven't dived deep into the topic yet, data import seems huge^^ 🙂

Andriy Netseplyayev · November 2021

The idea of the tutorial (and one of the possible approaches) is:
1. You introduce data importers that read from CSV file and write into the queues;
2. You introduce data importers that read from queues and write to DB.
As for the advantages: importers give you all the required infrastructure for organising the process (readers, writers, steps in between, etc.). But you have to be careful and organise the code properly, because the DataImport module (configs, factory, dependency provider) may grow pretty fast and one can get lost there quite easily.

UPWG9AYH2 · November 2021

Yeah, I've already noticed it's quite complex. Maybe I have to dive a bit deeper first. Thanks so far.

UPWG9AYH2 · November 2021

I dived a bit deeper and noticed a few things. My priority is to reuse as much of the Spryker import functionality as possible. Ideally, I would just treat the queue message as if it were read from a CSV file line by line. I tried first with the category data import and ran into some difficulties.
One thing is that category data from CSV is originally written to the DB by the “CategoryWriterStep”, which is just another step in the import step pipeline … but when creating a QueueDataImporter, some kind of DataSetWriter is required, which for me means copying the whole logic into the DataSetWriter …
Am I overcomplicating this?
What I basically did was introduce another “importFromQueue” facade entry in the category-data-import module and create the queue importer … but as I said, the step and the DataSetWriter don't seem compatible, which means extra effort for every following asset too.
Any idea?
Best regards and have a nice weekend

UPWG9AYH2 · November 2021

Addition: I also tried to use the CategoryWriterStep via composition (injecting it into the DataSetWriter), but it doesn't seem that easy, since the writer has dependencies related to triggering events etc.
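
For illustration, the composition idea could look roughly like the following. This is a language-agnostic sketch in Python, not Spryker code: the class `StepBackedWriter`, the `on_flush` hook, and the in-memory `database` are hypothetical, and the assumption that a step exposes `execute()` while a writer exposes `write()` and `flush()` should be checked against the actual interfaces. The `on_flush` callback only marks the place where event triggering would have to be wired in.

```python
class CategoryWriterStep:
    """Stand-in for an existing import step that persists one data set."""

    def __init__(self, database):
        self.database = database

    def execute(self, data_set):
        # Assumption: the step writes the data set straight to storage.
        self.database[data_set["category_key"]] = data_set["name"]


class StepBackedWriter:
    """Hypothetical adapter: offers a writer-style API and delegates to a step."""

    def __init__(self, step, on_flush=None):
        self.step = step
        self.on_flush = on_flush

    def write(self, data_set):
        # Reuse the step's persistence logic instead of copying it.
        self.step.execute(data_set)

    def flush(self):
        # Placeholder for whatever must happen after writing (e.g. events).
        if self.on_flush is not None:
            self.on_flush()


database = {}
flushed = []
writer = StepBackedWriter(CategoryWriterStep(database), on_flush=lambda: flushed.append(True))
writer.write({"category_key": "shoes", "name": "Shoes"})
writer.flush()
```

The adapter keeps the write logic in one place; the open question from above (event dependencies) stays inside whatever is passed as `on_flush`.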

Andriy Netseplyayev · November 2021

but when creating a QueueDataImporter, some kind of DataSetWriter is required

Correct, but you will be writing to RabbitMQ, not the DB.
This means you have 2 importers per entity:
1. Read from CSV / write to Queue
2. Read from Queue / write to DB
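
The two importers can be sketched like this. It is a minimal, language-agnostic illustration in Python, not Spryker code: the `deque` stands in for RabbitMQ, the dict stands in for the DB, and the function names and the `category_key`/`name` columns are made up for the example.

```python
import csv
import io
import json
from collections import deque


def import_csv_to_queue(csv_text, queue):
    """Importer 1: read CSV rows and publish each row as a JSON message."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        queue.append(json.dumps(row))


def import_queue_to_db(queue, database):
    """Importer 2: consume JSON messages and write each data set to the DB."""
    while queue:
        data_set = json.loads(queue.popleft())
        database[data_set["category_key"]] = data_set


queue = deque()   # hypothetical stand-in for a RabbitMQ queue
database = {}     # hypothetical stand-in for the database

csv_text = "category_key,name\nshoes,Shoes\nhats,Hats\n"
import_csv_to_queue(csv_text, queue)
import_queue_to_db(queue, database)
```

Both importers see the same row shape, so the second one does not need to know whether a message originally came from a file.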

UPWG9AYH2 · November 2021

Hi Andriy,
Point 1. is not applicable to us, only point 2 in this case.
The idea is to bring our import data into a format Spryker can understand by default. Until now I thought this would be a CSV-like row, as in the default data import, just in a queue-suitable format like JSON.
Bringing the message into this format should be the biggest effort on our side. But when there is also effort to rewrite the writing logic from queue to db, the whole plan is not affordable anymore. Transforming our data into a format Spryker can understand and then also adapting the writer to process these messages correctly is needless double effort. Instead, we could adapt the writer so that it accepts messages in our original format right from the beginning.
But then we would also lose the opportunity to use any logic from the Spryker import. So I am really not sure which way fits best. Are there any insights from other projects?
Best regards

Andriy Netseplyayev · November 2021

Point 1. is not applicable to us, only point 2 in this case.

It’s not 1 or 2, it’s 1 and 2.

there is also effort to rewrite the writing logic from queue to db

It’s not “rewriting”, it’s “extending”:
you currently have A -> C (files to DB). What you need instead is to change it to A -> B1 and then B2 -> C. That would be the importer-based approach: you reuse A and C and introduce B1, a queue writer, and B2, a queue reader.

UPWG9AYH2 · November 2021

Sorry Andriy, I am not sure if I understood it right.

Okay, A are the files. But we don't have files. Instead we have another queue that has messages in a PIM related format. So assuming this message is our A, then we have to do A -> B1 (get the PIM message and write/transform it to the Spryker import queues) and B2 -> C, which writes data from these queues to the DB …
However, we cannot use the data writing logic that the CSV writer would use when taking the data directly from a CSV file, correct? My hope was to not have much effort on this side when writing to the DB.

Andriy Netseplyayev · November 2021

Ah, I didn’t understand that

Instead we have another queue that has messages in a PIM related format

That changes everything! Assuming that the mapping from the PIM format to the Spryker one (like CSV) is a simple mapping: let your PIM send data to Spryker’s RabbitMQ and do the “transformation part” on the Spryker side. You would have only one import process: read from queue, write to DB. You need to create a queue reader and take care of the data transformation (add additional steps between reader and writer) so that you can fully re-use the writer.
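
Roughly sketched (language-agnostic Python, not Spryker code; all function names and the PIM fields `id` and `label` are assumptions made for the example), the single import process would look like this:

```python
import json


def map_pim_to_data_set(pim_message):
    """Hypothetical transformation step: rename PIM fields to the
    columns the existing writer already expects."""
    return {"category_key": pim_message["id"], "name": pim_message["label"]}


def existing_category_writer(data_set, database):
    """Stand-in for the unchanged writer that the CSV import uses."""
    database[data_set["category_key"]] = data_set["name"]


def import_from_queue(messages, steps, writer, database):
    """Read each message, run the transformation steps, then reuse the writer."""
    for raw in messages:
        data_set = json.loads(raw)
        for step in steps:
            data_set = step(data_set)
        writer(data_set, database)


messages = ['{"id": "shoes", "label": "Shoes"}']  # what the PIM would publish
database = {}
import_from_queue(messages, [map_pim_to_data_set], existing_category_writer, database)
```

Only the reader and the mapping step are new here; the writer receives the same data set shape it would get from a CSV row.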

UPWG9AYH2 · November 2021

Yeah, so the transformation is not such a simple mapping in our case, because one message contains multiple pieces of information for Spryker … but importing from a Spryker queue basically means: consume the message, import the related entity, and ack the message (so it's gone after this) … Since we have multiple pieces of information per original message, we either have to reuse the same message in other importers or first split the original message into Spryker entity-based messages … or, as a third option, use multiple writers for one message, which means higher dependency/coupling for a single message (harder to change if the original message changes in the future).

UPWG9AYH2 · November 2021

Okay, the last idea only makes sense if the message covers just one domain … for example, a message that only contains category information, where multiple writers on the Spryker side would save multiple category-related entities … but in our case, an original PIM message can contain information for different domains like products, categories, availability etc. … so I think we are forced to transform the messages into multiple Spryker messages first.
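
To make the splitting idea concrete, here is a minimal sketch in Python (language-agnostic, not Spryker code). The message layout with top-level `product`, `category`, and `availability` keys and the queue names `import.<domain>` are assumptions for the example only.

```python
import json
from collections import defaultdict


def split_pim_message(raw_message, domains=("product", "category", "availability")):
    """Split one multi-domain PIM message into one JSON message per domain."""
    pim = json.loads(raw_message)
    return {domain: json.dumps(pim[domain]) for domain in domains if domain in pim}


def route(raw_message, queues):
    """Publish each per-domain message to its own (hypothetical) import queue."""
    for domain, message in split_pim_message(raw_message).items():
        queues[f"import.{domain}"].append(message)


queues = defaultdict(list)  # stand-in for per-entity RabbitMQ queues
route('{"product": {"sku": "001"}, "category": {"category_key": "shoes"}}', queues)
```

Each per-domain queue can then be consumed and acked by its own importer, so a change in the original message format only affects the splitter.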

Andriy Netseplyayev · November 2021

Yes, and you’re already thinking in the right direction. The decision whether to split the message, and where that should happen, is then up to you. Technically you can go with or without splitting. I would weigh re-use of existing writers (if possible), performance, and simplicity of the solution 🙂 It’s a triangle, but you need to decide what it will look like.

UPWG9AYH2 · November 2021

Thanks a lot for your input so far, @UKJSE6T47. I may come back with further questions 😉 Best regards

UPWG9AYH2 · November 2021

One last question: Is there any reason why some importers are in the import module and others are loaded via plugins from other modules? Or is there a pattern here as well?

UPWG9AYH2 · November 2021

I could imagine that the import module itself got very large and the trend is to move the imports out into separate modules …

Andriy Netseplyayev · November 2021

That’s correct. The models you can see there are from older times. Later on, Spryker started creating *-data-import modules to extend data importers in a more modular way.
