diff --git a/README.md b/README.md
index 8cefccf..7ad12cb 100644
--- a/README.md
+++ b/README.md
@@ -3,4 +3,4 @@
 Wutta Framework for data import/export and real-time sync
 
-See docs at https://rattailproject.org/docs/wuttasync/
+See docs at https://docs.wuttaproject.org/wuttasync/
diff --git a/docs/conf.py b/docs/conf.py
index 0c78efc..de9af3d 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -30,6 +30,7 @@ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
 
 intersphinx_mapping = {
     'python': ('https://docs.python.org/3/', None),
+    'rattail-manual': ('https://docs.wuttaproject.org/rattail-manual/', None),
     'wuttjamaican': ('https://docs.wuttaproject.org/wuttjamaican/', None),
 }
diff --git a/docs/index.rst b/docs/index.rst
index ea00f77..2173f4e 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,22 +2,51 @@ WuttaSync
 =========
 
-This package adds data import/export and real-time sync utilities for
-the `Wutta Framework `_.
+This provides a "batteries included" way to handle data sync between
+an arbitrary source and target.
 
-*(NB. the real-time sync has not been added yet.)*
+It builds on / depends on :doc:`WuttJamaican `, for the
+sake of a common :term:`config object` and :term:`handler` interface.
+It was originally designed for import to / export from the :term:`app
+database` but **both** the source and target can be "anything" -
+e.g. a CSV or Excel file, a cloud API, another DB.
 
-The primary use cases in mind are:
+The basic idea is as follows:
 
-* keep operational data in sync between various business systems
-* import data from user-specified file
-* export to file
+* read a data set from the "source"
+* read the corresponding data from the "target"
+* compare the two data sets
+* where they differ, create/update/delete records on the target
 
-This isn't really meant to replace typical ETL tools; it is smaller
-scale and (hopefully) more flexible.
+In some cases (e.g. export to CSV) the target has no meaningful data,
+so all source records are simply "created" on / written to the
+target.
 
-While it of course supports import/export to/from the Wutta :term:`app
-database`, it may be used for any "source → target" data flow.
+.. note::
+
+   You may have already guessed that this approach does not work for
+   "big data" - indeed, it is designed for "small" data sets, ideally
+   500K records or fewer.  It reads both (source/target) data sets
+   into memory, so that is the limiting factor.
+
+   You can work around this to some extent by limiting the data sets
+   to a particular date range (or other "partitionable" aspect of the
+   data), and only syncing that portion.
+
+   However, this is not meant to be an ETL engine involving a data
+   lake/warehouse.  It is for more "practical" concerns, where some
+   disparate "systems" must be kept in sync, or for basic import from
+   / export to file.
+
+The general "source → target" concept can be used for both import and
+export, since "everything is an import" from the target's perspective.
+
+In addition to the import/export framework proper, a CLI framework is
+also provided.
+
+A "real-time sync" framework is also (eventually) planned, similar to
+the one developed in the Rattail Project;
+cf. :doc:`rattail-manual:data/sync/index`.
 
 .. toctree::
diff --git a/src/wuttasync/importing/base.py b/src/wuttasync/importing/base.py
index 6e06cfb..a0bc070 100644
--- a/src/wuttasync/importing/base.py
+++ b/src/wuttasync/importing/base.py
@@ -563,6 +563,7 @@ class Importer:
         :returns: List of target records which were deleted.
         """
+        model_title = self.get_model_title()
         deleted = []
         changes = changes or 0
@@ -577,6 +578,7 @@
             obj = cached['object']
 
             # delete target object
+            log.debug("deleting %s %s: %s", model_title, key, obj)
             if self.delete_target_object(obj):
                 deleted.append((obj, cached['data']))
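The read/compare/create/update/delete idea described in the new index.rst text can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the actual wuttasync API: the `sync` function, the keyed-dict records, and the tuple return value are all hypothetical names for this sketch.

```python
def sync(source_data, target_data, key):
    """Naive "source → target" sync sketch (hypothetical, not the
    wuttasync API): read both data sets into memory, index them by
    key, then decide what to create/update/delete on the target."""
    # index both data sets by key (this is why memory is the limit)
    src = {row[key]: row for row in source_data}
    tgt = {row[key]: row for row in target_data}

    # keys only in source -> create on target
    created = [src[k] for k in src.keys() - tgt.keys()]
    # keys in both, but with differing data -> update target
    updated = [src[k] for k in src.keys() & tgt.keys() if src[k] != tgt[k]]
    # keys only in target -> delete from target
    deleted = [tgt[k] for k in tgt.keys() - src.keys()]
    return created, updated, deleted
```

Note how the "export to CSV" case mentioned above falls out naturally: with an empty target data set, every source record lands in the "created" bucket.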