diff --git a/docs/conf.py b/docs/conf.py
index 0c78efc..de9af3d 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -30,6 +30,7 @@ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
 
 intersphinx_mapping = {
     'python': ('https://docs.python.org/3/', None),
+    'rattail-manual': ('https://docs.wuttaproject.org/rattail-manual/', None),
     'wuttjamaican': ('https://docs.wuttaproject.org/wuttjamaican/', None),
 }
 
diff --git a/docs/index.rst b/docs/index.rst
index ea00f77..2173f4e 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,22 +2,51 @@
 WuttaSync
 =========
 
-This package adds data import/export and real-time sync utilities for
-the `Wutta Framework `_.
+This provides a "batteries included" way to handle data sync between
+arbitrary source and target.
 
-*(NB. the real-time sync has not been added yet.)*
+This builds / depends on :doc:`WuttJamaican `, for
+sake of a common :term:`config object` and :term:`handler` interface.
+It was originally designed for import to / export from the :term:`app
+database` but **both** the source and target can be "anything" -
+e.g. CSV or Excel file, cloud API, another DB.
 
-The primary use cases in mind are:
+The basic idea is as follows:
 
-* keep operational data in sync between various business systems
-* import data from user-specified file
-* export to file
+* read a data set from "source"
+* read corresponding data from "target"
+* compare the two data sets
+* where they differ, create/update/delete records on the target
 
-This isn't really meant to replace typical ETL tools; it is smaller
-scale and (hopefully) more flexible.
+Although in some cases (e.g. export to CSV) the target has no
+meaningful data so all source records are "created" on / written to
+the target.
 
-While it of course supports import/export to/from the Wutta :term:`app
-database`, it may be used for any "source → target" data flow.
+.. note::
+
+   You may already have guessed, that this approach may not work for
+   "big data" - and indeed, it is designed for "small" data sets,
+   ideally 500K records or smaller.  It reads both (source/target)
+   data sets into memory so that is the limiting factor.
+
+   You can work around this to some extent, by limiting the data sets
+   to a particular date range (or other "partitionable" aspect of the
+   data), and only syncing that portion.
+
+   However this is not meant to be an ETL engine involving a data
+   lake/warehouse.  It is for more "practical" concerns where some
+   disparate "systems" must be kept in sync, or basic import from /
+   export to file.
+
+The general "source → target" concept can be used for both import and
+export, since "everything is an import" from the target's perspective.
+
+In addition to the import/export framework proper, a CLI framework is
+also provided.
+
+A "real-time sync" framework is also (eventually) planned, similar to
+the one developed in the Rattail Project;
+cf. :doc:`rattail-manual:data/sync/index`.
 
 
 .. toctree::