Products grid inconsistent when filtered by "Vendor (any)" #2

Open
opened 2025-02-16 13:31:25 -06:00 by lance · 1 comment
Owner

I noticed this years ago but finally tried to solve this today, and no luck.

In the Products grid if you filter by e.g. "Vendor (any)" field contains "UNFI" then odds are reasonable this could, on the SQL level, return multiple matches per single product. SQLAlchemy is smart enough to "de-duplicate" these results such that a given product record will only appear once in the results. But unfortunately that is ORM magic and the underlying SQL statement of course does still return multiple rows, and hence the simple query.count() will return this "literal" number of rows.

Result of this is, the grid may say there were 20K results (SQL rows) but in fact there may be only 15K products. And the grid may display only say 18 rows in a page whereas it says it is displaying rows 1-20 or whatever.

I had sort of assumed that I was doing something wrong with my joins etc. But when I was not able to figure that out, I searched online and arrived at the real reason (de-duplication mentioned above).

https://docs.sqlalchemy.org/en/20/faq/sessions.html#faq-query-deduplicating

Nice to see this is a known problem and not my poor code. But also sucks that it's not clear how to fix the Products grid here. Maybe the query could be smarter and use a sub-select statement to prevent the multiple rows per product within raw SQL? Will look another day..

In the meantime the good news is, I believe the grid behavior is still "determinate" in that if you page thru all the data, you should see it all. Just the overall and per-page counts will be inaccurate.

- migrated from https://redmine.rattailproject.org/issues/23 - created 2023-09-28 I noticed this years ago but finally tried to solve this today, and no luck. In the Products grid if you filter by e.g. "Vendor (any)" field contains "UNFI" then odds are reasonable this could, on the SQL level, return multiple matches per single product. SQLAlchemy is smart enough to "de-duplicate" these results such that a given product record will only appear once in the results. But unfortunately that is ORM magic and the underlying SQL statement of course does still return multiple rows, and hence the simple `query.count()` will return this "literal" number of rows. Result of this is, the grid may say there were 20K results (SQL rows) but in fact there may be only 15K products. And the grid may display only say 18 rows in a page whereas it *says* it is displaying rows 1-20 or whatever. I had sort of assumed that I was doing something wrong with my joins etc. But when I was not able to figure that out, I searched online and arrived at the real reason (de-duplication mentioned above). https://docs.sqlalchemy.org/en/20/faq/sessions.html#faq-query-deduplicating Nice to see this is a known problem and not my poor code. But also sucks that it's not clear how to fix the Products grid here. Maybe the query could be smarter and use a sub-select statement to prevent the multiple rows per product within raw SQL? Will look another day.. In the meantime the good news is, I believe the grid behavior is still "determinate" in that if you page thru all the data, you should see it all. Just the overall and per-page counts will be inaccurate.
Author
Owner
cf. https://forgejo.wuttaproject.org/rattail/tailbone/commit/0ee67251889e84435fb0348179ecddfa922c1957
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: rattail/tailbone#2
No description provided.