Skip to content

Worked#28

Open
akankshahu wants to merge 3 commits intoosdldbt:mainfrom
akankshahu:worked
Open

Worked#28
akankshahu wants to merge 3 commits intoosdldbt:mainfrom
akankshahu:worked

Conversation

@akankshahu
Copy link

@akankshahu akankshahu commented Jan 22, 2026

Title: Fix inconsistent data loading (Fixes #24)

Description:
This PR fixes issue #24 where data was missing at high customer counts, causing foreign key failures.

The custom loader was incorrectly trying to read from a write-only pipe connected to psql. This caused stream errors that resulted in dropped rows. I have removed the invalid fgetc calls from the loader headers to ensure all data is flushed correctly.

Copilot AI review requested due to automatic review settings January 22, 2026 08:26
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes attempts to read from the psql pipe (FILE* p) after writes, which is invalid because the pipe is opened write-only via popen(..., "w").

Changes:

  • Commented out while (fgetc(p) != EOF) ; synchronization/drain loops that attempted to read from a write-only pipe.
  • Added a clarifying comment in the base loader explaining why reading is not possible.

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/include/custom/pgsql/pgloader.h Stops attempting to read from the write-only psql pipe during connect/copy/finish steps.
src/include/custom/pgsql/PGSQLZipCodeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLWatchListLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLWatchItemLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLTradeTypeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLTradeRequestLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLTradeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLTradeHistoryLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLTaxrateLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLStatusTypeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLSettlementLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLSecurityLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLSectorLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLNewsXRefLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLNewsItemLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLLastTradeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLIndustryLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLHoldingSummaryLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLHoldingLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLHoldingHistoryLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLFinancialLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLExchangeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLDailyMarketLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCustomerTaxrateLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCustomerLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCustomerAccountLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCompanyLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCompanyCompetitorLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCommissionRateLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLChargeLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLCashTransactionLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLBrokerLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLAddressLoad.h Removes per-row pipe read attempt after writing a record.
src/include/custom/pgsql/PGSQLAccountPermissionLoad.h Removes per-row pipe read attempt after writing a record.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@markwkm
Copy link
Contributor

markwkm commented Jan 26, 2026

In an ideal world we would hope we could eventually add proper code to catch errors. :) I still haven't put enough though into it, but I wonder if the piping should eventually be redone...

I'm going to try to make time for this soon to double check. :)

@akankshahu
Copy link
Author

akankshahu commented Jan 27, 2026

Thanks for the feedback, @markwkm! I completely agree. Removing the loops fixes the immediate data loss/corruption issue, but it definitely leaves us without a way to catch errors if the pipe fails.

@akankshahu
Copy link
Author

@markwkm I tried making it more efficient .Could you please review it?

@markwkm
Copy link
Contributor

markwkm commented Jan 29, 2026

@markwkm I tried making it more efficient .Could you please review it?

I'll hopefully get to try it out at a customer size of 100,000 soon...

I think the rename of the class to CFlatLoader isn't quite complete, the compiler is complaining, but nevertheless let's keep the changes to a minimum for now. "FlatLoader" isn't quite accurate too, it's really a "direct load", but in any event, I'm not sure a rename is necessary at this time. :)

@markwkm
Copy link
Contributor

markwkm commented Jan 30, 2026

I don't think the fgetc() calls are the root cause of the problem. Building for 100,000 customers still causes issues like this when creating the foreign key constraints:

ERROR:  insert or update on table "customer_account" violates foreign key constraint "fk_customer_account_b"
DETAIL:  Key (ca_b_id)=(4300000001) is not present in table "broker".

I'm actually worried it's the piping to psql. I think the path worth trying is to actually use libpq.

@markwkm
Copy link
Contributor

markwkm commented Feb 4, 2026

This patch might help more easily identify when errors happen with the build script. markwkm@04b7d85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

inconsistent data with pgsql custom loader

2 participants