[12.0][REF] Generate intermediary clean CSV files #2

Draft
wants to merge 42 commits into
base: 12.0

clementmbr
Member

Move extract_registers_spec() and extract_fields_spec() out of generate.py into a new file that can be run separately and outputs intermediary CSV files.

@clementmbr clementmbr marked this pull request as draft May 26, 2020 21:25
Example: EFD ICMS IPI PDF, Outubro 2019, p. 20-21
Fixed by removing the break that exited the for loop parsing the raw CSV
files whenever a register's row seemed empty
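
A minimal sketch of the kind of loop this fix concerns, assuming the raw CSV rows are lists of cell strings. The function name `parse_register_rows` and the sample rows are hypothetical, and using `continue` is only one way to express "removing the break"; the actual change in the commit may differ.

```python
def parse_register_rows(rows):
    """Collect the non-empty rows of a raw CSV table (hypothetical helper)."""
    parsed = []
    for row in rows:
        if not any(cell.strip() for cell in row):
            # A `break` here would drop every row after the first
            # empty-looking one; `continue` only skips that single row.
            continue
        parsed.append([cell.strip() for cell in row])
    return parsed


# Illustrative data only: an empty-looking row sits between two field rows.
raw_rows = [
    ["01", "REG", "C"],
    ["", "  ", ""],
    ["02", "COD_PART", "C"],
]
print(parse_register_rows(raw_rows))
```

With a `break` in place of the `continue`, the second field row would never be reached.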

The commit also includes other small cleanups
and some refactoring:
- Clean the registers' header strings
- Extract the in_out register's requirements from extract_fields_spec()
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from f0e3540 to 0172915 Compare June 3, 2020 19:02
Refactor to build 3 types of CSV in 4 steps:

1. extract **raw CSV** from the PDF with camelot

2. build **"accurate" CSV** from raw CSV
    - 1 CSV for each module
    - 1 header for each CSV
    - 1 line for each field
    - No modification of the fields' cells (and no loss of information),
    only a mapping of which cell belongs under which CSV column
    - For each field, add the field's register name and the
    page number so it can be checked against the PDF
    - Option to apply "camelot_row_patches" if necessary

3. build a **"usable" Python dictionary** from the "accurate" CSV
    - with "interpreted" values like "required", "type" and "int_size"

4. build **"usable" CSV** or JSON or any other format from the dict
with **no field modification**, just built from the dict values

And from these "usable" dictionaries:
5. build **"odoo-usable" CSV** _from the dict_ with additional columns
to create Odoo objects from the CSV lines.
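
Steps 2 to 4 can be sketched roughly as follows, using only the standard library. Everything here is illustrative: the column names, the sample rows, the `"O"` marker for a required field and the helper names `build_usable_dict` and `write_usable_csv` are assumptions, not the names used in this branch.

```python
import csv
import io

# Hypothetical "accurate" CSV: one header, one line per field, cells kept
# as extracted, plus the register name and page number for each field.
ACCURATE_CSV = """\
register,page,index,code,type,size,required
0000,20,01,REG,C,004*,O
0000,20,02,COD_VER,N,003*,O
0000,21,03,NOME,C,-,OC
"""


def build_usable_dict(accurate_csv_text):
    """Step 3: interpret raw cells into "required", "type" and "int_size"."""
    fields = {}
    for row in csv.DictReader(io.StringIO(accurate_csv_text)):
        size = row["size"].rstrip("*")
        fields[(row["register"], row["code"])] = {
            "register": row["register"],
            "code": row["code"],
            "page": int(row["page"]),
            # Assumption: "C" marks a character field, anything else numeric.
            "type": "char" if row["type"] == "C" else "numeric",
            "int_size": int(size) if size.isdigit() else None,
            # Assumption: "O" marks a field that is always required.
            "required": row["required"] == "O",
        }
    return fields


def write_usable_csv(fields):
    """Step 4: dump the dict values as they are, without modifying them."""
    out = io.StringIO()
    columns = ["register", "code", "page", "type", "int_size", "required"]
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for spec in fields.values():
        writer.writerow(spec)
    return out.getvalue()


usable = build_usable_dict(ACCURATE_CSV)
print(write_usable_csv(usable))
```

Keeping all interpretation in the dict-building step means the CSV writer stays a plain serializer, so a JSON or Odoo-oriented writer could reuse the same dictionary.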
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from 0172915 to 34df33a Compare June 4, 2020 18:31
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from 988ee8e to 66e0ae4 Compare June 7, 2020 00:55
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from e9979ca to 0bb9343 Compare June 8, 2020 00:52
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from 3a69655 to 0eb1bdb Compare June 8, 2020 01:56
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from 0eb1bdb to 3121507 Compare June 8, 2020 12:18
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from fa23706 to 0af1165 Compare June 12, 2020 15:25
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from d37707d to aedae11 Compare June 12, 2020 21:30
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from aedae11 to a7a2e5d Compare June 12, 2020 21:31
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from d102b58 to 5ba5798 Compare June 15, 2020 14:49
@clementmbr clementmbr force-pushed the 12.0-ref-extract-register branch from 8c644f5 to ff53ac1 Compare June 16, 2020 00:38