[12.0][REF] Generate intermediary clean CSV files #2

clementmbr · 2020-05-26T21:12:24Z

Separate extract_registers_spec() and extract_fields_spec() from generate.py into a new file that can be run separately and outputs intermediary CSV files.


Refactor to build 3 types of CSV in 4 steps:

1. Extract **raw CSV** from the PDF with camelot.
2. Build **"accurate" CSV** from the raw CSV:
   - 1 CSV for each module
   - 1 header for each CSV
   - 1 line for each field
   - No modification of the field cells (and no loss of information): only mapping which cell falls under which CSV column
   - For each field, add the field's register name and the page number, to check back in the PDF
   - Option to apply "camelot_row_patches" if necessary
3. Build a **"usable" Python dictionary** from the "accurate" CSV, with "interpreted" values like "required", "type" and "int_size".
4. Build **"usable" CSV** (or JSON, or any other format) from the dict, with **no field modification**: it is built only from the dict values.

And from these "usable" dictionaries:

5. Build **"odoo-usable" CSV** _from the dict_, with additional columns to create Odoo objects from the CSV lines.
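A minimal sketch of steps 2 to 4, using only the standard library. The raw rows, the column names (`register`, `page`, `num`, `code`, `desc`, `type`, `size`, `dec`, `required`) and the interpretation rules (`"O"` meaning required, a trailing `*` on the size) are assumptions for illustration, not the layout the scripts actually parse; step 1 (camelot) and step 5 (Odoo columns) are left out.

```python
import csv
import io

# Hypothetical raw rows for one register table, as camelot might return them.
# Assumed column order: number, code, description, type, size, dec, required.
RAW_TABLES = [
    {"register": "0000", "page": 20, "rows": [
        ["Nº", "Campo", "Descrição", "Tipo", "Tam", "Dec", "Obrig"],
        ["01", "REG", 'Texto fixo contendo "0000"', "C", "004", "-", "O"],
        ["02", "COD_VER", "Código da versão do leiaute", "N", "003*", "-", "O"],
    ]},
]

ACCURATE_HEADER = ["register", "page", "num", "code", "desc", "type", "size", "dec", "required"]


def build_accurate_rows(tables):
    """Step 2: keep every raw cell unchanged, only prefixing each row
    with its register name and page number for traceability."""
    for table in tables:
        for cells in table["rows"][1:]:  # skip the table's own header row
            yield [table["register"], table["page"], *cells]


def interpret(row):
    """Step 3: turn one accurate row into a dict with interpreted values."""
    field = dict(zip(ACCURATE_HEADER, row))
    size = field["size"].rstrip("*")  # assumed: "*" is a footnote marker
    return {
        "register": field["register"],
        "code": field["code"],
        "required": field["required"] == "O",  # assumed: "O" = obrigatório
        "type": {"C": "char", "N": "numeric"}.get(field["type"], field["type"]),
        "int_size": int(size) if size.isdigit() else None,
        "desc": field["desc"],
    }


def write_usable_csv(fields):
    """Step 4: dump the dicts as they are; no field is modified here."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields[0]))
    writer.writeheader()
    writer.writerows(fields)
    return buffer.getvalue()


accurate = list(build_accurate_rows(RAW_TABLES))
usable = [interpret(row) for row in accurate]
print(write_usable_csv(usable))
```

Keeping step 2 a pure re-mapping of cells means that any interpretation error in step 3 can be traced back to an untouched cell and its page number.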


clementmbr added 4 commits May 26, 2020 17:37

[REF] Separate extract_register_spec() from generate.py

be8bdc7

[ADD] pre-commit

b111b8a

[ADD] setup

e6ca5f3

[REF] generate.py blacked

8919e42

clementmbr marked this pull request as draft May 26, 2020 21:25

clementmbr added 7 commits May 27, 2020 18:09

[ADD] logging in extract_csv.py

f3e2b19

[REF] rename extract_register_specs.py

8c6c89e

Pre-commit ignore build_csv.py

c03b414

[IMP] build_fields_spec_csv() create one CSV for each register

830b915

[FIX] Avoid skipping a page when a register line is split in two

08cfc14

Example: EFD ICMS IPI PDF (Outubro 2019), pp. 20-21. Fixed by deleting the `break` that exited the for loop parsing the raw CSV files whenever a register's row seemed empty. The commit also adds some other small cleanups.
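The loop behaviour behind this fix can be illustrated with a small, self-contained sketch. The row layout and the `row_seems_empty()` helper are hypothetical; the sketch skips the empty-looking row with `continue`, which is one way to get the effect of removing the `break`.

```python
# Hypothetical raw rows for one register whose table spans two PDF pages:
# an empty-looking row can appear where the table is split.
raw_rows = [
    ["01", "REG", 'Texto fixo contendo "C100"'],
    ["", "", ""],  # row that "seems empty" at the page break
    ["02", "IND_OPER", "Indicador do tipo de operação"],
]


def row_seems_empty(row):
    """True when every cell is blank or whitespace."""
    return all(not cell.strip() for cell in row)


def parse_fields_with_break(rows):
    fields = []
    for row in rows:
        if row_seems_empty(row):
            break  # old behaviour: leaves the loop, dropping every later field
        fields.append(row[1])
    return fields


def parse_fields_fixed(rows):
    fields = []
    for row in rows:
        if row_seems_empty(row):
            continue  # skip only this row and keep parsing the table
        fields.append(row[1])
    return fields


print(parse_fields_with_break(raw_rows))
print(parse_fields_fixed(raw_rows))
```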

[IMP] compute pdf max_pages instead of hard coding

b2a825e

[IMP] Build 1 CSV with all the register's fields

c498529

And other refactoring:
- Clean the register's header strings
- Extract the in_out register's requirements from extract_fields_spec()
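A possible shape for the header cleaning, assuming camelot returns wrapped header text with embedded newlines and that a hyphen just before a newline marks a word broken across lines. `clean_header()` is a hypothetical name, not the function from this PR.

```python
import re


def clean_header(cell: str) -> str:
    """Hypothetical cleaner for a header cell extracted by camelot."""
    cell = re.sub(r"-\n", "", cell)   # re-join a word hyphenated across lines
    cell = re.sub(r"\s+", " ", cell)  # collapse newlines and runs of spaces
    return cell.strip()


headers = ["  Nº ", "Descri-\nção", "Campo\n(código)"]
print([clean_header(h) for h in headers])
```

The hyphen rule would also remove a legitimate hyphen that happens to sit at the end of a wrapped line, so it is only safe if such headers do not occur.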

clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from f0e3540 to 0172915 June 3, 2020 19:02

clementmbr force-pushed the 12.0-ref-extract-register branch from 0172915 to 34df33a June 4, 2020 18:31

clementmbr added 7 commits June 4, 2020 15:42

[FIX] update .gitignore and add modules registers/fields CSV

61cbb9c

[IMP] add get_all_headers() method

4c8955d

This helps hard-code the module headers, based on 'manual' observation of all the module headers displayed by the get_all_headers() method.
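The idea of listing every header variant for manual review can be sketched as follows. `count_headers()` and the table shape (a list of rows whose first row is the header) are assumptions; the actual signature of get_all_headers() is not shown in this PR.

```python
from collections import Counter

# Hypothetical raw tables for one module: each table is a list of rows,
# and its first row is the header camelot detected.
module_tables = [
    [["Nº", "Campo", "Descrição", "Tipo", "Tam", "Dec", "Obrig"], ["01", "REG", "Texto fixo"]],
    [["Nº", "Campo", "Descrição", "Tipo", "Tam", "Dec", "Obrig"], ["01", "REG", "Texto fixo"]],
    [["Nº", "Campo", "Descrição", "Tipo", "Tam", "Dec"], ["01", "REG", "Texto fixo"]],
]


def count_headers(tables):
    """Count each distinct header row across a module's raw tables,
    so the variants can be reviewed by hand before hard-coding one."""
    return Counter(tuple(table[0]) for table in tables if table)


for header, count in count_headers(module_tables).most_common():
    print(count, " | ".join(header))
```

Counting occurrences, rather than only listing distinct headers, makes the rare variants (often extraction glitches) stand out next to the common one.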

[IMP] add method to add row patch from CSV

8336891

[IMP] update .gitignore

b5aea6a

[IMP] Add CLI --patch option with click

3ac9ff0

[IMP] update .gitignore

c0a22a0

[IMP] Build modules fields usable CSV

66e0ae4

clementmbr force-pushed the 12.0-ref-extract-register branch from 988ee8e to 66e0ae4 June 7, 2020 00:55

clementmbr added 4 commits June 7, 2020 09:33

[IMP] update .gitignore

cdb8d0b

[ADD] build_json.py and [REF] extract_csv.py with click

6f499a4

[IMP] pre-commit on scripts files

aa895eb

[REF] Separate extract_sped folder from l10n_br_spec_sped

097deb7

clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from e9979ca to 0bb9343 June 8, 2020 00:52

clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from 3a69655 to 0eb1bdb June 8, 2020 01:56

[ADD] Add README and [FIX] get_mod_headers.py

3121507

clementmbr force-pushed the 12.0-ref-extract-register branch from 0eb1bdb to 3121507 June 8, 2020 12:18

clementmbr added 6 commits June 8, 2020 15:46

[FIX] rows length in accurate_fields.csv

e565ea8

[ADD] ./compare_python-sped.py and info displaying registers with no fields

96d1d73

[IMP] rename extract_sped into sped_extractor

e7b0c26

[ADD] ./download.py script in python

f7db5f0

[IMP] Add heuristic to split field code when joined with description

e57d5d9
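One possible heuristic, assuming a field code is a run of uppercase letters, digits and underscores glued to a description that starts with a capitalised word (for example `COD_VERCódigo da versão do leiaute`). The regex and `split_code_and_desc()` are hypothetical, not the heuristic from this commit.

```python
import re

# Assumed shape: uppercase code, then a description whose first word is
# capitalised (uppercase letter followed by a lowercase, possibly accented, one).
_JOINED = re.compile(r"^([A-Z0-9_]+?)([A-ZÀ-Ý][a-zà-ÿ].*)$", re.DOTALL)


def split_code_and_desc(cell):
    """Return (code, description) if the cell looks joined, else None."""
    match = _JOINED.match(cell)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_code_and_desc("COD_VERCódigo da versão do leiaute"))
print(split_code_and_desc("REG"))
```

It returns None when no such boundary exists, for instance when the description itself starts with an acronym such as "CNPJ", so a caller can keep the cell unchanged in that case.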

[IMP] ./download.py with test and former years option

0af1165

clementmbr force-pushed the 12.0-ref-extract-register branch from fa23706 to 0af1165 June 12, 2020 15:25

clementmbr added 2 commits June 12, 2020 18:07

[IMP] improve heuristic

8e07566

[REF] Remove 'l10n_br_sped' Odoo module

aaa45b3

clementmbr force-pushed the 12.0-ref-extract-register branch 2 times, most recently from d37707d to aedae11 June 12, 2020 21:30

[IMP] Update README

a7a2e5d

clementmbr force-pushed the 12.0-ref-extract-register branch from aedae11 to a7a2e5d June 12, 2020 21:31

clementmbr added 3 commits June 15, 2020 10:00

[IMP] sort MODULE_fields.csv's columns, moving "desc" to the end

449d3f5
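Moving one column to the end while keeping the others in their original order can rely on the stability of Python's `sorted()`: the key is `False` for every column except `desc`, and `False` sorts before `True`. The column list is hypothetical.

```python
# Hypothetical column list for a MODULE_fields.csv file.
columns = ["code", "desc", "type", "int_size", "required"]

# sorted() is stable, so columns with equal keys (all False) keep their order
# and "desc" (key True) is moved after them.
ordered = sorted(columns, key=lambda name: name == "desc")
print(ordered)
```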

[IMP] heuristic to catch EFD_PIS_COFINS's registers

267b09c

[FIX] Change patch year

5ba5798

clementmbr force-pushed the 12.0-ref-extract-register branch from d102b58 to 5ba5798 June 15, 2020 14:49

[ADD] Big refactor in order to add 'required' attribute to registers properly

ff53ac1

clementmbr force-pushed the 12.0-ref-extract-register branch from 8c644f5 to ff53ac1 June 16, 2020 00:38

clementmbr added 5 commits June 16, 2020 14:09

[IMP] Update installation README

057f5b0

[IMP] Download specs pdf from csv list with URL

e955d85

[REF] Specs: pdf+patch+extract in various YEAR folders

be1c365

[IMP] First commit for producing nested python-sped style JSON

593b6fb

[UPD] Readme

84d1319
