This repository has been archived by the owner on May 22, 2024. It is now read-only.

Improve Narrative Section Extraction #69

Open
cragwolfe opened this issue Jan 12, 2023 · 0 comments
Labels: help wanted, python

Comments

@cragwolfe
Contributor

Right now, get_section_narrative uses a SECSection regex to identify a section in the TOC, and that generally works pretty well. The matching TOC title text is then used to look for the section in the content. For that second step, though, 10-Ks and 10-Qs do not reuse the original regex; they go through match_10k_toc_title_to_section, which applies a more lenient match condition. The better approach is likely to stick with the original matching regex for both steps.

The lenient post-TOC match is why the EHC test fails for the BUSINESS section, and may be the reason for other failures as well.
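
To make the difference concrete, here is a minimal sketch of how a lenient title match can pick up a heading that the original regex would reject. Everything in it is an assumption for illustration: `BUSINESS_PATTERN` is a hypothetical stand-in for the SECSection regex, `lenient_match` is not the actual logic of match_10k_toc_title_to_section, and the headings are made up rather than taken from the EHC filing.

```python
import re

# Hypothetical stand-in for the SECSection BUSINESS pattern (assumption;
# the regex in the repository may differ).
BUSINESS_PATTERN = re.compile(r"^(?:item\s+1\.?\s*)?business\.?$", re.IGNORECASE)


def lenient_match(toc_title: str, candidate: str) -> bool:
    """Illustrative lenient rule: the heading merely starts with the TOC title.

    This is an assumed rule, not the real match_10k_toc_title_to_section.
    """
    return candidate.strip().lower().startswith(toc_title.strip().lower())


def strict_match(pattern: re.Pattern, candidate: str) -> bool:
    """Reuse the same regex that identified the section in the TOC."""
    return pattern.match(candidate.strip()) is not None


# Made-up headings that could appear in the body of a filing.
headings = ["Item 1. Business", "Business Combinations", "Business"]

lenient_hits = [h for h in headings if lenient_match("Business", h)]
strict_hits = [h for h in headings if strict_match(BUSINESS_PATTERN, h)]

print("lenient:", lenient_hits)
print("strict: ", strict_hits)
```

In this sketch the lenient rule accepts "Business Combinations", which is not the BUSINESS section, while the strict rule accepts only headings the original pattern recognizes. Whether this is the exact failure mode in the EHC test would need to be confirmed against the filing.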

Definition of Done

  • Updated section extraction logic such that fewer tests are marked as xfailed, in particular the EHC case mentioned above.
@cragwolfe added the help wanted and python labels on Jan 12, 2023