False positives when searching dates #582

Open
g-kozulis opened this issue Nov 1, 2019 · 4 comments

@g-kozulis

OS: Windows 10.0.17763.805
dateparser version: 0.7.2

When using the search_dates() function, some combinations of digits and punctuation marks that don't resemble any date format I've ever seen get picked up as dates.

To reproduce, run the following code, replacing <false positive> with any one of the following:

  • 8: 100M2_
  • 100M
  • 10,00 2
  • 19,60 5
  • 73 20

from dateparser.search import search_dates

search_dates("The following isn't a correct date <false positive>")
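
The reproduction above can be looped over all five strings at once. This is only a minimal sketch: `find_false_positives`, `CANDIDATES`, and `TEMPLATE` are hypothetical names introduced here, and the search function is passed in as an argument so the helper itself does not depend on dateparser. It assumes the search function returns either a list of matches or a falsy value (such as None) when nothing is found.

```python
from typing import Callable, Dict, Iterable, Optional

# The five strings reported above as false positives.
CANDIDATES = ["8: 100M2_", "100M", "10,00 2", "19,60 5", "73 20"]

# The sentence from the report, with a slot for the candidate string.
TEMPLATE = "The following isn't a correct date {}"


def find_false_positives(
    search: Callable[[str], Optional[list]],
    candidates: Iterable[str] = CANDIDATES,
) -> Dict[str, list]:
    """Return {candidate: matches} for each candidate the search reports as a date."""
    hits = {}
    for candidate in candidates:
        matches = search(TEMPLATE.format(candidate))
        if matches:  # skip None and empty results alike
            hits[candidate] = matches
    return hits


# Intended usage, assuming dateparser is installed:
#   from dateparser.search import search_dates
#   for candidate, matches in find_false_positives(search_dates).items():
#       print(repr(candidate), "->", matches)
```

Any candidate that shows up in the returned dictionary was picked up as a date in that run; which ones appear may differ between dateparser versions.
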
@murray-minito

Same here on OSX 10.15 with version 0.7.2.

Here are some examples of results that should not be dates:

search_dates(text, languages=['en'], settings={'STRICT_PARSING': True, 'PREFER_DATES_FROM': 'past', 'DATE_ORDER': 'DMY'}, add_detected_language=True)

-- Clearly wrong
('32° 34’S', datetime.datetime(2013, 10, 16, 23, 59, 7), 'en')
('123°', datetime.datetime(1900, 1, 1, 1, 2, 3), 'en')
('6005', datetime.datetime(2000, 6, 5, 0, 0), 'en')
('000', datetime.datetime(1900, 1, 1, 0, 0), 'en')
('of 629', datetime.datetime(1900, 1, 1, 6, 2, 9), 'en')
('>21', datetime.datetime(1900, 1, 1, 2, 1), 'en')

-- I can kind of see where it is getting this, but I think it is wrong to do it
('3533', datetime.datetime(2033, 5, 3, 0, 0), 'en')

-- I have lots of numbers in these docs. It should not pick them up and 'make' a date from them
('538400', datetime.datetime(8400, 3, 5, 0, 0), 'en')
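
Until the library handles these cases, one possible stopgap is to post-filter the returned tuples by inspecting the matched substring. The sketch below is a heuristic of my own, not part of dateparser: `looks_like_date` and `drop_unlikely_matches` are hypothetical helpers, and the rule (keep a match only if it contains a separator-delimited numeric date, a clock time, or a word of three or more letters) is an assumption that will also discard some genuine dates, such as a bare year.

```python
import re
from datetime import datetime

# Numeric dates such as 01/11/2019, 2019-11-01 or 1.11.19.
_NUMERIC_DATE = re.compile(r"\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}")
# Clock times such as 10:30.
_CLOCK_TIME = re.compile(r"\d{1,2}:\d{2}")
# A run of three or more letters (month names, "yesterday", "days", ...).
_WORD = re.compile(r"[^\W\d_]{3,}")


def looks_like_date(text):
    """Heuristic: does the matched substring carry any date-like structure?"""
    return bool(
        _NUMERIC_DATE.search(text)
        or _CLOCK_TIME.search(text)
        or _WORD.search(text)
    )


def drop_unlikely_matches(results):
    """Filter (substring, datetime, ...) tuples, keeping only plausible matches.

    Accepts None, since a search may return nothing at all.
    """
    return [item for item in (results or []) if looks_like_date(item[0])]


# Some of the results reported above.
reported = [
    ("32° 34’S", datetime(2013, 10, 16, 23, 59, 7), "en"),
    ("123°", datetime(1900, 1, 1, 1, 2, 3), "en"),
    ("of 629", datetime(1900, 1, 1, 6, 2, 9), "en"),
    ("538400", datetime(8400, 3, 5, 0, 0), "en"),
]
filtered = drop_unlikely_matches(reported)  # every reported item is dropped
```

The filter only looks at the first element of each tuple, so it works whether or not add_detected_language is set.
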

@noviluni
Collaborator

FYI, some cases will be fixed in the next version (after #786 is merged).

@gavishpoddar
Contributor

It seems #786 has been merged. Can you please close this issue?

@Gallaecio
Member

Does #786 fix all cases reported here? Otherwise, it makes sense to keep this open.
