Jupyter notebook crashes when using dam_lev with transpose_costs #16

Open
BobbyClouser opened this issue Nov 10, 2018 · 5 comments · May be fixed by #20

@BobbyClouser

I'm using dam_lev in a Jupyter notebook (Jupyter Notebook 5.4.0) with Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)] on Windows 10. Running the code below, I get the error:

Process finished with exit code -1073741819 (0xC0000005).

I looked this up: it's an access violation, meaning the process tried to access memory it isn't allowed to. The code works fine on Linux, and on Windows it also runs if I don't pass transpose_costs. I've checked that I have the required versions of numpy and Cython.
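The negative exit code and the hex value in parentheses are the same 32 bits: Windows reports this kind of crash as an unsigned NTSTATUS code, and `0xC0000005` is `STATUS_ACCESS_VIOLATION`. A quick Python check of the conversion, where masking with `0xFFFFFFFF` reinterprets the signed value as an unsigned 32-bit one:

```python
# Exit code as printed in the issue (a signed 32-bit integer)
exit_code = -1073741819

# Reinterpret the same 32 bits as an unsigned value
status = exit_code & 0xFFFFFFFF

STATUS_ACCESS_VIOLATION = 0xC0000005

print(hex(status))                        # 0xc0000005
print(status == STATUS_ACCESS_VIOLATION)  # True
```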

Do you have any suggestions? Should I build it myself, or use it from Cython?

Thanks,
Bob

```python
import numpy as np
from weighted_levenshtein import lev, osa, dam_lev

# Cost tables are indexed by ASCII code point (ord(char) < 128)
ins_costs = np.ones(128, dtype=np.float64)
del_costs = np.ones(128, dtype=np.float64)
sub_costs = np.ones((128, 128), dtype=np.float64)
tp_costs = np.ones((128, 128), dtype=np.float64)

# Punctuation and whitespace that should be nearly free to insert or delete
cheap_chars = "-% ./#&()+?,'"

# insert costs that should be nearly free
for ch in cheap_chars:
    ins_costs[ord(ch)] = 0.1

# delete costs that should be nearly free
for ch in cheap_chars:
    del_costs[ord(ch)] = 0.1

# substitutions that should cost less than 1
sub_costs[ord('C'), ord('S')] = 0.5
sub_costs[ord('S'), ord('C')] = 0.5

sub_costs[ord('O'), ord('0')] = 0.1
sub_costs[ord('0'), ord('O')] = 0.1

# transpositions that should cost less than 1
tp_costs[ord('I'), ord('E')] = 0.1
tp_costs[ord('E'), ord('I')] = 0.1

tp_costs[ord('A'), ord('E')] = 0.2
tp_costs[ord('E'), ord('A')] = 0.2

# Crashes on Windows (exit code 0xC0000005); runs if transpose_costs is omitted
print(dam_lev('ABNANA', 'BANANA', transpose_costs=tp_costs,
              substitute_costs=sub_costs,
              insert_costs=ins_costs,
              delete_costs=del_costs))
```
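To sanity-check what the call above should return without the native extension, a pure-Python reference can help. This sketch is *not* the library's `dam_lev`. It computes the weighted optimal string alignment distance (OSA, or restricted Damerau-Levenshtein), which only allows swapping two adjacent characters that are not edited afterwards. True Damerau-Levenshtein can be lower for some pairs: with unit costs, "CA" to "ABC" is 3 under OSA but 2 under Damerau-Levenshtein. The name `weighted_osa` is hypothetical. Which character indexes the row of `transpose_costs` is also an assumption, but the issue sets both orders to the same value, so it makes no difference here.

```python
def weighted_osa(s, t, insert_costs, delete_costs, substitute_costs, transpose_costs):
    """Weighted optimal string alignment (restricted Damerau-Levenshtein).

    Hypothetical helper, not part of weighted_levenshtein. Cost tables are
    indexed by ord(char) (ASCII, < 128), like the NumPy arrays in the issue;
    nested Python lists work too.
    """
    n, m = len(s), len(t)
    # d[i][j] = cheapest cost of turning s[:i] into t[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + delete_costs[ord(s[i - 1])]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert_costs[ord(t[j - 1])]

    for i in range(1, n + 1):
        a = s[i - 1]
        for j in range(1, m + 1):
            b = t[j - 1]
            sub = 0.0 if a == b else substitute_costs[ord(a)][ord(b)]
            d[i][j] = min(
                d[i - 1][j] + delete_costs[ord(a)],  # delete a
                d[i][j - 1] + insert_costs[ord(b)],  # insert b
                d[i - 1][j - 1] + sub,               # match or substitute
            )
            # Adjacent transposition "xy" -> "yx". The i > 1 and j > 1 guard
            # stops s[i - 2], t[j - 2] and d[i - 2][j - 2] from using index -1.
            if i > 1 and j > 1 and a == t[j - 2] and s[i - 2] == b:
                swap = transpose_costs[ord(s[i - 2])][ord(a)]  # row order assumed
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)
    return d[n][m]


# Uniform unit costs, mirroring the np.ones(...) defaults in the issue
ins = [1.0] * 128
dels = [1.0] * 128
subs = [[1.0] * 128 for _ in range(128)]
tps = [[1.0] * 128 for _ in range(128)]
print(weighted_osa("ABNANA", "BANANA", ins, dels, subs, tps))  # 1.0
```

It also accepts the issue's NumPy arrays directly, because `arr[i][j]` indexing works on 2D NumPy arrays as well as on nested lists. It runs in O(len(s) * len(t)) pure-Python time, so it is only practical as a cross-check on short strings.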

@taoxinyi

taoxinyi commented Jan 7, 2019

I have the same problem

@RevolutionTech
Contributor

weighted-levenshtein was originally built to run in a Linux environment, so although it's disappointing that it doesn't work on Windows, it doesn't come as a huge surprise to me.

I'm not sure that we'll be able to look into the issue anytime soon, but if you are able to discover the issue, we would certainly take a look at a PR. 😄

@LEFTazs

LEFTazs commented Feb 3, 2020

> weighted-levenshtein was originally built to be run in a Linux environment, so although it's disappointing that it doesn't work on Windows it doesn't come as a huge surprise to me.
>
> I'm not sure that we'll be able to look into the issue anytime soon, but if you are able to discover the issue, we would certainly take a look at a PR. 😄

The problem might be the line endings: Linux uses \n, while Windows uses \r\n.
@RevolutionTech, are line endings used in any way in the Damerau code logic?

@LEFTazs LEFTazs linked a pull request Feb 3, 2020 that will close this issue
@LEFTazs

LEFTazs commented Feb 3, 2020

This should solve it. The problem was caused by negative indexing, which triggered the memory access violation on Windows. I presume the same out-of-bounds access just didn't crash on Linux, which is why it appeared to work there?
Note that I didn't test the fix on Linux; hopefully it works there too.
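To illustrate the kind of bug described above, here is a hypothetical Python sketch. It is not the library's Cython source and not the contents of #20. A Damerau-style DP detects an adjacent transposition by looking back two characters with `s[i - 2]` and `t[j - 2]`, and at `i == 1` or `j == 1` those indices become `-1`. Python silently wraps `-1` to the last character, so an unguarded check can report a spurious match. In C, or in Cython on a raw pointer or with the `wraparound` and `boundscheck` directives disabled, the same index reads memory before the start of the buffer. That is undefined behavior, which could plausibly crash on one platform and go unnoticed on another.

```python
# Hypothetical illustration only; function names are made up for this sketch.

def transpose_match_unguarded(s, t, i, j):
    # At i == 1 or j == 1, s[i - 2] and t[j - 2] evaluate s[-1] and t[-1].
    # Python wraps these to the *last* character instead of failing.
    return s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]


def transpose_match_guarded(s, t, i, j):
    # Only consider a transposition once two characters of each string exist.
    return i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]


s, t = "AB", "BA"
print(transpose_match_unguarded(s, t, 1, 1))  # True: spurious match via s[-1], t[-1]
print(transpose_match_guarded(s, t, 1, 1))    # False
print(transpose_match_guarded(s, t, 2, 2))    # True: the real "AB" -> "BA" swap
```

In a full DP loop, the spurious match at (1, 1) would go on to read `d[i - 2][j - 2]`, that is `d[-1][-1]`, which is the out-of-bounds read an `i > 1 and j > 1` guard prevents.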
