remove dependency to ip2asn #31

Open
romain-fontugne opened this issue Dec 20, 2024 · 29 comments

@romain-fontugne
Member

This code is currently tricky to deploy because of its dependency on ip2asn. One way to fix this is to use IYP instead.

Steps:

  • remove code related to ip2asn
  • add code to query IYP to get prefix to ASN and prefix to IXP mappings
@TejasNangru

@romain-fontugne,
I want to try this issue.

@TejasNangru

@romain-fontugne
While setting up the code locally and installing the dependencies, I got this error for py-radix:

ERROR: Failed building wheel for py-radix
Failed to build py-radix
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (py-radix)

@romain-fontugne
Member Author

romain-fontugne commented Dec 24, 2024

> I want to try this issue.

Sure, you can work on this. I can help you query IYP if you need help with that.

The error you have is due to a recent problem with py-radix. You can try to install it from the source, the git repo has a fix:
https://github.com/mjschultz/py-radix

@TejasNangru

TejasNangru commented Dec 24, 2024

> The error you have is due to a recent problem with py-radix. You can try to install it from the source, the git repo has a fix: https://github.com/mjschultz/py-radix

Yes, I have installed py-radix from the link you provided. But when I then run python3 setup.py build_ext --inplace, I get the error below, which says a C++ build tool is required. Are there any precompiled binaries for this?

running build_ext
Compiling raclette/tracksaggregator_cy.pyx because it changed.
[1/1] Cythonizing raclette/tracksaggregator_cy.pyx
building 'raclette.tracksaggregator_cy' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

@romain-fontugne
Member Author

There is a Cython module that you have to compile with a C++ compiler, so yes, you need that compiler.

@TejasNangru

TejasNangru commented Dec 24, 2024

While testing the functionality of the project, I get this:

AssertionError: chunk_size should be smaller than window_size

Should I modify the value of chunk_size, or could that cause issues later?

@romain-fontugne
Member Author

Ah yes, sorry, we should update the old configuration files (especially the ones mentioned in the README).

In production we use chunk_size = 300.
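
The constraint behind that assertion can be checked before starting raclette. The following is only a sketch: the section name `timetrack` and the `window_size` value of 1800 are hypothetical placeholders, not values taken from the project's configuration files. Only `chunk_size = 300` comes from the comment above.

```python
import configparser

# Hypothetical configuration fragment; only chunk_size = 300 is from the thread.
SAMPLE_CONF = """
[timetrack]
window_size = 1800
chunk_size = 300
"""


def sections_with_bad_chunk_size(conf_text: str) -> list:
    """Return the sections where chunk_size is not smaller than window_size."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    bad = []
    for name in parser.sections():
        section = parser[name]
        if "chunk_size" in section and "window_size" in section:
            if section.getint("chunk_size") >= section.getint("window_size"):
                bad.append(name)
    return bad


print(sections_with_bad_chunk_size(SAMPLE_CONF))
```

An empty list means every section that defines both options satisfies the rule stated in the assertion message.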

@TejasNangru

After setting the chunk_size value, I am getting this error:

python raclette/raclette.py -C conf/asc-start.conf
type error: cannot pickle 'apsw.Connection' object
Traceback (most recent call last):
  File "C:\Users\admin\OneDrive\Desktop\A\raclette\raclette\raclette.py", line 159, in main
    saver.start()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 94, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'apsw.Connection' object

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 111, in spawn_main
    new_handle = reduction.duplicate(pipe_handle,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\reduction.py", line 79, in duplicate
    return _winapi.DuplicateHandle(
           ^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 6] The handle is invalid

@TejasNangru

I guess there is some problem in the code of the file sqlitesaver.py.

@TejasNangru

@romain-fontugne

@dpgiakatos
Member

Hi, please try running the project on Linux instead of Windows, as the file paths are configured for a Linux environment.
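
One possible reason the earlier `cannot pickle 'apsw.Connection' object` error appears on Windows but may not on Linux, offered as an assumption rather than a confirmed diagnosis: on Windows, multiprocessing starts child processes with the spawn method, which pickles the process object (the traceback above goes through `reduction.dump(process_obj, to_child)`), while Linux has traditionally defaulted to fork, which does not. The sketch below uses the standard-library `sqlite3` module as a stand-in for `apsw`, and `SimpleNamespace` objects as stand-ins for the saver; none of these names come from the raclette code.

```python
import pickle
import sqlite3
from types import SimpleNamespace


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, as the spawn method requires."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# Stand-in for a saver that opens its database connection up front.
conn = sqlite3.connect(":memory:")
eager_saver = SimpleNamespace(filename=":memory:", db=conn)

# Stand-in for a saver that only stores the filename and would open the
# connection later, inside run(), i.e. in the child process.
lazy_saver = SimpleNamespace(filename=":memory:", db=None)

print(can_pickle(eager_saver))  # the open connection blocks pickling
print(can_pickle(lazy_saver))
conn.close()
```

If this is indeed the cause, running under Linux or WSL sidesteps the problem without code changes, which matches the advice above.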

@TejasNangru

But I don't know how to use Linux.
Is there any other way?

@roopeshsn
Member

> but i don't know, how to use linux is there any other way?

Try WSL

@dpgiakatos
Member

> but i don't know, how to use linux is there any other way?

Here are two courses you can explore. The first is from the Linux Foundation and consists of about 60 hours of course material, covering a wide range of topics related to Linux. The second course is from freeCodeCamp and is approximately 6 hours long.

Since you are not yet familiar with the Linux environment, I recommend starting with the second course. It will provide you with a solid overview of Linux, including how to run applications and manage files. After completing the second course, I think you will be able to run this project in a Linux environment, and if you find yourself interested in Linux, you can continue with the first course for a more in-depth understanding.

Here are the links to the courses:

@TejasNangru

ok sir

@TejasNangru

@dpgiakatos @romain-fontugne I am now using WSL as a Linux environment, but while deploying the code locally I am facing this error:

type error: name 'KAFKA_HOST' is not defined
Traceback (most recent call last):
  File "/mnt/c/Users/admin/OneDrive/Desktop/IHR/raclette/raclette/raclette.py", line 174, in main
    i2a = ip2asn.ip2asn(self.ip2asn_db, self.ip2asn_ixp, self.ip2asn_kafka_topic, KAFKA_HOST)
                                                                                  ^^^^^^^^^^
NameError: name 'KAFKA_HOST' is not defined

Is this the reason the current code is tricky to deploy?
If not, please suggest a solution for this error.

@TejasNangru

Could you please guide me on how to query IYP to get prefix to ASN and prefix to IXP mappings? I’m encountering some difficulty in understanding how to integrate it into the project as a replacement for ip2asn.

@dpgiakatos
Member

Hi, please install Kafka locally and then define the KAFKA_HOST.
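
One way to define KAFKA_HOST is sketched below, under assumptions: the thread does not show where raclette expects the constant to live, so reading it from an environment variable of the same name, and the helper `get_kafka_host`, are hypothetical choices. `localhost:9092` is the address a default local Kafka broker listens on.

```python
import os


def get_kafka_host(default: str = "localhost:9092") -> str:
    """Read the broker address from the environment, with a local fallback."""
    return os.environ.get("KAFKA_HOST", default)


# Module-level constant, so code that refers to KAFKA_HOST finds it defined.
KAFKA_HOST = get_kafka_host()
print(KAFKA_HOST)
```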

To obtain the prefix-to-ASN and prefix-to-IXP mappings, we currently use a pickle file, as I observe in the source code. My understanding is that we aim to replace this logic with IYP queries. IYP refers to our Internet Yellow Pages project, and you can refer to the corresponding documentation for guidance on querying the IYP.

Here are the Cypher queries to assist you:

Prefix to ASN:

MATCH (a:AS)-[:ORIGINATE]-(p:Prefix)
RETURN p.prefix, a.asn

Prefix to IXP:

MATCH (p:Prefix)-[:MANAGED_BY]-(i:IXP)
RETURN p.prefix, i.name
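
A rough sketch of how the prefix-to-ASN query could feed an IP lookup follows. Several parts are assumptions and not taken from raclette: the public IYP address `neo4j://iyp-bolt.ihr.live:7687`, the use of the `neo4j` Python driver (version 5 or later), and the helper names `fetch_prefix_to_asn`, `build_lookup`, and `lookup`. The lookup itself is a plain longest-prefix match with the standard `ipaddress` module; a radix tree would be the faster choice for large tables.

```python
import ipaddress


def fetch_prefix_to_asn(uri="neo4j://iyp-bolt.ihr.live:7687"):
    """Run the prefix-to-ASN query against IYP (needs the `neo4j` package)."""
    from neo4j import GraphDatabase  # third-party, imported only when called

    query = (
        "MATCH (a:AS)-[:ORIGINATE]-(p:Prefix) "
        "RETURN p.prefix AS prefix, a.asn AS asn"
    )
    with GraphDatabase.driver(uri, auth=None) as driver:
        records, _, _ = driver.execute_query(query)
    return [(r["prefix"], r["asn"]) for r in records]


def build_lookup(rows):
    """Index (prefix, asn) rows by network; a prefix may have several origins."""
    table = {}
    for prefix, asn in rows:
        net = ipaddress.ip_network(prefix, strict=False)
        table.setdefault(net, set()).add(asn)
    return table


def lookup(table, ip):
    """Return the ASNs of the most specific prefix covering ip, else an empty set."""
    addr = ipaddress.ip_address(ip)
    for length in range(addr.max_prefixlen, -1, -1):
        net = ipaddress.ip_network(f"{addr}/{length}", strict=False)
        if net in table:
            return table[net]
    return set()


# Offline example using documentation prefixes and private-use ASNs.
rows = [("192.0.2.0/24", 64500), ("192.0.0.0/16", 64501)]
print(lookup(build_lookup(rows), "192.0.2.10"))
```

The same two helpers would work for the prefix-to-IXP query by swapping the ASN for the IXP name.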

@TejasNangru

If the dependency on ip2asn is causing the Kafka issue,
can I work directly on querying IYP instead of setting up Kafka?

@TejasNangru

And does querying IYP require a username and password?

@dpgiakatos
Member

The functionality of the code should not be changed. You need to locate the code within the ip2asn that needs to be replaced with the IYP query. From Kafka, you will retrieve the prefix, so you must use it.

It is important to understand how the code works before making any changes, as the functionality must remain the same. Therefore, first, try to run the code locally without any errors, and then work on understanding the data.

You do not need credentials to log in to the IYP; the username and password are empty.

P.S. For Kafka support you can tag the @InternetHealthReport/iij team.

@TejasNangru

Sir, I have installed and set up Kafka now and it is working,
but now I am getting this:

python3 raclette/raclette.py -C conf/asc-start.conf
type error: Invalid data stream
Traceback (most recent call last):
  File "/mnt/c/Users/admin/OneDrive/Desktop/IHR/raclette/raclette/raclette.py", line 175, in main
    i2a = ip2asn.ip2asn(self.ip2asn_db, self.ip2asn_ixp, self.ip2asn_kafka_topic, KAFKA_HOST)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/admin/OneDrive/Desktop/IHR/raclette/raclette/lib/ip2asn.py", line 49, in __init__
    self.rtree = pickle.load(bz2.BZ2File(db, "rb"))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/bz2.py", line 155, in peek
    return self._buffer.peek(n)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: Invalid data stream

@dpgiakatos
Member

The error you are encountering is an OSError raised from the bz2 module. This can happen if the bz2 module is not properly installed on your OS. The code below will help you verify whether the bz2 module is functioning correctly on your system.

import pickle
import bz2

# Create a sample object and save it to a BZ2 file
data = {'key': 'value'}
with bz2.BZ2File('sample_data.bz2', 'wb') as f:
    pickle.dump(data, f)

# Read the BZ2 file
with bz2.BZ2File('sample_data.bz2', 'rb') as f:
    loaded_data = pickle.load(f)
    print(loaded_data)  # Should output: {'key': 'value'}
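
If the round trip above succeeds, another possibility (an assumption, since the thread does not confirm the cause) is that the file given as ip2asn_db does not contain valid bzip2 data; the bz2 decompressor raises `OSError: Invalid data stream` in that case. A bzip2 file starts with the bytes `BZh`, so a quick check could look like the sketch below. The helper name `looks_like_bz2` and the temporary file names are made up for the example.

```python
import bz2
import os
import tempfile


def looks_like_bz2(path: str) -> bool:
    """Return True if the file starts with the bzip2 magic bytes b'BZh'."""
    with open(path, "rb") as f:
        return f.read(3) == b"BZh"


# Demonstration with two temporary files (not the real ip2asn database).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pickle.bz2")
    bad = os.path.join(tmp, "bad.pickle.bz2")
    with open(good, "wb") as f:
        f.write(bz2.compress(b"some payload"))
    with open(bad, "wb") as f:
        f.write(b"not compressed at all")
    print(looks_like_bz2(good), looks_like_bz2(bad))
```

Pointing `looks_like_bz2` at the path configured as ip2asn_db would show whether the file itself is the problem.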

@TejasNangru

TejasNangru commented Jan 9, 2025

In the original codebase, this value is undefined:

[Screenshot: conf/asc-start.conf at master, InternetHealthReport/raclette]

Due to this I am getting this error:

type error: [Errno 2] No such file or directory: ''
Traceback (most recent call last):
  File "/mnt/c/Users/admin/OneDrive/Desktop/IHR/raclette/raclette/raclette.py", line 175, in main
    i2a = ip2asn.ip2asn(self.ip2asn_db, self.ip2asn_ixp, self.ip2asn_kafka_topic, KAFKA_HOST)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/admin/OneDrive/Desktop/IHR/raclette/raclette/lib/ip2asn.py", line 53, in __init__
    with open(ixp) as fi:
         ^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: ''

@dpgiakatos
Member

@InternetHealthReport/iij can you help with the above variable?

@romain-fontugne
Member Author

in the data/ folder there is a file 'ixs_202310.jsonl'

you can just put data/ixs_202310.jsonl for the ip2asn_ixp value
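
Empty path options like this one can be caught before launching. The sketch below is illustrative only: the option names `ip2asn_db` and `ip2asn_ixp` appear in the tracebacks above, but the section name `io` and the helper `missing_paths` are hypothetical.

```python
import configparser
import os


def missing_paths(conf_text, keys=("ip2asn_db", "ip2asn_ixp")):
    """List (section, option, value) for path options that are empty or absent on disk."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    problems = []
    for name in parser.sections():
        for key in keys:
            if key in parser[name]:
                value = parser[name][key].strip()
                if not value or not os.path.exists(value):
                    problems.append((name, key, value))
    return problems


# An empty ip2asn_ixp value, as in the configuration discussed above.
print(missing_paths("[io]\nip2asn_ixp =\n"))
```

With the value set to data/ixs_202310.jsonl and the script run from the repository root, the option should no longer be reported.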

@romain-fontugne
Member Author

> Sir, I have installed and set up the kafka now and its working, but now i am getting this:
>
> python3 raclette/raclette.py -C conf/asc-start.conf
> type error: Invalid data stream
> [...]
> OSError: Invalid data stream

I'll push a quick fix for that. For the configuration file you are using, you shouldn't need Kafka.

@romain-fontugne
Member Author

It should be fixed now.
Actually, removing the Kafka part of ip2asn (the lib/ip2asn.py file) would be good, because we are not using it.

@TejasNangru

Hi all,
I am no longer working on this issue.
If any newcomers want to try it, feel free.
