Test importing the parquet export #2038
Comments
Hello @manuelwedler, I'd like to contribute to this issue. Could you please assign it to me?
Hello @halcyonet, sure, go ahead if you'd like. You would need to do the following:
During that process, it would be good to document the commands you used for the import and any problems you ran into. In particular, we need to:
Let me know if that works for you. We can help you through the process.
Hi @halcyonet, are you still working on this?
I remember @clonker already did something like this in Python. We probably want a script in TS/JS, but we can take inspiration from his code :)
This is what I used. It creates an SQLite DB, though. It could probably be improved somewhat: for example, it stores the entire DB twice in the file, which is unfortunate but probably easy to fix.

```python
import sqlite3

import pandas as pd
import requests


class ANSI:
    """Terminal color codes for progress output."""
    red = "\033[0;31m"
    green = "\033[0;32m"
    reset = "\033[0m"
    gray = "\033[1;30m"
    cyan = "\033[0;36m"


# The manifest maps each table ("kind") to the list of Parquet files exported for it.
manifest = requests.get('https://export.sourcify.dev/manifest.json').json()

db = sqlite3.connect(database='sourcify.sqlite')
cursor = db.cursor()
# Bookkeeping table: one row per file already imported, so a rerun can resume.
cursor.execute('create table if not exists manifest (kind text, file text)')

failed = []
for kind, files in manifest['files'].items():
    for file in files:
        # Bound parameters instead of f-string interpolation (in SQL, double quotes denote identifiers).
        cursor.execute('SELECT ROWID FROM manifest WHERE kind = ? AND file = ?', (kind, file))
        print(f"Fetching {ANSI.cyan}{file}{ANSI.reset}: ", end='', flush=True)
        if cursor.fetchone():
            print(f"{ANSI.gray}SKIP{ANSI.reset}")
        else:
            try:
                # For HTTP(S) URLs, pandas forwards storage_options as request headers.
                pq = pd.read_parquet(f"https://export.sourcify.dev/{file}", storage_options={"User-Agent": "pandas"})
                # Rows of each kind are appended to a table with the same name.
                pq.to_sql(kind, db, index=False, if_exists='append')
                cursor.execute('insert into manifest values(?,?)', (kind, file))
                db.commit()
                print(f"{ANSI.green}DONE{ANSI.reset}")
            except Exception as e:
                print(f"{ANSI.red}FAIL{ANSI.reset} ({e})")
                failed.append(file)

if failed:
    print(f"{ANSI.red}Failed: {ANSI.cyan}{','.join(failed)}{ANSI.reset}")
db.close()
```

A little extra on how to interact with it:

```python
import sqlite3


class SourcifyDB:
    """Small read-only helper around the SQLite DB produced by the import script."""

    def __init__(self, filename):
        self._db = sqlite3.connect(database=filename)
        self.cursor = self._db.cursor()

    def contract_ids_by_compiler_and_version(self, compiler, version):
        # `version` is a LIKE pattern, e.g. '0.8.%'.
        rows = self.cursor.execute(
            'select id from compiled_contracts where compiler = ? and version like ?;',
            (compiler, version)).fetchall()
        return [row[0] for row in rows]

    def source_hash_and_path_from_contract_id(self, contract_id):
        # contract_id is a compiled_contracts.id, stored here as compilation_id.
        rows = self.cursor.execute(
            'select source_hash, path from compiled_contracts_sources where compilation_id = ?;',
            (contract_id,)).fetchall()
        source_hashes = [row[0] for row in rows]
        paths = [row[1] for row in rows]
        return source_hashes, paths

    def contract_content(self, source_hash):
        # Returns None instead of raising IndexError when the hash is unknown.
        row = self.cursor.execute(
            'select content from sources where source_hash = ?;', (source_hash,)).fetchone()
        return row[0] if row else None

    def __enter__(self):
        return self

    def __exit__(self, *args, **kw):
        self._db.close()
```
It would be nice to try importing the Parquet export and see whether we run into any issues. We can then use what we learn to improve the documentation.
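A cheap first check for issues is to confirm that every file listed in the export manifest actually reached the database. This minimal sketch assumes the `manifest` bookkeeping table and the `manifest['files']` layout (kind mapped to a list of files) used by the SQLite import script in this thread. `missing_files` is a hypothetical helper, and the file names are toy values:

```python
import sqlite3


def missing_files(db: sqlite3.Connection, manifest: dict) -> list:
    """Hypothetical helper: (kind, file) pairs listed in the export manifest
    but not yet recorded in the import script's `manifest` table."""
    imported = set(db.execute("SELECT kind, file FROM manifest").fetchall())
    return [
        (kind, file)
        for kind, files in manifest["files"].items()
        for file in files
        if (kind, file) not in imported
    ]


# Toy setup: one of two listed files has been imported.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE manifest (kind TEXT, file TEXT)")
db.execute("INSERT INTO manifest VALUES (?, ?)", ("sources", "sources_0.parquet"))
toy_manifest = {"files": {"sources": ["sources_0.parquet", "sources_1.parquet"]}}
print(missing_files(db, toy_manifest))  # [('sources', 'sources_1.parquet')]
```

For the real export, pass the JSON from `https://export.sourcify.dev/manifest.json` and a connection to `sourcify.sqlite`. An empty list means every file was imported at least once.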
Another thing I'm thinking about: someone may want to run Sourcify on top of the export. However, importing the Parquet files won't create the database functions and constraints, so some documentation on how to add them as well would be good.
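Before adding constraints by hand, it may help to check whether the imported data would satisfy them. This minimal sketch covers one relationship the `SourcifyDB` lookups rely on: every `compiled_contracts_sources.source_hash` should have a row in `sources`. Picking this relationship is an assumption, not a reading of Sourcify's actual schema definition. `orphaned_source_refs` is a hypothetical helper, and the data is made up:

```python
import sqlite3

# Rows in compiled_contracts_sources whose (non-NULL) source_hash has no match
# in sources. Adding a foreign key on that column (e.g. in PostgreSQL) would
# fail while such rows exist.
ORPHANS_QUERY = """
SELECT ccs.compilation_id, ccs.path
FROM compiled_contracts_sources AS ccs
LEFT JOIN sources AS s ON s.source_hash = ccs.source_hash
WHERE ccs.source_hash IS NOT NULL AND s.source_hash IS NULL
ORDER BY ccs.compilation_id, ccs.path
"""


def orphaned_source_refs(db: sqlite3.Connection) -> list:
    """Hypothetical pre-check: (compilation_id, path) pairs whose source is missing."""
    return db.execute(ORPHANS_QUERY).fetchall()


# Toy data: 'c2' references a hash that was never imported.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE compiled_contracts_sources (compilation_id TEXT, source_hash TEXT, path TEXT);
CREATE TABLE sources (source_hash TEXT, content TEXT);
INSERT INTO compiled_contracts_sources VALUES ('c1', 'h1', 'A.sol'), ('c2', 'h9', 'B.sol');
INSERT INTO sources VALUES ('h1', 'contract A {}');
""")
print(orphaned_source_refs(db))  # [('c2', 'B.sol')]
```

A non-empty result would typically point to an interrupted or partial import rather than to a problem with the constraint itself.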