
[loaders-] guess types (hdf5), understand unsigned int type (npy) #2713


Merged: 4 commits, merged Mar 15, 2025


maxfl (Contributor) commented Feb 28, 2025

  • hdf5: read datatype codes from 1d arrays and guess column types. Done similarly to npy for floats and (un)signed integers.
  • npy: treat unsigned integer datatype as integer. Was anytype previously.
  • pandas: fix hdf loader

Only existing loaders were modified; no new loaders were added.
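The type-guessing rule described above (floats become float columns, signed and unsigned integers become int columns, everything else stays untyped) can be sketched with NumPy's one-letter dtype `kind` codes. This is a minimal illustration, not the loader code from this PR: `guess_coltype` and the `anytype` stand-in are hypothetical names. The same `.kind` check would presumably also work for an h5py dataset, whose `.dtype` is a NumPy dtype.

```python
import numpy as np


def anytype(value):
    # Hypothetical stand-in for an untyped column: returns the value unchanged.
    return value


def guess_coltype(dtype):
    """Guess a column type from a NumPy dtype via its one-letter kind code."""
    kind = np.dtype(dtype).kind
    if kind == "f":          # floating point
        return float
    if kind in ("i", "u"):   # signed ("i") and unsigned ("u") integers
        return int
    return anytype           # strings, bools, objects, etc. stay untyped


# Example: per-field column types for a structured dtype
fields = np.dtype([("id", np.uint16), ("value", np.uint8), ("score", np.float32)])
coltypes = {name: guess_coltype(fields[name]) for name in fields.names}
```

In this sketch, dropping `"u"` from the integer check would leave unsigned fields untyped, which matches the `anytype` behavior the PR description says npy had before.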

CLAassistant commented Feb 28, 2025

CLA assistant check: all committers have signed the CLA.

The previous version referred to the nonexistent `pd.read_hdf5` (the pandas function is `pd.read_hdf`).
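For reference, pandas reads HDF5 through `pd.read_hdf`, which relies on the PyTables package being installed. A minimal sketch of the corrected call follows; `read_hdf_frame` is a hypothetical wrapper name, not the loader's actual code.

```python
import pandas as pd


def read_hdf_frame(path, key=None):
    # pandas' HDF5 reader is pd.read_hdf (backed by PyTables); there is no pd.read_hdf5.
    # key selects the stored object and may be omitted when the file holds a single pandas object.
    return pd.read_hdf(path, key=key)
```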
@@ -1,4 +1,4 @@
-from visidata import VisiData, vd, Sheet, Path, Column, ItemColumn, BaseSheet
+from visidata import VisiData, vd, Sheet, Path, Column, ColumnItem, ItemColumn, BaseSheet, anytype
Owner

Nitpick: ItemColumn is the preferred nomenclature; ColumnItem is deprecated. (All column types are named FooColumn.)

Contributor Author

fixed

saulpw (Owner) left a comment

Thanks, @maxfl! The pandas loader is a little rusty, so I appreciate that it's getting some updates.

maxfl (Contributor, Author) commented Mar 5, 2025

@saulpw, you are welcome. There are a few other things that could be improved, and I will definitely look further into the loaders as I use vd more.

On a related note, it is possible to extend vd with a ROOT loader. I'm not proposing a PR yet, as I need to gain more experience and go through the checklist.

By the way, visidata is wonderful. It has simplified my life so much. Thank you very much!

anjakefala (Collaborator)

Hey @maxfl, did you test the numpy loader?

When I wrote a script that creates small arrays and opened the resulting files with VisiData, the sheet was just blank. Did you notice something similar?

anjakefala (Collaborator)

This was mine, for reference:

import numpy as np

# Structured (record) array: named fields with unsigned int and float types
dtype = [
    ('id', np.uint16),
    ('value', np.uint8),
    ('score', np.float32)
]

structured_data = np.zeros(20, dtype=dtype)
for i in range(20):
    structured_data[i] = (i, i % 255, i * 1.5)

np.save('visidata_test.npy', structured_data)

# Plain 2d array without field names
array_2d = np.arange(100, dtype=np.uint8).reshape(10, 10)
np.save('visidata_test_2d.npy', array_2d)

# Plain 1d arrays for each unsigned type: zero, roughly half the range, and the maximum value
uint8_array = np.array([0, 127, 255], dtype=np.uint8)
uint16_array = np.array([0, 32767, 65535], dtype=np.uint16)
uint32_array = np.array([0, 2147483647, 4294967295], dtype=np.uint32)
uint64_array = np.array([0, 9223372036854775807, 18446744073709551615], dtype=np.uint64)

np.save('uint8_test.npy', uint8_array)
np.save('uint16_test.npy', uint16_array)
np.save('uint32_test.npy', uint32_array)
np.save('uint64_test.npy', uint64_array)

maxfl (Contributor, Author) commented Mar 14, 2025

Dear @anjakefala, thank you for testing and for providing an example.

I did test the updates, and I use visidata daily to work with hdf5, npz, ROOT, and tsv files. The only change I made for npy/npz was a one-line edit that treats unsigned int the same way as the already supported int. By design, the original implementation works only with record (structured) arrays and requires columns to have names, so it does not work with regular 1d/2d arrays.
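The distinction drawn here can be checked directly in NumPy: a record (structured) array carries its field names in `dtype.names`, while a plain 1d or 2d array has `dtype.names` set to `None`, so a loader that builds columns from field names finds nothing to work with. That is consistent with the blank sheet reported above. A minimal check, using arrays shaped like those in the test script above:

```python
import numpy as np

# Record array with named fields, like visidata_test.npy
structured = np.zeros(3, dtype=[("id", np.uint16), ("value", np.uint8), ("score", np.float32)])

# Plain arrays without field names, like visidata_test_2d.npy and uint8_test.npy
plain_2d = np.arange(100, dtype=np.uint8).reshape(10, 10)
plain_1d = np.array([0, 127, 255], dtype=np.uint8)

for label, arr in [("structured", structured), ("plain_2d", plain_2d), ("plain_1d", plain_1d)]:
    # Only the structured array reports field names; the plain arrays report None
    print(label, arr.shape, arr.dtype.names)
```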

I could port the relevant code from the hdf5 loader to the numpy loader to make it work with regular arrays, but that probably does not belong in this particular PR.
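One possible way to make regular arrays fit a loader that expects named fields is to wrap them in a structured array with generated column names. This is only a sketch under assumptions: `as_record_array` and the `col0`, `col1` naming scheme are hypothetical, and the follow-up work may take a different approach, such as reusing the hdf5 loader's logic as suggested here.

```python
import numpy as np


def as_record_array(arr):
    """Wrap a plain 1d or 2d array in a structured array with one named field per column.

    Hypothetical helper: field names are generated as col0, col1, and so on (an assumption).
    """
    arr = np.asarray(arr)
    if arr.dtype.names is not None:
        return arr                      # already a record array; keep its own field names
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)        # a 1d array becomes a single column
    elif arr.ndim != 2:
        raise ValueError(f"expected a 1d or 2d array, got {arr.ndim}d")

    nrows, ncols = arr.shape
    # Every generated field keeps the original element dtype (e.g. uint8 stays uint8)
    rec_dtype = np.dtype([(f"col{j}", arr.dtype) for j in range(ncols)])
    records = np.empty(nrows, dtype=rec_dtype)
    for j, name in enumerate(rec_dtype.names):
        records[name] = arr[:, j]       # copy column j into its named field
    return records


# Usage with arrays like those in the test script above
grid = as_record_array(np.arange(100, dtype=np.uint8).reshape(10, 10))
single = as_record_array(np.array([0, 127, 255], dtype=np.uint8))
```

Because the generated fields keep their unsigned dtype, the unsigned-integer handling from this PR would then apply to them as well.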

anjakefala (Collaborator) commented Mar 14, 2025

Thank you for that context @maxfl! I will add a small test file for npy, and then will go ahead and merge. =)

anjakefala merged commit 75dc574 into saulpw:develop on Mar 15, 2025 (14 checks passed)
maxfl (Contributor, Author) commented Mar 18, 2025

> Thank you for that context @maxfl! I will add a small test file for npy, and then will go ahead and merge. =)

The follow-up PR for 2d numpy arrays is !2724.
