A beginner's journey to using AI for data exploration.
State of the art as of: 2025/03/14
Inspired by watching pacman upgrade logs and by a fantastic TV series about maths, I tried out what Cursor IDE can do, and whether it could help me prove (or refute) that "Benford's Law" holds for the packages' file sizes.
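For context, Benford's Law predicts that the leading digit d of a value occurs with probability log10(1 + 1/d), so roughly 30% of values should start with 1. The following is a minimal sketch of that comparison, not the code from benford.py; the function names and the sample sizes are made up for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a positive number."""
    # Scientific notation puts the first significant digit up front,
    # e.g. 310.5 -> '3.105000000000000e+02'.
    return int(f"{value:.15e}"[0])


def observed_shares(values):
    """Share of each leading digit 1..9 among the positive values."""
    digits = [leading_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    # Made-up sizes, for illustration only (not taken from a real pacman.log).
    sample_sizes = [12.4, 139.2, 1.8, 1024.0, 17.0, 2.3, 310.5, 1.1]
    expected = benford_expected()
    observed = observed_shares(sample_sizes)
    for d in range(1, 10):
        print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

With only a handful of values the observed shares are meaningless; the comparison only becomes interesting with the thousands of entries a long-lived pacman.log provides.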
In a first session, the AI returned amazing plots ... that contained a major bug: the script also recognized the download speed (MiB/s)
as a file size. I noticed this, in turn, by running an AI-generated Jupyter notebook.
This notebook can be used to produce these two insightful plots
Note: I did the part with the logarithmic scale myself; Cursor failed dramatically on that.
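Package sizes range from a few KiB to hundreds of MiB, which is why a logarithmic scale helps here. As a rough sketch of the idea (this is not the notebook's code, and `log_bin_counts` is a hypothetical helper), sizes can be counted in bins whose edges grow by a constant factor instead of a constant step:

```python
import math
from collections import Counter


def log_bin_counts(sizes_bytes, bins_per_decade=1):
    """Count sizes in logarithmically spaced bins.

    Returns a dict mapping each bin's lower edge (in bytes) to its count.
    """
    counts = Counter()
    for size in sizes_bytes:
        if size <= 0:
            continue  # log10 is undefined for non-positive values
        index = math.floor(math.log10(size) * bins_per_decade)
        counts[index] += 1
    return {10 ** (i / bins_per_decade): n for i, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up sizes in bytes, for illustration only.
    print(log_bin_counts([500, 2_000, 3_000, 1_500_000]))
```

With `bins_per_decade=1` each bin covers one power of ten; a larger value gives a finer histogram on the same logarithmic axis.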
As Cursor IDE told me:
First install Jupyter Notebook if you haven't already:
pip install notebook numpy matplotlib
Then launch Jupyter Notebook by running:
jupyter notebook
(not by trying to run the .ipynb file directly). This will open your web browser with the Jupyter interface. Navigate to where you saved your notebook.ipynb file and click on it to open.
- AI sucks at regular expressions, see here
- AI can't fit logarithmic curves to data
In a previous chat (whose history I have unfortunately deleted) I explored a pacman.log to reason about whether "Benford's Law" holds for the sizes of Arch Linux package upgrades. Along the way I created (in that order) benford.py, histogram.py and histogram.go. At the end, I chatted with you about why the law probably doesn't hold for small and very large file sizes.
help me to turn my results into a jupyter notebook.
returned more or less the complete notebook here
there's a bug in the script. the regular expression also matches the download speed (the column next to the size). correct this
pattern = r'(\d+(?:\.\d+)?)\s*(MiB|KiB)\s+[^\n]*$'
use a look ahead assertion instead
pattern = r'(\d+(?:\.\d+)?)\s*(MiB|KiB)(?=\s+\d+(?:\.\d+)?\s*MiB/s)'
use a negative look ahead assertion to match any but "MiB" followed by "/s"
pattern = r'(\d+(?:\.\d+)?)\s*(MiB|KiB)(?!\s+MiB/s)'
did you mean pattern = r'(\d+(?:\.\d+)?)\s*(MiB|KiB)(?!/s)'?
returned praise for me
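To make the difference between the patterns concrete, here is a small sketch that runs the plain pattern and the `(?!/s)` variant against one line. The sample line is hypothetical (a size column followed by a speed column, as described above), and `sizes_in_bytes` is an illustrative helper, not part of the original script.

```python
import re

# Hypothetical line: package name, size column, then download speed column.
SAMPLE_LINE = " linux-6.13.5-1  139.2 MiB  12.3 MiB/s 00:11"

# Without any assertion, the speed column is picked up as a second "size".
NAIVE = re.compile(r"(\d+(?:\.\d+)?)\s*(MiB|KiB)")
# The negative lookahead rejects a unit that is directly followed by "/s".
WITH_LOOKAHEAD = re.compile(r"(\d+(?:\.\d+)?)\s*(MiB|KiB)(?!/s)")

UNIT_BYTES = {"KiB": 1024, "MiB": 1024 ** 2}


def sizes_in_bytes(line, pattern=WITH_LOOKAHEAD):
    """Convert every (number, unit) match in the line to bytes."""
    return [float(number) * UNIT_BYTES[unit] for number, unit in pattern.findall(line)]


if __name__ == "__main__":
    print(NAIVE.findall(SAMPLE_LINE))
    print(WITH_LOOKAHEAD.findall(SAMPLE_LINE))
    print(sizes_in_bytes(SAMPLE_LINE))
```

The earlier `(?!\s+MiB/s)` variant cannot help, because the text directly after the speed's unit is `/s`, not whitespace followed by `MiB/s`.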
add this negative look ahead assertion to the go version
sizeRegex := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*(MiB|KiB)(?!\/s)`), which doesn't work
this returns: panic: regexp: Compile(`(\d+(?:\.\d+)?)\s*(MiB|KiB)(?!\/s)`): error parsing regexp: invalid or unsupported Perl syntax: `(?!`
sizeRegex := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*(MiB|KiB)`)
speedRegex := regexp.MustCompile(`/s`)
...
if len(matches) >= 3 && !speedRegex.MatchString(matches[0]) {
...
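A caveat on this last answer: if matches comes from sizeRegex, then matches[0] is only the text that pattern matched, which always ends in MiB or KiB, so a check for "/s" on it can never fire. One lookahead-free alternative is to capture an optional "/s" suffix and drop the matches where it is present. The sketch below shows the idea in Python to stay consistent with the notebook; the pattern uses no lookaround, so it should carry over to Go's regexp package, where the check would be that the third submatch is empty. `extract_sizes` and the sample line are hypothetical.

```python
import re

# Same idea without lookahead: capture an optional "/s" suffix and filter on it.
SIZE_OR_SPEED = re.compile(r"(\d+(?:\.\d+)?)\s*(MiB|KiB)(/s)?")


def extract_sizes(line):
    """Return (number, unit) pairs that are not followed by '/s'."""
    return [
        (number, unit)
        for number, unit, per_second in SIZE_OR_SPEED.findall(line)
        if not per_second  # an empty third group means this is a size, not a speed
    ]


if __name__ == "__main__":
    # Hypothetical line: size column followed by download speed column.
    print(extract_sizes(" linux-6.13.5-1  139.2 MiB  12.3 MiB/s 00:11"))
```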