
Make levels return a CategoricalArray #425


Open · wants to merge 2 commits into master

Conversation

nalimilan
Member

Having `levels` preserve the eltype of the input is sometimes useful for writing generic code. This is only slightly breaking, as the result still compares equal to the previous behavior of returning unwrapped values.

Fixes #390.

Having `levels` preserve the eltype of the input is sometimes useful
to write generic code. This is only slightly breaking as the result
still compares equal to the previous behavior returning unwrapped values.
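A minimal sketch of the proposed behavior, assuming this branch of CategoricalArrays.jl is installed (the example data is hypothetical):

```julia
using CategoricalArrays

x = categorical(["a", "b", "a"])

# On this branch, `levels(x)` preserves the input's eltype by returning
# a CategoricalVector rather than a plain Vector{String}. The result
# still compares equal to the unwrapped values returned previously,
# which is why the change is only slightly breaking:
lv = levels(x)
@assert lv == ["a", "b"]
```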
@ablaom

ablaom commented May 29, 2025

Further to my posts at #390, I have run MLJ integration tests against this branch of CategoricalArrays.jl.

Only two models fail, both provided by the package BetaML.jl:

"PegasosClassifier (BetaML)", "PerceptronClassifier (BetaML)"

I will investigate further shortly and report back here. cc @sylvaticus

I have separately tested CategoricalDistributions, MLJModelInterface, and MLJBase against this branch, with no failures.

@sylvaticus

I'll have a look too, but I need a few days.

@nalimilan
Member Author

Thanks. In parallel, let's see whether Nanosoldier can check all direct dependencies:

@nanosoldier `runtests(ALL)`

@nanosoldier

Update on PkgEvalJob 87b50fc vs. 11d43c1: Accepted

@nalimilan
Member Author

Better check against current release:

@nanosoldier `runtests(ALL, vs = "#v0.10.8")`

@nanosoldier

Update on PkgEvalJob 87b50fc vs. 99faa56: Accepted

@nanosoldier

Update on PkgEvalJob 87b50fc vs. 11d43c1: Running

@nanosoldier

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

✖ Packages that failed

14 packages failed only on the current version.

  • Package fails to precompile: 1 package
  • Package has test failures: 3 packages
  • Package tests unexpectedly errored: 9 packages
  • Tests became inactive: 1 package

44 packages failed on the previous version too.

✔ Packages that passed tests

57 packages passed tests on the previous version too.

➖ Packages that were skipped altogether

5 packages were skipped on the previous version too.

@nanosoldier

Update on PkgEvalJob 87b50fc vs. 99faa56: Running

@nanosoldier

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

✖ Packages that failed

17 packages failed only on the current version.

  • Package fails to precompile: 1 package
  • Package has test failures: 3 packages
  • Package tests unexpectedly errored: 10 packages
  • Tests became inactive: 1 package
  • Test duration exceeded the time limit: 2 packages

41 packages failed on the previous version too.

✔ Packages that passed tests

1 package passed tests only on the current version.

  • Other: 1 package

56 packages passed tests on the previous version too.

➖ Packages that were skipped altogether

5 packages were skipped on the previous version too.

@nalimilan
Member Author

OK, so 17 new failures with this PR compared with 0.10.8, and 14 new failures when comparing this PR against current master. According to another run, there are 13 new failures when comparing master with 0.10.8. A significant part seems to be related to MLJ.

So it doesn't look like this PR is really more breaking than the previous ones. About 40 dependent packages still pass their tests. Unfortunately, an equivalent number failed even with 0.10.8 (due to using Julia 1.11?), so we can't tell whether they are affected by the changes or not, but results on other packages probably give a representative picture of the breakage rate.

@nalimilan
Member Author

@sylvaticus Do you know when you'll have the time to look at this?

@sylvaticus

sylvaticus commented Jun 5, 2025

@sylvaticus Do you know when you'll have the time to look at this?

Hello @nalimilan: how can I test a pull request of another package against my package?

Currently, the two MLJ interface functions for the BetaML PerceptronClassifier model are:

function MMI.fit(model::PerceptronClassifier, verbosity, X, y)
    x = MMI.matrix(X)                     # convert table to matrix
    allClasses = levels(y)
    typeof(verbosity) <: Integer || error("Verbosity must be an integer. Current \"steps\" are 0, 1, 2 and 3.")
    verbosity = mljverbosity_to_betaml_verbosity(verbosity)
    fitresult = BetaML.Perceptron.perceptron(x, y; θ=model.initial_coefficients, θ₀=model.initial_constant,
                                             T=model.epochs, nMsgs=0, shuffle=model.shuffle,
                                             force_origin=model.force_origin,
                                             return_mean_hyperplane=model.return_mean_hyperplane,
                                             rng=model.rng, verbosity=verbosity)
    cache = nothing
    report = nothing
    return (fitresult, allClasses), cache, report
end

and

function MMI.predict(model::Union{PerceptronClassifier,PegasosClassifier}, fitresult, Xnew)
    fittedModel      = fitresult[1]
    classes          = fittedModel.classes
    allClasses       = fitresult[2] # as `classes` does not include classes unseen at training time
    nLevels          = length(allClasses)
    nRecords         = MMI.nrows(Xnew)
    modelPredictions = BetaML.Perceptron.predict(MMI.matrix(Xnew), fittedModel.θ, fittedModel.θ₀, fittedModel.classes) # vector of dictionaries y_element => prob
    predMatrix       = zeros(Float64,(nRecords,nLevels))
    # Transform the predictions from a vector of dictionaries to a matrix
    # where the rows are the PMF of each record
    for n in 1:nRecords
        for (c,cl) in enumerate(allClasses)
            predMatrix[n,c] = get(modelPredictions[n],cl,0.0)
        end
    end
    predictions = MMI.UnivariateFinite(allClasses,predMatrix,pool=missing)
    return predictions
end

Perhaps it is enough to change my calls from levels(y) to sort(unique(y))? What do you think?

@nalimilan
Member Author

You can do ]add CategoricalArrays#nl/levels.

With this PR, levels(y) returns the same thing as sort(unique(y)), except that it keeps unused levels (as before). To get back the previous behavior you could do unwrap.(levels(y)). But without knowing the error I can't really help.
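A minimal sketch of both options, assuming this branch is installed via `]add CategoricalArrays#nl/levels` (the example data is hypothetical):

```julia
using CategoricalArrays

y = categorical(["yes", "no", "yes"])

# With this PR, `levels(y)` compares equal to `sort(unique(y))`;
# note that `levels` additionally keeps unused levels, as before:
@assert levels(y) == sort(unique(y))

# To recover the previous behavior of plain unwrapped values:
@assert unwrap.(levels(y)) == ["no", "yes"]
```

Both assertions also hold on released versions, since equality between wrapped and unwrapped values is preserved.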

@sylvaticus
Copy link

Thank you. It should now be solved in BetaML#Master.

@nalimilan
Member Author

OK. Do you know why this is needed? In theory the idea was that having levels return a CategoricalVector shouldn't be a problem in most cases, and useful in others.

@ablaom Is this PR OK for you?

@sylvaticus

sylvaticus commented Jun 5, 2025 via email

@ablaom

ablaom commented Jun 9, 2025

I have not had time to fully investigate and won't have time for a week or so. It's possible MLJ will have to review every package that uses these methods, which will be unpleasant, but I think the change makes sense, so long as it is tagged as breaking. This is definitely breaking behaviour.


Successfully merging this pull request may close these issues.

Make levels return a CategoricalArray
5 participants