Make `levels` return a CategoricalArray #425
Conversation
Having `levels` preserve the eltype of the input is sometimes useful to write generic code. This is only slightly breaking, as the result still compares equal to the previous behavior of returning unwrapped values. Fixes #390.
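For readers of this archived thread, here is a minimal sketch of the intended behavior change, based only on the description above; the concrete types mentioned in the comments are assumptions, not output copied from the PR:

```julia
using CategoricalArrays

x = categorical(["a", "b", "a"])
lv = levels(x)

# Current release: lv is a plain Vector{String} of unwrapped values.
# With this PR: lv is a CategoricalVector, preserving the eltype of x.
# Per the description above, the result still compares equal to the old one:
lv == ["a", "b"]   # expected to be true both before and after the change
```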
Further to my posts at #390, I have run MLJ integration tests against this branch of CategoricalArrays.jl. Only two models fail, both provided by the package BetaML.jl: "PegasosClassifier (BetaML)" and "PerceptronClassifier (BetaML)". I will investigate further shortly and report back here. cc @sylvaticus

I have separately tested CategoricalDistributions, MLJModelInterface, and MLJBase against this branch, with no fails.
I'll have a look too, but I need a few days.
Thanks. In parallel, let's see whether Nanosoldier can check all direct dependencies.

Better to check against the current release instead.
The package evaluation job you requested has completed - possible new issues were detected.

Report summary:
✖ Packages that failed: 14 packages failed only on the current version; 44 packages failed on the previous version too.
✔ Packages that passed tests: 57 packages passed tests on the previous version too.
➖ Packages that were skipped altogether: 5 packages were skipped on the previous version too.
The package evaluation job you requested has completed - possible new issues were detected.

Report summary:
✖ Packages that failed: 17 packages failed only on the current version; 41 packages failed on the previous version too.
✔ Packages that passed tests: 1 package passed tests only on the current version; 56 packages passed tests on the previous version too.
➖ Packages that were skipped altogether: 5 packages were skipped on the previous version too.
OK, so 17 new failures with this PR compared with 0.10.8, and 14 new failures when comparing this PR against current master. According to another run, there are 13 new failures when comparing master with 0.10.8. A significant part seems to be related to MLJ. So it doesn't look like this PR is really more breaking than the previous ones.

About 40 dependent packages still pass their tests. Unfortunately, an equivalent number failed even with 0.10.8 (due to using Julia 1.11?), so we can't tell whether they are affected by the changes or not, but results on other packages probably give a representative picture of the breakage rate.
@sylvaticus Do you know when you'll have the time to look at this?
Hello @nalimilan: how can I test a Pull Request of another package against my package? Currently, the two MLJ interface functions for the BetaML PerceptronClassifier model are:

```julia
function MMI.fit(model::PerceptronClassifier, verbosity, X, y)
x = MMI.matrix(X) # convert table to matrix
allClasses = levels(y)
typeof(verbosity) <: Integer || error("Verbosity must be an integer. Current \"steps\" are 0, 1, 2 and 3.")
verbosity = mljverbosity_to_betaml_verbosity(verbosity)
fitresult = BetaML.Perceptron.perceptron(x, y; θ=model.initial_coefficients, θ₀=model.initial_constant, T=model.epochs, nMsgs=0, shuffle=model.shuffle, force_origin=model.force_origin, return_mean_hyperplane=model.return_mean_hyperplane,rng=model.rng, verbosity=verbosity)
cache=nothing
report=nothing
return (fitresult,allClasses), cache, report
end
```

and

```julia
function MMI.predict(model::Union{PerceptronClassifier,PegasosClassifier}, fitresult, Xnew)
fittedModel = fitresult[1]
classes = fittedModel.classes
allClasses = fitresult[2] # as classes does not include classes unseen at training time
nLevels = length(allClasses)
nRecords = MMI.nrows(Xnew)
modelPredictions = BetaML.Perceptron.predict(MMI.matrix(Xnew), fittedModel.θ, fittedModel.θ₀, fittedModel.classes) # vector of dictionaries y_element => prob
predMatrix = zeros(Float64,(nRecords,nLevels))
# Transform the predictions from a vector of dictionaries to a matrix
# where the rows are the PMF of each record
for n in 1:nRecords
for (c,cl) in enumerate(allClasses)
predMatrix[n,c] = get(modelPredictions[n],cl,0.0)
end
end
predictions = MMI.UnivariateFinite(allClasses,predMatrix,pool=missing)
return predictions
end
```

Perhaps it is enough that I change my calls from ...
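The end of the question above is truncated in the archive. As a hedged illustration only (not necessarily the change BetaML adopted), one way to keep plain values regardless of which behavior `levels` has is to broadcast `unwrap`, which has a generic fallback that returns non-categorical values unchanged:

```julia
using CategoricalArrays

y = categorical(["a", "b", "a"])

# unwrap(::CategoricalValue) returns the wrapped value, and the generic
# fallback returns other values as-is, so this yields a Vector{String}
# whether levels(y) returns unwrapped values or a CategoricalVector.
allClasses = unwrap.(levels(y))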
You can do ... With this PR, ...
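The exact command in the reply above is lost in this archive. As a hedged sketch of the general mechanism for testing a PR branch of a dependency against a downstream package, the dependency can be added at that branch in the test environment and the downstream tests run against it (the branch name below is a placeholder, not necessarily this PR's real branch):

```julia
using Pkg

# Add CategoricalArrays at a specific branch; "nl/levels" is a placeholder name.
Pkg.add(url = "https://github.com/JuliaData/CategoricalArrays.jl", rev = "nl/levels")

# Then run the downstream package's test suite against it.
Pkg.test("BetaML")
```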
Thank you. Should now be solved in BetaML#master.
OK. Do you know why this is needed? In theory the idea was that having `levels` return a CategoricalVector shouldn't be a problem in most cases, and useful in others.

@ablaom Is this PR OK for you?
Yes, actually the reason is in an inner function where dispatch calls the wrong `predict` function with the CategoricalVector. I could change that one, it's not so hard; I just felt lazy and corrected the MLJ interface instead.
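As a hedged toy illustration (hypothetical function names, not BetaML's actual code) of the kind of method switch described above:

```julia
using CategoricalArrays

# Two methods that differ only in how specifically they type the classes vector.
my_predict(classes::Vector{String}) = "concrete-vector method"
my_predict(classes::AbstractVector) = "generic fallback method"

y = categorical(["a", "b", "a"])

# Current release: levels(y) isa Vector{String}, so the first method is called.
# With this PR: levels(y) is a CategoricalVector, so dispatch silently falls
# through to the generic method instead.
my_predict(levels(y))
```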
I have not had time to fully investigate and won't have time for a week or so. It's possible MLJ will have to review every package that uses these methods, which will be unpleasant, but I think the change makes sense, so long as it is tagged as breaking. This is definitely breaking behaviour.