Skip to content

fix some issues with equality of factorizations #41228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions stdlib/LinearAlgebra/src/factorization.jl
Original file line number Diff line number Diff line change
Expand Up @@ -64,9 +64,17 @@ Factorization{T}(A::Adjoint{<:Any,<:Factorization}) where {T} =
adjoint(Factorization{T}(parent(A)))
inv(F::Factorization{T}) where {T} = (n = size(F, 1); ldiv!(F, Matrix{T}(I, n, n)))

Base.hash(F::Factorization, h::UInt) = mapreduce(f -> hash(getfield(F, f)), hash, 1:nfields(F); init=h)
Base.:(==)( F::T, G::T) where {T<:Factorization} = all(f -> getfield(F, f) == getfield(G, f), 1:nfields(F))
Base.isequal(F::T, G::T) where {T<:Factorization} = all(f -> isequal(getfield(F, f), getfield(G, f)), 1:nfields(F))::Bool
function Base.hash(F::Factorization, h::UInt)
return mapreduce(f -> hash(getfield(F, f)), hash, 1:nfields(F); init=hash(typeof(F).name.wrapper, h))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been told that you should never fetch the internal fields like this. Maybe you can hash propertynames instead?

More generally, should the hash traverse propertynames instead of the fields? That should avoid the issue with non-active memory affecting the hash and maybe make the specialized methods redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been told that you should never fetch the internal fields like this. Maybe you can hash propertynames instead?

I agree it's a bit ugly, but I am unsure just using propertynames for the hash is quite enough, since a lot of them like vectors and values for Eigen are quite generic and not necessarily unique. Perhaps that is being a bit too paranoid, but hashing by type identity seems more formally correct, at least to me. For a stdlib, relying on internals is probably also not quite as bad, since it will always get tested and updated alongside other Base changes.

More generally, should the hash traverse propertynames instead of the fields?

Yeah, maybe? I think that would mean hashing the same data twice for some factorizations though, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's a bit ugly

I don't think the reason to avoid it was aesthetics but it's really not my department. Others should comment that if it should be avoided. Maybe it used to be a problem but no longer is. Otherwise, let's just keep it.

I think that would mean hashing the same data twice for some factorizations though, right?

Yes. For e.g. Cholesky it would probably end up hashing the same data three times so we should probably have specialized versions. However, hashing the fields seems wrong since the inactive memory will affect the hash so I think using the properties will be the correct, although slow, fallback. E.g.

julia> A = Symmetric(randn(3,3) + 10I, :U)
3×3 Symmetric{Float64, Matrix{Float64}}:
 8.52303    0.491492   0.311278
 0.491492   9.43415   -1.63795
 0.311278  -1.63795    9.80197

julia> Ac = copy(A);

julia> Ac.data[3,1] += 1
1.9147549806916584

julia> hash(cholesky(A)) == hash(cholesky(Ac))
false

julia> cholesky(A).U == cholesky(Ac).U
true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason we discourage use of internal APIs like this is because they might change at any time, which will break such code. I don't think there are any other glaring problems with this approach.

Using propertynames for the fallback seems reasonable, I will change that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, trying this locally, using propertynames fails for bunchkaufman.

For e.g. Cholesky it would probably end up hashing the same data three times so we should probably have specialized versions.

It seems like this is an issue for quite a lot of factorizations, so if we want this to be efficient, we would end up having to define this manually for a lot of cases, which kind of defeats the purpose of having this definition in the first place.

The more I think about it, the more I am convinced that we should just remove these definitions for the abstract type Factorization, since they really don't make much sense in terms of the abstract Factorization interface, but instead have a macro similar to https://github.com/andrewcooke/AutoHashEquals.jl, which defines these for each factorization separately. That would also avoid having to rely on F.name.wrapper. What do you think? It might be somewhat breaking if users define their own factorization types though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump. How do you think we should proceed here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the macro based version end up inspecting the fields which was what I argued against in #41228 (comment)

end
function Base.:(==)(F::Factorization, G::Factorization)
typeof(F).name.wrapper == typeof(G).name.wrapper || return false
return all(f -> getfield(F, f) == getfield(G, f), 1:nfields(F))
end
function Base.isequal(F::Factorization, G::Factorization)
typeof(F).name.wrapper == typeof(G).name.wrapper || return false
return all(f -> isequal(getfield(F, f), getfield(G, f)), 1:nfields(F))::Bool
end

function Base.show(io::IO, x::Adjoint{<:Any,<:Factorization})
print(io, "Adjoint of ")
Expand Down
10 changes: 10 additions & 0 deletions stdlib/LinearAlgebra/src/qr.jl
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,16 @@ Base.iterate(S::QRCompactWY) = (S.Q, Val(:R))
Base.iterate(S::QRCompactWY, ::Val{:R}) = (S.R, Val(:done))
Base.iterate(S::QRCompactWY, ::Val{:done}) = nothing

function Base.hash(F::QRCompactWY, h::UInt)
return hash(F.factors, hash(UpperTriangular(F.T), hash(QRCompactWY, h)))
end
function Base.:(==)(A::QRCompactWY, B::QRCompactWY)
return A.factors == B.factors && UpperTriangular(A.T) == UpperTriangular(B.T)
end
function Base.isequal(A::QRCompactWY, B::QRCompactWY)
return isequal(A.factors, B.factors) && isequal(UpperTriangular(A.T), UpperTriangular(B.T))
end

"""
QRPivoted <: Factorization

Expand Down
50 changes: 50 additions & 0 deletions stdlib/LinearAlgebra/test/factorization.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

module TestFactorization
using Test, LinearAlgebra

@testset "equality for factorizations - $f" for f in Any[
bunchkaufman,
cholesky,
x -> cholesky(x, Val(true)),
eigen,
hessenberg,
lq,
lu,
qr,
x -> qr(x, ColumnNorm()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be manually changed to qr(x, Val(true)) when backporting this to 1.6. @KristofferC what is the best way to ensure this? Should I maybe remove the backport-1.6 label here and open a new PR with that change against the release-1.6 branch?

svd,
schur,
]
A = randn(3, 3)
A = A * A' # ensure A is pos. def. and symmetric
F, G = f(A), f(A)

@test F == G
@test isequal(F, G)
@test hash(F) == hash(G)

f === hessenberg && continue

F = typeof(F).name.wrapper(Base.mapany(1:nfields(F)) do i
x = getfield(F, i)
return x isa AbstractArray{Float64} ? Float32.(x) : x
end...)
G = typeof(G).name.wrapper(Base.mapany(1:nfields(G)) do i
x = getfield(G, i)
return x isa AbstractArray{Float64} ? Float64.(Float32.(x)) : x
end...)

@test F == G
@test isequal(F, G)
@test hash(F) == hash(G)
end

@testset "hash collisions" begin
A, v = randn(2, 2), randn(2)
F, G = LQ(A, v), QR(A, v)
@test !isequal(F, G)
@test hash(F) != hash(G)
end

end
1 change: 1 addition & 0 deletions stdlib/LinearAlgebra/test/testgroups
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ givens
structuredbroadcast
addmul
ldlt
factorization