Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy parent in copy_similar for Symmetric/Hermitian #54473

Merged
merged 6 commits into from
May 20, 2024

Conversation

jishnub
Copy link
Contributor

@jishnub jishnub commented May 15, 2024

This should be equivalent to copying the Symmetric/Hermitian matrix, as only one triangular half is used. However, this reduces latency when the parent is a Matrix (and possibly others too, although I've not checked this).

julia> using LinearAlgebra

julia> A = rand(2,2); H = Hermitian(A);

julia> @time eigen(H);
  0.342207 seconds (440.61 k allocations: 22.926 MiB, 99.95% compilation time) # nightly
  0.260913 seconds (326.62 k allocations: 16.926 MiB, 99.93% compilation time) # This PR

Copying the parent is also marginally faster, although this doesn't really matter in eigen:

julia> A = rand(1000,1000); H = Hermitian(A);

julia> @btime LinearAlgebra.copy_similar($H, Float64);
  1.839 ms (3 allocations: 7.63 MiB) # nightly
  1.760 ms (3 allocations: 7.63 MiB) # This PR

@jishnub jishnub added domain:linear algebra Linear algebra compiler:latency Compiler latency labels May 15, 2024
@fredrikekre
Copy link
Member

99.95% compilation time

What is the timing without compilation? I had a quick look and the only difference is that Symmetric goes through

similar(A::Union{Symmetric,Hermitian}, ::Type{T}, dims::Dims{N}) where {T,N} = similar(parent(A), T, dims)
which shouldn't cost anything compared to what you implement here, or?

@jishnub
Copy link
Contributor Author

jishnub commented May 15, 2024

This PR is primarily about faster compilation. The difference mainly arises from the fact that copyto! is specialized for Arrays to essentially be reduced to a foreigncall to jl_genericmemory_copyto. On the other hand, copyto! for a Hermitian uses the fallback copyto_unaliased! path, which takes longer to compile.

julia> using LinearAlgebra

julia> A = rand(1000,1000); H = Hermitian(A); B = similar(A);

julia> @time copyto!(B, A);
  0.024590 seconds (38.30 k allocations: 1.943 MiB, 81.51% compilation time)

vs

julia> @time copyto!(B, H);
  0.122573 seconds (178.02 k allocations: 9.301 MiB, 13.93% gc time, 95.64% compilation time)

Even leaving the compilation-time aside, the copyto! is faster for Array-to-Array copies:

julia> @btime copyto!($B, $A);
  475.371 μs (0 allocations: 0 bytes)

julia> @btime copyto!($B, $H);
  1.737 ms (0 allocations: 0 bytes)

However, in this specific context, this difference may not matter much, if at all.

@dkarrasch
Copy link
Member

What if we specialized copy_similar to HermOrSym? We could copy then only half of the data from the parent (i.e., without getindex passing through the HermOrSym branches)?

@jishnub
Copy link
Contributor Author

jishnub commented May 15, 2024

The docstring of copy_similar currently says that the type of the output corresponds to similar(A, T, size(A)) in general (although I'm unsure why the three-term similar is specifically necessary). If we relax the output type, we should be able to use copytrito! and wrap the result in a Symmetric/Hermitian, which would be faster in general.

@dkarrasch
Copy link
Member

BTW, should we make use of copytrito! in copyto!(::Hermitian, ::Hermitian)? We could then try something else: define eigencopy_oftype(A::Hermitian, S) = copymutable_oftype(A, S). If I made no mistake, this reduces the runtime by another factor of 3, but increases compile time slightly relative to the status quo.

@dkarrasch
Copy link
Member

I went ahead and opened #54476, since it's orthogonal to this PR, but allows for more options.

@jishnub
Copy link
Contributor Author

jishnub commented May 16, 2024

I don't think we should call copyto! here with two Hermitian matrices, but we should call copytrito! directly on the parents (if not copyto!). This is because we'll unnecessarily compile the mismatched-uplo branch in copyto!. In a way, we're inlining the matched-uplo branch of copyto! in eigencopy_oftype.

@jishnub
Copy link
Contributor Author

jishnub commented May 16, 2024

I've timed the copytrito! and the copyto_similar implementations, and the performance of eigen seems roughly comparable for the two:

With

eigencopy_oftype(A::Hermitian, S) = Hermitian(copytrito!(similar(parent(A), S, size(A)), A.data, A.uplo), sym_uplo(A.uplo))

we obtain

julia> using LinearAlgebra

julia> A = rand(200,200); H = Hermitian(A);

julia> @time eigen(H);
  0.299632 seconds (318.16 k allocations: 17.390 MiB, 97.93% compilation time)

julia> @btime eigen!($H);
  5.333 ms (20 allocations: 698.94 KiB)

whereas, with the copy_similar implementation as in this PR, we obtain

julia> @time eigen(H);
  0.274765 seconds (326.63 k allocations: 17.926 MiB, 98.01% compilation time)

julia> @btime eigen!($H);
  5.505 ms (20 allocations: 698.94 KiB)

Perhaps we should go with copytrito! then, as it'll lower run-times for smaller matrices.

@dkarrasch
Copy link
Member

Hm, failures look real. Something about reference to undefined objects in Matrix{BigFloat}...

@jishnub
Copy link
Contributor Author

jishnub commented May 16, 2024

Yeah, that's one potential catch with using copytrito!, where one half of the matrix is left untouched (and potentially uninitialized). In this case, the issue seems to be that adjoint! and transpose! don't support undef values in a matrix. Perhaps it's prudent to use copyto! for now.

@dkarrasch
Copy link
Member

I don't have time right now to work it out, but as an option, can we catch the potentially bad cases in adjoint! and transpose! by types? Like overload these two for triangular matrices or something? One would need to see the full signature in those failing cases, which is impossible from the stacktrace because almsot all the function calls are inlined. I'll try to have a look later.

@jishnub
Copy link
Contributor Author

jishnub commented May 16, 2024

The specific failing test should be fixed by #54491

@jishnub
Copy link
Contributor Author

jishnub commented May 17, 2024

The failing BunchKaufman test should be fixed by #54509

@jishnub
Copy link
Contributor Author

jishnub commented May 19, 2024

The failing hessenberg test should be fixed by #54518 (I think)

@jishnub jishnub merged commit 047e699 into master May 20, 2024
7 checks passed
@jishnub jishnub deleted the jishnub/eigencopysymhermparent branch May 20, 2024 02:31
@dkarrasch
Copy link
Member

Awesome, how many further improvements this has triggered through failing tests. 🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:latency Compiler latency domain:linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants