Temporal sampling implementation #4994
base: branch-25.08
Conversation
I thought about an efficient implementation of temporal sampling, especially considering that some seed vertices can be reached via multiple different paths, so we need to apply multiple different temporal windows to the same seed vertex.
This can lead to many vertex partitions, especially for power-law graphs.
And creating & applying a graph-wide temporal mask can be pretty expensive if we need to do this many times.
We can apply a graph-wide temporal mask to set a temporal window covering the lower and upper bounds of the start/end times for the entire set of seeds across multiple batches.
For a seed-specific time window, I think adjusting bias values will lead to a more efficient implementation.
We can tag a seed vertex with a time stamp (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1092C28-L1092C72).
And when we set the bias value (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1096), we can set the bias to 0 if the edge falls outside the seed-specific time window.
I think this can lead to a more efficient implementation than the current approach.
What do you think about this?
And for uniform sampling, we may use a uniform sampling primitive for seeds that appear no more than once and a biased sampling primitive for seeds that appear two or more times.
So, something akin to what node2vec does... return a bias of 0 if the edge time is invalid and a bias of 1 if the edge time is valid. Because we're operating on the tagged vertex, each vertex would have its own timestamp... and therefore its own computed bias. If my interpretation is correct, I think that would be a much simpler implementation and would probably result in significantly better performance in cases where we end up with a high-degree vertex that appears multiple times in the frontier.
Yes, your interpretation is correct. I agree that this will be simpler & faster. For uniform sampling, and to avoid the overhead of evaluating a bias for every edge, we can use the default uniform sampling for seeds that appear only once, and use bias values & tagging for seeds that appear more than once.
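A minimal sketch of the bias-based filtering discussed above. The functor and its signature are assumptions for exposition only, not the actual bias callback interface of per_v_random_select_transform_outgoing_e:

```cpp
#include <thrust/tuple.h>

// Illustrative sketch: each frontier entry is a (seed vertex, arrival time) tag;
// an edge is eligible only if its time stamp falls after the arrival time carried
// by that particular tag. A bias of 0 excludes the edge from selection.
template <typename vertex_t, typename time_t>
struct temporal_bias_op_t {
  __device__ float operator()(thrust::tuple<vertex_t, time_t> tagged_src,
                              vertex_t dst,
                              time_t edge_time) const
  {
    auto arrival_time = thrust::get<1>(tagged_src);
    return (edge_time > arrival_time) ? 1.0f : 0.0f;
  }
};
```

With per-seed tagging, a high-degree vertex that appears multiple times in the frontier carries one tag per occurrence, each with its own window, so no graph-wide mask needs to be rebuilt between batches.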
@ChuckHastings what is the status of this PR? Will this be completed in release 25.06?
LGTM (besides a few comments about minor cosmetic issues).
template <typename vertex_t, typename value_t>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<value_t>>
template <typename vertex_t, typename value_vector_t>
std::tuple<rmm::device_uvector<vertex_t>, value_vector_t>
You can just use dataframe_buffer_type_t (https://github.com/rapidsai/cugraph/blob/branch-25.06/cpp/include/cugraph/utilities/dataframe_buffer.hpp#L91) here. Just replacing rmm::device_uvector<value_t> with dataframe_buffer_type_t<value_t> will work.
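Roughly the suggested shape, as a sketch with a hypothetical function name and elided parameters; it assumes the alias lives in namespace cugraph as the linked header suggests:

```cpp
#include <cugraph/utilities/dataframe_buffer.hpp>

#include <rmm/device_uvector.hpp>

#include <tuple>

// Hypothetical declaration for illustration: dataframe_buffer_type_t<value_t>
// covers both the arithmetic case (a single device_uvector) and the tuple case,
// so the separate value_vector_t template parameter becomes unnecessary.
template <typename vertex_t, typename value_t>
std::tuple<rmm::device_uvector<vertex_t>, cugraph::dataframe_buffer_type_t<value_t>>
shuffle_keys_and_values(/* parameters elided */);
```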
template <typename... Ts, std::size_t... Is>
auto view_concat_impl(std::tuple<Ts...> const& tuple, std::index_sequence<Is...>)
{
  return view_concat(std::get<Is>(tuple)...);
}
The convention is to place detail-namespace code at the beginning of the file.
{
  static_assert(is_std_tuple_of_arithmetic_spans<std::remove_cv_t<BufferType>>::value);
  if constexpr (is_std_tuple_of_arithmetic_spans<std::remove_cv_t<BufferType>>::value) {
    return std::get<0>(buffer).size();
Why do we need this function? Can't the function above cover std::tuple of std::span as well?
std::conditional_t<std::is_same_v<edge_properties_t, cuda::std::nullopt_t>,
                   thrust::tuple<>,
                   std::conditional_t<std::is_arithmetic_v<edge_properties_t>,
                                      thrust::tuple<edge_properties_t>,
                                      edge_properties_t>>
Isn't this edge_properties_tup_type, defined above? We can just use it here.
  }
}

// FIXME: Duplicated across files...
Can't we move this to detail/sampling_utils.hpp to avoid duplication?
@@ -16,9 +16,11 @@

#pragma once

#include "cugraph/utilities/shuffle_comm.cuh"
#include "prims/update_edge_src_dst_property.cuh" // ??
This might be from a previous PR, but why the "??" comment? If we are not using this include, can't we delete it?
edge_src_property_t<edge_t, edge_time_t, false> edge_src_times(handle, graph_view);

#if 0
Better to delete the dead code.
#if 0
// FIXME: This call to update_edge_src_property seems like what I want, but it
// doesn't work.
Do you know why? What happened with this?
This variation of the call is only used in a few places; MG SSSP and Random Walks are the only places I see it.
I'm not sure if it's broken there and we haven't noticed, or if I'm doing something wrong here. But I printed the relevant contents of the arrays and they were incorrect. Since I use them as indices into device arrays, I could clearly see that the wrong values affected the output.
I spent a little time trying to diagnose, but didn't have time to really dig in. It's possible I've done something incorrect in the setup, but setting do_expensive_check didn't reveal anything wrong.
How can I reproduce this? I can dig in to see what's going awry here.
I'll try to create a reproducer for you.
template <typename vertex_t, typename value_t>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<value_t>>
template <typename vertex_t, typename value_vector_t>
std::tuple<rmm::device_uvector<vertex_t>, value_vector_t>
Same as above: we can use dataframe_buffer_type_t<value_t> here.
Temporal sampling implementation. Sampling considers the time stamp of edges: if we arrive at a vertex v with timestamp t1, then when we depart from that vertex to continue sampling we only consider edges that occur after time t1.

PR includes C++ implementation and tests.

This significantly increases C++ compile time. We'll address this during 25.08.
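A minimal host-side sketch of the constraint described above, using illustrative names and types rather than the PR's actual API:

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: given the out-edges of the current vertex and the time t1
// at which we arrived there, only edges with a strictly later time stamp remain
// candidates for the next sampling hop.
struct temporal_edge_t {
  int32_t dst;
  int32_t time;
};

std::vector<temporal_edge_t> eligible_neighbors(std::vector<temporal_edge_t> const& out_edges,
                                                int32_t arrival_time /* t1 */)
{
  std::vector<temporal_edge_t> result;
  for (auto const& e : out_edges) {
    if (e.time > arrival_time) { result.push_back(e); }
  }
  return result;
}
```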