This repository was archived by the owner on Jun 17, 2024. It is now read-only.

Using the min function in Naiad #22

Open
tmalhotra003 opened this issue Feb 9, 2015 · 2 comments

Comments

@tmalhotra003

Hi Team Naiad,

I am fairly new to Naiad (and must say that it sounds really cool!). I was trying to write a simple Console Application in Visual Studio that, given a set of words, finds the length of the shortest word. With each epoch of data, I want to calculate and return the length of the shortest word. To do so in Naiad, I wrote the following snippet:

var words = computation.NewInputCollection<string>();
words.SelectMany(x => x.Split(' '))
.Min(w => w, w => w.Length )
.Subscribe(l => { foreach (var element in l) Console.WriteLine(element); });

With this snippet, I am not getting any word lengths at all, and I wanted to ask whether I am using the functions correctly. Any advice or help would be greatly appreciated!
Thanks in advance!

Best Regards,
Tanya

@frankmcsherry
Contributor

Hi Tanya,

This snippet looks good to me. Another possible culprit is the code where you introduce data into the dataflow (the above only describes what happens once data is introduced). Would you be willing to share that snippet as well, so we can see if there is an obvious problem?

Thanks,
Frank

Edit: Actually, the program as written doesn't match the verbal description you've supplied, which we can talk about once we actually have it producing any data at all. As written, the program will do a data-parallel min on the collection of strings, where each group of identical strings is reduced to the string with smallest length; since they all have the same length, this should be behaving more like a Distinct. To get closer to the behavior you indicated, I would consider replacing the w => w with w => true, which will cause all records to land in the same bucket (and remove much of the parallelism for the moment; I can explain how to get that back in a bit).
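To make the difference between the two key selectors concrete, here is a minimal sketch in plain LINQ rather than Naiad. It assumes that a keyed min behaves like "group by key, then keep the record with the smallest value in each group", as described above. `MinSketch` and `KeyedMin` are hypothetical names invented for this illustration and are not part of Naiad; the sketch also ignores epochs and parallelism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain LINQ stand-in (NOT Naiad): group records by a key, then keep
// the record with the smallest length in each group.
public static class MinSketch
{
    // Hypothetical helper: one output record per distinct key.
    public static List<string> KeyedMin<TKey>(IEnumerable<string> lines, Func<string, TKey> key)
    {
        return lines
            .SelectMany(line => line.Split(' '))                    // split lines into words
            .GroupBy(key)                                           // bucket words by key
            .Select(group => group.OrderBy(w => w.Length).First())  // shortest word per bucket
            .ToList();
    }

    public static void Main()
    {
        var lines = new[] { "hello world", "hi there", "hi world" };

        // Key = the word itself: each bucket holds identical words,
        // so the result is just the distinct words.
        Console.WriteLine(string.Join(", ", KeyedMin(lines, w => w)));

        // Key = true: every word lands in the same bucket,
        // so only one shortest word survives.
        Console.WriteLine(string.Join(", ", KeyedMin(lines, w => true)));
    }
}
```

With the word itself as the key, the first call returns every distinct word; with the constant key, the second call returns a single shortest word, which is closer to the behavior described in the question.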

@frankmcsherry
Contributor

Hello,

I think NewInputCollection should work (unless something is surprisingly broken, which is possible). What I don't see in the program yet (perhaps I am confused) is the point at which you call words.OnNext(some_data). The snippet above activates the computation, but then exits without introducing any data. I would expect something more like (caveat: typed into a text box; not tested):

using (var computation = NewComputation.FromArgs(ref args))
{
    var words = computation.NewInputCollection<string>();

    words.SelectMany(x => x.Split(' '))
         .Min(w=>true,w=>w.Length)
         .Subscribe(l => { foreach (var element in l) Console.WriteLine(element); });

    computation.Activate();

    words.OnNext(new [] { "hello world" });
    words.OnNext(new [] { "hi there" });
    words.OnNext(new [] { "bye for now" });
    words.OnCompleted();

    computation.Join();
}

Looking at the interface, it seems OnNext takes as an argument an IEnumerable<Weighted<TRecord>>, so this example isn't quite right (the strings would need to be weighted, using .ToWeighted(weight)).


2 participants