Model output prediction limit of top 100 #25

Open
dmhenke opened this issue Apr 7, 2025 · 3 comments


dmhenke commented Apr 7, 2025

At present, I do not see how to output more than the top 100 predicted hits on a model. For example:
```r
library(doc2vec)

model <- paragraph2vec(x = df_d2v, type = "PV-DM")
vocab <- summary(model, type = "vocabulary", which = "docs")

sentences <- "my bag of words"
sentences <- setNames(sentences, sentences)
sentences <- strsplit(sentences, split = " ")

model_predictions <- predict(
  model,
  newdata = sentences,
  type = "nearest", which = "sent2doc", top_n = 100)
```

`dim(model_predictions)` is at most 100 rows.

There appears to be no way to output predictions for a model with more than 100 doc_ids in its vocabulary. Is there a workaround for generating predictions against all available doc_ids?

Thank you,
David
jwijffels (Collaborator) commented Apr 7, 2025

If you use sent2doc, at the C++ side an array of length 100 is created - see https://github.com/bnosac/doc2vec/blob/master/src/rcpp_doc2vec.cpp#L151-L164
This array is fixed size; if you need a bigger array, you would have to rewrite a whole part of the C++ code to make it an extensible array. I remember trying this out 5 years ago but stopping due to the amount of work involved in rewriting the code.
That is also why I've put a stop condition at the R side which checks that top_n is not larger than 100: https://github.com/bnosac/doc2vec/blob/master/R/paragraph2vec.R#L380
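In practice the cap surfaces as an error on the R side. A minimal illustration with the model and sentences from the first comment (the error text below is paraphrased, not copied from the package):

```r
## Asking for more than 100 nearest documents is rejected by the R-side
## check before the C++ routine ever runs.
predict(model, newdata = sentences,
        type = "nearest", which = "sent2doc", top_n = 150)
## Error: top_n should be at most 100 (paraphrased)

## Requests up to 100 go through as expected.
nn <- predict(model, newdata = sentences,
              type = "nearest", which = "sent2doc", top_n = 100)
```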


lkmklsmn commented Apr 9, 2025

I want to calculate and export the similarity between a single sentence and ALL docs in a given model. How do you suggest I go about accomplishing this?

jwijffels (Collaborator) commented

You could remove the line at https://github.com/bnosac/doc2vec/blob/master/R/paragraph2vec.R#L380, extend the array to more than 100 elements at https://github.com/bnosac/doc2vec/blob/master/src/rcpp_doc2vec.cpp#L151-L164, rebuild the package and test it out.
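Alternatively, if rebuilding the package is not an option, one possible workaround is to pull the embeddings out of the model and rank all documents yourself. This is only a sketch, assuming `as.matrix(model, which = "docs")` and `predict(..., type = "embedding", which = "sent2doc")` behave as described in the package manual, with `model` and `sentences` as in the first comment:

```r
library(doc2vec)

## Embedding of the query sentence(s); one row per element of `sentences`.
sentence_emb <- predict(model, newdata = sentences,
                        type = "embedding", which = "sent2doc")

## Embeddings of every document the model was trained on (rownames are doc_ids).
doc_emb <- as.matrix(model, which = "docs")

## Cosine similarity between one sentence and ALL documents.
cosine_to_all <- function(a, B) {
  as.vector(B %*% a) / (sqrt(sum(a^2)) * sqrt(rowSums(B^2)))
}
similarities <- cosine_to_all(sentence_emb[1, ], doc_emb)
names(similarities) <- rownames(doc_emb)

## Full ranking over all doc_ids, not limited to the top 100.
ranking <- sort(similarities, decreasing = TRUE)
head(ranking)
```

Because the ranking is done in R on the embedding matrices, it never touches the fixed-size array in the C++ code.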
