r - Elegant vectorization of nested for loop - Stack Overflow
I am trying to find an elegant and fast way to vectorize the simple code below. It basically deals with nested for loops, but the nesting is unusual. The function special_print can be replaced by any function taking the same vector of arguments.
Thanks for the help, Patrick
vector <- 1:30

special_print = function(i, j, k) {
  print(c(vector[i], vector[j], vector[k]))
}

for (i in 1:30) {
  for (j in i:30) {
    for (k in j:30) {
      special_print(i, j, k)
    }
  }
}
My question is how to generate an index structure that can be used in the following call
apply(index, special_print, MARGIN = 1)
to generate the same output as above.
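(Note, in passing, that apply() hands each row to its function as a single vector, while special_print() expects three separate arguments, so in practice the call needs a small wrapper; a minimal sketch, assuming index is a 3-column matrix or data frame:)

# apply() passes each row as one vector, so wrap special_print()
# to split the row back into i, j, k (a sketch):
apply(index, MARGIN = 1, FUN = function(x) special_print(x[1], x[2], x[3]))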
I have tried the following subroutines, but they seem to take too much time:
library(magrittr)  # for %>%

is.increasing = function(x) {
  return(all(diff(x) > 0))
}

increasing_index = function(a) {
  clean_index <- apply(a, is.increasing, MARGIN = 1)
  b = a[clean_index == TRUE, ]
  return(b)
}

data <- replicate(3, 1:30) %>% as.data.frame()
a <- expand.grid(data) %>% as.data.frame()
index <- increasing_index(a)
5 Answers
One thing slowing you down is working with data frames. Don't do that. Matrices are faster, especially with row operations and apply(). Once you have a matrix, it is very fast to do a <- a[a[,1] <= a[,2] & a[,2] <= a[,3],].
So I'd write your code like this:
a <- as.matrix(expand.grid(1:30, 1:30, 1:30))
a <- a[a[,1] <= a[,2] & a[,2] <= a[,3],]
apply(a, special_print, MARGIN = 1)
This produces the rows in a different order than your for loops, with column 1 varying fastest. If that matters, you can do it this way:
a <- as.matrix(expand.grid(1:30, 1:30, 1:30))[,3:1]
a <- a[a[,1] <= a[,2] & a[,2] <= a[,3],]
apply(a, special_print, MARGIN = 1)
where the first line now reverses the order of the columns.
EDITED to add:
Here's an even better way, inspired by @ThomasIsCoding's answer:
a <- t(combn(1:32, 3) - 0:2)
This takes 3 items from 32, then keeps the first, subtracts 1 from the second, and 2 from the third. That gives a non-decreasing sequence of 3 chosen from 1 to 30. It assumes combn() always returns the values in increasing order; I couldn't spot that as guaranteed in the docs, but it appears to be true in practice.
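A quick way to convince yourself on a smaller case (a sketch; n = 10 is chosen here only for brevity): the shifted combn() rows should contain exactly the same triples as the filtered grid.

# Sketch: compare the shifted-combn trick with the filtered grid,
# after sorting both row sets the same way.
n <- 10
shifted <- t(combn(1:(n + 2), 3) - 0:2)
filtered <- as.matrix(expand.grid(1:n, 1:n, 1:n))[, 3:1]
filtered <- filtered[filtered[, 1] <= filtered[, 2] & filtered[, 2] <= filtered[, 3], ]
ord <- function(m) m[order(m[, 1], m[, 2], m[, 3]), ]
stopifnot(nrow(shifted) == nrow(filtered), all(ord(shifted) == ord(filtered)))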
I would see if mapply(…) would succeed.
mapply(special_print,
       i = rep(1:30, each = 30 * 30),
       j = rep(1:30, each = 30, times = 30),
       k = rep(1:30, times = 30 * 30))
The mapply function should be efficient, but be aware that your "special function" is probably what is responsible for any perceived inefficiency. You should examine its algorithm to see if it can be vectorized.
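Note that the rep() vectors above enumerate all 30^3 index combinations, while the original loops only visit non-decreasing triples; a minimal sketch of keeping the mapply() approach but filtering the indices first:

# Build the full grid of indices, then keep only i <= j <= k so the
# calls match the original nested loops (a sketch, not the code above).
i <- rep(1:30, each = 30 * 30)
j <- rep(1:30, each = 30, times = 30)
k <- rep(1:30, times = 30 * 30)
keep <- i <= j & j <= k
invisible(mapply(special_print, i = i[keep], j = j[keep], k = k[keep]))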
Since you have the constraint of "strictly increasing", you can use combn instead of expand.grid. See the benchmark below:
library(magrittr)         # for %>%
library(microbenchmark)

f1 <- function() {
  # uses is.increasing()/increasing_index() as defined in the question
  data <- replicate(3, 1:30) %>% as.data.frame()
  a <- expand.grid(data) %>% as.data.frame()
  increasing_index(a)
}

f2 <- function() {
  as.data.frame(do.call(rbind, combn(1:30, 3, simplify = FALSE)))
}

microbenchmark(
  f1 = f1(),
  f2 = f2(),
  unit = "relative"
)
which gives
Unit: relative
expr min lq mean median uq max neval
f1 80.81545 78.37718 69.98764 77.82142 75.71047 38.96516 100
f2 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 100
It is quite interesting that you want to compute the indices rather than vectorize the function itself. Note that computing the indices and then using apply is no guarantee of vectorization: apply is just a wrapper around a for loop. If your function is not vectorized, it is better to call the for loop directly.
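To make that concrete, the apply() call over the index structure is essentially the following loop over rows (a sketch, assuming index is the 3-column object built in the question):

# apply(index, MARGIN = 1, ...) is a row-by-row loop in disguise:
for (r in seq_len(nrow(index))) {
  special_print(index[r, 1], index[r, 2], index[r, 3])
}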
As I stated in the comments, the overhead of determining the indices before calling apply will in turn make this approach slower than directly running the for loop.
special_print = function(i, j, k) {
  i + j + k  # simple addition; assume not vectorized
}

for_loop <- function(n) {
  index <- 1
  results <- NA  # vector to store results -- grows after each iteration. BAD
  for (i in seq_len(n)) {
    for (j in i:n) {
      for (k in j:n) {
        results[index] <- special_print(i, j, k)
        index <- index + 1
      }
    }
  }
  results
}

use_apply <- function(n) {
  a <- t(combn(seq(1, n + 2), 3) - 0:2)
  apply(a, \(x) special_print(x[1], x[2], x[3]), MARGIN = 1)
}
microbenchmark::microbenchmark(use_for_loop = for_loop(100),
use_apply = use_apply(100), times = 10, check = 'equal')
Unit: milliseconds
expr min lq mean median uq max neval
use_for_loop 91.6665 98.6639 105.7263 103.1685 110.1368 126.3438 10
use_apply 267.7191 280.2255 310.4257 307.9291 331.7875 369.0543 10
Note that even with the growing results vector, the plain for loop is still faster than apply. There is therefore no efficiency gain from computing the indices only to loop over them again.
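If you do stay with explicit loops, preallocating the result vector removes the growing-vector penalty noted above; the number of non-decreasing triples from 1..n is choose(n + 2, 3). A sketch (for_loop_prealloc is a name introduced here just for illustration):

for_loop_prealloc <- function(n) {
  results <- numeric(choose(n + 2, 3))  # preallocate: one slot per triple
  index <- 1
  for (i in seq_len(n)) {
    for (j in i:n) {
      for (k in j:n) {
        results[index] <- special_print(i, j, k)
        index <- index + 1
      }
    }
  }
  results
}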
On the other hand, if the function were vectorized, it would be a different issue altogether:
vectorized_fun <- function(n) {
  a <- t(combn(seq(1, n + 2), 3) - 0:2)
  special_print(a[, 1], a[, 2], a[, 3])  # vectorized function here
}
microbenchmark::microbenchmark(use_for_loop = for_loop(100),
use_combn=vectorized_fun(100), times = 10, check = 'equal')
Unit: milliseconds
expr min lq mean median uq max neval
use_for_loop 81.7556 83.0216 87.56850 84.70145 86.2191 107.7244 10
use_combn 58.7425 61.3285 64.19345 62.53805 63.6604 74.8008 10
In this case, we would only need to optimize the computation of a, which leads to even better results, e.g.:
vectorized_fun2 <- function(n) {
  m <- seq_len(n)
  m_1 <- m - 1
  m_r <- n - m_1
  e <- m_1 * n + m
  g <- e + (e - 1) %/% n * n^2
  h <- n - sequence(m_r, m_1)
  u <- sequence(m_r, g, n + 1)
  ind <- sequence(h, u) - 1
  i <- ind %/% n^2 %% n + 1
  j <- ind %/% n^1 %% n + 1
  k <- ind %/% n^0 %% n + 1
  special_print(i, j, k)  # vectorized function here
}
microbenchmark::microbenchmark(vectorized_fun2 = vectorized_fun2(100),
use_combn = vectorized_fun(100), times = 10, check = 'equal')
Unit: milliseconds
expr min lq mean median uq max neval
vectorized_fun2 7.5444 7.6159 8.11584 7.8443 8.6796 9.0854 10
use_combn 58.0109 61.0426 63.12551 62.1867 65.4991 68.5709 10
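As a quick sanity check (a sketch mirroring the check = 'equal' option used in the benchmark above), the arithmetic index construction can be compared against the combn()-based version before relying on it:

# Sketch: confirm the two vectorized variants agree on a smaller n.
stopifnot(isTRUE(all.equal(vectorized_fun2(50), vectorized_fun(50))))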
Thus, as you can see, you need a vectorized function to beat the pure for loop; otherwise there is no improvement. If you cannot vectorize your function, consider using Rcpp/Fortran.
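For completeness, here is a minimal Rcpp sketch, assuming the Rcpp package is available; the sum i + j + k again stands in for the real special_print():

library(Rcpp)

cppFunction('
NumericVector triple_sums(int n) {
  int total = n * (n + 1) * (n + 2) / 6;  // number of non-decreasing triples
  NumericVector out(total);
  int pos = 0;
  for (int i = 1; i <= n; ++i)
    for (int j = i; j <= n; ++j)
      for (int k = j; k <= n; ++k)
        out[pos++] = i + j + k;           // stand-in for special_print()
  return out;
}')

# triple_sums(100) returns the same values, in the same order, as for_loop(100).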
For a data frame where the last column increments until it reaches the maximum, then the second column increments until its maximum, and then the first column increments:
library(tidyverse)
a <- replicate(3, tibble(1:30), simplify = F) %>% reduce(cross_join)
will produce:
> head(a)
# A tibble: 6 × 3
`1:30.x` `1:30.y` `1:30`
<int> <int> <int>
1 1 1 1
2 1 1 2
3 1 1 3
4 1 1 4
5 1 1 5
6 1 1 6
> tail(a)
# A tibble: 6 × 3
`1:30.x` `1:30.y` `1:30`
<int> <int> <int>
1 30 30 25
2 30 30 26
3 30 30 27
4 30 30 28
5 30 30 29
6 30 30 30
It is a little faster than the f1() and f2() functions above. The a <- t(combn(1:32, 3) - 0:2) answer gets a little unordered from what you want, I believe, at the end. If you want a matrix, pipe in a %>% as.matrix() at the end.
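As with the other full-grid answers, this gives all 27,000 rows rather than the restricted set the loops visit; a sketch of filtering it down (set_names and filter come from the already attached tidyverse, and the names i, j, k are introduced here only for readability):

b <- a %>%
  set_names(c("i", "j", "k")) %>%   # replace the `1:30.x` style column names
  filter(i <= j, j <= k)            # keep only the non-decreasing triples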
Comments:
… the for loops or the way of generating index? – ThomasIsCoding, 18 hours ago
The for loops are outside the body of special_print, so there's no recursion. – user2554330, 16 hours ago