fitting grouped regression model and extrapolating
I have a dataframe with the following columns: electricity consumption E (over 24 hours), hour h and temperature t.
I would like to extrapolate the consumption per hour for temperatures where I do not have data.
I have been following eddis's reply from Apply grouped model back onto data
combined_profiles <- data.table(df)
#Make a model for each hour
my.models <- combined_profiles[, list(Model = list(lm(E ~ t))),
keyby = h]
#Make predictions on dataset
setkey(combined_profiles, h)
combined_profiles[my.models, prediction := predict(i.Model[[1]], .SD), by = .EACHI]
I have tried adding a dataframe with the new temperatures as new data to the prediction.
newtemp<- data.frame(temp_round=c(6,7))
combined_profiles[my.models, prediction := predict(newdata=newtemp,i.Model[[1]], .SD), by = .EACHI]
but this gives me the following error: Error in se.fit || interval != "none" : invalid 'x' type in 'x || y'
Could anyone please help me change this so that it predicts demand for temperatures outside the measured data?
For the iris example my question would be, how to extrapolate Sepal.Length for data where we don't have Sepal.Width.
Thanks!
r dplyr
edited Nov 11 at 12:36
asked Nov 10 at 22:35
maaar
11
You are asking us to read too many of the neurons on your cerebral cortex.
– 42-
Nov 10 at 23:50
1 Answer
Interpolating
library(tidyverse)
library(data.table)
A dplyr version first, to clarify the data.table solution you want:
df <- as_tibble(iris)
df
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
We can just mutate() the fitted values:
df %>%
  group_by(Species) %>% # for each Species
  mutate(
    pred = lm(Sepal.Length ~ Sepal.Width)$fitted.values
  )
#> # A tibble: 150 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.06
#> 2 4.9 3 1.4 0.2 setosa 4.71
#> 3 4.7 3.2 1.3 0.2 setosa 4.85
#> 4 4.6 3.1 1.5 0.2 setosa 4.78
#> 5 5 3.6 1.4 0.2 setosa 5.12
#> 6 5.4 3.9 1.7 0.4 setosa 5.33
#> 7 4.6 3.4 1.4 0.3 setosa 4.99
#> 8 5 3.4 1.5 0.2 setosa 4.99
#> 9 4.4 2.9 1.4 0.2 setosa 4.64
#> 10 4.9 3.1 1.5 0.1 setosa 4.78
#> # ... with 140 more rows
data.table
For this df, we can apply the same logic.
setDT(df)[, pred := lm(Sepal.Length ~ Sepal.Width)$fitted.values, by = Species]
- := defines the new column pred from the fitted values, and by = Species does this for each group of Species
Then we get the same result:
df
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> 1: 5.1 3.5 1.4 0.2 setosa 5.055715
#> 2: 4.9 3.0 1.4 0.2 setosa 4.710470
#> 3: 4.7 3.2 1.3 0.2 setosa 4.848568
#> 4: 4.6 3.1 1.5 0.2 setosa 4.779519
#> 5: 5.0 3.6 1.4 0.2 setosa 5.124764
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica 6.611440
#> 147: 6.3 2.5 5.0 1.9 virginica 6.160673
#> 148: 6.5 3.0 5.2 2.0 virginica 6.611440
#> 149: 6.2 3.4 5.4 2.3 virginica 6.972054
#> 150: 5.9 3.0 5.1 1.8 virginica 6.611440
Extrapolating
First of all, the column name in newdata should be the same as the predictor name used in the model.
newtemp <- data.frame(Sepal.Width = c(6, 7))
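To illustrate why the column name and the argument list matter, here is a minimal sketch on the ungrouped iris data. The object names fit, bad_name, extra_arg and ok are made up for this example. It assumes the usual behaviour of predict() for lm objects: predictors are looked up by name in newdata, and a third positional argument is matched to se.fit, which is where the `se.fit || interval != "none"` message in the question comes from.

```r
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# 1. Wrong column name: Sepal.Width cannot be found in newdata.
bad_name <- tryCatch(
  predict(fit, newdata = data.frame(temp_round = c(6, 7))),
  error = function(e) e
)

# 2. Extra positional argument: iris is matched to se.fit,
#    which has to be a single TRUE/FALSE, so predict() stops.
extra_arg <- tryCatch(
  predict(fit, data.frame(Sepal.Width = c(6, 7)), iris),
  error = function(e) e
)

# 3. Matching column name and no stray arguments:
#    one prediction per row of newdata.
ok <- predict(fit, newdata = data.frame(Sepal.Width = c(6, 7)))

conditionMessage(bad_name)
conditionMessage(extra_arg)
ok
```

In the question's call, .SD plays the role of iris in case 2, so dropping .SD and renaming temp_round to the model's predictor should address both problems.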
As when aggregating in data.table, you can compute .(predict(mod, newdata)) in j for each group:
dt <- as.data.table(df)
dt[, .(pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)), by = Species]
#> Species pred
#> 1: setosa 6.781940
#> 2: setosa 7.472429
#> 3: versicolor 8.730201
#> 4: versicolor 9.595279
#> 5: virginica 9.316043
#> 6: virginica 10.217578
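Applied to the columns from the question (E, h, t), the same pattern might look like the sketch below. The real df is not shown, so the data here are synthetic and exactly linear (an assumption made only so the result is easy to check); profiles and extrapolated are hypothetical names.

```r
library(data.table)

# Synthetic stand-in for the question's data: every hour 0..23 observed
# at temperatures 10..20, with E exactly linear in t within each hour.
profiles <- CJ(h = 0:23, t = 10:20)
profiles[, E := 10 + h - 0.5 * t]

# The column name matches the predictor in the formula E ~ t.
newtemp <- data.frame(t = c(6, 7))

# Fit one model per hour and predict at the new temperatures.
extrapolated <- profiles[
  ,
  .(t    = newtemp$t,
    pred = predict(lm(E ~ t, data = .SD), newdata = newtemp)),
  by = h
]

extrapolated
```

This gives one row per hour and new temperature. With real, noisy data the predictions far outside the observed temperature range rest entirely on the linearity assumption of each hourly model.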
If you want a newdata column for each group, you can just add the term inside the list .(). I used %>% for readability.
df %>%
  data.table() %>%
  .[,
    .(newdata = unlist(newtemp, use.names = FALSE),
      pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)),
    by = Species]
#> Species newdata pred
#> 1: setosa 6 6.781940
#> 2: setosa 7 7.472429
#> 3: versicolor 6 8.730201
#> 4: versicolor 7 9.595279
#> 5: virginica 6 9.316043
#> 6: virginica 7 10.217578
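Since the question is also tagged dplyr, a dplyr variant of the same extrapolation could be sketched as follows. It assumes dplyr 0.8.1 or later for group_modify(); extrapolated is a hypothetical name, and this is one possible formulation rather than the only one.

```r
library(dplyr)

newtemp <- data.frame(Sepal.Width = c(6, 7))

extrapolated <- iris %>%
  group_by(Species) %>%
  # .x is the data for the current group, without the grouping column
  group_modify(~ data.frame(
    newdata = newtemp$Sepal.Width,
    pred    = unname(predict(lm(Sepal.Length ~ Sepal.Width, data = .x),
                             newdata = newtemp))
  )) %>%
  ungroup()

extrapolated
```

The result should have the same six rows as the data.table output above, with Species, newdata and pred columns.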
Thanks, that works great. Is there a way to add a column with the newdata to the prediction?
– maaar
Nov 11 at 13:12
@maaar, Do you mean the newdata column?
– Blended
Nov 11 at 13:14
I added the column. You can just write an additional term inside list() of data.table. I think if we do not unlist your data.frame, it throws an error.
– Blended
Nov 11 at 13:27
edited Nov 11 at 13:24
answered Nov 11 at 2:47
Blended
40617