fitting grouped regression model and extrapolating











I have a dataframe with the following columns: electricity consumption E (over 24 hours), hour h and temperature t.
I would like to extrapolate the consumption per hour for temperatures where I do not have data.



I have been following eddis's reply to "Apply grouped model back onto data":



library(data.table)

combined_profiles <- data.table(df)

# Fit one model per hour
my.models <- combined_profiles[, list(Model = list(lm(E ~ t))), keyby = h]

# Predict on the original data, joining each hour to its model
setkey(combined_profiles, h)
combined_profiles[my.models, prediction := predict(i.Model[[1]], .SD), by = .EACHI]


I have tried adding a dataframe with the new temperatures as new data to the prediction.



newtemp <- data.frame(temp_round = c(6, 7))
combined_profiles[my.models, prediction := predict(newdata = newtemp, i.Model[[1]], .SD), by = .EACHI]


but this gives me the following error: Error in se.fit || interval != "none" : invalid 'x' type in 'x || y'



Could anyone please help me change this so that it predicts demand for temperatures outside the measured data?



In terms of the iris example, my question would be how to extrapolate Sepal.Length for Sepal.Width values that are not in the data.



Thanks!










  • You are asking us to read too many of the neurons on your cerebral cortex.
    – 42-
    Nov 10 at 23:50

















r dplyr






asked Nov 10 at 22:35 by maaar, edited Nov 11 at 12:36












1 Answer
Interpolating



library(tidyverse)
library(data.table)


A dplyr version first, to clarify what the data.table solution should do:



df <- as_tibble(iris)
df
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows


We can just mutate() the fitted values



df %>%
  group_by(Species) %>% # for each Species
  mutate(
    pred = lm(Sepal.Length ~ Sepal.Width)$fitted.values
  )
#> # A tibble: 150 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.06
#> 2 4.9 3 1.4 0.2 setosa 4.71
#> 3 4.7 3.2 1.3 0.2 setosa 4.85
#> 4 4.6 3.1 1.5 0.2 setosa 4.78
#> 5 5 3.6 1.4 0.2 setosa 5.12
#> 6 5.4 3.9 1.7 0.4 setosa 5.33
#> 7 4.6 3.4 1.4 0.3 setosa 4.99
#> 8 5 3.4 1.5 0.2 setosa 4.99
#> 9 4.4 2.9 1.4 0.2 setosa 4.64
#> 10 4.9 3.1 1.5 0.1 setosa 4.78
#> # ... with 140 more rows


data.table



We can apply the same logic to this df.



setDT(df)[, pred := lm(Sepal.Length ~ Sepal.Width)$fitted.values, by = Species]



  1. Define a new column pred that holds the fitted values


  2. separately for each Species group


Then we get the same result:



df
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> 1: 5.1 3.5 1.4 0.2 setosa 5.055715
#> 2: 4.9 3.0 1.4 0.2 setosa 4.710470
#> 3: 4.7 3.2 1.3 0.2 setosa 4.848568
#> 4: 4.6 3.1 1.5 0.2 setosa 4.779519
#> 5: 5.0 3.6 1.4 0.2 setosa 5.124764
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica 6.611440
#> 147: 6.3 2.5 5.0 1.9 virginica 6.160673
#> 148: 6.5 3.0 5.2 2.0 virginica 6.611440
#> 149: 6.2 3.4 5.4 2.3 virginica 6.972054
#> 150: 5.9 3.0 5.1 1.8 virginica 6.611440


Extrapolating



First of all, the column name in newdata must match the name of the predictor used in the model.



newtemp <- data.frame(Sepal.Width = c(6, 7))
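
A possible reading of the error in the question, sketched on iris (the model on the full data set and the names mod, bad and good are only for illustration): besides the column name, predict() in the question receives an extra unnamed argument (.SD). Once newdata is matched by name and the model fills object, that extra argument is matched to se.fit, the next formal of predict.lm(), so a table lands where a single TRUE/FALSE is expected.

```r
# Illustration only: one model on the whole iris data set
mod <- lm(Sepal.Length ~ Sepal.Width, data = iris)
newtemp <- data.frame(Sepal.Width = c(6, 7))

# The third, unnamed argument (iris here, .SD in the question) is matched
# to se.fit, so the se.fit || ... check inside predict.lm() fails.
bad <- tryCatch(
  predict(newdata = newtemp, mod, iris),
  error = function(e) conditionMessage(e)
)
bad

# Passing only the model and a correctly named newdata works.
good <- predict(mod, newdata = newtemp)
good
```

Dropping the extra argument and naming the newdata column after the predictor should therefore remove the error.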


As with any aggregation in data.table, you can return .(predict(mod, newdata)) in j:



dt <- as.data.table(df)

dt[, .(pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)), by = Species]
#> Species pred
#> 1: setosa 6.781940
#> 2: setosa 7.472429
#> 3: versicolor 8.730201
#> 4: versicolor 9.595279
#> 5: virginica 9.316043
#> 6: virginica 10.217578
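
If you would rather keep the fitted models around, as in the question, a minimal sketch could store them in a list column first and predict from them afterwards. The names my.models, preds and direct are only illustrative, and iris is used directly so that the block runs on its own.

```r
library(data.table)

dt <- as.data.table(iris)
newtemp <- data.frame(Sepal.Width = c(6, 7))

# One fitted model per group, kept in a list column
my.models <- dt[, .(Model = list(lm(Sepal.Length ~ Sepal.Width, data = .SD))),
                by = Species]

# Reuse the stored models for any new predictor values
preds <- my.models[, .(pred = predict(Model[[1]], newdata = newtemp)),
                   by = Species]
preds

# Same numbers as fitting and predicting in a single step
direct <- dt[, .(pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD),
                                newdata = newtemp)),
             by = Species]
```

This fits each model only once, which may matter if you want predictions for several sets of new values.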




If you want a newdata column for each group, you can add it as another term inside the list .():



I used %>% for readability.



df %>%
  data.table() %>%
  .[,
    .(newdata = unlist(newtemp, use.names = FALSE),
      pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)),
    by = Species]
#> Species newdata pred
#> 1: setosa 6 6.781940
#> 2: setosa 7 7.472429
#> 3: versicolor 6 8.730201
#> 4: versicolor 7 9.595279
#> 5: virginica 6 9.316043
#> 6: virginica 7 10.217578
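
Translated back to the data in the question, the same pattern might look like the sketch below. The data are simulated stand-ins, since the real df is not shown; only the column names E, h and t come from the question, and extrapolated is an illustrative name.

```r
library(data.table)

# Hypothetical stand-in for the asker's data:
# E = consumption, h = hour of day, t = temperature
set.seed(1)
combined_profiles <- data.table(
  h = rep(0:23, each = 30),
  t = round(runif(24 * 30, min = 8, max = 25))
)
combined_profiles[, E := 50 - 1.5 * t + 2 * h + rnorm(.N)]

# The column is named t, like the predictor in the formula (not temp_round)
newtemp <- data.frame(t = c(6, 7))

# Fit one lm per hour and predict at the new temperatures
extrapolated <- combined_profiles[
  ,
  .(t = newtemp$t,
    prediction = predict(lm(E ~ t, data = .SD), newdata = newtemp)),
  by = h
]
extrapolated
```

The result has one row per hour and new temperature. Keep in mind that these are linear extrapolations beyond the measured range, so they are only as good as the assumption that the linear trend continues there.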





  • Thanks, that works great. Is there a way to add a column with the newdata to the prediction?
    – maaar
    Nov 11 at 13:12










  • @maaar, do you mean the newdata column?
    – Blended
    Nov 11 at 13:14










  • I added the column. You can just write an additional term inside list() in data.table. I think that if we do not unlist the data.frame, it raises an error.
    – Blended
    Nov 11 at 13:27











answered Nov 11 at 2:47 by Blended, edited Nov 11 at 13:24











