Generate a crosstab with the means of two dataframes columns
I have two dataframes, one called "students.short", generate by:
students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))
students.short
shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere
and the other called "students.tall":
students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))
students.tall
shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere
and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like
kuopio tampere
studenst.short 38.3 37.6
studenst.tall 43 43
I can't find a clean or easy way to do that, any idea or any help, please?
r crosstab
add a comment |
I have two dataframes, one called "students.short", generate by:
students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))
students.short
shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere
and the other called "students.tall":
students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))
students.tall
shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere
and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like
kuopio tampere
studenst.short 38.3 37.6
studenst.tall 43 43
I can't find a clean or easy way to do that, any idea or any help, please?
r crosstab
1
Please present your data usingdput
. It makes it easier to import your data into R, and improves your chances for a great answer
– Wimpel
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45
add a comment |
I have two dataframes, one called "students.short", generate by:
students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))
students.short
shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere
and the other called "students.tall":
students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))
students.tall
shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere
and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like
kuopio tampere
studenst.short 38.3 37.6
studenst.tall 43 43
I can't find a clean or easy way to do that, any idea or any help, please?
r crosstab
I have two dataframes, one called "students.short", generate by:
students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))
students.short
shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere
and the other called "students.tall":
students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))
students.tall
shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere
and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like
kuopio tampere
studenst.short 38.3 37.6
studenst.tall 43 43
I can't find a clean or easy way to do that, any idea or any help, please?
r crosstab
r crosstab
edited Nov 16 '18 at 10:41
Ángel
asked Nov 16 '18 at 10:14
ÁngelÁngel
698
698
1
Please present your data usingdput
. It makes it easier to import your data into R, and improves your chances for a great answer
– Wimpel
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45
add a comment |
1
Please present your data usingdput
. It makes it easier to import your data into R, and improves your chances for a great answer
– Wimpel
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45
1
1
Please present your data using
dput
. It makes it easier to import your data into R, and improves your chances for a great answer– Wimpel
Nov 16 '18 at 10:20
Please present your data using
dput
. It makes it easier to import your data into R, and improves your chances for a great answer– Wimpel
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45
add a comment |
3 Answers
3
active
oldest
votes
In one go, using data.table
- first, create a named list of the data.tables (using
setDT()
) - then, bind the lists together (using
rbindlist()
, using the names as an id (idcol = TRUE
). - last,
dcast
to wide format, summarising withmean
of thevalue.var
;
shoesize
code
library( data.table )
dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )
# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
add a comment |
Here is a dplyr
driven answer. We basically bind the two data frames first using the .id
argument to differentiate between the data frames. We then group_by
the .id
and population
and calculate the mean, i.e.
library(dplyr)
bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))
which gives,
# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43
add a comment |
Combine your data frames using rbind()
first:
df <- rbind(studnets.short, students.tall)
df$height_cat <- rep(c("short", "tall"), # create categorical height variable
c(nrow(students.short), nrow(students.tall)))
Then use tapply()
. For this mock data frame, it works like this:
df <- data.frame(size = round(rnorm(30, 39, 2)),
pop = sample(c("kuopio", "tampere"), 30, replace = T),
height = sample(c("short", "tall"), 30, replace = T))
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
# df[c(3, 2)] refers to height and pop columns of df respectively
pop
height kuopio tampere
short 39 39.57143
tall 41 39.22222
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335682%2fgenerate-a-crosstab-with-the-means-of-two-dataframes-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
In one go, using data.table
- first, create a named list of the data.tables (using
setDT()
) - then, bind the lists together (using
rbindlist()
, using the names as an id (idcol = TRUE
). - last,
dcast
to wide format, summarising withmean
of thevalue.var
;
shoesize
code
library( data.table )
dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )
# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
add a comment |
In one go, using data.table
- first, create a named list of the data.tables (using
setDT()
) - then, bind the lists together (using
rbindlist()
, using the names as an id (idcol = TRUE
). - last,
dcast
to wide format, summarising withmean
of thevalue.var
;
shoesize
code
library( data.table )
dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )
# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
add a comment |
In one go, using data.table
- first, create a named list of the data.tables (using
setDT()
) - then, bind the lists together (using
rbindlist()
, using the names as an id (idcol = TRUE
). - last,
dcast
to wide format, summarising withmean
of thevalue.var
;
shoesize
code
library( data.table )
dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )
# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0
In one go, using data.table
- first, create a named list of the data.tables (using
setDT()
) - then, bind the lists together (using
rbindlist()
, using the names as an id (idcol = TRUE
). - last,
dcast
to wide format, summarising withmean
of thevalue.var
;
shoesize
code
library( data.table )
dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )
# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0
answered Nov 17 '18 at 10:47
WimpelWimpel
6,302323
6,302323
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
add a comment |
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.
– Ángel
Nov 17 '18 at 18:25
add a comment |
Here is a dplyr
driven answer. We basically bind the two data frames first using the .id
argument to differentiate between the data frames. We then group_by
the .id
and population
and calculate the mean, i.e.
library(dplyr)
bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))
which gives,
# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43
add a comment |
Here is a dplyr
driven answer. We basically bind the two data frames first using the .id
argument to differentiate between the data frames. We then group_by
the .id
and population
and calculate the mean, i.e.
library(dplyr)
bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))
which gives,
# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43
add a comment |
Here is a dplyr
driven answer. We basically bind the two data frames first using the .id
argument to differentiate between the data frames. We then group_by
the .id
and population
and calculate the mean, i.e.
library(dplyr)
bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))
which gives,
# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43
Here is a dplyr
driven answer. We basically bind the two data frames first using the .id
argument to differentiate between the data frames. We then group_by
the .id
and population
and calculate the mean, i.e.
library(dplyr)
bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))
which gives,
# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43
answered Nov 16 '18 at 10:30
SotosSotos
31.2k51741
31.2k51741
add a comment |
add a comment |
Combine your data frames using rbind()
first:
df <- rbind(studnets.short, students.tall)
df$height_cat <- rep(c("short", "tall"), # create categorical height variable
c(nrow(students.short), nrow(students.tall)))
Then use tapply()
. For this mock data frame, it works like this:
df <- data.frame(size = round(rnorm(30, 39, 2)),
pop = sample(c("kuopio", "tampere"), 30, replace = T),
height = sample(c("short", "tall"), 30, replace = T))
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
# df[c(3, 2)] refers to height and pop columns of df respectively
pop
height kuopio tampere
short 39 39.57143
tall 41 39.22222
add a comment |
Combine your data frames using rbind()
first:
df <- rbind(studnets.short, students.tall)
df$height_cat <- rep(c("short", "tall"), # create categorical height variable
c(nrow(students.short), nrow(students.tall)))
Then use tapply()
. For this mock data frame, it works like this:
df <- data.frame(size = round(rnorm(30, 39, 2)),
pop = sample(c("kuopio", "tampere"), 30, replace = T),
height = sample(c("short", "tall"), 30, replace = T))
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
# df[c(3, 2)] refers to height and pop columns of df respectively
pop
height kuopio tampere
short 39 39.57143
tall 41 39.22222
add a comment |
Combine your data frames using rbind()
first:
df <- rbind(studnets.short, students.tall)
df$height_cat <- rep(c("short", "tall"), # create categorical height variable
c(nrow(students.short), nrow(students.tall)))
Then use tapply()
. For this mock data frame, it works like this:
df <- data.frame(size = round(rnorm(30, 39, 2)),
pop = sample(c("kuopio", "tampere"), 30, replace = T),
height = sample(c("short", "tall"), 30, replace = T))
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
# df[c(3, 2)] refers to height and pop columns of df respectively
pop
height kuopio tampere
short 39 39.57143
tall 41 39.22222
Combine your data frames using rbind()
first:
df <- rbind(studnets.short, students.tall)
df$height_cat <- rep(c("short", "tall"), # create categorical height variable
c(nrow(students.short), nrow(students.tall)))
Then use tapply()
. For this mock data frame, it works like this:
df <- data.frame(size = round(rnorm(30, 39, 2)),
pop = sample(c("kuopio", "tampere"), 30, replace = T),
height = sample(c("short", "tall"), 30, replace = T))
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
# df[c(3, 2)] refers to height and pop columns of df respectively
pop
height kuopio tampere
short 39 39.57143
tall 41 39.22222
edited Nov 16 '18 at 10:30
answered Nov 16 '18 at 10:21
Milan ValášekMilan Valášek
36319
36319
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335682%2fgenerate-a-crosstab-with-the-means-of-two-dataframes-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Please present your data using
dput
. It makes it easier to import your data into R, and improves your chances for a great answer– Wimpel
Nov 16 '18 at 10:20
what is kuopio and tampere?...Just for reference :)
– Sotos
Nov 16 '18 at 10:20
@Sotos, I think they're towns in Finlad?
– Milan Valášek
Nov 16 '18 at 10:23
Thanks for the advice @Wimpel
– Ángel
Nov 16 '18 at 10:45