Generate a crosstab with the means of two dataframes columns

I have two dataframes, one called "students.short", generated by:

students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
                             population=c("kuopio","kuopio","kuopio","tampere",
                                          "tampere","tampere","tampere","tampere"))

students.short

  shoesize population
1       38     kuopio
2       39     kuopio
3       38     kuopio
4       38    tampere
5       39    tampere
6       38    tampere
7       37    tampere
8       36    tampere

and the other called "students.tall":

students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
                            population=c("kuopio","kuopio","kuopio","kuopio",
                                         "tampere","tampere","tampere","tampere"))

students.tall

  shoesize population
1       44     kuopio
2       42     kuopio
3       43     kuopio
4       43     kuopio
5       42    tampere
6       44    tampere
7       43    tampere
8       43    tampere

and I need to create a crosstab of the population (kuopio or tampere) against the mean shoesize in each dataframe, like this:

                 kuopio   tampere
students.short     38.3      37.6
students.tall      43        43

I can't find a clean or easy way to do that. Any ideas or help, please?

edited Nov 16 '18 at 10:41

asked Nov 16 '18 at 10:14

Ángel


Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

– Wimpel
Nov 16 '18 at 10:20

What are kuopio and tampere? Just for reference :)

– Sotos
Nov 16 '18 at 10:20

@Sotos, I think they're towns in Finland?

– Milan Valášek
Nov 16 '18 at 10:23

Thanks for the advice @Wimpel

– Ángel
Nov 16 '18 at 10:45

r crosstab


3 Answers

In one go, using data.table:

First, create a named list of the data.tables (using setDT()).

Then, bind the list elements together with rbindlist(), using the list names as an id column (idcol = TRUE).

Last, dcast to wide format, summarising with the mean of the value.var, shoesize.

Code:

library( data.table )

# setDT() converts each data.frame to a data.table by reference;
# the list names become the values of the .id column
dcast( rbindlist( list( students.short = setDT( students.short ),
                        students.tall = setDT( students.tall ) ),
                  idcol = TRUE ),
       .id ~ population,
       value.var = "shoesize",
       fun.aggregate = mean )

#               .id   kuopio tampere
# 1: students.short 38.33333    37.6
# 2:  students.tall 43.00000    43.0
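
A variation on the same idea, sketched under a couple of assumptions: rbindlist() also accepts plain data frames, so the setDT() calls (which convert the originals by reference) can be skipped, and idcol can be given a name instead of TRUE. The column name "group" and the object names long and wide are illustrative choices only and do not come from the answer above.

```r
library(data.table)

# Data as defined in the question
students.short <- data.frame(shoesize = c(38, 39, 38, 38, 39, 38, 37, 36),
                             population = c("kuopio", "kuopio", "kuopio", "tampere",
                                            "tampere", "tampere", "tampere", "tampere"))
students.tall <- data.frame(shoesize = c(44, 42, 43, 43, 42, 44, 43, 43),
                            population = c("kuopio", "kuopio", "kuopio", "kuopio",
                                           "tampere", "tampere", "tampere", "tampere"))

# Stack the two data frames; the list names end up in the "group" column
# ("group" is an illustrative name)
long <- rbindlist(list(students.short = students.short,
                       students.tall  = students.tall),
                  idcol = "group")

# One row per source data frame, one column per population,
# each cell holding the mean shoesize
wide <- dcast(long, group ~ population,
              value.var = "shoesize", fun.aggregate = mean)
wide
```

Keeping the long table as a separate object makes it easy to inspect the stacked data before reshaping, and the original data frames stay untouched.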

answered Nov 17 '18 at 10:47

Wimpel


Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

– Ángel
Nov 17 '18 at 18:25


Here is a dplyr-driven answer. We first bind the two data frames, using the .id argument to record which data frame each row came from. We then group_by that id column (group) and population and calculate the mean, i.e.

library(dplyr)

# .id = 'group' adds a column numbering the source data frames (1, 2)
bind_rows(students.short, students.tall, .id = 'group') %>%
       group_by(group, population) %>%
       summarise(new = mean(shoesize))

which gives,

# A tibble: 4 x 3
# Groups:   group [?]
  group population   new
  <chr> <fct>      <dbl>
1 1     kuopio      38.3
2 1     tampere     37.6
3 2     kuopio      43
4 2     tampere     43
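
The result above is in long format. To get the wide crosstab shape asked for in the question, one possible extension is sketched here. It assumes tidyr (1.0.0 or later, for pivot_wider()) and dplyr (1.0.0 or later, for the .groups argument) are available; neither appears in the answer above. Passing a named list to bind_rows() is also an added choice, so that the id column holds the data frame names rather than 1 and 2.

```r
library(dplyr)
library(tidyr)

# Data as defined in the question
students.short <- data.frame(shoesize = c(38, 39, 38, 38, 39, 38, 37, 36),
                             population = c("kuopio", "kuopio", "kuopio", "tampere",
                                            "tampere", "tampere", "tampere", "tampere"))
students.tall <- data.frame(shoesize = c(44, 42, 43, 43, 42, 44, 43, 43),
                            population = c("kuopio", "kuopio", "kuopio", "kuopio",
                                           "tampere", "tampere", "tampere", "tampere"))

wide <- bind_rows(list(students.short = students.short,
                       students.tall  = students.tall),
                  .id = "group") %>%                        # list names fill the id column
  group_by(group, population) %>%
  summarise(new = mean(shoesize), .groups = "drop") %>%     # one mean per group/population pair
  pivot_wider(names_from = population, values_from = new)   # populations become columns
wide
```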

answered Nov 16 '18 at 10:30

Sotos


Combine your data frames using rbind() first:

df <- rbind(students.short, students.tall)
# create categorical height variable
df$height_cat <- rep(c("short", "tall"),
                     c(nrow(students.short), nrow(students.tall)))

Then use tapply(). For this mock data frame, it works like this:

df <- data.frame(size = round(rnorm(30, 39, 2)),
                 pop = sample(c("kuopio", "tampere"), 30, replace = TRUE),
                 height = sample(c("short", "tall"), 30, replace = TRUE))

# df[c(3, 2)] refers to height and pop columns of df respectively
tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm = TRUE)

       pop
height  kuopio  tampere
  short     39 39.57143
  tall      41 39.22222
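
For reference, the same two steps applied to the data from the question rather than to mock data might look like the sketch below. It uses only base R; the object name crosstab is an illustrative choice, and the columns are selected by name instead of by position.

```r
# Data as defined in the question
students.short <- data.frame(shoesize = c(38, 39, 38, 38, 39, 38, 37, 36),
                             population = c("kuopio", "kuopio", "kuopio", "tampere",
                                            "tampere", "tampere", "tampere", "tampere"))
students.tall <- data.frame(shoesize = c(44, 42, 43, 43, 42, 44, 43, 43),
                            population = c("kuopio", "kuopio", "kuopio", "kuopio",
                                           "tampere", "tampere", "tampere", "tampere"))

# Stack the data frames and record which one each row came from
df <- rbind(students.short, students.tall)
df$height_cat <- rep(c("short", "tall"),
                     c(nrow(students.short), nrow(students.tall)))

# Mean shoesize for every height_cat / population combination;
# rows are height categories, columns are populations
crosstab <- tapply(df$shoesize, df[c("height_cat", "population")], mean)
round(crosstab, 1)
```

The result is an ordinary matrix, so individual cells can be read with, for example, crosstab["short", "kuopio"].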

edited Nov 16 '18 at 10:30

answered Nov 16 '18 at 10:21

Milan Valášek
