Generate a crosstab with the means of two dataframes columns












0















I have two dataframes, one called "students.short", generate by:



students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))

students.short

shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere


and the other called "students.tall":



students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))

students.tall

shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere


and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like



                       kuopio   tampere

studenst.short 38.3 37.6

studenst.tall 43 43


I can't find a clean or easy way to do that, any idea or any help, please?










share|improve this question




















  • 1





    Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

    – Wimpel
    Nov 16 '18 at 10:20











  • what is kuopio and tampere?...Just for reference :)

    – Sotos
    Nov 16 '18 at 10:20













  • @Sotos, I think they're towns in Finlad?

    – Milan Valášek
    Nov 16 '18 at 10:23











  • Thanks for the advice @Wimpel

    – Ángel
    Nov 16 '18 at 10:45
















0















I have two dataframes, one called "students.short", generate by:



students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))

students.short

shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere


and the other called "students.tall":



students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))

students.tall

shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere


and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like



                       kuopio   tampere

studenst.short 38.3 37.6

studenst.tall 43 43


I can't find a clean or easy way to do that, any idea or any help, please?










share|improve this question




















  • 1





    Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

    – Wimpel
    Nov 16 '18 at 10:20











  • what is kuopio and tampere?...Just for reference :)

    – Sotos
    Nov 16 '18 at 10:20













  • @Sotos, I think they're towns in Finlad?

    – Milan Valášek
    Nov 16 '18 at 10:23











  • Thanks for the advice @Wimpel

    – Ángel
    Nov 16 '18 at 10:45














0












0








0








I have two dataframes, one called "students.short", generate by:



students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))

students.short

shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere


and the other called "students.tall":



students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))

students.tall

shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere


and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like



                       kuopio   tampere

studenst.short 38.3 37.6

studenst.tall 43 43


I can't find a clean or easy way to do that, any idea or any help, please?










share|improve this question
















I have two dataframes, one called "students.short", generate by:



students.short <- data.frame(shoesize=c(38,39,38,38,39,38,37,36),
population=c("kuopio","kuopio","kuopio","tampere",
"tampere","tampere","tampere","tampere"))

students.short

shoesize population
1 38 kuopio
2 39 kuopio
3 38 kuopio
4 38 kuopio
5 39 tampere
6 38 tampere
7 37 tampere
8 36 tampere


and the other called "students.tall":



students.tall <- data.frame(shoesize=c(44,42,43,43,42,44,43,43),
population=c("kuopio","kuopio","kuopio","kuopio",
"tampere","tampere","tampere","tampere"))

students.tall

shoesize population
1 44 kuopio
2 42 kuopio
3 43 kuopio
4 43 kuopio
5 42 tampere
6 44 tampere
7 43 tampere
8 43 tampere


and I need to create a crosstab between the population (kuopio or tampere) and the means of the shoesize of each dataframes like



                       kuopio   tampere

studenst.short 38.3 37.6

studenst.tall 43 43


I can't find a clean or easy way to do that, any idea or any help, please?







r crosstab






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 16 '18 at 10:41







Ángel

















asked Nov 16 '18 at 10:14









ÁngelÁngel

698




698








  • 1





    Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

    – Wimpel
    Nov 16 '18 at 10:20











  • what is kuopio and tampere?...Just for reference :)

    – Sotos
    Nov 16 '18 at 10:20













  • @Sotos, I think they're towns in Finlad?

    – Milan Valášek
    Nov 16 '18 at 10:23











  • Thanks for the advice @Wimpel

    – Ángel
    Nov 16 '18 at 10:45














  • 1





    Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

    – Wimpel
    Nov 16 '18 at 10:20











  • what is kuopio and tampere?...Just for reference :)

    – Sotos
    Nov 16 '18 at 10:20













  • @Sotos, I think they're towns in Finlad?

    – Milan Valášek
    Nov 16 '18 at 10:23











  • Thanks for the advice @Wimpel

    – Ángel
    Nov 16 '18 at 10:45








1




1





Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

– Wimpel
Nov 16 '18 at 10:20





Please present your data using dput. It makes it easier to import your data into R, and improves your chances for a great answer

– Wimpel
Nov 16 '18 at 10:20













what is kuopio and tampere?...Just for reference :)

– Sotos
Nov 16 '18 at 10:20







what is kuopio and tampere?...Just for reference :)

– Sotos
Nov 16 '18 at 10:20















@Sotos, I think they're towns in Finlad?

– Milan Valášek
Nov 16 '18 at 10:23





@Sotos, I think they're towns in Finlad?

– Milan Valášek
Nov 16 '18 at 10:23













Thanks for the advice @Wimpel

– Ángel
Nov 16 '18 at 10:45





Thanks for the advice @Wimpel

– Ángel
Nov 16 '18 at 10:45












3 Answers
3






active

oldest

votes


















1














In one go, using data.table




  • first, create a named list of the data.tables (using setDT() )

  • then, bind the lists together (using rbindlist(), using the names as an id (idcol = TRUE).

  • last, dcast to wide format, summarising with mean of the value.var;
    shoesize


code



library( data.table )

dcast( rbindlist( list( students.short = setDT( students.short ),
students.tall = setDT( students.tall ) ),
idcol = TRUE ),
.id ~ population,
value.var = "shoesize",
fun = mean )

# .id kuopio tampere
# 1: students.short 38.33333 37.6
# 2: students.tall 43.00000 43.0





share|improve this answer
























  • Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

    – Ángel
    Nov 17 '18 at 18:25



















1














Here is a dplyr driven answer. We basically bind the two data frames first using the .id argument to differentiate between the data frames. We then group_by the .id and population and calculate the mean, i.e.



library(dplyr)

bind_rows(df1, df2, .id = 'group') %>%
group_by(group, population) %>%
summarise(new = mean(shoesize))


which gives,




# A tibble: 4 x 3
# Groups: group [?]
group population new
<chr> <fct> <dbl>
1 1 kuopio 38.3
2 1 tampere 37.6
3 2 kuopio 43
4 2 tampere 43






share|improve this answer































    0














    Combine your data frames using rbind() first:



    df <- rbind(studnets.short, students.tall)
    df$height_cat <- rep(c("short", "tall"), # create categorical height variable
    c(nrow(students.short), nrow(students.tall)))


    Then use tapply(). For this mock data frame, it works like this:



    df <- data.frame(size = round(rnorm(30, 39, 2)),
    pop = sample(c("kuopio", "tampere"), 30, replace = T),
    height = sample(c("short", "tall"), 30, replace = T))
    tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
    # df[c(3, 2)] refers to height and pop columns of df respectively

    pop
    height kuopio tampere
    short 39 39.57143
    tall 41 39.22222





    share|improve this answer


























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335682%2fgenerate-a-crosstab-with-the-means-of-two-dataframes-columns%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      In one go, using data.table




      • first, create a named list of the data.tables (using setDT() )

      • then, bind the lists together (using rbindlist(), using the names as an id (idcol = TRUE).

      • last, dcast to wide format, summarising with mean of the value.var;
        shoesize


      code



      library( data.table )

      dcast( rbindlist( list( students.short = setDT( students.short ),
      students.tall = setDT( students.tall ) ),
      idcol = TRUE ),
      .id ~ population,
      value.var = "shoesize",
      fun = mean )

      # .id kuopio tampere
      # 1: students.short 38.33333 37.6
      # 2: students.tall 43.00000 43.0





      share|improve this answer
























      • Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

        – Ángel
        Nov 17 '18 at 18:25
















      1














      In one go, using data.table




      • first, create a named list of the data.tables (using setDT() )

      • then, bind the lists together (using rbindlist(), using the names as an id (idcol = TRUE).

      • last, dcast to wide format, summarising with mean of the value.var;
        shoesize


      code



      library( data.table )

      dcast( rbindlist( list( students.short = setDT( students.short ),
      students.tall = setDT( students.tall ) ),
      idcol = TRUE ),
      .id ~ population,
      value.var = "shoesize",
      fun = mean )

      # .id kuopio tampere
      # 1: students.short 38.33333 37.6
      # 2: students.tall 43.00000 43.0





      share|improve this answer
























      • Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

        – Ángel
        Nov 17 '18 at 18:25














      1












      1








      1







      In one go, using data.table




      • first, create a named list of the data.tables (using setDT() )

      • then, bind the lists together (using rbindlist(), using the names as an id (idcol = TRUE).

      • last, dcast to wide format, summarising with mean of the value.var;
        shoesize


      code



      library( data.table )

      dcast( rbindlist( list( students.short = setDT( students.short ),
      students.tall = setDT( students.tall ) ),
      idcol = TRUE ),
      .id ~ population,
      value.var = "shoesize",
      fun = mean )

      # .id kuopio tampere
      # 1: students.short 38.33333 37.6
      # 2: students.tall 43.00000 43.0





      share|improve this answer













      In one go, using data.table




      • first, create a named list of the data.tables (using setDT() )

      • then, bind the lists together (using rbindlist(), using the names as an id (idcol = TRUE).

      • last, dcast to wide format, summarising with mean of the value.var;
        shoesize


      code



      library( data.table )

      dcast( rbindlist( list( students.short = setDT( students.short ),
      students.tall = setDT( students.tall ) ),
      idcol = TRUE ),
      .id ~ population,
      value.var = "shoesize",
      fun = mean )

      # .id kuopio tampere
      # 1: students.short 38.33333 37.6
      # 2: students.tall 43.00000 43.0






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Nov 17 '18 at 10:47









      WimpelWimpel

      6,302323




      6,302323













      • Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

        – Ángel
        Nov 17 '18 at 18:25



















      • Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

        – Ángel
        Nov 17 '18 at 18:25

















      Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

      – Ángel
      Nov 17 '18 at 18:25





      Thank you, I think this is the best answer - understood as a mixture of elegant simplicity and efficiency - that could be expected.

      – Ángel
      Nov 17 '18 at 18:25













      1














      Here is a dplyr driven answer. We basically bind the two data frames first using the .id argument to differentiate between the data frames. We then group_by the .id and population and calculate the mean, i.e.



      library(dplyr)

      bind_rows(df1, df2, .id = 'group') %>%
      group_by(group, population) %>%
      summarise(new = mean(shoesize))


      which gives,




      # A tibble: 4 x 3
      # Groups: group [?]
      group population new
      <chr> <fct> <dbl>
      1 1 kuopio 38.3
      2 1 tampere 37.6
      3 2 kuopio 43
      4 2 tampere 43






      share|improve this answer




























        1














        Here is a dplyr driven answer. We basically bind the two data frames first using the .id argument to differentiate between the data frames. We then group_by the .id and population and calculate the mean, i.e.



        library(dplyr)

        bind_rows(df1, df2, .id = 'group') %>%
        group_by(group, population) %>%
        summarise(new = mean(shoesize))


        which gives,




        # A tibble: 4 x 3
        # Groups: group [?]
        group population new
        <chr> <fct> <dbl>
        1 1 kuopio 38.3
        2 1 tampere 37.6
        3 2 kuopio 43
        4 2 tampere 43






        share|improve this answer


























          1












          1








          1







          Here is a dplyr driven answer. We basically bind the two data frames first using the .id argument to differentiate between the data frames. We then group_by the .id and population and calculate the mean, i.e.



          library(dplyr)

          bind_rows(df1, df2, .id = 'group') %>%
          group_by(group, population) %>%
          summarise(new = mean(shoesize))


          which gives,




          # A tibble: 4 x 3
          # Groups: group [?]
          group population new
          <chr> <fct> <dbl>
          1 1 kuopio 38.3
          2 1 tampere 37.6
          3 2 kuopio 43
          4 2 tampere 43






          share|improve this answer













          Here is a dplyr driven answer. We basically bind the two data frames first using the .id argument to differentiate between the data frames. We then group_by the .id and population and calculate the mean, i.e.



          library(dplyr)

          bind_rows(df1, df2, .id = 'group') %>%
          group_by(group, population) %>%
          summarise(new = mean(shoesize))


          which gives,




          # A tibble: 4 x 3
          # Groups: group [?]
          group population new
          <chr> <fct> <dbl>
          1 1 kuopio 38.3
          2 1 tampere 37.6
          3 2 kuopio 43
          4 2 tampere 43







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 16 '18 at 10:30









          SotosSotos

          31.2k51741




          31.2k51741























              0














              Combine your data frames using rbind() first:



              df <- rbind(studnets.short, students.tall)
              df$height_cat <- rep(c("short", "tall"), # create categorical height variable
              c(nrow(students.short), nrow(students.tall)))


              Then use tapply(). For this mock data frame, it works like this:



              df <- data.frame(size = round(rnorm(30, 39, 2)),
              pop = sample(c("kuopio", "tampere"), 30, replace = T),
              height = sample(c("short", "tall"), 30, replace = T))
              tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
              # df[c(3, 2)] refers to height and pop columns of df respectively

              pop
              height kuopio tampere
              short 39 39.57143
              tall 41 39.22222





              share|improve this answer






























                0














                Combine your data frames using rbind() first:



                df <- rbind(studnets.short, students.tall)
                df$height_cat <- rep(c("short", "tall"), # create categorical height variable
                c(nrow(students.short), nrow(students.tall)))


                Then use tapply(). For this mock data frame, it works like this:



                df <- data.frame(size = round(rnorm(30, 39, 2)),
                pop = sample(c("kuopio", "tampere"), 30, replace = T),
                height = sample(c("short", "tall"), 30, replace = T))
                tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
                # df[c(3, 2)] refers to height and pop columns of df respectively

                pop
                height kuopio tampere
                short 39 39.57143
                tall 41 39.22222





                share|improve this answer




























                  0












                  0








                  0







                  Combine your data frames using rbind() first:



                  df <- rbind(studnets.short, students.tall)
                  df$height_cat <- rep(c("short", "tall"), # create categorical height variable
                  c(nrow(students.short), nrow(students.tall)))


                  Then use tapply(). For this mock data frame, it works like this:



                  df <- data.frame(size = round(rnorm(30, 39, 2)),
                  pop = sample(c("kuopio", "tampere"), 30, replace = T),
                  height = sample(c("short", "tall"), 30, replace = T))
                  tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
                  # df[c(3, 2)] refers to height and pop columns of df respectively

                  pop
                  height kuopio tampere
                  short 39 39.57143
                  tall 41 39.22222





                  share|improve this answer















                  Combine your data frames using rbind() first:



                  df <- rbind(studnets.short, students.tall)
                  df$height_cat <- rep(c("short", "tall"), # create categorical height variable
                  c(nrow(students.short), nrow(students.tall)))


                  Then use tapply(). For this mock data frame, it works like this:



                  df <- data.frame(size = round(rnorm(30, 39, 2)),
                  pop = sample(c("kuopio", "tampere"), 30, replace = T),
                  height = sample(c("short", "tall"), 30, replace = T))
                  tapply(df$size, INDEX = df[c(3, 2)], mean, na.rm=T)
                  # df[c(3, 2)] refers to height and pop columns of df respectively

                  pop
                  height kuopio tampere
                  short 39 39.57143
                  tall 41 39.22222






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 16 '18 at 10:30

























                  answered Nov 16 '18 at 10:21









                  Milan ValášekMilan Valášek

                  36319




                  36319






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335682%2fgenerate-a-crosstab-with-the-means-of-two-dataframes-columns%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Florida Star v. B. J. F.

                      Danny Elfman

                      Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues