How do I aggregate certain columns from a data frame by a unique ID?













I have daily Statcast data dating back to 2016, and I am trying to aggregate it to find the mean for each pitcher ID.



I have the following code:



aggpitch <- aggregate(pitchingstat, by = list(pitchingstat$PitcherID),
                      FUN = mean, na.rm = TRUE)


This call aggregates every single column. I only want to aggregate certain columns.



How would I include only certain columns?











r aggregate rscript







asked Nov 13 '18 at 1:28 by gracer

You want to specify the variables to aggregate: aggregate(pitchingstat[c("var1","var2")], pitchingstat["PitcherID"], FUN=mean, na.rm=TRUE). Alternatively, use the formula interface: aggregate(cbind(var1,var2) ~ PitcherID, data=pitchingstat, FUN=mean, na.rm=TRUE). See this old answer: stackoverflow.com/a/9723314/496803

    – thelatemail
    Nov 13 '18 at 1:30
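
To make the two calls in this comment concrete, here is a minimal sketch on made-up data. The columns velo, spin and inning are hypothetical stand-ins for the asker's real columns; only PitcherID comes from the question. One thing to be aware of: the formula interface defaults to na.action = na.omit, which drops any row with an NA in one of the formula's variables before FUN is called, so it can give different means from the data-frame interface unless na.action = na.pass is supplied.

```r
# Toy stand-in for the asker's data (all column names except PitcherID are assumptions).
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2, 2),
  velo      = c(90, 92, 95, NA, 97),
  spin      = c(2200, 2300, 2500, 2400, NA),
  inning    = c(1, 2, 1, 2, 3)  # a column we do not want to average
)

# Data-frame interface: pass only the columns to aggregate as the first argument.
agg_df <- aggregate(pitchingstat[c("velo", "spin")],
                    by = pitchingstat["PitcherID"],
                    FUN = mean, na.rm = TRUE)

# Formula interface with the default na.action (na.omit):
# rows containing an NA in velo or spin are dropped entirely.
agg_fm_default <- aggregate(cbind(velo, spin) ~ PitcherID,
                            data = pitchingstat,
                            FUN = mean, na.rm = TRUE)

# Formula interface keeping incomplete rows, so that na.rm = TRUE
# is applied per column, as in the data-frame interface.
agg_fm_pass <- aggregate(cbind(velo, spin) ~ PitcherID,
                         data = pitchingstat,
                         FUN = mean, na.rm = TRUE,
                         na.action = na.pass)

print(agg_df)
print(agg_fm_default)
print(agg_fm_pass)
```

In both interfaces the inning column is left out of the result because it was never passed in.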






















3 Answers

If you want to summarize more than one column, you can use QAsena's approach with the summarise_at() function, like so:

library(dplyr)
pitchingstat %>%
  group_by(PitcherID) %>%
  summarise_at(vars(col1:coln), mean, na.rm = TRUE)

See the link below for more examples:
https://dplyr.tidyverse.org/reference/summarise_all.html
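
As a side note, summarise_at() is marked as superseded since dplyr 1.0.0 in favour of across(). The following is a rough sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed; the toy data and the column names velo, spin and inning are hypothetical and not taken from the question.

```r
library(dplyr)

# Toy stand-in for the asker's data (all column names except PitcherID are assumptions).
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2, 2),
  velo      = c(90, 92, 95, NA, 97),
  spin      = c(2200, 2300, 2500, 2400, NA),
  inning    = c(1, 2, 1, 2, 3)  # not selected below, so it is dropped from the result
)

aggpitch <- pitchingstat %>%
  group_by(PitcherID) %>%
  # across() picks the columns; the lambda computes the mean while ignoring NAs
  summarise(across(c(velo, spin), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(aggpitch)
```

A range such as velo:spin, or a selection helper such as starts_with(), could be used in place of c(velo, spin) when the columns of interest are adjacent or share a naming pattern.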








answered Nov 13 '18 at 5:17 by On_an_island


Replace the first argument (pitchingstat) with the column you want to aggregate, such as pitchingstat$var1, or with a subset of several columns, such as pitchingstat[c("var1", "var2")].








answered Nov 13 '18 at 1:30 by 12b345b6b78


How about this?

library(tidyverse)
aggpitch <- pitchingstat %>%
  group_by(PitcherID) %>%
  summarise(pitcher_mean = mean(variable))  # replace 'variable' with your variable of interest

or

library(tidyverse)
aggpitch <- pitchingstat %>%
  select(PitcherID, var_1, var_2) %>%  # keep the grouping column and pipe into group_by()
  group_by(PitcherID) %>%
  summarise(pitcher_mean = mean(var_1),
            pitcher_mean2 = mean(var_2))

I think this works, but a small sample of your data would help to test it.








answered Nov 13 '18 at 4:43 by QAsena (edited Nov 13 '18 at 4:49)
