Aggregrate rows that are identical but for a single column [duplicate]












0
















This question already has an answer here:




  • Collapse / concatenate / aggregate a column to a single comma separated string within each group

    2 answers




I have a data frame that looks like:



entry color value1 value2
A blue 1 0
A green 1 0
B green 1 0
C red 0 1


I want to combine the rows that have the same value for all columns other than color for the output:



entry color        value1 value2
A blue, green 1 0
B green 1 0
C red 0 1


I have attempted to use plyr and ddply as suggested in answer to this question https://www.biostars.org/p/167028/. This code produces the correct output for this minimal worked example. However, my actual dataset has 600 columns so this is not a viable solution. Does anyone have any suggestions that will work with larger datasets?



df2 <-ddply(df, .(entry), summarize,
color=paste(unique(color),collapse=","),
value1=paste(unique(value1),collapse=",") ,
value2= paste(unique(value2),collapse=","))


The example data frame is:



df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))


Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input



First create a vector of column names, minus the column of interest, using its index:



cnames <- names(df)[-2]


Then using the group_by_at function from dplyr:



df %>%
group_by_at(vars(one_of(cnames))) %>%
summarise(color=paste(unique(color), collapse=",")) %>%
ungroup()









share|improve this question















marked as duplicate by Ronak Shah r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 13 '18 at 13:41


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.























    0
















    This question already has an answer here:




    • Collapse / concatenate / aggregate a column to a single comma separated string within each group

      2 answers




    I have a data frame that looks like:



    entry color value1 value2
    A blue 1 0
    A green 1 0
    B green 1 0
    C red 0 1


    I want to combine the rows that have the same value for all columns other than color for the output:



    entry color        value1 value2
    A blue, green 1 0
    B green 1 0
    C red 0 1


    I have attempted to use plyr and ddply as suggested in answer to this question https://www.biostars.org/p/167028/. This code produces the correct output for this minimal worked example. However, my actual dataset has 600 columns so this is not a viable solution. Does anyone have any suggestions that will work with larger datasets?



    df2 <-ddply(df, .(entry), summarize,
    color=paste(unique(color),collapse=","),
    value1=paste(unique(value1),collapse=",") ,
    value2= paste(unique(value2),collapse=","))


    The example data frame is:



    df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))


    Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input



    First create a vector of column names, minus the column of interest, using its index:



    cnames <- names(df)[-2]


    Then using the group_by_at function from dplyr:



    df %>%
    group_by_at(vars(one_of(cnames))) %>%
    summarise(color=paste(unique(color), collapse=",")) %>%
    ungroup()









    share|improve this question















    marked as duplicate by Ronak Shah r
    Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

    StackExchange.ready(function() {
    if (StackExchange.options.isMobile) return;

    $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
    var $hover = $(this).addClass('hover-bound'),
    $msg = $hover.siblings('.dupe-hammer-message');

    $hover.hover(
    function() {
    $hover.showInfoMessage('', {
    messageElement: $msg.clone().show(),
    transient: false,
    position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
    dismissable: false,
    relativeToBody: true
    });
    },
    function() {
    StackExchange.helpers.removeMessages();
    }
    );
    });
    });
    Nov 13 '18 at 13:41


    This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.





















      0












      0








      0









      This question already has an answer here:




      • Collapse / concatenate / aggregate a column to a single comma separated string within each group

        2 answers




      I have a data frame that looks like:



      entry color value1 value2
      A blue 1 0
      A green 1 0
      B green 1 0
      C red 0 1


      I want to combine the rows that have the same value for all columns other than color for the output:



      entry color        value1 value2
      A blue, green 1 0
      B green 1 0
      C red 0 1


      I have attempted to use plyr and ddply as suggested in answer to this question https://www.biostars.org/p/167028/. This code produces the correct output for this minimal worked example. However, my actual dataset has 600 columns so this is not a viable solution. Does anyone have any suggestions that will work with larger datasets?



      df2 <-ddply(df, .(entry), summarize,
      color=paste(unique(color),collapse=","),
      value1=paste(unique(value1),collapse=",") ,
      value2= paste(unique(value2),collapse=","))


      The example data frame is:



      df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))


      Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input



      First create a vector of column names, minus the column of interest, using its index:



      cnames <- names(df)[-2]


      Then using the group_by_at function from dplyr:



      df %>%
      group_by_at(vars(one_of(cnames))) %>%
      summarise(color=paste(unique(color), collapse=",")) %>%
      ungroup()









      share|improve this question

















      This question already has an answer here:




      • Collapse / concatenate / aggregate a column to a single comma separated string within each group

        2 answers




      I have a data frame that looks like:



      entry color value1 value2
      A blue 1 0
      A green 1 0
      B green 1 0
      C red 0 1


      I want to combine the rows that have the same value for all columns other than color for the output:



      entry color        value1 value2
      A blue, green 1 0
      B green 1 0
      C red 0 1


      I have attempted to use plyr and ddply as suggested in answer to this question https://www.biostars.org/p/167028/. This code produces the correct output for this minimal worked example. However, my actual dataset has 600 columns so this is not a viable solution. Does anyone have any suggestions that will work with larger datasets?



      df2 <-ddply(df, .(entry), summarize,
      color=paste(unique(color),collapse=","),
      value1=paste(unique(value1),collapse=",") ,
      value2= paste(unique(value2),collapse=","))


      The example data frame is:



      df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))


      Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input



      First create a vector of column names, minus the column of interest, using its index:



      cnames <- names(df)[-2]


      Then using the group_by_at function from dplyr:



      df %>%
      group_by_at(vars(one_of(cnames))) %>%
      summarise(color=paste(unique(color), collapse=",")) %>%
      ungroup()




      This question already has an answer here:




      • Collapse / concatenate / aggregate a column to a single comma separated string within each group

        2 answers








      r






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 13 '18 at 16:25







      RAS

















      asked Nov 13 '18 at 13:25









      RASRAS

      446




      446




      marked as duplicate by Ronak Shah r
      Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Nov 13 '18 at 13:41


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.









      marked as duplicate by Ronak Shah r
      Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Nov 13 '18 at 13:41


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


























          1 Answer
          1






          active

          oldest

          votes


















          2














          You can try a tidyverse



          library(tidyverse)
          df %>%
          group_by(entry, value1, value2) %>%
          summarise(color=paste(unique(color), collapse=",")) %>%
          ungroup()
          # A tibble: 3 x 4
          entry value1 value2 color
          <fct> <dbl> <dbl> <chr>
          1 A 1 0 blue,green
          2 B 1 0 green
          3 C 0 1 red


          Remove the unique if you want to list duplicates as well like this
          summarise(color=toString(color))






          share|improve this answer





















          • 3





            FYI paste(x, collapse = ', ') is wrapped in toString(x)

            – Sotos
            Nov 13 '18 at 13:32













          • thanks for the hint...forgot this function :)

            – Jimbou
            Nov 13 '18 at 13:34






          • 1





            This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

            – RAS
            Nov 13 '18 at 16:28


















          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          You can try a tidyverse



          library(tidyverse)
          df %>%
          group_by(entry, value1, value2) %>%
          summarise(color=paste(unique(color), collapse=",")) %>%
          ungroup()
          # A tibble: 3 x 4
          entry value1 value2 color
          <fct> <dbl> <dbl> <chr>
          1 A 1 0 blue,green
          2 B 1 0 green
          3 C 0 1 red


          Remove the unique if you want to list duplicates as well like this
          summarise(color=toString(color))






          share|improve this answer





















          • 3





            FYI paste(x, collapse = ', ') is wrapped in toString(x)

            – Sotos
            Nov 13 '18 at 13:32













          • thanks for the hint...forgot this function :)

            – Jimbou
            Nov 13 '18 at 13:34






          • 1





            This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

            – RAS
            Nov 13 '18 at 16:28
















          2














          You can try a tidyverse



          library(tidyverse)
          df %>%
          group_by(entry, value1, value2) %>%
          summarise(color=paste(unique(color), collapse=",")) %>%
          ungroup()
          # A tibble: 3 x 4
          entry value1 value2 color
          <fct> <dbl> <dbl> <chr>
          1 A 1 0 blue,green
          2 B 1 0 green
          3 C 0 1 red


          Remove the unique if you want to list duplicates as well like this
          summarise(color=toString(color))






          share|improve this answer





















          • 3





            FYI paste(x, collapse = ', ') is wrapped in toString(x)

            – Sotos
            Nov 13 '18 at 13:32













          • thanks for the hint...forgot this function :)

            – Jimbou
            Nov 13 '18 at 13:34






          • 1





            This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

            – RAS
            Nov 13 '18 at 16:28














          2












          2








          2







          You can try a tidyverse



          library(tidyverse)
          df %>%
          group_by(entry, value1, value2) %>%
          summarise(color=paste(unique(color), collapse=",")) %>%
          ungroup()
          # A tibble: 3 x 4
          entry value1 value2 color
          <fct> <dbl> <dbl> <chr>
          1 A 1 0 blue,green
          2 B 1 0 green
          3 C 0 1 red


          Remove the unique if you want to list duplicates as well like this
          summarise(color=toString(color))






          share|improve this answer















          You can try a tidyverse



          library(tidyverse)
          df %>%
          group_by(entry, value1, value2) %>%
          summarise(color=paste(unique(color), collapse=",")) %>%
          ungroup()
          # A tibble: 3 x 4
          entry value1 value2 color
          <fct> <dbl> <dbl> <chr>
          1 A 1 0 blue,green
          2 B 1 0 green
          3 C 0 1 red


          Remove the unique if you want to list duplicates as well like this
          summarise(color=toString(color))







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 13 '18 at 13:32

























          answered Nov 13 '18 at 13:31









          JimbouJimbou

          9,72611230




          9,72611230








          • 3





            FYI paste(x, collapse = ', ') is wrapped in toString(x)

            – Sotos
            Nov 13 '18 at 13:32













          • thanks for the hint...forgot this function :)

            – Jimbou
            Nov 13 '18 at 13:34






          • 1





            This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

            – RAS
            Nov 13 '18 at 16:28














          • 3





            FYI paste(x, collapse = ', ') is wrapped in toString(x)

            – Sotos
            Nov 13 '18 at 13:32













          • thanks for the hint...forgot this function :)

            – Jimbou
            Nov 13 '18 at 13:34






          • 1





            This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

            – RAS
            Nov 13 '18 at 16:28








          3




          3





          FYI paste(x, collapse = ', ') is wrapped in toString(x)

          – Sotos
          Nov 13 '18 at 13:32







          FYI paste(x, collapse = ', ') is wrapped in toString(x)

          – Sotos
          Nov 13 '18 at 13:32















          thanks for the hint...forgot this function :)

          – Jimbou
          Nov 13 '18 at 13:34





          thanks for the hint...forgot this function :)

          – Jimbou
          Nov 13 '18 at 13:34




          1




          1





          This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

          – RAS
          Nov 13 '18 at 16:28





          This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

          – RAS
          Nov 13 '18 at 16:28



          Popular posts from this blog

          Florida Star v. B. J. F.

          Error while running script in elastic search , gateway timeout

          Adding quotations to stringified JSON object values