Aggregrate rows that are identical but for a single column [duplicate]

This question already has an answer here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

2 answers

I have a data frame that looks like:

entry color value1 value2

A     blue  1      0

A     green 1      0

B     green 1      0

C     red   0      1

I want to combine the rows that have the same value for all columns other than color for the output:

entry color        value1 value2

A     blue, green  1      0

B     green        1      0

C     red          0      1

I have attempted to use plyr and ddply as suggested in answer to this question https://www.biostars.org/p/167028/. This code produces the correct output for this minimal worked example. However, my actual dataset has 600 columns so this is not a viable solution. Does anyone have any suggestions that will work with larger datasets?

df2 <-ddply(df, .(entry), summarize,

         color=paste(unique(color),collapse=","), 

         value1=paste(unique(value1),collapse=",") ,

         value2= paste(unique(value2),collapse=","))

The example data frame is:

df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))

Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input

First create a vector of column names, minus the column of interest, using its index:

cnames <- names(df)[-2]

Then using the group_by_at function from dplyr:

df %>%

 group_by_at(vars(one_of(cnames))) %>%

 summarise(color=paste(unique(color), collapse=",")) %>% 

 ungroup()

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

marked as duplicate by Ronak Shah r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 13 '18 at 13:41

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

This question already has an answer here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

2 answers

I have a data frame that looks like:

entry color value1 value2

A     blue  1      0

A     green 1      0

B     green 1      0

C     red   0      1

I want to combine the rows that have the same value for all columns other than color for the output:

entry color        value1 value2

A     blue, green  1      0

B     green        1      0

C     red          0      1

df2 <-ddply(df, .(entry), summarize,

         color=paste(unique(color),collapse=","), 

         value1=paste(unique(value1),collapse=",") ,

         value2= paste(unique(value2),collapse=","))

The example data frame is:

df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))

Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input

First create a vector of column names, minus the column of interest, using its index:

cnames <- names(df)[-2]

Then using the group_by_at function from dplyr:

df %>%

 group_by_at(vars(one_of(cnames))) %>%

 summarise(color=paste(unique(color), collapse=",")) %>% 

 ungroup()

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

marked as duplicate by Ronak Shah r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 13 '18 at 13:41

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

This question already has an answer here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

2 answers

I have a data frame that looks like:

entry color value1 value2

A     blue  1      0

A     green 1      0

B     green 1      0

C     red   0      1

I want to combine the rows that have the same value for all columns other than color for the output:

entry color        value1 value2

A     blue, green  1      0

B     green        1      0

C     red          0      1

df2 <-ddply(df, .(entry), summarize,

         color=paste(unique(color),collapse=","), 

         value1=paste(unique(value1),collapse=",") ,

         value2= paste(unique(value2),collapse=","))

The example data frame is:

df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))

Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input

First create a vector of column names, minus the column of interest, using its index:

cnames <- names(df)[-2]

Then using the group_by_at function from dplyr:

df %>%

 group_by_at(vars(one_of(cnames))) %>%

 summarise(color=paste(unique(color), collapse=",")) %>% 

 ungroup()

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

This question already has an answer here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

2 answers

I have a data frame that looks like:

entry color value1 value2

A     blue  1      0

A     green 1      0

B     green 1      0

C     red   0      1

I want to combine the rows that have the same value for all columns other than color for the output:

entry color        value1 value2

A     blue, green  1      0

B     green        1      0

C     red          0      1

df2 <-ddply(df, .(entry), summarize,

         color=paste(unique(color),collapse=","), 

         value1=paste(unique(value1),collapse=",") ,

         value2= paste(unique(value2),collapse=","))

The example data frame is:

df <- data.frame("entry" = c("A", "A", "B", "C"),"color" = c("blue", "green", "green", "red"), "value1" = c(1,1,1,0), "value2" = c(0,0,0,1))

Edit: to solve the scalability problem I adapted the below answer and Group by multiple columns in dplyr, using string vector input

First create a vector of column names, minus the column of interest, using its index:

cnames <- names(df)[-2]

Then using the group_by_at function from dplyr:

df %>%

 group_by_at(vars(one_of(cnames))) %>%

 summarise(color=paste(unique(color), collapse=",")) %>% 

 ungroup()

This question already has an answer here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group

2 answers

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

edited Nov 13 '18 at 16:25

asked Nov 13 '18 at 13:25

RAS

446

asked Nov 13 '18 at 13:25

RAS

446

asked Nov 13 '18 at 13:25

RAS

446

marked as duplicate by Ronak Shah r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 13 '18 at 13:41

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

marked as duplicate by Ronak Shah r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 13 '18 at 13:41

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

1 Answer
1

active

oldest

votes

You can try a tidyverse

library(tidyverse)

df %>% 

  group_by(entry, value1, value2) %>% 

  summarise(color=paste(unique(color), collapse=",")) %>% 

  ungroup()

# A tibble: 3 x 4

  entry value1 value2 color     

  <fct>  <dbl>  <dbl> <chr>     

1 A          1      0 blue,green

2 B          1      0 green     

3 C          0      1 red

Remove the unique if you want to list duplicates as well like this
summarise(color=toString(color))

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

3

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

1

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

add a comment |

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

You can try a tidyverse

library(tidyverse)

df %>% 

  group_by(entry, value1, value2) %>% 

  summarise(color=paste(unique(color), collapse=",")) %>% 

  ungroup()

# A tibble: 3 x 4

  entry value1 value2 color     

  <fct>  <dbl>  <dbl> <chr>     

1 A          1      0 blue,green

2 B          1      0 green     

3 C          0      1 red

Remove the unique if you want to list duplicates as well like this
summarise(color=toString(color))

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

3

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

1

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

add a comment |

You can try a tidyverse

library(tidyverse)

df %>% 

  group_by(entry, value1, value2) %>% 

  summarise(color=paste(unique(color), collapse=",")) %>% 

  ungroup()

# A tibble: 3 x 4

  entry value1 value2 color     

  <fct>  <dbl>  <dbl> <chr>     

1 A          1      0 blue,green

2 B          1      0 green     

3 C          0      1 red

Remove the unique if you want to list duplicates as well like this
summarise(color=toString(color))

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

3

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

1

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

add a comment |

You can try a tidyverse

library(tidyverse)

df %>% 

  group_by(entry, value1, value2) %>% 

  summarise(color=paste(unique(color), collapse=",")) %>% 

  ungroup()

# A tibble: 3 x 4

  entry value1 value2 color     

  <fct>  <dbl>  <dbl> <chr>     

1 A          1      0 blue,green

2 B          1      0 green     

3 C          0      1 red

Remove the unique if you want to list duplicates as well like this
summarise(color=toString(color))

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

You can try a tidyverse

library(tidyverse)

df %>% 

  group_by(entry, value1, value2) %>% 

  summarise(color=paste(unique(color), collapse=",")) %>% 

  ungroup()

# A tibble: 3 x 4

  entry value1 value2 color     

  <fct>  <dbl>  <dbl> <chr>     

1 A          1      0 blue,green

2 B          1      0 green     

3 C          0      1 red

Remove the unique if you want to list duplicates as well like this
summarise(color=toString(color))

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

edited Nov 13 '18 at 13:32

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

answered Nov 13 '18 at 13:31

Jimbou

9,72611230

3

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

1

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

add a comment |

3

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

1

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

FYI paste(x, collapse = ', ') is wrapped in toString(x)

– Sotos
Nov 13 '18 at 13:32

thanks for the hint...forgot this function :)

– Jimbou
Nov 13 '18 at 13:34

This does work for the small number of columns in the minimal example, but was not scalable for 600 columns. Instead selecting a vector of column names and group_by_at solved the scalability problem.

– RAS
Nov 13 '18 at 16:28

add a comment |

This page is only for reference, If you need detailed information, please check here

zO 3IQAryr68ALVv,tTYEkHIT4m l,l,hE6aWgDCIwmACKp3rCPV7zdC0,mO,HgdoK667IkrZ fND,yBW3OGC8jv

搜尋此網誌

Ndtyjky