graphing a social network graph from data in dataframe [closed]
I have a dataframe of 380 observations of 9 variables. The data represents cooperation between persons working on similar projects. The first column holds the main node, and the other columns hold the persons s/he cooperated with on a project, with each column representing one person. So if the researcher in row 1, column 1 cooperated with five persons, their names will be in five columns, and if the researcher in row 2, column 1 cooperated with three persons, their names will be in three of the other columns. Obviously there will be many empty cells, as not all researchers cooperate with the same number of persons. How do I plot this data as a network graph?
dataframe example:
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
I tried using graph.data.frame, but that only gives connections between the first two columns.
r dataframe igraph social-networking
closed as off-topic by phiver, Cindy Meister, jogo, Rui Barradas, Shiladitya Nov 12 at 2:37
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – phiver, jogo, Shiladitya
If this question can be reworded to fit the rules in the help center, please edit the question.
Are there 380 individuals or 9?
– hrbrmstr
Nov 11 at 18:04
2
Please add a reproducible example.
– arg0naut
Nov 11 at 18:05
380 is the number of research projects. Some projects have only one researcher, while others have as many as nine.
– Duane Edwards
Nov 11 at 18:26
edited Nov 11 at 22:07
asked Nov 11 at 18:02
Duane Edwards
123
1 Answer
accepted
We can try the ggraph package, but the data has to be arranged properly first.
# this is your data
data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
# load the packages
library(dplyr)     # for group_by(), summarise() and filter() used below
library(tidyr)     # to tidy the data
library(ggraph)    # to plot network data with the semantics of ggplot
library(tidygraph) # to work with networks
library(ggrepel)   # to avoid overlapping labels
First, prepare your data. Because you have a parent column, author_1, and child columns (author_2 onward), you can build the pairs of author_1 and author_n one column at a time, so that you end up with a single pair of columns. This also works if the dataset is not hierarchical. You need every parent-child pair from each row, and rbind() stacks the pairs from all the columns into one edge list (easier to do than to explain).
edges <- rbind(
  expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")), # columns 1 and 2: all existing combinations, renamed to a and b
  expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")), # columns 1 and 3: all existing combinations, renamed to a and b
  expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))  # columns 1 and 4: all existing combinations, renamed to a and b
)
edges
# A tibble: 15 x 2
a b
<fct> <fct>
1 Joan Terrence
2 John Joan
3 Kerry Rick
4 Michelle N/A
5 Paul Collin
6 Joan Joan
7 John Terrence
8 Kerry Michelle
9 Michelle Michelle
10 Paul Paul
11 Joan N/A
12 John Michelle
13 Kerry Collin
14 Michelle N/A
15 Paul Phillips
Remember: if you want the N/A entries to appear in the plot, leave the edges table as is; otherwise, append %>% filter(b != 'N/A') to the end of the rbind() call.
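The pairing step above can also be written without naming each column by hand, which may help with the full nine-column data. The following is a minimal base R sketch, not the method used in this answer: it pairs author_1 with every other column row by row, so a pair that occurs in several rows is kept once per row, and it treats the string 'N/A' as missing. The helper name make_edges is made up for illustration.

```r
# Example data from the question.
data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE)

# Hypothetical helper: pair the first column with every other column.
make_edges <- function(df, missing = 'N/A') {
  others <- df[-1]                          # all co-author columns
  edges <- data.frame(
    a = rep(df[[1]], times = ncol(others)), # main author, repeated once per column
    b = unlist(others, use.names = FALSE),  # co-authors, read column by column
    stringsAsFactors = FALSE)
  # Drop placeholder entries and empty cells.
  edges[!is.na(edges$b) & edges$b != missing & edges$b != '', , drop = FALSE]
}

edges_long <- make_edges(data)
print(edges_long)
```

Self-pairs such as Michelle/Michelle come from the example data and are kept here; edges_long[edges_long$a != edges_long$b, ] would drop them if they are not wanted.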
Now we shape the data so it can go into the graph:
# create edges: count how often each (a, b) pair occurs
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())
# create nodes: count how often each researcher appears on either side
nodes <- rbind(data.frame(researcher = edges$a, n = 1),
               data.frame(researcher = edges$b, n = 1)) %>%
  group_by(researcher) %>%
  summarise(n = sum(n))
# replace the names in the edges with their row positions in the nodes table
edges1$a <- match(edges1$a, nodes$researcher)
edges1$b <- match(edges1$b, nodes$researcher)
# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
tidy <- tidy %>%
  activate(edges) %>%
  arrange(desc(weight))
# now the plot: there are several options, here is a basic one
ggraph(tidy, layout = "gem") +
  geom_node_point(aes(size = n)) +                       # node size = frequency
  geom_edge_link(aes(width = weight),                    # edge thickness = frequency
                 arrow = arrow(length = unit(4, 'mm')),  # arrows, if you want them
                 end_cap = circle(3, 'mm'), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  geom_text_repel(aes(x = x, y = y, label = researcher))

The plot should be consistent with the data and with these two tables:
> edges1
# A tibble: 14 x 3
# Groups: a [?]
a b weight
<int> <int> <int>
1 1 1 1
2 1 7 1
3 1 9 1
4 2 1 1
5 2 9 1
6 2 4 1
7 3 6 1
8 3 8 1
9 3 4 1
10 4 7 2
11 4 4 1
12 5 6 1
13 5 5 1
14 5 10 1
> nodes
# A tibble: 10 x 2
researcher n
<fct> <dbl>
1 Joan 5
2 John 3
3 Kerry 3
4 Michelle 6
5 Paul 4
6 Collin 2
7 N/A 3
8 Rick 1
9 Terrence 2
10 Phillips 1
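Since the question started from igraph's graph.data.frame, it may help to note that a two-column edge list like the one built above is the shape that function expects: it reads the first two columns as the endpoints of each edge and any further columns as edge attributes, which is why the wide table only linked author_1 and author_2. Below is a hedged sketch, assuming the igraph package is installed. The small edge list is typed in by hand from the example data with the 'N/A' rows left out, and graph_from_data_frame is the current name of graph.data.frame.

```r
# Edge list typed in from the example data, 'N/A' entries left out.
edges_long <- data.frame(
  a = c('John', 'Kerry', 'Joan', 'Paul',
        'John', 'Kerry', 'Michelle', 'Joan', 'Paul',
        'John', 'Kerry', 'Paul'),
  b = c('Joan', 'Rick', 'Terrence', 'Collin',
        'Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul',
        'Michelle', 'Collin', 'Phillips'),
  stringsAsFactors = FALSE)

# Count how often each (a, b) pair occurs; the count becomes the edge weight.
edges_long$weight <- 1
weighted <- aggregate(weight ~ a + b, data = edges_long, FUN = sum)

# Build the graph only if igraph is available (assumption: it is installed).
if (requireNamespace('igraph', quietly = TRUE)) {
  # First two columns are the endpoints; 'weight' becomes an edge attribute.
  g <- igraph::graph_from_data_frame(weighted, directed = TRUE)
  print(igraph::vcount(g))  # number of distinct researchers
  print(igraph::ecount(g))  # number of distinct pairs
}
```

With igraph attached, plot(g, edge.width = E(g)$weight) would then draw the graph with base graphics instead of ggraph.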
answered Nov 11 at 23:25
s_t
2,9552928