Plotting a social network graph from data in a data frame [closed]























I have a data frame of 380 observations of 9 variables. The data represents cooperation between people working on similar projects. The first column holds the main node, and the other columns hold the people he or she cooperated with on a project, with each column representing one person. So if the researcher in row 1, column 1 cooperated with five people, their names will be in five columns, and if the researcher in row 2, column 1 cooperated with three people, their names will be in three columns. Many cells are therefore empty, because not all researchers cooperate with the same number of people. How do I plot this data as a network graph?



Example data frame:

    data <- data.frame(
      author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
      author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
      author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
      author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))


I tried using graph.data.frame(), but it only creates connections between the first two columns.

























closed as off-topic by phiver, Cindy Meister, jogo, Rui Barradas, Shiladitya Nov 12 at 2:37


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – phiver, jogo, Shiladitya

If this question can be reworded to fit the rules in the help center, please edit the question.













  • Are there 380 individuals or 9?
    – hrbrmstr
    Nov 11 at 18:04










    Please add a reproducible example.
    – arg0naut
    Nov 11 at 18:05










  • 380 is the number of research projects. Some projects have only one researcher, while others have as many as nine.
    – Duane Edwards
    Nov 11 at 18:26






















r dataframe igraph social-networking














edited Nov 11 at 22:07

























asked Nov 11 at 18:02









Duane Edwards













1 Answer

















Accepted answer










We can try the ggraph package, but first the data has to be arranged properly.



    # your data
    data <- data.frame(
      author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
      author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
      author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
      author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

    # load the packages
    library(tidyr)     # to tidy the data
    library(dplyr)     # for %>%, group_by(), summarise() and filter()
    library(ggraph)    # to plot network data with ggplot2 semantics
    library(tidygraph) # to work with networks
    library(ggrepel)   # to avoid overlapping labels


First, prepare the data. Since you have a "parent" column, author_1, and "child" columns, you can pair author_1 with each author_n column in turn, giving both columns the same names (a and b) so the pieces can be stacked into just two columns. This also works if your data set is not hierarchical. The result should contain every parent-child pair from each row, and rbind() stacks all these combinations together (easier to do than to explain).



    edges <- rbind(
      # for each column pair, take the combinations present in the data and name them a and b
      expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")),  # author_1 with author_2
      expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")),  # author_1 with author_3
      expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))   # author_1 with author_4
    )
    edges
    # A tibble: 15 x 2
       a        b
       <fct>    <fct>
     1 Joan     Terrence
     2 John     Joan
     3 Kerry    Rick
     4 Michelle N/A
     5 Paul     Collin
     6 Joan     Joan
     7 John     Terrence
     8 Kerry    Michelle
     9 Michelle Michelle
    10 Paul     Paul
    11 Joan     N/A
    12 John     Michelle
    13 Kerry    Collin
    14 Michelle N/A
    15 Paul     Phillips


If you want to plot the N/A entries too, leave this as is; otherwise, append %>% filter(b != 'N/A') to the rbind() result.
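With nine columns, writing one expand() line per column gets repetitive. A minimal base R sketch, assuming the first column is the main author and every other column is a co-author slot, builds the same kind of a/b edge list for any number of columns and drops empty cells (NA, "" or the 'N/A' placeholder). Unlike expand(), which returns only distinct combinations, it keeps one row per project, so a pair that worked together on several projects is counted each time. The helper name make_edges() and its missing argument are hypothetical, not part of any package.

```r
# Hypothetical helper: one row per (main author, co-author) cell,
# built row by row so repeated collaborations are kept.
make_edges <- function(df, missing = c("N/A", "")) {
  main <- as.character(df[[1]])
  # pair the first column with each of the remaining columns
  pieces <- lapply(names(df)[-1], function(col) {
    data.frame(a = main, b = as.character(df[[col]]),
               stringsAsFactors = FALSE)
  })
  edges <- do.call(rbind, pieces)
  # drop empty cells: real NAs plus the placeholder strings in `missing`
  keep <- !is.na(edges$b) & !(edges$b %in% missing)
  edges <- edges[keep, ]
  rownames(edges) <- NULL
  edges
}

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

edges <- make_edges(data)
nrow(edges)  # 12 (15 pairs minus the 3 'N/A' cells)
```

The result has the same a and b columns as the expand()/rbind() version, so it can be used in place of edges in the next step.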



Now we arrange the data so it can be put into the graph:



    # create edges: one row per distinct (a, b) pair, weighted by how often it occurs
    edges1 <- edges %>%
      group_by(a, b) %>%
      summarise(weight = n())

    # create nodes: count how many times each researcher appears as an edge endpoint
    nodes <- rbind(data.frame(researcher = edges$a, n = 1),
                   data.frame(researcher = edges$b, n = 1)) %>%
      group_by(researcher) %>%
      summarise(n = sum(n))

    # replace the names in edges1 with the matching row index in nodes
    edges1$a <- match(edges1$a, nodes$researcher)
    edges1$b <- match(edges1$b, nodes$researcher)

    # declare the data as graph data
    tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
    tidy <- tidy %>%
      activate(edges) %>%
      arrange(desc(weight))

    # now the plot: there are several options, here is a basic one
    ggraph(tidy, layout = "gem") +
      geom_node_point(aes(size = n)) +                       # node size = frequency
      geom_edge_link(aes(width = weight),                    # edge thickness = frequency
                     arrow = arrow(length = unit(4, 'mm')),  # arrows, if you want them
                     end_cap = circle(3, 'mm'), alpha = 0.8) +
      scale_edge_width(range = c(0.2, 2)) +
      geom_text_repel(aes(x = x, y = y, label = researcher))


[Image: the resulting network graph]



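This should be consistent with the data and with these intermediate tables (printed in an older R session where strings became factors, so the row order and indices may differ in R 4.0 or later):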



    > edges1
    # A tibble: 14 x 3
    # Groups:   a [?]
           a     b weight
       <int> <int>  <int>
     1     1     1      1
     2     1     7      1
     3     1     9      1
     4     2     1      1
     5     2     9      1
     6     2     4      1
     7     3     6      1
     8     3     8      1
     9     3     4      1
    10     4     7      2
    11     4     4      1
    12     5     6      1
    13     5     5      1
    14     5    10      1
    > nodes
    # A tibble: 10 x 2
       researcher     n
       <fct>      <dbl>
     1 Joan           5
     2 John           3
     3 Kerry          3
     4 Michelle       6
     5 Paul           4
     6 Collin         2
     7 N/A            3
     8 Rick           1
     9 Terrence       2
    10 Phillips       1
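As for the graph.data.frame() attempt in the question: in current igraph the function is called graph_from_data_frame() (graph.data.frame() is the older name). It uses the first two columns as the endpoints of each edge and stores any further columns as edge attributes, which is why only author_1 and author_2 were linked. A minimal igraph-only sketch, assuming an undirected collaboration network and that 'N/A' or empty strings mark empty cells, reshapes the table to two columns first. Merging repeated pairs and dropping self-loops with simplify() is a choice made here, not part of the answer above.

```r
library(igraph)

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

# Long format: repeat the main author once per co-author column and stack
# the co-author columns (as.character guards against factor columns,
# which data.frame() created by default before R 4.0).
edges <- data.frame(
  a = rep(as.character(data[[1]]), ncol(data) - 1),
  b = unlist(lapply(data[-1], as.character), use.names = FALSE),
  stringsAsFactors = FALSE)

# drop empty co-author cells
edges <- edges[!is.na(edges$b) & !(edges$b %in% c("N/A", "")), ]

# undirected collaboration graph: the first two columns are the endpoints
g <- graph_from_data_frame(edges, directed = FALSE)

# give every edge weight 1, then merge repeated pairs (summing the weights)
# and remove self-loops such as Michelle-Michelle
E(g)$weight <- 1
g <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE,
              edge.attr.comb = list(weight = "sum"))

plot(g, edge.width = E(g)$weight, vertex.label.cex = 0.8)
```

Because every project cell becomes its own edge before simplify(), the summed weight counts how many projects a pair shared; with the full 380-project data, pairs that collaborated repeatedly get thicker edges.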





        answered Nov 11 at 23:25









        s_t















