R - return row indexes using grepl












0















I would like to return the row index for a text match. The text match is done via the adist function in R. The code creates some text data, calculates the edit distances and returns the 5 best matches.



name <- c("holiday inn", "geico", "zgf", "morton phillips")
address <- c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
"811 quincy st washington dc", "1911 1st st rockville md")

source1 <- data.frame(name, address)

name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long
diner","joes crag shack", "mike lowry place", "holiday inn", "zummer")

name2 <- c(NA, NA, NA, NA, NA, NA, "hi express", "zummer gunsul frasca")
address <- c("2 reads way new castle de", "248 w 4th st newark de",
"1100 21st st nw washington dc", "1804 w 5th st wilmington de",
"1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
"400 lafayette pl tupelo ms", "811 quincy st washington dc")
source2 <- data.frame(name, name2, address)

#calculate edit distance for name and address
dist.mat.nm <- adist(source1$name, source2$name, partial = T, ignore.case = TRUE)
dist.mat.ad <- adist(source1$address, source2$address, partial = TRUE, ignore.case = TRUE)

#assemble data frame
imat <- apply(dist.mat.nm, 1, order)[1:5, ]
top.nm <- data.frame(name = source1$name)
tmp <- apply(imat, 1, function(i) source2$name[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.nm <- cbind(top.nm, tmp)

imat <- apply(dist.mat.ad, 1, order)[1:5, ]
top.ad <- data.frame(address = source1$address)
tmp <- apply(imat, 1, function(i) source2$address[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.ad <- cbind(top.ad, tmp)


What I would like to do is, 1. for each top.nm and top.ad column, add a column index.match for the row index where the match was found and 2. add a column distance containing the edit distance for that match (i.e. the value of adist).



So, for top.nm.1, the index.match column would be: c(7, 6, 4, 1).



I have tried using which(grepl(paste(source1$name, collapse = "|"), source2$name, fixed = F)) to return row indexes but for some reason it only returns a single value, not a whole vector. Any help or advice would be appreciated.










share|improve this question


















  • 1





    I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

    – Dave2e
    Nov 14 '18 at 19:15
















0















I would like to return the row index for a text match. The text match is done via the adist function in R. The code creates some text data, calculates the edit distances and returns the 5 best matches.



name <- c("holiday inn", "geico", "zgf", "morton phillips")
address <- c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
"811 quincy st washington dc", "1911 1st st rockville md")

source1 <- data.frame(name, address)

name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long
diner","joes crag shack", "mike lowry place", "holiday inn", "zummer")

name2 <- c(NA, NA, NA, NA, NA, NA, "hi express", "zummer gunsul frasca")
address <- c("2 reads way new castle de", "248 w 4th st newark de",
"1100 21st st nw washington dc", "1804 w 5th st wilmington de",
"1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
"400 lafayette pl tupelo ms", "811 quincy st washington dc")
source2 <- data.frame(name, name2, address)

#calculate edit distance for name and address
dist.mat.nm <- adist(source1$name, source2$name, partial = T, ignore.case = TRUE)
dist.mat.ad <- adist(source1$address, source2$address, partial = TRUE, ignore.case = TRUE)

#assemble data frame
imat <- apply(dist.mat.nm, 1, order)[1:5, ]
top.nm <- data.frame(name = source1$name)
tmp <- apply(imat, 1, function(i) source2$name[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.nm <- cbind(top.nm, tmp)

imat <- apply(dist.mat.ad, 1, order)[1:5, ]
top.ad <- data.frame(address = source1$address)
tmp <- apply(imat, 1, function(i) source2$address[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.ad <- cbind(top.ad, tmp)


What I would like to do is, 1. for each top.nm and top.ad column, add a column index.match for the row index where the match was found and 2. add a column distance containing the edit distance for that match (i.e. the value of adist).



So, for top.nm.1, the index.match column would be: c(7, 6, 4, 1).



I have tried using which(grepl(paste(source1$name, collapse = "|"), source2$name, fixed = F)) to return row indexes but for some reason it only returns a single value, not a whole vector. Any help or advice would be appreciated.










share|improve this question


















  • 1





    I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

    – Dave2e
    Nov 14 '18 at 19:15














0












0








0








I would like to return the row index for a text match. The text match is done via the adist function in R. The code creates some text data, calculates the edit distances and returns the 5 best matches.



name <- c("holiday inn", "geico", "zgf", "morton phillips")
address <- c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
"811 quincy st washington dc", "1911 1st st rockville md")

source1 <- data.frame(name, address)

name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long
diner","joes crag shack", "mike lowry place", "holiday inn", "zummer")

name2 <- c(NA, NA, NA, NA, NA, NA, "hi express", "zummer gunsul frasca")
address <- c("2 reads way new castle de", "248 w 4th st newark de",
"1100 21st st nw washington dc", "1804 w 5th st wilmington de",
"1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
"400 lafayette pl tupelo ms", "811 quincy st washington dc")
source2 <- data.frame(name, name2, address)

#calculate edit distance for name and address
dist.mat.nm <- adist(source1$name, source2$name, partial = T, ignore.case = TRUE)
dist.mat.ad <- adist(source1$address, source2$address, partial = TRUE, ignore.case = TRUE)

#assemble data frame
imat <- apply(dist.mat.nm, 1, order)[1:5, ]
top.nm <- data.frame(name = source1$name)
tmp <- apply(imat, 1, function(i) source2$name[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.nm <- cbind(top.nm, tmp)

imat <- apply(dist.mat.ad, 1, order)[1:5, ]
top.ad <- data.frame(address = source1$address)
tmp <- apply(imat, 1, function(i) source2$address[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.ad <- cbind(top.ad, tmp)


What I would like to do is, 1. for each top.nm and top.ad column, add a column index.match for the row index where the match was found and 2. add a column distance containing the edit distance for that match (i.e. the value of adist).



So, for top.nm.1, the index.match column would be: c(7, 6, 4, 1).



I have tried using which(grepl(paste(source1$name, collapse = "|"), source2$name, fixed = F)) to return row indexes but for some reason it only returns a single value, not a whole vector. Any help or advice would be appreciated.










share|improve this question














I would like to return the row index for a text match. The text match is done via the adist function in R. The code creates some text data, calculates the edit distances and returns the 5 best matches.



name <- c("holiday inn", "geico", "zgf", "morton phillips")
address <- c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
"811 quincy st washington dc", "1911 1st st rockville md")

source1 <- data.frame(name, address)

name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long
diner","joes crag shack", "mike lowry place", "holiday inn", "zummer")

name2 <- c(NA, NA, NA, NA, NA, NA, "hi express", "zummer gunsul frasca")
address <- c("2 reads way new castle de", "248 w 4th st newark de",
"1100 21st st nw washington dc", "1804 w 5th st wilmington de",
"1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
"400 lafayette pl tupelo ms", "811 quincy st washington dc")
source2 <- data.frame(name, name2, address)

#calculate edit distance for name and address
dist.mat.nm <- adist(source1$name, source2$name, partial = T, ignore.case = TRUE)
dist.mat.ad <- adist(source1$address, source2$address, partial = TRUE, ignore.case = TRUE)

#assemble data frame
imat <- apply(dist.mat.nm, 1, order)[1:5, ]
top.nm <- data.frame(name = source1$name)
tmp <- apply(imat, 1, function(i) source2$name[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.nm <- cbind(top.nm, tmp)

imat <- apply(dist.mat.ad, 1, order)[1:5, ]
top.ad <- data.frame(address = source1$address)
tmp <- apply(imat, 1, function(i) source2$address[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.ad <- cbind(top.ad, tmp)


What I would like to do is, 1. for each top.nm and top.ad column, add a column index.match for the row index where the match was found and 2. add a column distance containing the edit distance for that match (i.e. the value of adist).



So, for top.nm.1, the index.match column would be: c(7, 6, 4, 1).



I have tried using which(grepl(paste(source1$name, collapse = "|"), source2$name, fixed = F)) to return row indexes but for some reason it only returns a single value, not a whole vector. Any help or advice would be appreciated.







r levenshtein-distance grepl stringdist






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 14 '18 at 18:06









jvalentijvalenti

143113




143113








  • 1





    I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

    – Dave2e
    Nov 14 '18 at 19:15














  • 1





    I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

    – Dave2e
    Nov 14 '18 at 19:15








1




1





I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

– Dave2e
Nov 14 '18 at 19:15





I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

– Dave2e
Nov 14 '18 at 19:15












0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306344%2fr-return-row-indexes-using-grepl%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306344%2fr-return-row-indexes-using-grepl%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Florida Star v. B. J. F.

Danny Elfman

Lugert, Oklahoma