R - return row indexes using grepl

I would like to return the row index for each text match. The matching is done with the adist function in R: the code below creates some text data, calculates the edit distances, and returns the 5 best matches.

name <- c("holiday inn", "geico", "zgf", "morton phillips")
address <- c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
             "811 quincy st washington dc", "1911 1st st rockville md")
# keep strings as character (the default only from R 4.0.0 on)
source1 <- data.frame(name, address, stringsAsFactors = FALSE)

name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long diner",
          "joes crag shack", "mike lowry place", "holiday inn", "zummer")
name2 <- c(NA, NA, NA, NA, NA, NA, "hi express", "zummer gunsul frasca")
address <- c("2 reads way new castle de", "248 w 4th st newark de",
             "1100 21st st nw washington dc", "1804 w 5th st wilmington de",
             "1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
             "400 lafayette pl tupelo ms", "811 quincy st washington dc")
source2 <- data.frame(name, name2, address, stringsAsFactors = FALSE)

# calculate edit distance for name and address (rows = source1, columns = source2)
dist.mat.nm <- adist(source1$name, source2$name, partial = TRUE, ignore.case = TRUE)
dist.mat.ad <- adist(source1$address, source2$address, partial = TRUE, ignore.case = TRUE)

# assemble data frame of the 5 closest source2 names for each source1 name
imat <- apply(dist.mat.nm, 1, order)[1:5, ]
top.nm <- data.frame(name = source1$name)
tmp <- apply(imat, 1, function(i) source2$name[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.nm <- cbind(top.nm, tmp)

# same for addresses
imat <- apply(dist.mat.ad, 1, order)[1:5, ]
top.ad <- data.frame(address = source1$address)
tmp <- apply(imat, 1, function(i) source2$address[i])
colnames(tmp) <- paste("top", 1:5, sep = ".")
top.ad <- cbind(top.ad, tmp)
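The assembly step relies on the shapes that adist() and apply() produce. adist(x, y) returns a length(x) by length(y) matrix, and apply(m, 1, order) stores one ordering per column, so column i of imat lists the candidate source2 rows for source1 row i, best first. The sketch below checks this on made-up toy strings (x and y are hypothetical, not from the question); it also shows that with partial = TRUE a query occurring verbatim inside a candidate gets distance 0.

```r
# Toy inputs (hypothetical): 2 query strings, 3 candidate strings
x <- c("geico", "kitten")
y <- c("227 geico plaza", "sitting", "zummer")

# Full-string generalized Levenshtein distances: one row per x, one column per y
d.full <- adist(x, y)
# partial = TRUE lets each x match any substring of each y
d.part <- adist(x, y, partial = TRUE)

# apply(..., 1, order) returns one ordering per *column*: a 3 x 2 matrix here
ord <- apply(d.part, 1, order)

dim(d.full)   # 2 3
d.full[2, 2]  # 3: "kitten" -> "sitting"
d.part[1, 1]  # 0: "geico" occurs verbatim in "227 geico plaza"
ord[1, ]      # 1 2: best candidate column for each element of x
```

Reading imat column-wise like this is what makes imat[1, ] the row index of the best match for every source1 row.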

What I would like to do is: 1. for each top.nm and top.ad column, add a column index.match holding the source2 row index where the match was found, and 2. add a column distance holding the edit distance for that match (i.e. the value returned by adist).

So, for the top.1 column of top.nm, the index.match column would be c(7, 6, 4, 1).
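One possible approach, sketched under the assumption that the question's data and adist() settings are kept: the row indices are already in imat (imat[k, ] holds the source2 row of the k-th best match for every source1 row), and indexing the distance matrix with a two-column (row, column) matrix pulls out the matching adist values, so no grepl is needed. The helper add_match_cols() and the column names top.k.index.match / top.k.distance are hypothetical choices for illustration, not part of any package.

```r
# Sample data from the question (names and addresses only)
source1 <- data.frame(
  name    = c("holiday inn", "geico", "zgf", "morton phillips"),
  address = c("400 lafayette pl tupelo ms", "227 geico plaza chevy chase md",
              "811 quincy st washington dc", "1911 1st st rockville md"),
  stringsAsFactors = FALSE
)
source2 <- data.frame(
  name    = c("williams sonoma", "mamas bbq", "davis polk", "hop a long diner",
              "joes crag shack", "mike lowry place", "holiday inn", "zummer"),
  address = c("2 reads way new castle de", "248 w 4th st newark de",
              "1100 21st st nw washington dc", "1804 w 5th st wilmington de",
              "1208 kenwood parkway holdridge nb", "4203 ocean drive miami fl",
              "400 lafayette pl tupelo ms", "811 quincy st washington dc"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each query string keep the k closest candidates,
# plus the candidate row index and edit distance of each match.
add_match_cols <- function(query, candidates, label = "query", k = 5) {
  # rows = query strings, columns = candidate strings
  d <- adist(query, candidates, partial = TRUE, ignore.case = TRUE)
  # imat[j, i] = column (candidate row) of the j-th best match for query i
  imat <- apply(d, 1, order)[seq_len(k), , drop = FALSE]
  out <- setNames(data.frame(query, stringsAsFactors = FALSE), label)
  rows <- seq_along(query)
  for (j in seq_len(k)) {
    idx <- imat[j, ]
    out[[paste0("top.", j)]] <- candidates[idx]
    out[[paste0("top.", j, ".index.match")]] <- idx
    # matrix indexing: picks d[rows[i], idx[i]] for every i
    out[[paste0("top.", j, ".distance")]] <- d[cbind(rows, idx)]
  }
  out
}

top.nm <- add_match_cols(source1$name, source2$name, label = "name")
top.ad <- add_match_cols(source1$address, source2$address, label = "address")
top.nm[, c("name", "top.1", "top.1.index.match", "top.1.distance")]
```

Keeping the three columns for each rank side by side is a design choice; the same idx and d[cbind(rows, idx)] expressions can be added to the question's existing top.nm and top.ad instead.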

I have tried using which(grepl(paste(source1$name, collapse = "|"), source2$name, fixed = FALSE)) to return the row indexes, but for some reason it only returns a single value, not a whole vector. Any help or advice would be appreciated.
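The grepl() attempt returns a single index because it does exact regular-expression matching, not fuzzy matching: paste(..., collapse = "|") builds one pattern, "holiday inn|geico|zgf|morton phillips", and grepl() reports which source2 names contain any of those alternatives verbatim. Only "holiday inn" (row 7) does. The sketch below reproduces this with the question's names (copied into plain vectors for brevity) and, for contrast, runs one exact grep() per source1 name.

```r
source1.name <- c("holiday inn", "geico", "zgf", "morton phillips")
source2.name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long diner",
                  "joes crag shack", "mike lowry place", "holiday inn", "zummer")

# One combined pattern: TRUE where a source2 name contains any source1 name verbatim
pattern <- paste(source1.name, collapse = "|")
hits <- which(grepl(pattern, source2.name))
hits               # 7: only "holiday inn" appears exactly

# One exact search per source1 name gives a result per name,
# but exact matching still finds nothing for the other three
per.name <- lapply(source1.name, function(p) grep(p, source2.name, fixed = TRUE))
lengths(per.name)  # 1 0 0 0
```

Fuzzy matches such as "zgf" vs "zummer" only show up through the edit distances, which is why the indices have to come from the adist results.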

asked Nov 14 '18 at 18:06 by jvalenti


I think you want to try this: sapply(top.nm$top.1, function(x){grep(x, source2$name)})

– Dave2e
Nov 14 '18 at 19:15
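Because each top.k column holds values copied out of source2$name, an exact lookup with match() should recover the same row indices as the grep() call in the comment, without treating the names as regular expressions. grep() can return several indices (or none) for a value, in which case sapply() would return a list, while match() always returns the first exact position. A minimal sketch, assuming top.1 is built from the best adist match as in the question:

```r
# Sample names from the question
source1.name <- c("holiday inn", "geico", "zgf", "morton phillips")
source2.name <- c("williams sonoma", "mamas bbq", "davis polk", "hop a long diner",
                  "joes crag shack", "mike lowry place", "holiday inn", "zummer")

# Rebuild the best match per source1 name, as top.nm$top.1 does in the question
d <- adist(source1.name, source2.name, partial = TRUE, ignore.case = TRUE)
imat <- apply(d, 1, order)
top.1 <- source2.name[imat[1, ]]

# The comment's approach: each value is used as a regex; result is a named vector
via.grep <- sapply(top.1, function(x) grep(x, source2.name))
# Exact lookup: first position of each value in source2.name, no regex involved
via.match <- match(top.1, source2.name)

unname(via.grep)
via.match
```

Here both agree with imat[1, ], since no source2 name contains another or any regex metacharacters; match() avoids relying on that.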

r levenshtein-distance grepl stringdist
