Regex to change all text to lowercase but leave out parts of text that start and end in a specific way
Is there a way to change all text to lowercase except the words that start with a specific combination of letters ("ABC") and end with a whitespace character (dots, hyphens, and underscores can appear within them)?
Preserve capitalization in words like "ABCkjkJ.90_1 " or "ABC-12_OLL " but lowercase everything else?
Find:
(I have no idea)
[^ABC][\s]$
Replace with:
\L$1
Also, how should I delete all punctuation from the rest of the text (not the ones starting with ABC)?
regex language-agnostic
asked Nov 16 '18 at 4:17 (edited Nov 16 '18 at 6:55)
– Antidisestablishmentarianism
Regexes are not language agnostic. \L and other case-changing operators are not supported in many regex libraries. Other features you may need for this task may differ from regex library to regex library.
– Wiktor Stribiżew
Nov 16 '18 at 6:11
1 Answer
The problem boils down to matching words that don't start with ABC. Because words in your string can contain dots and hyphens, which aren't word characters, we unfortunately can't use \b to determine the start of a word. Instead, match the preceding space (or the beginning of the string) with
(?: |^)
then use a negative lookahead for ABC (written in uppercase unless the match is case-insensitive), and match as many word characters, dots, or hyphens as possible:
(?: |^)(?!ABC)[\w.-]*
Then, lowercase every full match.
https://regex101.com/r/QSShDu/1
Example, for input:
Baz Buzz ABCkjkJ.90_1 ABC-12_OLL Foo Bar
you get
baz buzz ABCkjkJ.90_1 ABC-12_OLL foo bar
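Not every regex library supports case conversion in the replacement string, so "lowercase every full match" often has to happen in the host language. The following is a minimal Python sketch of that step, assuming "ABC" is a literal, case-sensitive prefix; the function name lowercase_except_abc is made up for illustration.

```python
import re

# Assumption: protected words start with the literal, case-sensitive prefix "ABC".
# The pattern is the one from the answer: a leading space (or start of string),
# a negative lookahead for the prefix, then word characters, dots, or hyphens.
NOT_ABC_WORD = re.compile(r"(?: |^)(?!ABC)[\w.-]*")

def lowercase_except_abc(text: str) -> str:
    """Lowercase every match; words starting with ABC are never matched."""
    return NOT_ABC_WORD.sub(lambda m: m.group(0).lower(), text)

print(lowercase_except_abc("Baz Buzz ABCkjkJ.90_1 ABC-12_OLL Foo Bar"))
# baz buzz ABCkjkJ.90_1 ABC-12_OLL foo bar
```

Python's re module has no \L operator, which is why the lowercasing is done in a callback passed to sub. Note that a word glued to preceding punctuation (for example the "Bar" in "Foo,Bar") is not preceded by a space and so is left unchanged by this pattern.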
If the ABC part always occurs at the beginning of the string, then it's a lot easier: just capture the first word in a group, then capture the rest of the string in a second group, and lowercase the rest of the string:
([\w.-]*)(.+)
replace with
\1\L\2
https://regex101.com/r/QSShDu/2
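For engines without \L in the replacement, the same two-group idea can be sketched in Python with a callback. This assumes each line starts with the protected token and that the input may contain several lines; the name lowercase_after_first_token is hypothetical, and (.*) is used instead of (.+) so that a line consisting only of the token still matches.

```python
import re

# Group 1: the leading token (word characters, dots, hyphens).
# Group 2: the rest of the line, which is the part to lowercase.
# re.MULTILINE makes ^ and $ apply to every line of the input.
FIRST_TOKEN_AND_REST = re.compile(r"^([\w.-]*)(.*)$", re.MULTILINE)

def lowercase_after_first_token(text: str) -> str:
    """Keep the first token of each line as-is and lowercase the remainder."""
    return FIRST_TOKEN_AND_REST.sub(
        lambda m: m.group(1) + m.group(2).lower(), text
    )

sample = "ABC-12_OLL Some Text Here\nABCkjkJ.90_1 Another LINE"
print(lowercase_after_first_token(sample))
# ABC-12_OLL some text here
# ABCkjkJ.90_1 another line
```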
answered Nov 16 '18 at 4:27 (edited Nov 16 '18 at 4:50)
– CertainPerformance
Thanks! If that matters, the "ABC***" string is always at the beginning of the line. Each line invariably starts with "ABC" and gibberish characters that need to maintain their capitalization, but the rest of the line contains the text that needs to be lowercase.
– Antidisestablishmentarianism
Nov 16 '18 at 4:46
Thank you so much! One last thing, if I'm not getting too impertinent: how do I delete all punctuation except the apostrophe from the rest of the string? Replace something like ([\w.-]*)(\W\S)? And what do I replace it with?
– Antidisestablishmentarianism
Nov 16 '18 at 6:40
Put every punctuation character you want to remove in a character set, then replace every occurrence with the empty string, e.g. [._-]
– CertainPerformance
Nov 16 '18 at 7:17
I decided to forget about the apostrophe. I tried ([\w.-]*)([[:punct:]]) replaced with \1[._-]\2. That didn't work.
– Antidisestablishmentarianism
Nov 16 '18 at 7:28
You shouldn't try matching word characters; only match the punctuation marks you want to remove, in a character set, e.g. [._-] to remove all dots, underscores, and dashes.
– CertainPerformance
Nov 16 '18 at 7:33
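The character-set advice in the comments above can be combined with the "first token stays untouched" requirement. Below is a rough Python sketch under several assumptions: "punctuation" means ASCII punctuation as listed in string.punctuation, the apostrophe is kept, the protected token is everything up to the first whitespace on each line, and the name clean_line_tails is invented for this example.

```python
import re
import string

# Assumption: remove ASCII punctuation only, and keep the apostrophe.
# re.escape makes characters such as ], ^, - and \ safe inside the set.
PUNCT_TO_REMOVE = re.compile(
    "[" + re.escape(string.punctuation.replace("'", "")) + "]"
)

# Group 1: the protected first token; group 2: the rest of the line.
LINE = re.compile(r"^(\S+)(.*)$", re.MULTILINE)

def clean_line_tails(text: str) -> str:
    """Strip punctuation from, and lowercase, everything after the first token."""
    def fix(match: re.Match) -> str:
        head, tail = match.group(1), match.group(2)
        return head + PUNCT_TO_REMOVE.sub("", tail).lower()
    return LINE.sub(fix, text)

print(clean_line_tails("ABC-12_OLL Hello, World! It's a re-test."))
# ABC-12_OLL hello world it's a retest
```

Only the tail of each line is passed through the punctuation pattern, so the dots, hyphens, and underscores inside the leading ABC token survive.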