How to correctly write regular expression to match ASCII control chars
I would like to create a regular expression in Elisp (in the standard 'read' form) that matches extended ASCII chars the same way PCRE does:
^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$
So, I'm currently confused about \x7f-\xff. Is there a way to set a range using something like \xhh?
regular-expressions
I think the answer depends on whether you're matching against unibyte or multibyte strings. Do you think À (which is undefined in ASCII, 0xC0 in latin-1 and Unicode, but encoded as 0xC380 in UTF-8) falls into the range 0x7F-0xFF?
– npostavs
6 hours ago
I think so. At least PCRE matched À as a char in 0x7F-0xFF range. I need the same behavior for standard Elisp regular expression.
– serghei
5 hours ago
And as I can see, À is defined in ASCII: ascii-code.com. 0xC0 is between 0x7F and 0xFF.
– serghei
5 hours ago
asked 7 hours ago by serghei, edited 6 hours ago
1 Answer
You can use -ÿ instead of \x7f-\xff. That first character, which Stack Exchange prints as a space, is DEL, which has code point 127 (decimal), #o177 (octal), and #x7f (hexadecimal).
That is, you can just insert the characters themselves in the regexp pattern.
One way to input such characters is to use C-x 8 RET. To search for any char in the range \x7f through \xff, you would type this at the C-M-s prompt (without the spaces):
[ C-x 8 RET # x 7 f - C-x 8 RET # x f f ]
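For use in Lisp code rather than at the search prompt, here is a minimal sketch of the same idea, assuming you are matching against multibyte strings. The names my-identifier-regexp and my-identifier-p are made up for illustration. The sketch writes the range end points with \u escapes, which should make the pattern a multibyte string; hex escapes such as \xff are also valid string syntax, but a literal containing only ASCII and such escapes may be read as a unibyte string holding raw bytes, which is the distinction raised in the comments. It also anchors with \` and \' instead of ^ and $, since ^ and $ in Emacs also match at line boundaries inside a string.

```elisp
;; Sketch only: `my-identifier-regexp' and `my-identifier-p' are
;; hypothetical names, not part of Emacs.
(defconst my-identifier-regexp
  ;; \u007f is DEL and \u00ff is ÿ, so the class covers code points 127..255.
  "\\`[a-zA-Z_\u007f-\u00ff][a-zA-Z0-9_\u007f-\u00ff]*\\'"
  "Regexp for names made of ASCII letters, digits, underscore and chars 127..255.")

(defun my-identifier-p (string)
  "Return t if STRING as a whole matches `my-identifier-regexp'."
  ;; Bind `case-fold-search' so the result does not depend on user settings.
  (let ((case-fold-search nil))
    (and (string-match-p my-identifier-regexp string) t)))

;; Example calls (assumes multibyte strings):
(my-identifier-p "Àbc_1") ; À is code point #xC0, inside the range
(my-identifier-p "1abc")  ; starts with a digit, so no match is expected
```

If you prefer not to spell the range inside a string at all, building the pattern with rx and a (#x7f . #xff) pair in an any form is another option worth trying.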
answered 4 hours ago by Drew, edited 4 hours ago