
åžŒæ–šå‚į…§

Outside a character class, a backslash followed by a number of 1 or greater (possibly with several digits) is a back reference to a capturing subpattern earlier (that is, to its left) in the pattern, provided there are at least that many capturing left parentheses to its left.

ãĒãŠã€ãƒãƒƒã‚¯ã‚šãƒŠãƒƒã‚ˇãƒĨぎ垌ãĢ 10 æœĒæē€ãŽ 10 é€˛æ•°ãŒįļšãå ´åˆã¯ 常ãĢåžŒæ–šå‚į…§ã¨ã—ãĻč§Ŗé‡ˆã•ã‚Œã€ãƒ‘ã‚ŋãƒŧãƒŗå…¨äŊ“で指厚した個数äģĨ上ぎ ã‚­ãƒŖãƒ—ãƒãƒŖį”¨ã‚ĩブパã‚ŋãƒŧãƒŗãŒį„Ąã„ã¨ã‚¨ãƒŠãƒŧがį™ēį”Ÿã—ãžã™ã€‚č¨€ã„ã‹ãˆã‚‹ã¨ã€ å‚į…§ã•ã‚Œã‚‹ã‚Ģãƒƒã‚ŗã¯ã€10æœĒæē€ãŽį•ĒåˇãĢ寞しãĻã¯ã€å‚į…§ã™ã‚‹å´ãŽåˇĻãĢあるåŋ…čĻãŒãĒいということです。 "前斚ãĢã‚ã‚‹åžŒæ–šå‚į…§ (forward back reference)" ãŒæ„å‘ŗã‚’ãĒすぎは、 įš°ã‚Ščŋ”しがåĢぞれãĻいãĻã€åŗå´ã¸ãŽã‚ĩブパã‚ŋãƒŧãƒŗãŒããŽå‰ãŽååžŠãĢåĢぞれãĻいる場合です。 ãƒãƒƒã‚¯ã‚šãƒŠãƒƒã‚ˇãƒĨぎ垌ãĢ数字がįļšãå ´åˆãŽå‡Ļį†ãŽčŠŗį´°ãĢついãĻは、 ã‚¨ã‚šã‚ąãƒŧãƒ—ã‚ˇãƒŧã‚ąãƒŗã‚š ぎã‚ģã‚¯ã‚ˇãƒ§ãƒŗã‚’å‚į…§ãã ã•ã„ã€‚

åžŒæ–šå‚į…§ã¯ã€ã‚ĢãƒŦãƒŗãƒˆãŽå¯žčąĄæ–‡å­—åˆ—ãĢおいãĻã‚­ãƒŖãƒ—ãƒãƒŖį”¨ã‚ĩブパã‚ŋãƒŧãƒŗãŒ 原際ãĢマッチした文字列ãĢマッチしぞす。ã‚ĩブパã‚ŋãƒŧãƒŗãŒãƒ‘ã‚ŋãƒŧãƒŗã¨ã—ãĻ マッチし垗るもぎではありぞせん。すãĒã‚ãĄã€ãƒ‘ã‚ŋãƒŧãƒŗ

      (sens|respons)e and \1ibility
      
matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If case-sensitive matching is in force at the position of the back reference, the case of letters is relevant. For example,
      ((?i)rah)\s+\1
      
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original capturing subpattern is matched case-insensitively.
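The two patterns above can be checked with a short script. This is a minimal sketch; the variable names are illustrative.

```php
<?php
// A back reference matches the text the group captured, not the group's pattern.
$pattern = '/(sens|respons)e and \1ibility/';
$results = [];
foreach (['sense and sensibility', 'response and responsibility', 'sense and responsibility'] as $subject) {
    $results[$subject] = preg_match($pattern, $subject);
}

// (?i) is set inside group 1 only, so \1 itself is compared case-sensitively.
$caseResults = [];
foreach (['rah rah', 'RAH RAH', 'RAH rah'] as $subject) {
    $caseResults[$subject] = preg_match('/((?i)rah)\s+\1/', $subject);
}

var_dump($results, $caseResults);
```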

There may be more than one back reference to the same subpattern. If a subpattern has not actually been used in a particular match, any back reference to it fails. For example, the pattern

      (a|(bc))\2
      
always fails if it starts to match "a" rather than "bc". Because there may be up to 99 back references, all digits following the backslash are taken as part of a potential back reference number. If the pattern continues with a digit character, some delimiter must be used to terminate the back reference. If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty comment can be used.
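The sketch below tries both points: a back reference to a group that did not take part in the match, and the two ways of terminating a back reference that is followed by a literal digit. The subject strings and variable names are assumptions made for illustration.

```php
<?php
// Group 2 takes part in the match only when the "bc" branch is chosen.
$unsetGroup = '/(a|(bc))\2/';
$bcbc = preg_match($unsetGroup, 'bcbc'); // group 2 captured "bc", so \2 can match
$aa   = preg_match($unsetGroup, 'aa');   // group 2 is unset, so \2 fails

// Terminating a back reference that is followed by the literal digit "1".
$subject      = 'aa1';
$withComment  = preg_match('/(a)\1(?#)1/', $subject); // empty comment as the delimiter
$withExtended = preg_match('/(a)\1 1/x', $subject);   // whitespace, with PCRE_EXTENDED (x) set

// Without a delimiter, \11 is not read as \1 followed by "1". With a single group
// in the pattern it is taken as an octal character code, so "aa1" does not match.
$undelimited = preg_match('/(a)\11/', $subject);

var_dump($bcbc, $aa, $withComment, $withExtended, $undelimited);
```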

åžŒæ–šå‚į…§ã‚’ã€ãã‚Œč‡ĒčēĢãŒå‚į…§ã™ã‚‹ã‚ĩブパã‚ŋãƒŧãƒŗãŽã‚Ģãƒƒã‚ŗå†…ãĢ記čŋ°ã—た場合、 そぎã‚ĩブパã‚ŋãƒŧãƒŗãŒæœ€åˆãĢäŊŋわれた際ãĢãƒžãƒƒãƒãŒå¤ąæ•—ã—ãžã™ã€‚ã§ã™ãŽã§ã€ (a\1) は、äŊ•ãĢもマッチしぞせん。しかし、こぎようãĒå‚į…§ã¯ã€ č¤‡æ•°å›žįš°ã‚Ščŋ”されるã‚ĩブパã‚ŋãƒŧãƒŗãŽå†…éƒ¨ã§ã¯æœ‰į”¨ã§ã™ã€‚äž‹ãˆã°ã€ãƒ‘ã‚ŋãƒŧãƒŗ

      (a|b\1)+
      
matches any number of "a"s and also "aba", "ababba" and so on. At each iteration of the subpattern, the back reference matches the string matched by the previous iteration. For this to work, the pattern must be such that the first iteration does not need to match the back reference. This can be done using alternation, as in the example above, or with a quantifier whose minimum is zero.
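A minimal sketch of this behaviour follows. The anchors ^ and $ are an addition for the test, so that the whole subject must be consumed; the subject "abb" is an extra negative case that is not in the manual.

```php
<?php
// A self-reference outside a repetition: group 1 is unset when \1 is first tried.
$selfReference = preg_match('/(a\1)/', 'aa');

// Inside a repetition, \1 refers to what the previous iteration captured.
// "ababba" is consumed as "a", then "b" + "a", then "b" + "ba".
$repeated = '/^(a|b\1)+$/';
$results = [];
foreach (['aaa', 'aba', 'ababba', 'abb'] as $subject) {
    $results[$subject] = preg_match($repeated, $subject);
}

var_dump($selfReference, $results);
```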

The \g escape sequence can be used for absolute and relative referencing of subpatterns. This escape sequence must be followed by an unsigned number or a negative number, optionally enclosed in braces. The sequences \1, \g1 and \g{1} are synonymous. Using this form with an unsigned number removes the ambiguity inherent in digits following a backslash. It distinguishes back references clearly from octal characters, and also makes it easier to write a back reference followed by a literal number, for example \g{2}1.
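The following sketch compares the three synonymous forms and shows the \g{2}1 case next to its ambiguous counterpart \21. The subjects "book" and "abb1" and the variable names are illustrative assumptions.

```php
<?php
// \1, \g1 and \g{1} should all find the same doubled character.
$forms = ['/(\w)\1/', '/(\w)\g1/', '/(\w)\g{1}/'];
$found = [];
foreach ($forms as $form) {
    preg_match($form, 'book', $m);
    $found[$form] = $m[0];
}

// \g{2}1 is a back reference to group 2 followed by the literal digit "1".
$braced = preg_match('/(a)(b)\g{2}1/', 'abb1');

// \21 is not read as \2 followed by "1". With only two groups in the pattern
// it is taken as an octal character code, so "abb1" does not match.
$ambiguous = preg_match('/(a)(b)\21/', 'abb1');

var_dump($found, $braced, $ambiguous);
```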

Using the \g sequence with a negative number signifies a relative reference. For example, (foo)(bar)\g{-1} matches "foobarbar" and (foo)(bar)\g{-2} matches "foobarfoo". This is useful in long patterns when referring to a specific subpattern: instead of keeping track of which number the subpattern has, you can refer to it by its relative position.
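The examples above can be run as written. The last pattern is a hypothetical variation, added to show that a relative reference keeps pointing at the same group when another group is inserted in front of it.

```php
<?php
$last       = preg_match('/(foo)(bar)\g{-1}/', 'foobarbar'); // -1: the most recent group, (bar)
$secondLast = preg_match('/(foo)(bar)\g{-2}/', 'foobarfoo'); // -2: the group before that, (foo)
$mismatch   = preg_match('/(foo)(bar)\g{-1}/', 'foobarfoo'); // "bar" is required here, not "foo"

// Hypothetical variation: a new first group shifts the absolute numbers,
// but \g{-1} still refers to (bar).
$shifted = preg_match('/(x)(foo)(bar)\g{-1}/', 'xfoobarbar');

var_dump($last, $secondLast, $mismatch, $shifted);
```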

Back references to named subpatterns can be written as (?P=name). The forms \k<name>, \k'name', \k{name}, \g{name}, \g<name> and \g'name' can also be used.
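A short sketch of the named forms follows. The group name "c" and the subject string are assumptions for illustration. The \g<name> and \g'name' forms are left out on purpose, because the user note further down this page reports that they reuse the subpattern rather than the matched text.

```php
<?php
// Each pattern captures one letter and then requires the same letter again.
$patterns = [
    '/(?<c>[ab])(?P=c)/',
    '/(?<c>[ab])\k<c>/',
    "/(?<c>[ab])\\k'c'/", // double-quoted so the single quotes need no escaping
    '/(?<c>[ab])\k{c}/',
    '/(?<c>[ab])\g{c}/',
];

$replaced = [];
foreach ($patterns as $pattern) {
    // Only the doubled letters "aa" and "bb" should be replaced.
    $replaced[] = preg_replace($pattern, 'xx', 'aa ab ba bb');
}

var_dump($replaced);
```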


User Contributed Notes 2 notes

mnvx at yandex dot ru
9 years ago
A similar facility is DEFINE.

Example:
(?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname).

The expression above matches "very-very" in the following sentence:
Define is very-very handy sometimes.
          ^-------^

How it works: the block (?(DEFINE)(?<myname>\bvery\b)) defines "myname" as "\bvery\b". So "(?&myname)\p{Pd}(?&myname)" is equivalent to "\bvery\b\p{Pd}\bvery\b".
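A minimal sketch of the DEFINE example in PHP follows. The second pattern, with the hypothetical group name "letter", is an added illustration: it suggests that (?&name) runs the subpattern again rather than repeating the text matched earlier, which is how it differs from a back reference.

```php
<?php
// The DEFINE block declares "myname" without matching anything itself.
$pattern = '/(?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname)/';
preg_match($pattern, 'Define is very-very handy sometimes.', $m);
$matched = $m[0];

// Hypothetical contrast: each (?&letter) call matches [ab] afresh,
// so two different letters are accepted.
$subroutine = preg_match('/(?(DEFINE)(?<letter>[ab]))(?&letter)(?&letter)/', 'ab');

var_dump($matched, $subroutine);
```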

Steve
2 years ago
The escape sequence \g used as a backreference may not always behave as expected.
The following numbered backreferences refer to the text matching the specified capture group, as documented:
\1
\g1
\g{1}
\g-1
\g{-1}

However, the following variants refer to the subpattern code instead of the matched text:
\g<1>
\g'1'
\g<-1>
\g'-1'

With named backreferences, we may also use the \k escape sequence as well as the (?P=...) construct. The following combinations also refer to the text matching the named capture group, as documented:
\g{name}
\k{name}
\k<name>
\k'name'
(?P=name)

However, these refer to the subpattern code instead of the matched text:
\g<name>
\g'name'

In the following example, the capture group searches for a single letter 'a' or 'b', and then the backreference looks for the same letter. Thus, the patterns are expected to match 'aa' and 'bb', but not 'ab' nor 'ba'.

<?php
/* Matches to the following patterns are replaced by 'xx' in the subject string 'aa ab ba bb'. */
$patterns = [
# numbered backreferences (absolute)
'/([ab])\1/', // 'xx ab ba xx'
'/([ab])\g1/', // 'xx ab ba xx'
'/([ab])\g{1}/', // 'xx ab ba xx'
'/([ab])\g<1>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/([ab])\g'1'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/([ab])\k{1}/', // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
'/([ab])\k<1>/', // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
"/([ab])\k'1'/", // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
'/([ab])(?P=1)/', // NULL # Regex error: "subpattern name must start with a non-digit", (?P=) expects name not number.
# numbered backreferences (relative)
'/([ab])\-1/', // 'aa ab ba bb'
'/([ab])\g-1/', // 'xx ab ba xx'
'/([ab])\g{-1}/', // 'xx ab ba xx'
'/([ab])\g<-1>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/([ab])\g'-1'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/([ab])\k{-1}/', // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
'/([ab])\k<-1>/', // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
"/([ab])\k'-1'/", // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
'/([ab])(?P=-1)/', // NULL # Regex error: "subpattern name expected", (?P=) expects name not number.
# named backreferences
'/(?<name>[ab])\g{name}/', // 'xx ab ba xx'
'/(?<name>[ab])\g<name>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/(?<name>[ab])\g'name'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/(?<name>[ab])\k{name}/', // 'xx ab ba xx'
'/(?<name>[ab])\k<name>/', // 'xx ab ba xx'
"/(?<name>[ab])\k'name'/", // 'xx ab ba xx'
'/(?<name>[ab])(?P=name)/', // 'xx ab ba xx'
];

// Print each pattern next to the result of replacing its matches with 'xx'.
foreach ($patterns as $pat) {
    echo " '$pat',\t// " . var_export(@preg_replace($pat, 'xx', 'aa ab ba bb'), true) . PHP_EOL;
}
?>