Python Regular Expressions

1. Ordinary characters match themselves. The special characters are:

\       Escape a special character (or start a sequence)
.       Any character except newline (see re.DOTALL)
^       Start of string (see re.MULTILINE)
$       End of string (see re.MULTILINE)
[]      Character set
|       Alternation (or)
()      Create a capture group (also sets precedence)
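
A minimal sketch of the special characters above in use. The sample strings and variable names are illustrative only and do not come from the original sheet.

```python
import re

# '.' matches any character except a newline, so "c\nt" is skipped.
dot_matches = re.findall(r"c.t", "cat cut c\nt")

# '^' and '$' anchor the pattern to the start and end of the string.
is_exact = re.search(r"^abc$", "abc") is not None
is_partial = re.search(r"^abc$", "xabc") is not None

# '|' is alternation; '()' groups the alternatives and captures the match.
animal = re.search(r"(cat|dog)s", "hotdogs").group(1)

# '\' escapes a special character so that it matches literally.
literal_dots = re.findall(r"\.", "a.b.c")

print(dot_matches, is_exact, is_partial, animal, literal_dots)
```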

2. After the set-opening character '`[`', the following characters are special:

]       End of the set
-       Range (e.g. a-c means a, b, or c)
^       Negates the set (only as the first character)
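
As a rough illustration of the set syntax, the sketch below uses a made-up sample string; the variable names are assumptions for the example.

```python
import re

text = "a1-b2^c3]"

# A range inside a set: a, b, or c.
letters = re.findall(r"[a-c]", text)

# A leading '^' negates the set: anything that is not a digit.
non_digits = re.findall(r"[^0-9]", text)

# To match the set's own special characters literally, put '-' first,
# keep '^' away from the first position, and escape ']'.
specials = re.findall(r"[-^\]]", text)

print(letters, non_digits, specials)
```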

Quantifiers (append '?' for non-greedy):

{m}     Exactly m repetitions
{m,n}   From m (default 0) to n (default infinity)
*       0 or more. Same as {,}
+       1 or more. Same as {1,}
?       0 or 1. Same as {,1}
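
The difference between greedy and non-greedy quantifiers is easiest to see side by side. This is a small sketch on invented input, not an example from the original sheet.

```python
import re

html = "<b>bold</b><i>it</i>"

# Greedy: '.+' grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Non-greedy: appending '?' makes '.+' stop at the first possible '>'.
lazy = re.findall(r"<.+?>", html)

# {m}: exactly three digits. {m,n}: two to three digits.
exact = re.findall(r"\d{3}", "12 345 6789")
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")

# '?': the 'u' is optional.
optional = re.findall(r"colou?r", "color colour")

print(greedy, lazy, exact, ranged, optional)
```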

Special sequences::

\A  Start of string
\b  Match empty string at word (\w+) boundary
\B  Match empty string not at word boundary
\d  Digit
\D  Non-digit
\s  Whitespace - same as [ \t\n\r\f\v] (see LOCALE, UNICODE)
\S  Non-whitespace
\w  Alphanumeric: [0-9a-zA-Z_], see LOCALE
\W  Non-alphanumeric
\Z  End of string
\g<id>  In replacement templates (sub(), expand()): insert the
        text matched by a named or numbered group. '<' & '>'
        are literal, e.g. \g<0> or \g<name> (not \g0 or \gname)
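
A short sketch of a few of these sequences, assuming simple ASCII input. Note that `\g<name>` is used here in the replacement string passed to `re.sub`; the sample strings and group names are illustrative.

```python
import re

# \b: 'cat' only as a whole word, so the 'cat' inside 'concat' is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \d and \s: runs of digits, and splitting on runs of whitespace.
numbers = re.findall(r"\d+", "room 42, floor 7")
tokens = re.split(r"\s+", "a  b\tc\nd")

# \A and \Z: the whole string must consist of word characters.
is_identifier = re.search(r"\A\w+\Z", "hello_1") is not None

# \g<name> in a replacement template refers to a named group.
swapped = re.sub(
    r"(?P<first>\w+) (?P<second>\w+)",
    r"\g<second> \g<first>",
    "hello world",
)

print(whole_words, numbers, tokens, is_identifier, swapped)
```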

Special character escapes are much like those already escaped in Python string literals. Hence regex '\n' is the same as regex '\\n'::

\a  ASCII Bell (BEL)
\f  ASCII Formfeed
\n  ASCII Linefeed
\r  ASCII Carriage return
\t  ASCII Tab
\v  ASCII Vertical tab
\\  A single backslash
\xHH   Two digit hexadecimal character goes here
\OOO   Three digit octal char (or just use an
       initial zero, e.g. \0, \07)
\DD    Decimal number 1 to 99, match
       previous numbered group
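
The sketch below shows why raw strings are usually the safer choice for patterns, and how a numbered backreference behaves. The inputs are invented for illustration.

```python
import re

# A normal literal hands the regex engine a real newline character;
# a raw literal hands it backslash + 'n', which the engine itself
# interprets as a newline. Both patterns therefore match the same text.
same_result = re.findall("\n", "a\nb") == re.findall(r"\n", "a\nb")

# Matching a single literal backslash needs '\\' in a raw pattern.
backslashes = re.findall(r"\\", r"C:\temp\new")

# \xHH: 0x41 is the character 'A'.
hex_char = re.search(r"\x41", "ABC").group()

# \1 matches whatever group 1 matched, here a repeated word.
repeated = re.findall(r"\b(\w+) \1\b", "the the cat sat sat down")

print(same_result, len(backslashes), hex_char, repeated)
```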

Extensions. Do not cause grouping, except 'P<name>'::

(?iLmsux)     Match empty string, sets re.X flags
(?:...)       Non-capturing version of regular parens
(?P<name>...) Create a named capturing group.
(?P=name)     Match whatever matched prev named group
(?#...)       A comment; ignored.
(?=...)       Lookahead assertion, match without consuming
(?!...)       Negative lookahead assertion
(?<=...)      Lookbehind assertion, match if preceded
(?<!...)      Negative lookbehind assertion
(?(id)y|n)    Match 'y' if group 'id' matched, else 'n'
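
The extension syntax above can be sketched as follows. All patterns and inputs are illustrative assumptions; the group names (`year`, `month`, `q`) are made up for the example.

```python
import re

# (?P<name>...): named capturing groups.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
year, month = m.group("year"), m.group("month")

# (?:...): groups without capturing, so only '(c)' is reported.
captured = re.match(r"(?:ab)+(c)", "ababc").groups()

# (?=...) and (?!...): look ahead without consuming characters.
usd = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
not_eur = re.findall(r"\b\d+\b(?! EUR)", "10 USD, 20 EUR, 30 USD")

# (?<=...): match digits only when preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", "$5 and 7 and $9")

# (?P=name): the closing quote must equal the opening quote.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group()

# (?(id)y): require '>' only if group 1 (the '<') matched.
tag = re.compile(r"(<)?\w+(?(1)>)")
tag_results = [tag.fullmatch(s) is not None for s in ("<a>", "a", "<a")]

# (?i): inline flag, equivalent to passing re.I.
inline = re.findall(r"(?i)abc", "ABC abc")

print(year, month, captured, usd, not_eur, dollars, quoted, tag_results, inline)
```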

Flags for re.compile(), etc. Combine with '|'::

re.I == re.IGNORECASE   Ignore case
re.L == re.LOCALE       Make \w, \b, and \s locale dependent
re.M == re.MULTILINE    Multiline
re.S == re.DOTALL       Dot matches all (including newline)
re.U == re.UNICODE      Make \w, \b, \d, and \s unicode dependent
re.X == re.VERBOSE      Verbose (unescaped whitespace in pattern
                        is ignored, and '#' marks comment lines)
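
A minimal sketch of the flags in action, combined with `|` where more than one is needed. The two-line sample text and the phone-number pattern are assumptions made for the example.

```python
import re

text = "first line\nSecond LINE"

# re.I: case-insensitive matching.
any_case = re.findall(r"line", text, re.I)

# re.M: '^' also matches at the start of each line.
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: '.' also matches the newline, so the pattern can span lines.
without_dotall = re.search(r"first.*LINE", text)
with_dotall = re.search(r"first.*LINE", text, re.S)

# Flags combine with '|'.
combined = re.findall(r"^second", text, re.I | re.M)

# re.X: whitespace and '#' comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # prefix
    -         # separator
    (\d{4})   # line number
""", re.X)
parts = phone.search("call 555-1234").groups()

print(any_case, line_starts, without_dotall, with_dotall is not None, combined, parts)
```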

Module level functions::

compile(pattern[, flags]) -> RegexObject
match(pattern, string[, flags]) -> MatchObject
search(pattern, string[, flags]) -> MatchObject
findall(pattern, string[, flags]) -> list of strings
finditer(pattern, string[, flags]) -> iter of MatchObjects
split(pattern, string[, maxsplit, flags]) -> list of strings
sub(pattern, repl, string[, count, flags]) -> string
subn(pattern, repl, string[, count, flags]) -> (string, int)
escape(string) -> string
purge() # Clear the re cache
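
The module-level functions can be compared on one input, as in the sketch below. The sample string is invented. The exact output of `re.escape` has varied between Python versions, so the example only checks that the escaped text matches the original literally.

```python
import re

text = "a1b22c333"

# match() only tries at the start of the string; search() scans forward.
at_start = re.match(r"\d", text)
first_run = re.search(r"\d+", text).group()

# findall() returns the matched strings; finditer() yields match objects.
all_runs = re.findall(r"\d+", text)
spans = [m.span() for m in re.finditer(r"\d+", text)]

# split() cuts at each match; a match at the end leaves an empty string.
pieces = re.split(r"\d+", text)

# sub() replaces matches; subn() also reports how many were replaced.
replaced = re.sub(r"\d+", "#", text)
replaced_twice, count = re.subn(r"\d+", "#", text, count=2)

# escape() makes a string safe to use as a literal pattern.
literal = re.escape("1+1=2")
escaped_ok = re.fullmatch(literal, "1+1=2") is not None

print(at_start, first_run, all_runs, spans, pieces, replaced, replaced_twice, count, escaped_ok)
```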

RegexObjects (returned from compile())::

.match(string[, pos, endpos]) -> MatchObject
.search(string[, pos, endpos]) -> MatchObject
.findall(string[, pos, endpos]) -> list of strings
.finditer(string[, pos, endpos]) -> iter of MatchObjects
.split(string[, maxsplit]) -> list of strings
.sub(repl, string[, count]) -> string
.subn(repl, string[, count]) -> (string, int)
.flags      # int, Passed to compile()
.groups     # int, Number of capturing groups
.groupindex # {}, Maps group names to ints
.pattern    # string, Passed to compile()
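
A compiled pattern exposes the same operations as methods, plus the `pos` and `endpos` arguments. The following is a sketch with an invented key=value input; the group names `key` and `value` are assumptions for the example.

```python
import re

pat = re.compile(r"(?P<key>\w+)=(?P<value>\d+)", re.I)
text = "a=1 b=2 c=3"

# Attributes describing the compiled pattern.
group_count = pat.groups
name_map = dict(pat.groupindex)
source = pat.pattern
ignores_case = bool(pat.flags & re.I)

# pos and endpos restrict the part of the string that is examined.
all_pairs = pat.findall(text)
first_two = pat.findall(text, 0, 7)
from_pos = pat.search(text, 3).group()
match_at_space = pat.match(text, 3)   # index 3 is a space
match_at_b = pat.match(text, 4).group()

# count limits the number of substitutions.
one_sub = pat.sub(r"\g<key>:\g<value>", text, count=1)

print(group_count, name_map, source, ignores_case)
print(all_pairs, first_two, from_pos, match_at_space, match_at_b, one_sub)
```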

MatchObjects (returned from match() and search())::

.expand(template) -> string, Backslash & group expansion
.group([group1...]) -> string or tuple of strings, 1 per arg
.groups([default]) -> tuple of all groups, non-matching=default
.groupdict([default]) -> {}, Named groups, non-matching=default
.start([group]) -> int, Start/end of substring match by group
.end([group]) -> int, Group defaults to 0, the whole match
.span([group]) -> tuple (match.start(group), match.end(group))
.pos       int, Passed to search() or match()
.endpos    int, "
.lastindex int, Index of last matched capturing group
.lastgroup string, Name of last matched capturing group
.re        regex, As passed to search() or match()
.string    string, "
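
The match object's methods and attributes can be explored on a single match, as sketched below. The input string and the group names `user` and `host` are illustrative; the third group is optional and deliberately does not participate in the match.

```python
import re

m = re.search(r"(?P<user>\w+)@(?P<host>\w+)(\.\w+)?", "mail: bob@example")

# group(): whole match, several groups at once, or a group by name.
whole = m.group()
first_two = m.group(1, 2)
user = m.group("user")

# groups() and groupdict(): unmatched groups become None or the default.
all_groups = m.groups()
with_default = m.groups("")
named = m.groupdict()

# Positions of the whole match and of a single group.
span = m.span()
host_bounds = (m.start("host"), m.end("host"))

# Bookkeeping attributes.
last_index, last_name = m.lastindex, m.lastgroup
search_window = (m.pos, m.endpos)

# expand() fills a template the same way sub() does.
expanded = m.expand(r"\g<host>/\1")

print(whole, first_two, user, all_groups, with_default, named)
print(span, host_bounds, last_index, last_name, search_window, expanded)
```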
