Concise Notes on Python's re Module (A Quick Guide to Python Regular Expressions)

Original · ithorizon · 6 months ago (10-19) · #Backend Development


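1. Introduction to Regular Expressions

A regular expression (regex for short) is a pattern that describes combinations of characters to match in a string. Python's re module provides regular expression support, enabling complex text processing and pattern matching.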

2. Basic Usage of the re Module

First, import the re module:


import re

We can then use the functions in the re module to match, search, replace, and split text with regular expressions.

3. re.match() and re.search()

re.match() tries to match the regular expression only at the beginning of the string. If it matches, it returns a match object; otherwise it returns None.


import re
pattern = r"hello"
string = "hello world"
match = re.match(pattern, string)
if match:
    print("Match succeeded")
else:
    print("Match failed")

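re.search() scans through the whole string and returns a match object for the first position where the regular expression matches, or None if it matches nowhere in the string.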


import re
pattern = r"world"
string = "hello world"
match = re.search(pattern, string)
if match:
    print("Match succeeded")
else:
    print("Match failed")
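The two functions differ only in where a match is allowed to start. A minimal comparison using the same string as above, with "world" as the pattern for both calls:

```python
import re

text = "hello world"

# re.match() only tries position 0, so "world" is not found
m = re.match(r"world", text)
# re.search() scans forward and finds "world" starting at index 6
s = re.search(r"world", text)

print(m)                    # None
print(s.group(), s.span())  # world (6, 11)
```

In short, re.match() is the right choice when the text must start with the pattern; re.search() when the pattern may appear anywhere.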

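4. Matching Rules

The following metacharacters and special sequences are commonly used in regular expressions:

.: Matches any character except a newline (with the re.DOTALL flag it matches newlines too).

\w: Matches a word character: a letter, digit, or underscore. For str patterns in Python 3 this includes Unicode letters and digits (e.g. Chinese characters), not just ASCII.

\W: Matches any character that is not a word character.

\s: Matches any whitespace character (space, tab, newline, and so on).

\S: Matches any non-whitespace character.

\d: Matches a digit (for str patterns, any Unicode decimal digit).

\D: Matches any non-digit character.

[...]: Matches any one character listed inside the brackets, e.g. [abc] or the range [a-z].

[^...]: Matches any one character not listed inside the brackets.

{m} or {m,n}: Matches the preceding subexpression exactly m times, or between m and n times.

+: Matches the preceding subexpression one or more times.

?: Matches the preceding subexpression zero or one time.

*: Matches the preceding subexpression zero or more times.

|: Matches either the expression on its left or the one on its right.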
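A small sketch exercising several of the rules above with re.search() and re.finditer(); the sample strings are made up purely for illustration:

```python
import re

# \d{4}-\d{2}-\d{2}: four digits, hyphen, two digits, hyphen, two digits
date = re.search(r"\d{4}-\d{2}-\d{2}", "Shipped on 2024-05-01 by courier").group()

# u? makes the "u" optional (zero or one occurrence)
spellings = [bool(re.search(r"colou?r", w)) for w in ("color", "colour", "colr")]

# \w+: one or more word characters; the space and "!" are not \w
words = [m.group() for m in re.finditer(r"\w+", "hello_world 42!")]

# [^aeiou\s]+: runs of characters that are neither vowels nor whitespace
consonants = [m.group() for m in re.finditer(r"[^aeiou\s]+", "regex is fun")]

# cat|dog: either alternative
animal = re.search(r"cat|dog", "hotdog").group()

# . does not match a newline by default
dot_newline = re.search(r"a.b", "a\nb")

print(date)         # 2024-05-01
print(spellings)    # [True, True, False]
print(words)        # ['hello_world', '42']
print(consonants)   # ['r', 'g', 'x', 's', 'f', 'n']
print(animal)       # dog
print(dot_newline)  # None
```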

5. re.sub()

re.sub() replaces the matches of a regular expression in a string. Its basic signature is:


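re.sub(pattern, repl, string, count=0, flags=0)

Here pattern is the regular expression, repl is the replacement (a string, or a function that receives each match object), string is the string to process, count is the maximum number of replacements (0, the default, means replace all), and flags optionally sets the matching flags.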


import re
pattern = r"world"
replacement = "Python"
string = "hello world"
result = re.sub(pattern, replacement, string)
print(result)  # Output: hello Python
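Beyond plain string replacement, repl can refer back to groups, count can limit the number of replacements, and repl can be a function. A minimal sketch of all three (the group syntax is explained in section 9):

```python
import re

# Group references in the replacement: reorder YYYY-MM-DD into MM/DD/YYYY
us_date = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2024-05-01")

# count=1 replaces only the first occurrence
first_only = re.sub(r"o", "0", "foo boo", count=1)

# A function as the replacement receives each match object and returns a string
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b22 c3")

print(us_date)     # 05/01/2024
print(first_only)  # f0o boo
print(doubled)     # a2 b44 c6
```

Passing count as a keyword argument, as above, keeps the call unambiguous.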

6. re.split()

re.split() splits a string wherever the regular expression matches. Its basic signature is:


re.split(pattern, string, maxsplit=0, flags=0)

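Here pattern is the regular expression, string is the string to split, maxsplit optionally sets the maximum number of splits (0, the default, means no limit), and flags optionally sets the matching flags.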


import re
pattern = r"\s"
string = "hello world"
result = re.split(pattern, string)
print(result)  # Output: ['hello', 'world']
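Splitting on a single \s works for the example above, but behaves differently when separators repeat. A sketch with a made-up input containing two consecutive spaces and a comma:

```python
import re

text = "a  b,c"  # note: two spaces between "a" and "b"

# A single \s splits at each space, leaving an empty string between the two spaces
single = re.split(r"\s", text)
# \s+ treats a run of whitespace as one separator
run = re.split(r"\s+", text)
# A character class splits on whitespace or commas
mixed = re.split(r"[\s,]+", text)
# maxsplit=1 stops after the first split
limited = re.split(r"[\s,]+", text, maxsplit=1)
# A capturing group keeps the separators in the result list
kept = re.split(r"(,)", text)

print(single)   # ['a', '', 'b,c']
print(run)      # ['a', 'b,c']
print(mixed)    # ['a', 'b', 'c']
print(limited)  # ['a', 'b,c']
print(kept)     # ['a  b', ',', 'c']
```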

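7. re.finditer()

re.finditer() finds all non-overlapping matches of the regular expression in a string and returns an iterator that yields a match object for each one. Its basic signature is: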


re.finditer(pattern, string, flags=0)

Here pattern is the regular expression, string is the string to search, and flags optionally sets the matching flags.


import re
pattern = r"\w"
string = "hello world"
matches = re.finditer(pattern, string)
for match in matches:
    print(match.group())  # Prints h, e, l, l, o, w, o, r, l, d, one per line
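Because each item is a full match object, re.finditer() also gives the position of every match. A short sketch using \w+ so that whole words are matched instead of single characters:

```python
import re

text = "hello world"

# Each item is a match object, so positions are available as well as the text
for m in re.finditer(r"\w+", text):
    print(m.group(), m.start(), m.end())
# hello 0 5
# world 6 11

# The iterator is consumed after one pass, so call finditer() again to reuse it
spans = [m.span() for m in re.finditer(r"\w+", text)]
print(spans)  # [(0, 5), (6, 11)]
```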

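8. re.compile()

re.compile() compiles a regular expression into a pattern object. The compiled object can be reused, which avoids recompiling the pattern each time it is used. (Python also caches recently used patterns internally, so programs that use only a few regular expressions need not worry much about compiling them explicitly.) Its basic signature is: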


re.compile(pattern, flags=0)

Here pattern is the regular expression and flags optionally sets the matching flags (for example re.IGNORECASE).


import re
pattern = r"\w"
compiled_pattern = re.compile(pattern)
string = "hello world"
match = compiled_pattern.search(string)
if match:
    print(match.group())  # Output: h
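A compiled pattern object offers the same operations as the module-level functions (search(), match(), sub(), split(), finditer()), with the flags fixed at compile time. A minimal sketch using re.IGNORECASE on some sample lines:

```python
import re

# Compile once with a flag, then reuse the object's methods
hello_re = re.compile(r"hello", re.IGNORECASE)

lines = ["Hello world", "say HELLO", "goodbye"]
hits = [line for line in lines if hello_re.search(line)]
replaced = hello_re.sub("hi", "Hello, hello!")

print(hits)      # ['Hello world', 'say HELLO']
print(replaced)  # hi, hi!
```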

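9. Advanced Usage

Beyond the basic matching rules, regular expressions also support more advanced features such as groups, backreferences, lookahead, and lookbehind.

9.1 Groups

Parentheses () divide a regular expression into groups, and the text matched by each group can then be retrieved separately.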


import re
pattern = r"(\w+)\s(\w+)"
string = "hello world"
match = re.match(pattern, string)
if match:
    print(match.group(1))  # Output: hello
    print(match.group(2))  # Output: world
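A few more ways to read groups from a match object: group(0) is the whole match, groups() returns all groups as a tuple, and Python's (?P<name>...) syntax gives a group a name. The group names first and second below are arbitrary choices for this sketch:

```python
import re

m = re.match(r"(\w+)\s(\w+)", "hello world")
print(m.group(0))  # hello world  (group 0 is the whole match)
print(m.groups())  # ('hello', 'world')

# Named groups: (?P<name>...) lets you look a group up by name
m2 = re.match(r"(?P<first>\w+)\s(?P<second>\w+)", "hello world")
print(m2.group("second"))  # world
print(m2.groupdict())      # {'first': 'hello', 'second': 'world'}
```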

9.2 Backreferences

A backslash followed by a group number (e.g. \1) matches the same text that the group captured earlier.


import re
pattern = r"(\w+)\s\1"
string = "hello hello"
match = re.match(pattern, string)
if match:
    print(match.group())  # Output: hello hello
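A common practical use of a backreference is finding a word that is accidentally typed twice. This sketch also uses \b (word boundary, not covered in the list in section 4) so that partial words are not matched. Note the raw strings: in an ordinary string literal "\1" would be the control character \x01, not a backreference.

```python
import re

text = "this is is a test"

# \b(\w+)\s+\1\b: a word, whitespace, then the same word again
dup = re.search(r"\b(\w+)\s+\1\b", text)
print(dup.group())   # is is
print(dup.group(1))  # is

# The same backreference in re.sub() collapses the duplicate
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", text)
print(fixed)  # this is a test
```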

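9.3 Lookahead and Lookbehind

Lookahead and lookbehind assertions check whether a given pattern appears immediately after or before the current position, without consuming any characters.

(?=...): Positive lookahead; succeeds if the pattern follows the current position.

(?<=...): Positive lookbehind; succeeds if the pattern precedes the current position. In Python the contained pattern must match a fixed-width string.

(?!...): Negative lookahead; succeeds if the pattern does not follow the current position.

(?<!...): Negative lookbehind; succeeds if the pattern does not precede the current position.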


import re
pattern = r"(\w+)(?=\s\1)"
string = "hello hello world"
match = re.match(pattern, string)
if match:
    print(match.group())  # Output: hello
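The example above only shows positive lookahead. A sketch covering the other three assertions on made-up inputs, plus a check that lookarounds really do not consume text:

```python
import re

# Positive lookbehind: digits preceded by "$", without including the "$"
price = re.search(r"(?<=\$)\d+", "Total: $42").group()

# Negative lookahead: "foo" not followed by "bar" (the second "foo" matches)
pos = re.search(r"foo(?!bar)", "foobar foobaz").start()

# Negative lookbehind: digits not preceded by "-"
positives = [m.group() for m in re.finditer(r"(?<!-)\d+", "-5 and 7")]

# The lookahead consumes nothing: the match ends right after the first "hello"
end = re.match(r"(\w+)(?=\s\1)", "hello hello world").end()

print(price)      # 42
print(pos)        # 7
print(positives)  # ['7']
print(end)        # 5

# Lookbehind patterns must be fixed-width, so a+ inside one is rejected
try:
    re.compile(r"(?<=a+)b")
except re.error as exc:
    print("error:", exc)
```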

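10. Summary

This article briefly introduced the basic usage of Python's re module, covering matching, searching, replacing, splitting, and iterating over matches. Regular expressions are a powerful text-processing tool, and using the re module well lets us handle string processing and pattern matching efficiently.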
