【Python】A Roundup of Ways to Extract Numbers, Phone Numbers, Dates, and URLs from Text Strings (Complete!)


ithorizon · 7 months ago (09-16) · #Python

Methods for extracting information from text strings


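In Python, regular expressions, available through the built-in re module, let us pull specific pieces of information out of a text string. Below are extraction methods for several common kinds of information.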

1. Extracting numbers

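To extract every run of consecutive digits in a string (that is, non-negative integers), you can use the following regular expression: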

\d+

Here is the corresponding Python code:


import re
text = "There are 123 numbers in this 456 text."
numbers = re.findall(r'\d+', text)
print(numbers)  # Output: ['123', '456']
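The \d+ pattern splits 3.5 into '3' and '5' and drops the sign of -2. If decimals or negative numbers matter, one possible extension (an assumption beyond the original example) makes the sign and the fractional part optional and then converts the strings to numbers. NUMBER_RE is an illustrative name:

```python
import re

# Optional leading minus sign, digits, then an optional fractional part.
# (?:...) is non-capturing, so findall still returns the whole match.
NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')

text = "The temperature fell from 3.5 to -2 degrees over 10 hours."
tokens = NUMBER_RE.findall(text)      # re always returns strings
values = [float(t) for t in tokens]   # convert for arithmetic
print(tokens)  # ['3.5', '-2', '10']
print(values)  # [3.5, -2.0, 10.0]
```

The group is non-capturing on purpose: with a capturing group, re.findall would return only the group's contents instead of the whole match. One caveat of this sketch is that a range written as 10-20 comes out as '10' and '-20'.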

2. Extracting phone numbers

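Extracting phone numbers is a bit more involved, because phone numbers come in many formats. Here is a basic regular expression that matches an 11-digit number beginning with 1: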


1\d{10}

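Assuming we only care about mainland China mobile numbers, the code looks like this. Because the pattern allows no separators, the hyphenated 123-4567-8901 in the sample text is not matched: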


import re
text = "My phone number is 13812345678, you can also call me at 123-4567-8901."
phone_numbers = re.findall(r'1\d{10}', text)
print(phone_numbers)  # Output: ['13812345678']
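The pattern 1\d{10} also fires inside longer digit strings (for example an order ID that happens to contain a 1 followed by ten digits), and it ignores the common 3-4-4 hyphenated layout. A somewhat stricter sketch, assuming mobile numbers have a second digit from 3 to 9 and may use hyphens as separators, adds lookarounds and optional hyphens. MOBILE_RE is an illustrative name, and the prefix rule is a simplification rather than an authoritative numbering plan:

```python
import re

# Assumption: 11 digits, starting with 1 and then 3-9.
# Optional hyphens allow the 3-4-4 grouping; (?<!\d) and (?!\d) reject
# matches embedded in a longer run of digits such as an order ID.
MOBILE_RE = re.compile(r'(?<!\d)1[3-9]\d-?\d{4}-?\d{4}(?!\d)')

text = ("Call 13812345678 or 159-0000-1111. "
        "Order ID 2013812345678901 is not a phone number.")
matches = MOBILE_RE.findall(text)
normalized = [m.replace('-', '') for m in matches]  # drop separators
print(normalized)  # ['13812345678', '15900001111']
```

Without the lookarounds, the same pattern would report 13812345678 from inside the order ID.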

3. Extracting dates

How to extract dates depends on the date format in use. The following example extracts dates in the common YYYY-MM-DD format:


\d{4}-\d{2}-\d{2}

Code example:


import re
text = "Today's date is 2023-11-08 and tomorrow is 2023-11-09."
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)  # Output: ['2023-11-08', '2023-11-09']
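\d{4}-\d{2}-\d{2} only checks the shape of the text, so it also accepts impossible dates such as 2023-13-45. A sketch that additionally accepts / and . as separators (an assumption about the input) and keeps only real calendar dates could capture the separator once, require the same one again through the backreference \2, and let datetime.date reject invalid values. extract_dates is a hypothetical helper name:

```python
import re
from datetime import date

# Year, separator (captured as group 2), month, the SAME separator via \2, day.
# Lookarounds stop matches inside longer runs of digits.
DATE_RE = re.compile(r'(?<!\d)(\d{4})([-/.])(\d{2})\2(\d{2})(?!\d)')

def extract_dates(text):
    """Return the real calendar dates found in text as datetime.date objects."""
    found = []
    for m in DATE_RE.finditer(text):
        year, _sep, month, day = m.groups()
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Shaped like a date but not a valid one (e.g. month 13); skip it.
            continue
    return found

text = "Due 2023-11-08, shipped 2023/11/09, typo 2023-13-45, mixed 2023-11/10."
print(extract_dates(text))
# [datetime.date(2023, 11, 8), datetime.date(2023, 11, 9)]
```

The mixed-separator 2023-11/10 is rejected by the backreference, and 2023-13-45 is rejected by date(). re.finditer is used because it yields match objects whose groups are easy to unpack.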

4. Extracting URLs

The following regular expression matches http:// or https:// followed by letters, digits, dots, and slashes:


https?://[a-zA-Z0-9./]+

Code example:


import re
text = "Visit our website at http://www.example.com or https://www.example2.com."
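# The character class includes '.', so the sentence-ending period after the
# second URL would be captured too; strip trailing periods off each match.
urls = [u.rstrip('.') for u in re.findall(r'https?://[a-zA-Z0-9./]+', text)]
print(urls)  # Output: ['http://www.example.com', 'https://www.example2.com']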
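The character class [a-zA-Z0-9./] stops at characters such as -, _, ?, = and &, which are common in real URLs. One looser sketch, assuming a URL runs until whitespace, a quote, or an angle bracket, trims trailing punctuation afterwards and uses urllib.parse.urlparse from the standard library to pull out the host. URL_RE and TRAILING are illustrative names:

```python
import re
from urllib.parse import urlparse

# Assumption: a URL ends at whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r'https?://[^\s<>"\']+')
TRAILING = '.,;:!?)'  # punctuation that usually ends the sentence, not the URL

text = ("Docs: https://docs.python.org/3/library/re.html, "
        "search https://example.com/search?q=re&page=2.")
urls = [u.rstrip(TRAILING) for u in URL_RE.findall(text)]
hosts = [urlparse(u).netloc for u in urls]  # network location (host part)
print(urls)   # ['https://docs.python.org/3/library/re.html', 'https://example.com/search?q=re&page=2']
print(hosts)  # ['docs.python.org', 'example.com']
```

Trimming ) is a trade-off: it helps with URLs written inside parentheses but cuts the closing parenthesis off URLs that legitimately end with one.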

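The examples above cover only basic extraction. These patterns check the shape of the text rather than its validity (for example, \d{4}-\d{2}-\d{2} also matches 2023-13-45), so in real applications they usually need to be adapted to the data at hand.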
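When several kinds of information need to be pulled from the same text, the patterns can also be combined into one regular expression with named groups and scanned in a single pass. This is a sketch reusing simplified versions of the patterns above; extract_all and the group names are hypothetical:

```python
import re

# Each alternative is a named group; m.lastgroup reports which one matched.
# Order matters: URLs, dates and phone numbers are tried before bare numbers
# so their digits are not reported as plain numbers.
TOKEN_RE = re.compile(
    r'(?P<url>https?://[^\s<>"\']+)'
    r'|(?P<date>\d{4}-\d{2}-\d{2})'
    r'|(?P<phone>(?<!\d)1[3-9]\d{9}(?!\d))'
    r'|(?P<number>\d+)'
)

def extract_all(text):
    """Single left-to-right pass; returns a dict of lists keyed by kind."""
    found = {'url': [], 'date': [], 'phone': [], 'number': []}
    for m in TOKEN_RE.finditer(text):
        value = m.group()
        if m.lastgroup == 'url':
            value = value.rstrip('.,;:!?)')  # drop sentence punctuation
        found[m.lastgroup].append(value)
    return found

text = "On 2023-11-08 call 13812345678 about 3 orders, see https://example.com/a?id=42."
print(extract_all(text))
```

For this sample the result is {'url': ['https://example.com/a?id=42'], 'date': ['2023-11-08'], 'phone': ['13812345678'], 'number': ['3']}. Because the scan consumes text as it goes, digits inside a URL or date are never counted a second time as numbers.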
