统计文本中单词的出现次数

2738. 统计文本中单词的出现次数

表：Files

+-------------+---------+
| 列名        | 类型    |
+-- ----------+---------+
| file_name   | varchar |
| content     | text    |
+-------------+---------+
file_name 为该表的主键（具有唯一值的列）。
每行包含 file_name 和该文件的内容。

编写解决方案，找出单词 'bull' 和 'bear' 作为 独立词 出现的次数，不考虑任何它出现在两侧没有空格的情况（例如，'bullet', 'bears', 'bull.'，或者 'bear' 在句首或句尾不会被考虑）。

返回单词 'bull' 和 'bear' 以及它们对应的出现次数，顺序没有限制 。

结果的格式如下所示：

示例 1：

输入：
Files 表:
+------------+----------------------------------------------------------------------------------+
| file_name  | contenet                                                                         | 
+------------+----------------------------------------------------------------------------------+
| draft1.txt | The stock exchange predicts a bull market which would make many investors happy. | 
| draft2.txt | The stock exchange predicts a bull market which would make many investors happy, |
|            | but analysts warn of possibility of too much optimism and that in fact we are    |
|            | awaiting a bear market.                                                          | 
| draft3.txt | The stock exchange predicts a bull market which would make many investors happy, |
|            | but analysts warn of possibility of too much optimism and that in fact we are    |
|            | awaiting a bear market. As always predicting the future market is an uncertain   |
|            | game and all investors should follow their instincts and best practices.         | 
+------------+----------------------------------------------------------------------------------+
输出： 
+------+-------+
| word | count |  
+------+-------+
| bull | 3     | 
| bear | 2     | 
+------+-------+
解释：
- 单词 "bull" 在 "draft1.txt" 中出现1次，在 "draft2.txt" 中出现 1 次，在 "draft3.txt" 中出现 1 次。因此，单词 "bull" 的总出现次数为 3 次。
- 单词 "bear" 在 "draft2.txt" 中出现1次，在 "draft3.txt" 中出现 1 次。因此，单词 "bear" 的总出现次数为 2 次。

原站题解

去查看

上次编辑到这里，代码来自缓存点击恢复默认模板

# Write your MySQL query statement below

mysql 解法, 执行用时: 197 ms, 内存消耗: 0 B, 提交时间: 2023-10-15 17:24:31

# Write your MySQL query statement below

with t as(
    select  content from Files
), c as(
select content, replace(content,concat(' ' , 'bull' , ' '),concat(' ' , 'aaa' , ' ')) as lc, replace(content,concat(' ' , 'bear' , ' '),concat(' ' , 'bbb' , ' ')) as bc from t
), c1 as(
    select length(content) - length(lc) as bullnum, length(content)-length(bc) as bearnum  from c
), c2 as(
    select 'bull' as word,sum(case when bullnum >= 1 then 1 else 0 end ) as count from c1
    union
    select 'bear' as word,sum(case when bearnum >= 1 then 1 else 0 end ) as count from c1
)
select * from  c2

pythondata 解法, 执行用时: 248 ms, 内存消耗: 59.3 MB, 提交时间: 2023-10-15 17:24:03

import pandas as pd
import re
def count_occurrences(files: pd.DataFrame) -> pd.DataFrame:
    files['bull_count']=files['content'].str.count(r'.*\s\bbull\b\s.*').cumsum()
    files['bear_count']=files['content'].str.count(r'.*\s\bbear\b\s.*').cumsum()
    total_bull_count = files.iloc[-1]['bull_count']
    total_bear_count = files.iloc[-1]['bear_count']
    total_counts = {'bull': total_bull_count, 'bear': total_bear_count}
    df = pd.DataFrame(list(total_counts.items()), columns=['word', 'count'])
    return df

pythondata 解法, 执行用时: 224 ms, 内存消耗: 58.2 MB, 提交时间: 2023-10-15 17:23:46

import pandas as pd

def count_occurrences(files: pd.DataFrame) -> pd.DataFrame:
    bull_count = bear_count = 0
    for s in files['content']:
        if ' bull ' in s:
            bull_count += 1
        if ' bear ' in s:
            bear_count += 1
    return pd.DataFrame({'word': ['bull', 'bear'], 'count': [bull_count, bear_count]})

上一题

下一题

详情