# Write your MySQL query statement below
2738. 统计文本中单词的出现次数
表:Files
+-------------+---------+ | 列名 | 类型 | +-- ----------+---------+ | file_name | varchar | | content | text | +-------------+---------+ file_name 为该表的主键(具有唯一值的列)。 每行包含 file_name 和该文件的内容。
编写解决方案,找出单词 'bull' 和 'bear' 作为 独立词 出现的次数,不考虑任何它出现在两侧没有空格的情况(例如,'bullet', 'bears', 'bull.',或者 'bear' 在句首或句尾 不会 被考虑)。
返回单词 'bull' 和 'bear' 以及它们对应的出现次数,顺序没有限制 。
结果的格式如下所示:
示例 1:
输入: Files 表: +------------+----------------------------------------------------------------------------------+ | file_name | contenet | +------------+----------------------------------------------------------------------------------+ | draft1.txt | The stock exchange predicts a bull market which would make many investors happy. | | draft2.txt | The stock exchange predicts a bull market which would make many investors happy, | | | but analysts warn of possibility of too much optimism and that in fact we are | | | awaiting a bear market. | | draft3.txt | The stock exchange predicts a bull market which would make many investors happy, | | | but analysts warn of possibility of too much optimism and that in fact we are | | | awaiting a bear market. As always predicting the future market is an uncertain | | | game and all investors should follow their instincts and best practices. | +------------+----------------------------------------------------------------------------------+ 输出: +------+-------+ | word | count | +------+-------+ | bull | 3 | | bear | 2 | +------+-------+ 解释: - 单词 "bull" 在 "draft1.txt" 中出现1次,在 "draft2.txt" 中出现 1 次,在 "draft3.txt" 中出现 1 次。因此,单词 "bull" 的总出现次数为 3 次。 - 单词 "bear" 在 "draft2.txt" 中出现1次,在 "draft3.txt" 中出现 1 次。因此,单词 "bear" 的总出现次数为 2 次。
原站题解
mysql 解法, 执行用时: 197 ms, 内存消耗: 0 B, 提交时间: 2023-10-15 17:24:31
# Write your MySQL query statement below with t as( select content from Files ), c as( select content, replace(content,concat(' ' , 'bull' , ' '),concat(' ' , 'aaa' , ' ')) as lc, replace(content,concat(' ' , 'bear' , ' '),concat(' ' , 'bbb' , ' ')) as bc from t ), c1 as( select length(content) - length(lc) as bullnum, length(content)-length(bc) as bearnum from c ), c2 as( select 'bull' as word,sum(case when bullnum >= 1 then 1 else 0 end ) as count from c1 union select 'bear' as word,sum(case when bearnum >= 1 then 1 else 0 end ) as count from c1 ) select * from c2
pythondata 解法, 执行用时: 248 ms, 内存消耗: 59.3 MB, 提交时间: 2023-10-15 17:24:03
import pandas as pd import re def count_occurrences(files: pd.DataFrame) -> pd.DataFrame: files['bull_count']=files['content'].str.count(r'.*\s\bbull\b\s.*').cumsum() files['bear_count']=files['content'].str.count(r'.*\s\bbear\b\s.*').cumsum() total_bull_count = files.iloc[-1]['bull_count'] total_bear_count = files.iloc[-1]['bear_count'] total_counts = {'bull': total_bull_count, 'bear': total_bear_count} df = pd.DataFrame(list(total_counts.items()), columns=['word', 'count']) return df
pythondata 解法, 执行用时: 224 ms, 内存消耗: 58.2 MB, 提交时间: 2023-10-15 17:23:46
import pandas as pd def count_occurrences(files: pd.DataFrame) -> pd.DataFrame: bull_count = bear_count = 0 for s in files['content']: if ' bull ' in s: bull_count += 1 if ' bear ' in s: bear_count += 1 return pd.DataFrame({'word': ['bull', 'bear'], 'count': [bull_count, bear_count]})