Word Frequency

192. Word Frequency

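Write a bash script that counts how often each word occurs in a text file, words.txt.

For simplicity, you may assume:

words.txt contains only lowercase letters and ' ' (space) characters.
Each word consists of lowercase letters only.
Words are separated by one or more space characters.

Example:

Assume words.txt has the following content: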

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

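Note:

Don't worry about ordering words that have the same frequency; each word's frequency is guaranteed to be unique.
Can you do it in one line using Unix pipes?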

Similar Questions

Top K Frequent Elements

bash solution, runtime: 98 ms, memory usage: 3.9 MB, submitted: 2024-12-18 10:47:34

# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2,$1}'
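
The pipeline above can be read one stage at a time. The following is a minimal sketch of the same approach with a comment per stage. The function name `word_freq` and its optional file argument are assumptions made for illustration and are not part of the original submission. The extra `grep -v '^$'` stage is also an addition: it guards against the empty line that a leading space in the input would otherwise leave behind.

```shell
#!/usr/bin/env bash
# word_freq is a hypothetical helper name; it takes the input file as $1
# and falls back to words.txt, the file named in the problem statement.
word_freq() {
    local file="${1:-words.txt}"
    # 1. tr -s ' ' '\n'  : turn every space into a newline and squeeze
    #                      repeated newlines, so each word sits on its own line.
    # 2. grep -v '^$'    : drop any empty line (added guard, see lead-in).
    # 3. sort            : group identical words so uniq can count them.
    # 4. uniq -c         : prefix each distinct word with its count.
    # 5. sort -nr        : order numerically by count, highest first.
    # 6. awk             : swap the columns to print "word count".
    tr -s ' ' '\n' < "$file" \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -nr \
        | awk '{print $2, $1}'
}
```

Calling `word_freq words.txt` on the sample input should reproduce the expected listing from the problem statement.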

bash solution, runtime: 423 ms, memory usage: 3.7 MB, submitted: 2024-05-28 00:36:47

cat words.txt | xargs -n 1 | awk '{
    if($1 in data)
        data[$1] = data[$1] + 1
    else
        data[$1] = 1
 } END {for(str in data) print data[str],str}' | sort -rn | awk '{print $2, $1}'
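
In the solution above, `xargs -n 1` runs its default command (`echo`) once per word, which likely accounts for much of the slower runtime. As a hedged alternative, awk can walk the fields of each line itself, so no separate splitting step is needed. The function name `word_freq_awk` is hypothetical, and the secondary sort key on the word is an addition that only matters if two words were to share a count.

```shell
#!/usr/bin/env bash
# word_freq_awk is a hypothetical helper name; $1 is the input file
# (defaults to words.txt).
word_freq_awk() {
    local file="${1:-words.txt}"
    awk '
        # awk splits each line on runs of blanks, so repeated spaces are harmless.
        { for (i = 1; i <= NF; i++) count[$i]++ }
        # Emit "word count"; the iteration order of "for (w in count)" is unspecified,
        # which is why the result is sorted afterwards.
        END { for (w in count) print w, count[w] }
    ' "$file" | sort -k2,2nr -k1,1
}
```

Because the count is already in the second column, a single `sort` on that column replaces the final column swap used in the original.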

bash solution, runtime: 12 ms, memory usage: N/A, submitted: 2018-08-21 19:05:50

# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | sed 's/ /\n/g' | sed '/^$/d' | sort | uniq -c | awk '{print $2, $1}' | sort -nrk2
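
The `\n` in the `sed` replacement above is understood by GNU sed but may not be treated as a newline by other sed implementations. One possible variant, sketched here under the assumption that `grep` supports the widely available (but non-POSIX) `-o` option, extracts the words directly, so no empty lines need to be deleted afterwards. The function name `word_freq_grep` is hypothetical.

```shell
#!/usr/bin/env bash
# word_freq_grep is a hypothetical helper name; $1 is the input file
# (defaults to words.txt).
word_freq_grep() {
    local file="${1:-words.txt}"
    # grep -o prints every match of [a-z]+ on its own line,
    # so each word becomes one line regardless of how many spaces separate them.
    grep -oE '[a-z]+' "$file" \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}' \
        | sort -k2,2nr
}
```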