DA30. Average next-day retention rate of Nowcoder user practice
Description
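The dataset nowcoder.csv records the questions practiced each day on Nowcoder during December. It contains the following fields (separated by commas):
We want to know how often a user who practices on a given day comes back to practice again the following day. Compute the average next-day retention rate of user practice.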
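The average next-day retention rate is defined as the number of users in the dataset who practice on a given day and return the next day, divided by the total count of user_id entries (duplicates are not removed). Round the result to two decimal places.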
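To make the definition concrete, here is a minimal sketch on a small in-memory table. The toy rows, the year in the dates, and the helper name `next_day_retention` are assumptions for illustration only and are not part of nowcoder.csv; the only columns assumed are `user_id` and `date`, which the submitted solutions also rely on. Like those solutions, it counts matched record pairs and does not deduplicate.

```python
import pandas as pd

# Hypothetical toy data, not taken from nowcoder.csv.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "date": ["2021-12-01", "2021-12-02", "2021-12-01", "2021-12-03", "2021-12-05"],
})
# Keep only the calendar day so that time-of-day cannot affect the comparison.
toy["date"] = pd.to_datetime(toy["date"]).dt.normalize()


def next_day_retention(df):
    """Records followed by a same-user record on the next day, divided by all records."""
    # Self-join on user_id: every record is paired with every record of the same user.
    pairs = df.merge(df, on="user_id", suffixes=("_a", "_b"))
    # Keep the pairs whose second date is exactly one day after the first.
    returned = pairs[pairs["date_b"] - pairs["date_a"] == pd.Timedelta(days=1)]
    return round(len(returned) / len(df), 2)


rate = next_day_retention(toy)
print(rate)  # 0.2
```

Only user 1 practices on two consecutive days, so one of the five records is followed by a next-day return, giving 1 / 5 = 0.2.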
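Python 3 solution, runtime: 787 ms, memory: 524288 KB, submitted: 2022-07-13
import pandas as pd

nowcoder = pd.read_csv('nowcoder.csv')
# Denominator: total number of user_id records (duplicates kept)
total_id = nowcoder['user_id'].count()
# Self-join on user_id so each record is paired with every record of the same user
b = pd.merge(nowcoder, nowcoder, on='user_id')
b['date_x'] = pd.to_datetime(b.date_x).dt.date
b['date_y'] = pd.to_datetime(b.date_y).dt.date
b['differ'] = b.date_y - b.date_x
# Numerator: pairs whose dates are exactly one day apart
sum_diff = b[b.differ == pd.Timedelta(days=1)].differ.count()
res = round(sum_diff / total_id, 2)
print(res)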
Python 3 solution, runtime: 792 ms, memory: 524288 KB, submitted: 2022-07-20
import pandas as pd
from datetime import timedelta

nowcoder = pd.read_csv('nowcoder.csv')
# Denominator: total number of user_id records (duplicates kept)
total_count = nowcoder['user_id'].count()
nowcoder['date'] = pd.to_datetime(nowcoder['date']).dt.date
# date_lag is the day after each practice date
nowcoder['date_lag'] = nowcoder['date'] + timedelta(days=1)
nowcoder_info = pd.merge(nowcoder[['user_id', 'date']],
nowcoder[['user_id', 'date_lag']],
left_on='user_id', right_on='user_id', how='left')
# Numerator: pairs where one record falls on the day after another record of the same user
count_number = nowcoder_info.loc[nowcoder_info['date'] == nowcoder_info['date_lag']]['user_id'].count()
print(round(count_number / total_count, 2))
Python 3 solution, runtime: 796 ms, memory: 524288 KB, submitted: 2022-07-13
import pandas as pd
from datetime import timedelta

nowcoder = pd.read_csv('nowcoder.csv')
# Self-join on user_id; the two copies of each column get the suffixes _a and _b
df = pd.merge(nowcoder, nowcoder, on='user_id', suffixes=['_a', '_b'])
df.date_a = pd.to_datetime(df.date_a).dt.date
df.date_b = pd.to_datetime(df.date_b).dt.date
# Keep the pairs where date_b is the day after date_a
df = df[(df.date_a + timedelta(days=1)) == df.date_b]
all_num = nowcoder.user_id.count()
again_num = df.user_id.count()
print(round(again_num / all_num, 2))
Python 3 solution, runtime: 800 ms, memory: 524288 KB, submitted: 2022-07-25
import pandas as pd
from datetime import timedelta

nowcoder = pd.read_csv('nowcoder.csv')
# Self-join on user_id; the two copies of each column get the suffixes _a and _b
df = pd.merge(nowcoder, nowcoder, on='user_id', suffixes=['_a', '_b'])
df.date_a = pd.to_datetime(df.date_a).dt.date
df.date_b = pd.to_datetime(df.date_b).dt.date
# Keep the pairs where date_b is the day after date_a
df = df[(df.date_a + timedelta(days=1)) == df.date_b]
all_num = nowcoder.user_id.count()
again_num = df.user_id.count()
print(round(again_num / all_num, 2))
Python 3 solution, runtime: 800 ms, memory: 524288 KB, submitted: 2022-07-15
import pandas as pd

nowcoder = pd.read_csv('nowcoder.csv')
# Self-join on user_id; the two copies of each column get the suffixes _a and _b
df = pd.merge(nowcoder, nowcoder, on='user_id', suffixes=['_a', '_b'])
df.date_a = pd.to_datetime(df.date_a).dt.date
df.date_b = pd.to_datetime(df.date_b).dt.date
# Keep the pairs where date_b is the day after date_a
df1 = df[(df.date_a + pd.Timedelta(days=1)) == df.date_b]
again_num = df1.user_id.count()
all_num = nowcoder.user_id.count()
res = round(again_num / all_num, 2)
print(res)