# Write your MySQL query statement below
585. 2016年的投资
写一个查询语句,将 2016 年 (TIV_2016) 所有成功投资的金额加起来,保留 2 位小数。
对于一个投保人,他在 2016 年成功投资的条件是:
输入格式:
表 insurance 格式如下:
| Column Name | Type | |-------------|---------------| | PID | INTEGER(11) | | TIV_2015 | NUMERIC(15,2) | | TIV_2016 | NUMERIC(15,2) | | LAT | NUMERIC(5,2) | | LON | NUMERIC(5,2) |
PID 字段是投保人的投保编号, TIV_2015 是该投保人在2015年的总投保金额, TIV_2016 是该投保人在2016年的投保金额, LAT 是投保人所在城市的维度, LON 是投保人所在城市的经度。
样例输入
| PID | TIV_2015 | TIV_2016 | LAT | LON | |-----|----------|----------|-----|-----| | 1 | 10 | 5 | 10 | 10 | | 2 | 20 | 20 | 20 | 20 | | 3 | 10 | 30 | 20 | 20 | | 4 | 10 | 40 | 40 | 40 |
样例输出
| TIV_2016 | |----------| | 45.00 |
解释
就如最后一个投保人,第一个投保人同时满足两个条件: 1. 他在 2015 年的投保金额 TIV_2015 为 '10' ,与第三个和第四个投保人在 2015 年的投保金额相同。 2. 他所在城市的经纬度是独一无二的。 第二个投保人两个条件都不满足。他在 2015 年的投资 TIV_2015 与其他任何投保人都不相同。 且他所在城市的经纬度与第三个投保人相同。基于同样的原因,第三个投保人投资失败。 所以返回的结果是第一个投保人和最后一个投保人的 TIV_2016 之和,结果是 45 。
原站题解
pythondata 解法, 执行用时: 292 ms, 内存消耗: 59.8 MB, 提交时间: 2023-09-17 10:50:17
import pandas as pd def find_investments(insurance: pd.DataFrame) -> pd.DataFrame: cond1 = insurance['tiv_2015'].duplicated(keep=False) cond2 = ~insurance[['lat','lon']].duplicated(keep=False) _sum = insurance[cond1 & cond2]['tiv_2016'].sum() return pd.DataFrame({'tiv_2016':[round(_sum,2)]}) def find_investments2(insurance: pd.DataFrame) -> pd.DataFrame: df1 = insurance.groupby('tiv_2015').size().reset_index(name='count_tiv') df2 = insurance.groupby(['lat', 'lon']).size().reset_index(name='sample') df = insurance.merge(df1, on='tiv_2015').merge(df2, on=['lat', 'lon']) df['tiv_2016'] = df[(df['count_tiv'] > 1) & (df['sample'] == 1)]['tiv_2016'].sum() return df[['tiv_2016']].head(1) def find_investments3(insurance: pd.DataFrame) -> pd.DataFrame: #获取相同的tiv_2015 repetition_df = insurance['tiv_2015'].value_counts().reset_index() repetition_df=repetition_df.loc[repetition_df['count'] > 1, 'tiv_2015'] #获取不同的lat和lon different_df = insurance.groupby(['lat', 'lon'])['pid'].nunique().reset_index() different_df = different_df[different_df['pid'] == 1] different_df.drop('pid', axis=1, inplace=True) #从原表中剔除tiv_2015的唯一值 insurance = insurance[insurance['tiv_2015'].isin(repetition_df)] #获取lat和lon表中都不同的数据 lat_df = insurance[insurance['lat'].isin(different_df['lat'])] lon_df = insurance[insurance['lon'].isin(different_df['lon'])] #两表取交集,并且计算总和 #print(pd.merge(lat_df, lon_df)['tiv_2016']) max=pd.merge(lat_df, lon_df)['tiv_2016'].sum() return pd.DataFrame({'tiv_2016':[round(max,2)]})
mysql 解法, 执行用时: 308 ms, 内存消耗: 0 B, 提交时间: 2023-04-02 12:23:48
# Write your MySQL query statement below SELECT ROUND(SUM(TIV_2016), 2) AS TIV_2016 FROM ( SELECT *, COUNT(*) OVER (PARTITION BY TIV_2015 ) AS C1, COUNT(*) OVER (PARTITION BY LAT, LON ) AS C2 FROM INSURANCE ) A WHERE C1 > 1 AND C2 = 1;
mysql 解法, 执行用时: 312 ms, 内存消耗: 0 B, 提交时间: 2023-04-02 12:23:16
# Write your MySQL query statement below select round(sum(tiv_2016), 2) tiv_2016 from insurance where tiv_2015 in ( select tiv_2015 from insurance group by tiv_2015 having count(*) > 1 ) and concat(lat, lon) in ( select concat(lat, lon) from insurance group by lat, lon having count(*) = 1 );
mysql 解法, 执行用时: 347 ms, 内存消耗: 0 B, 提交时间: 2023-04-02 12:23:02
# Write your MySQL query statement below select round(sum(tiv_2016), 2) tiv_2016 from ( select tiv_2016, count(*) over(partition by tiv_2015) count_tiv_2015, count(*) over(partition by lat, lon) count_lat_lon from insurance ) as temp where count_lat_lon = 1 and count_tiv_2015 > 1