
How do I fix an error when reading data?

Bounty: 10 beans [Resolved] Resolved on 2021-12-24 21:35

test0
ts_code trade_date open high low close pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
002262,20200605,117.8224,122.2112,116.5564,119.0884,117.1472
002262,20200604,117.316,117.4848,115.206,117.1472,117.2316
002262,20200603,117.1472,119.1728,116.05,117.2316,117.316
002262,20200602,118.582,119.5948,115.8812,117.316,118.8352
002262,20200601,116.5564,119.6792,116.3876,118.8352,116.05
002262,20200529,114.8684,117.8224,114.5308,116.05,115.7124
002262,20200528,119.3416,119.3416,113.518,115.7124,119.426
002262,20200527,125.1652,125.5028,118.0756,119.426,125.9248
002262,20200526,125.2496,126.7688,122.8864,125.9248,125.0808
002262,20200525,118.0756,126.2624,116.9784,125.0808,117.4004
002262,20200522,121.1984,122.2112,116.7252,117.4004,120.6076
002262,20200521,117.316,124.2368,116.5564,120.6076,116.3032
002262,20200520,119.6792,121.2828,114.784,116.3032,119.5104
002262,20200519,121.2828,122.802,117.9068,119.5104,119.9324
002262,20200518,118.8352,121.6204,117.738,119.9324,119.004

test1
ts_code,date_start,date_end,close
002262,20200602,20200608,170.0299
002262,20190519,20200528,85.5816

The positions to read from test0 are stored in test1. What is the correct way to express this?
My attempt and its result:
test0= test0.mask(test0['trade_date'] == test1.loc['date_start': 'date_end'])
Unnamed: 0 ts_code trade_date open high low close pre_close
0 NaN NaN NaT NaN NaN NaN NaN
1 NaN NaN NaT NaN NaN NaN NaN

baijianyun12345 | Beginner, level 1 | Beans: 193
Asked: 2021-12-17 16:22
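
One possible reading of the goal is to keep only the test0 rows whose trade_date falls inside a date range from test1. `DataFrame.mask` replaces values where the condition is true rather than selecting rows, so a boolean filter is probably a closer fit. A minimal sketch, assuming a comma-separated header for test0, an abbreviated copy of the data above, and a single range taken from the first test1 row (the names `in_range` and `selected` are illustrative):

```python
import io

import pandas as pd

# Abbreviated copy of test0; the comma-separated header is an assumption.
test0_csv = """ts_code,trade_date,open,high,low,close,pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
002262,20200605,117.8224,122.2112,116.5564,119.0884,117.1472
002262,20200604,117.316,117.4848,115.206,117.1472,117.2316
002262,20200603,117.1472,119.1728,116.05,117.2316,117.316
002262,20200602,118.582,119.5948,115.8812,117.316,118.8352
002262,20200601,116.5564,119.6792,116.3876,118.8352,116.05
"""
test0 = pd.read_csv(io.StringIO(test0_csv), dtype={"ts_code": str})

# One range, taken from the first row of test1.
date_start, date_end = 20200602, 20200608

# Boolean indexing keeps the rows where the condition holds.
# Series.between includes both endpoints by default.
in_range = test0["trade_date"].between(date_start, date_end)
selected = test0.loc[in_range]
print(selected[["ts_code", "trade_date", "close"]])
```

Whether the endpoints should be included is a choice; comparing with `>` and `<` instead would leave out 20200602 and 20200608.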
Best answer

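Do you want to merge the two tables, or just extract the matching rows?

Beans awarded: 10
〆灬丶 | Veteran, level 4 | Beans: 2287 | 2021-12-17 16:49

Adjust it to suit your needs; to merge instead, replace append with concat or merge.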

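import pandas as pd

res_df = pd.DataFrame()
for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    # The bounds below are hard-coded; _start and _end are read but not used yet.
    tmp = df0.loc[(df0['trade_date']>20200602)&(df0['trade_date']<20200608)]
    print(tmp)
    # DataFrame.append (removed in pandas 2.0) returned a new frame, so res_df stays empty.
    res_df.append(tmp)
res_df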

Result

   ts_code  trade_date      open      high       low     close  pre_close
2     2262    20200605  117.8224  122.2112  116.5564  119.0884   117.1472
3     2262    20200604  117.3160  117.4848  115.2060  117.1472   117.2316
4     2262    20200603  117.1472  119.1728  116.0500  117.2316   117.3160
   ts_code  trade_date      open      high       low     close  pre_close
2     2262    20200605  117.8224  122.2112  116.5564  119.0884   117.1472
3     2262    20200604  117.3160  117.4848  115.2060  117.1472   117.2316
4     2262    20200603  117.1472  119.1728  116.0500  117.2316   117.3160
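〆灬丶 | Beans: 2287 (Veteran, level 4) | 2021-12-17 17:34

I want to extract the test0 data for those time ranges.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-17 18:03

@韆: My test1 has many rows, not just two. How should I express that? Thanks.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-17 18:16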

@韆: ts_code,date_start,date_end,close
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000002.SZ,2019-12-14,2020-03-23,3643.5146
000002.SZ,2020-08-19,2020-11-27,4864.8106
000004.SZ,2019-10-27,2020-02-04,80.1827
000004.SZ,2019-11-23,2020-03-02,181.9453
000005.SZ,2020-09-19,2020-12-28,23.0773
000005.SZ,2020-09-21,2020-12-30,23.0773
000005.SZ,2020-04-04,2020-07-13,30.6771
000006.SZ,2019-12-23,2020-04-01,156.9372
000006.SZ,2020-04-04,2020-07-13,333.3089

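baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-17 18:19

Thanks for your reply, but it does not work. My test0 holds the historical data of all stocks, not just one, and test1 holds, for every stock, the start and end dates of the 100 days before its 2020 high and its 2020 low. My plan is to extract from test0, for every stock, the data for the 100 days before its 2020 high and low. I don't know how to express that. Thanks.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-18 11:04

@baijianyun12345:
If you had actually run the code I posted, you would not have written "my test1 has many rows, not just two"; or else you looked only at the result and paid no attention to the variables in it.
Also, your plan and your implementation are ultimately your own work. If you don't get hands-on, nobody can keep helping you.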

res_df = pd.DataFrame()
res_list = []


for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    tmp = df0.loc[(df0['trade_date']>_start)&(df0['trade_date']<_end)]
    print(tmp)
    res_list.append(tmp)
    
if res_list:
    res_df = pd.concat(res_list)

res_df
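〆灬丶 | Beans: 2287 (Veteran, level 4) | 2021-12-20 10:21

@韆: Your criticism is fair, teacher, but I really did try it around 6 o'clock that day and it did not work. I saw that yours had literal dates, so I replaced them with _start and tried that.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-20 18:27

It still does not work, teacher. It jumps around, looping over 000001 again and again, and takes a long time before it reaches the stocks after 000001. The result is not what I want either.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-20 19:56

@baijianyun12345:
1. I'm no teacher.
2. What exactly are df0 and df1 when you run it? I debugged my code with the test0 and test1 from your question.
PS: your test0 cannot be read directly as CSV, because the header row is not comma-separated.

〆灬丶 | Beans: 2287 (Veteran, level 4) | 2021-12-21 09:35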
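
The header issue noted in the PS could be worked around at read time. A minimal sketch, assuming the file looks like the test0 in the question (a space-separated header line followed by comma-separated rows); `raw` and `names` are illustrative names, and the `dtype` argument is an added suggestion for keeping the leading zeros of ts_code, which otherwise come back as 2262 as in the printed result above:

```python
import io

import pandas as pd

# Two rows of test0 exactly as posted: header without commas, data with commas.
raw = """ts_code trade_date open high low close pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
"""

# Take the column names from the space-separated first line ...
names = raw.splitlines()[0].split()

# ... then parse the comma-separated rows, skipping that first line.
df0 = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,
    header=None,
    names=names,
    dtype={"ts_code": str},  # keep the stock code as text
)
print(df0.dtypes)
```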

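@韆: df0 is test0, i.e. the historical data for all A-share stocks, and df1 is test1, i.e. the data that stores the positions I want to extract from test0. That is how I understand it.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 11:04

Here is my code. Where is it wrong? Please help me fix it: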
import os

import pandas as pd

path = 'C:/Users/Administrator/Desktop/stock333/stock_hfq'


def get_stock_code_list_in_one_dir(path):
    """
    Collect the file names (stock codes) of all csv files under the given folder.
    :param path:
    :return:
    """
    stock_list = []

    # os.walk traverses every file under the folder
    for root, dirs, files in os.walk(path):
        if files:  # only when files is not empty
            for f in files:
                if f.endswith('.csv'):
                    stock_list.append(f[:9])
            #print(files)

    return sorted(stock_list)


stockk_list = get_stock_code_list_in_one_dir(path)
all_stock = pd.DataFrame()
for code in stockk_list:
    data = pd.read_csv(path + '/%s.csv' % code, header=0, encoding='gbk')
    data['trade_date'] = pd.to_datetime(data['trade_date'].astype('string'))
    data.sort_values(by='trade_date', inplace=True)
    df = pd.read_csv("C:/Users/Administrator/Desktop/bbb.csv")
    print(data)

    res_df = pd.DataFrame()
    res_list = []
    for index, row in df.iterrows():
        _start = row['date_start']
        _end = row['date_end']
        tmp = data.loc[(data['trade_date']>_start)&(data['trade_date']<_end)]
        print(tmp)
        res_list.append(tmp)

    if res_list:
        res_df = pd.concat(res_list)

    res_df
    res_df.to_csv("C:/Users/Administrator/Desktop/ppp.csv", mode='a', index=False)
baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 11:16

Here is the result of a test with data for 2 stocks. The first part is all correct:
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000560.SZ,2020-02-12,2020-05-22,15.9701
000560.SZ,2020-02-15,2020-05-25,15.9701
Once it gets to 000560 it goes wrong: the stock code and the dates no longer match.
247,000001.SZ,2020-11-27,2220.96,2220.96,2152.1102,2187.6456,2165.436,22.20959999999968,1.0256
246,000001.SZ,2020-11-30,2209.8552,2318.6822,2175.4303,2192.0875,2187.6456,4.44190000000026,0.203
441,000001.SZ,2020-02-13,1605.876,1624.4347,1594.9591,1599.3258,1612.4261,-13.100299999999834,-0.8125
440,000001.SZ,2020-02-14,1610.2427,1652.8187,1604.7843,1640.8101,1599.3258,41.48429999999985,2.5939
439,000001.SZ,2020-02-17,1641.9018,1677.9275,1629.8932,1677.9275,1640.8101,37.11740000000009,2.2621

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 23:47
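
The mismatch described here is what one would expect if each window from the range file is applied without checking which stock it belongs to. A minimal sketch of adding the stock code to the filter, assuming both tables carry a ts_code column and the date columns are already parsed as datetimes; the frames below are toy stand-ins with made-up close values, not real quotes:

```python
import pandas as pd

# Toy stand-ins for the price data and the range file (illustrative values only).
df0 = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ", "000560.SZ", "000560.SZ"],
    "trade_date": pd.to_datetime(
        ["2020-02-13", "2020-11-27", "2020-02-13", "2020-11-27"]
    ),
    "close": [1.0, 2.0, 3.0, 4.0],
})
df1 = pd.DataFrame({
    "ts_code": ["000001.SZ", "000560.SZ"],
    "date_start": pd.to_datetime(["2020-08-23", "2020-02-12"]),
    "date_end": pd.to_datetime(["2020-12-01", "2020-05-22"]),
})

res_list = []
for row in df1.itertuples(index=False):
    # Match the stock code as well as the date window.
    same_code = df0["ts_code"] == row.ts_code
    in_window = df0["trade_date"].between(row.date_start, row.date_end)
    res_list.append(df0.loc[same_code & in_window])

res_df = pd.concat(res_list, ignore_index=True) if res_list else pd.DataFrame()
print(res_df)
```

With the code condition in place, the 000560.SZ window can only ever select 000560.SZ rows.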

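@baijianyun12345: It only solved part of the problem, but I'll award you the points anyway...

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-24 21:37
Other answers (1)

What do you ultimately want the result to look like?

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-17 17:26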

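test1 stores the stock codes needed from test0, along with the start and end dates. I want to read all the data for those periods, save it, and then process it.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-17 18:08

@baijianyun12345: Every row has the same ts_code, so what should test0 and test1 be merged on?

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-18 17:23

@wang_yb: My test0 holds the historical data of all stocks, and test1 holds, for every stock, the start and end dates of the 100 days before its 2020 high and its 2020 low. I want to extract from test0 the data for the 100 days before each stock's 2020 high and low, and I don't know how to express that. It is not a merge; it is extracting, for each stock in test0, the data for the 100 days before last year's high and low. Thanks.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-18 17:39

@baijianyun12345: Hello,
Judging from the sample data, test1 looks like weekly closing prices and test0 holds the 4 daily kline prices. Where do the 100 days come from?
I can't see a 100-day interval anywhere in test1.

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-21 01:49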

@wang_yb:
I don't want to answer him any more; over to you.
The initial data from the question (with commas added to the test0 header)

# test0
ts_code,trade_date,open,high,low,close,pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
002262,20200605,117.8224,122.2112,116.5564,119.0884,117.1472
002262,20200604,117.316,117.4848,115.206,117.1472,117.2316
002262,20200603,117.1472,119.1728,116.05,117.2316,117.316
002262,20200602,118.582,119.5948,115.8812,117.316,118.8352
002262,20200601,116.5564,119.6792,116.3876,118.8352,116.05
002262,20200529,114.8684,117.8224,114.5308,116.05,115.7124
002262,20200528,119.3416,119.3416,113.518,115.7124,119.426
002262,20200527,125.1652,125.5028,118.0756,119.426,125.9248
002262,20200526,125.2496,126.7688,122.8864,125.9248,125.0808
002262,20200525,118.0756,126.2624,116.9784,125.0808,117.4004
002262,20200522,121.1984,122.2112,116.7252,117.4004,120.6076
002262,20200521,117.316,124.2368,116.5564,120.6076,116.3032
002262,20200520,119.6792,121.2828,114.784,116.3032,119.5104
002262,20200519,121.2828,122.802,117.9068,119.5104,119.9324
002262,20200518,118.8352,121.6204,117.738,119.9324,119.004

# test1
ts_code,date_start,date_end,close
002262,20200602,20200608,170.0299
002262,20190519,20200528,85.5816

Code

import pandas as pd

res_df = pd.DataFrame()
res_list = []

df0 = pd.read_csv('./0.csv')
df1 = pd.read_csv('./1.csv')


for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    tmp = df0.loc[(df0['trade_date']>_start)&(df0['trade_date']<_end)]
    print(tmp)
    res_list.append(tmp)
    
if res_list:
    res_df = pd.concat(res_list)

res_df
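〆灬丶 | Beans: 2287 (Veteran, level 4) | 2021-12-21 14:54

@wang_yb: In test1 each stock code has two rows. The first row holds date_start and date_end for the 2020 high of stock 002262, and the second row holds date_start and date_end for its 2020 low. Below that come the dates for the next stock's 2020 high and low. What I mean is: use the three values in test1 (stock code, start date, end date) to locate the time ranges I need in test0.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 15:22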
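
Locating rows by stock code, start date, and end date can also be written without an explicit loop. A sketch under the assumption that both tables fit in memory and share a ts_code column; the windows are copied from the test1 sample in this thread, while the price rows are toy values:

```python
import pandas as pd

# Windows copied from the test1 sample posted in the thread.
df1 = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ", "000002.SZ", "000002.SZ"],
    "date_start": pd.to_datetime(
        ["2019-12-14", "2020-08-23", "2019-12-14", "2020-08-19"]
    ),
    "date_end": pd.to_datetime(
        ["2020-03-23", "2020-12-01", "2020-03-23", "2020-11-27"]
    ),
})
# Toy price rows (made-up close values, not real quotes).
df0 = pd.DataFrame({
    "ts_code": ["000001.SZ"] * 3 + ["000002.SZ"] * 2,
    "trade_date": pd.to_datetime(
        ["2020-01-02", "2020-06-01", "2020-09-01", "2020-01-02", "2020-09-01"]
    ),
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Pair every price row with every window of the same stock ...
paired = df0.merge(df1, on="ts_code", how="inner")
# ... then keep only the rows whose date lies inside the paired window.
inside = (paired["trade_date"] >= paired["date_start"]) & (
    paired["trade_date"] <= paired["date_end"]
)
res_df = paired.loc[inside, list(df0.columns)].reset_index(drop=True)
print(res_df)
```

The merge pairs every price row with every window of the same stock before filtering, so the intermediate frame can get large. Overlapping windows for one stock (as with 000005.SZ in the sample) would return the same price row more than once; `drop_duplicates` on ts_code and trade_date is one way to handle that if it is unwanted.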

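@wang_yb: Also, this is not the original data. The original is too long to post, so I posted this version to make it easier to understand.
test0 is simply the historical data of all stocks, one stock after another, which everyone should find easy to understand.
test1: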
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000002.SZ,2019-12-14,2020-03-23,3643.5146
000002.SZ,2020-08-19,2020-11-27,4864.8106
000004.SZ,2019-10-27,2020-02-04,80.1827
000004.SZ,2019-11-23,2020-03-02,181.9453
000005.SZ,2020-09-19,2020-12-28,23.0773
000005.SZ,2020-09-21,2020-12-30,23.0773
000005.SZ,2020-04-04,2020-07-13,30.6771
000006.SZ,2019-12-23,2020-04-01,156.9372
000006.SZ,2020-04-04,2020-07-13,333.3089

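baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 15:29

@韆: What you posted should be exactly what the asker needs.

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-21 16:46

@baijianyun12345: The code that 韆 gave in this reply does meet your need.
Each printed tmp is the test0 data for one of the time ranges in test1.

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-21 16:48

@wang_yb: The problem is that it does not work. It seems fine for a single stock, but I need the data for the 100 days before each stock's 2020 high and low. His code runs when I put it in, but it takes a long time and the result is wrong. For the first stock, for example, what I want from test0 is the 000001 data from 2019-12-14 to 2020-03-23 for the low, and the 000001 data from 2020-08-23 to 2020-12-01 for the high, and so on for every stock...

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-21 17:43

@baijianyun12345: You need to post your code, sample data, and exactly where your output is wrong.

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-22 12:22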

@wang_yb:
Here is the result of a test with data for 2 stocks. The first part is all correct:
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000560.SZ,2020-02-12,2020-05-22,15.9701
000560.SZ,2020-02-15,2020-05-25,15.9701
Once it gets to 000560 it goes wrong: the stock code and the dates no longer match.
247,000001.SZ,2020-11-27,2220.96,2220.96,2152.1102,2187.6456,2165.436,22.20959999999968,1.0256
246,000001.SZ,2020-11-30,2209.8552,2318.6822,2175.4303,2192.0875,2187.6456,4.44190000000026,0.203
441,000001.SZ,2020-02-13,1605.876,1624.4347,1594.9591,1599.3258,1612.4261,-13.100299999999834,-0.8125
440,000001.SZ,2020-02-14,1610.2427,1652.8187,1604.7843,1640.8101,1599.3258,41.48429999999985,2.5939
439,000001.SZ,2020-02-17,1641.9018,1677.9275,1629.8932,1677.9275,1640.8101,37.11740000000009,2.2621
Row 3, index 441, should be 000560.SZ,2020-02-13, but it is still 441,000001.SZ,2020-02-13. The stock code should have switched to 000560, yet it is still 000001.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-22 13:12

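@baijianyun12345: You are using @韆's code, right?

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-22 22:43

@wang_yb:
In the end I used his, yes. The first part loops over and reads the data of every stock. With his code after it, the run for the first stock goes through every time range before the second stock starts, which again goes through every time range. What I want is for 000001 to use only the high and low time ranges that belong to that stock code, and so on for the others. The code I sent to @韆 above is the complete code; if you can't see it, I'll send it to you again.

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-23 09:22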
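
If the per-file loop is kept, one option is to look up only the windows that belong to the stock currently being read. A minimal sketch, assuming each stock's data sits in its own frame and the range file has a ts_code column; `extract_windows` is a hypothetical helper, and the dict `frames` stands in for the per-stock csv files (toy values, not real quotes):

```python
import pandas as pd


def extract_windows(data, ranges, code):
    """Return the rows of one stock's `data` that fall inside that stock's windows."""
    own = ranges.loc[ranges["ts_code"] == code]
    keep = pd.Series(False, index=data.index)
    for start, end in zip(own["date_start"], own["date_end"]):
        # between() includes both endpoints by default
        keep = keep | data["trade_date"].between(start, end)
    return data.loc[keep]


ranges = pd.DataFrame({
    "ts_code": ["000001.SZ", "000560.SZ"],
    "date_start": pd.to_datetime(["2020-08-23", "2020-02-12"]),
    "date_end": pd.to_datetime(["2020-12-01", "2020-05-22"]),
})

# Stand-in for reading one csv file per stock code.
dates = pd.to_datetime(["2020-02-13", "2020-11-27"])
frames = {
    code: pd.DataFrame({"ts_code": code, "trade_date": dates, "close": [1.0, 2.0]})
    for code in ["000001.SZ", "000560.SZ"]
}

result = pd.concat(
    [extract_windows(frames[code], ranges, code) for code in sorted(frames)],
    ignore_index=True,
)
print(result)
```

Reading the range file once before the loop, rather than once per stock, would also avoid repeated work.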

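@baijianyun12345: Communicating through this web page is too much effort... I can't sort it out either. Let's see whether someone else can help.

wang_yb | Beans: 4891 (Veteran, level 4) | 2021-12-23 17:22

@wang_yb: What's wrong? I'll just send it to you again. Programming really is easier with a teacher to guide you; as a hobby it is hard going. Help if you can, and if not, thank you very much anyway.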
import os

import pandas as pd

path = 'C:/Users/Administrator/Desktop/stock333/stock_hfq'


def get_stock_code_list_in_one_dir(path):
    """
    Collect the file names (stock codes) of all csv files under the given folder.
    :param path:
    :return:
    """
    stock_list = []

    # os.walk traverses every file under the folder
    for root, dirs, files in os.walk(path):
        if files:  # only when files is not empty
            for f in files:
                if f.endswith('.csv'):
                    stock_list.append(f[:9])
            #print(files)

    return sorted(stock_list)


stockk_list = get_stock_code_list_in_one_dir(path)
all_stock = pd.DataFrame()
for code in stockk_list:
    data = pd.read_csv(path + '/%s.csv' % code, header=0, encoding='gbk')
    data['trade_date'] = pd.to_datetime(data['trade_date'].astype('string'))
    data.sort_values(by='trade_date', inplace=True)
    df = pd.read_csv("C:/Users/Administrator/Desktop/bbb.csv")
    print(data)

    res_df = pd.DataFrame()
    res_list = []
    for index, row in df.iterrows():
        _start = row['date_start']
        _end = row['date_end']
        tmp = data.loc[(data['trade_date']>_start)&(data['trade_date']<_end)]
        print(tmp)
        res_list.append(tmp)

    if res_list:
        res_df = pd.concat(res_list)

    res_df
    res_df.to_csv("C:/Users/Administrator/Desktop/ppp.csv", mode='a', index=False)

baijianyun12345 | Beans: 193 (Beginner, level 1) | 2021-12-23 18:36