test0
ts_code trade_date open high low close pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
002262,20200605,117.8224,122.2112,116.5564,119.0884,117.1472
002262,20200604,117.316,117.4848,115.206,117.1472,117.2316
002262,20200603,117.1472,119.1728,116.05,117.2316,117.316
002262,20200602,118.582,119.5948,115.8812,117.316,118.8352
002262,20200601,116.5564,119.6792,116.3876,118.8352,116.05
002262,20200529,114.8684,117.8224,114.5308,116.05,115.7124
002262,20200528,119.3416,119.3416,113.518,115.7124,119.426
002262,20200527,125.1652,125.5028,118.0756,119.426,125.9248
002262,20200526,125.2496,126.7688,122.8864,125.9248,125.0808
002262,20200525,118.0756,126.2624,116.9784,125.0808,117.4004
002262,20200522,121.1984,122.2112,116.7252,117.4004,120.6076
002262,20200521,117.316,124.2368,116.5564,120.6076,116.3032
002262,20200520,119.6792,121.2828,114.784,116.3032,119.5104
002262,20200519,121.2828,122.802,117.9068,119.5104,119.9324
002262,20200518,118.8352,121.6204,117.738,119.9324,119.004
test1
ts_code,date_start,date_end,close
002262,20200602,20200608,170.0299
002262,20190519,20200528,85.5816
The positions to read from test0 are stored in test1. What is the correct way to express this?
Result:
test0= test0.mask(test0['trade_date'] == test1.loc['date_start': 'date_end'])
Unnamed: 0 ts_code trade_date open high low close pre_close
0 NaN NaN NaT NaN NaN NaN NaN
1 NaN NaN NaT NaN NaN NaN NaN
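A note on `mask`, as a minimal sketch: `DataFrame.mask(cond)` keeps the frame's shape and replaces entries with NaN where `cond` is True, so it blanks rows out rather than selecting them. Likewise, `.loc[a:b]` slices by row label, not by the values of the `date_start` and `date_end` columns. Row selection is usually written as a boolean condition inside `.loc`. The three-row frame below is borrowed from test0 purely for illustration.

```python
import pandas as pd

# Three rows borrowed from test0, just for illustration.
df = pd.DataFrame({
    "trade_date": [20200601, 20200602, 20200603],
    "close": [118.8352, 117.316, 117.2316],
})

cond = df["trade_date"] == 20200602

# mask() keeps all three rows and blanks out the one where cond is True.
masked = df.mask(cond)

# Boolean indexing keeps only the rows where cond is True.
selected = df.loc[cond]

print(masked)
print(selected)
```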
Do you want to merge the two, or is extracting the rows enough?
Adjust it yourself as needed; to merge, replace append with concat or merge.
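import pandas as pd

# df0 is test0 and df1 is test1, both already loaded as DataFrames.
res_df = pd.DataFrame()
for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    # The bounds are hard-coded here for the demo; use _start and _end instead.
    tmp = df0.loc[(df0['trade_date']>20200602)&(df0['trade_date']<20200608)]
    print(tmp)
    # DataFrame.append returns a new frame (and was removed in pandas 2.0),
    # so this call leaves res_df unchanged.
    res_df.append(tmp)
res_df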
Result:
ts_code trade_date open high low close pre_close
2 2262 20200605 117.8224 122.2112 116.5564 119.0884 117.1472
3 2262 20200604 117.3160 117.4848 115.2060 117.1472 117.2316
4 2262 20200603 117.1472 119.1728 116.0500 117.2316 117.3160
ts_code trade_date open high low close pre_close
2 2262 20200605 117.8224 122.2112 116.5564 119.0884 117.1472
3 2262 20200604 117.3160 117.4848 115.2060 117.1472 117.2316
4 2262 20200603 117.1472 119.1728 116.0500 117.2316 117.3160
This extracts the test0 data for that period.
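One detail of the filter above that may matter: `>` and `<` drop the rows that fall exactly on the start and end dates. If the end points should be kept, `Series.between` (inclusive on both sides by default) is one option. A small sketch using the first seven `trade_date` values of test0:

```python
import pandas as pd

# trade_date values from the top of test0, as integers.
dates = pd.Series(
    [20200609, 20200608, 20200605, 20200604, 20200603, 20200602, 20200601],
    name="trade_date",
)
start, end = 20200602, 20200608

# Strict comparison, as in the loop above: both end points are dropped.
strict = dates[(dates > start) & (dates < end)]

# between() is inclusive on both sides by default.
inclusive = dates[dates.between(start, end)]

print(strict.tolist())
print(inclusive.tolist())
```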
@韆: My test1 has many rows, not just two. How should I express that? Thanks.
@韆: ts_code,date_start,date_end,close
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000002.SZ,2019-12-14,2020-03-23,3643.5146
000002.SZ,2020-08-19,2020-11-27,4864.8106
000004.SZ,2019-10-27,2020-02-04,80.1827
000004.SZ,2019-11-23,2020-03-02,181.9453
000005.SZ,2020-09-19,2020-12-28,23.0773
000005.SZ,2020-09-21,2020-12-30,23.0773
000005.SZ,2020-04-04,2020-07-13,30.6771
000006.SZ,2019-12-23,2020-04-01,156.9372
000006.SZ,2020-04-04,2020-07-13,333.3089
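Thanks for your reply, but it doesn't work. My test0 holds the historical data of all stocks, not just one, and test1 holds, for every stock, the start and end dates of the 100 days before its highest and lowest price of last year (2020). My idea is to extract from test0 the data for the 100 days before each stock's 2020 high and low, but I don't know how to express it. Thanks.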
@baijianyun12345:
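If you had actually run the code I posted, you would not have written "my test1 has many rows, not just two"; or rather, you paid no attention to the variables in it and only looked at the result.
Besides, your idea and your implementation are ultimately your own work. If you don't get your hands dirty, who can keep helping you?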
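import pandas as pd

# df0 is test0 and df1 is test1, both already loaded as DataFrames.
res_df = pd.DataFrame()
res_list = []
for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    # Rows strictly between this period's start and end dates.
    tmp = df0.loc[(df0['trade_date']>_start)&(df0['trade_date']<_end)]
    print(tmp)
    res_list.append(tmp)
if res_list:
    res_df = pd.concat(res_list)
res_df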
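@韆: Your criticism is fair, teacher, but I really did try it around 6 o'clock that day and it didn't work. I saw that yours had literal dates, so I replaced them with _start and tried that.
It still doesn't work, teacher. It jumps around, going back and forth over 000001 for a long time before it gets to the stocks after 000001, and the computed result isn't what I want either.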
@baijianyun12345:
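1. I don't deserve to be called teacher.
2. What exactly are df0 and df1 when you run it? I debugged the code with the test 0 and test 1 from your question.
PS: your test 0 cannot be read directly as CSV, because the header row is not comma-separated.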
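If editing the file by hand is not an option, one possible workaround is to read the header separately and pass the column names to `read_csv`. The sketch below uses the first lines of test0 as posted; it also reads `ts_code` as a string, on the assumption that the leading zeros of `002262` should survive, and converts the `YYYYMMDD` integers to dates.

```python
import io
import pandas as pd

# First lines of test0 as posted: space-separated header, comma-separated rows.
raw = """ts_code trade_date open high low close pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
"""

header, body = raw.split("\n", 1)
df0 = pd.read_csv(
    io.StringIO(body),
    header=None,
    names=header.split(),     # column names taken from the space-separated header
    dtype={"ts_code": str},   # keep the leading zeros of the stock code
)

# Convert the YYYYMMDD integers into real dates.
df0["trade_date"] = pd.to_datetime(df0["trade_date"].astype(str), format="%Y%m%d")
print(df0)
```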
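@韆: df0 is test0, i.e. the historical data of all A-share stocks, and df1 is test1, i.e. the data that stores the positions of the test0 data I want to extract. That's how I understand it.
Here is my code. Please help me fix whatever is wrong: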
import os
import pandas as pd

path = 'C:/Users/Administrator/Desktop/stock333/stock_hfq'
def get_stock_code_list_in_one_dir(path):
    """
    Collect the file names of all csv files under the given folder
    :param path:
    :return:
    """
    stock_list = []
    # Built-in os.walk, used to traverse all files in the folder
    for root, dirs, files in os.walk(path):
        if files:  # when files is not empty
            for f in files:
                if f.endswith('.csv'):
                    stock_list.append(f[:9])
        #print(files)
    return sorted(stock_list)
stockk_list = get_stock_code_list_in_one_dir(path)
all_stock=pd.DataFrame()
for code in stockk_list:
    data = pd.read_csv(path + '/%s.csv' % code, header=0,encoding='gbk')
    data['trade_date']=pd.to_datetime(data['trade_date'].astype('string'))
    data.sort_values(by = 'trade_date',inplace=True)
    df=pd.read_csv("C:/Users/Administrator/Desktop/bbb.csv")
    print(data)
    res_df = pd.DataFrame()
    res_list = []
    for index, row in df.iterrows():
        _start = row['date_start']
        _end = row['date_end']
        tmp = data.loc[(data['trade_date']>_start)&(data['trade_date']<_end)]
        print(tmp)
        res_list.append(tmp)
    if res_list:
        res_df = pd.concat(res_list)
    res_df
    res_df.to_csv("C:/Users/Administrator/Desktop/ppp.csv",mode='a', index=False)
Here is the result of an experiment with data for 2 stocks. The first part is all correct:
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000560.SZ,2020-02-12,2020-05-22,15.9701
000560.SZ,2020-02-15,2020-05-25,15.9701
When it gets to 000560 it goes wrong: the stock code and the dates no longer match.
247,000001.SZ,2020-11-27,2220.96,2220.96,2152.1102,2187.6456,2165.436,22.20959999999968,1.0256
246,000001.SZ,2020-11-30,2209.8552,2318.6822,2175.4303,2192.0875,2187.6456,4.44190000000026,0.203
441,000001.SZ,2020-02-13,1605.876,1624.4347,1594.9591,1599.3258,1612.4261,-13.100299999999834,-0.8125
440,000001.SZ,2020-02-14,1610.2427,1652.8187,1604.7843,1640.8101,1599.3258,41.48429999999985,2.5939
439,000001.SZ,2020-02-17,1641.9018,1677.9275,1629.8932,1677.9275,1640.8101,37.11740000000009,2.2621
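The mismatch above is consistent with the period rows of one stock being applied to another stock's price data, since the loop compares dates only. A minimal sketch of the same loop with an extra `ts_code` condition follows. The two small frames are invented stand-ins for the real files (the 000560.SZ prices are made up), and the column names follow the samples in this thread.

```python
import pandas as pd

# Invented stand-in for the price history (test0) of two stocks.
prices = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ", "000560.SZ", "000560.SZ"],
    "trade_date": pd.to_datetime(
        ["2020-02-13", "2020-11-27", "2020-02-13", "2020-11-27"]
    ),
    "close": [1599.3258, 2187.6456, 15.5, 16.1],  # 000560.SZ values are made up
})

# Two period rows in the test1 layout.
periods = pd.DataFrame({
    "ts_code": ["000001.SZ", "000560.SZ"],
    "date_start": pd.to_datetime(["2020-08-23", "2020-02-12"]),
    "date_end": pd.to_datetime(["2020-12-01", "2020-05-22"]),
})

parts = []
for row in periods.itertuples(index=False):
    hit = prices.loc[
        (prices["ts_code"] == row.ts_code)        # same stock only
        & (prices["trade_date"] > row.date_start)
        & (prices["trade_date"] < row.date_end)
    ]
    parts.append(hit)

result = pd.concat(parts, ignore_index=True)
print(result)
```

Without the first condition, the 000560.SZ window would also pick up the 000001.SZ row dated 2020-02-13, which resembles the output shown above.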
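@baijianyun12345: It only solved part of the problem, but I'll give it to you anyway...
What do you ultimately want the result to look like?
test1 stores the stock codes needed from test0, along with the start and end dates. I want to read all the data in those periods, save it, and then process it further.
@baijianyun12345: All the rows have the same ts_code, so on what basis should test0 and test1 be merged?
@wang_yb: My test0 holds the historical data of all stocks, and test1 holds, for every stock, the start and end dates of the 100 days before its highest and lowest price of last year (2020). My idea is to extract from test0 the data for the 100 days before each stock's 2020 high and low, but I don't know how to express it. I don't want to merge; I want to extract from test0 each stock's data for the 100 days before last year's high and low. Thanks.
@baijianyun12345: Hello,
Judging from the sample data, test1 holds weekly closing prices and test0 holds the four daily K-line prices. Where do the 100 days come from?
test1 doesn't seem to show any 100-day interval.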
@wang_yb:
I don't want to answer him any more. Over to you.
The original data from the question (with commas added to the test0 header)
# test0
ts_code,trade_date,open,high,low,close,pre_close
002262,20200609,126.7688,139.26,126.7688,139.26,126.3
002262,20200608,120.5232,129.4696,117.4848,126.6,119.0884
002262,20200605,117.8224,122.2112,116.5564,119.0884,117.1472
002262,20200604,117.316,117.4848,115.206,117.1472,117.2316
002262,20200603,117.1472,119.1728,116.05,117.2316,117.316
002262,20200602,118.582,119.5948,115.8812,117.316,118.8352
002262,20200601,116.5564,119.6792,116.3876,118.8352,116.05
002262,20200529,114.8684,117.8224,114.5308,116.05,115.7124
002262,20200528,119.3416,119.3416,113.518,115.7124,119.426
002262,20200527,125.1652,125.5028,118.0756,119.426,125.9248
002262,20200526,125.2496,126.7688,122.8864,125.9248,125.0808
002262,20200525,118.0756,126.2624,116.9784,125.0808,117.4004
002262,20200522,121.1984,122.2112,116.7252,117.4004,120.6076
002262,20200521,117.316,124.2368,116.5564,120.6076,116.3032
002262,20200520,119.6792,121.2828,114.784,116.3032,119.5104
002262,20200519,121.2828,122.802,117.9068,119.5104,119.9324
002262,20200518,118.8352,121.6204,117.738,119.9324,119.004
# test1
ts_code,date_start,date_end,close
002262,20200602,20200608,170.0299
002262,20190519,20200528,85.5816
Code
import pandas as pd

res_df = pd.DataFrame()
res_list = []
# 0.csv holds test0 and 1.csv holds test1.
df0 = pd.read_csv('./0.csv')
df1 = pd.read_csv('./1.csv')
for index, row in df1.iterrows():
    _start = row['date_start']
    _end = row['date_end']
    # Rows strictly between this period's start and end dates.
    tmp = df0.loc[(df0['trade_date']>_start)&(df0['trade_date']<_end)]
    print(tmp)
    res_list.append(tmp)
if res_list:
    res_df = pd.concat(res_list)
res_df
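@wang_yb: Each code in test1 has two rows. The first row holds the start (date_start) and end (date_end) dates for stock 002262's 2020 high, and the second row holds the date_start and date_end for stock 002262's 2020 low. Below that come the next stock's 2020 high and low dates. What I mean is to use three fields from test1, the stock code, the start date and the end date, to locate the periods I need in test0.
@wang_yb: Also, this is not the raw data. The raw data is too long to post, so I posted it in this form to make it easier to understand.
test0 is simply the historical data of all stocks, one stock after another, which everyone can follow.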
test1:
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000002.SZ,2019-12-14,2020-03-23,3643.5146
000002.SZ,2020-08-19,2020-11-27,4864.8106
000004.SZ,2019-10-27,2020-02-04,80.1827
000004.SZ,2019-11-23,2020-03-02,181.9453
000005.SZ,2020-09-19,2020-12-28,23.0773
000005.SZ,2020-09-21,2020-12-30,23.0773
000005.SZ,2020-04-04,2020-07-13,30.6771
000006.SZ,2019-12-23,2020-04-01,156.9372
000006.SZ,2020-04-04,2020-07-13,333.3089
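On the earlier question of where the 100 days show up: in the rows listed above, `date_end` minus `date_start` appears to be 100 calendar days (not trading days), whereas the two-row test1 at the top of the thread does not follow that pattern. A small sketch to check this, using four rows copied from the listing:

```python
import pandas as pd

# A few test1 rows copied from the listing above.
test1 = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ", "000004.SZ", "000006.SZ"],
    "date_start": ["2019-12-14", "2020-08-23", "2019-11-23", "2019-12-23"],
    "date_end": ["2020-03-23", "2020-12-01", "2020-03-02", "2020-04-01"],
})

start = pd.to_datetime(test1["date_start"])
end = pd.to_datetime(test1["date_end"])

# Length of each window in calendar days.
test1["days"] = (end - start).dt.days
print(test1)
```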
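@韆: Yours should be exactly what the asker needs.
@baijianyun12345: The code given by 韆 in this reply meets your requirement.
Each printed tmp is the test0 data for one period in test1.
@wang_yb: The problem is that it doesn't work. It seems fine for one stock, but what I need is the data for the 100 days before each stock's 2020 high and low. Putting his code in does run, but it takes a very long time and the result is wrong. For example, for the first stock, what I want from test0 is 000001's low-price period, 2019-12-14 to 2020-03-23, and 000001's high-price period, 2020-08-23 to 2020-12-01, and so on for every stock...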
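One way to express "each stock, only its own periods" without a Python-level loop is to join prices and periods on `ts_code` and then keep the rows whose date falls inside the joined window. This is only a sketch under assumptions: the price rows below are invented, the period rows are copied from the test1 sample, and the strict `>` / `<` bounds mirror the earlier code.

```python
import pandas as pd

# Invented price rows for two stocks (stand-in for test0).
prices = pd.DataFrame({
    "ts_code": ["000001.SZ"] * 3 + ["000002.SZ"] * 3,
    "trade_date": pd.to_datetime(["2020-01-02", "2020-06-01", "2020-09-01"] * 2),
    "close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Period rows copied from the test1 sample.
periods = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ", "000002.SZ", "000002.SZ"],
    "date_start": pd.to_datetime(["2019-12-14", "2020-08-23", "2019-12-14", "2020-08-19"]),
    "date_end": pd.to_datetime(["2020-03-23", "2020-12-01", "2020-03-23", "2020-11-27"]),
    "close": [1326.4034, 2226.5124, 3643.5146, 4864.8106],
})

# Leave out test1's own close column so it does not clash with the price column.
windows = periods[["ts_code", "date_start", "date_end"]]

# Pair every price row with every window of the same stock.
merged = prices.merge(windows, on="ts_code", how="inner")

# Keep only the pairs where the trade date lies inside the window.
inside = (merged["trade_date"] > merged["date_start"]) & (
    merged["trade_date"] < merged["date_end"]
)
result = merged.loc[inside].reset_index(drop=True)
print(result)
```

A price row that falls inside two overlapping windows of the same stock appears once per window, so a `drop_duplicates` call may be needed depending on what the later processing expects. The join also builds one intermediate row per price row and window, which is modest with a handful of windows per stock.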
@baijianyun12345: Mate, you need to post your code, your sample data, and exactly where your output is wrong.
@wang_yb:
Here is the result of an experiment with data for 2 stocks. The first part is all correct:
000001.SZ,2019-12-14,2020-03-23,1326.4034
000001.SZ,2020-08-23,2020-12-01,2226.5124
000560.SZ,2020-02-12,2020-05-22,15.9701
000560.SZ,2020-02-15,2020-05-25,15.9701
When it gets to 000560 it goes wrong: the stock code and the dates no longer match.
247,000001.SZ,2020-11-27,2220.96,2220.96,2152.1102,2187.6456,2165.436,22.20959999999968,1.0256
246,000001.SZ,2020-11-30,2209.8552,2318.6822,2175.4303,2192.0875,2187.6456,4.44190000000026,0.203
441,000001.SZ,2020-02-13,1605.876,1624.4347,1594.9591,1599.3258,1612.4261,-13.100299999999834,-0.8125
440,000001.SZ,2020-02-14,1610.2427,1652.8187,1604.7843,1640.8101,1599.3258,41.48429999999985,2.5939
439,000001.SZ,2020-02-17,1641.9018,1677.9275,1629.8932,1677.9275,1640.8101,37.11740000000009,2.2621
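Row 3, index 441, should be 000560.SZ,2020-02-13, but it is still 441,000001.SZ,2020-02-13. The stock code should have switched to 000560, yet it is still 000001.
@baijianyun12345: You're using @韆's code, right?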
@wang_yb:
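Yes, in the end I used his. The first part loops over and reads all the stock data, and the second part uses his code. The result is that the extraction for the first stock runs through all the periods, and only then does the second stock run, again through all the periods. What I want is for 000001 to use only the high and low periods that belong to that stock code, and so on. My reply to @韆 above contains the full code; if you can't see it, I'll send it to you again.
@baijianyun12345: Mate, communicating through this web page is too laborious... I can't sort it out either. Let's see whether someone else can help.
@wang_yb: What's wrong? I'll just send it again. Programming really is easier with a teacher; as a hobby it's hard going. Help if you can, and if you can't, thank you anyway.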
import os
import pandas as pd

path = 'C:/Users/Administrator/Desktop/stock333/stock_hfq'
def get_stock_code_list_in_one_dir(path):
    """
    Collect the file names of all csv files under the given folder
    :param path:
    :return:
    """
    stock_list = []
    for root, dirs, files in os.walk(path):
        if files:  # when files is not empty
            for f in files:
                if f.endswith('.csv'):
                    stock_list.append(f[:9])
        #print(files)
    return sorted(stock_list)
stockk_list = get_stock_code_list_in_one_dir(path)
all_stock=pd.DataFrame()
for code in stockk_list:
    data = pd.read_csv(path + '/%s.csv' % code, header=0,encoding='gbk')
    data['trade_date']=pd.to_datetime(data['trade_date'].astype('string'))
    data.sort_values(by = 'trade_date',inplace=True)
    df=pd.read_csv("C:/Users/Administrator/Desktop/bbb.csv")
    print(data)
    res_df = pd.DataFrame()
    res_list = []
    for index, row in df.iterrows():
        _start = row['date_start']
        _end = row['date_end']
        tmp = data.loc[(data['trade_date']>_start)&(data['trade_date']<_end)]
        print(tmp)
        res_list.append(tmp)
    if res_list:
        res_df = pd.concat(res_list)
    res_df
    res_df.to_csv("C:/Users/Administrator/Desktop/ppp.csv",mode='a', index=False)
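For the one-file-per-stock layout in the code above, a possible restructuring is sketched here. It assumes each file is named `<ts_code>.csv`, reads the period table once, applies to each file only the periods of that file's own stock, and writes the output once at the end (every `to_csv` call writes a header by default, so appending with `mode='a'` inside the loop would repeat it). The function name `extract_from_dir` and the demo data are hypothetical, and the `encoding='gbk'` argument from the original is omitted for the demo files.

```python
from pathlib import Path
import tempfile

import pandas as pd


def extract_from_dir(data_dir, periods):
    """For every <ts_code>.csv in data_dir, collect the rows inside that stock's own periods."""
    periods = periods.assign(
        date_start=pd.to_datetime(periods["date_start"]),
        date_end=pd.to_datetime(periods["date_end"]),
    )
    parts = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        code = csv_path.stem  # "000001.SZ.csv" -> "000001.SZ"
        data = pd.read_csv(csv_path, dtype={"ts_code": str})
        data["trade_date"] = pd.to_datetime(data["trade_date"].astype(str))
        own = periods.loc[periods["ts_code"] == code]  # only this stock's periods
        for row in own.itertuples(index=False):
            parts.append(
                data.loc[
                    (data["trade_date"] > row.date_start)
                    & (data["trade_date"] < row.date_end)
                ]
            )
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)


# Demo with two tiny made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for code, close in [("000001.SZ", 1599.3258), ("000560.SZ", 15.5)]:
        pd.DataFrame({
            "ts_code": [code, code],
            "trade_date": ["2020-02-13", "2020-11-27"],
            "close": [close, close],
        }).to_csv(Path(tmp) / f"{code}.csv", index=False)

    periods = pd.DataFrame({
        "ts_code": ["000001.SZ", "000560.SZ"],
        "date_start": ["2020-08-23", "2020-02-12"],
        "date_end": ["2020-12-01", "2020-05-22"],
    })
    result = extract_from_dir(tmp, periods)

print(result)
# A single write at the end, for example:
# result.to_csv("ppp.csv", index=False)
```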