An automation algorithm for harvesting capital market information from the web

Times cited: 13
Author
Agrrawal, Pankaj [1]
Affiliation
[1] Univ Maine, Dept Finance, Orono, ME 04469 USA
Keywords
Information retrieval; Worldwide web; Programming and algorithm theory; Capital markets
DOI
10.1108/03074350910949790
Chinese Library Classification number
F8 [Public finance, Finance]
Discipline classification code
0202
Abstract
Purpose - The purpose of this paper is to develop an algorithm that harvests user-specified information from finance portals and compiles it into machine-readable datasets for quantitative analysis.
Design/methodology/approach - The Visual Basic macro language in Microsoft Excel is used to develop code that is not constrained by Excel's single-query function. The core of the algorithm splits the URL connector line and inserts a continuously updated variable at the split point; every ticker in the input list is looped through this variable in turn. The output is then written to non-overlapping cells.
Findings - Numerical information posted on major finance websites can be harvested into structured, machine-readable datasets by applying this algorithm.
Research limitations/implications - One significant change in Microsoft Excel 2007 is that the worksheet expands from 2^24 to 2^34 cells, or, more specifically, from 256 (IV) columns × 65,536 rows (2^8 × 2^16) to 16,384 (XFD) columns × 1,048,576 rows (2^14 × 2^20). These new limits, while allowing a larger number of tickers, still constrain a single worksheet to 16,384 columns. At five fields per ticker, that translates into roughly 3,200 ticker symbols (16,384 / 5 = 3,276.8).
Practical implications - The algorithm extends users' access to websites that do not support downloading information on multiple stock tickers simultaneously. Furthermore, the procedure automates the downloading of multiple pieces of information (fields) and of entire tables per ticker (record).
Originality/value - An exhaustive literature search found no paper that discusses a multiple-ticker algorithm for web harvesting.
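The Design/methodology passage describes the core loop only in prose. The following Python sketch illustrates it under stated assumptions and is not the author's VBA code: the connector line `URL_TEMPLATE` and its `{TICKER}` placeholder are hypothetical, the download-and-parse step is omitted, and the column letters simply mimic Excel's A to XFD naming. It splits the connector line once around the ticker slot, loops each ticker from the input list into that slot, reserves a non-overlapping block of columns per ticker, and applies the per-worksheet capacity check from the limitations paragraph (16,384 // 5 = 3,276 complete tickers at five fields each).

```python
from dataclasses import dataclass
from urllib.parse import quote

# Worksheet column limits quoted in the abstract.
EXCEL_2003_COLUMNS = 2 ** 8   # 256 columns, last column IV
EXCEL_2007_COLUMNS = 2 ** 14  # 16,384 columns, last column XFD

# HYPOTHETICAL connector line: host, path, and query are placeholders,
# and "{TICKER}" marks the point where the line is split.
URL_TEMPLATE = "https://finance.example.com/q?s={TICKER}&view=stats"
PLACEHOLDER = "{TICKER}"


def split_connector(template: str, placeholder: str = PLACEHOLDER) -> tuple[str, str]:
    """Split the connector line into the fixed text before and after the ticker."""
    prefix, found, suffix = template.partition(placeholder)
    if not found:
        raise ValueError("connector line has no ticker placeholder")
    return prefix, suffix


def column_letter(index: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 256 -> IV)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


@dataclass
class TickerJob:
    ticker: str
    url: str
    first_col: str  # first output column reserved for this ticker
    last_col: str   # last output column reserved for this ticker


def plan_jobs(tickers: list[str], fields_per_ticker: int,
              max_columns: int = EXCEL_2007_COLUMNS,
              template: str = URL_TEMPLATE) -> list[TickerJob]:
    """Build one URL per ticker and reserve a non-overlapping column block for it."""
    capacity = max_columns // fields_per_ticker
    if len(tickers) > capacity:
        raise ValueError(f"{len(tickers)} tickers exceed the {capacity}-ticker limit")
    prefix, suffix = split_connector(template)
    jobs = []
    for i, ticker in enumerate(tickers):  # each ticker fills the same slot in turn
        start = i * fields_per_ticker + 1  # 1-based first column of this ticker's block
        end = start + fields_per_ticker - 1
        url = prefix + quote(ticker, safe="") + suffix  # percent-encode symbols such as ^GSPC
        jobs.append(TickerJob(ticker, url, column_letter(start), column_letter(end)))
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(["IBM", "MSFT", "^GSPC"], fields_per_ticker=5):
        print(job.ticker, job.first_col, job.last_col, job.url)
```

With five fields per ticker, IBM is assigned columns A to E, MSFT F to J, and ^GSPC K to O. Reserving a fixed block of `fields_per_ticker` columns per ticker is what keeps outputs from overwriting one another, and it is also why the column limit, rather than the row limit, caps the number of tickers per worksheet.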
Start page: 427
Number of pages: 13