Excel VBA Web Scraping Tutorial
A couple months ago I published an article on UK industry analysis–based on first principles analysis using historical FTSE share price data. At the end of this article is the code I used to extract Yahoo! share price data from their website.
You can follow the guidance below for instructions on how to perform the scrape and cache the data; but this video provides a quick demonstration of the tool.
To utilize the code at the end of this article follow the steps below (or just download the EXCEL FILE itself and take a look).
- Open Excel, name a sheet ‘l_finance’ and into it paste a list of ticker symbols starting in cell A1 (no blank rows!) whose historical share price data you want to extract.
- Next hit Alt + F11 to open the VBA editor.
- Paste all of the code below into the editor and click F5 to run the subroutine.
- After hitting F5, wait a minute or two (don’t mess with Excel while the code is working!) and your data is cached in a workbook called ‘yhcacheYYYYMM.xlsx’ (in the same folder as your workbook you pasted the code into).
- You can then graph it, put it in pivot tables, move to Access, SQL Server, SAS, or other analysis tool and analyse.
- Below is what your screen should look like after pasting the code into Excel.
The routine above basically loops through a list of given symbols, and for each symbol it opens the relevant Yahoo Finance historic share price page, retrieves the associated price data for the company, then fires it into a data cache (i.e. a worksheet) in Excel. Once the routine was up and running it took about two minutes to run… not bad, I thought, for 5yrs of monthly closing share price data for 100 companies. When I get more time I will use this basic web scrape technique to analyse a broader range of industry share price trends.
The article showing share price trends across all UK FTSE100 industries for the past five years is located HERE. The exercise used monthly share price data from Yahoo Finance. That data is free, courtesy of Yahoo, but it is only retrievable one company at a time on their website.
Excel Web Scraper Code: Yahoo Finance Data
Sub d_yhprices() 'routine for executing repeated web queries on yahoo finance webpage to get 5yrs monthly data for a defined ticker list Dim rg1, rg2, rg3 As Range Dim bk1, bk2, bk3 As Workbook Dim sht1, sht2, sht3 As Worksheet Dim qtb As New QueryTable Dim url1, path1, tkr As String Dim r1, r2, r3 As Long Dim t2 As Date 'initiate ticker range and cache repository workbook Set sht1 = ThisWorkbook.Sheets("l_finance") r1 = sht1.Range("a1").End(xlDown).Row Set bk3 = Workbooks.Add Set sht3 = bk3.Sheets(1) 'loop through ticker symbols to extract and cache data For i = 2 To r1 tkr = sht1.Range("a" & i) url1 = "http://finance.yahoo.com/q/hp?s=" & tkr & "&a=00&b=2&c=1980&d=11&e=17&f=2014&g=m" 'note the date range and periodicity on the end of this string Set bk2 = Workbooks.Add Set sht2 = bk2.Sheets(1) 'extract web table to temp bk Set qtb = sht2.QueryTables.Add(Connection:="URL;" & url1, Destination:=Range("$A$1")) qtb.WebFormatting = xlWebFormattingNone qtb.WebTables = "15" qtb.Refresh BackgroundQuery:=False 'Application.Wait (Now + #12:00:05 AM#) r2 = sht2.Range("a1").End(xlDown).Row sht2.Range("a:a").Insert sht2.Range("a1") = "ticker" sht2.Range("a2:a" & r2) = tkr 'export to cache then discard bk2 sht2.Range("a1").CurrentRegion.Copy r3 = sht3.Range("a1048576").End(xlUp).Row + 1 sht3.Range("a" & r3).PasteSpecial xlPasteValues Application.CutCopyMode = False bk2.Close False Next i path1 = ThisWorkbook.Path & "yhcache" & Format(Date, "yyyymm") & ".xlsx" bk3.SaveAs (path1) bk3.Close MsgBox ("finished caching stock price data!") End Sub
Did you try this out? How did it go? Do you know how to do this properly using C# or vb.net?… feel free to share your comments below.