Python: Extending the extension
In my last post, Extending MV Basic by Calling Python, I showed how we can extend the functionality of MV Basic by calling Python functions. That alone is a nice feature to have but in this post I want to go one step further and show how we can extend Python itself to achieve some interesting results.
This time, I’ll use a popular Python package to extend what Python can do, and then use that extended Python from MV Basic to achieve a result that would be much more difficult and time-consuming using MV Basic alone.
If you like, we’re going to be extending the extension!
To get this underway, let’s pose a problem.
As part of a fictional application we are building, we’d like to get access to company stock prices but we don’t have access to any of the web services that can return that information. However, we know that the same data is published on the stock exchange web site.
So the information we need is out there, but how can we get at it from our application?
Web pages as web services
Let’s do a quick review of web services.
The web service client, or consumer, makes a request to the web service provider which then responds with the information requested in either XML or JSON, or perhaps an error if it was not available. The web service provider typically runs on a web server and the protocol used to access it is HTTP.
This architecture is identical to what is used when we browse the internet with a web browser. One difference is that a web service returns data in either XML or JSON format but browsing a web page returns the web page in HTML. Despite that difference, the architecture is the same and the major pieces of the two are the same.
Therefore, something that may not be immediately obvious is that normal web pages are actually a form of web service. Once we come to realise that, we can view normal web sites and pages as web services which carry information that can be used.
The next problem…
Of course, as is often the case, once we solve one problem we move straight into the next! In this case, our next problem is that when we access a web page for information, instead of receiving nicely formatted XML or JSON as we would with a web service, we will receive HTML. The data we want will be wrapped up in a lot of additional HTML that is of no use to us and will just make it harder to get to the data we want. Navigating our way through that HTML and getting to the data we want is the crux of this post!
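To make the contrast concrete, here is a small sketch. Both payloads and the prices in them are invented for illustration; they are not taken from any real service or from the ASX page.

```python
import json

# Hypothetical web service response: the data is already structured.
json_payload = '{"code": "NAB", "buy": 25.10, "sell": 25.12}'
quote = json.loads(json_payload)
print(quote["code"], quote["buy"], quote["sell"])

# Hypothetical web page fragment carrying the same values inside markup.
html_payload = (
    '<table class="datatable">'
    "<tr><th>Code</th><th>Buy</th><th>Sell</th></tr>"
    "<tr><td>NAB</td><td>25.10</td><td>25.12</td></tr>"
    "</table>"
)

# The values are in there, but only as text between tags; there is no
# one-line equivalent of json.loads that hands them back as data.
print("25.10" in html_payload)
```

With JSON, one call gives us a dictionary. With HTML, we first have to find the right table, then the right rows, then the right cells.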
Getting the data
In our fictional application, we want to get the price of some company stocks on the Australian Stock Exchange (ASX) and have now realised that the data we want is available on the ASX web site. Here is a sample of the page, showing the price of some bank stocks:
The web page has many elements on it and not surprisingly the HTML for the page is quite complex and verbose. The data we want is in there somewhere but how can we access it? This is where Python can really help.
A beautiful soup
The blob of HTML for that web page really is soup. Beautiful soup but soup just the same! The HTML holds the data we want (stock prices) but it is so verbose that parsing it to get at the data is quite a challenge. Luckily we have a tool available to help.
Core Python does not have a simple way of parsing that HTML and extracting the data we need (the html.parser module in the standard library is quite low-level), so we need to extend Python using another library.
BeautifulSoup (https://pypi.org/project/beautifulsoup4/) is a popular Python package that makes it easy to parse the HTML for a web page and scrape information from that page. The project description says:
“Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.”
So easy, in fact, that the following snippet is all the Python code needed to parse the HTML from the ASX web site and extract pricing information from that table of stock prices:
pageSoup = BeautifulSoup(response.read(), 'html.parser')
for tr in pageSoup.find('table', class_='datatable').find_all('tr')[1:]:
    tdVals = tr.find_all('td')
Let’s look more closely at this code.
pageSoup = BeautifulSoup(response.read(), 'html.parser')
This line creates a BeautifulSoup object called pageSoup that holds a parsed representation of the HTML of the web page.
- The first parameter supplies the HTML of the page which in this case is being retrieved from a call to the web page (response.read())
- The second parameter instructs BeautifulSoup to parse the page using an HTML parser.
for tr in pageSoup.find('table', class_='datatable').find_all('tr')[1:]:
This line sets up a for loop which loops across all rows (tr elements), starting from the second row, of a table with the class ‘datatable’. There is only one table with that class in the web page and that is the table of stock price results – the one we want.
tdVals = tr.find_all('td')
For each row looped over by the previous line of code, this line will extract all the table data cells (td elements), thus providing all the data from that row.
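If you would like to try those three lines without calling a real web site, the following is a minimal sketch that runs them against a small HTML string. The sample markup and prices are invented, and I am assuming a header row of th cells followed by data rows; the real ASX page is much larger and its layout may differ. It needs the beautifulsoup4 package to be installed.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<table class="datatable">
  <tr><th>Code</th><th>Buy</th><th>Sell</th></tr>
  <tr><td>NAB</td><td>25.10</td><td>25.12</td></tr>
  <tr><td>WBC</td><td>27.45</td><td>27.47</td></tr>
</table>
</body></html>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
# [1:] skips the first row, which in this sample holds the column headings.
for tr in page_soup.find("table", class_="datatable").find_all("tr")[1:]:
    td_vals = tr.find_all("td")
    # get_text(strip=True) returns the text inside each cell without padding.
    rows.append([td.get_text(strip=True) for td in td_vals])

print(rows)
```

Each entry in rows is a plain list of strings, one per cell, which is an easy shape to turn into whatever structure the application needs.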
Parsing the HTML and extracting data using MV Basic or core Python would be a difficult and involved task but by using BeautifulSoup, it becomes quite simple to do.
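For comparison, here is a rough sketch of the same extraction using only html.parser from the Python standard library. The class name, the sample markup and the prices are all my own inventions for illustration, and the sketch ignores complications such as nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the td text of each row in tables with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.in_table = False
        self.in_cell = False
        self.current_row = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and self.table_class in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.in_table and tag == "td":
            self.in_cell = True
            self.current_row.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif self.in_table and tag == "td":
            self.in_cell = False
        elif self.in_table and tag == "tr" and self.current_row:
            # Header rows contain th cells only, so they stay empty and are skipped.
            self.rows.append(self.current_row)

    def handle_data(self, data):
        if self.in_cell:
            self.current_row[-1] += data.strip()


sample_html = (
    '<table class="datatable">'
    "<tr><th>Code</th><th>Buy</th><th>Sell</th></tr>"
    "<tr><td>NAB</td><td>25.10</td><td>25.12</td></tr>"
    "<tr><td>WBC</td><td>27.45</td><td>27.47</td></tr>"
    "</table>"
)

parser = TableRowParser("datatable")
parser.feed(sample_html)
print(parser.rows)
```

It works, but we have to track for ourselves which table and cell we are in, which is exactly the bookkeeping that BeautifulSoup takes off our hands.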
Getting and parsing
The snippet of code shown above is from a Python function called getStockPrices that does two jobs:
- It calls the web page and returns the HTML from that web page.
- It parses the HTML from the web page and returns a Python list of dictionary objects, each of which has the stock code, buy price and sell price.
Here is the source code for the function:
On line 21, the (ASX) web page is requested using the request module from the urllib package (https://docs.python.org/3/library/urllib.html), which is one way that Python can request web pages.
Line 23 is where the parsing, as described earlier, is started.
Line 26 is where the composing of the list of data dictionaries is done.
This is a simplified function but hopefully it shows how straightforward urllib and BeautifulSoup make the job of calling a web page and then scraping data from it, in effect treating the web page as a web service.
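The page request step on its own can be sketched like this. The helper name fetch_html, the timeout and the fallback encoding are my assumptions and are not taken from getStockPrices. To keep the sketch runnable without network access it is called with a data: URL, which urllib also understands; in real use the argument would be the address of the page to scrape.

```python
from urllib import request


def fetch_html(url, timeout=10):
    """Return the body of the page at url as text."""
    with request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL embeds the (percent-encoded) page content in the URL itself.
html = fetch_html("data:text/html;charset=utf-8,%3Cp%3EHello%3C%2Fp%3E")
print(html)
```

The text returned by a helper like this is what would be handed to BeautifulSoup for parsing.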
Testing the function
Let’s test the function directly at the Python interpreter. The function is in a Python module called stocks so we need to import that module. Here is a run of the test straight from the Python interpreter:
In the second line, the function is called to get the prices for NAB, WBC and CBA. It will return a list of data objects (Python dictionaries) with each object containing the stock code, buy price and sell price for one of the requested stocks.
The third and fourth lines loop through the list and print out these data objects and as we can see from the output of the print, the function has worked and retrieved the data from the ASX web site.
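As a sketch of the shape of that data, the following builds a list of dictionaries from rows of cell text and then loops through it. The rows stand in for what the parsing step produces; the key names (code, buy, sell) and the prices are invented, so the real function may well use different names.

```python
# Stand-in for the cell text extracted from the page (invented values).
rows = [
    ["NAB", "25.10", "25.12"],
    ["WBC", "27.45", "27.47"],
    ["CBA", "71.30", "71.33"],
]

# One dictionary per stock: code, buy price and sell price.
prices = [
    {"code": code, "buy": float(buy), "sell": float(sell)}
    for code, buy, sell in rows
]

lines = []
for price in prices:
    line = f"{price['code']}: buy {price['buy']:.2f}, sell {price['sell']:.2f}"
    lines.append(line)
    print(line)
```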
Running it from MV Basic
As we saw in the last post, the extensions to MV Basic allow us to run Python functions from MV Basic. To show that, I have a Basic program called STOCKVALUES and here is the source:
In line 20, the Python function getStockPriceArray (from the module stocks) is called. That function is shown below and I won’t describe it except to say that it, in turn, calls getStockPrices and turns the returned values into a dynamic array string which is returned to MV Basic. Here is the function listing:
Running STOCKVALUES will call getStockPriceArray and that in turn will call getStockPrices to retrieve the prices. getStockPriceArray will return the stock data in a dynamic array and from there, the STOCKVALUES MV Basic program simply loops through the returned array and prints the data for each stock.
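To give a feel for the conversion to a dynamic array, here is a minimal sketch. It is not the getStockPriceArray listing: the function name, the key names and the layout (one attribute per stock, holding code, buy price and sell price as three values) are my assumptions, as is the use of characters 254 and 253 for the attribute and value marks. Check how your own UniVerse Python setup encodes the marks before relying on this.

```python
AM = chr(254)  # attribute mark (assumed character code)
VM = chr(253)  # value mark (assumed character code)


def prices_to_dynamic_array(prices):
    """Join price dictionaries into one dynamic array string.

    Assumed layout: one attribute per stock, with the code, buy price
    and sell price stored as three values within that attribute.
    """
    attributes = []
    for price in prices:
        values = [price["code"], f"{price['buy']:.2f}", f"{price['sell']:.2f}"]
        attributes.append(VM.join(values))
    return AM.join(attributes)


# Invented sample data for illustration.
sample = [
    {"code": "NAB", "buy": 25.10, "sell": 25.12},
    {"code": "WBC", "buy": 27.45, "sell": 27.47},
]
result = prices_to_dynamic_array(sample)
print(result.replace(AM, "^").replace(VM, "]"))
```

The final line swaps the marks for ^ and ] only so the structure is visible when printed.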
Note that the list of stocks to retrieve data for is kept in the UniVerse file STOCKCODES and that is why we do a select on that file near the beginning of the STOCKVALUES program.
Running the STOCKVALUES program
OK, enough explanations – let’s see it run!
Here is the output of running the STOCKVALUES program from TCL:
YAY! We have successfully extended Python to allow us to treat the stock exchange web site as a web service, providing the stock price data we need.
This post has built on the last post to show how we can use the extensible nature of Python to achieve results that would be difficult using core Python or MV Basic. I have tried to keep this as simple as possible, given that there is quite a lot going on here.
Web page scraping may or may not be something you need to do but hopefully I have been able to show that there is a lot of power available when you start extending the Python core with some of the many packages available for Python. That extended Python can then be used to extend MV Basic and hence we are extending the extension!
I hope I have encouraged you to explore Python some more and try some of the extensions for yourself. If you do, please send me an email (firstname.lastname@example.org) and let me know how you get on – I’d love to hear from you!