Python Data Science and the Rocket MultiValue Database ( Part 2 of 3 )

In “Python Data Science and the Rocket MultiValue Database ( Part 1 of 3 )” I introduced NumPy and showed how to convert a NumPy array to a u2py.DynArray.

In this section I will go a bit further and show:

  • How to write the NumPy data to a Rocket MultiValue file
  • How to read back the MultiValue data and instantiate a NumPy array
  • An introduction to Pandas
  • How to move a NumPy array to a Pandas Data Frame

How to write the NumPy data to a Rocket MultiValue file

Before we can begin, we need to create a Rocket MultiValue file and some dictionary items.  For brevity, I will copy existing dictionary items from DICT VOC rather than creating them by hand.  ( In part three of the series, I complete our journey by creating a Python class for persisting your data science objects in the MultiValue database. )

UniData Example:

:CREATE.FILE U2DS 3 11
Create file D_U2DS, modulo/3,blocksize/1024
Hash type = 0
Create file U2DS, modulo/11,blocksize/1024
Hash type = 3
Added "@ID", the default record for UniData to DICT U2DS.
:COPY FROM DICT VOC TO DICT U2DS F1 F2 F3 F4
4 records copied

Note that you will need to modify the new dictionary items to be defined as MultiValued.  ( Change attribute 7 from S to M )

Now that you have a file, let’s go into Python, build a NumPy array, and store it in the MultiValue file.

: PYTHON
python> import u2py
python> import numpy as np
python> import pandas as pd

Start with a NumPy array ( built with simple sample data ):

python> theData = [ [ 101, 102, 103, 104 ], [ 201, 202, 203, 204], [ 301, 302, 303, 304 ], [ 401, 402, 403, 404 ] ]
python> theData
[[101, 102, 103, 104], [201, 202, 203, 204], [301, 302, 303, 304], [401, 402, 403, 404]]
python> myArray = np.array( theData )
python> myArray
array([[101, 102, 103, 104],
       [201, 202, 203, 204],
       [301, 302, 303, 304],
       [401, 402, 403, 404]])

We can transform our 4×4 NumPy array before persisting it to our MultiValue database.  Here we transpose it:

python> np.transpose(myArray)
array([[101, 201, 301, 401],
       [102, 202, 302, 402],
       [103, 203, 303, 403],
       [104, 204, 304, 404]])
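
One detail worth knowing at this step is that np.transpose does not change the array you pass in; it hands back a transposed result. The short sketch below uses the same sample data to show that the original array keeps its layout, and that the .T attribute gives the same result. The snake_case variable names are mine, chosen only for this illustration.

```python
import numpy as np

my_array = np.array([
    [101, 102, 103, 104],
    [201, 202, 203, 204],
    [301, 302, 303, 304],
    [401, 402, 403, 404],
])

transposed = np.transpose(my_array)

# The original array is untouched; only the returned object is transposed.
print(my_array[0].tolist())    # first row of the original
print(transposed[0].tolist())  # first row of the transpose (the old first column)

# For a 2-D array, the .T attribute is shorthand for the same operation.
print(np.array_equal(transposed, my_array.T))
```

If you want to keep working with the transposed data, assign the result to a name, as the next step does.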

For our example, we will put the transposed array data back into a Python nested list:

python> asNestedList = np.transpose(myArray).tolist()
python> asNestedList
[[101, 201, 301, 401], [102, 202, 302, 402], [103, 203, 303, 403], [104, 204, 304, 404]]

Here I’ll write the data to the MultiValue file.

Since the u2py.DynArray can be instantiated from a Python nested list, we can create a dynamic array, and store it in the file we created earlier.

python> rec = u2py.DynArray(asNestedList)
python> rec
<u2py.DynArray value=b'101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402\xfe103\xfd203\xfd303\xfd403\xfe104\xfd204\xfd304\xfd404'>
python> file = u2py.File("U2DS")
python> file.write("mike", rec)
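
For readers unfamiliar with the delimiters in the DynArray output above, the sketch below rebuilds the same byte string in plain Python. It does not use u2py, and to_dynarray_bytes is a hypothetical helper written only for illustration. It assumes what the output suggests: each inner list becomes one attribute (separated by the attribute mark, byte 0xFE) and each element becomes one value within that attribute (separated by the value mark, byte 0xFD).

```python
# Illustrative only: mimics the layout seen in the DynArray output, not u2py itself.
ATTRIBUTE_MARK = b"\xfe"  # separates attributes (fields) in a record
VALUE_MARK = b"\xfd"      # separates values within one attribute


def to_dynarray_bytes(nested_list):
    """Join a two-level nested list into a MultiValue-style byte string."""
    attributes = []
    for inner in nested_list:
        values = [str(item).encode("ascii") for item in inner]
        attributes.append(VALUE_MARK.join(values))
    return ATTRIBUTE_MARK.join(attributes)


as_nested_list = [
    [101, 201, 301, 401],
    [102, 202, 302, 402],
    [103, 203, 303, 403],
    [104, 204, 304, 404],
]

record_bytes = to_dynarray_bytes(as_nested_list)
print(record_bytes)
```

With the transposed data, the first attribute holds 101, 201, 301 and 401, so those are the four values the F1 dictionary item reports for this record.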

Now I’ll verify the data made it to the file.

python> u2py.run("LIST U2DS F1 F2 F3 F4")
LIST U2DS F1 F2 F3 F4 10:48:56 Jul 06 2018 1
U2DS...... F1........ F2............. F3............. F4.............

mike       101        102             103             104
           201        202             203             204
           301        302             303             304
           401        402             403             404
1 record listed

How to read back the MultiValue data and instantiate a NumPy array

The next step in our example is to extract the data from the MultiValue database for use in further data science processing.

python> myDynArray = file.read("mike")
python> myDynArray
<u2py.DynArray value=b'101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402\xfe103\xfd203\xfd303\xfd403\xfe104\xfd204\xfd304\xfd404'>
python> myNestedList = myDynArray.to_list()
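
To make the shape of that nested list concrete, here is a plain-Python approximation of the conversion for this record. It is a sketch, not the u2py implementation: to_nested_list is a hypothetical helper, and it assumes the record splits on the attribute mark (0xFE) and then on the value mark (0xFD), with every element coming back as a string.

```python
# Illustrative only: approximates the nested list produced for this record.
ATTRIBUTE_MARK = b"\xfe"  # separates attributes (fields) in a record
VALUE_MARK = b"\xfd"      # separates values within one attribute


def to_nested_list(record_bytes):
    """Split a MultiValue-style byte string into a nested list of strings."""
    return [
        [value.decode("ascii") for value in attribute.split(VALUE_MARK)]
        for attribute in record_bytes.split(ATTRIBUTE_MARK)
    ]


record_bytes = (
    b"101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402\xfe"
    b"103\xfd203\xfd303\xfd403\xfe104\xfd204\xfd304\xfd404"
)

nested = to_nested_list(record_bytes)
print(nested[0])                    # values of the first attribute
print(type(nested[0][0]).__name__)  # element type
```

The elements are text rather than numbers, which matters when the list is handed to NumPy.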

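As mentioned earlier, you can instantiate a NumPy array from a nested list.  Note that the values come back from the database as strings, so the resulting array has a string dtype ( '<U3' ) rather than an integer one.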

python> npArray = np.array(myNestedList)
python> npArray
array([['101', '201', '301', '401'],
       ['102', '202', '302', '402'],
       ['103', '203', '303', '403'],
       ['104', '204', '304', '404']],
      dtype='<U3')
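
Because the array holds text, arithmetic on it will not behave like arithmetic on the original integers. A minimal sketch of one way to get numbers back, assuming every element is a clean integer string as in this sample, is to cast with astype. The variable names here are mine.

```python
import numpy as np

# Same shape of data as the array above: every element is a string.
nested_strings = [
    ["101", "201", "301", "401"],
    ["102", "202", "302", "402"],
    ["103", "203", "303", "403"],
    ["104", "204", "304", "404"],
]

as_text = np.array(nested_strings)  # NumPy infers a Unicode string dtype
as_int = as_text.astype(int)        # parse each element as an integer

print(as_text.dtype.kind)  # 'U' marks a Unicode string dtype
print(as_int.dtype.kind)   # 'i' marks a signed integer dtype
print(as_int.sum())        # numeric operations now work
```

If the file could contain empty or non-numeric values, the cast would raise an error, so real data may need cleaning first.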

Introduction to Pandas

Pandas is an open source Python library widely used in data science.  It loads data into an easy-to-use, labeled data structure, the Data Frame, which lets you perform operations on large data sets.

Since we started our discussion with NumPy arrays, we will instantiate our Pandas Data Frame from the NumPy array.  Note that while NumPy handles the array of information, Pandas allows you to define the column headers:

python> pdDataFrame = pd.DataFrame(npArray, columns=['f1','f2','f3','f4'])
python> pdDataFrame
    f1   f2   f3   f4
0  101  201  301  401
1  102  202  302  402
2  103  203  303  403
3  104  204  304  404

You now have a Pandas Data Frame to examine.  The underlying data is available as a NumPy array through the Data Frame's values attribute.  It can be used like any other NumPy array, including being converted to a nested Python list.

python> pdDataFrame.values
array([['101', '201', '301', '401'],
       ['102', '202', '302', '402'],
       ['103', '203', '303', '403'],
       ['104', '204', '304', '404']], dtype=object)
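
The values in this Data Frame are still text, as the quoted elements show. As a hedged sketch, assuming every cell is a clean integer string like the sample data, the whole Data Frame can be converted with astype, after which numeric operations such as column totals work. The variable names are mine.

```python
import numpy as np
import pandas as pd

np_array = np.array([
    ["101", "201", "301", "401"],
    ["102", "202", "302", "402"],
    ["103", "203", "303", "403"],
    ["104", "204", "304", "404"],
])
df_text = pd.DataFrame(np_array, columns=["f1", "f2", "f3", "f4"])

# astype returns a new Data Frame; df_text itself is left as text.
df_numeric = df_text.astype(int)

print(df_numeric.dtypes)  # one integer dtype per column
print(df_numeric.sum())   # column totals
```

Converting once, right after loading, keeps later calculations from silently treating the numbers as strings.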

We can also get the values as a nested list:

python> pdDataFrame.values.tolist()
[['101', '201', '301', '401'], ['102', '202', '302', '402'], ['103', '203', '303', '403'], ['104', '204', '304', '404']]

We can also extract the column names in the same way:

python> pdDataFrame.columns.tolist()
['f1', 'f2', 'f3', 'f4']

In “Python Data Science and the Rocket MultiValue Database ( Part 3 of 3 )” I will show some of the things you can do with Pandas, and create a simple object for managing the storage and retrieval of data in a Rocket MultiValue database.

Michael Rajkowski

As a member of the Rocket MultiValue Support organization, Michael has worked with MultiValue for over 25 years in numerous professional roles. He is especially fascinated with the areas of MultiValue that intersect with other technologies. He recently moved to Irvine, where he can not only expand his MultiValue expertise to include the Rocket D3 product family, but also where he, his wife and son can enjoy being closer to Disneyland.
