Python Performance Tuning with Numba

https://numba.pydata.org/

Numba makes Python code fast

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

Python – Performance Tuning with Nvidia

Step 1:
Read this first
https://weeraman.com/put-that-gpu-to-good-use-with-python-e5a437168c01

Step 2: Install the NVIDIA CUDA toolkit on macOS
https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html

>brew tap caskroom/drivers
>brew cask install nvidia-cuda

Then add the following to your ~/.bash_profile (adjust CUDA-9.0 to match the installed version):
export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
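
To check whether the PATH change took effect in a new shell, a small script along these lines may help. It is only a sketch using the standard library: it assumes the toolkit's compiler is named nvcc and that the install directory contains "cuda" in its name, as in the paths above. The function names are made up for illustration.

```python
import os
import shutil


def find_nvcc():
    """Return the full path to the nvcc compiler if it is on PATH, else None."""
    return shutil.which("nvcc")


def cuda_dirs_on_path():
    """List PATH entries that mention CUDA (case-insensitive match)."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return [entry for entry in entries if "cuda" in entry.lower()]


if __name__ == "__main__":
    print("nvcc:", find_nvcc() or "not found on PATH")
    print("CUDA entries on PATH:", cuda_dirs_on_path())
```

If nvcc is reported as not found, the export lines were probably not picked up by the current shell session.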

Step 3: Try the following code

Before (plain NumPy)

import numpy as np
from timeit import default_timer as timer

def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
    
# performance: 2.000642750179395 seconds

After (Numba @vectorize)

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

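# target='parallel' builds a multi-threaded CPU ufunc; Numba also accepts target='cuda' for NVIDIA GPUs.
@vectorize(['float32(float32, float32)'], target='parallel')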
def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
    
# performance: 0.4175510290078819 seconds
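
Each timing above comes from a single run, so it can be skewed by whatever else the machine was doing. One common way to make such a comparison more robust is to repeat the call and keep the fastest time. The helper below is a minimal sketch of that idea; best_of and slow_square_sum are hypothetical names, and the pure-Python workload is only a stand-in for the pow(a, b) call.

```python
from timeit import default_timer as timer


def best_of(func, *args, repeats=3):
    """Call func(*args) `repeats` times; return (fastest seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = timer()
        result = func(*args)
        best = min(best, timer() - start)
    return best, result


def slow_square_sum(n):
    # Stand-in workload; swap in the pow(a, b) call from the scripts above.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    seconds, value = best_of(slow_square_sum, 100000)
    print(f"best of 3: {seconds:.6f} s")
```

Returning the last result as well makes it easy to check that the fast and slow versions compute the same values before comparing their times.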

-----
Reference:
https://github.com/numba/numba/issues/1898
https://stackoverflow.com/questions/38566367/installing-cuda-via-brew-and-dmg

Learn Python

https://www.tutorialspoint.com/python/index.htm

https://edu.openedg.org

Object-Oriented Programming: https://realpython.com/python3-object-oriented-programming/

Full Stack Python: https://www.fullstackpython.com/table-of-contents.html

10 Myths of Enterprise Python
https://www.paypal-engineering.com/2014/12/10/10-myths-of-enterprise-python/

Python Performance: https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/python.html

Python's popularity is rising while Java's is falling
https://medium.freecodecamp.org/best-programming-languages-to-learn-in-2018-ultimate-guide-bfc93e615b35

Protect Python Code

How to protect Python Code?

Option 1: Compile Python into binary extension modules, which are much harder for a client to decompile than .py or .pyc files.
http://cython.org/ – Free, Apache License

Article on Cython: https://medium.com/@xpl/protecting-python-sources-using-cython-dcd940bb188e

Option 2: Minify / Obfuscate
Minify: remove empty lines, comments, etc.
Obfuscate: replace human-readable names with unreadable ones.

List of Obfuscation software:
https://mnfy.readthedocs.io/en/latest/
https://github.com/QQuick/Opy

Article: https://www.smallsurething.com/how-to-obfuscate-python-source-code/
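
As a rough illustration of what minifying means, the sketch below round-trips source code through Python's own ast module (ast.unparse needs Python 3.9 or later). This is not how mnfy or Opy work internally; it is just one simple way to drop comments and collapse blank lines. It keeps docstrings and does not rename anything, so on its own it is not obfuscation. The function name minify is chosen here for illustration.

```python
import ast


def minify(source):
    """Round-trip source through the AST, which discards comments and extra blank lines."""
    return ast.unparse(ast.parse(source))


if __name__ == "__main__":
    sample = "x = 1  # a comment\n\n\ny = x + 1\n"
    print(minify(sample))
```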


Article:
https://stackoverflow.com/questions/261638/how-do-i-protect-python-code

How to decompile “.PYC” Files?
https://stackoverflow.com/questions/5287253/is-it-possible-to-decompile-a-compiled-pyc-file-into-a-py-file
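
The reason .pyc files offer little protection can be seen without any decompiler. A .pyc file stores a compiled code object, and that object still carries the original variable names and literal values. The sketch below compiles a one-line snippet in memory and inspects those fields; the variable name and value are made up for the demonstration.

```python
def inspect_bytecode(source):
    """Compile source and return the names and constants kept in the code object."""
    code = compile(source, "<demo>", "exec")
    return code.co_names, code.co_consts


if __name__ == "__main__":
    # Hypothetical snippet containing a "secret" literal.
    names, consts = inspect_bytecode("secret_key = 'abc123'\n")
    print("names:", names)
    print("constants:", consts)
```

Both the identifier and the string literal are visible in the output, which is why shipping only .pyc files does not hide them.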


Option 3: Add copyright notices and license headers to every file to strengthen your legal position.

Note: Combine the options above according to business needs.

Connecting Hive with Python

In a production environment we need to connect to multiple Hive instances.

Option 1: Use ODBC for Python.
https://github.com/mkleehammer/pyodbc
This is not a proven approach for Hive.

Option 2: Use Pyhs2 driver.
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver
This can connect to only one Hive instance.

Conclusion:
Avoid connecting to Hive directly from Python in a big data production environment.
It is better to decouple the technologies: let Python write data to flat files, then load those files into Hive.
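
The Python half of that hand-off could look roughly like the sketch below, which writes rows to a tab-delimited text file. The function name, file name, sample rows, and Hive table are all hypothetical. It also assumes the target table is declared as tab-delimited text. Hive's plain delimited format does not understand CSV-style quoting, so field values should not contain tabs or newlines.

```python
import csv


def write_flat_file(rows, path, delimiter="\t"):
    """Write an iterable of row tuples to a delimited text file; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data and file name.
    rows = [(1, "alice", 3.5), (2, "bob", 4.0)]
    write_flat_file(rows, "events.tsv")
    # The file could then be loaded on the Hive side, for example with
    #   LOAD DATA LOCAL INPATH 'events.tsv' INTO TABLE events;
    # where `events` is a hypothetical table declared with
    #   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.
```

With this split, the Python job needs no Hive driver at all, and the load step can be scheduled and retried independently.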

Working with Python in Mac

How to install and work with Python in mac?

Step 1: Download and install PyCharm
https://www.jetbrains.com/pycharm/download/#section=mac

Step 2: How to install a missing module?
Select the package name in the import statement, press Alt+Enter, and choose Install package.
To install from the command line instead, click the computer icon in the bottom-left corner of PyCharm and choose Terminal:
>sudo easy_install pip
Search for the package at https://pypi.python.org/pypi
>sudo pip install package_name
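
To find out which modules are missing before running a script, a quick check like the following can be run from the same terminal. It is a sketch using only the standard library, and missing_modules is a hypothetical helper name. Note that the name used with pip is not always the same as the name used in the import statement.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example list; replace with the modules your script imports.
    for name in missing_modules(["numpy", "numba"]):
        print(f"missing: {name}  (try: pip install {name})")
```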