ORM in Python

Psycopg2 driver
https://pynative.com/python-postgresql-select-data-from-table/
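
A minimal sketch of the select pattern the tutorial above covers, written against the generic Python DB-API so it runs with any compliant driver. The helper name fetch_rows, the table, and the connection values in the usage comment are hypothetical.

```python
def fetch_rows(connection, query, params=()):
    """Run a SELECT on a DB-API connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()

# Usage with Psycopg2 (all connection values are placeholders):
#   import psycopg2
#   connection = psycopg2.connect(host="localhost", dbname="mydb",
#                                 user="me", password="secret")
#   rows = fetch_rows(connection,
#                     "SELECT id, name FROM customer WHERE id > %s", (10,))
```

Psycopg2 uses %s as its parameter placeholder; passing values as params instead of formatting them into the query string lets the driver handle quoting.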

SQL Alchemy
https://www.sqlalchemy.org/

PostgreSQL Setup
View at Medium.com

https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
Better to stay away from the SQLAlchemy ORM, especially for bulk inserts like the one in the FAQ entry above.
We are experiencing performance issues with it in our current project.
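
As that FAQ entry describes it, the cost comes from per-object bookkeeping in the ORM, and batching rows at the driver level is the faster path. The sketch below illustrates only the batching idea, using the standard library sqlite3 driver instead of SQLAlchemy or PostgreSQL. The customer table is a made-up example.

```python
import sqlite3

def insert_batched(connection, rows):
    """Insert all rows with a single executemany call in one transaction."""
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT INTO customer (name) VALUES (?)", rows
        )

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"
)
insert_batched(connection, [(f"name {i}",) for i in range(1000)])
row_count = connection.execute(
    "SELECT COUNT(*) FROM customer"
).fetchone()[0]
```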

Python Tutorial
https://www.w3schools.com/python/python_classes.asp

PostgreSQL Vs MySQL
http://www.postgresqltutorial.com/postgresql-vs-mysql/

Python Performance Tuning with Numba

https://numba.pydata.org/

Numba makes Python code fast

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.
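
A small sketch of what that looks like in practice: a plain Python loop decorated with njit. The function name is made up, and the ImportError fallback is an assumption added here so the example still runs (uncompiled) when Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    # Fallback (assumption for this sketch): run as plain Python.
    def njit(func):
        return func

@njit
def sum_of_squares(n):
    """Add up i * i for i in 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

result = sum_of_squares(10)
```

With Numba installed, the first call triggers compilation, so that call is slower than the ones after it.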

Python – Performance Tuning with Nvidia and vectorize

Step 1:
Read this first
https://weeraman.com/put-that-gpu-to-good-use-with-python-e5a437168c01

Step 2: Install NVIDIA CUDA on Mac
https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html

>brew tap caskroom/drivers
>brew cask install nvidia-cuda

Then add the following to your ~/.bash_profile:
export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-9.0/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
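
To confirm the PATH change took effect, a quick check from Python may help. This is a sketch that looks for nvcc (the CUDA compiler shipped in the CUDA bin directory) and lists PATH entries that mention CUDA. Both helper names are hypothetical.

```python
import os
import shutil

def find_nvcc():
    """Return the full path of nvcc if it is on PATH, else None."""
    return shutil.which("nvcc")

def cuda_dirs_on_path(path_value=None):
    """List PATH entries whose name mentions CUDA."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [
        entry for entry in path_value.split(os.pathsep)
        if "cuda" in entry.lower()
    ]

if __name__ == "__main__":
    print("nvcc:", find_nvcc())
    print("CUDA entries on PATH:", cuda_dirs_on_path())
```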

Step 3: Try the following code

Before

import numpy as np
from timeit import default_timer as timer

def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
    
# performance: 2.000642750179395 seconds
    

After

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

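# target='parallel' spreads the work across CPU cores;
# running on the NVIDIA GPU requires target='cuda' instead.
@vectorize(['float32(float32, float32)'], target='parallel')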
def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
    
# performance: 0.4175510290078819 seconds
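
Both timings above come from a single run, which can be noisy. A minimal sketch of a helper that repeats the measurement and keeps the best time is shown here. The name best_of and the default of three repeats are choices made for this example.

```python
from timeit import default_timer as timer

def best_of(func, *args, repeats=3):
    """Call func(*args) several times and return the fastest duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = timer()
        func(*args)
        timings.append(timer() - start)
    return min(timings)

# Usage with the code above: duration = best_of(pow, a, b)
```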

-----
Reference:
https://github.com/numba/numba/issues/1898
https://stackoverflow.com/questions/38566367/installing-cuda-via-brew-and-dmg

Learn Python

https://www.tutorialspoint.com/python/index.htm

https://edu.openedg.org

Object-Oriented Programming: https://realpython.com/python3-object-oriented-programming/

Full Stack Python: https://www.fullstackpython.com/table-of-contents.html

10 Myths of Enterprise Python
https://www.paypal-engineering.com/2014/12/10/10-myths-of-enterprise-python/

Python Performance: https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/python.html

Python popularity is going up and Java popularity is going down
https://medium.freecodecamp.org/best-programming-languages-to-learn-in-2018-ultimate-guide-bfc93e615b35

Protect Python Code

How to protect Python Code?

Option 1: Compile Python into binary extension modules with Cython, which are far harder for a client to decompile than .pyc files.
http://cython.org/ – Free, Apache License

Article on Cython: https://medium.com/@xpl/protecting-python-sources-using-cython-dcd940bb188e

Option 2: Minify / Obfuscate
Minify: remove empty lines, comments, etc.
Obfuscate: rename human-readable identifiers to meaningless ones.

List of Obfuscation software:
https://mnfy.readthedocs.io/en/latest/
https://github.com/QQuick/Opy

Article: https://www.smallsurething.com/how-to-obfuscate-python-source-code/
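
To make the two terms in Option 2 concrete, here is a toy sketch built on the standard library ast module (ast.unparse needs Python 3.9 or later). It is not how the tools listed above work internally. The sample function and the name mapping are made up.

```python
import ast

class RenameIdentifiers(ast.NodeTransformer):
    """Rename variables and arguments using a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

def minify(source):
    """Drop comments and blank lines by round-tripping through the AST."""
    return ast.unparse(ast.parse(source))

def obfuscate(source, mapping):
    """Minify and also rename the identifiers listed in mapping."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)

SAMPLE = '''
# compute the total price
def total_price(unit_price, quantity):

    subtotal = unit_price * quantity  # before tax
    return subtotal
'''

minified = minify(SAMPLE)
obfuscated = obfuscate(
    SAMPLE, {"unit_price": "a", "quantity": "b", "subtotal": "c"}
)
```

Renaming only local names keeps the public function name intact, so callers of total_price keep working.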


Article:
https://stackoverflow.com/questions/261638/how-do-i-protect-python-code

How to decompile “.PYC” Files?
https://stackoverflow.com/questions/5287253/is-it-possible-to-decompile-a-compiled-pyc-file-into-a-py-file
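
A short sketch of why decompiling is feasible: compiled bytecode keeps function and variable names. The example compiles a made-up function in memory (instead of reading a real .pyc file) and inspects it with the standard library dis module.

```python
import dis
import io

SOURCE = (
    "def secret_formula(price, rate):\n"
    "    margin = price * rate\n"
    "    return margin\n"
)

module_code = compile(SOURCE, "<demo>", "exec")

# The function body is stored as a nested code object
# among the module's constants.
func_code = next(
    const for const in module_code.co_consts
    if hasattr(const, "co_varnames")
)

# Argument and local variable names survive compilation.
recovered_names = func_code.co_varnames

# Human-readable listing of the bytecode instructions.
buffer = io.StringIO()
dis.dis(func_code, file=buffer)
listing = buffer.getvalue()
```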


Option 3: Add copyright and license headers to every file to get a legal edge.
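
Adding headers by hand is easy to forget, so a small script can do it. This is a sketch only: the helper name add_header and the header text are placeholders, and the real wording should come from your legal team.

```python
HEADER = "# Copyright (c) <year> <owner>. All rights reserved.\n"

def add_header(source, header=HEADER):
    """Return source with header prepended, unless it is already present."""
    if header in source:
        return source
    lines = source.splitlines(keepends=True)
    # Keep a shebang line first so the script stays executable.
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + header + "".join(lines[1:])
    return header + source
```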

Note: Combine the options above according to business need.

Connecting Hive with Python

In a production environment we need to connect to multiple Hive instances.

Option 1: Use ODBC for Python.
https://github.com/mkleehammer/pyodbc
This is not a proven option for connecting to Hive.

Option 2: Use Pyhs2 driver.
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver
This connects to only one instance of Hive.

Conclusion:
Stay away from connecting to Hive directly from Python in a big data production environment.
It is better to decouple the technologies: let Python write data to flat files, then load those files into Hive.
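
A minimal sketch of that decoupled approach: Python writes a tab-delimited file, and Hive loads it in a separate step. The function names, the file path, and the table name are hypothetical. It assumes the Hive table was created with FIELDS TERMINATED BY '\t' and that values contain no tabs, newlines, or double quotes.

```python
import csv

def write_flat_file(path, rows):
    """Write rows as tab-delimited text, one record per line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)

def load_statement(path, table):
    """Build the HiveQL statement that loads the file into a table."""
    return f"LOAD DATA LOCAL INPATH '{path}' INTO TABLE {table}"

# Usage (path and table are placeholders):
#   write_flat_file("/tmp/customers.tsv", [(1, "alice"), (2, "bob")])
#   print(load_statement("/tmp/customers.tsv", "customers"))
```

The generated statement can then be run by whichever Hive client the cluster team already trusts, so Python never holds a Hive connection.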

#bigdata, #python

Working with Python in Mac

How to install and work with Python on a Mac?

Step 1: Install PyCharm
https://www.jetbrains.com/pycharm/download/#section=mac

Step 2: How to install a missing module?
Either select the package name in the import statement, press Alt+Enter, and choose "Install package",
or click the computer icon at the bottom left of PyCharm, choose Terminal, and install from the command line:
>sudo easy_install pip
Search for the package here https://pypi.python.org/pypi
>sudo pip install package_name
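
Before installing, it can help to check from Python whether the module is really missing. A small sketch follows. The helper names are made up, and the check is meant for top-level module names only.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None

def pip_command(package_name):
    """Build a pip command tied to the interpreter that is running now."""
    return [sys.executable, "-m", "pip", "install", package_name]

if __name__ == "__main__":
    name = "requests"  # example package name
    if not is_installed(name):
        print("Missing. Install with:", " ".join(pip_command(name)))
```

Running pip through the current interpreter installs the package for the same Python that PyCharm is using, which avoids mixing up several Python installations.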

#python