Connecting Hive with Python

In a production environment, we need to connect to multiple Hive instances.

Option 1: Use ODBC for Python (pyodbc).
https://github.com/mkleehammer/pyodbc
This is not a proven option for production use with Hive.

Option 2: Use the pyhs2 driver.
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver
This connects to only one Hive instance, so it does not cover the multi-instance requirement.

Conclusion:
Avoid connecting to Hive directly from Python in a big data production environment.
It is better to decouple the technologies: let Python write the data to flat files, then load those files into Hive.
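
A minimal sketch of this decoupled approach, under a few assumptions: the target Hive table already exists and is stored as tab-delimited text, the file is loaded by a separate Hive job (for example through the Hive CLI or Beeline) rather than from Python, and the names write_flat_file, build_load_statement and users are hypothetical, chosen only for illustration.

```python
import csv
import os
import tempfile


def write_flat_file(rows, path, delimiter="\t"):
    """Write an iterable of row tuples to a delimited text file.

    Note: the csv module quotes fields that contain the delimiter or quotes.
    Hive's default text format does not interpret such quoting, so this
    sketch assumes the values are free of tabs, quotes and newlines.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerows(rows)
    return path


def build_load_statement(path, table):
    """Return a HiveQL LOAD DATA statement for a file on the local filesystem.

    The statement is only built here; running it is left to a Hive client
    outside Python, which keeps the two technologies decoupled.
    """
    return "LOAD DATA LOCAL INPATH '{}' INTO TABLE {}".format(path, table)


if __name__ == "__main__":
    # Hypothetical sample data and table name.
    rows = [(1, "alice"), (2, "bob")]
    out_path = os.path.join(tempfile.mkdtemp(), "users.tsv")
    write_flat_file(rows, out_path)
    print(build_load_statement(out_path, "users"))
```

Because Python only produces files, the same output can be loaded into any number of Hive instances by running the generated statement against each one.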
