Access DB2 From Databricks

This took me a good few hours to figure out, so hopefully it will help you and my future self.

  • Install com.ibm.db2.jcc:db2jcc:db2jcc4 on your cluster from Maven.
  • Get your license file directory (this is a whole process in itself).
  • From your license files, copy the jar file (mine is named like db2jcc*.jar) up to Databricks using databricks-cli.
    • I copied the files to a tmp dir and then moved them to /dbfs/FileStore/jars/maven/com/ibm/db2/jcc/license from a notebook, but that might not be necessary.
    • You might also have to copy the .lic files into the same dir, but, again, I haven't validated that.
  • Install that jar on your cluster as a library.
  • Restart your cluster.
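
The "copy to a tmp dir, then move from a notebook" step above could be sketched roughly like this. It is only a sketch: `stage_license_files` is a hypothetical helper (not part of Databricks), the source directory in the usage comment is made up, and it assumes the `/dbfs` mount is readable and writable from the notebook. It copies rather than moves, so the originals stay in place.

```python
import shutil
from pathlib import Path


def stage_license_files(src_dir, dest_dir, patterns=("db2jcc*.jar", "*.lic")):
    """Copy files matching `patterns` from src_dir into dest_dir.

    Hypothetical helper: returns the names of the copied files,
    grouped by pattern and sorted within each pattern.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    # Create the target directory (and parents) if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for pattern in patterns:
        for path in sorted(src.glob(pattern)):
            # copyfile copies contents only, with no metadata calls.
            shutil.copyfile(str(path), str(dest / path.name))
            copied.append(path.name)
    return copied


# Possible usage from a notebook (the source path is an assumption):
# stage_license_files(
#     "/dbfs/tmp/db2_license",
#     "/dbfs/FileStore/jars/maven/com/ibm/db2/jcc/license",
# )
```

Whether the .lic files are actually needed next to the jar is still unverified; drop `"*.lic"` from `patterns` if you only want the jar.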

Then you can run this (Python) code:

# host, port, default_schema, database, username, and password are assumed to be defined already
connection_string = 'jdbc:db2://{host}:{port}/{database}:currentSchema={schema};database={database};user={username};password={password};'.format(
    host=host,
    port=port,
    schema=default_schema,
    database=database,
    username=username,
    password=password
)

# spark.read returns a DataFrame, not an RDD
df = (
    spark.read.format("jdbc")
    .option('url', connection_string)
    .option('driver', 'com.ibm.db2.jcc.DB2Driver')
    .option('dbtable', 'my_table')
    .load()
)

display(df)
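
One possible variation, which I have not validated against DB2, is to keep the credentials out of the URL and pass them through Spark's JDBC `user` and `password` options instead. In the sketch below, `build_db2_url` and `read_db2_table` are hypothetical helper names, and the secret scope and key in the usage comment are made up. It assumes the driver accepts the same URL with the `user` and `password` properties removed.

```python
def build_db2_url(host, port, database, schema):
    """Build the DB2 JDBC URL from the post, minus the credentials.

    Hypothetical helper; keeps the currentSchema and database
    properties exactly as in the original connection string.
    """
    return (
        "jdbc:db2://{host}:{port}/{database}"
        ":currentSchema={schema};database={database};"
    ).format(host=host, port=port, database=database, schema=schema)


def read_db2_table(spark, url, table, username, password):
    """Read one table into a DataFrame, passing credentials as options."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .option("dbtable", table)
        .option("user", username)      # Spark JDBC option
        .option("password", password)  # Spark JDBC option
        .load()
    )


# Possible usage in a Databricks notebook (scope and key are assumptions):
# password = dbutils.secrets.get(scope="my-scope", key="db2-password")
# url = build_db2_url(host, port, database, default_schema)
# df = read_db2_table(spark, url, "my_table", username, password)
# display(df)
```

Keeping the password out of the URL means it is less likely to end up in logs or in a notebook cell's printed output.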

Hurrah!