The other day I needed to do some data analysis from a Microsoft Access file on my FreeBSD box. Here are the steps that I took.

Goal: load data from Access into Python to perform data analysis

Library/program used:

UnixODBC

UnixODBC is ODBC an implementation on Linux/Unix. Installing it is quite easy, we can simply download the source code, run ./configure, make and make install.

MDBTools

To actually connect to a database using ODBC, we’ll also need a driver for that database. The ODBC driver for Access provided by Microsoft doesn’t have a Linux installation, so we’ll need to find a third-party driver.

There are paid libraries, such as the one from Easysoft (closed source but may have better quality). Luckily, I found one open-source driver from MDBTools. MDBTools is a collection of CLI tools to interact with an Access database, and it also contains an ODBC driver. We’ll install this driver.

The source code is at https://github.com/mdbtools/mdbtools

Installation of MDBTools is quite easy, we just need to add the flag --with-unixodbc and set the path to our unixODBC installation. If we installed unixODBC in a non-standard path(e.g.,$HOME), we also need to set CPPFLAGS and LDFLAGS so that MDBTools knows where to find the library and header files.

Config unixODBC

Now we’ll need to config unixODBC so that it knows about our Access Driver, there are two files that we need:

  • odbcinst.ini (in etc folder): contains the configuration of ODBC drivers
  • odbc.ini (in etc folder): contains the configuration of Database

We’ll add Access driver to odbcinst.ini as follows:

[MDBTools]
Driver = <path to mdbtools installation>/lib/odbc/libmdbodbc.so

odbc.ini is where we configure our database. For example, if I have an Access file named test.accdb, I will configure it as follows:

[test]
Driver = MDBTools
Database = <path to our database file>/test.accdb

After this, we can use unixODBC to connect to access, for example using its isql command:

isql test
> select * from test_table

Connect from Python

I also wanted to load Access data into Python so I used pyodbc to do that. It’s a wrapper for unixODBC so we’ll need unixODBC installed before we can actually run pyodbc.

To check if pyodbc can actually load unixODBC, we need to import it:

# pip install pyodbc
import pyodbc

If there is an error such as ImportError: Shared object "libodbc.so.2" not found, required by "pyodbc.cpython-310.so", we need to check our unixODBC installation. If we installed it in a non-standard folder, we can set the LD_LIBRARY_PATH environment variable to our unixODBC’s lib folder.

After properly import pyodbc, we can read our Access database as follows:

import pyodbc
file_name = "<path to access file>"
con = pyodbc.connect(f'DRIVER=MDBTools;DBQ={file_name}')
cursor = con.cursor()
cursor.execute('SELECT * FROM test_table').fetchone()

# we can even load our Access data into a Pandas dataframe
import pandas as pd
df = pd.read_sql('SELECT * FROM test_table', con)

Conclusion

Although I can connect and load data from Access, sometimes I get some weird errors from pyodbc (or MDBTools). My suggestion is to save the Access data to other natively supported database systems on Linux, such as SQLite, as soon as we can load it. It’s easy to do that from pandas.

import pyodbc
import sqlite3
import pandas as pd

access_con = pyodbc.connect('DRIVER=MDBTools;DBQ=test.accdb')
sqlite_con = sqlite3.connect('test.sqlite3')

df = access_con.read_sql('SELECT * FROM test_table', access_con)
df.to_sql('test_table', sqlite_con)