At Findmypast we are currently engaged in a major rearchitecture process, moving from an architecture of a small number of tightly coupled C# servers to a network of Elixir microservices. One of the big gains we hope to achieve from this is to make our database architecture easier to modify and scale by splitting out our huge and unwieldy Microsoft SQL Server databases into smaller units. Each unit will be specialised for a specific task and accessed solely by one microservice.
To migrate live databases with zero downtime our plan is to design our services to write to both the legacy and new databases at once but continue to read from legacy. Then, once we are satisfied that data and responses will be consistent from the new database, reads will be switched to happen from it and eventually the dual write switched off and the deprecated legacy tables archived. This means our Elixir servers need to be able to communicate with both Microsoft SQL Server and the open source database technologies (primarily PostgreSQL) we will be migrating to using roughly the same interface. Ecto is perfect for this but there is a major stumbling block: there is no Ecto 2.0 ready adapter for SQL Server.
So we are writing our own database connection layer and Ecto adapter.
We could have tried patching the existing TDS adapter
but we were wary of delving into a native Elixir TDS implementation that has
been stale for a year. We wanted to use Microsoft’s own ODBC driver for Linux
to benefit from first-party support and not have to maintain a binary protocol
implementation. So we decided to base our adapter around OTP’s existing :odbc
application (this application turned out to be lacking in some glaring areas but
that’s a story for another time).
We hope this series of blog posts will both document our process and provide an informal resource for other people wishing to implement new Ecto adapters.
Dependencies
The Erlang ODBC application is usually not installed by default, so we had to
go through our respective package managers to find erlang-odbc
or similar.
This package usually depends unixodbc
, the ODBC driver manager for Linux, but
if it didn’t we’d have to install that as well.
Finally, we needed the latest SQL Server driver for Linux, which we got from Microsoft’s repositories. Note that the driver isn’t open source. We don’t have a problem with this but if you’re a purist and still want to follow along the FreeTDS ODBC Driver is a serviceable open source alternative (although I cannot guarantee all the example code will work with no modifications).
Establishing a DBConnection
DBConnection
is a
library that implements much of the logic needed for handling database
connection pooling. The Ecto SQL
adapter base which we will be using to
build out SQL Server adapter expects the connection protocol to implement
the DBConnection
behaviour, so that was our starting point.
Starting from the start, the first callback we need to implement is connect/1
.
This function should establish a connection to the database and return a
reference to it in the state
object so it can be used throughout the
lifetime of the connection. It has the following signature:
@spec connect(opts :: Keyword.t) ::
{:ok, state :: any} |
{:error, Exception.t}
To connect to a database using the :odbc
application, the relevant function
is :odbc.connect/2
. It accepts a connection string (a charlist in Elixir terms,
of course) that will be parsed by the ODBC driver manager. We can look at the
SQL Server documentation for the format of this string, and find:
Driver={SQL Server};
Server=myServerAddress;
Database=myDataBase;
Uid=myUsername;
Pwd=myPassword;
All of these should be configurable, but for ease of use we provide some defaults:
[
{"DRIVER", opts[:odbc_driver] || "{ODBC Driver 13 for SQL Server}"},
{"SERVER", opts[:hostname] || System.get_env("MSSQL_HST") || "localhost"},
{"DATABASE", opts[:database] || System.get_env("MSSQL_DB")},
{"UID", opts[:username] || System.get_env("MSSQL_UID")},
{"PWD", opts[:password] || System.get_env("MSSQL_PWD")}
]
|> Enum.map_join(fn {key, value} ->
"#{key}=#{value};" end)
|> to_charlist
Having the connection parameters default to environment variables allows the same application to target different databases depending on how it’s deployed. This is very useful for things like integration v production environments.
:odbc.connect/2
also accepts a keyword list of options for the Erlang :odbc
application itself. Of interest to us are:
:tuple_row
: Defaults to:on
and rows are returned as tuples. This is very awkward (and not what Ecto expects). We’ll turn it:off
to get our rows as lists. This will enable us toEnum.map/2
over them, amongst other things.:extended_errors
: Turning this:on
returns more information on errors, which we can use to catch and handle specific exceptions.:binary_strings
: You’d think turning this:on
and communicating with the database in binaries instead of charlists would make things easier. However, it means you have to manage character encodings instead of:odbc
doing it for you, so we’re leaving it:off
for now.
So that leaves our function as:
def connect(opts) do
conn_opts = opts
|> Keyword.put_new(:tuple_row, :off)
|> Keyword.put_new(:extended_errors, :on)
conn_str = [
{"DRIVER", opts[:odbc_driver] || "{ODBC Driver 13 for SQL Server}"},
{"SERVER", opts[:hostname] || System.get_env("MSSQL_HST") || "localhost"},
{"DATABASE", opts[:database] || System.get_env("MSSQL_DB")},
{"UID", opts[:username] || System.get_env("MSSQL_UID")},
{"PWD", opts[:password] || System.get_env("MSSQL_PWD")}
]
|> Enum.map_join(fn {key, value} ->
"#{key}=#{value};" end)
|> to_charlist
:odbc.connect(conn_str, conn_opts)
end
It works! We can call DBConnection.start_link
on our module and get a ref to
the process. There’s a bit of a problem, however. A line in the :odbc.connect/2
documentation should worry us:
The connection is associated with the process that created it and can only be accessed through it
You might think that’s OK. Surely the DBConnection
process will own
communication with :odbc
, just as intended. The DBConnection
docs explain
why this assumption is incorrect:
DBConnection
handles callbacks differently to most behaviours. Some callbacks will be called in the calling process, with the state copied to and from the calling process.
What this means is we cannot rely on the logic inside DBConnection
callbacks
being executed in the same process that connect/1
was executed in. Our :odbc
functions will return {:error, :process_not_owner_of_odbc_connection}
. Sad!
In the next post we’ll cover wrapping :odbc
in a GenServer
to keep the calls
in a single process while still having the bulk of the logic in DBConnection
callbacks. See you soon!