As a MongoDB Partner, we were excited when, late last year, MongoDB released an alpha of their own driver for Google's Go programming language. How easy is it to migrate an existing, sizeable database project? How does it compare to the de facto standard, mgo? We take an in-depth look by walking you through one of our recent migration projects: Authinity, our identity management service.
What is a database driver?
A database driver is an adapter, usually in the form of a software library, that provides an interface for connecting to a database. It typically manages the underlying connection pool and handles all communication with the database server in the required wire protocol.
What is Authinity?
Authinity is a modern identity management service built by Avco Systems. We decided to build Authinity when we discovered that no existing off-the-shelf product or service seemed to satisfy our clients' needs for always up-to-date, best-practice security coupled with policies flexible enough to meet their own compliance requirements. Authinity is used for authentication in our own internal suite and is currently available for anyone to try for free. For more information on Authinity, please visit https://www.authinity.com/.
mgo — if it ain’t broke, don’t fix it?
mgo is an open source library for working with MongoDB databases in Go that has been around since 2011. Prior to mongo-go, mgo was essentially the only choice for MongoDB in Go, but it does not disappoint. For a community-maintained project, it does a very good job of providing the reliability and functionality needed by both small and large projects that use MongoDB with Go. mgo has good godocs and community support, and is widely adopted. It is very much suitable for production use in commercial environments and has worked flawlessly for us in Authinity to date.
So why the change?
The original developer of mgo formally paused work on it in early 2018, and since then that repository has been dormant. A couple of forks have appeared since, the most popular of which is globalsign/mgo. This is still maintained with bug fixes and minor changes, but no significant additional development has taken place since it was forked. Furthermore, MongoDB produced a specification for the features of MongoDB drivers in 2015, which mgo does not meet, making it somewhat inconsistent with MongoDB's official drivers for other languages. Inevitably, the development of any unofficial, community-driven library will be slow to react to new server functionality introduced by MongoDB, whereas an official one is developed in tandem with new server functionality. In addition, you can have a good deal of confidence in the stability and integrity of the official driver from MongoDB: it is written and maintained by the people who know the technology best, and it joins the ever-growing list of drivers they develop and maintain. You can read more about MongoDB's decision to build their own official driver in their blog post on the subject.
As more time passes, we expect a shift to the official mongo-go-driver that is likely to leave mgo even more dormant and unmaintained — perhaps stuck in the past.
Migrating your application to the MongoDB Go driver
You shouldn't be worried if your project is still using mgo at the moment; there will be many projects in this position. However, it is worth considering and planning your transition to the official driver, including determining the effort that will be required to do so. We decided to switch over sooner rather than later for some important reasons:
- for both new and existing systems, we want to ensure we are not using any library that may be approaching end-of-life; this helps us avoid any potential maintainability and/or security issues in the future
- as a MongoDB Partner, adopting early and feeding back to MongoDB will help the discovery and maturity of the driver
- we love being at the cutting edge of technology whenever possible
- we have a newly released product (Authinity) with a small but friendly user base, which provides the perfect environment for trying something new
Here are some of the challenges and considerations we faced moving our Authinity (cloud self-service identity management) application from mgo to the official MongoDB Go driver.
Code structure
As digital transformation specialists, we are well aware of the challenges that arise when code is reliant on a specific technology. It is best practice to avoid tight coupling to any dependency, including your chosen database technology, and we took that approach from the start with Authinity. We knew we wanted to use MongoDB initially, but we structured the code so that switching at a later date would not cause major problems or far-reaching changes throughout the codebase.

If you haven't taken steps to keep your project database-agnostic, you are likely to have to make changes throughout all parts of your codebase for this migration, which instantly increases the magnitude of the task. It is worth adopting an approach going forward that will ease any future transitions in this area. Avoiding tight coupling to a specific data technology is almost always achieved with some form of data access layer (DAL). The approach taken in Authinity for a DAL is a set of well-defined interfaces that encompass any database-specific code. For example, our organisation store interface is:
// Store defines store behaviour for an organisation
type Store interface {
	storage.StoreBase
	Get(id id.ID) (*Organisation, error)
	GetByName(name string) (*Organisation, error)
	UpdateByID(id id.ID, update *Organisation) (*Organisation, error)
	Exists(name string) (bool, error)
	Insert(o *Organisation) error
}
Keeping the database-specific implementations behind these interfaces means that switching to an entirely different technology only requires a new implementation of each interface.
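To make that concrete, a MongoDB-backed implementation of one of these methods might look roughly like the sketch below; the organisationStore type and its field are hypothetical names, while the mgo calls (here via the globalsign fork's import paths) are the real API:

import (
	"github.com/globalsign/mgo"
	"github.com/globalsign/mgo/bson"
)

// organisationStore is a hypothetical mgo-backed implementation of Store.
type organisationStore struct {
	col *mgo.Collection
}

// GetByName fetches a single organisation by its unique name.
func (s *organisationStore) GetByName(name string) (*Organisation, error) {
	var org Organisation
	if err := s.col.Find(bson.M{"name": name}).One(&org); err != nil {
		return nil, err
	}
	return &org, nil
}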
With this code structure, you would expect switching to a different library for the same database technology to be even easier than switching technologies. When migrating to mongo-go, rather than writing new implementations of the interfaces, we initially preferred to try to make the existing MongoDB implementations of our DAL interfaces work with mongo-go directly. After all, a lot of the concepts, and therefore the code, should be the same or very similar, and it should all translate well. However, within our Mongo-specific layer, another feature of our code structure made this somewhat tricky.
In order to achieve good quality unit tests that test a single unit and do not leak into third-party dependencies, we had previously wrapped all of the mgo structs that we used, including mgo.Database, mgo.Collection, mgo.Query and mgo.Session. These wrappers can be mocked in their entirety and used very easily in unit tests. A limitation of doing this is that the set of wrappers becomes tightly coupled to mgo: when switching from mgo to mongo-go, any wrappers you have around these structs need to be modified or perhaps even removed. Fortunately, mongo.Database and mongo.Collection are very similar to the old mgo.Database and mgo.Collection, so modifications there were, for the most part, straightforward. However, mgo.Query is a rather different story, and it is somewhat tricky to keep this concept while using the new mongo-go-driver.
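For illustration, the kind of wrapper we mean looks roughly like this; it is a hypothetical sketch rather than our exact code (the mongo-go version of QueryWrapper appears later):

import "github.com/globalsign/mgo"

// CollectionWrapper wraps *mgo.Collection so that it can be mocked in unit tests.
type CollectionWrapper struct {
	c *mgo.Collection
}

// Find delegates to mgo and wraps the returned query.
func (w *CollectionWrapper) Find(query interface{}) *QueryWrapper {
	return &QueryWrapper{q: w.c.Find(query)}
}

// QueryWrapper wraps *mgo.Query behind methods that a mock can also implement.
type QueryWrapper struct {
	q *mgo.Query
}

// One delegates to mgo.Query.One.
func (w *QueryWrapper) One(result interface{}) error {
	return w.q.One(result)
}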
Query
The mgo.Query struct was the return type of your "get" queries (i.e. Find) in mgo, and allowed you to go on to do things such as query.One(result), query.All(results) and query.Sort(field). This concept doesn't currently exist in mongo-go; you are instead returned a mongo.Cursor. With a cursor, you can of course "read next", and then you must decode each BSON document into a struct as necessary. This all sounds fine and is probably how you would expect it to work. While retrieving a cursor to iterate was possible in mgo, using the mgo.Query struct was more convenient. For example, within mgo's query.All function, a lot of the reading and decoding is hidden away. Unfortunately, with mongo-go you are required to write your own reflection to sensibly decode multiple BSON documents into a slice of your custom, unspecified type (see below). This is fiddly to write, and it was nice that this slightly gruesome code was hidden away in mgo.
import (
	"context"
	"reflect"

	"github.com/mongodb/mongo-go-driver/mongo"
)

// QueryWrapper wraps a mongo cursor behind our old query interface
type QueryWrapper struct {
	q mongo.Cursor
	e error
}

// All reads every remaining document from the cursor into result,
// which must be a pointer to a slice
func (w *QueryWrapper) All(result interface{}) error {
	if w.e != nil {
		return w.e
	}

	defer w.q.Close(context.Background())

	resultv := reflect.ValueOf(result)
	slicev := resultv.Elem()
	if slicev.Kind() == reflect.Interface {
		slicev = slicev.Elem()
	}
	// reuse the existing backing array, discarding any existing elements
	slicev = slicev.Slice(0, 0)
	elemt := slicev.Type().Elem()

	for w.q.Next(context.Background()) {
		elemp := reflect.New(elemt)
		if err := w.q.Decode(elemp.Interface()); err != nil {
			return err
		}
		slicev = reflect.Append(slicev, elemp.Elem())
	}
	resultv.Elem().Set(slicev)

	return nil
}
As I have said, you always get a cursor to iterate in mongo-go, and that results in further significant differences, including how Sort, Limit and other functionality of this kind are used. These must be specified as options to the Find function in mongo-go rather than chained onto the mgo.Query; again, this seems reasonable, but it means refactoring wherever they are used.
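For example, a sorted and limited find with the official driver looks roughly like the sketch below. Note that it uses the driver's current import paths and options API; the alpha-era equivalents differed slightly:

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// findNewest returns a cursor over the ten most recently created documents.
func findNewest(coll *mongo.Collection) (*mongo.Cursor, error) {
	opts := options.Find().
		SetSort(bson.D{{Key: "created", Value: -1}}). // was query.Sort("-created") in mgo
		SetLimit(10)                                  // was query.Limit(10)
	return coll.Find(context.Background(), bson.D{}, opts)
}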
These kinds of changes make translating the previously discussed wrappers to the new mongo-go-driver somewhat tricky. At this stage, it is clear that trying to do so is not the best approach. The wrappers need to be removed and replaced with wrappers more akin to the structure of the mongo-go-driver, and, as a result, anything using them must be refactored.
IDs
A big part of a codebase not being tied down to one database technology is having an ID type or abstraction that allows you to change the underlying type of your IDs without making changes to all of your entities. The idiomatic ID type in MongoDB is ObjectId, so when using MongoDB it is usually preferable to use it. To avoid ObjectId (a MongoDB-specific concept) leaking into our entities and functions, we created our own type in Authinity, id.ID. This type is used whenever IDs are passed between functions or defined on entities. The underlying type of id.ID is string, because the vast majority of ID types can be represented sensibly as a string. Methods that marshal id.ID to and unmarshal it from an ObjectId allow ObjectId to remain the ID type in the database. With mgo, this was done via the GetBSON and SetBSON functions, which can be defined on any type to convert between that type and BSON. For our ID with an underlying type of string, this was straightforward:
 1  // ID is an identifier with an underlying string type so as to ensure immutability.
 2  type ID string
 3
 4  // New creates a new ID
 5  func New() ID {
 6  	return ID(bson.NewObjectId())
 7  }
 8
 9  // GetBSON implements bson.Getter.
10  func (id ID) GetBSON() (interface{}, error) {
11  	if id.IsEmpty() {
12  		return "", nil
13  	}
14  	return bson.ObjectId(id), nil
15  }
16
17  // SetBSON implements bson.Setter.
18  func (id *ID) SetBSON(raw bson.Raw) error {
19  	var objID bson.ObjectId
20  	if err := raw.Unmarshal(&objID); err != nil {
21  		return err
22  	}
23
24  	*id = ID(objID)
25
26  	return nil
27  }
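An entity can then declare its ID field as an id.ID, and mgo will invoke these hooks automatically when marshalling and unmarshalling; here is a hypothetical entity for illustration:

// Organisation is a hypothetical entity: mgo stores its ID as an ObjectId
// via GetBSON/SetBSON, while the Go code only ever sees an id.ID.
type Organisation struct {
	ID   id.ID  `bson:"_id"`
	Name string `bson:"name"`
}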
There are two options for doing the same thing in mongo-go:
- Firstly, you can define a codec for a type, specifying how the driver handles that type when encoding and decoding. The codec is then passed into the client options as part of the codec registry, alongside the existing codecs for all primitive and common types.
- Alternatively, you can take an approach very similar to mgo, using the UnmarshalBSONValue and MarshalBSONValue functions, which work in much the same way as GetBSON and SetBSON. There was no need for us to use codecs at this stage; providing implementations of these marshal/unmarshal methods on our ID type was fairly straightforward:
28  // ID is an identifier with an underlying string type so as to ensure immutability.
29  type ID string
30
31  // New creates a new ID
32  func New() ID {
33  	return ID(objectid.New().Hex())
34  }
35
36  // UnmarshalBSONValue implements bson.ValueUnmarshaler
37  func (id *ID) UnmarshalBSONValue(t bsontype.Type, raw []byte) error {
38  	if t == bsontype.ObjectID && len(raw) == 12 {
39  		var objID objectid.ObjectID
40  		copy(objID[:], raw)
41  		*id = ID(objID.Hex())
42  		return nil
43  	} else if t == bsontype.String {
44  		if str, _, ok := bsoncore.ReadString(raw); ok && str == "" {
45  			*id = ID("")
46  			return nil
47  		}
48  	}
49
50  	return fmt.Errorf("unable to unmarshal bson id: type %v, length %v", t, len(raw))
51  }
52
53  // MarshalBSONValue implements bson.ValueMarshaler
54  func (id ID) MarshalBSONValue() (bsontype.Type, []byte, error) {
55  	var objID objectid.ObjectID
56
57  	if id.IsEmpty() {
58  		objID = objectid.NilObjectID
59  	} else {
60  		var err error
61  		objID, err = objectid.FromHex(id.String())
62  		if err != nil {
63  			return bsontype.ObjectID, nil, err
64  		}
65  	}
66
67  	// the raw bytes of a BSON ObjectID value are simply its 12 bytes
68  	return bsontype.ObjectID, objID[:], nil
69
70  }
However, when it came to testing this we ran into an issue. When unmarshalling, you need to set the function receiver value (the ID) once you have converted from BSON (see line 41 above), and therefore you always require a pointer receiver. This is no different from mgo (see line 24 above). However, when the field you are unmarshalling into is not a pointer (i.e. id.ID rather than *id.ID), mongo-go fails to recognise that the id.ID type implements bson.ValueUnmarshaler (i.e. the UnmarshalBSONValue function). If you change UnmarshalBSONValue to use a value receiver instead of a pointer receiver, mongo-go happily recognises that the type implements ValueUnmarshaler, but this is not viable because you can never set the receiver value without a pointer receiver. Given that all of our entities use id.ID as their ID type (i.e. not a pointer), this was somewhat problematic. The only workaround at this point is to use pointers to ID (i.e. *id.ID) throughout the codebase, which of course means changes to all entities, a vast number of function signatures and some code within functions. This is far from ideal, and we hope this will be changed in a later version of the driver to ease our migration and save us from being forced into pointers to ID throughout.
In the meantime, it would be worth investigating whether the same issue is present if we take the alternative route of defining a codec for our ID type and passing it into the mongo-go client's codec registry.
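For reference, a codec-based version might look something like the sketch below. It uses the current driver's import paths and registry API, which did not exist in this form in the alpha, so treat it as an assumption-laden illustration rather than tested migration code:

import (
	"reflect"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/bsoncodec"
	"go.mongodb.org/mongo-driver/bson/bsonrw"
	"go.mongodb.org/mongo-driver/bson/primitive"
)

// newIDRegistry builds a codec registry that stores our string-backed ID
// as a BSON ObjectId.
func newIDRegistry() *bsoncodec.Registry {
	idType := reflect.TypeOf(ID(""))

	encode := bsoncodec.ValueEncoderFunc(func(ec bsoncodec.EncodeContext, vw bsonrw.ValueWriter, val reflect.Value) error {
		objID, err := primitive.ObjectIDFromHex(val.String())
		if err != nil {
			return err
		}
		return vw.WriteObjectID(objID)
	})

	decode := bsoncodec.ValueDecoderFunc(func(dc bsoncodec.DecodeContext, vr bsonrw.ValueReader, val reflect.Value) error {
		objID, err := vr.ReadObjectID()
		if err != nil {
			return err
		}
		val.SetString(objID.Hex())
		return nil
	})

	return bson.NewRegistryBuilder().
		RegisterTypeEncoder(idType, encode).
		RegisterTypeDecoder(idType, decode).
		Build()
}

// The registry would then be passed in via the client options, e.g.:
//   opts := options.Client().ApplyURI(uri).SetRegistry(newIDRegistry())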
Concurrency
With a web application such as Authinity, you expect to have many concurrent paths of execution, the majority of which will be accessing the database for their own purposes. Properly handling the opening and closing of concurrent connections to a database is very important for stability and consistency, and you would fully expect this kind of protection to be included in any database driver. In mgo, the Session struct provides this. The mgo godoc is very clear on its usage, talking specifically about concurrency and specifying how you should connect: Dial should be “called just once”, with concurrent sessions “then established using the New or Copy methods on the obtained session” to ensure they “manage the pool of connections appropriately”. This results in a succinct session creator for mgo:
// SessionCreator is a mongo session creator
type SessionCreator struct {
	session *SessionWrapper
	mutex   sync.Mutex
}

// Create creates a new underlying session if necessary and returns a copied session for use
func (c *SessionCreator) Create() storage.Session {
	c.mutex.Lock()
	defer c.mutex.Unlock()

	if c.session == nil {
		s, err := mgo.Dial(storage.ConnectionURL())
		if err != nil {
			panic(err)
		}

		c.session = &SessionWrapper{s: s}
	}

	return c.session.Copy()
}
Switching to mongo-go, it is obvious that you need to construct a new client and call the Connect function to establish a connection. However, it isn't especially clear how concurrent connections should be handled. I am having to make some assumptions here, based on how the other, more mature MongoDB drivers are documented, and assume that the Go driver is, or eventually will be, the same. The documentation for other drivers indicates that their client object is thread-safe and that it is recommended to store a single instance of the client in a global place, while also allowing multiple client instances with the same settings to share the same connection pools. If this is the case for the Go driver as well, then you can't really go wrong.
// SessionCreator is a mongo session creator
type SessionCreator struct {
	client *mongo.Client
	mutex  sync.Mutex
}

// Create creates a new underlying mongo client if necessary and connects ready for use
func (c *SessionCreator) Create() storage.Session {
	c.mutex.Lock()
	defer c.mutex.Unlock()

	if c.client == nil {
		client, err := mongo.NewClient(storage.ConnectionURL())
		if err != nil {
			panic(err)
		}

		err = client.Connect(context.Background())
		if err != nil {
			panic(err)
		}

		c.client = client
	}

	return &SessionClientWrapper{client: c.client}
}
As you can see above, I have chosen to embed the client within my SessionCreator, as there is only a single instance of it and the existing mutex can be used to ensure only one client is created. The client is created and connected the first time Create is called; after that, just a wrapper containing the client is returned from Create. You will notice that I have renamed SessionWrapper to SessionClientWrapper, as it now wraps a mongo.Client rather than an mgo.Session, while still implementing our storage.Session interface (which is used throughout the codebase). The wrapper exposes a DB method for accessing databases, collections and queries, since this is now done via the mongo.Client instead of via an mgo.Session.
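A minimal sketch of that wrapper, with hypothetical method names rather than our exact code, might look like this:

// SessionClientWrapper wraps the shared mongo.Client behind our
// storage.Session interface (a hypothetical sketch).
type SessionClientWrapper struct {
	client *mongo.Client
}

// DB returns a handle to the named database via the shared client.
func (w *SessionClientWrapper) DB(name string) *mongo.Database {
	return w.client.Database(name)
}

// Close is deliberately a no-op: the underlying client stays connected
// so that its connection pool can be reused across requests.
func (w *SessionClientWrapper) Close() {}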
Previously, mgo.Session required you to close each session after use; this kept the underlying connection open and allowed a new session to be created for each use. With mongo-go, we simply leave the client connected to ensure the underlying connection remains open, otherwise future attempts to use the client would find the connection closed. Similarly, if we were to Connect and Disconnect the client for each use, we would end up attempting to Connect on an already connected client, which produces an error. In addition, there appears to be no way to determine whether the client is already connected, so the best option seems to be to leave it connected for use across multiple threads. It is unclear at the moment when client.Disconnect should be called, or whether a call to it is even required, but it seems sensible that this would be when the application exits completely.
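If an explicit clean-up point is wanted, one reasonable place is a disconnect hook run as the process shuts down. The sketch below is hypothetical wiring built on that assumption:

import (
	"context"
	"log"

	"github.com/mongodb/mongo-go-driver/mongo"
)

// shutdown disconnects the shared client when the application exits.
// This is hypothetical: the driver's docs do not yet confirm whether an
// explicit Disconnect is required.
func shutdown(client *mongo.Client) {
	if client == nil {
		return
	}
	if err := client.Disconnect(context.Background()); err != nil {
		log.Printf("error disconnecting mongo client: %v", err)
	}
}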
This pattern allows the mongo-go-driver to control the pool of connections through a single mongo.Client, and allows the application to connect when the first database access takes place. I would fully expect the mongo-go-driver to handle the management of the connection pool, so using the driver in this pattern seems like the most logical choice until further documentation confirms it.
Conclusion
The mongo-go-driver is still in the alpha phase and changing fairly rapidly. Perhaps some of its minor shortcomings will be addressed during the rest of this phase and as it enters beta. In its current state, and as early adopters, we faced some awkward barriers in our migration. Nonetheless, these kinds of barriers are likely to be common in migrations; documentation will no doubt improve, as will the input and support from the MongoDB community. A brief migration guide written by the MongoDB team for people migrating from mgo, and/or improved mongo-go-driver godocs, would be massively helpful in providing clarity for anyone migrating.
Any issues raised throughout this post are, on reflection, relatively minor, and in general we are happy that the official driver is the right way to go. It is easy to understand the reasoning behind the differences we observe in mongo-go, and the design decisions taken so far seem sound. The official driver certainly feels like an official driver, with naming and an approach consistent with MongoDB's official drivers for other languages, and adherence to the MongoDB driver specification. This is a huge positive when you regularly work in different languages, as we do.
While migrating at this early stage may still come with a few headaches, it is certainly worth getting your migration underway early and in doing so helping to contribute to the growth and maturity of the official mongo-go-driver.
Reach out to me or anyone at Avco Systems if you have any questions in this area.