I had an interesting afternoon today, trying to identify an issue that occurred while working on a user migration project. This is a solution I have used for many apps that only support a single domain/naming context in a distributed directory environment. The benefits and issues of using it long term will be for another post altogether.

Some background on the plan: due to an acquisition, there is a planned migration from a legacy Active Directory domain into a new Active Directory forest/domain. Ultimately we want to use the same username from OldDomain in NewDomain, so that the experience is pretty seamless for the users of the applications. Many of these applications only require a username and a password, which are used to BIND to an LDAP directory for a success/failure status, possibly followed by a query for user metadata or groups.

Things to note:

  • A single Big Bang migration is not possible; it needs to be a phased migration of users
  • Many applications only support a simple BIND to a single LDAP directory
  • Users are in domains in different forests (sorry, no GCs)
  • User Names are unique between OldDomain and NewDomain

During this migration period, applications will need to authenticate and do LDAP queries for user information, but many of the applications only support one domain/naming context. They also mostly support only simple LDAP binds; very few of them support a SASL bind.

One of the solutions for these single-NC-dependent applications was to use Identity Lifecycle Manager (ILM) and Active Directory Application Mode (ADAM) to give these applications the ability to authenticate users from multiple domains, while still providing a single BIND point.

This can be accomplished by using ILM to import the user data from OldDomain and NewDomain, and provision userProxy objects into an ADAM instance, which becomes the central point of application binding. The userProxy object (it does not need to be userProxy, it just needs to support the msDS-bindProxy aux class) does not contain a password like a normal user object, but instead proxies the simple bind authentication request to the Active Directory domain that the userProxy is linked to. This linking is done by populating the objectSid property of the ADAM user with the SID of the parent AD user. When a simple bind to ADAM is made for that username, the request is sent to AD using the local security API (more on this later). You can read more about Bind Redirection for ADAM Proxy Objects here. The benefit is that all your auditing/compliance/support is managed at the Active Directory level, and not at the ADAM level. This avoids having to sync user passwords to ADAM, and all the issues that may arise with that.
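Since the link between the proxy and its AD user is just the objectSid value, it can help to see what that value looks like on the wire. The following is a minimal sketch, assuming the standard binary SID layout (revision, sub-authority count, 6-byte authority, then 32-bit little-endian sub-authorities). The function name `sid_to_bytes` is mine for illustration and is not part of ILM or ADAM.

```python
import struct


def sid_to_bytes(sid: str) -> bytes:
    """Convert a string SID such as 'S-1-5-32-544' to its binary form."""
    parts = sid.split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a SID: {sid!r}")
    revision = int(parts[1])
    authority = int(parts[2])
    sub_authorities = [int(p) for p in parts[3:]]
    # Header: revision (1 byte), sub-authority count (1 byte),
    # identifier authority (6 bytes, big-endian).
    header = struct.pack("BB", revision, len(sub_authorities))
    header += authority.to_bytes(6, "big")
    # Each sub-authority is a 32-bit unsigned integer, little-endian.
    body = b"".join(struct.pack("<I", s) for s in sub_authorities)
    return header + body


# Well-known SID for BUILTIN\Administrators, used here only as sample input.
print(sid_to_bytes("S-1-5-32-544").hex())
```

In practice ILM flows the binary objectSid from the AD connector space directly, so a conversion like this is mostly useful when checking values by hand.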

The export/provision rules from ILM to ADAM can be summarized as:

  1. Filter out DISABLED users from entering the Metaverse.  This ensures that we only have the enabled accounts, or the ones which could be used for authentication.
  2. If the same userName  is enabled in both domains,  Join on samAccountName.
  3. Set the NewDomain user data as a higher precedence for contribution.  This means if both user accounts are enabled,  the NewDomain data replaces OldDomain Data when exported to ADAM.
  4. The assumption is,  that NewDomain accounts are to be enabled once the user has been completely migrated and is actively using the NewDomain account for primary account.
  5. Since ADAM is keyed on samAccountName, only one account with the same name can be in the ADAM directory at one time.  The objectSid associated with that name will belong to the ENABLED account of the pair.
  6. The ADAM server is a member of the NewDomain domain.
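The rules above can be sketched in a few lines of Python. This is only an illustration of the filter/join/precedence logic, not ILM rules-extension code; the `AdUser` class, the `PRECEDENCE` table, and the sample SIDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdUser:
    domain: str
    sam_account_name: str
    object_sid: str
    enabled: bool


# Hypothetical precedence table: the lower number wins the contribution.
PRECEDENCE = {"NewDomain": 0, "OldDomain": 1}


def build_proxy_entries(users):
    """Return {samAccountName: AdUser} describing the proxies to provision."""
    chosen = {}
    for user in users:
        if not user.enabled:
            continue  # rule 1: disabled accounts never enter the metaverse
        key = user.sam_account_name.lower()  # rule 2: join on samAccountName
        current = chosen.get(key)
        # rule 3: NewDomain data outranks OldDomain data for the same name
        if current is None or PRECEDENCE[user.domain] < PRECEDENCE[current.domain]:
            chosen[key] = user
    return chosen


users = [
    AdUser("OldDomain", "alice", "S-1-5-21-1-1-1-1001", True),
    AdUser("NewDomain", "alice", "S-1-5-21-2-2-2-2001", False),  # pre-staged
    AdUser("OldDomain", "bob", "S-1-5-21-1-1-1-1002", True),
    AdUser("NewDomain", "bob", "S-1-5-21-2-2-2-2002", True),  # migrated
]
for name, user in build_proxy_entries(users).items():
    print(name, user.domain, user.object_sid)
```

With this sample data, alice's proxy carries her OldDomain SID and bob's carries his NewDomain SID, which is the behavior rule 5 describes.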

So LDAP applications pointing to ADAM request the username and password from the end user. Users in OldDomain who are yet to be migrated continue to use their existing username and password and authenticate against OldDomain. Users who have been migrated and enabled use the same username and password, but authenticate against NewDomain. As long as the passwords are synced at the time the NewDomain user account is enabled, the transition can be seamless. In the end, all the users will be migrated to NewDomain, but in the meantime the applications can authenticate users from both domains, because they appear as a single domain when binding.

Here is a quick example of how this works below. (I didn’t have Visio handy, so I had to use PowerPoint.)

ILM ADLDS BindProxy Sync Diagram

Now that I have tried to explain all of that (in a long-winded way), let me get back to the issue that was found today. The solution above has been working as designed with much success. It has allowed the phased migration of users while the applications can authenticate users from both domains.

Today I received a call that all of a sudden users were being told they could not log on when pointing to ADAM. I quickly checked the users in ADAM and everything looked fine as far as user properties were concerned. When using this solution I find it helpful to flow the string value of the domain and username that belong to the objectSid of the userProxy object, because it may not be visually evident which account the ADAM user will proxy back to. (I usually create an attribute called parentADuser for this.) The user in question was showing a value of “OldDomain\myUser”; since the user had not been migrated to NewDomain yet, this seemed correct. So I decided to check with AdFind and the -resolveSids switch, which translates the objectSid to the textual representation of the domain and user that matches it. Instead of seeing “OldDomain\myUser” I now saw “NewDomain\myUser”, but why?

I checked ILM and indeed the objectSid in the userProxy object was from OldDomain\myUser, but it resolved as “NewDomain\myUser”. I checked to see if the NewDomain user had been enabled, since that SID would then take precedence in the export to ADAM, but it was indeed disabled.

So the oldDomain SID now translates to the NewDomain user, which means that when an authentication request comes in, it is being proxied to the NewDomain user, which is disabled. This would explain why the user could not logon.

So why was this happening? It turns out that the migration team, in an effort to pre-stage the migration, had added the OldDomain SID to the sidHistory of the NewDomain user. Since ADAM uses the LSA APIs to authenticate the userProxy object, it was sending the OldDomain SID to NewDomain. NewDomain would find that it now had a user matching that SID (in sidHistory) and direct the authentication to the NewDomain user.
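To make the failure mode concrete, here is a toy model of the lookup order described above. It is a simplified sketch of the behavior I observed, not the actual LSA implementation: the dictionaries, the function names, and the SIDs are made up, and the password check is left out entirely.

```python
def resolve_sid(sid, local_domain_users, foreign_domain_users):
    """Toy model: the local domain is consulted first, and a match on
    either objectSid or sidHistory wins before the foreign domain is tried."""
    for user in local_domain_users:
        if sid == user["objectSid"] or sid in user.get("sidHistory", []):
            return user
    for user in foreign_domain_users:
        if sid == user["objectSid"]:
            return user
    return None


def proxied_bind_allowed(sid, local_domain_users, foreign_domain_users):
    """A proxied bind can only succeed if the resolved account is enabled."""
    user = resolve_sid(sid, local_domain_users, foreign_domain_users)
    return user is not None and user["enabled"]


OLD_SID = "S-1-5-21-1-1-1-1001"  # hypothetical SID of OldDomain\myUser
old_domain = [{"name": "OldDomain\\myUser", "objectSid": OLD_SID, "enabled": True}]

# Before pre-staging: NewDomain knows nothing about the old SID.
new_domain = [{"name": "NewDomain\\myUser", "objectSid": "S-1-5-21-2-2-2-2001",
               "sidHistory": [], "enabled": False}]
print(proxied_bind_allowed(OLD_SID, new_domain, old_domain))

# After pre-staging: the old SID sits in sidHistory of the disabled account.
new_domain[0]["sidHistory"].append(OLD_SID)
print(resolve_sid(OLD_SID, new_domain, old_domain)["name"])
print(proxied_bind_allowed(OLD_SID, new_domain, old_domain))
```

The only thing that changes between the two calls is the sidHistory value, which is why nothing looked wrong on the ADAM side.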

The solution? Remove sidHistory from disabled user accounts in NewDomain until the user is ready to be migrated and enabled. This means that proxied auth from ADAM goes to the OldDomain user account and not the disabled NewDomain user account. Of course this should have been tested by the migration team earlier, but it was an interesting hour or so today as I pieced together the puzzle to find out what was going on.
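A check like the following could flag the problem accounts before users notice. It is a sketch only, assuming the NewDomain accounts and the SIDs currently held by the ADAM proxies have already been exported into plain Python structures; the function name and the sample records are hypothetical.

```python
def risky_prestaged_accounts(new_domain_users, proxy_sids):
    """Return names of disabled NewDomain accounts whose sidHistory holds a
    SID that an ADAM proxy object currently points at."""
    proxy_sids = set(proxy_sids)
    return sorted(
        user["sAMAccountName"]
        for user in new_domain_users
        if not user["enabled"] and proxy_sids.intersection(user.get("sidHistory", []))
    )


new_domain_users = [
    # Pre-staged too early: disabled, but already carries the old SID.
    {"sAMAccountName": "myUser", "enabled": False,
     "sidHistory": ["S-1-5-21-1-1-1-1001"]},
    # Fully migrated: enabled, so sidHistory is harmless here.
    {"sAMAccountName": "bob", "enabled": True,
     "sidHistory": ["S-1-5-21-1-1-1-1002"]},
    # Not pre-staged at all.
    {"sAMAccountName": "carol", "enabled": False, "sidHistory": []},
]
proxy_sids = ["S-1-5-21-1-1-1-1001", "S-1-5-21-2-2-2-2002"]

print(risky_prestaged_accounts(new_domain_users, proxy_sids))
```

Any name this returns is an account whose sidHistory should be cleared, or whose migration should be completed, before the next authentication attempt.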

So what did I learn from this situation? When using ADAM bind redirection, remember that the authentication uses the local security API of the ADAM server. This means it will attempt to resolve SIDs in the domain the server belongs to. If a user object in that domain carries the foreign domain's SID in its sidHistory, the request will be authenticated against the local domain first, before going to the foreign domain. This seems very logical when you think about it, but obviously it has implications when using this solution for a migration project.

I also learned it’s quite hard to summarize everything in a single Blog post, and still have it make sense.  :)

Comment posted by Jef

at 1/31/2008 6:10:00 PM

Peter,
Glad to hear from you after so long!  I sent you an invite on messenger.
I know it CAN be done, but unfortunately many vendors and developers don’t do it correctly. :(   My team gets stuck “making it work” and uses this solution more often than I’d like when we get stuck with a product where the vendor is locked into a single naming context, or uses directory guessing techniques which have their own issues. :(
Maybe I should post a naughty list of vendors and products out there that don’t play well in a multi-domain environment…
Hope to hear from you!

September 06 10:54 PM
Comment posted by (namnlös)

at 1/31/2008 3:04:00 AM

Hi jef,
I have done some migrations and I see that you mentioned that you can’t bind two domains at the same time… I assume that’s why you use the ADAM in the middle.
Actually you can bind more than one domain at the same time. I have some VBS that does the job if you’re interested, both for user and group migration and for reACLing the file structures…
Second, if you like, add me to your MSN! mail is: firstnameLastname@hotmail.com FirstnameLastname = peterwestling
Have a nice day!

September 03 4:07 AM (http://banan-republiken.spaces.live.com/)
Comment posted by Jef

at 6/27/2008 12:18:00 AM

Dmitri,
I forgot to mention that as one of the options we had considered.  Doing so would not fit the migration timeline because oldDomain is going offline at separation (in a few months) and we would have to move ADAM then, etc.   Also in my scenario oldDomain is not managed by the same group as newDomain, which could expose passwords over LDAP simple binds where SSL is not used, etc.  But yes, it would be viable since the oldDomain SID would be tried against oldDomain first :)
We did think about moving the ADAM box to the ROOT domain of the forest newDomain is in,  but it still resolves the oldDomain sid as newDomain which makes sense.
Thanks for the feedback.

June 27 12:18 PM
Comment posted by Dmitri Gavrilov [MSFT]

at 6/27/2008 11:31:00 AM

Just for completeness, another option is to join ADAM machine to the OldDomain. Of course, this option might not be feasible in many deployments.

June 27 11:31 AM

A few weeks ago,  my home PC suffered a massive power issue, and refuses to stay booted up for more than a few seconds.   Of course this was a good excuse to finally go out and spec out a new home PC, since ours was getting long in the tooth.

Well,  the new PC is fast, and I decided to install Vista64 instead of Vista32, since it’s about time to make that transition.  My wife’s design applications all come in 64-bit flavors, so there would be a benefit for doing so.

So far so good, with one glaring exception:  Cisco VPN client. (5.0.0.340)

The IPsec and WebVPN clients refuse to work on Vista64, which has left me out in the VPN cold. It seems that Cisco is not going to support IPsec on 64-bit OSes, and will be moving 64-bit support to their AnyConnect SSL client. Unfortunately this client is still in beta, and my testing with it so far has turned up some bugs.

So there is hope in sight, but for now I am forced to either run Cisco VPN in a VM on my desktop PC, or use an old laptop. :(

It’s the price you sometimes pay for being cutting edge it seems.

UPDATE (06-28-07):  Cisco has released their AnyConnect firmware, and client.  This now works perfectly on Vista 64-Bit!  Oh Happy Day!

From what I understand only the AnyConnect SSL VPN will support x64 machines in the future, but I could be wrong.

In my enterprise, we use CryptoCard as one of our two-factor authentication providers, and it has always been a hands-off system as far as ILM integration goes. I believe this is because the underpinnings of the CryptoAdmin system were never quite understood, and because there were other ways to manage the token definitions, so it was never thought of as a target system for ILM integration.

After a quick discussion this week, I decided to investigate what it would actually take to connect ILM to view and possibly manage the tokens in the CryptoAdmin server. It turns out my instance is running on MySQL 4.x, for which ILM does not provide an MA out of the box.

I figured now would be a good time to investigate using an Extensible MA (XMA). I have always meant to dig deeper into how they function and what is needed to implement them, but never had a task at hand that required one. It was a real “Aha!” moment seeing how simple it really was, after all this time of being hands-off.

It turns out that the MIISexperts (now ILMexperts?!) have posted XMA code for use with MySQL, which sounded like it would fit the bill and offer me insight into how a working XMA is coded.

I have to say, after downloading the MySQL .NET connector and compiling the MySQL XMA code, it was pretty straightforward to implement it on ILM and connect to the back-end MySQL CryptoAdmin database. Be sure to set your import template as an AVP source, because that is the format the XMA writes its file in, and I was not able to change this without deleting and recreating the MA.

The only issue I had was that the XMA defaulted to a text encoding codepage other than Unicode, which gave me the stopped-file-embedded-null error. Setting the file codepage to Unicode resolved my issue immediately, and Full Imports succeeded.
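My assumption about why the codepage matters: if the file on disk is written as Unicode (UTF-16) but read as a single-byte codepage, every ASCII character appears to be followed by a null byte. The snippet below only demonstrates that encoding effect with a made-up record; it does not reproduce ILM's actual file parsing.

```python
# A made-up attribute-value line, used only to show the byte layout.
record = "uid: jsmith\n"

as_utf16 = record.encode("utf-16-le")  # what a Unicode file contains
as_ansi = record.encode("cp1252")      # what a single-byte codepage file contains


def has_embedded_null(data: bytes) -> bool:
    """True if the raw bytes contain a NUL, as a single-byte reader would see."""
    return b"\x00" in data


print(has_embedded_null(as_utf16))
print(has_embedded_null(as_ansi))
print(as_utf16[:8])
```

Telling the MA that the file is Unicode makes the reader pair those bytes back up, which matches the fix described above.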

So far, I am impressed with the ability to develop an XMA for the target system, even when Microsoft does not provide one out of the box. For me, being able to resolve my problem and extend the platform to meet my business needs really adds a lot of value to this implementation. I don’t know why I had not focused on the XMA before, but I can already think of many other situations where an XMA could be implemented to provide integration.

Next steps will be working on export logic for managing CryptoCard entries,  but I think the big hurdles of not knowing how to implement the XMA are over, and it’s just like any MA with coding the management logic.

Thanks to Alex Tcherniakhovski for his posting on a Walkthrough: How to build an extensible management agent for MIIS for implementing the XMA which was really helpful in visualizing the process.

The issue with the Out of Memory error and BAIL errors after installing MIIS SP2 or installing ILM 2007 can be resolved with a hotfix:

ILM 2007: KB 938014
MIIS 2003 SP2: KB 936306
IIFP SP2: KB 938015

I had to request these from Premier Support as they were not available on the KB.

This brought the version up to:  3.2.1001.0 (Service Pack 2)

This has resolved my current issues after applying SP2.

UPDATE:

Brad Turner picked up more details from PSS on this error which you can read about here