
Solaris Idmap Problems

When using the kernel-based CIFS server on Solaris 11, we found that the idmap service picks Domain Controllers located across a WAN link, which causes two problems:
A) slow authentication; or, even worse,
B) idmap uses a server that disappears when the WAN link goes down, which causes havoc.

After watching the debug logs, I can see that idmap scans the SRV records in DNS to get a list of Domain Controllers in the forest. Even when config/site_name (a poorly documented setting) is set in the SMF properties for idmap, the discovery process still cycles through the whole list of DCs in the forest; if the first one is unreachable, it keeps going until it finds one that is. The order of the SRV records is pretty much random, since Active Directory registers every DC with the same priority (0) and weight (100). So in our case idmap's discovery routine uses what is basically a random server from a list of 21 Domain Controllers, no matter where they live, as long as it is reachable through LDAP.
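To see why equal weights amount to a random pick, here is a small Python sketch of the SRV ordering rule from RFC 2782: lowest priority first, then a weighted random draw within each priority group. I have not checked whether idmap implements exactly this rule; the function name `rfc2782_order` and the `dcNNN` hostnames are made up for the illustration.

```python
import random
from collections import Counter
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def rfc2782_order(records: List[SrvRecord], rng=random) -> List[SrvRecord]:
    """Order SRV records the way RFC 2782 describes: lowest priority first,
    then a weighted random ordering within each priority group."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go to the front of the group, per the RFC.
        group = sorted((r for r in records if r.priority == prio), key=lambda r: r.weight)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    # First record whose running weight sum reaches the draw wins.
                    ordered.append(group.pop(i))
                    break
    return ordered


if __name__ == "__main__":
    # 21 DCs, all registered as "0 100 389" (hostnames are made up).
    dcs = [SrvRecord(0, 100, 389, f"dc{i:03d}.domain.com.") for i in range(21)]
    rng = random.Random(1)
    first = Counter(rfc2782_order(dcs, rng)[0].target for _ in range(4200))
    # Every DC should come first roughly 200 times out of 4200.
    print(len(first), min(first.values()), max(first.values()))
```

With every record at priority 0 and weight 100, each of the 21 DCs comes out first in roughly 1 out of 21 runs, regardless of which site it sits in.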

If the idmap service would just use the DCs listed in the specific site we specify for this CIFS server, it would be a much more stable service. This may be a bug that should be reported to Sun (Oracle); I am not sure.

My workaround:

In my case I created local firewall rules on the remote (out-of-site) Windows Domain Controllers to block connections from the Solaris CIFS server. The idmap logs still show unsuccessful connection attempts to the blocked servers during discovery, but at least idmap cannot use them. Without the firewall block, idmap would happily attach to a reachable DC in India or Australia.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall add rule name="Block Solaris IDMAPD" dir=In remoteip="172.19.8.62/32,172.19.8.64/32,172.21.8.33/32" Action="Block" protocol="Any" Profile="Domain,Private,Public" enable="no"

Ok.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall show rule name="Block Solaris IDMAPD"

Rule Name:                            Block Solaris IDMAPD
----------------------------------------------------------------------
Enabled:                              No
Direction:                            In
Profiles:                             Domain,Private,Public
Grouping:
LocalIP:                              Any
RemoteIP:                             172.19.8.62/32,172.19.8.64/32,172.21.8.33/32
Protocol:                             Any
Edge traversal:                       No
Action:                               Block
Ok.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall set rule name="Block Solaris IDMAPD" new enable="yes"

Updated 1 rule(s).
Ok.
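If the same rule has to go onto several DCs, a small Python helper can build the command and catch typos in the address list before anything is pasted into a PowerShell prompt. `netsh_block_rule` is a hypothetical helper of my own; it only assembles the `add rule` options used in this post and does not run anything.

```python
import ipaddress
from typing import Iterable


def netsh_block_rule(name: str, remote_ips: Iterable[str], enable: bool = False) -> str:
    """Assemble a 'netsh advfirewall firewall add rule' command that blocks all
    inbound traffic from remote_ips. Hypothetical helper: it only builds the string."""
    # ip_network() rejects malformed addresses and prefixes with host bits set
    # (e.g. "172.19.8.62/24"); a bare address becomes a /32.
    nets = [ipaddress.ip_network(ip, strict=True) for ip in remote_ips]
    if not nets:
        raise ValueError("at least one remote address is required")
    remote = ",".join(str(n) for n in nets)
    return (
        f'netsh advfirewall firewall add rule name="{name}" dir=in '
        f'remoteip="{remote}" action=block protocol=any '
        f'profile=domain,private,public enable={"yes" if enable else "no"}'
    )


if __name__ == "__main__":
    print(netsh_block_rule("Block Solaris IDMAPD",
                           ["172.19.8.62/32", "172.19.8.64/32", "172.21.8.33/32"]))
```

Creating the rule disabled (the default here) and then switching it on with `set rule ... new enable="yes"` gives you a chance to review it with `show rule` first.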

Log entries look like this:

# pwd
/var/svc/log

# tail -f system-idmap:default.log
LDAP: frdc001.domain.com:389: Can't connect to the LDAP server
frdc001.sonosite.com: Can't connect to the LDAP server - Operation now in progress
LDAP: dedc002.domain.com:389: Can't connect to the LDAP server
dedc002.sonosite.com: Can't connect to the LDAP server - Operation now in progress
Using server usdc001.sonosite.com:3268

** Note:
Unfortunately, the "Using server" log entry is specific to SunOS 5.11 151.0.1.8, which I think translates to Solaris 11 Express. Even with debugging turned on for all, discovery, or ldap, I did not get the "Using server" entries on SunOS 5.11 11.0.
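When tailing the log for a while, a quick Python 3 filter can list which DCs discovery failed to reach and which server it finally picked. The regular expressions are based only on the handful of lines shown here, so other releases may need different patterns, and since "Using server" only shows up on 151.0.1.8, the chosen server will be `None` on 11.0. `summarize_idmap_log` is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Patterns based only on the sample lines in this post; other releases may differ.
FAILED = re.compile(r"^LDAP: (?P<host>[^\s:]+):(?P<port>\d+): Can't connect to the LDAP server")
USING = re.compile(r"^Using server (?P<host>[^\s:]+):(?P<port>\d+)")


def summarize_idmap_log(lines: Iterable[str]) -> Tuple[List[str], Optional[Tuple[str, int]]]:
    """Return (DCs that discovery could not reach, in first-seen order,
    last (host, port) reported by a 'Using server' line, or None)."""
    unreachable: List[str] = []
    chosen: Optional[Tuple[str, int]] = None
    for line in lines:
        line = line.strip()
        m = FAILED.match(line)
        if m:
            if m["host"] not in unreachable:
                unreachable.append(m["host"])
            continue
        m = USING.match(line)
        if m:
            chosen = (m["host"], int(m["port"]))
    return unreachable, chosen
```

Usage: `with open("/var/svc/log/system-idmap:default.log") as f: print(summarize_idmap_log(f))`. Note that the server picked in the sample above is on port 3268, the Active Directory Global Catalog port, rather than plain LDAP on 389.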

Check what DNS shows for the forest. In our case, 21 DCs:

# dig _ldap._tcp.domain.com SRV +short
;; Truncated, retrying in TCP mode.
0 100 389 frdc001.domain.com.
<snip>
0 100 389 indc002.domain.com.
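To work with that list from a script (count the DCs, compare weights, feed other tools), a small Python parser for `dig +short` SRV output is enough. It skips `;;` notices and anything that is not the four `priority weight port target` fields, such as my `<snip>` marker. `parse_dig_srv` and `SrvRecord` are hypothetical names.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_dig_srv(text: str) -> List[SrvRecord]:
    """Parse `dig <name> SRV +short` output into SrvRecord tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, ';;' notices and anything that is not
        # "priority weight port target" (such as a <snip> marker).
        if len(fields) != 4 or not all(f.isdigit() for f in fields[:3]):
            continue
        prio, weight, port, target = fields
        # Drop the trailing root dot that dig prints on FQDNs.
        records.append(SrvRecord(int(prio), int(weight), int(port), target.rstrip(".")))
    return records


if __name__ == "__main__":
    sample = """;; Truncated, retrying in TCP mode.
0 100 389 frdc001.domain.com.
<snip>
0 100 389 indc002.domain.com.
"""
    for rec in parse_dig_srv(sample):
        print(rec.priority, rec.weight, rec.port, rec.target)
```

To feed it live data, pass it `subprocess.run(["dig", "_ldap._tcp.domain.com", "SRV", "+short"], capture_output=True, text=True).stdout`.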

Raise the debug level. Experiment with these; "all" may be too verbose, especially on a production server:

# svccfg -s idmap setprop 'debug/all = integer: 0'
# svccfg -s idmap setprop 'debug/ldap = integer: 1'
# svccfg -s idmap setprop 'debug/discovery = integer: 1'

Refresh the service to reload the configuration changes:

# svcadm refresh svc:/system/idmap:default

Set site_name:

# svccfg -s idmap setprop 'config/site_name = astring: US'
# svcadm refresh svc:/system/idmap:default

If the site name is not set, the discovery process complains that no site was found. That does not really change anything, since discovery goes on to use any DC in the forest anyway, but I would expect discovery to behave better when the site is set.
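For comparison, this is roughly the behaviour I would expect once site_name is set: try the site's SRV list (`_ldap._tcp.<site>._sites.<domain>`, the same form as the site queries in this post) first, and only fall back to the forest-wide list if nothing in the site answers. It is a hypothetical Python sketch, not what idmap does; `pick_dc`, the injected `lookup` and `reachable` callables, and the `fall_back_to_forest` switch are my own inventions so the logic can be exercised without DNS or LDAP.

```python
from typing import Callable, Iterable, List, Optional


def srv_name(domain: str, site: Optional[str] = None) -> str:
    """Owner name of the AD LDAP SRV records, optionally scoped to a site."""
    if site:
        return f"_ldap._tcp.{site}._sites.{domain}"
    return f"_ldap._tcp.{domain}"


def pick_dc(domain: str,
            site: Optional[str],
            lookup: Callable[[str], Iterable[str]],
            reachable: Callable[[str], bool],
            fall_back_to_forest: bool = True) -> Optional[str]:
    """Return the first reachable DC, trying the site's SRV list before the
    forest-wide one. Hypothetical policy, not idmap's actual behaviour."""
    names: List[str] = []
    if site:
        names.append(srv_name(domain, site))
    if fall_back_to_forest or not site:
        # No site configured, or the caller accepts a WAN DC as a last resort.
        names.append(srv_name(domain))
    for name in names:
        for host in lookup(name):
            if reachable(host):
                return host
    return None


if __name__ == "__main__":
    # Fake DNS and reachability data built from names that appear in this post.
    dns = {
        "_ldap._tcp.US._sites.domain.com": ["usdc101.domain.com", "usdc001.domain.com"],
        "_ldap._tcp.domain.com": ["frdc001.domain.com", "indc002.domain.com", "usdc001.domain.com"],
    }
    up = {"usdc001.domain.com", "indc002.domain.com"}
    print(pick_dc("domain.com", "US", lambda n: dns.get(n, []), up.__contains__))
    # usdc001.domain.com
```

Setting `fall_back_to_forest=False` trades availability for predictability: when the whole site is down you get no DC at all instead of one in India or Australia.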

Check the SRV records for the US site, as configured in Active Directory:

# dig _ldap._tcp.US._sites.domain.com SRV +short
0 100 389 usdc101.domain.com.
<snip>
0 100 389 usdc001.domain.com.

Check the CA site:

# dig _ldap._tcp.CA._sites.domain.com SRV +short
0 100 389 cadc001.domain.com.
0 100 389 cadc002.domain.com.
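Comparing the forest-wide list with a site list also tells you which DCs the firewall workaround needs to cover. A minimal Python sketch, assuming you have already pulled the hostnames out of the two `dig` outputs; `dcs_outside_site` is a hypothetical helper, and the sample forest list is made up from names that appear in this post.

```python
from typing import Iterable, List


def _normalize(host: str) -> str:
    # dig prints fully qualified names with a trailing dot; DNS names are case-insensitive.
    return host.rstrip(".").lower()


def dcs_outside_site(forest_dcs: Iterable[str], site_dcs: Iterable[str]) -> List[str]:
    """Return forest DCs that are not registered in the site, sorted by name."""
    in_site = {_normalize(h) for h in site_dcs}
    return sorted({_normalize(h) for h in forest_dcs} - in_site)


if __name__ == "__main__":
    forest = ["frdc001.domain.com.", "indc002.domain.com.",
              "usdc001.domain.com.", "usdc101.domain.com."]
    us_site = ["usdc101.domain.com.", "usdc001.domain.com."]
    print(dcs_outside_site(forest, us_site))
    # ['frdc001.domain.com', 'indc002.domain.com']
```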

Check that the name-service-cache service is running; it might be required:

# svcs name-service-cache
STATE          STIME    FMRI
online         Jun_04   svc:/system/name-service-cache:default

TODO:

- Check how the Solaris ZFS appliance does this.  It does not appear to suffer the same fate.

Links:

http://docs.oracle.com/cd/E19082-01/819-3194/adsetup-2/index.html

Published in CIFS, Solaris