“let no good deed go unpunished”
I’m not sure who said that, but it certainly seems true of SharePoint patching. It’s Sunday night and I thought I would “be nice” and upgrade our development environment over the weekend so the developers don’t lose a day during the week.
What was I thinking…
My install:
2 SharePoint boxes plus one FAST box.
I had downloaded 4 updates:
– SharePoint Server SP2
– Language Packs for office products SP2
– Office Web Apps SP2
– Fast Server SP2
Progress:
The FAST Server SP2 installed on the FAST server pretty quickly. It asked for a reboot, no trouble there. I didn’t see any indication that there were any post-setup tasks to run, but then again I don’t know that I looked very hard.
The Problem Children: my two SharePoint boxes.
I ran setup on the 3 remaining SP2 installers in this order: Server, Language, OWA. The installs went pretty quickly thanks to a tip I had heard on Todd Klindt’s netcast last week about stopping a bunch of services prior to launching setup. Luckily, I had also run a script left over from SP1 that backed up key files and took an inventory of running services, content databases, etc.
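I'm not reproducing that script here, but a minimal sketch of the inventory idea might look something like this. The output folder and file names are placeholders I made up for illustration, and it assumes you run it on a SharePoint 2010 server with an account that has farm access.

```powershell
# Hypothetical pre-patch inventory sketch; this is NOT the original script.
# Assumes a SharePoint 2010 server and an account with access to the farm.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder output folder; change it to suit your environment.
$outDir = "C:\PatchInventory"
if (-not (Test-Path $outDir)) {
    New-Item -ItemType Directory -Path $outDir | Out-Null
}

# Windows services that are running right now on this box.
Get-Service |
    Where-Object { $_.Status -eq 'Running' } |
    Select-Object Name, DisplayName, Status |
    Export-Csv (Join-Path $outDir "running-services.csv") -NoTypeInformation

# SharePoint service instances and their state on every server in the farm.
Get-SPServiceInstance |
    Select-Object TypeName, Status, @{ Name = 'Server'; Expression = { $_.Server.Name } } |
    Export-Csv (Join-Path $outDir "sp-service-instances.csv") -NoTypeInformation

# Content databases and the web application each one belongs to.
Get-SPContentDatabase |
    Select-Object Name, @{ Name = 'WebApplication'; Expression = { $_.WebApplication.DisplayName } } |
    Export-Csv (Join-Path $outDir "content-databases.csv") -NoTypeInformation
```

Having the "before" picture on disk is what lets you answer questions later like "was that service even running before I patched?"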
With the 3 binaries installed, it was time to run the dreaded “SharePoint Products Configuration Wizard”. I say dreaded because, looking back, I can’t think of a single time it has worked the first time after an upgrade. Something has always gone wrong, and this do-gooder’s Sunday night was no exception.
At stage 9 of 10, it threw an error saying the SharePoint Admin Service (SPADMINV4) wasn’t started.
Piece of cake, I’ll just start it. Wait, no I can’t – it won’t start.
A few searches on the internet and I found this article:
http://support.microsoft.com/kb/2756815
It mentioned the Claims to Windows token service not starting either – hmm, that seemed to be the case too.
I checked a different environment that hadn’t been patched with SP2 and sure enough the claims to windows token service was running on that one, but it wasn’t running on my dev environment. Hmmm.
Do you remember above where I mentioned I had an inventory of what services were running before the upgrade??? Guess what – Claims to windows token service (C2WTS) was not running BEFORE my Upgrade, yet SP 2010 PRE SP2 seemed to be working just fine.
So perhaps there is something in SP 2010 SP2’s admin service that requires the C2WTS service. Or maybe that came with SP1, and I thought I had SP1 but was running RTM. Who knows.
Anyhow, back to the fix. I used the second solution in KB 2756815. My servers are behind a proxy server and can’t get out to the internet, so the C2WTS service gets mad when it can’t check the internet for the latest and greatest gossip about which certificates have fallen out of favor. In fact, C2WTS gets so mad it refuses to start, which is where KB 2756815 comes in. I changed a local policy setting, and after that C2WTS started right up, followed by the SharePoint Admin Service (SPAdminV4).
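For what it's worth, checking and starting the two services can also be done from PowerShell instead of the Services console. This is only a sketch: it assumes the KB change has already been applied, and that `c2wts` is the Windows service name for the Claims to Windows Token Service on your box (worth confirming with `Get-Service` first).

```powershell
# Show the current state of the two services discussed above.
# Assumption: 'c2wts' is the service name for Claims to Windows Token Service.
Get-Service -Name c2wts, SPAdminV4 | Select-Object Name, Status, DisplayName

# Start them in the same order that worked for me: C2WTS first, then the admin service.
Start-Service -Name c2wts
Start-Service -Name SPAdminV4

# Confirm both are now running.
Get-Service -Name c2wts, SPAdminV4 | Select-Object Name, Status
```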
As I write this, the farm is in upgrade task step 9 of 10 and it’s 41.17% done. Hopefully it will finish up, then I can run it on my second box, start upgrading content databases, and be on my merry way.
But of course not…
Somewhere after about 44% I stepped away only to return and be greeted by the “Configuration Failed” screen – again.
2 problems seen in the logs:
- The user profile sync service would not start – I actually saw this flash across the screen during the operation of the wizard – it nicely said that I could/should start that after the install was done.
- The second issue: A timeout, Looks to be related to the user profile service.
The error mentioned setting the USP Sync Service Instance to Offline, so maybe I’ll get lucky and it will work the second time I run the wizard.
For kicks, I’m running the wizard once on the second machine to see if it fares any better. The second machine isn’t running User Profile Sync, which I don’t think should matter, since much of the upgrade work is done via timer jobs and should work across all nodes in the farm. But just in case, it’s worth trying. I could use a success.
And, I got one! The second node finished ok, so now back to the first node.
Success again! Re-running the Products config wizard on the first node a second time also worked! Now all I need to do is start that user profile service… I tried it from windows services, but it didn’t start – not a big deal since I know that quite a bit goes on when provisioning User Profile Sync and remember above, SharePoint mentioned that it put UPS offline.
Next I try Central Admin -> “Services on Server”. I find the service, click Start, and am asked for the farm account password. On the password screen, in nice red letters, it says that after I provision the service I need to do an IIS reset. I think to myself “I haven’t done anything yet”, enter the needed password, and click OK.
I am immediately greeted by one of those SharePoint error screens with a correlation ID on it. Off to look at the ULS logs.
System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
Hmm, makes me wonder if something else is wrong.
I checked a few other SP Service Apps and they seemed fine.
Is it worth trying again? I go back to services on server and click start – I see the Red IISreset message and figure it can’t hurt to do it BEFORE – so I give that a shot, and guess what, I am able to click OK and not get an error – the service now shows as “Starting”
This doesn’t bother me – I’ve done User profile sync in 2010 a few times and I know it takes something like 10 minutes to finish setting up FIM in the background, but it’s taking a long time and I’m getting nervous…
I check the Windows services: FIM Sync is started, FIM itself is still disabled. Hmm. I can’t say I’ve ever watched it, so I don’t know which one comes up first. I wait a while longer, then I try a reboot, and still no luck. Time for some PowerShell.
I look at the user profile sync services in powershell with this command:
Get-SPServiceInstance | where { $_.typename -like 'User Profile Synchronization *' }
I get a line back that shows it is “Provisioning” – I’ve seen this before so it’s time to Kill that and try again.
I did a Stop-SPServiceInstance followed by the ID of the “Stuck” Sync Service.
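Put together, the find-and-stop step looks roughly like this. It's a sketch rather than a transcript of exactly what I typed: the filter on the "Provisioning" status and the `$stuck` variable name are my own additions for illustration.

```powershell
# Sketch: find User Profile Sync instances stuck in "Provisioning" and stop them.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$stuck = Get-SPServiceInstance |
    Where-Object { $_.TypeName -like 'User Profile Synchronization *' -and $_.Status -eq 'Provisioning' }

# Show what was found (Id is the GUID that Stop-SPServiceInstance needs).
$stuck | Select-Object Id, TypeName, Status, @{ Name = 'Server'; Expression = { $_.Server.Name } }

# Guard against an empty result before looping (PowerShell 2.0 would otherwise loop once over $null).
if ($stuck) {
    foreach ($instance in $stuck) {
        Stop-SPServiceInstance -Identity $instance.Id -Confirm:$false
    }
}
```

After this the instance should drop back to a stopped or disabled state, at which point you can try starting it again from Central Admin.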
Now back to Central admin for another try at starting it.
Still no luck.
Spencer Harbar is pretty much the expert on this, and his blog says that in the situation we’re in, we should just go ahead and rebuild the User Profile Service from scratch. There is only one problem: we use NewsGator, a third-party add-in that relies heavily on User Profiles. So I’ve got a support ticket open with them to find out whether resetting UPS will break a bunch of associations.
I’ll hopefully have an update tomorrow.
Update:
The issue turned out to be related to Duplicate Certificates for ForeFront.
Here are a few steps I took to get from Problem to Resolution.
- Enabled verbose logging for User Profile Service in Central Admin
- Looked at the Windows Event logs
- Reset the SharePoint Timer Cache
- Started the User Profile Sync Service on a different Node.
I noticed a few additional things:
In the Event Log, Event ID 234 from “ILM Web Service Configuration”
ILM Certificate could not be created: Cert step 2 could not be created: C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" -sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root
ILM Certificate could not be created: Cert could not be added: C:\Program Files\Microsoft Office Servers\14.0\Tools\CertMgr.exe -add -r LocalMachine -s My -c -n "ForefrontIdentityManager" -r LocalMachine -s TrustedPeople
I also noticed that the UPS Sync worked just fine on the other node, so that was good news.
MS had suggested that the FIM certs might have been messed up so I looked at certificates on both the working and non-working systems
What I found was that the Non-working system had more than one certificate.
MS said it was safe to delete all the forefront certs so that’s what I did, but it still didn’t work.
As it turned out, the certificates were stored in more than one place.
The FIM certificates were found in two locations:
- Certificates (Local Computer) -> Personal -> Certificates -> ForefrontIdentityManager
- Certificates (Local Computer) -> Trusted Root Certification Authorities -> Certificates -> ForeFrontIdentityManager
I had deleted the ones from Personal, but not the ones from Trusted Root Certification Authorities.
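If you'd rather not click through the MMC snap-in on every server, a read-only check along these lines should list the FIM certificates per store. Treat it as a sketch: `My` and `Root` are the PowerShell names for the Personal and Trusted Root Certification Authorities stores, and I've added `TrustedPeople` only because it shows up in the CertMgr command in the event log above. It lists certificates and deletes nothing.

```powershell
# Read-only sketch: list ForefrontIdentityManager certificates in the local machine stores.
# My = Personal, Root = Trusted Root Certification Authorities.
# TrustedPeople is included because the CertMgr.exe command in the event log targets it.
$stores = 'My', 'Root', 'TrustedPeople'

foreach ($store in $stores) {
    Get-ChildItem "Cert:\LocalMachine\$store" |
        Where-Object { $_.Subject -like '*ForefrontIdentityManager*' } |
        Select-Object @{ Name = 'Store'; Expression = { $store } }, Subject, Thumbprint, NotBefore, NotAfter
}
```

Running it on both the working and the non-working node makes the difference easy to spot: the broken node shows more entries than the healthy one.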
http://www.cleverworkarounds.com/2010/08/15/more-user-profile-sync-in-sp2010-certificate-provisioning-issues/ does a great job explaining the certificates, so no need to rehash it here.
So in summary, Extra ForeFront certificates in the certificate store were the reason I couldn’t start the User Profile Sync Service.