Category Archives: Troubleshooting

Fun configuring Office web apps 2013 (OWA)

Note: This is not a “how to” article on the steps needed to configure OWA in 2013. It is a list of the problems I ran into while configuring mine, and the solutions I found. If you’re installing OWA, you’ll definitely want to read this, as it will probably help you, but it’s not a step-by-step guide…

I ran into a few problems while configuring OWA for use in a new SharePoint 2013 farm today.

The DNS name you pick matters

I wasn’t sure of the architecture of this whole thing and one question was bugging me…

Does the user need direct access to the OWA server? Or do the SharePoint Web Front ends act as a proxy for it?

I set up a config and ran Fiddler on the client alongside the browser to see what server(s) it connected to – sure enough, it connects to the OWA server(s) directly.

This has some implications for us if we’re making this available on the internet…
Think of this for a moment: in 2010, OWA was a “Service Application”. If your user had access to the WFE, SharePoint took care of the rest, even if OWA was on a different box (or boxes). It was magic.

With OWA2013, you’ll need an external IP address, an SSL certificate your browsers will see as valid, and NLB or an external load balancer if you’re using more than one OWA box.
It also means you’ll want to build your OWA farm with a legit DNS name and not the machine name.

Uninstalling an old config:

First, I had messed around with using HTTP before my certificates were ready.
No problem, I thought, I’ll just remove that initial farm config I had built with the New-OfficeWebAppsFarm command. Surely there is a Remove-OfficeWebAppsFarm command, right?

Wrong.

The above shows me 3 commands for creating, but only 2 for removing!

Yep, I can use New-OfficeWebAppsFarm, but there is no Remove-OfficeWebAppsFarm!

It turned out to be easy enough, on my one box farm in Dev, I just needed to use
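The command listing itself didn’t survive in this copy of the post. Office Web Apps Server does ship a Remove-OfficeWebAppsMachine cmdlet that takes the local server out of its farm, so the step was most likely along these lines (a sketch, not necessarily the exact original):

```powershell
# Run in an elevated PowerShell session on the OWA server itself.
Import-Module OfficeWebApps

# Removes the local machine from the Office Web Apps farm.
# On a one-box farm this leaves the server with no farm configuration at all.
Remove-OfficeWebAppsMachine
```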

Note that if you have more than one machine in the OWA farm, you’d need to run the above command on each box, saving the first one (master) until last.

With that cleared up, I was ready to install with my cert.

The next thing I ran into – the new-OfficeWebAppsFarm command wants the “friendly name” of the cert, not the domain name – it was easy enough to figure that out, but it threw me for a loop.

Time for my first OWA farm install!

Here I ran the command below on the first node of OWA:
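The exact command is missing from this copy, so here is a minimal sketch of a farm creation call. The URL and the certificate friendly name are placeholders I made up for illustration; the point is that -CertificateName takes the certificate’s friendly name, not its domain name.

```powershell
# Assumed values: myurl.com is the load-balanced DNS name of the OWA farm,
# "OWA Cert" is the Friendly Name shown for the certificate in the Certificates MMC.
New-OfficeWebAppsFarm -InternalUrl "https://myurl.com" `
                      -ExternalUrl "https://myurl.com" `
                      -CertificateName "OWA Cert" `
                      -EditingEnabled
```

Leave off -EditingEnabled if your licensing only covers viewing.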

Everything seemed pretty great…

Confirming it works

Except when you’re done, you’re supposed to check that it works by visiting this URL:

https://myurl.com/hosting/discovery

This is supposed to display a nice little snippet of XML – one that looks familiar if you’ve ever pulled up a web service in a browser.
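If you would rather check from a prompt than a browser, something like the sketch below should do it. It assumes PowerShell 3.0 or later (for Invoke-WebRequest) and that the machine you run it on trusts the farm’s certificate; myurl.com is the same placeholder as above.

```powershell
# Fetch the WOPI discovery document and parse it as XML.
$response  = Invoke-WebRequest -Uri "https://myurl.com/hosting/discovery" -UseBasicParsing
$discovery = [xml]$response.Content

# A healthy farm returns a <wopi-discovery> document; list the zones it advertises.
$discovery.'wopi-discovery'.'net-zone' | Select-Object name
```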

Instead I got an error.

HTTP Error 500.21 – Internal Server Error
Handler “DiscoveryService” has a bad module “ManagedPipeLineHandler” in its module list

It looked like ASP.NET wasn’t registered.

I searched the error, and found this article: http://consulting.risualblogs.com/blog/2013/04/03/office-webapps-deployment-error/

Bottom line, I ran the aspnet_regiis.exe -ir command, and that re-bound ASP.NET to IIS.
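For reference, a sketch of that step from an elevated prompt. The framework path below is the standard 64-bit .NET 4.x location and is my assumption about the box; on Windows Server 2012 and later, ASP.NET is registered through Windows features rather than this tool.

```powershell
# Re-register ASP.NET 4.x with IIS without changing existing app pool mappings,
# then restart IIS so the handlers are picked up.
& "$env:windir\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe" -ir
iisreset
```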

So far so good, the page was coming up now on the first node.

Adding a second OWA server

Now time for the second node.

To add a 2nd-99th OWA node, the command is a little different:
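The listing is missing here too. Joining an existing farm is done with New-OfficeWebAppsMachine, so a minimal sketch would look like this (the server name is a hypothetical placeholder):

```powershell
# Run on the NEW server. owa01.mydomain.local is a made-up name for a
# machine that is already a member of the farm.
New-OfficeWebAppsMachine -MachineToJoin "owa01.mydomain.local"
```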

Here, I ran into one more issue – kinda simple but an “of course!” kinda moment.

When I ran the command the first time, it couldn’t find the OWA server to connect to.

Care to guess why?

Because these are all behind a load balancer, I had host entries on both OWA servers that pointed back to themselves. This is common so that you can log on to one server and confirm THAT server is working.  Well, I got ahead of myself here, and the new-officewebappsmachine command was being redirected back to itself.

Not-going-to-work

Easy fix, I pointed the host entry to the other server long enough to do the command, then set it back so I could test it locally.  I’m glad I did that, because the second node had the same problem as the first, something I might not have found without the host entry.
I ran that aspnet_regiis.exe -ir command a second time, and did an IIS reset and things were looking good on my second OWA node.
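A quick way to see whether a host entry is getting in the way before you run the join is to look at what the hosts file says about the farm name. This is just a sketch; myurl.com stands in for your own OWA DNS name.

```powershell
# Show any active (non-comment) hosts entries that mention the OWA farm name.
$hostsFile = "$env:windir\System32\drivers\etc\hosts"
Get-Content $hostsFile |
    Where-Object { $_ -match 'myurl\.com' -and $_ -notmatch '^\s*#' }
```

If this returns a line pointing the farm name at the local box, the join will talk to itself instead of the existing node.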

Oh and one more goofy thing I noticed…

Configuring SP to talk to OWA

No OWA build is complete until it’s registered with SharePoint 2013 so SP knows where to send the traffic.

To do this, on the SharePoint server we run this command:

Now what’s odd here, is it rejected my server name multiple times.

It did not like:

  • “Https://myurl.com/”
  • https://myurl.com
  • “Myurl.com” (I don’t know if I actually tried this one)

What worked was just using myurl.com – no quotes, no https://
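As a sketch, the form that matches that description looks like this, run from the SharePoint 2013 Management Shell (myurl.com is again a placeholder for the OWA farm’s DNS name):

```powershell
# -ServerName takes the bare DNS name of the OWA farm: no scheme, no trailing slash.
New-SPWOPIBinding -ServerName myurl.com
```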

As soon as I finished the stuff above, I thought I was ‘done’. After all, the commands worked, so OWA must work now, right?

Kinda.

Don’t test OWA with the System account

I ran into a few more issues when I tried to test this. Remember that bit above, where I said on each OWA server, I had put in a host entry so that I could log on to the server, and test just that server? Well, I was logging on to the server as the SharePoint “System Account” – and guess what? That doesn’t work.  Now you might expect a nice error that says something like “You can’t use this as the system account” – Nope. Just a nice correlation ID to go hunt down.  And in the ULS logs what do we see? Well, nothing that indicates that I can’t use the admin account – I dug up the solution by searching the web.
So I tested with a different account which solved one problem, but it still wasn’t working…

Make sure the individual WOPI bindings match the overall zone

You configure the connection between SharePoint and OWA on the SP box by using commands like this:
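The command didn’t make it into this copy. The cmdlet in question is Set-SPWOPIZone, so a minimal sketch, assuming the “external-https” zone discussed next, would be:

```powershell
# Run in the SharePoint 2013 Management Shell.
# Valid zones are internal-http, internal-https, external-http and external-https.
Set-SPWOPIZone -Zone "external-https"
```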

Now, you might ask, how did I choose the zone?  There are 4 choices, and somewhere I read that if this site is going to be available internally and externally, use “External-HTTPS”

Not so fast..

As it turns out there’s another command that’s relevant here:

This will list out a bunch of individual bindings for each file type.
Guess what those bindings said? Internal-HTTPS.
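Rather than eyeballing a couple of hundred bindings, you can summarize them. A sketch, assuming the bindings expose their zone as the WopiZone property (that is how it shows up in the default output):

```powershell
# Count bindings per zone, then show the farm-wide zone for comparison.
Get-SPWOPIBinding | Group-Object WopiZone | Select-Object Name, Count
Get-SPWOPIZone
```

If the zone names in the two outputs differ, you have the mismatch described here.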

Another search on the internet and yeah, these should have been the same as the overall zone set by the Set-SPWOPIZone command I used above.

Wanting a “quick win” it seemed faster to change the overall zone by reissuing the set-SPWOPIZone command than to figure out how to update 185 bindings. I ran this:
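Going by the description, that would have been a call like the one below (a sketch; the zone name is taken from what the bindings reported):

```powershell
# Switch the farm-wide WOPI zone to match the individual bindings, then confirm it.
Set-SPWOPIZone -Zone "internal-https"
Get-SPWOPIZone
```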

And guess what??

OWA works now.

Ok, so that sums up my first few installs of Office Web Apps 2013 and tying them to SharePoint 2013 in a multi-server OWA farm.

That’s a lot of material, let’s see if I can sum this up with a nice bulleted list:

  • Use a real DNS name for your OWA box, not the machine name.
  • It’s possible to remove an OWA farm config, even though there is no “Remove-OfficeWebAppsFarm” command.
  • Don’t put host entries on your OWA boxes until after you’re done with the install.
  • When configuring the WOPI bindings on the SharePoint farm, don’t use https:// in the DNS name.
  • Make sure the zone listed for each binding matches the zone set for the overall connection.
  • When you test OWA, don’t use the System account.

– Jack

User leaves the company and the user’s mysite is no longer accessible

This has happened to me a few times….

A user leaves and his or her AD account is deleted.

SharePoint catches wind, marks the mysite as abandoned, emails the manager it had in the User Profile Service, then deletes the user profile.

Then a day later someone emails the SharePoint Admin (me) and says they aren’t actually that person’s manager.

The link in the email to the mysite doesn’t work and it’s easy to think that it just auto deleted.

Checking the site with powershell (get-spsite http://mysiteurl/person/userid) shows the site collection still exists.

I found the answer at Demant Prasad’s blog.

To see the contents of the mysite, you need to be one of the two site collection admins as assigned in Central admin, but you can’t use central admin, you have to use

stsadm -o siteowner -url http://mysite/person/userid -secondarylogin domain\userid

The original blog post does a great job of explaining it, so head over and have a read.

http://demantprasad.wordpress.com/2013/05/03/my-site-cleanup-job-and-user-not-found-error-message-in-sharepoint-2010/

Troubleshooting a 404 error related to moving a content database

I’ve had an odd issue come up a few times now….

For one reason or another, a content database needed to be detached and then re-attached to the SharePoint farm.

The most recent need for this was so a specific content DB could be moved to a different disk on the SQL server.

In theory, the operation is pretty easy:
  • Unmount the content DB – I did this from Central Administration, Manage content databases.
  • Have the DBA move the DB to a new disk.
  • Mount the content DB – this is where trouble started….

Here’s what happened:

The Site collections in that DB were available on one WFE, but not the other one.

Hmm, now for the troubleshooting….

Using PowerShell, I used the command below on each WFE:
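The listing is missing from this copy; later in the post the command is named as get-spcontentdatabase. A minimal sketch of that check, with a made-up web application URL:

```powershell
# Run in the SharePoint Management Shell on each WFE in turn.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# http://mywebapp is a placeholder for the affected web application.
Get-SPContentDatabase -WebApplication "http://mywebapp" |
    Select-Object Name, Server, CurrentSiteCount
```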

Oddly, on the working WFE, my “moved” content database showed  up in the list.

On the non-working WFE it was missing.

The fix:

I tried unmounting and remounting from Central admin – no luck. (though one time the “working” WFE flipped to the other WFE!)

Next I thought I would get clever and use powershell on the broken WFE to mount it there – this didn’t work- I got an error that said the Content DB was already attached.

I also tried things like restarting IIS, Restarting the SharePoint Timer Service and Restarting the SharePoint Admin Service – no luck.

I also tried clearing the SharePoint Timer Cache – no luck either!

So Finally I unmounted it from the farm (again from CA)

Then I ran the command below on ALL nodes (WFE and APP) to confirm that it wasn’t in the farm anywhere:

Finally I ran the mount-spcontentdatabase command from the Central admin box:
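Neither listing survived, so here is a sketch of both steps. The database name, SQL server and web application URL are placeholders I invented for illustration.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Step 1 (run on EVERY node): this should return nothing once the DB is unmounted.
Get-SPContentDatabase | Where-Object { $_.Name -eq "WSS_Content_Moved" }

# Step 2 (run once, from the Central Admin box): attach the database again.
Mount-SPContentDatabase -Name "WSS_Content_Moved" `
                        -DatabaseServer "SQLSERVER01" `
                        -WebApplication "http://mywebapp"
```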

Note that I only had to run this once – I confirmed the Content DB was visible on all 5 boxes by re-running the get-spcontentdatabase command on each of them and all was good!

As a side note, I was lucky – because we moved the content database, I knew what site collection to validate. Also, as a habit, I log into each WFE to validate – had I used the public URL and gone through the load balancer, I would have had a 50/50 chance of seeing the problem.

To make matters worse, we have 18 content Databases in this web application. The Content database we moved was NOT the root site collection! Meaning uncaught issues would have gone unnoticed for a long time. And when the issues were discovered, they’d only apply to a single site collection and only then on a specific WFE would there be a problem. These kinds of situations are nearly impossible to figure out down the road.

So if you find yourself detaching & re-attaching content databases, be sure to test the site collections in those databases on each WFE, it could save you hours down the road.

(SharePoint 2010 SP1)

Troubleshooting Bad AD accounts and people picker issues in Central admin while granting Site collection admin rights

I had a strange problem today – While trying to set a site collection admin via central administration on SharePoint 2010, The existing user ID’s were both underlined with a red squiggly line, and trying to pick new names in people picker resulted in no results.


At first I suspected the CA box had maybe been dropped from the domain or some other AD related issue.

But further investigation showed that other web apps on the same farm were fine.

Additionally, From the web app itself, I was able to select users in the people picker and they worked just fine, So the problem was only with the people picker in central admin, and only for this one specific web application.

I asked around on the SPYam forum on Yammer, and also at SharePoint-Community.net.

I got some great dialog going within a few minutes –

Adam Larkin asked if I had been using IE 10 – He had seen issues caused by IE – I was using IE 10 but this didn’t turn out to be the issue in this case.

Jasit Chopra suggested that I check the authentication mechanism, but this was set to NTLM, and was the same between the working and non-working site.

Paul Choquette suggested I compare web.configs – another great suggestion. It didn’t turn up anything, however.

Vlad (one of the founders of SharePoint-Community.net) also chimed in.

So far so good, No solution yet, but narrowed it down quite a bit. I always appreciate any help troubleshooting – sometimes just talking about things leads to a resolution.

Over on Yammer I had asked the question as well, and Trevor Seward had the suggestion to check the people picker settings. Combining Trevor’s suggestion with Paul’s, I compared PeoplePicker settings between a working web app and the non-working web app and found that the “broken” one had something in the ActiveDirectoryCustomFilter.

Specifically, here’s the powershell I used:
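The commands themselves are missing from this copy. A minimal sketch of the comparison, with placeholder URLs for the two web applications; the last two lines are my assumption about the fix (clearing the filter) and should only be run if the custom filter really isn’t wanted.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder URLs: one web app where people picker works, one where it doesn't.
$working = Get-SPWebApplication "http://workingwebapp"
$broken  = Get-SPWebApplication "http://brokenwebapp"

# Dump both sets of settings and compare them side by side.
$working.PeoplePickerSettings
$broken.PeoplePickerSettings

# Assumed fix: clear the custom LDAP filter on the broken web app and save.
$broken.PeoplePickerSettings.ActiveDirectoryCustomFilter = $null
$broken.Update()
```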

What’s really amazing is that Trevor’s reply on Yammer came from his phone, including the above PowerShell commands!

I think today there were several lessons learned:

1) I know a little more about the people picker settings

2) Reaching out to the community for help can be both engaging and rewarding.

Special thanks to everyone listed above for their help today!

Update: Trevor has a nice article on troubleshooting PeoplePicker related issues at http://thesharepointfarm.com/2014/01/people-picker-troubleshooting-tips/

ShareGate Choice List workaround Script

Update 3/6/2014

This script is no longer needed! ShareGate 4.5 was released and now copies these items natively- no workaround needed!

Hats off to the Share-Gate development team for this update!

The content below should be considered Archive/reference:

If you’re a user of Share-Gate, you may have run into an issue transferring SharePoint lists with Multiple choice fields.

The migrations work, so long as all the data in your list fits into the list of choices you have. But what if at one point the list of choices changed, and you now have data in your list that isn’t in the choice list?

The scenario is like this.
Say you have a choice field with “region”
It started out with North, South, East and West.
So you have data in the list that has those values in it.

Then someone says, “Hey, let’s restructure – going forward we want people to choose from: East, Central and West”

See what happened there?

The data that’s already in there could be North South East or West, but NEW data can only be West Central or East. This causes issues when the list is migrated.

The solution is to temporarily allow “Fill-in Choices:”

Field choices

 

Ideally, you would change the value “Allow ‘Fill-in’ choices:” above from “no” to “yes” before the migration, then set it back to no after the migration at the new location.

The trouble is, a given web can have quite a few lists, and each list can have quite a few fields. Going through all of this manually, twice, is not fun. What is fun, you ask? I’m glad you asked – running two PowerShell scripts…

I wrote this little PowerShell script…

Here’s what it does –

It enumerates through all the lists in your SP Web.
Then it checks each list to see if it has any “Choice Fields” that are set to not allow fill-in choices.
Then it sets the value to yes for “Allow ‘Fill-in’ choices”.

How much would you pay for a timesaving script like this???

But wait! There’s more!

While it’s doing all this, it’s also generating a 2nd PowerShell script – filled with commands to turn these values back off.
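The script listing did not survive in this copy of the post. What follows is a reconstruction sketch of the behavior described, not the original script. The URL and output path are made up, and the generated undo script looks lists up by title, which assumes the titles are the same at the destination.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$url      = "http://sourcefarm/sites/example"   # hypothetical source web
$undoFile = "C:\Temp\Undo-FillInChoices.ps1"     # hypothetical output path

# Two handles to the same web: one to enumerate, one to update,
# so calling Update() does not disturb the collection being walked.
$webRead  = Get-SPWeb $url
$webWrite = Get-SPWeb $url

# Start the generated undo script. Edit the URL in it before running at the destination.
Set-Content $undoFile 'Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue'
Add-Content $undoFile ('$web = Get-SPWeb "{0}"' -f $url)

foreach ($list in $webRead.Lists) {
    foreach ($field in $list.Fields) {
        $isChoice = ($field.Type -eq "Choice") -or ($field.Type -eq "MultiChoice")
        if ($isChoice -and -not $field.FillInChoice) {
            # Turn fill-in choices ON through the second web object.
            $target = $webWrite.Lists[$list.ID].Fields[$field.Id]
            $target.FillInChoice = $true
            $target.Update()

            # Record the matching commands to turn it back OFF later.
            $listTitle = $list.Title.Replace("'", "''")
            Add-Content $undoFile ('$f = $web.Lists[''{0}''].Fields.GetFieldByInternalName(''{1}'')' -f $listTitle, $field.InternalName)
            Add-Content $undoFile '$f.FillInChoice = $false; $f.Update()'
        }
    }
}

$webRead.Dispose()
$webWrite.Dispose()
```

Try it on a test web first; sealed or read-only fields may refuse the update and would need to be skipped.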

Note in the above script – I needed two near identical $web objects – the reason for this is that as I went through the list in code, and did my .update() statements, it would invalidate the state of the $web object

Another upside is that the source and destination can be on different farms – just move the generated PS1 file to the destination and run it there.

If there’s a downside, it’s that this really only works for SP 2010 and SP 2013 On premise.

You can’t see a URL on your Web Front End (WFE) and neither can your Search crawler

There is a security feature in Windows that prevents certain network operations.

This will usually be a problem in two scenarios:

– You have a web front end, and on it is a website named www.mywebsite.com, if you try to access that site from ANY other computer, it works just fine, but if you try to access it from the Web Front End that’s hosting mywebsite.com, it doesn’t work!

– You have a web front end, and are trying to do search crawls of your site from the same box hosting the site. Web Crawls fail and are unable to get to the site.

There’s an easy explanation and fix for this – It’s been documented elsewhere so here are the links:

Here’s the Microsoft KB article:
http://support.microsoft.com/kb/896861

Bob Fox, MSFT PFE has an even better way of turning this off, only for specific domains:
http://blogs.technet.com/b/sharepoint_foxhole/archive/2010/06/21/disableloopbackcheck-lets-do-it-the-right-way.aspx
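For a feel of what the per-host-name approach looks like, here is a sketch based on the BackConnectionHostNames registry value described in the KB article. The host name is the example from above; check the linked articles before using it on a real server, and expect to need a restart of IIS or the box afterwards.

```powershell
# Run from an elevated prompt on the web front end.
$key = "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0"

# Multi-string list of host names that are allowed to loop back to this machine.
# New-ItemProperty fails if the value already exists; in that case, edit the
# existing list instead of overwriting it.
New-ItemProperty -Path $key -Name "BackConnectionHostNames" `
                 -PropertyType MultiString -Value @("www.mywebsite.com")
```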

 

 

CRL, Loopback and things that can slow down code, and cause problems on servers with no internet access

I’ve run into a few issues in the past year or two that revolved around SSL certificates.

SSL certs usually get installed on your Web Front Ends.

As part of the whole grand scheme of how SSL works, there is something called a CRL (Certificate Revocation List) – a list of “bad” certificates that had to be “revoked”.

If a hacker steals a certificate for your bank, installs it on his server, and then directs traffic to his server, your browser, all by itself, would not know the difference. This is where the CRL comes in: if the bank knows the certificate has been compromised, they can get it “revoked”, which puts it on the CRL, a sort of “blacklist” for bad certificates.

When the user at home opens his or her browser, part of the SSL authentication mechanism is that the browser checks the CRL before allowing the user to proceed.

This usually works just fine…

Unfortunately, with just the ‘right’ configuration, this can throw a nasty delay into network communications.

Here’s a scenario:

You have an SSL website.

Client PC’s connect to this SSL website from everywhere and everyone is happy.

A .Net developer writes some code to access content on the above server. Note here this is code running on one server (doesn’t have to be a web server) connecting to your SSL protected website on a different server.

What happens now is that the developer’s code “stalls” for about a minute as the server tries to access the CRL to see if the certificate presented by your SSL website has been revoked.

It stalls because the server the developer is running code on doesn’t have access to the internet (It can access your SSL website, because they are on the same internal network)

This can be an issue.

Ideally you’d fix the connectivity problem, but there may be times that’s out of your control (a demo laptop for example)

The next ideal fix would be a system wide fix like a global registry setting. This doesn’t exist.

Each app gets to decide how to deal with checking CRLs.

This article lists 14 different infrastructure pieces and how each is configured to disable CRL checking http://social.technet.microsoft.com/wiki/contents/articles/964.certificate-revocation-list-crl-verification-an-application-choice.aspx

One personal observation I’ve made is that Google Chrome doesn’t seem to care about the CRL, so if you think CRL checking is a problem on your server, and you can install Chrome, you could compare how long IE takes to bring up your web page from scratch vs. Chrome.

I’ve used the script below to turn off CRL checking for .net code:
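The script itself is not in this copy of the post, so I can’t show the original. One widely used approach is the per-user WinTrust “State” registry value, which is the same setting as the “Check for publisher’s certificate revocation” checkbox in Internet Options. The sketch below assumes that is the check slowing things down, and it only affects the account it is run under, so it would need to run as the account the code runs as.

```powershell
# 0x23e00 = publisher certificate revocation check OFF
# 0x23c00 = publisher certificate revocation check ON (the default)
$key = "HKCU:\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing"
Set-ItemProperty -Path $key -Name "State" -Value 0x23e00 -Type DWord
```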

– Jack 

ShareGate Can’t open a website – IN-017-006 The requested list or document library cannot be found. It may have been deleted.

I’ve run into this error a few times in ShareGate-

You try to open a Site Collection or web, and see this:

IN-017-006

SharePoint Designer is not much help either:

SPD

The solution, after much trial and error, was actually pretty simple!

But before I get to the solution, let’s look at some things I tried that led me to the solution…

I searched the term “there are no items to show in this view” for SharePoint Designer on the internet.  This turned up a few possibilities – one being a corrupt /_vti_bin/Listdata.svc.

That wasn’t the case for me- some webs from the same site collection worked just fine – it was just one specific web that did not.

The other thing that turned up was the possibility of a corrupt list.  Often due to a missing feature needed by the list.

I turned to PowerShell and ran the following commands:
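The commands are missing from this copy. A minimal sketch of that kind of check, with a made-up URL for the problem web:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholder URL for the web that will not open.
$web = Get-SPWeb "http://myserver/sites/example/subweb"

# Enumerate every list in the web; a corrupt list tends to blow up partway through.
$web.Lists | Select-Object Title, ItemCount
```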

This started to return results, then errored out –

I was definitely on to something – there was something wrong with one of my lists.

Next I needed to figure out which one.

There are a few ways to do this – the easiest would likely have been to go to the website and choose “Site Actions”->”View all site content” and then just click on each one.

I didn’t do that.

Instead I mucked with PowerShell some more, I was able to tell from the error when I ran my above code which list was causing the problem.

My next step was to see if there was any data in the list.

Based on “Site Actions”->”View all site content” , it had an item count of zero, but I wasn’t sure I could trust this given the list was corrupt.

So I turned to SQL – I knew which Database it was in. (Get-SPContentDatabase -site $url will tell you this)

This listed a few results,  but the results looked like what is usually found in the Forms folder – I didn’t for example, see anything at all that looked like end user data  or documentation.

Armed with this re-assurance that the list was not needed, it seemed the easiest way forward was just to delete the list – this would be easy – or so I thought….

I didn’t have any hopes of deleting the list through the UI, (though to be clear, it did show up under “Site Actions”->”View all site content” and clicking on that threw an error)

So I again turned to PowerShell:

I saw a delete method, so I called it without the ( ) to see what it was looking for, It needed a GUID.

I ran the command shown below to get a list of ID’s from my site – in my case, the second list was the corrupt one.
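The listing is missing here as well. As a sketch, the steps described map onto something like this (the URL and the GUID are placeholders):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$web = Get-SPWeb "http://myserver/sites/example/subweb"   # placeholder URL

# Calling a method without parentheses prints its overloads: Delete wants a Guid.
$web.Lists.Delete

# List the title and ID of every list in the web.
$web.Lists | Select-Object Title, ID

# Delete by ID; substitute the ID of the corrupt list for the all-zero placeholder.
$web.Lists.Delete([guid]"00000000-0000-0000-0000-000000000000")
```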

Then I put that GUID in the delete function, and got this error:

powershell_delete_list

“List does not exists”

So that didn’t work.

Next I searched the internet for more information –

I found a post that mentioned the recycle bin. Ah Ha!

I looked at “Site Actions”->”View all site content” and then the recycle bin, which of course was empty.

So I went to Site Actions -> Site Settings -> then went to “Go To Top Level Site Settings” -> then  Site Collection Administration(heading) ->Recycle Bin.

After sorting by URL, I found the corrupt list there.

First I tried restoring the list, it threw an error.

Then I tried to delete the list – that worked, and put the item in the Second stage recycle bin.

Next I went to the 2nd Stage “Deleted from end user Recycle Bin” area and deleted it from there.

Back in Site actions -> “View all site content” the list no longer showed up.

I relaunched SharePoint Designer and I am again able to bring up the list of lists and libraries.

I again tried a migration in ShareGate and it’s purring along like a kitten.

So to make my long story short, the error in ShareGate was caused by a corrupt/broken list – this same error affected SharePoint Designer (they both presumably use the same web service interface to get information from SharePoint) and to a degree, it even affected looking at the lists in PowerShell.

I suspect there is a bug somewhere, but I suppose it’s possible the user hit delete at the exact instant the application pool restarted or something like that.

At the time this happened we were running  SP2010 SP1 + June 2011 CU – if you run across a similar situation, please leave a comment with your version of SharePoint, hopefully this is addressed in SP2!

– Jack