Wednesday, 19 May 2010

The search service instance on this server is not online

I came across this tickler today:

Problem:
1) I configured the Office Server Search Service (index role) on an application server without incident
2) I ran the following command on both WFE servers:
stsadm -o osearch -action start -f -role query -propagationlocation D:\Index
3) This returned the error:
The search service instance on this server is not online
4) Neither the Windows event logs nor the ULS logs showed anything useful. Based on various blog posts, I did the following on WFE1 (not necessarily in this order):
a. Added the -farmcontactemail, -farmperformancelevel, -farmserviceaccount and -farmservicepassword switches in various combinations (although I knew none of these should be necessary, as all this information had been populated when I created the index)
b. Added the search service account to the local administrators group (even though I knew this should not be necessary, I found a couple of bogus references to this in blogs)
c. Changed the search service account to the farm admin account
d. Created the SSP (it hadn’t been created before my first attempts)
e. Checked that I could access the search admin web service at
https://[indexserver]:56738/[sspname]/Search/SearchAdmin.asmx
f. Started the SPSearch service (previously stopped, not that this should have any impact)
g. Deleted D:\Index, which must have been created by one of the previous failed attempts
h. Tried starting the query service via Central Admin; no errors were displayed, but the service did not start
i. Rebooted WFE1
5) I eventually found that I could start the query service on WFE2 by omitting the -propagationlocation switch. At least, I assume that omitting the switch is what made the difference, but it may have been one of the changes above (creating the SSP, perhaps).
6) I tried running the command without -propagationlocation on WFE1, and this returned a new error:
Method failed with unexpected error code 3
7) After a few more attempts I removed WFE1 from the farm and rejoined it. Even this did not resolve the error. I did however notice that during the rejoin (which was scripted), the command psconfig -cmd secureresources produced an error because D:\Index did not exist. Now at this point WFE1 should have been a fresh server in the farm and I had not tried to start the search service, so why was it trying to secure the index location?

Solution:
The solution was to rebuild an empty shell of the index file location directory structure (D:\Index\Office Server\Config) in Windows Explorer on WFE1, and re-run psconfig -cmd secureresources. Once this was done the query service could be started and hey presto!
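
If you would rather script this fix than click through Windows Explorer, here is a minimal Python sketch of the same two steps. It is my own illustration, not something SharePoint provides: it assumes Python is available on the WFE and that psconfig.exe is on the PATH (it normally lives in the 12\BIN folder under Web Server Extensions). The index root is a parameter, so pass whatever index location your farm actually uses.

```python
from __future__ import annotations

from pathlib import Path


def rebuild_index_shell(index_root: Path) -> Path:
    """Recreate the empty Office Server/Config folder skeleton under index_root.

    parents=True creates any missing intermediate folders; exist_ok=True means
    re-running this against an existing structure is harmless.
    """
    config_dir = index_root / "Office Server" / "Config"
    config_dir.mkdir(parents=True, exist_ok=True)
    return config_dir


def secure_resources_command() -> list[str]:
    """Argument list for 'psconfig -cmd secureresources' (psconfig assumed on PATH)."""
    return ["psconfig", "-cmd", "secureresources"]
```

On WFE1 this would be used as `rebuild_index_shell(Path(r"D:\Index"))` followed by `subprocess.run(secure_resources_command(), check=True)`, after which you can try starting the query service again. Passing check=True makes a failing psconfig raise an exception rather than fail silently.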

I don’t know why the original problem occurred. Most likely the very first attempt failed and left the service in a state that was neither stopped nor started. It may be that I should not have deleted D:\Index during my troubleshooting, but I found a blog post recommending just that!

In any case, a new server joined to the farm should not be looking for the index file location unless and until the search service is started, so that part feels like a bug to me.

Monday, 17 May 2010

Service stuck in Starting/Stopping state

I've seen this scenario so many times: you go to start or stop a service in Central Administration (Operations tab > Services on Server) and you wait... and you wait... but the service just sits there saying Starting or Stopping. From Central Admin there appears to be no way to resolve this, as you can't re-issue the start/stop command through the UI.

This is where STSADM comes to the rescue, with the very handy ProvisionService operation. Details are on TechNet: http://technet.microsoft.com/en-us/library/cc262031(office.12).aspx

This command lets you re-issue the start/stop command, and in most cases it will fix the issue for you. It's also very handy for scripting a farm build. The syntax is:

stsadm -o provisionservice -action [start/stop] -servicename [name] -servicetype [type]
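
If you are scripting a farm build, a small wrapper can assemble this command line for you. The following is a minimal Python sketch of my own (not part of stsadm); it assumes stsadm.exe is on the PATH, and it applies the rule described below: -servicename is only passed when the service actually has a name.

```python
from __future__ import annotations

from typing import Optional


def provision_service_args(action: str, service_type: str,
                           service_name: Optional[str] = None) -> list[str]:
    """Build the argument list for 'stsadm -o provisionservice'.

    -servicename is added only when a name is supplied; for services whose
    ServiceName is null, just -servicetype is passed.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    args = ["stsadm", "-o", "provisionservice", "-action", action]
    if service_name:
        args += ["-servicename", service_name]
    args += ["-servicetype", service_type]
    return args
```

You would then run it with something like `subprocess.run(provision_service_args("start", service_type), check=True)`, where service_type (and the optional service_name) are values copied from the enumservices output; the variable names here are placeholders, not real service types.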

The values you're expected to enter for servicename and servicetype are not immediately obvious. If you run stsadm -o enumservices you can see all the details, but they're not exactly readable, so I've laid them out in the table below. Where the ServiceName column is null, you don't need to include -servicename and can just use -servicetype. Where both ServiceName and ServiceType have values, you must include both.