Big Bad Blip

I was at lunch last week when I saw pages about failed monitoring checks on one of our sites. My coworkers were working on CE/Vista SP6 upgrades, but this site was one that had been upgraded the day before. When I returned to the office, I asked about it. Exactly 24 hours, to the second, after the license was checked during the previous day's final startup, the JMS node failed a license check four times, about a minute apart. On the fourth failure, it started shutting the node down. The other nodes in the cluster did the same.

Fortunately, a coworker caught it soon enough to start them again before enough nodes were down for the load balancer to stop sending us traffic. Also, this was between terms, so we did not have a normal workload.

Still, JMS had migrated. That made Weblogic edit config.xml and probably left the cluster in a weird state. So I set up a cron job to shut down the cluster at 4 am, copy a known-good config.xml into place, check the config with our monitor script (which pages if it is bad), and start the cluster. That was a disaster. Various nodes failed early in the startup: the admin node came up, but the JMS node failed to start. So I was paged about the cluster still being down when it ought to have been running.
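The 4 am job amounts to four steps run in order, with the config check gating the start. Here is a minimal Python sketch of that ordering, assuming hypothetical paths and wrapper scripts (`stop_cluster.sh`, `monitor_check.sh`, `start_cluster.sh`) in place of whatever the real cron job called; paging is stubbed out with a print.

```python
"""Hypothetical sketch of the 4 am cron job: stop, restore config, verify, start.

Every path and command below is an illustrative placeholder, not the
actual CE/Vista or Weblogic tooling.
"""
import shutil
import subprocess
from typing import Callable, Sequence

# Placeholder locations (assumptions); adjust for a real install.
KNOWN_GOOD_CONFIG = "/opt/backup/config.xml.known-good"
LIVE_CONFIG = "/opt/domain/config/config.xml"
STOP_CMD = ["/opt/scripts/stop_cluster.sh"]
CHECK_CMD = ["/opt/scripts/monitor_check.sh"]
START_CMD = ["/opt/scripts/start_cluster.sh"]


def run(cmd: Sequence[str]) -> bool:
    """Run a command; a nonzero exit or a missing command counts as failure."""
    try:
        return subprocess.run(list(cmd)).returncode == 0
    except OSError:
        return False


def nightly_restart(
    stop: Callable[[], bool],
    restore_config: Callable[[], object],
    check_config: Callable[[], bool],
    start: Callable[[], bool],
    page: Callable[[str], None],
) -> bool:
    """Run the steps in order; page and stop at the first failure."""
    if not stop():
        page("cluster shutdown failed")
        return False
    restore_config()
    if not check_config():
        # Leave the cluster down rather than start it on a bad config.
        page("config.xml failed the monitor check; cluster left down")
        return False
    if not start():
        page("cluster did not start")
        return False
    return True


if __name__ == "__main__":
    nightly_restart(
        stop=lambda: run(STOP_CMD),
        restore_config=lambda: shutil.copy2(KNOWN_GOOD_CONFIG, LIVE_CONFIG),
        check_config=lambda: run(CHECK_CMD),
        start=lambda: run(START_CMD),
        page=lambda msg: print("PAGE:", msg),
    )
```

A crontab entry such as `0 4 * * * /usr/bin/python3 /opt/scripts/nightly_restart.py` (path again hypothetical) would run it daily at 4 am. A config.xml check alone cannot catch a bad boot.properties; that problem only surfaces when the start step fails, which is when the page goes out.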

My 6:30 am start attempts failed for the same reason: a bad encrypted password in boot.properties. My only idea for fixing this came from a coworker who had mentioned having to reinstall an admin node after a security error. So I called her. I explained the problem and the solution I really did not want to take. She looked at the error and thought about it some. She suggested replacing boot.properties with an unencrypted version, because Weblogic would encrypt it when it found it. She also suggested removing the servers directory and placing a REFRESH file, which would prompt the node to download fresh copies of the files it needs from the admin node.
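The two recovery steps are simple file operations. Below is a minimal Python sketch, assuming the caller supplies all three paths: where the servers directory, the REFRESH marker, and boot.properties live in a CE/Vista install is not stated here, so `reset_node` and its parameters are hypothetical. The plaintext `username=`/`password=` lines follow the usual Weblogic boot.properties layout, which Weblogic encrypts the next time it reads the file.

```python
"""Hypothetical sketch of the recovery: reset a node's local state.

Assumption: all paths are supplied by the caller; actual CE/Vista
locations for the servers directory and REFRESH marker are not known here.
"""
import shutil
from pathlib import Path


def reset_node(servers_dir: Path, refresh_marker: Path,
               boot_properties: Path, username: str, password: str) -> None:
    # Remove the node's local servers directory so no stale files remain.
    if servers_dir.exists():
        shutil.rmtree(servers_dir)
    # Create an empty REFRESH marker; per the post, this prompts the node
    # to download fresh copies of its files from the admin node.
    refresh_marker.parent.mkdir(parents=True, exist_ok=True)
    refresh_marker.touch()
    # Write plaintext credentials last, in case the file lives under
    # servers_dir; Weblogic encrypts them when it next reads the file.
    boot_properties.parent.mkdir(parents=True, exist_ok=True)
    boot_properties.write_text(f"username={username}\npassword={password}\n")
```

Writing boot.properties after removing the servers directory keeps the new file from being wiped if it happens to sit inside that directory.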

That got the nodes to start correctly, and everything was fine during our normal maintenance on Friday. It looks like we are in the clear.

That afternoon I brought it up on our regular check-in call with Blackboard. An "unable to find license file" issue was why Blackboard pulled CE/Vista SP4, which also included a Weblogic upgrade.

from Rants, Raves, and Rhetoric v4
