Alert: ESXi 5.5 Update 3 bug – Deleting VM snapshot crashes VM

Removing a snapshot causes the VM to crash with “Unexpected signal: 11”. Most backup software relies on creating and removing snapshots, so backup jobs are likely to trigger this.

The bug has not been officially confirmed by VMware. There is no fix; the only solution is to roll back to ESXi 5.5 Update 2.
See the following post for more details:

https://communities.vmware.com/thread/520379

Update 1/10/2015 14:30 EST: VMware released kb http://kb.vmware.com/kb/2133118

Update 7/10/2015 15:30 EST: VMware released kb http://kb.vmware.com/kb/2133825

Posted in Random stuff

HP Proliant BL460c Gen9 HBA bug

There is a bug in HP FlexFabric 650FLB adapters on HP Gen9 blades. Buggy firmware prevents the HBA from properly negotiating FC protocols, and the HBA fails to initiate the PLOGI (Port Login) process. On the Brocade FC switch, the FC4 type will show as “none” and the switch will fail to detect the port as an initiator.
If you have a Brocade switch, you can confirm this with the portloginshow # command, where # is the number of the port the blade is connected to.

fctest:admin>portloginshow 1
Type PID World Wide Name credit df_sz cos
=====================================================
fd 01153b xx:xx:xx:xx:xx:c2:86:04 16 2112 c scr=0x3
fd 011537 xx:xx:xx:xx:xx:c2:86:14 16 2112 c scr=0x3
ff 01153b xx:xx:xx:xx:xx:c2:86:04 0 0 8 d_id=FFFFFC

The d_id=FFFFFC entry will be missing for a faulty HBA. In the example above, the HBA ending with c2:86:14 has the buggy firmware.
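If you need to check many ports, the test above is easy to script. Below is a minimal Python sketch, assuming the portloginshow output looks like the sample in this post (WWN in the third column, d_id in the last); the function name is hypothetical and real Fabric OS output may differ between versions. It lists every WWN that never shows a d_id=FFFFFC (name server) login.

```python
# Minimal sketch: flag WWNs in Brocade portloginshow output that have no
# d_id=FFFFFC entry. Assumes the column layout of the sample in this post.

SAMPLE = """\
Type PID World Wide Name credit df_sz cos
=====================================================
fd 01153b xx:xx:xx:xx:xx:c2:86:04 16 2112 c scr=0x3
fd 011537 xx:xx:xx:xx:xx:c2:86:14 16 2112 c scr=0x3
ff 01153b xx:xx:xx:xx:xx:c2:86:04 0 0 8 d_id=FFFFFC
"""


def wwns_missing_ns_login(output: str) -> list[str]:
    """Return WWNs (in first-seen order) that never log in to d_id=FFFFFC."""
    seen: list[str] = []
    logged_in: set[str] = set()
    for line in output.splitlines():
        fields = line.split()
        # Login rows carry a colon-separated 8-byte WWN in the third column;
        # this skips the header and the ===== separator line.
        if len(fields) < 3 or fields[2].count(":") != 7:
            continue
        wwn = fields[2]
        if wwn not in seen:
            seen.append(wwn)
        if fields[-1].upper() == "D_ID=FFFFFC":
            logged_in.add(wwn)
    return [wwn for wwn in seen if wwn not in logged_in]


if __name__ == "__main__":
    # For the sample above this reports the HBA ending in c2:86:14.
    print(wwns_missing_ns_login(SAMPLE))
```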

The issue is confirmed with firmware 10.5.65.21 (the latest available on the HP website).
Resolution: update the firmware to 10.5.65.23.

You can download the firmware below (it’s not available on the HP website yet):
OneConnect-Flash-10.5.65.23

Update: HP issued an advisory:

Environment
FACT:HP ProLiant BL460c Gen9 Server
FACT:HP FlexFabric 20Gb 2-port 650M Adapter
FACT:HP FlexFabric 20Gb 2-port 650FLB Adapter
Questions/Symptoms
SYMPTOM: Storage path will disappear after the Firmware upgrade to 10.5.65.21
SYMPTOM: Problem is seen with Virtual Connect Manager and OneView environments
SYMPTOM: Storage path may disappear after 650FLB Firmware upgrade to 10.5.65.21
SYMPTOM: Problem is seen with Virtual Connect FlexFabric 10/24 and 20/40 modules
 
Upgrading with latest firmware (10.5.65.21) on 650 FLB may cause the path to the storage to disappear. This issue may occur when using the latest SPP Version 2015.06.0.
Cause
CAUSE: This issue only occurs because of the 650FLB firmware version 10.5.65.21
Answer/Solution
FIX: This issue is currently under investigation
As a workaround, downgrade the firmware to 10.2.477.23 using Service Pack for ProLiant (SPP) Version 2015.04.0, or reduce the uplinks to 1 per Virtual Connect SAN fabric or OneView Fibre Channel uplink set.
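If you manage many blades, a small helper can sort reported firmware versions into the buckets mentioned in this post. This is a hedged Python sketch: the three version strings come from the post and HP’s advisory, while treating anything at or above 10.5.65.23 as fixed is my assumption, not something HP states.

```python
# Hedged sketch: classify 650FLB/650M firmware versions using the versions
# named in this post. Treating later releases as fixed is an assumption.

AFFECTED = "10.5.65.21"     # storage paths may disappear (HP advisory)
FIXED = "10.5.65.23"        # reported in this post as resolving the issue
WORKAROUND = "10.2.477.23"  # downgrade target from SPP 2015.04.0


def parse(version: str) -> tuple[int, ...]:
    """Split '10.5.65.21' into (10, 5, 65, 21) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def classify(version: str) -> str:
    v = parse(version)
    if v == parse(AFFECTED):
        return "affected"
    if v >= parse(FIXED):  # assumption: newer firmware keeps the fix
        return "fixed"
    if v == parse(WORKAROUND):
        return "workaround"
    return "unknown"
```

Comparing integer tuples rather than raw strings avoids ordering surprises such as "10.2.477.23" sorting after "10.10..." lexically.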
Posted in Random stuff

3PAR remote syslog

To view the current remote syslog configuration, run:
showsys -param
cli% showsys -param
System parameters from configured settings

------Parameter------ --Value--
RawSpaceAlertFC : 0
RawSpaceAlertNL : 0
RawSpaceAlertSSD : 0
RemoteSyslog : 0
RemoteSyslogHost : 0.0.0.0
SparingAlgorithm : Default
EventLogSize : 3M
VVRetentionTimeMax : 336 Hours
UpgradeNote :
PortFailoverEnabled : yes
AutoExportAfterReboot : yes
AllowR5OnNLDrives : no
AllowR0 : no
ThermalShutdown : yes
FailoverMatchedSet : no

To configure remote syslog, run the following, where 1.1.1.1 is the IP address of the remote syslog server:
setsys RemoteSyslogHost 1.1.1.1
setsys RemoteSyslog 1

And that’s it!
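To verify the change from a script, you can parse the showsys -param output. This is a hedged Python sketch that assumes the "Name : Value" layout shown above; the helper names are hypothetical and it does not talk to the array itself.

```python
# Hedged sketch: decide whether remote syslog is configured, given the text
# printed by `showsys -param`. Assumes the "Name : Value" layout shown above.

SAMPLE = """\
cli% showsys -param
System parameters from configured settings
RemoteSyslog : 0
RemoteSyslogHost : 0.0.0.0
EventLogSize : 3M
"""


def parse_params(output: str) -> dict[str, str]:
    """Map each parameter name to its value; lines without ' : ' are skipped."""
    params: dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            params[name.strip()] = value.strip()
    return params


def remote_syslog_enabled(params: dict[str, str]) -> bool:
    """True when RemoteSyslog is 1 and a real host address is set."""
    host = params.get("RemoteSyslogHost", "0.0.0.0")
    return params.get("RemoteSyslog") == "1" and host not in ("", "0.0.0.0")
```

After running the two setsys commands, feeding a fresh showsys -param output into these helpers should return True.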

Posted in 3par

3PAR real free space

Today, while browsing one of my favorite 3PAR-related websites (3parug.com), I came across a topic asking about “real” free space. I assume the poster wants to find out how much more actual data can fit before the array runs out of space.

Before we answer this question, let’s take a look at the different “layers” of free space.
1. Physical Drive space
Take, for example, a 900GB FC drive. In 3PAR MC it reports a Total Capacity of 819GB. A 900GB SSD, on the other hand, reports a Total Capacity of 852GB.
Note: I don’t have information (a formula) on how Total Capacity is derived from the capacity reported by drive manufacturers.

Now let’s look at what is used within Total Capacity. You can view it by issuing the showpd -space command:
[Screenshot: showpd -space output]
– Size – total size described above
– Volume – how much space is actually used by Volumes
– Spare – space used by spare chunklets
– Free – space available for Volumes
Now let’s look at MC:
[Screenshot: physical disk capacity in 3PAR MC]
Total Capacity = Size
Free Capacity = Free
Allocated Capacity = Volume + Spare
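The mapping above can be written down as a tiny function. This is a hedged Python sketch: the field names mirror the showpd -space columns listed in this post, and the numbers are hypothetical (in GB for readability; showpd may report a different unit).

```python
# Hedged sketch: derive the 3PAR MC capacity fields from showpd -space
# columns, using the mapping above. The numbers below are hypothetical.
from dataclasses import dataclass


@dataclass
class PdSpace:
    size: int    # Size: total capacity of the drive
    volume: int  # Volume: space used by volumes
    spare: int   # Spare: space held by spare chunklets
    free: int    # Free: space still available for volumes


def mc_view(pd: PdSpace) -> dict[str, int]:
    """Return the MC fields as defined in this post."""
    return {
        "Total Capacity": pd.size,
        "Free Capacity": pd.free,
        "Allocated Capacity": pd.volume + pd.spare,
    }


if __name__ == "__main__":
    # Hypothetical 900GB FC drive (819GB Total Capacity), values in GB.
    print(mc_view(PdSpace(size=819, volume=500, spare=64, free=255)))
```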

2. CPG space
To “use” the PD space described above, you need to assign the drive to a CPG. A CPG builds the underlying RAID from chunklets (1GB in size). For example, a CPG with 5 data + 1 parity will consume 6GB of Free Capacity on the physical drives for each 5GB of data.

In 3PAR MC you can view the remaining free space:
[Screenshot: CPG free space in 3PAR MC]

Estimated Free System Space should give a good indication of how much “real” free space (after RAID parity) remains on your 3PAR for a given CPG.
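The arithmetic behind the 5 + 1 example can be sketched in Python. This is a simplified model under stated assumptions (1GB chunklets, whole RAID sets only, no growth increments, spares or metadata), so treat it as a back-of-the-envelope estimate next to the figure MC reports.

```python
# Rough sketch of RAID overhead for a CPG built from 1GB chunklets.
# Simplification: ignores growth increments, spares and metadata, so the
# Estimated Free System Space reported by MC may differ.
import math

CHUNKLET_GB = 1  # chunklet size given in this post


def raw_needed(data_gb: int, data: int = 5, parity: int = 1) -> int:
    """Raw chunklet space (GB) consumed to store data_gb in data+parity sets."""
    sets = math.ceil(data_gb / (data * CHUNKLET_GB))
    return sets * (data + parity) * CHUNKLET_GB


def usable_from_free(free_gb: int, data: int = 5, parity: int = 1) -> int:
    """Data (GB) that fits into free_gb of raw space, counting only full sets."""
    full_sets = free_gb // ((data + parity) * CHUNKLET_GB)
    return full_sets * data * CHUNKLET_GB
```

For example, raw_needed(5) gives 6, matching the 5 + 1 case above, and usable_from_free(120) gives 100.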

Please remember that with Thin Volumes you can over-provision space, and 3PAR’s ASIC removes all “zeros” on the fly.

Posted in 3par

VMware Site Recovery Manager 5.8 bug: “Command on SRM server” with PowerShell

Hello, we have another VMware Site Recovery Manager (SRM) bug. This time it involves the “Command on SRM server” step and PowerShell scripts.

I am not an SRM developer, but it seems SRM parses commands itself before passing them to Windows for execution. Sometimes this causes issues.
Let’s take a look at this line:
c:\windows\system32\windowspowershell\v1.0\powershell.exe -Command "(Invoke-Command -ComputerName REMOTEPC -FilePath "C:\SRM\test1.ps1")"

In the example above we are executing a PowerShell script on a remote host (REMOTEPC). Everything looks standard, and it works if you run it directly in Windows.

The same script (test1.ps1) fails to execute when we call it via SRM. Let’s take a look at SRM’s vmware-dr log:
2014-12-01T09:24:20.348-05:00 [00884 info 'Recovery' ctxID=39eff996 opID=3913b17c] [recovery-plan-1036482.beforePrepareStorage-0] Executing command c:\windows\system32\windowspowershell\v1.0\powershell.exe -Command "(Invoke-Command -ComputerName REMOTEPC -FilePath "C:\SRM\test1.ps1")"
2014-12-01T09:24:20.348-05:00 [00884 verbose 'Recovery' ctxID=39eff996 opID=3913b17c] COMMAND LINE ENVIRONMENT SETTINGS::
<----cut---->
2014-12-01T09:24:20.348-05:00 [00884 verbose 'SysCommandWin32' ctxID=39eff996 opID=3913b17c] Starting process: "c:\windows\system32\windowspowershell\v1.0\powershell.exe" -Command "(Invoke-Command -ComputerName REMOTEPC -FilePath C:\SRM\test1.ps1\")"

As you can see, SRM mangles the double quotes (C:\SRM\test1.ps1\"), making the command invalid.

If we format this command slightly differently (removing the double quotes wrapping C:\SRM\test1.ps1):
c:\windows\system32\windowspowershell\v1.0\powershell.exe -Command "(Invoke-Command -ComputerName REMOTEPC -FilePath C:\SRM\test1.ps1)"
the script executes flawlessly both from SRM and natively in Windows.

Workaround
Remove spaces from the script’s file path (think back to the DOS days, haha) and remove the wrapping double quotes.
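If you generate SRM commands with a script, the workaround can be enforced automatically. A minimal Python sketch, assuming you build the command lines programmatically; the function name srm_command is hypothetical, and the rule it enforces (no spaces, no inner quotes) is just the workaround described here, not an official VMware requirement.

```python
# Hedged sketch of the workaround: build the SRM "Command on SRM server" line
# with no double quotes around the script path, refusing paths with spaces.
POWERSHELL = r"c:\windows\system32\windowspowershell\v1.0\powershell.exe"


def srm_command(computer: str, script_path: str) -> str:
    """Return a command line in the format that worked from SRM in this post."""
    if " " in script_path:
        # A path with spaces would need quotes, which SRM 5.8 mangles.
        raise ValueError(f"rename the path to remove spaces: {script_path!r}")
    if '"' in script_path:
        raise ValueError("drop the double quotes around the script path")
    return (
        f'{POWERSHELL} -Command '
        f'"(Invoke-Command -ComputerName {computer} -FilePath {script_path})"'
    )
```

For REMOTEPC and C:\SRM\test1.ps1 this produces exactly the working command line shown earlier in this post.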

Conclusion
My ticket is still open with VMware, and the engineering team is currently investigating. You will be affected by this bug if your script’s file path contains spaces, because such a path has to be wrapped in quotes.

Update: 14/05/2015
VMware published internal KB 2116057

Posted in VMware

VMware Site Recovery Manager 5.8 Bug – Linked Mode

There is a “well known” bug in VMware Site Recovery Manager 5.8 (SRM) that puts your DR plan at risk. It only affects you if your vCenters are connected in Linked Mode. Let me put it this way: when you have a site disaster, you will not meet your RTO.

Luckily for us we caught this bug during our latest DR testing.

If you have two vCenters in Linked Mode and would like to confirm this bug, bring down the vCenter at your production site, log into the vCenter at the DR site and try to run a recovery. You will see this:
[Screenshot: SRM 5.8 bug]

Additionally, you will see the following error in the SRM log:
2014-11-29T10:01:05.750-05:00 [03060 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to fqdn-prodvcenter:80>; cnx: (null), error: class Vmacore::Http::HttpException(HTTP error response: Service Unavailable)
Workaround
A VMware engineer confirmed this bug and said they currently don’t have a fix. Removing Linked Mode between the vCenters is the workaround:

1. On the recovery site vCenter Server, point to Start -> All Programs -> vCenter Server Linked Mode Configuration.
2. Click Next, select Modify Linked Mode configuration and click Next.
3. Ensure that the checkbox Isolate this vCenter Server instance from Linked Mode group is selected and click Next.
4. Click Continue to isolate the vCenter Server.
5. When the wizard has completed, check that the Site Recovery Manager service is still running and start it if necessary.

Conclusion
It seems VMware is under a lot of pressure from Microsoft to shorten the release cycle for its products. I can’t believe the QA team missed such a huge bug.

Update: VMware published a KB:
kb.vmware.com/kb/2093902

Posted in VMware

VMware Site Recovery Manager remote Powershell script

You have a PowerShell script designed to run locally, but you need to execute it on a remote computer via another product such as VMware Site Recovery Manager. You need to keep all your scripts in a central repository, but execute them on multiple servers. Does this sound familiar? Here’s a quick and nice trick:

c:\windows\system32\windowspowershell\v1.0\powershell.exe -Command "(Invoke-Command -ComputerName TargetPC -FilePath C:\Script\LocalScript.ps1)"

TargetPC – name of the remote computer
C:\Script\LocalScript.ps1 – path of the script on the local computer from which you’re running this command

What is required in a domain environment?
– On the remote server, enable Windows Remote Management (winrm quickconfig).
– On the remote server, make sure the account under which you’re executing the PowerShell command above has the correct permissions (i.e., is a member of the local Administrators group).
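To keep one central script and run it against several servers, you can generate one command per target. This is a hedged Python sketch: the server names SQL01/SQL02 and the helper names are hypothetical, and it only builds the argument lists; actually launching them requires a Windows host with WinRM configured as described above.

```python
# Hedged sketch: one central script, many targets. Builds an argument list per
# server for the Invoke-Command trick above. Server names are hypothetical.
POWERSHELL = r"c:\windows\system32\windowspowershell\v1.0\powershell.exe"


def remote_invoke_args(target: str, local_script: str) -> list[str]:
    """Argument vector that runs local_script on target via Invoke-Command."""
    return [
        POWERSHELL,
        "-Command",
        f"(Invoke-Command -ComputerName {target} -FilePath {local_script})",
    ]


def plan(targets: list[str], local_script: str) -> dict[str, list[str]]:
    """Map each target server to the command that would run the script there."""
    return {t: remote_invoke_args(t, local_script) for t in targets}


if __name__ == "__main__":
    for server, args in plan(["SQL01", "SQL02"], r"C:\Script\LocalScript.ps1").items():
        print(server, args)
        # On a Windows host with WinRM set up you could launch it with:
        # subprocess.run(args, check=True)
```

Passing a list rather than a hand-built string lets Python’s subprocess module handle Windows command-line quoting, which avoids the kind of quoting mix-ups described in the SRM bug post.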

Posted in Powershell, VMware

HP 3PAR Layout Grid Manager

Chunklets are no secret to anyone involved with 3PAR InForm OS. In short, a chunklet is a 1GB block of data. By default, a 3PAR array divides all physical disks into multiple 1GB chunklets.

Have you ever wanted to visualize how the chunklets belonging to the same virtual volume or CPG are spread across the array? It’s very easy with IMC 4.6.

You need to enable Layout Grid.

[Screenshot: enabling Layout Grid]

And you should see data properly balanced:
[Screenshot: Layout Grid view]
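If you want a number to go with the picture, the balance the Layout Grid shows can be summarized as the spread between the most and least loaded physical disks. This is a hedged Python sketch: the input (one PD id per chunklet of a VV or CPG) is hypothetical, and how you export that list from the array is not covered here.

```python
# Hedged sketch: quantify what the Layout Grid shows visually. Input is one
# PD id per chunklet of a VV or CPG (hypothetical data). A spread of 0 means
# every PD holds the same number of chunklets.
from collections import Counter
from typing import Iterable, Optional


def chunklet_spread(chunklet_pds: Iterable[int],
                    all_pds: Optional[Iterable[int]] = None) -> tuple[dict[int, int], int]:
    counts = Counter(chunklet_pds)
    for pd in all_pds or ():
        counts.setdefault(pd, 0)  # PDs holding no chunklets count as zero
    if not counts:
        raise ValueError("no chunklets or PDs given")
    return dict(counts), max(counts.values()) - min(counts.values())
```

Passing all_pds matters: without it, a disk that holds none of the volume’s chunklets would silently drop out of the comparison.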

Posted in Random stuff

3PAR PowerShell Example

HP has an amazing product called 3PAR, but it’s definitely lacking some features on the add-on software side.

One of the products HP offers is called HP 3PAR Recovery Manager Software for Microsoft SQL Server (RM-SQL). The purpose of this software is to provide rapid on-site “backup”, not DR (in my experience, the majority of folks at HP don’t understand what this product does). I won’t go into detail on how this software works, but to give you a brief summary: it uses the 3PAR hardware VSS provider to create application-consistent snapshots and either replicates them to a remote 3PAR SAN or stores them locally on the 3PAR SAN.

 

You might think it achieves the objectives of DR, but don’t be fooled. You cannot restore replicated snapshots to a DR SQL server; they can only be restored to the same SQL server (you need to zone the same server to both 3PAR systems). Silly, if you ask me, as it seems to be purely a software restriction.

 

Our objective for this exercise is DR for physical SQL servers hosted on a 3PAR SAN. Additionally, we need to achieve low RTO and RPO. I will not describe how to configure HP 3PAR RM-SQL to get data across to the DR site.

Posted in 3par, Powershell

HP Broadcom-Based Network adapters firmware 7.8.21 fix – c04258318

“HP has released a very serious customer advisory saying that some Broadcom Nics which are used in G2-G6 servers and blades could be killed by a firmware update component in their HP Service Pack for Proliant 2014.02.

Using HPSUM, HP SPP or Smart Components for VMware to update the “Comprehensive Configuration Management” (CCM) firmware version to 7.8.21 can kill the nics which would require a hardware swap out to fix!”

You can find the advisory here:
HP Broadcom-Based Network Adapters – Updating Comprehensive Configuration Management Firmware Version 7.8.21 on Certain HP Broadcom-Based Network Adapters May Result in the Network Adapter becoming Nonfunctional

If your server is out of warranty, or you just don’t want to wait for a hardware swap, read on. You need access to the server via remote management such as KVM or iLO. Of course, if you’re lucky, you can perform this operation locally. In this post I will demonstrate how to recover a BL460c G1 blade with a bricked HP NC373i Multifunction Gigabit Server Adapter.

I will be recovering the NIC remotely using iLO.

Posted in Random stuff