{"id":321,"date":"2016-08-01T08:58:45","date_gmt":"2016-08-01T15:58:45","guid":{"rendered":"http:\/\/blog.gptnet.net\/?p=321"},"modified":"2016-12-01T11:14:09","modified_gmt":"2016-12-01T18:14:09","slug":"esxi-5-56-x-bug-hpe-cim-varrunsfcb-inode-table-of-its-ramdisk-is-full","status":"publish","type":"post","link":"https:\/\/blog.gptnet.net\/?p=321","title":{"rendered":"ESXi 5.5\/6.x bug HPE CIM &#8211; \/var\/run\/sfcb inode table of its ramdisk is full"},"content":{"rendered":"<p>Another bug from VMware\/HPE &#8211; unfortunately we don&#8217;t have public KB available at this point. As per our conversation with VMware engineer this issue affects both ESXi 5.5 and ESXi 6.x hosts.<br \/>\nI suspect VMware sfcb service fails to clear temporary files or HPE CIM providers create files which they are not suppose to.<br \/>\nI observed this issue with HPE ProLiant BL660c Gen8 blades running ESXi 5.5. These blades come with 4 CPU sockets and 1TB of ram &#8211; they are hosting VDI environment so they do have high density and a lot of power on\/off operations.<br \/>\nAs the troubleshooting options we tried updating to the latest ESXi patches, HPE drivers and software but issue was still persisting.<\/p>\n<p><strong>Scope<\/strong><br \/>\nIssue affects ESXi 5.5 and ESXi 6.x running HPE CIM providers, such as OEM HPE customized images.<\/p>\n<p><strong>Symtomps<\/strong><br \/>\nUnable to power on new VMs, vMotion fails.<br \/>\nvkernel.log shows the following errors:<br \/>\n<code>Cannot create file \/var\/run\/sfcb\/52494bef-1566-c7e5-6604-676ddd5b9c46 for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.<br \/>\n<\/code><br \/>\nYou see alot of files inside \/var\/run\/sfcb directory<br \/>\n<a href=\"http:\/\/blog.gptnet.net\/?attachment_id=327\" rel=\"attachment wp-att-327\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-327\" src=\"https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_2.png\" alt=\"sfcb_2\" width=\"990\" height=\"265\" srcset=\"https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_2-300x80.png 300w, https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_2-768x206.png 768w, https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_2.png 990w\" sizes=\"auto, (max-width: 990px) 100vw, 990px\" \/><\/a><br \/>\n<a href=\"http:\/\/blog.gptnet.net\/?attachment_id=329\" rel=\"attachment wp-att-329\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-329\" src=\"https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_4.png\" alt=\"sfcb_4\" width=\"329\" height=\"50\" srcset=\"https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_4-300x46.png 300w, https:\/\/blog.gptnet.net\/wp-content\/uploads\/2016\/08\/sfcb_4.png 329w\" sizes=\"auto, (max-width: 329px) 100vw, 329px\" \/><\/a><\/p>\n<p>Below you will find workarounds to address this issue.<br \/>\n<!--more--><br \/>\n<strong>Temporary workaround<\/strong><\/p>\n<p>1. Disable HA on the cluster to avoid alerts.<br \/>\n2. Stop SFCB by running the following command:<br \/>\n<code>\/etc\/init.d\/sfcbd-watchdog stop<\/code><br \/>\n3. 
Below you will find workarounds to address this issue.

**Temporary workaround**

1. Disable HA on the cluster to avoid alerts.
2. Stop SFCB by running the following command:
`/etc/init.d/sfcbd-watchdog stop`
3. Delete the files inside /var/run/sfcb.
If you get the error `-sh: can't fork`, delete the files in small batches with commands such as `rm [0-2]*`, or with even more granular patterns such as `rm abcd*`.
[screenshot: sfcb_3.png]
4. Start SFCB by running the following command:
`/etc/init.d/sfcbd-watchdog start`
5. Verify the filesystem has free inodes:
`esxcli system visorfs ramdisk list`
6. Restart the management agents:
`/etc/init.d/hostd restart`
`/etc/init.d/vpxa restart`
At this point the host will temporarily disconnect from vCenter, so don't panic; all VMs are still online.

**Permanent workaround**

To address this issue permanently, I suggest implementing a cron job which clears the files from the /var/run/sfcb directory every hour. Make sure to clear all existing files using the instructions above before proceeding. Now onto the permanent solution: SSH into the host and edit the /etc/rc.local.d/local.sh file with `vi /etc/rc.local.d/local.sh`. Copy and paste the following above `exit 0`:

```
# Custom workaround by Naz Snidanko nsnidanko@act.bm 7/26/2016 to address VMware bug
# 1. Stop cron service
/bin/kill $(cat /var/run/crond.pid)
# 2. Insert new crontab entry
/bin/echo "0 * * * * for i in /var/run/sfcb/*; do rm -rf \$i; done" >> /var/spool/cron/crontabs/root
# 3. Start cron service
/usr/lib/vmware/busybox/bin/busybox crond
```

[screenshot: sfcb_1.png]

That's it.
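If you want to double-check that the permanent workaround stuck, here is a quick verification (a sketch, assuming the stock ESXi busybox userland; `ps` output formatting differs between ESXi versions):

```
# The hourly cleanup entry should be present in root's crontab
cat /var/spool/cron/crontabs/root

# crond should be running again with a fresh pid file
cat /var/run/crond.pid
ps | grep crond
```

One note on the design: the snippet appends to the crontab with `>>`, so if you run it by hand more than once in the same boot session you will end up with duplicate entries; local.sh only runs it once per boot.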