Thursday, February 26, 2015

Bento Box Update for CentOS and Fedora
// Chef Blog

This is not urgent, but you may encounter SSL verification errors when using vagrant directly, or vagrant through test kitchen.

Special Thanks to Joe Damato of Package Cloud for spending his time debugging this issue with me the other day.

TL;DR: We found a bug in our CentOS 5.11, CentOS 6.6, and Fedora 21 "bento" boxes. openssl and yum couldn't verify the SSL certificates for AWS S3 on those boxes, because the VeriSign certificates were removed from the CA bundle by the upstream curl project. Update your local boxes: first remove them with vagrant box remove, then rerun test kitchen or vagrant in your project.

We publish Chef Server 12 packages to a great hosted package repository provider, Package Cloud. They provide secure, properly configured yum and apt repositories with SSL, GPG, and all the encrypted bits you can eat. In testing the chef-server cookbook for consuming packages from Package Cloud, I discovered a problem with our bento-built base boxes for CentOS 5.11 and 6.6.

    [2015-02-25T19:54:49+00:00] ERROR: chef_server_ingredient[chef-server-core] (chef-server::default line 18) had an error: Mixlib::ShellOut::ShellCommandFailed: packagecloud_repo[chef/stable/] (/tmp/kitchen/cache/cookbooks/chef-server-ingredient/libraries/chef_server_ingredients_provider.rb line 44) had an error: Mixlib::ShellOut::ShellCommandFailed: execute[yum-makecache-chef_stable_] (/tmp/kitchen/cache/cookbooks/packagecloud/providers/repo.rb line 109) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
    ---- Begin output of yum -q makecache -y --disablerepo=* --enablerepo=chef_stable_ ----
    ...SNIP
      File "/usr/lib64/python2.4/", line 565, in http_error_302
    ...SNIP
      File "/usr/lib64/python2.4/site-packages/M2Crypto/SSL/", line 167, in connect_ssl
        return m2.ssl_connect(self.ssl, self._timeout)
    M2Crypto.SSL.SSLError: certificate verify failed

What's going on here?

We're attempting to add the Package Cloud repository configuration and rebuild the yum cache for it. Here is the yum configuration:

    [chef_stable_]
    name=chef_stable_
    baseurl=$basearch
    repo_gpgcheck=0
    gpgcheck=0
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-packagecloud_io
    sslverify=1
    sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Note that the baseurl is https – most package repositories probably aren't going to run into this because most use http. The thing is, despite Package Cloud having a valid SSL certificate, we're getting a verification failure in the certificate chain. Let's look at this with OpenSSL:

    $ openssl s_client -CAfile /etc/pki/tls/certs/ca-bundle.crt -connect
    CONNECTED(00000003)
    depth=3 /C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
    verify return:1
    depth=2 /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Certification Authority
    verify return:1
    depth=1 /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
    verify return:1
    depth=0 /OU=Domain Control Validated/OU=EssentialSSL/
    verify return:1
    ... SNIP
    SSL-Session:
        Verify return code: 0 (ok)

Okay, that looks fine. So why is it failing when yum runs? The key is in the Python stack trace from yum:

File "/usr/lib64/python2.4/", line 565, in http_error_302  

Package Cloud actually stores the packages in S3, so it redirects to the bucket. Let's check that certificate with openssl:

    $ openssl s_client -CAfile /etc/pki/tls/certs/ca-bundle.crt -connect
    depth=2 /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
    verify error:num=20:unable to get local issuer certificate
    verify return:0
    ---
    Certificate chain
     0 s:/C=US/ST=Washington/L=Seattle/ Inc./CN=*
       i:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at (c)10/CN=VeriSign Class 3 Secure Server CA - G3
     1 s:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at (c)10/CN=VeriSign Class 3 Secure Server CA - G3
       i:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
     2 s:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
       i:/C=US/O=VeriSign, Inc./OU=Class 3 Public Primary Certification Authority
    ...SNIP
    SSL-Session:
        Verify return code: 20 (unable to get local issuer certificate)
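When checking more than one endpoint against the same bundle, the only line that really matters is the final "Verify return code". As a rough sketch (the function name verify_code and the HOST placeholder are made up for illustration, not part of bento or Package Cloud), a small helper can pull that number out of the s_client output:

```shell
# Hypothetical helper: read `openssl s_client` output on stdin and print
# only the numeric verify return code (0 means the chain verified).
verify_code() {
  sed -n 's/.*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example use, with HOST standing in for the endpoint you want to check:
#   openssl s_client -CAfile /etc/pki/tls/certs/ca-bundle.crt \
#     -connect HOST:443 </dev/null 2>/dev/null | verify_code
```

Redirecting stdin from /dev/null makes s_client exit after the handshake instead of waiting for input.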

This gets to the root of why yum was failing. But why can't the S3 certificate be verified?

As it turns out, the latest CA certificate bundle from the curl project appears to have removed two of the VeriSign certificates, which are used by AWS for

But wait, why does this matter? Shouldn't CentOS have the ca-bundle.crt file that comes from the openssl package?

    $ rpm -qf ca-bundle.crt
    openssl-0.9.8e-27.el5_10.4

Sure enough. What happened?

    $ sudo rpm -V openssl
    S.5....T  c /etc/pki/tls/certs/ca-bundle.crt
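In rpm -V output, each character in the first column flags one attribute that no longer matches what the package installed: S is the file size, 5 is the checksum, and T is the modification time. The c marks the file as a configuration file. As a minimal sketch (the function name explain_rpm_verify is hypothetical, and it only covers the three flags seen here), the flags can be spelled out like this:

```shell
# Hypothetical helper: read `rpm -V` lines on stdin and describe the
# S, 5 and T flags. Other flags (M, D, L, U, G) are ignored in this sketch.
explain_rpm_verify() {
  while read -r flags rest; do
    case "$flags" in *S*) echo "size differs" ;; esac
    case "$flags" in *5*) echo "checksum differs" ;; esac
    case "$flags" in *T*) echo "modification time differs" ;; esac
  done
}

# Example use:
#   sudo rpm -V openssl | explain_rpm_verify
```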

Wait a second, why is the file different? Well, this is where we get back to the TL;DR. In our bento boxes for CentOS, we had a line in the ks.cfg that looked like this:

wget -O/etc/pki/tls/certs/ca-bundle.crt  

I say past tense because we've since removed this from the ks.cfg on the affected platforms and rebuilt the boxes. This issue was particularly perplexing at first because the problem didn't happen on our CentOS 5.10 box. When that box was built, the cacert.pem bundle still had the VeriSign certificates. By the time we retrieved cacert.pem for the 5.11 and 6.6 base boxes, they had been removed.

Why were we retrieving the bundle in the first place? It's hard to say – that wget line has always been in the ks.cfg for the bento repository. At some point in time it might have been to work around invalid certificates being present in the default package from the distribution, or some other problem. The important thing is that the distribution's package has working certificates, and we want to use that.

So what do you need to do? You should remove your opscode-centos vagrant boxes, and re-add them. You can do this:

    for i in opscode-centos-5.10 opscode-centos-5.11 opscode-centos-6.6 opscode-centos-7.0 opscode-fedora-20
    do
      vagrant box remove $i
    done

Then, wherever you're using those boxes in your own projects (cookbooks with test-kitchen, for example), you can simply rerun test kitchen and it will download the updated boxes.
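If you aren't sure which of these boxes are actually installed, you can build the list from vagrant box list instead of hard-coding it. This is only a sketch: the function name affected_boxes is made up, and the opscode-centos- / opscode-fedora- prefix match is an assumption based on the box names above, so adjust it if your boxes are named differently.

```shell
# Hypothetical helper: read `vagrant box list` output on stdin and print
# the names of boxes matching the opscode-centos-* / opscode-fedora-*
# naming used above (assumed prefixes), one per line, de-duplicated.
affected_boxes() {
  awk '$1 ~ /^opscode-(centos|fedora)-/ { print $1 }' | sort -u
}

# Example use:
#   vagrant box list | affected_boxes | while read -r box; do
#     vagrant box remove "$box"
#   done
```

The first field of each vagrant box list line is the box name, so the provider and version columns are simply dropped.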

If you'd like to first check if your base boxes are affected, you can use the test-cacert cookbook. With ChefDK 0.4.0:

    % git clone test-cacert
    % cd test-cacert
    % kitchen test default

