Tuesday, February 9, 2016

Debugging Neutron Networking in XenServer


One of the tasks I was assigned was to fix the code that prevented XenServer with Neutron from working properly. This configuration used to work well, but support broke as more and more changes landed in Neutron, and the lack of a CI environment with XenServer hid the problem. I began getting XenServer with Neutron back to a working state by following the outline in the Quantum with Grizzly blog post from a few years ago. Note that Quantum was renamed to Neutron with the Havana release, so this post uses Neutron throughout. During this work, I needed to debug why OpenStack instances were not obtaining IP addresses. This post covers the workflow I used, and I hope you'll find it helpful. My environment was:


  • XenServer: 6.5
  • OpenStack: September 2015 master code
  • Network: ML2 plugin, OVS driver, VLAN type
  • Single Box installation

I had made some changes to the DevStack script so that XenServer with Neutron could be installed and run properly. The following are the debugging steps I followed when newly launched VMs could not automatically get an IP address from the Neutron DHCP agent.

Brief description of the DHCP process

When guest VMs boot, they broadcast a DHCP request within their network broadcast domain and then wait for a DHCP server's reply. In OpenStack Neutron, the DHCP agent acts as the DHCP server and is responsible for allocating IP addresses. If VMs cannot get IP addresses, the first priority is to check whether the packets from the VMs reach the DHCP server.




Dump traffic in Network Node

Since I used DevStack with a single box installation, all OpenStack nodes reside in the same DomU (VM). Perform the following steps:

1. Check namespace that DHCP agent uses

In the DevStack VM, execute:

    sudo ip netns

The output lists the network namespaces on the host.

Note: qdhcp-xxx is the namespace for the DHCP agent

2. Check interface DHCP agent uses for L3 packets

In the DevStack VM, execute:

    sudo ip netns exec \
        qdhcp-49a623fd-c168-4f27-ad82-946bfb6df3d7 ifconfig

The results will look something like the following, and the "tapYYYY" entry is the one we care about.

    lo        Link encap:Local Loopback
              inet addr: Mask:
              inet6 addr: ::1/128 Scope:Host
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

    tap7b39ecad-81 Link encap:Ethernet HWaddr fa:16:3e:e3:46:c1
              inet addr: Bcast: Mask:
              inet6 addr: fe80::f816:3eff:fee3:46c1/64 Scope:Link
              inet6 addr: fdff:631:9696:0:f816:3eff:fee3:46c1/64 Scope:Global
              RX packets:42606 errors:0 dropped:0 overruns:0 frame:0
              TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:4687150 (4.6 MB) TX bytes:4867 (4.8 KB)

3. Monitor traffic flow on the DHCP agent's interface tapYYYY

In the DevStack VM, monitor the traffic flow on the tap interface by executing this command:

    sudo ip netns exec \
        qdhcp-49a623fd-c168-4f27-ad82-946bfb6df3d7 \
        tcpdump -i tap7b39ecad-81 -s0 -w dhcp.cap

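In theory, when you launch a new instance, the capture should contain DHCP request and reply messages like these (with -w the packets are written to dhcp.cap, so read them back with tcpdump -r dhcp.cap):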

    16:29:40.710953 IP > BOOTP/DHCP, Request from fa:16:3e:f9:f6:b0 (oui Unknown), length 302
    16:29:40.713625 IP > BOOTP/DHCP, Reply, length 330
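To make the check less error-prone on a busy interface, the text output of the capture can be scanned for DHCP requests and replies. The following is only a minimal sketch: the function name summarize_dhcp is hypothetical, and it assumes tcpdump's one-line summaries look like the two sample lines above.

```python
import re

# Matches tcpdump's one-line summary for BOOTP/DHCP packets, e.g.
# "... BOOTP/DHCP, Request from fa:16:3e:f9:f6:b0 (oui Unknown), length 302"
DHCP_LINE = re.compile(r"BOOTP/DHCP, (Request|Reply)")


def summarize_dhcp(lines):
    """Count DHCP requests and replies in tcpdump text output.

    Hypothetical helper for illustration; it only looks at the summary
    text and does not decode the packets themselves.
    """
    counts = {"Request": 0, "Reply": 0}
    for line in lines:
        match = DHCP_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two sample lines from the capture shown above.
sample = [
    "16:29:40.710953 IP > BOOTP/DHCP, Request from fa:16:3e:f9:f6:b0 "
    "(oui Unknown), length 302",
    "16:29:40.713625 IP > BOOTP/DHCP, Reply, length 330",
]
print(summarize_dhcp(sample))
```

Requests without matching replies suggest the DHCP agent received the packet but did not answer; no requests at all suggests the packet was lost before it reached the namespace.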

Dump traffic in Compute Node

Meanwhile, you will also want to dump traffic on the OpenStack compute node, which with XenServer is Dom0.

When a new instance is launched, a new virtual interface named "vifX.Y" is created. X is the domain ID of the new VM and Y is the ID of the VIF defined in XAPI. Domain IDs are sequential: if the latest interface is vif20.0, the next one will most likely be vif21.0. You can then try tcpdump -i vif21.0. Note that it may fail at first if the virtual interface hasn't been created yet, but once the virtual interface exists, you can monitor the packets. In theory you should see DHCP request and reply packets in Dom0, just as you do on the DHCP agent side.
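The naming rule above is easy to get wrong when typing interface names by hand, so here is a small sketch of it. The helper names parse_vif and guess_next_vif are hypothetical, and the "next domain ID is latest plus one" rule is only the heuristic described above, not a guarantee from XAPI.

```python
import re

# vifX.Y: X is the domain ID, Y is the VIF device ID defined in XAPI.
VIF_NAME = re.compile(r"^vif(\d+)\.(\d+)$")


def parse_vif(name):
    """Split a name such as 'vif21.0' into (domain_id, device_id)."""
    match = VIF_NAME.match(name)
    if match is None:
        raise ValueError("not a vifX.Y interface name: %r" % name)
    return int(match.group(1)), int(match.group(2))


def guess_next_vif(latest, device=0):
    """Guess the interface name of the next VM to be launched.

    Assumes domain IDs are handed out sequentially, which is likely
    but not certain (another domain may start in between).
    """
    domain_id, _ = parse_vif(latest)
    return "vif%d.%d" % (domain_id + 1, device)


print(parse_vif("vif21.0"))
print(guess_next_vif("vif20.0"))
```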

Note: If you cannot capture the packets at the instance's launch time, you can also log in to the instance via XenCenter and run ifup eth0, which will also trigger the instance to send a DHCP request.

Check DHCP request goes out of the compute node

In most cases, you should see the DHCP request packets sent out from Dom0. This means that the VM itself is OK: it has sent out the DHCP request message.

Note: Some images keep sending DHCP requests periodically until they get a response. Others do not: they only try a few times, e.g. three times, and give up if no DHCP response arrives. In some scenarios this means the instance misses its chance to get an address. That's why some people on the Internet suggest changing images when a newly launched instance cannot get an IP address via DHCP.

Check DHCP request arrives at the DHCP server side

When I was first testing, I didn't see any DHCP requests on the DHCP agent side. Where did the request packets go? Were they dropped? If so, what dropped them, and why?

Thinking about it a bit more, the packets were dropped at either L2 or L3, so we can check them one by one. For L3/L4, I didn't have a firewall set up, and the security group's default rule lets all packets through, so I didn't spend much effort on that part. For L2, since we use OVS, I began by checking the OVS rules. If you are not familiar with OVS, this can take some time; I certainly spent a lot of time on it before I fully understood the mechanism and the rules.

The main aim is to check all existing rules in Dom0 and DomU, and then find out which rule causes the packets to be dropped.

Check OVS flow rules

OVS flow rules in Network Node

To get the port information on the network bridge "br-int", execute the following in the DevStack VM:

    sudo ovs-ofctl show br-int

    stack@DevStackOSDomU:~$ sudo ovs-ofctl show br-int
    OFPT_FEATURES_REPLY (xid=0x2): dpid:0000ba78580d604a
    n_tables:254, n_buffers:256
     1(int-br-eth1): addr:1a:2d:5f:48:64:47
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(tap7b39ecad-81): addr:00:00:00:00:00:00
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     3(qr-78592dd4-ec): addr:00:00:00:00:00:00
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     4(qr-55af50c7-32): addr:00:00:00:00:00:00
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br-int): addr:9e:04:94:a4:95:bb
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

To get the flow rules, execute:

    sudo ovs-ofctl dump-flows br-int

    stack@DevStackOSDomU:~$ sudo ovs-ofctl dump-flows br-int
    NXST_FLOW reply (xid=0x4):
     cookie=0x9bf3d60450c2ae94, duration=277625.02s, table=0, n_packets=31, n_bytes=4076, idle_age=15793, hard_age=65534, priority=3,in_port=1,dl_vlan=1041 actions=mod_vlan_vid:1,NORMAL
     cookie=0x9bf3d60450c2ae94, duration=277631.928s, table=0, n_packets=2, n_bytes=180, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
     cookie=0x9bf3d60450c2ae94, duration=277632.116s, table=0, n_packets=42782, n_bytes=4706099, idle_age=1, hard_age=65534, priority=0 actions=NORMAL
     cookie=0x9bf3d60450c2ae94, duration=277632.103s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
     cookie=0x9bf3d60450c2ae94, duration=277632.09s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop

These rules in DomU look normal, so let's move on to Dom0 and look further.

OVS flow rules in Compute Node

Looking at the traffic flow in picture 1, traffic from the VM to the DHCP server goes xapiX -> xapiY (Dom0), then -> br-eth1 -> br-int (DomU). So perhaps some OVS rule filtered the packets at layer 2. I suspected xapiY, although I could not give a specific reason why.

To determine the xapiY in your environment, execute:

    xe network-list

In the results, look for the "bridge" which matches the name-label for your network. In our case, it was xapi3, so to determine the port information, execute:

    ovs-ofctl show xapi3

    [root@rbobo ~]# ovs-ofctl show xapi3
    OFPT_FEATURES_REPLY (xid=0x2): dpid:00008ec00170b013
    n_tables:254, n_buffers:256
    capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
    actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
     1(vif15.1): addr:fe:ff:ff:ff:ff:ff
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(phy-xapi3): addr:d6:37:17:1d:01:ee
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(xapi3): addr:5a:46:65:a2:3b:4f
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

To get the flow rules, execute ovs-ofctl dump-flows xapi3:

    [root@rbobo ~]# ovs-ofctl dump-flows xapi3
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=278700.004s, table=0, n_packets=42917, n_bytes=4836933, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
     cookie=0x0, duration=276117.558s, table=0, n_packets=31, n_bytes=3976, idle_age=16859, hard_age=65534, priority=4,in_port=2,dl_vlan=1 actions=mod_vlan_vid:1041,NORMAL
     cookie=0x0, duration=278694.945s, table=0, n_packets=7, n_bytes=799, idle_age=65534, hard_age=65534, priority=2,in_port=2 actions=drop

Pay attention to port 2 (phy-xapi3), which has two specific rules:

  • The rule with the higher priority=4 is matched first. If dl_vlan=1, it rewrites the VLAN tag and then applies NORMAL processing, which lets the flow through.
  • The rule with the lower priority=2 is matched second, and it drops the flow. So will flows be dropped? If a flow doesn't have dl_vlan=1, it will definitely be dropped.
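The effect of these two rules can be illustrated with a toy model of priority-based flow lookup. This is only a sketch under simplifying assumptions: the FLOWS list is transcribed by hand from the xapi3 dump above, the lookup function is hypothetical, and real OVS matching covers many more fields than in_port and dl_vlan.

```python
# Flow rules of xapi3, reduced to priority, match fields and actions.
FLOWS = [
    {"priority": 0, "match": {}, "actions": "NORMAL"},
    {"priority": 4, "match": {"in_port": 2, "dl_vlan": 1},
     "actions": "mod_vlan_vid:1041,NORMAL"},
    {"priority": 2, "match": {"in_port": 2}, "actions": "drop"},
]


def lookup(flows, packet):
    """Return the actions of the highest-priority flow matching packet.

    A flow matches when every field it names has the same value in the
    packet; fields the flow does not name are wildcards.
    """
    for flow in sorted(flows, key=lambda f: f["priority"], reverse=True):
        if all(packet.get(k) == v for k, v in flow["match"].items()):
            return flow["actions"]
    return None


# A packet arriving on port 2 with VLAN tag 1 is re-tagged and forwarded.
print(lookup(FLOWS, {"in_port": 2, "dl_vlan": 1}))
# A packet arriving on port 2 without that tag falls through to the drop.
print(lookup(FLOWS, {"in_port": 2}))
```

In this model, anything entering through port 2 without VLAN tag 1 can only reach the priority=2 drop rule, which is the behaviour described in the second bullet.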


(1) dl_vlan=1 is the virtual LAN tag ID, which corresponds to the port's tag.
(2) For a long time I didn't realize the problem was a missing tag on the newly launched instance's port, due to my lack of OVS understanding, so I didn't know to check the port's tag first. The next time we meet this problem, we can check this part first.

With this question in mind, I checked the newly launched instance's port information by running ovs-vsctl show in Dom0, which gives results like these:

    Bridge "xapi5"
        fail_mode: secure
        Port "xapi5"
            Interface "xapi5"
                type: internal
        Port "vif16.0"
            Interface "vif16.0"
        Port "int-xapi3"
            Interface "int-xapi3"
                type: patch
                options: {peer="phy-xapi3"}


Port vif16.0 indeed has no tag with value 1, so its traffic will always be dropped.
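Spotting a missing tag by eye gets harder as the number of ports grows, so the check can be scripted. The sketch below assumes that ovs-vsctl show prints a "tag: N" line under each tagged Port entry; the helper names are hypothetical, and the tagged vif15.0 entry in SAMPLE is made up for contrast and is not taken from the output above.

```python
import re

PORT = re.compile(r'^Port "?([^"\s]+)"?$')
TAG = re.compile(r"^tag: (\d+)$")


def port_tags(show_output):
    """Map each port name in `ovs-vsctl show` output to its tag or None."""
    tags = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        match = PORT.match(line)
        if match:
            current = match.group(1)
            tags[current] = None
            continue
        match = TAG.match(line)
        if match and current is not None:
            tags[current] = int(match.group(1))
    return tags


def untagged_vifs(show_output):
    """List VM interfaces (vifX.Y ports) that have no VLAN tag."""
    return sorted(name for name, tag in port_tags(show_output).items()
                  if tag is None and name.startswith("vif"))


# vif16.0 mirrors the untagged port above; vif15.0 is an illustrative
# tagged port added for contrast.
SAMPLE = '''\
Bridge "xapi5"
    fail_mode: secure
    Port "xapi5"
        Interface "xapi5"
            type: internal
    Port "vif15.0"
        tag: 1
        Interface "vif15.0"
    Port "vif16.0"
        Interface "vif16.0"
'''
print(untagged_vifs(SAMPLE))
```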

Note: When a new instance is launched under XenServer, it gets a virtual network interface named vifX.0, and OVS creates a corresponding port and binds that interface to it.

Check why tag is not set

The next step is to find out why the newly launched instance doesn't have a tag in OVS. There are no obvious clues for newcomers like me: I just read the code over and over, made assumptions, tested them, and so forth. After trying various ideas, I found that each time I restarted neutron-openvswitch-agent (q-agt) in the Compute Node, the VM could get an IP address if I then executed ifup eth0. So something must be done when q-agt restarts that is not done when a new instance is launched. With this information, I could focus my code inspection. Finally I found that, with XenServer, when a new instance is launched, q-agt cannot detect the newly added port, so it does not add a tag to that port.

That left the question of why q-agt cannot detect port changes. We have a session from DomU to Dom0 to monitor port changes, and it seemed not to work as expected. With this in mind, I first ran ovsdb-client monitor Interface name,ofport in Dom0, which produces output like this:

    [root@rbobo ~]# ovsdb-client monitor Interface name,ofport
    row                                  action  name        ofport
    ------------------------------------ ------- ----------- ------
    54bcda61-de64-4d0e-a1c8-d339a2cabb50 initial "eth1"      1
    987be636-b352-47a3-a570-8118b59c7bbc initial "xapi3"     65534
    bb6a4f70-9f9c-4362-9397-010760f85a06 initial "xapi5"     65534
    9ddff368-0be5-4f23-a03c-7940543d0ccc initial "vif15.2"   1
    ba3af0f5-e8ed-4bdb-8c3d-67a638b81091 initial "phy-xapi3" 2
    b57284cf-1dcd-4a10-bee1-42516afe2573 initial "eth0"      1
    38a0dd37-173f-421c-9aba-3e03a5b8c900 initial "vif16.0"   2
    58b83fe4-5f33-40f3-9dd9-d5d4b3f25981 initial "xenbr0"    65534
    6c792964-3930-477c-bafa-5415259dea96 initial "int-xapi3" 1
    caa52d63-59ed-4917-9ec3-1ea957470d5e initial "vif15.1"   1
    d8805d05-bbd2-40cb-b219-eb9177c217dc initial "vif15.0"   6
    8131dcd2-69ea-401a-a65e-4d4a17203e0c initial "xapi4"     65534
    086e6e3a-1ab2-469f-9604-56bbd4c2fe86 initial "xenbr1"    65534

Then I launched a new instance to see whether the OVS monitor would produce output for it, and I did get output like this:

    row                                  action name      ofport
    ------------------------------------ ------ --------- ------
    249c424a-4c9a-47b4-991a-bded9ec63ada insert "vif17.0" []

    row                                  action name      ofport
    ------------------------------------ ------ --------- ------
    249c424a-4c9a-47b4-991a-bded9ec63ada old              []
                                         new    "vif17.0" 3
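To show what a consumer of this output has to do, here is a sketch that extracts newly added ports from the table-formatted monitor text. It is not how q-agt parses the monitor output; the function name added_ports is hypothetical, and the row layout is assumed to match the sample above.

```python
import re

# A data row: optional 36-character row UUID, an action, a quoted
# interface name, and the ofport column ("[]" means not assigned yet).
ROW = re.compile(
    r'^(?:[0-9a-f-]{36}\s+)?(initial|insert|new)\s+"([^"]+)"\s+(\S+)$'
)


def added_ports(lines):
    """Return {interface name: ofport or None} for ports added after start.

    'initial' rows describe ports that already existed, so they are
    skipped; 'insert' and 'new' rows describe a port being added and
    then receiving its ofport number.
    """
    ports = {}
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, 'old' row, or blank line
        action, name, ofport = match.groups()
        if action == "initial":
            continue
        ports[name] = int(ofport) if ofport.isdigit() else None
    return ports


sample = [
    'row                                  action name      ofport',
    '------------------------------------ ------ --------- ------',
    '249c424a-4c9a-47b4-991a-bded9ec63ada insert "vif17.0" []',
    '249c424a-4c9a-47b4-991a-bded9ec63ada old              []',
    '                                     new    "vif17.0" 3',
]
print(added_ports(sample))
```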

So the OVS monitor itself works well! There may be other errors in the code that does the monitoring. It seems I'm getting closer to the root cause :)

Finally, I found that with XenServer, our current implementation cannot read the OVS monitor's output, so q-agt doesn't know that a new port has been added. Luckily, the L2 agent provides another way of detecting port changes, and we can use that instead.

Setting minimize_polling=false in the L2 agent's configuration file ensures the agent does not rely on ovsdb-client monitor, which means that the port is detected and the tag is added properly!
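As a rough illustration of the change, the sketch below rewrites an INI-style agent configuration so that minimize_polling is False. It assumes the option lives in the [agent] section, which is my understanding of the OVS agent's configuration but is not stated above; the function name is hypothetical, and the path of the real configuration file depends on your installation. Note that Python's configparser drops comments when it rewrites a file, so editing the real file by hand may be preferable.

```python
import configparser
import io


def disable_minimize_polling(ini_text):
    """Return ini_text with [agent] minimize_polling set to False.

    Assumption: the L2 agent reads this option from the [agent] section.
    """
    # interpolation=None keeps '%' characters in other values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("agent"):
        parser.add_section("agent")
    parser.set("agent", "minimize_polling", "False")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(disable_minimize_polling("[agent]\nminimize_polling = True\n"))
```

After changing the file, the agent (q-agt) needs to be restarted so that it picks up the new value.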

In this case, this was all that was needed for instances to get an IP address, and everything else worked normally. I hope the process I went through in debugging this problem will be beneficial to others.
