Nested VMX
==========

Overview
--------

On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.

The "Nested VMX" feature adds this missing capability - of running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.

We describe in much greater detail the theory behind the nested VMX feature,
its implementation and its performance characteristics, in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:

        http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Terminology
-----------

Single-level virtualization has two levels - the host (KVM) and the guests.
In nested virtualization, we have three levels: The host (KVM), which we call
L0, the guest hypervisor, which we call L1, and its nested guest, which we
call L2.


Running nested VMX
------------------

The nested VMX feature is disabled by default. It can be enabled by giving
the "nested=1" option to the kvm-intel module.
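
To confirm that the option took effect, the parameter can be read back from
sysfs. The following is a minimal sketch, assuming the parameter is exported
at /sys/module/kvm_intel/parameters/nested and reads as "Y"/"N" or "1"/"0"
depending on the kernel version. The helper names are illustrative and are
not part of KVM.

```c
#include <stdio.h>

/* Interpret the text of the "nested" parameter:
 * 1 = enabled, 0 = disabled, -1 = unrecognized.
 * Assumption: kernels report either "Y"/"N" or "1"/"0". */
int parse_nested_param(const char *text)
{
        if (text == NULL)
                return -1;
        switch (text[0]) {
        case 'Y': case 'y': case '1':
                return 1;
        case 'N': case 'n': case '0':
                return 0;
        default:
                return -1;
        }
}

/* Read the parameter file at 'path'. Returns -1 if the file cannot be
 * opened (for example because kvm-intel is not loaded). */
int nested_vmx_enabled(const char *path)
{
        char buf[8] = "";
        FILE *f = fopen(path, "r");

        if (f == NULL)
                return -1;
        if (fgets(buf, sizeof(buf), f) == NULL)
                buf[0] = '\0';
        fclose(f);
        return parse_nested_param(buf);
}
```

Calling nested_vmx_enabled("/sys/module/kvm_intel/parameters/nested") on the
host is expected to return 1 once the module has been loaded with "nested=1".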

No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled, by giving qemu one of the following options:

     -cpu host              (emulated CPU has all features of the real CPU)

     -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
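
Whether the option worked can be checked from inside L1 by looking at the
VMX feature flag, which CPUID reports in leaf 1, ECX bit 5. The following is
a small sketch, assuming a GCC or Clang toolchain that provides <cpuid.h>;
the function names are illustrative.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.01H:ECX bit 5 advertises VMX. */
#define CPUID_1_ECX_VMX (UINT32_C(1) << 5)

/* Test the VMX bit in a raw ECX value from CPUID leaf 1. */
int ecx_has_vmx(uint32_t ecx)
{
        return (ecx & CPUID_1_ECX_VMX) != 0;
}

/* Returns 1 if the current (virtual) CPU advertises VMX, 0 if it does not,
 * and -1 if CPUID leaf 1 is unavailable or this is not an x86 build. */
int cpu_has_vmx(void)
{
#if defined(__x86_64__) || defined(__i386__)
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                return -1;
        return ecx_has_vmx(ecx);
#else
        return -1;
#endif
}
```

Run inside L1, cpu_has_vmx() should return 1 when qemu was started with one
of the options above, and 0 with the default qemu64 CPU type.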


ABIs
----

Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).

As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested to know the
internals of this structure. This is struct vmcs12 from arch/x86/kvm/vmx.c.
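
To illustrate how an opaque structure can still be accessed by field, the
sketch below emulates a VMREAD against a toy three-field structure. The field
encodings and the meaning of bits 14:13 (field width) are taken from the
Intel manual. The struct toy_vmcs12 and all function names are hypothetical
and much simpler than what KVM actually does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field encodings as defined by the Intel manual. */
#define GUEST_ES_SELECTOR 0x0800u
#define VM_EXIT_REASON    0x4402u
#define GUEST_RIP         0x681eu

/* Hypothetical toy stand-in for struct vmcs12 (three fields only). */
struct toy_vmcs12 {
        uint64_t guest_rip;
        uint32_t vm_exit_reason;
        uint16_t guest_es_selector;
};

/* Bits 14:13 of a field encoding give the field width. */
enum vmcs_width { WIDTH_U16 = 0, WIDTH_U64 = 1, WIDTH_U32 = 2, WIDTH_NATURAL = 3 };

enum vmcs_width vmcs_field_width(uint32_t encoding)
{
        return (enum vmcs_width)((encoding >> 13) & 3);
}

/* Map an encoding to an offset in the toy struct, or -1 if unsupported. */
long field_offset(uint32_t encoding)
{
        switch (encoding) {
        case GUEST_RIP:
                return (long)offsetof(struct toy_vmcs12, guest_rip);
        case VM_EXIT_REASON:
                return (long)offsetof(struct toy_vmcs12, vm_exit_reason);
        case GUEST_ES_SELECTOR:
                return (long)offsetof(struct toy_vmcs12, guest_es_selector);
        default:
                return -1;
        }
}

/* Emulated VMREAD: stores the zero-extended field value in *out and
 * returns 0, or returns -1 for an unsupported field. */
int toy_vmread(const struct toy_vmcs12 *vmcs, uint32_t encoding, uint64_t *out)
{
        long off = field_offset(encoding);
        const unsigned char *p;
        uint16_t v16;
        uint32_t v32;

        if (off < 0)
                return -1;
        p = (const unsigned char *)vmcs + off;

        switch (vmcs_field_width(encoding)) {
        case WIDTH_U16:
                memcpy(&v16, p, sizeof(v16));
                *out = v16;
                break;
        case WIDTH_U32:
                memcpy(&v32, p, sizeof(v32));
                *out = v32;
                break;
        default: /* 64-bit and natural-width fields are both u64 here */
                memcpy(out, p, sizeof(*out));
                break;
        }
        return 0;
}
```

Because L1 only ever names fields by their encoding, the layout behind the
lookup can be reorganized without L1 noticing.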

The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02", the VMCS
which L0 builds to actually run L2. How this is done is explained in the
aforementioned paper.
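
As a rough intuition only, the sketch below shows one way the three
structures can relate, using a hypothetical three-field VMCS. It is a
simplified reading of the general idea (guest state comes from L1's request,
host state stays with L0, and controls are combined), not KVM's actual code.
All names are illustrative.

```c
#include <stdint.h>

/* Hypothetical three-field subset of a VMCS, for illustration only. */
struct toy_vmcs {
        uint32_t exception_bitmap;  /* a control field */
        uint64_t guest_rip;         /* a guest-state field */
        uint64_t host_rip;          /* a host-state field */
};

/* Build the VMCS that L0 would use to run L2. */
struct toy_vmcs build_vmcs02(const struct toy_vmcs *vmcs01,
                             const struct toy_vmcs *vmcs12)
{
        struct toy_vmcs vmcs02;

        /* L2's guest state is what L1 asked for. */
        vmcs02.guest_rip = vmcs12->guest_rip;

        /* Hardware exits from L2 always land in L0, so host state is L0's. */
        vmcs02.host_rip = vmcs01->host_rip;

        /* Trap an exception if either L0 or L1 wants to intercept it. */
        vmcs02.exception_bitmap =
                vmcs01->exception_bitmap | vmcs12->exception_bitmap;

        return vmcs02;
}
```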

For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure change, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.

        typedef u64 natural_width;
        struct __packed vmcs12 {
                /* According to the Intel spec, a VMCS region must start with
                 * these two user-visible fields */
                u32 revision_id;
                u32 abort;

                u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
                u32 padding[7]; /* room for future expansion */

                u64 io_bitmap_a;
                u64 io_bitmap_b;
                u64 msr_bitmap;
                u64 vm_exit_msr_store_addr;
                u64 vm_exit_msr_load_addr;
                u64 vm_entry_msr_load_addr;
                u64 tsc_offset;
                u64 virtual_apic_page_addr;
                u64 apic_access_addr;
                u64 ept_pointer;
                u64 guest_physical_address;
                u64 vmcs_link_pointer;
                u64 guest_ia32_debugctl;
                u64 guest_ia32_pat;
                u64 guest_ia32_efer;
                u64 guest_pdptr0;
                u64 guest_pdptr1;
                u64 guest_pdptr2;
                u64 guest_pdptr3;
                u64 host_ia32_pat;
                u64 host_ia32_efer;
                u64 padding64[8]; /* room for future expansion */
                natural_width cr0_guest_host_mask;
                natural_width cr4_guest_host_mask;
                natural_width cr0_read_shadow;
                natural_width cr4_read_shadow;
                natural_width cr3_target_value0;
                natural_width cr3_target_value1;
                natural_width cr3_target_value2;
                natural_width cr3_target_value3;
                natural_width exit_qualification;
                natural_width guest_linear_address;
                natural_width guest_cr0;
                natural_width guest_cr3;
                natural_width guest_cr4;
                natural_width guest_es_base;
                natural_width guest_cs_base;
                natural_width guest_ss_base;
                natural_width guest_ds_base;
                natural_width guest_fs_base;
                natural_width guest_gs_base;
                natural_width guest_ldtr_base;
                natural_width guest_tr_base;
                natural_width guest_gdtr_base;
                natural_width guest_idtr_base;
                natural_width guest_dr7;
                natural_width guest_rsp;
                natural_width guest_rip;
                natural_width guest_rflags;
                natural_width guest_pending_dbg_exceptions;
                natural_width guest_sysenter_esp;
                natural_width guest_sysenter_eip;
                natural_width host_cr0;
                natural_width host_cr3;
                natural_width host_cr4;
                natural_width host_fs_base;
                natural_width host_gs_base;
                natural_width host_tr_base;
                natural_width host_gdtr_base;
                natural_width host_idtr_base;
                natural_width host_ia32_sysenter_esp;
                natural_width host_ia32_sysenter_eip;
                natural_width host_rsp;
                natural_width host_rip;
                natural_width paddingl[8]; /* room for future expansion */
                u32 pin_based_vm_exec_control;
                u32 cpu_based_vm_exec_control;
                u32 exception_bitmap;
                u32 page_fault_error_code_mask;
                u32 page_fault_error_code_match;
                u32 cr3_target_count;
                u32 vm_exit_controls;
                u32 vm_exit_msr_store_count;
                u32 vm_exit_msr_load_count;
                u32 vm_entry_controls;
                u32 vm_entry_msr_load_count;
                u32 vm_entry_intr_info_field;
                u32 vm_entry_exception_error_code;
                u32 vm_entry_instruction_len;
                u32 tpr_threshold;
                u32 secondary_vm_exec_control;
                u32 vm_instruction_error;
                u32 vm_exit_reason;
                u32 vm_exit_intr_info;
                u32 vm_exit_intr_error_code;
                u32 idt_vectoring_info_field;
                u32 idt_vectoring_error_code;
                u32 vm_exit_instruction_len;
                u32 vmx_instruction_info;
                u32 guest_es_limit;
                u32 guest_cs_limit;
                u32 guest_ss_limit;
                u32 guest_ds_limit;
                u32 guest_fs_limit;
                u32 guest_gs_limit;
                u32 guest_ldtr_limit;
                u32 guest_tr_limit;
                u32 guest_gdtr_limit;
                u32 guest_idtr_limit;
                u32 guest_es_ar_bytes;
                u32 guest_cs_ar_bytes;
                u32 guest_ss_ar_bytes;
                u32 guest_ds_ar_bytes;
                u32 guest_fs_ar_bytes;
                u32 guest_gs_ar_bytes;
                u32 guest_ldtr_ar_bytes;
                u32 guest_tr_ar_bytes;
                u32 guest_interruptibility_info;
                u32 guest_activity_state;
                u32 guest_sysenter_cs;
                u32 host_ia32_sysenter_cs;
                u32 padding32[8]; /* room for future expansion */
                u16 virtual_processor_id;
                u16 guest_es_selector;
                u16 guest_cs_selector;
                u16 guest_ss_selector;
                u16 guest_ds_selector;
                u16 guest_fs_selector;
                u16 guest_gs_selector;
                u16 guest_ldtr_selector;
                u16 guest_tr_selector;
                u16 host_es_selector;
                u16 host_cs_selector;
                u16 host_ss_selector;
                u16 host_ds_selector;
                u16 host_fs_selector;
                u16 host_gs_selector;
                u16 host_tr_selector;
        };
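
Since a layout change silently breaks migration, it can help to have the
compiler reject accidental field moves. The following is a sketch of that
idea using C11 _Static_assert on the first few fields of the listing above.
The struct vmcs12_head and the CHECK_OFFSET macro are illustrative names, and
the packed attribute assumes GCC or Clang; this is not a copy of KVM's code.

```c
#include <stddef.h>
#include <stdint.h>

/* First few fields of struct vmcs12, copied from the listing above. */
struct vmcs12_head {
        uint32_t revision_id;
        uint32_t abort;
        uint32_t launch_state;
        uint32_t padding[7];
        uint64_t io_bitmap_a;
} __attribute__((packed));

/* Fail the build if a field is no longer at its expected offset. */
#define CHECK_OFFSET(field, off) \
        _Static_assert(offsetof(struct vmcs12_head, field) == (off), \
                       "vmcs12 layout changed: bump VMCS12_REVISION")

CHECK_OFFSET(revision_id, 0);
CHECK_OFFSET(abort, 4);
CHECK_OFFSET(launch_state, 8);
CHECK_OFFSET(io_bitmap_a, 40);

/* Size of the checked prefix, exposed so callers can sanity-check it. */
size_t vmcs12_head_size(void)
{
        return sizeof(struct vmcs12_head);
}
```

With checks like these, inserting a new field in the middle of the structure
(rather than consuming one of the padding slots) stops the build instead of
producing a kernel that misreads migrated state.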


Authors
-------

These patches were written by:
     Abel Gordon, abelg <at> il.ibm.com
     Nadav Har'El, nyh <at> il.ibm.com
     Orit Wasserman, oritw <at> il.ibm.com
     Ben-Ami Yassor, benami <at> il.ibm.com
     Muli Ben-Yehuda, muli <at> il.ibm.com

With contributions by:
     Anthony Liguori, aliguori <at> us.ibm.com
     Mike Day, mdday <at> us.ibm.com
     Michael Factor, factor <at> il.ibm.com
     Zvi Dubitzky, dubi <at> il.ibm.com

And valuable reviews by:
     Avi Kivity, avi <at> redhat.com
     Gleb Natapov, gleb <at> redhat.com
     Marcelo Tosatti, mtosatti <at> redhat.com
     Kevin Tian, kevin.tian <at> intel.com
     and others.