Posts on Security, Cloud, DevOps, Citrix, VMware and others.
Words and views are my own and do not reflect on my companies views.
Disclaimer: some of the links on this site are affiliate links, if you click on them and make a purchase, I make a commission.
Fortinet has released security updates to address a critical security flaw impacting FortiSwitch that could permit an attacker to make unauthorized password changes.
The vulnerability, tracked as CVE-2024-48887, carries a CVSS score of 9.3 out of a maximum of 10.0.
"An unverified password change vulnerability [CWE-620] in FortiSwitch GUI may allow a remote unauthenticated attacker to modify admin passwords via a specially crafted request," Fortinet said in an advisory released today.
The shortcoming impacts the following versions -
FortiSwitch 7.6.0 (Upgrade to 7.6.1 or above)
FortiSwitch 7.4.0 through 7.4.4 (Upgrade to 7.4.5 or above)
FortiSwitch 7.2.0 through 7.2.8 (Upgrade to 7.2.9 or above)
FortiSwitch 7.0.0 through 7.0.10 (Upgrade to 7.0.11 or above), and
FortiSwitch 6.4.0 through 6.4.14 (Upgrade to 6.4.15 or above)
The network security company said the security hole was internally discovered and reported by Daniel Rozeboom of the FortiSwitch web UI development team.
As workarounds, Fortinet recommends disabling HTTP/HTTPS access from administrative interfaces and restricting access to the system to only trusted hosts.
While there is no evidence that the vulnerability has been exploited, a number of security flaws affecting Fortinet products have been weaponized by threat actors, making it essential that users move quickly to apply the patches.
Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.
from The Hacker News https://ift.tt/aGdC8OL
via IFTTT
Cybersecurity researchers have disclosed details of a now-patched security flaw in the Amazon EC2 Simple Systems Manager (SSM) Agent that, if successfully exploited, could permit an attacker to achieve privilege escalation and code execution.
The vulnerability could permit an attacker to create directories in unintended locations on the filesystem, execute arbitrary scripts with root privileges, and likely escalate privileges or perform malicious activities by writing files to sensitive areas of the system, Cymulate said in a report shared with The Hacker News.
Amazon SSM Agent is a component of Amazon Web Services (AWS) that enables administrators to remotely manage, configure, and execute commands on EC2 instances and on-premises servers.
The software processes commands and tasks defined in SSM Documents, which can include one or more plugins, each of which is responsible for carrying out specific tasks, such as running shell scripts or automating deployment or configuration-related activities.
What's more, the SSM Agent dynamically creates directories and files based on the plugin specifications, typically relying on plugin IDs as part of the directory structure. This also introduces a security risk in that improper validation of these plugin IDs can lead to potential vulnerabilities.
The discovery by Cymulate is a path traversal flaw arising as a result of improper validation of plugin IDs, which could allow attackers to manipulate the filesystem and execute arbitrary code with elevated privileges. The issue is rooted in a function named "ValidatePluginId" within pluginutil.go.
"This function fails to properly sanitize input, allowing attackers to supply malicious plugin IDs containing path traversal sequences (e.g., ../)," security researcher Elad Beber said.
As a result of this flaw, an attacker could essentially furnish a specially crafted plugin ID when creating an SSM document (e.g., ../../../../../../malicious_directory) to execute arbitrary commands or scripts on the underlying file system, paving the way for privilege escalation and other post-exploitation actions.
Following responsible disclosure on February 12, 2025, the vulnerability was addressed on March 5, 2025, with the release of Amazon SSM Agent version 3.3.1957.0.
"Add and use BuildSafePath method to prevent path traversal in the orchestration directory," according to release notes shared by the project maintainers on GitHub.
Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.
from The Hacker News https://ift.tt/XqRjuUE
via IFTTT
Threat actors have been observed distributing malicious payloads such as cryptocurrency miner and clipper malware via SourceForge, a popular software hosting service, under the guise of cracked versions of legitimate applications like Microsoft Office.
"One such project, officepackage, on the main website sourceforge.net, appears harmless enough, containing Microsoft Office add-ins copied from a legitimate GitHub project," Kaspersky said in a report published today. "The description and contents of officepackage provided below were also taken from GitHub."
While every project created on sourceforge.net gets assigned a "<project>.sourceforge.io" domain name, the Russian cybersecurity company found that the domain for officepackage, "officepackage.sourceforge[.]io," displays a long list of Microsoft Office applications and corresponding links to download them in Russian.
On top of that, hovering over the download button reveals a seemingly legitimate URL in the browser status bar: "loading.sourceforge[.]io/download, giving the impression that the download link is associated with SourceForge. However, clicking on the link redirects the user to a completely different page hosted on "taplink[.]cc" that prominently displays another Download button.
Should victims click on the download button, they are served a 7 MB ZIP archive ("vinstaller.zip"), which, when opened, contains a second password-protected archive ("installer.zip") and a text file with the password to open the file.
Present within the new ZIP file is an MSI installer that's responsible for creating several files, a console archive utility called "UnRAR.exe," a RAR archive, and a Visual Basic (VB) script.
"The VB script runs a PowerShell interpreter to download and execute a batch file, confvk, from GitHub," Kaspersky said. "This file contains the password for the RAR archive. It also unpacks malicious files and runs the next-stage script."
The batch file is also designed to run two PowerShell scripts, one of which sends system metadata using the Telegram API. The other file downloads another batch script that then acts on the contents of the RAR archive, ultimately launching the miner and clipper malware (aka ClipBanker) payloads.
Also dropped is the netcat executable ("ShellExperienceHost.exe") that establishes an encrypted connection with a remote server. That's not all. The confvk batch file has been found to create another file named "ErrorHandler.cmd" that contains a PowerShell script programmed to retrieve and execute a text string through the Telegram API.
The fact that the website has a Russian interface indicates a focus on Russian-speaking users. Telemetry data shows that 90% of potential victims are in Russia, with 4,604 users encountering the scheme between early January and late March.
With the sourceforge[.]io pages indexed by search engines and appearing in search results, it's believed that Russian users searching for Microsoft Office on Yandex are likely the target of the campaign.
"As users seek ways to download applications outside official sources, attackers offer their own," Kaspersky said. "While the attack primarily targets cryptocurrency by deploying a miner and ClipBanker, the attackers could sell system access to more dangerous actors."
The disclosure comes as the company revealed details of a campaign that's distributing a malware downloader called TookPS via fraudulent sites impersonating the DeepSeek artificial intelligence (AI) chatbot, as well as remote desktop and 3D modeling software.
This includes websites like deepseek-ai-soft[.]com, to which unsuspecting users are redirected to via sponsored Google search results, per Malwarebytes.
TookPS is engineered to download and execute PowerShell scripts that grant remote access to the infected host via SSH, and drop a modified version of a trojan dubbed TeviRat. This highlights the threat actor's attempts to gain complete access to the victim's computer in a variety of ways.
"The sample [...] uses DLL sideloading to modify and deploy the TeamViewer remote access software onto infected devices," Kaspersky said. "In simple terms, the attackers place a malicious library in the same folder as TeamViewer, which alters the software's default behavior and settings, hiding it from the user and providing the attackers with covert remote access."
The development also follows the discovery of malicious Google ads for RVTools, a popular VMware utility, to deliver a tampered version that's laced with ThunderShell (aka SMOKEDHAM), a PowerShell-based remote access tool (RAT), underscoring how malvertising remains a persistent and evolving threat.
"ThunderShell, sometimes called SmokedHam, is a publicly available post-exploitation framework designed for red teaming and penetration testing," Field Effect said. "It provides a command-and-control (C2) environment that allows operators to execute commands on compromised machines through a PowerShell-based agent."
Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.
from The Hacker News https://ift.tt/Ak2pQ3B
via IFTTT
Microsoft launched its Cybersecurity Governance Council in 2024, and with it, named a group of deputy chief information security officers that ensure comprehensive oversight of the company’s cybersecurity risk, defense, and compliance. These leaders work in tandem with product and engineering leaders across the company to create accountability and advance cybersecurity protection for Microsoft, our customers, and the industry.
In this series, we will introduce these leaders and share more about their background, their role, and their priorities.
Q: Tell us about your current role and responsibilities.
Igor Sakhnov: “As Microsoft’s Corporate Vice President of Engineering for Identity, I lead data and platform engineering along with business-facing initiatives. Since April 2024, I’ve also served as Deputy Chief Information Security Officer (CISO) focusing on identity-related security risks.”
Mark Russinovich: “In my role, I work with a large team to identify and resolve the security risks that come up and evolve under the Microsoft Azure umbrella, the core operating system itself, and the groups that make up the core engineering systems that the entire company depends on. In all these cases, we want the risk mitigations to be durable so once they’re done, the system stays secure and doesn’t have to be revisited every year.”
Yonatan Zunger: “My job is to try and think about all the different ways in which things involving AI can go wrong, make sure that we have good, thoughtful plans for each of those things, and develop the right tools so we can design and run the right incident response for AI issues.”
Q: How did you get your start in cybersecurity?
Igor Sakhnov: “It didn’t really start in cybersecurity. My journey began with a deep interest in understanding how systems work and how they interact and perform at scale. Inevitably, the hard question of security surfaces and the interesting aspects of detection and prevention become top of mind.”
Mark Russinovich: “I’ve always been interested in the way computers and operating systems work. In junior high I started working with computers and figuring out the internals, then went to college and graduate school in it. There was a natural intersection with cybersecurity and operating systems design since both involve understanding complex systems, and I started doing more with cybersecurity.”
Yonatan Zunger: “I started my career as a theoretical physicist. I joined Google, spent years building search and infrastructure, and in 2011 I became the Chief Technology Officer of social. This was a few months before the launch of Google Plus, and I discovered that the hard parts of the job had nothing to do with technology. Instead, all the hard parts were security and privacy, and those were interesting problems to me. It quickly became clear that using these technologies in the right or wrong way can have a huge impact on people’s lives. That stuck with me, and it caused me to genuinely fall in love with the field.”
Q: What does your team do, and how do you work with others across the company?
Igor Sakhnov: “My team is responsible for the work and innovation in the Identity space, building a large-scale enterprise identity system. Over the past year, the point about larger systems being identity-driven has really come to fruition, with the new efforts that leverage identity in the network flows.”
Mark Russinovich: “My team focuses on technical strategy, architecture, and security risk management for the Azure platform, engineering systems, and core operating systems. We work closely with teams across Microsoft to implement durable security measures. I collaborate with emerging technology teams to understand customer requirements and guide Azure’s development while ensuring security remains a priority in all decisions and implementations.”
Yonatan Zunger: “We’re a very horizontal team and our work has six core pillars: AI research, infrastructure, empowerment, evaluation and review, incident response, and policy and engagement. Within those pillars are a lot of people working on a lot of things, from doing safety and teaching it to people, to thoroughly testing and vetting every piece of generative AI software that goes out the door at Microsoft, to bringing AI expertise into incident responses, to engaging with all sorts of stakeholders across the world, and talking and sharing with them but also listening and learning.”
Q: How do you balance the need for security with the need for innovation in your team?
Igor Sakhnov: “Balancing is important and hard. We strive to integrate security into the development process from the outset, shifting left and avoiding interruptions. No matter how innovative the product is, it will not get adapted if it is not secure or not reliable.”
Mark Russinovich: “I don’t think it’s an either or, but it is a balance. The second something may turn into a widget or service that people will depend on, you need security, but if you create such a hardened system that no one can use it, you’ve wasted time. We have a commitment to our customers that security is always in the driver’s seat, but innovation is holding the road map, and we’re delivering on that.”
Yonatan Zunger: “Engineering is the art of building systems to solve problems. If you’re building a system that isn’t safe and secure, you aren’t solving the customer’s problem, you’re building a system that will give them more problems.”
Q: What are some of the biggest cybersecurity misconceptions that you encounter?
Igor Sakhnov: “The desire to make the perfect solution. This is why ‘assume breach’ is the mindset I cultivated with my team. Yes, we must focus on the protection at all costs, and we should expect that any protection will be circumvented. How we detect, reduce the impact, and mitigate in the shortest time is top of mind.”
Mark Russinovich: “The assumption that unless you can prove to me something is not secure, it’s secure. You of course must invest in prevention, but Microsoft has said for close to a decade now that you have to assume any system can and will be breached, so you have to minimize the impact and increase how you detect and mitigate those breaches.”
Yonatan Zunger: “The idea that security, privacy, and safety are three distinct things. They’re not. If you’ve ever seen a security team, say, ‘That sounds like a privacy problem,’ and a privacy team say, ‘That sounds like a security problem,’ and nobody fixes it, you know where this story ends. Artificial boundaries like these are a factory of nasty incidents.”
Q: What’s one piece of advice you would give to your younger self?
Igor Sakhnov: “Shift focus from the local improvements and invest heavily into the influence to shift larger organization for all to move in the needed direction. Microsoft’s Secure Future Initiative is a notable example where a central push supersedes all the local innovation we have done over the years.”
Mark Russinovich: “I don’t look back and think about things that I’ve done wrong, but for those that are just starting out in a career or in life, I’d say this: When you find an area that you’re passionate about, learn that area and the areas around it, and learn one level deeper than you think necessary to be effective. My father gave me that advice and it’s what inspired me to pursue computers.”
Yonatan Zunger: “If you ever find yourself in a relationship where you can’t fully be yourself…leave.”
Microsoft Secure
To see these innovations in action, join us on April 9, 2025 for Microsoft Secure, a digital event focused on security in the age of AI.
Across identity, cloud ecosystems, and privacy, these leaders have independently arrived at similar conclusions: security enables rather than restricts, perfect protection is impossible, but resilience is achievable, and everyone—from engineers to customers—plays a role in defense.
Microsoft’s security transformation isn’t just about technology. It’s about people like Igor Sakhnov, Mark Russinovich, and Yonatan Zunger who demonstrate the diverse leadership needed to strengthen Microsoft’s security posture for our customers and the industry.
Watch for more profiles in this series as we highlight additional deputy chief information security officers, including leaders overseeing cloud infrastructure, customer security, threat intelligence, and more.
RSAC 2025
Learn more about AI-first, end-to-end security at The Microsoft at RSAC Experience. From our signature Pre-Day to demos and networking, discover how Microsoft Security can give you the advantage you need in the era of AI.
To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.
Setting up a VMware Cloud Foundation (VCF) test lab on a single server is a cost-effective way to explore enterprise-grade virtualization and software-defined data center (SDDC) technologies in a controlled environment. This guide provides a step-by-step approach to deploying VCF in a home lab, covering hardware requirements, installation procedures, resource optimization techniques, and best practices for managing a fully functional VCF stack with limited infrastructure.
Whether you’re an IT professional, a VMware enthusiast, or someone looking to gain hands-on experience with VCF, this article will help you build a reliable and efficient test environment.
Hardware Recommendations
Processor
For a VCF lab, it is crucial that the processor supports hardware virtualization—Intel VT-x with EPT or AMD-V with RVI. Without these technologies, you will not be able to run hypervisors (ESXi) inside virtual machines (nested virtualization). Modern multi-threaded Intel Core/Xeon or AMD Ryzen/EPYC CPUs with a sufficient number of cores are suitable. The number of cores affects how many workloads can be run in parallel.
For a complete VCF lab on a single host, a minimum of 12 physical cores (or 24 logical threads with Hyper-Threading) is recommended. In official recommendations, the Holodeck tool (often used for automated VCF deployment) requires 16 cores, but in a test environment, enthusiasts successfully run VCF even on a single host with 12 cores.
When choosing between Intel and AMD, you can focus on budget and power consumption. Many note that AMD processors provide more cores for the same price and are more energy efficient, so they are increasingly used in home labs (see the article VMware Cloud Foundation (VCF) Homelab Hardware Options). The main thing is to make sure that the CPU model is not so outdated that its support disappears in new versions of VMware (you can check compatibility in the VMware knowledge base on CPU support).
Memory (RAM)
VCF includes many components, so it requires a lot of memory. The optimal amount of RAM is 128 GB or more. With this amount, you can run all the necessary virtual machines (4 ESXi nodes, vCenter, NSX Manager, SDDC Manager, etc.) in minimal configurations. Enthusiasts report that by using the smallest VCF component deployment templates (vCenter – Tiny, Log Insight – xSmall, etc.), they were able to freely use VCF on a host with 128 GB of RAM.
If 128 GB is not available to you, the minimum possible option is about 64 GB, but in this case you will have to significantly reduce the settings and not run all components at the same time. Remember that memory usually runs out fastest when experimenting, so it is worth installing the maximum possible RAM for your platform. It is also useful to check the support for the Memory Tiering function (in vSphere 8) – if you have a high-speed NVMe drive, the hypervisor can use it as an additional level of memory, which partially compensates for the lack of RAM.
Storage
The performance and capacity of the storage subsystem greatly affect the operation of the VCF lab. It is recommended to use solid-state drives (SSD or NVMe). The total required capacity is from 2 TB and up, since you will have to store images of several virtual ESXi and all the control VMs. A high-speed NVMe SSD is ideal, especially for data where IOPS is important (for example, for the vSAN cache). A good strategy would be a combination: one fast NVMe disk for the main workloads (including VM files of nested ESXi nodes and their virtual disks for vSAN), and an additional SATA SSD or HDD for less critical data (ISO images, backups, etc.).
Jimmy Mankowitz’s lab, for example, used 1 TB NVMe SSDs for vSAN capacity disks and ~960 GB SATA SSDs for cache disks. In our case, when all virtual nodes are on one physical server, we can do it more simply: create two virtual disks for each nested ESXi – a small one (say, 40-50 GB) for the vSAN cache and a larger one (200-300 GB) for the vSAN capacity.
Note: When placing several virtual disks on one physical drive, use thin provisioning to avoid allocating the entire volume at once. This will allow you to save space, since not all virtual disks will be filled in the test environment. In addition, thin disks together with the compression and memory deduplication mechanism at the ESXi level will help optimize resource consumption.
Network Adapters
For a single host, one physical network adapter is enough, but it is better to have 2 or more ports. One interface can be dedicated to management and external access, and the second – to internal traffic (vSAN, vMotion, NSX overlay). In real VCF deployments, high-speed channels (10 Gbps) are required, but for a home lab, 1 Gbps ports are usually used.
If possible, you can install an inexpensive 10 Gbps card (for example, SFP+ or 10Gbase-T) – this will speed up vSAN and NSX data exchange, but then the home switch must also support 10 Gbps. In extreme cases, 1 Gbps will do, it’s just that I/O operations may be slower. Make sure the network adapter you select is compatible with VMware ESXi (Intel PRO/1000, I350, X520/X710 and similar cards are usually well supported by drivers, as are Broadcom/Emulex/Marvell server adapters).
The network configuration should support VLAN (802.1Q), since the VCF lab will have to separate traffic by type (management, vSAN, vMotion, NSX, etc.). If your switch does not support VLAN, you can limit yourself to one common subnet, but this does not reflect best practices and will make it difficult to emulate a real environment. In extreme cases, all networks (Management, vMotion, vSAN, etc.) can be logically distributed across different IP networks, but going through one unsegmented port is not ideal from a security and debugging point of view. It is recommended to at least configure special ports with the VLAN 4095 tag (trunk mode) on the ESXi virtual switch to isolate the traffic of virtual segments within the host. We will return to the network configuration later, during deployment.
Bottom line on hardware: the optimal setup for a single machine would be a powerful server with 12-16 cores, 128GB RAM, ~2TB SSD/NVMe, and dual gigabit (or better yet, 10Gbps) NICs. This is significantly lower than the requirements for production deployment (in production, VCF usually requires 4 servers only for the management domain and at least 3 more for each load domain (for example, as it is implemented above based on two physical servers), but by compacting resources, we can launch VCF even on a home single server configuration. For example, it is known that with all lab optimizations enabled, VCF runs on a single host with 12 CPUs and 128 GB of RAM. If your budget is limited, you can start with 64 GB of RAM and fewer cores, but then you may want to consider deploying only a few components or use the single-component deployment trick (more on that below). In any case, make sure your hardware is properly cooled (the load will be serious) and has an uninterruptible power supply (preferably a UPS, so that the long deployment process is not interrupted by a power surge).
Step-by-Step Deployment of all VCF Components
Deploying VMware Cloud Foundation in a lab environment is a non-trivial process that involves nested virtualization (running ESXi as a VM inside a hypervisor), setting up networks, and using a special Cloud Builder tool to automatically install components. Below is a step-by-step plan for how to set up all of the VCF components on a single physical server.
Step 1: Installing the Base Hypervisor on the Physical Host
First, install a hypervisor on the physical server itself, which will act as the L1 layer – a platform for all other virtual machines. The best choice is VMware ESXi (the latest version compatible with VCF, for example 8.0 U3). ESXi uses resources most efficiently and officially supports VMware nested virtualization. An alternative option is to use VMware Workstation Pro on an existing OS (for example, Windows), but this will give extra overhead and complicate the network part. Therefore, it is recommended to dedicate the machine entirely to ESXi. Install ESXi on your server (you can use a small separate SSD / USB drive) and perform basic configuration: assign an IP address for the Management Network (in your home network or a separate one), enable SSH access, etc.
After installing ESXi, make sure that virtualization options (Intel VT-x/VT-d or AMD-V/Vi) are enabled in the server BIOS settings – without them, ESXi will not allow you to run “virtualized” guest hypervisors. If everything is OK, connect to ESXi via vSphere Client from another computer and prepare the host for the next step.
Step 2: Configure networking at the physical ESXi level (L1)
Before creating nested hypervisors, you need to configure virtual switches on the physical host to provide communication between nested-ESXi (layer L2) and the outside world. Create a standard vSwitch on ESXi (or a distributed vDS if you will have vCenter for L1, but for one host the standard one is enough) and attach a physical NIC to it, which will be used for lab traffic. This vSwitch will serve all VLANs used inside the VCF. It is recommended to configure a Trunk port with VLAN ID 4095 – this means that the port group will be able to pass traffic from all VLANs (a virtual analogue of the trunk port of the switch). For example, you can create a port group of the type “VCF-Transit” with VLAN 4095 on vSwitch0.
Enable advanced security settings on this port group:
Promiscuous Mode (listening mode for all packets) = Accept
Forged Transmits = Accept
MAC Address Changes = Accept
These parameters are required for the nested ESXi to work correctly, so that their traffic can freely pass through the upper-level virtual switch. Without the Promiscuous mode, virtual machines inside the nested ESXi will not be able to interact, since the external switch will discard “unknown” MAC addresses. Also set the MTU to 9000 (jumbo frames) on this vSwitch if you plan to use NSX-T – NSX overlay networks benefit from a larger MTU, reducing overhead.
If the server has several physical NICs, it is reasonable to separate the traffic: for example, use NIC1 for management (Management, access from the home network), and NIC2 for everything else (vSAN, vMotion, NSX). In the previously mentioned lab on 2 physical hosts, the author connected a 1-gigabit port for management, and 10-gigabit ports for vSAN/vMotion. In our single-host lab, we can run everything through one port with VLAN, or if there are 2+ ports, we can use them for different vSwitches (for example, vSwitch0 – Management on NIC1, vSwitch1 – vSAN+vMotion on NIC2). However, to simplify things, we can leave one combined switch – after all, we physically have one host, and the traffic distribution is logical.
DNS and services. At this stage, also take care of infrastructure services: DNS and NTP. VCF is very demanding of correct DNS pairing – all main components must have resolvable names (A-record) and reverse records (PTR) on IP. If your home network already has a DNS server – write the necessary records there (for ESXi hosts, for future vCenter, NSX Manager, SDDC Manager, etc. with invented domain names). If not – you can deploy a local DNS server as a VM (for example, Windows Server (AD DS) or Linux + BIND) on the same host or use the hosts file on Cloud Builder (temporary solution). Also make sure that all systems will have access to a single NTP server (you can use Internet NTP or set up your own). In the lab scenario, Cloud Builder can even act as a DHCP, DNS and BGP server during automatic installation, but with manual preparation it is better to configure these things by yourself.
Step 3: Creating Nested ESXi Nodes (L2)
Now let’s create four virtual machines that will emulate physical Management Domain VCF nodes (there are 4 in the standard architecture). We will install the ESXi 8.x hypervisor on each VM. This process includes several stages:
Create a VM. In the vSphere Client (on your physical host), create the first virtual machine. Specify the guest OS as VMware ESXi (the appropriate version). Assign resources: at least 4 vCPUs (preferably 6-8 vCPUs, if the total number of cores allows), 16-32 GB of RAM, enable the option to transfer virtualization to the guest OS (Expose hardware virtualization to guest). The latter is critical, otherwise ESXi will not start its virtual machines inside. Add a disk for ESXi itself (for example, 10-20 GB). Then add two more disks: one smaller (e.g. 30-50 GB) for vSAN cache, and one larger (100-200 GB) for vSAN capacity storage. Be sure to set these disks as “Independent, persistent” and without snapshots enabled if you plan to use automated deployment (VCF Lab Constructor may require this).
Network. Connect the VM’s network adapter to the trunk (VCF-Transit) portgroup created in step 2. In the VM’s network settings, you can specify VLAN 4095 or “All (4095)” to accept all VLANs (if the vSwitch is not trunk, but a portgroup with a specific VLAN, then specify the corresponding VLAN ID for the Management network). It is usually easier to use trunk.
Install ESXi. Connect the VMware ESXi installation CD ISO image (the same version as on the physical one, or a compatible one, such as ESXi 8.0) to the VM CD drive and launch its console. Proceed with the ESXi installation as usual, selecting the first virtual disk (10–20 GB) as the target disk. After installation, configure the Management Network: specify a static IP from your management range (e.g. 192.168.70.11/24 for host1, .12 for host2, etc. if you selected the 192.168.70.0/24 subnet for management), mask, gateway, DNS server, and a unique hostname (FQDN, such as esxi-mgmt-1.lab.local). Make sure that each nested ESXi is pingable and resolves by name.
Repeat for 4 nodes. Create three more similar VMs for ESXi2, ESXi3, ESXi4. You can speed up the process: after configuring the first one, make a template out of it and deploy the rest using the template, then adjust the network settings manually. Or you can use automation scripts. For example, there is an Ansible playbook that automatically creates and configures nested ESXi. But to understand the process, you can do it manually (installation of 4 nodes will take ~30 minutes with a pre-downloaded ISO).
Checking the connection between nodes. After starting all four ESXi VMs, make sure that you can connect to them via vSphere Client (each via IP). They are still independent, not in a cluster – this is normal. It is important that each of them sees each other via the management network. If they are in the same VLAN and subnet – ping (from the DCUI page or via ssh, allowing SSH on them). Also check the availability of DNS with nested-ESXi (nslookup hostname of another host) and NTP (ntpq -p).
Note: although the standard for VCF is a 4-host management domain, it is allowed to reduce the number of nodes in the lab. There is a known trick from VMware engineers that allows you to deploy VCF even on a single ESXi node – Cloud Builder can be configured to ignore the 4-host check. In this case, it will deploy a 1-node vSAN cluster (without fault tolerance). However, such a scenario is not officially supported by VMware. However, if resources are very limited, you can try installing only one nested-ESXi and use a special configuration (more on this later, when we launch Cloud Builder). In this article, we proceed from the classic scenario (4 nodes), since it is as close as possible to the real architecture and allows you to study all the components.
Step 4: Deploy VMware Cloud Foundation (Cloud Builder and SDDC Manager)
Once the 4 ESXi virtual hosts are ready, the next step is to deploy the Cloud Foundation components themselves: vCenter, NSX-T, SDDC Manager, etc. Installing them manually one by one is an extremely labor-intensive task. That’s why VMware provides a special automated tool, Cloud Builder. This is a ready-made virtual appliance (OVA template) that needs to be deployed and configured so that it automatically raises the entire VCF infrastructure.
Obtaining Cloud Builder. Go to the VMware portal (my.vmware.com) and download the VMware Cloud Foundation Cloud Builder OVA template of the required version (compatible with the VCF version you are planning, for example, VCF 5.1 -> Cloud Builder 5.1). Please note that downloading requires an account with download rights (in the vExpert program or via a trial invitation). In some cases, the Cloud Builder OVA can be obtained by contacting a VMware representative.
Deploying Cloud Builder. Import the Cloud Builder OVA onto a physical ESXi host (L1) – NOT inside one of the nested ESXi’s! Cloud Builder needs to be able to reach all four ESXi’s, i.e. be on the same physical management network. So deploy it directly onto your primary ESXi host. Give Cloud Builder VMs the following resources: ~4 vCPU, 18–20 GB RAM, 100 GB disk. Connect Cloud Builder’s network adapter to the same vSwitch/portgroup as your nested ESXi’s management network (e.g. VLAN 70 or trunk). Give Cloud Builder a static IP on the Management network (e.g. 192.168.70.10) and the appropriate hostname (cloudbuilder.lab.local). After turning on, Cloud Builder will take some time to initiate its services, then the web interface will become available via HTTPS (https://<IP_CloudBuilder>/) – make sure you can open it.
Preparing deployment parameters. Cloud Builder requires a configuration file that describes our entire future infrastructure: IP addresses, passwords, VLAN ID, host names, etc. This file can be prepared as an Excel Workbook or JSON. It is easier to use a ready-made Excel template (downloaded from VMware) – Deployment Parameter Workbook, fill it out, and then import it into Cloud Builder (the service itself converts it to JSON).
In the workbook you need to specify:
List of ESXi hosts for the Management Domain (their IP, administrator passwords). We have 4 of them (or 1, if you decided on an unsupported scenario).
Planned addresses for VCF managers: vCenter (MGMT), NSX Manager, Edge (if required), SDDC Manager, PSC (if separate, but new versions of vCenter already use embedded PSC), and so on.
VLAN ID for different networks: Management, vMotion, vSAN, NSX-T Overlay, NSX-T Edge Uplink, etc. (for example, Management VLAN 70, vMotion 71, vSAN 72, Overlay 74, as in the example below).
IP pools for NSX-T TEP (Tunnel Endpoints) on the hosts.
Data on DNS, NTP servers.
Passwords for all components (ESXi, vCenter SSO, NSX admin, etc.).
License keys for vSphere, vSAN, NSX, if you have already received them.
However, starting with VCF 5.1.1, the License Later mode is available – you can skip specifying keys at the deployment stage, working in trial mode. To do this, add the flag “deployWithoutLicenseKeys”: true to JSON, or in the Cloud Builder interface in the installation wizard, you mark “License Now: No” and leave the fields blank. We will use this opportunity to not immediately link licenses (we will get a 60-day trial).
An example of a network diagram and addressing. For clarity, plan a small network diagram. For example:
Management Network: VLAN 70, subnet 192.168.70.0/24. It will contain the IPs of all ESXi (192.168.70.11–14), Cloud Builder (.10), vCenter (.20), SDDC Manager (.21), NSX Manager (.22), and Jump-host VM (.100, more on that later).
vMotion Network: VLAN 71, subnet 192.168.71.0/24, IP for vMotion vmk on each ESXi (.11–.14).
vSAN Network: VLAN 72, subnet 192.168.72.0/24, IP for vSAN vmk on each ESXi.
NSX-T Host Overlay (TEP): VLAN 74, subnet 192.168.74.0/24, IP for TEP on each ESXi (Cloud Builder will assign/configure it automatically, you only need to specify the range). To save money, you can set one TEP per host.
NSX Edge Uplink (if you will have Edge): VLAN 75 (or use your external network without VLAN if you will connect Edge to your home router).
DNS/NTP: for example, DNS = 192.168.70.1 (if you already have DNS on .1 or your home router), or the address of your local DNS server.
If you are not using separate VLANs, you can specify the same VLAN (or 0 if untagged) for Management, and set vMotion/vSAN as different IP networks but without VLAN tags – however, Cloud Builder may require VLAN IDs for different networks, so it is better to configure as expected.
Launch deployment (Bring-Up). After filling out the Workbook, upload the file to Cloud Builder. It will check the parameters (Validation). If there is an error somewhere (especially DNS!) – fix it. At this stage, if you are going to deploy less than 4 nodes (for example, 1 node), perform a hack: log in via SSH to the Cloud Builder VM itself and execute:
This will reduce the minimum management domain node requirement to 1 (more details here). After that, verification with one node should proceed without problems.
Next, in the Cloud Builder web interface, click “Bring-up” to start the automatic deployment of VCF. The process will take from 2 to 4 hours (depending on the performance of your host). Cloud Builder will automatically do the following:
Install the required configuration on the specified 4 ESXi (including combining them into a vSphere cluster and enabling vSAN on them). vSAN is used as the main storage for the management domain, so Cloud Builder will configure disk groups on each node (taking one disk as a cache and one as capacity – the same virtual disks that we added). If you deployed one node, it will raise a single-tier vSAN cluster (without replicas).
Deploy a vCenter Server Appliance (VCSA) virtual machine for the Management Domain on one of these hosts. VCSA will be deployed with a Tiny size format (specified in the lab JSON) to minimize resource consumption.
Deploy and configure NSX-T Manager. Usually, a cluster of 3 NSX Manager Appliances is deployed for fault tolerance, but in JSON you can specify the Compact = true mode, then there will be one NSX Manager (to save resources). Cloud Builder will connect NSX to vCenter, register the corresponding VIB modules on the hosts (NSX-T Data Plane). All ESXi nodes will become prepared transport-nodes in NSX-T fabric, with TEP interfaces raised on VLAN Overlay.
Deploy SDDC Manager – the central management machine of Cloud Foundation. SDDC Manager is a separate virtual appliance, which takes on the coordination of all components (monitoring, adding new domains, updating, etc.). It will also be deployed in one of the ESXi management clusters.
(Optional) Deploy additional components if checked: for example, the vRealize Log Insight (vRLI) log management platform for the Management Domain, or load balancers (NSX ALB), and others, depending on the VCF version. In older versions (VCF 3.x), vRealize Suite (logging, monitoring) was installed automatically, in newer versions it is optional. In our case, if the goal is a minimal working VCF, there is no need to deploy anything extra.
Cloud Builder automates the interaction between components – it uses the API of vCenter, NSX, and others to tie them together. For example, it registers NSX Manager under the management of SDDC Manager, creates a record in SDDC Manager about the new Management Domain, etc. When the process is complete, you will have a fully functioning Management Domain VCF. This means:
A cluster of 4 ESXi with vSAN enabled (this is where everything will be stored now – the VM of vCenter itself, NSX, SDDC Manager).
vCenter Server, managing this cluster (you can specify your own Single Sign-On domain, for example vsphere.local).
NSX-T – the manager is running, all 4 hosts are added to NSX Fabric and united by a virtual distributed network (NSX will create an internal vSwitch – N-VDS or use vSphere Distributed Switch – in new versions NSX-T integrates with VDS).
SDDC Manager – the web interface will be available at its address and is ready to accept further commands (for example, deploying additional Workload Domains, lifecycle management, etc.).
Step 5: Testing and further configuration
After a successful bring-up, go to the component interfaces:
SDDC Manager – the main VCF management interface. Go to the SDDC Manager address via HTTPS (specified in the configuration, port 443) under the admin account/password (which you set). You will see a control panel that displays the system status, resources and domains. For example, there will be one Management Domain with a list of resources (hosts, cluster, used CPU/RAM) and the ability to add a Workload Domain. We will consider the interface in more detail below.
vCenter Server – connect via vSphere Client. Make sure that the Management cluster with 4 hosts is visible inside vCenter, vSAN is enabled (one shared datastore). Check that the status of cluster services (vSphere HA, DRS) and vSAN is green. In our lab case, the cluster can be marked as not satisfying the requirement (for example, DRS in Partial mode, if there are few resources), but the main thing is operability.
NSX-T Manager – open the NSX web interface. Log in as administrator. Make sure that all 4 ESXi are displayed as Transport Nodes, that an Overlay Segment is created (for example, for a Tier-0 network or just Host TEPs). In fact, NSX in the Management Domain is needed for future workload networks and for auxiliary networks (for example, Application Virtual Networks for Aria components). At this stage, you can leave NSX-T as is – VCF has configured everything by default.
Network connectivity – check that all new VMs (vCenter, SDDC, NSX) are accessible over the network, pinged by IP and names. It is especially important that SDDC Manager can access vCenter and NSX via DNS, and vice versa, otherwise warnings will appear.
Now you have a working VCF base – Management Domain deployed. If you had more hosts, at this point you could create Workload Domains via SDDC Manager – i.e. additional ESXi clusters for specific tasks (assuming they require separate physical servers). In our case, it is not possible to add a workload domain on a single physical resource (there are no free nodes).
Therefore, we will use a single management cluster for test workloads as well (this is called Consolidated Architecture, when managed workloads run together with management VMs on a single cluster). SDDC Manager will recognize this scenario and mark the domain as Management/VI (consolidated).
Adding virtual machines with workloads. You can deploy any test VMs (e.g. additional AD domain controller, jump-host for access, test servers) inside the resulting infrastructure. You can do this through the familiar vCenter interface, i.e. VCF does not limit the capabilities of vSphere. Keep in mind that all new VMs must be stored on vSAN (since there are no alternative storages in the management cluster). Also, make sure that they are connected to the right networks. For example, if you want a test VM to be accessible from your home network, the easiest way is to connect it to the same management port group that is forwarded to a physical NIC (in this case, it will receive an IP from your home network or VLAN). A more isolated way is to configure a segment and Edge in NSX, but this is beyond the scope of the basic deployment.
We have successfully installed all the main components of VCF. Next, we will analyze how the architecture of this lab is arranged and how the components interact.
Architecture and internal communication of VCF Components
Overview of VCF components. VMware Cloud Foundation is a full stack software-defined data center (SDDC), integrating hypervisor, storage and network virtualization, and management automation tools. VCF includes the main VMware products: vSphere (ESXi + vCenter for compute virtualization), vSAN (software-defined storage on local disks), NSX-T (network virtualization, including distributed switch, routing, firewalls, etc.) and SDDC Manager (management orchestrator). Tools from the vRealize/Aria family (Log Insight for logs, Operations for monitoring, Automation for deployment automation) may also be included, but they are secondary. Simply put, VCF provides everything you need to deploy a private cloud based on the vSphere infrastructure.
The main logical unit in the Cloud Foundation architecture is the domain (Workload Domain). A domain is an isolated group of resources, typically corresponding to a vSphere cluster (or clusters), with its own set of compute, network, and storage resources. VCF separates the Management Domain and the VI Workload Domains.
Management Domain – a special domain that is deployed first and contains infrastructure services: SDDC Manager, vCenter Server for infrastructure management, NSX Manager for network management, etc. This domain serves as the “heart” of the entire system – it hosts management components for all other domains. Management Domain is deployed on a minimum of four ESXi nodes clustered with vSAN and NSX. Why 4? This ensures fault tolerance for vSAN (minimum 3 nodes + 1 witness for quorum) and reserves resources for services. All vCenter and NSX Manager needed for other domains are also hosted in the Management Domain. In the context of our lab, Management Domain is exactly the cluster of 4 nested ESXi that we deployed. We used vSAN inside it, as required by the architecture (in Management Domain, only vSAN can be the primary storage).
VI Workload Domain – a domain for user workloads (application virtual machines, VDI, containers, etc.). Typically, each such domain is a separate ESXi cluster with its own vCenter Server and NSX (or NSX can be shared between domains by design). For example, you can create a 3-node Database domain for a DBMS, a 4-node VDI domain for virtual desktops, etc. VCF allows you to isolate them from each other at the management and resource levels. However, in a small environment, all the roles can be combined.
Consolidated Architecture. In minimal lab scenarios, it is assumed that the Management Domain also plays the role of the Workload Domain, i.e. managed loads are launched on the same cluster. This is called a consolidated architecture. It is typical for POC and small deployments – in this case, the system will have only one vCenter (managing all nodes) and one NSX manager, and the SDDC Manager still sees one Management Domain. Our single-host lab is exactly related to the consolidated scenario: we will launch test VMs on the management cluster, since we do not have other clusters.
SDDC Manager. It acts as an add-on that coordinates several vCenters (each serving its own domain/cluster), which in turn manage the resources of ESXi clusters. In a standard (split) architecture, at least two domains – Management and one Workload – have separate vCenters, isolating the loads from management. In our case (consolidated), in fact, one vCenter and one domain combine both functions, but conceptually the roles remain the same.
Through SDDC Manager, the administrator deploys new domains, monitors the status of equipment, as well as the life cycle (updates, patches) of all VMware products as part of the VCF. SDDC Manager monitors both physical resources (servers, network, storage) and logical ones (clusters, resource pools). In our case, SDDC Manager is deployed as a VM on the management cluster and registered with vCenter and NSX. It “knows” about all four nodes combined into one domain.
In SDDC Manager we can perform further steps: for example, if there are additional hosts – commission (add to inventory) and then create a new Workload Domain via the wizard. SDDC Manager is also used to run updates – it can update ESXi, vCenter, NSX on all nodes in turn, in a coordinated manner, which simplifies the life of the administrator. We can say that SDDC Manager is the brain of VCF, uniting different components into a single whole.
vCenter Server. Each domain (management or workload) is managed by its own vCenter Server instance. In our lab, we have one vCenter deployed (for the management domain). It performs standard functions: ESXi resource management, cluster creation/configuration, VM distribution, migrations, monitoring, etc. It is interesting to note that one vCenter license is enough for all vCenter instances within Cloud Foundation, i.e. VCF comes with the right to several vCenters, but this is not so important for us. Through vCenter, we can manage vSphere settings that Cloud Builder made automatically, for example, check the Distributed Switch through which the hosts are connected. In VCF 4.x+, NSX-T can integrate with vCenter (VDS 7.0), so that the logical NSX switch can be represented as a portgroup in vCenter.
NSX-T and networking. VMware NSX-T Data Center is a VCF component responsible for network virtualization and security. NSX provides a distributed virtual switch (analogue of standard vSwitch, but common to all hosts) and allows you to create overlay networks independent of the physical topology. In Cloud Foundation, NSX-T Manager is located in the Management Domain and manages the network for all domains (by default). That is, if you create a new Workload Domain, you can either connect it to an existing NSX cluster (the one that is deployed), or deploy a separate NSX Manager cluster for the new domain – at your discretion (VCF allows you to share NSX between domains or separate them). In our case, there are no additional domains, only one NSX Manager. It has already prepared transport nodes (ESXi) and created the necessary logical segments.
What is NSX used for in the Management Domain? First of all, to provide a network for possible Aria Suite components (Aria Automation, Orchestrator, etc.), which VMware calls Application Virtual Networks (AVN) – they recommend placing them on isolated NSX segments over BGP. If such components are not used, NSX is not actively used yet. However, we can use NSX for our own VMs: for example, create a logical segment (Layer2) and connect test VMs to it – then these VMs will be able to interact in an isolated network that exists only inside NSX. You can even raise Tier-1 and Tier-0 NSX routers to organize the exit of these segments to the outside (via Edge).
Let’s note the main points of the network architecture in our Cloud Foundation lab:
Each ESXi in the management domain has at least two virtual network interfaces (VMkernel): one Management (it is also used for vCenter traffic, NSX control-plane) and one for vSAN. An interface for vMotion is also often allocated, but Cloud Builder could combine vSAN and vMotion on one subnet or separate them – it depends on the settings. Let’s say we have Management on VLAN 70, vMotion on VLAN 71, vSAN on VLAN 72. This is a traditional separation to ensure performance.
Distributed Switch (vDS). Cloud Builder most likely created a distributed switch for the management domain, on which all 4 hosts hang. It has port groups configured for Management, vMotion, vSAN with the corresponding VLANs. For example, VDS-MGMT with a port group Management (VLAN 70), vMotion (71), vSAN (72). In the mentioned SDDC Manager console, you can see these networks at the domain level.
NSX-T Overlay. Cloud Foundation will configure NSX-T by creating a Transport Zone and connecting hosts to it. Each ESXi received a Tunnel Endpoint (TEP) on its VDS (or N-VDS) – this is a VMkernel interface through which it will exchange Geneve tunnel traffic with other NSX nodes. These TEPs are in a separate VLAN (say, VLAN 74) and subnet (in the example, 192.168.74.0/24). If there are several hosts, a mesh network of tunnels is already established between them (but since none of the hosts have user segments yet, the tunnels are idle). If there is only one host, the tunnels do not go anywhere, but NSX still requires you to configure the TEP.
Tier-0 / Edges. Cloud Builder could also deploy template Edge (NSX Edge) VMs if you specified it in the settings (for example, to organize routing and connect to an external network). Edge can work in the lab on a single host, but remember that Edge is a regular VM, which is better to place on different nodes than it serves (in production on a separate Edge Cluster). In our case, if you need access of NSX logical networks to the outside, you can deploy one Edge VM on the same cluster. It connects with one interface to the overlay (via TEP, as a host) and with another to the external network (for example, VLAN 70 or another, which has access to the Internet). Configuring Edge and Tier-0 router is beyond the scope of this article, we will only mention that BGP is often used to dynamically exchange routes between Tier-0 and a physical router. If BGP is complicated for you, you can configure a static route or just NAT on the Edge.
Hardware layer and external access. In a real infrastructure, VCF assumes that you have at least two physical switches (for fault tolerance) and a top-level router that will border NSX (via BGP). In a home lab, the router role is usually played by your home router or L3 switch, and sometimes no complex topology is required at all. However, think about how you will access the components (vCenter, SDDC Manager, NSX) from your regular environment (for example, from a laptop).
If you placed the Management network (all .70.x addresses) inside an isolated VLAN that the home router does not know, then the easiest way is to use Jump Host.
Jump Host is a virtual machine (Windows or Linux) with two network interfaces: one connected to the VCF Management network, the second – to your home network. On Jump Host you can install VPN, RDP or just a browser and the necessary client applications (vSphere Client) and thus, while at home, log in to Jump Host and from there to an isolated lab. The diagram below shows how Jump Host (Windows 10) connects simultaneously to VCF LAN (e.g. 10.0.0.0/24) and a local network (192.168.0.0/24):
If you made the Management network the same as your local one (for example, everything in 192.168.0.x), then the components are directly accessible, but there is a risk of IP and name overlap. A cleaner way is to allocate separate subnets and VLANs and use either routing or Jump Host. In any case, to simplify the initial setup, you can quickly launch one Windows VM on the same physical host (or on a vSAN cluster), connect it to two networks and thus solve the access issue. Resource management. Since we have one physical machine, all components share its CPU, memory and disk. vSphere (vCenter) distributes these resources using standard algorithms: DRS (if enabled) balances the load between hosts, HA ensures VM restart in case of failures. But in a lab with one host, there is no fault tolerance – if the host goes down, everything will shut down. Therefore, plan downtime manually during updates/reboots. It is useful to set up resource pools in the cluster – for example, allocate a “Management VMs” pool and set increased shares (priority) for CPU and memory for it. By placing SDDC Manager, vCenter, NSX in this pool, you protect them from potential “squeezing out” by test VM resources. You can also manually limit the consumption of test VM resources (via CPU/RAM limits), so that even under load they do not eat up too much of the infrastructure.
Cloud Foundation architecture is designed to provide unified management: all components are integrated. Example: when you run an upgrade procedure via SDDC Manager, it queries vCenter via API for the ESXi version, then downloads the ISO, puts each host in Maintenance Mode in turn, updates ESXi, reboots, checks vSAN, then updates vCenter Appliance, NSX Manager – and so on down the list until the entire stack upgrades to the new version. Such end-to-end operations are one of the advantages of VCF over a heterogeneous self-assembled environment.
By deploying a VCF lab, you essentially have a miniature copy of a private enterprise cloud. Below we will look at recommendations for operating this environment in resource-constrained conditions.
Best Practices for a VCF Lab
When deploying Cloud Foundation on home hardware, we go far beyond the scope of “normal” use (with dozens of servers and powerful storage systems). Therefore, it is important to apply optimizations and follow best practices so that the system works acceptably and is easy to administer.
Resource Optimization:
Minimal component configurations. When we deployed VCF, we already laid down reduced templates for all possible VMs. Make sure that in the JSON file/Workbook you selected the smallest sizes for vCenter (Tiny), NSX Manager (Medium or Compact – NSX-T 3.x supports Compact mode = 1 manager/1 controller), SDDC Manager (luckily, it does not require much – usually 4 CPU/16 GB). Do not install optional packages that you do not need (vRealize Suite Lifecycle Manager, vRA etc.) – you can always add them later. This approach allowed us to fit Cloud Foundation into 128 GB of RAM. If you see that some VM does not use all the allocated resource (for example, vCenter Tiny actually consumes 6 GB out of 12 GB of RAM), you can fine-tune the resources – in the lab, it is allowed, for example, to reduce vCenter RAM to 8 GB (although this is below the minimum requirements) or NSX Manager to 12 GB. Of course, this is not officially supported, but it can save memory. Do this carefully and one VM at a time, monitoring stability.
Deduplication and shared memory. ESXi hypervisor can use TPS (Transparent Page Sharing) – combine identical memory pages between VMs. In our case, we have several similar OSs at once: four ESXi hosts – their hypervisor code in memory is identical, therefore, ESXi on L1 will be able to deduplicate some of this data. Similarly, three NSX Managers (if you deployed a cluster) will have many common pages (these are PhotonOS Linux + Java). This mechanism can save several gigabytes of RAM automatically. The main thing is not to disable TPS. Remember that starting with vSphere 6.0, TPS between different VMs is limited by default (for security reasons), but in the lab you can enable inter-VM sharing (Parameter Mem.ShareForceSalting on ESXi = 0). This is not safe for a multi-tenant environment, but at home it is quite suitable.
Virtual memory swap and ballooning. When RAM is low, ESXi uses the balloon driver and swap. In a nested environment, ballooning is unlikely to work (since VMware Tools do not work in ESXi VMs), but swap will. Therefore, if you have, for example, 96 GB, and all VMs in total want 110 GB, ESXi will swap 14 GB to disk. This will slow down these VMs significantly. It is recommended to avoid constant swap: it is better to manually reduce their memory or turn off some. Do not set Memory Reservation on VMs in this case, since this will only make things worse (ESXi will have to guarantee memory by displacing other VMs entirely to swap). Let the hypervisor dynamically balance everything. If you notice that swapping is active, try disabling one NSX Manager (if there are three in a cluster, one can be temporarily disabled) or reduce its memory size.
Limit background processes. In the lab vCenter, it is worth disabling unnecessary services, for example, vSAN Performance Service (if you do not need detailed tracking of vSAN metrics), CEIP/telemetry, vSAN CLOMD elasticity. Each such service slightly loads the system – you can disable them via UI/API. For NSX-T, it is advisable not to enable functions like IDPS, URL filter, etc., which will simply eat up the CPU.
Thin disk provisioning and space saving. We have already used thin disks for nested ESXi. Similarly, all VMs (vCenter, NSX, SDDC) are thinly deployed by default. Monitor the vSAN datastore filling – do not overfill it. vSAN usually starts complaining when it is >80% full. Make sure that unnecessary ISOs and VM snapshots do not fill the storage. If necessary, connect additional storage: for example, NFS NAS. You can connect a NAS directly to ESXi L1 (e.g. Synology NAS) and store ISO images or even secondary VMs (not participating in VCF) on it. This takes some of the load off vSAN.
Use PowerCLI automation for pause/start. If the resource is at its limit, you can develop a script that, say, stops the NSX Manager cluster (2 of 3 nodes) when you are not testing the networks, and starts them when needed. Or, for example, you have a VDI Workload Domain – you keep it turned off until you need it. By writing a couple of PowerCLI/PowerShell scripts and scheduled tasks, you can dynamically “rest” the resources. The main thing is not to turn off all NSX Managers or vCenter at the same time, otherwise the SDDC Manager will panic. But using one NSX manager out of three is possible.
Test environment administration and management:
DNS and certificates. Make sure there are no DNS issues – VCF has a lot of name integrations. If you suddenly change the IP/name of something, correct DNS/hosts everywhere. For simplicity, you can add entries for the main nodes (vcenter.lab.local, sddc.lab.local, etc.) to the hosts file on your production machine to quickly access them. In terms of certificates, you can leave self-signed ones in the lab (they were generated during installation). If you want to experiment, you can set up an internal CA (for example, Microsoft AD CS) and replace certificates via VMware Certificate Manager or API, but this is not necessary and quite complicated.
Monitoring. Enable resource exhaustion alarms (CPU, memory, space) in vCenter – they will help you react if something suddenly starts growing uncontrollably (for example, log files). It is useful to deploy vRealize Log Insight (aka now Aria Operations for Logs) – it is usually included with VCF. In the lab, you can install it in Tiny size on the same cluster. It will collect logs from all ESXi, vCenter, NSX and SDDC Manager, providing a single log. This will make debugging easier if there are errors. But Log Insight will require ~8 GB of RAM and ~100 GB of space, keep in mind.
Backup. It is a pity to lose even a lab environment, especially after a long setup. Set up backup procedures for the main virtual machines:
vCenter Server: use the built-in File-Based Backup mechanism (via the vCenter VAMI interface) – it can upload the vCenter configuration to SFTP/FTP on a schedule. Set it up and save a backup at least once a week.
NSX-T Manager: NSX also has a backup feature (in the Web UI, under System > Backup). Set up a target (the same SFTP) and save a backup of NSX. If something happens, NSX Manager can be restored from this backup to a newly deployed VM.
SDDC Manager: you can either manually take a snapshot of the SDDC Manager VM (when it is turned off) or clone it, or use the procedure described here. But since SDDC Manager does not store much important data (mostly inventory and state), if you lose it, you can redeploy it as a last resort.
ESXi hosts: no special backup is needed – their configuration is small and is also stored in SDDC/vCenter (inventory). You can save a backup of the host profile.
Infrastructure virtual machines: in a small lab, you can periodically make snapshots of VM vCenter, NSX, SDDC (for example, before experimenting with an update). Important: do not keep snapshots for a long time (vSAN does not like old snapshots, and it will also consume space), delete or roll back within a few days maximum.
Updates. VMware actively releases security updates and patches. In a 60-day isolated lab, you may not be particularly eager to update if everything works. But, for example, an ESXi or NSX patch may be released with a fix that is interesting to you. It is convenient to use Lifecycle Manager (LCM) through SDDC Manager: it can show available updates for VCF (if there is Internet, or you can manually download the packages). Since we have a very non-standard environment, first study the Release Notes – is the update supported in consolidated mode (usually yes). You can update the entire stack (SDDC Manager will do this automatically, rebooting the components one by one). It is recommended not to manually update the components separately, otherwise SDDC Manager will lose version synchronization. Always use the built-in LCM in SDDC Manager.
Recovery practice. It is a good idea to try simulating a failure and recovery. For example, what if your NSX Manager crashes (in a lab with one NSX, this is a problem, so you can test by starting a new NSX Manager VM and connecting it to the cluster). What if vCenter breaks? – then File-Based Backup recovery will help. The lab is a place where you are not afraid to touch any “red button”.
Document and automate. Keep a change log: write down the IP and passwords of all components, VLAN ID, accounts. In Enterprise, they use VMware Cloud Foundation Planner/Tracker for this, but you can just use a text file or a table. Also, if you destroy and redeploy the lab, automation using VCF Lab Constructor (VLC) or API scripts can save a lot of time in the future. By saving your Cloud Builder configuration JSON file, you can easily redeploy the environment later if necessary.
Tips for upgrading and expanding:
During the trial window (60 days), try to explore as many features as possible. Try adding another (nested) host to the management cluster via the SDDC Manager interface – you will see how the node is added (to do this, first make a “commission host” in the SDDC Manager, then “expand cluster”). You can test the Workload Domain creation function: for example, release 3 nested-ESXi from the management cluster (leaving 1 for mgmt, which VCF will not allow normally, but you can bypass this in the lab), and try to create a VI Domain from them – this way you will see the process of deploying the second vCenter and NSX.
If something goes wrong during the experiments (for example, Cloud Builder or NSX did not start up the first time) – do not despair and do not be afraid to completely demolish the unsuccessful configuration. The lab is valuable because it can be rebuilt. Sometimes it is faster to start from scratch than to fix a partially configured VCF, especially since the product is very complex and it is difficult to edit the configuration manually. Therefore, just in case, save copies of working configs (JSON) to quickly repeat the deployment if necessary.
Physical host expansion: if you have additional resources (for example, a second similar server), you can try to distribute the lab: for example, 4 ESXi nodes on one physical one, 3 on another, connected by a 10G link – this brings it closer to reality and removes some of the load. VCF supports vSAN stretched clusters and the like, but in a lab it is easier to scatter the management cluster nodes across two machines. SDDC Manager will not notice the difference – it is important for it that there are 4 ESXi available over the network. Thus, you can gradually expand the testing ground.
In summary: following these guidelines will ensure that your VCF lab, while not the fastest, will be stable enough to allow you to explore all the major features of Cloud Foundation. You will see first-hand how vSphere, vSAN, and NSX work together within a single platform, and how SDDC Manager makes it easy to coordinate them.
Conclusion
We have covered the process of creating a VMware Cloud Foundation home lab on a single physical server – from selecting components, deploying all the necessary virtual machines, to setting up network interaction and best operating practices. Such a lab, although limited in performance, provides a unique opportunity to gain experience working with a fully functional VMware private cloud. You will get acquainted with the VCF architecture: the interaction of vSphere, vSAN, NSX-T and SDDC Manager, and also learn how to apply optimization techniques to fit into modest resources. Do not forget to use the trial period as efficiently as possible – study the interfaces, try adding/removing nodes, configuring NSX segments, and performing updates via SDDC Manager. This practical experience is valuable for a system engineer, allowing him to practice his skills without risking the production infrastructure.
The VMware Cloud Foundation lab is an excellent testing ground for experimenting with modern software-defined data center technologies. Good luck in your research and further immersion into the world of VMware VCF!
from StarWind Blog https://ift.tt/UZJOT73
via IFTTT
Over the next few weeks, we’re breaking down the most critical sections of our 2024 Year in Review.
This week, we examine the most frequently targeted vulnerabilities—particularly those affecting network infrastructure. We also detail a noticeable shift in adversary behavior, as threat actors move away from time-sensitive lures in phishing campaigns. Finally, we highlight the tools most commonly leveraged by attackers last year and provide guidance on how to detect their presence in your environment.
Download thefull reportfor a deeper understanding of these trends and actionable steps to strengthen your defenses.
Security Operations Centers (SOCs) today face unprecedented alert volumes and increasingly sophisticated threats. Triaging and investigating these alerts are costly, cumbersome, and increases analyst fatigue, burnout, and attrition. While artificial intelligence has emerged as a go-to solution, the term "AI" often blurs crucial distinctions. Not all AI is built equal, especially in the SOC. Many existing solutions are assistant-based, requiring constant human input, while a new wave of autonomous, Agentic AI has the potential to fundamentally transform security operations.
This article examines Agentic AI (sometimes also known as Agentic Security), contrasts it with traditional assistant-based AI (commonly known as Copilots), and explains its operational and economic impacts on modern SOCs. We'll also explore practical considerations for security leaders evaluating Agentic AI solutions.
Agentic AI vs. Assistant AI (aka Copilots): Clarifying the Difference
Agentic AI is defined by autonomy. Unlike traditional AI tools—which function as powerful assistants—Agentic AI systems independently perceive, plan, investigate, and conclude. In the context of SOC operations, Agentic AI acts much like a skilled Tier-1 analyst, autonomously triaging alerts using industry best practices, thoroughly investigating incidents, and providing actionable outcomes with minimal human oversight.
Assistant AI solutions, by contrast, are essentially smart tools waiting for human guidance. A security copilot, for example, can suggest insights or answer analyst questions about an alert, but it won't proactively investigate without explicit instruction. Every decision, action, or conclusion must first pass through a human analyst.
Consider a scenario involving potential malware:
Assistant AI waits for the analyst's prompt, then responds to specific queries, leaving investigation decisions to the human.
Agentic AI, conversely, proactively initiates and completes a full investigation—analyzing logs, correlating events, and possibly containing threats, then delivers a detailed report ready for human review.
The crucial distinction here is initiative and autonomy. Agentic AI isn't just another SOC automation tool like SOARs, it's an autonomous member of your security team. Unlike traditional SOAR or Hyperautomation tools, it doesn't need playbooks or scripted workflows. It adapts in real time, triaging and investigating alerts without you having to map out every move.
How Agentic AI Transforms SecOps and Improves SOC Economics
Also known as AI SOC Analysts, Agentic AI transforms the core of security operations by automating triage and investigation which is often the most time-consuming, high-volume tasks in the SOC. It doesn't just accelerate existing workflows, it makes them scalable, consistent, and cost-effective.
Instant triage at scale
Agentic AI evaluates every alert as it arrives, around the clock. It triages based on real indicators of risk, not just severity labels, reducing dwell time and surfacing the right threats faster than any human team could.
Deep, consistent investigations
Unlike basic enrichment or playbook automation, Agentic AI conducts structured investigations that follow lines of questioning an experienced analyst would pursue. Every alert gets the same level of scrutiny, regardless of priority, removing the need to choose between speed and depth.
Fewer gaps, better prioritization
Traditional SOCs often ignore low- and medium-priority alerts due to time constraints. Agentic AI closes those gaps by investigating everything and ranking results based on actual risk. The result is better prioritization and fewer missed threats.
Operational consistency, even under pressure
With no fatigue or bandwidth limits, Agentic AI maintains quality during alert storms and high-pressure moments. It eliminates triage shortcuts and helps avoid costly oversights, regardless of volume.
More focus, less burnout
By offloading repetitive triage and initial investigations (specially around removing the flood of benign alerts from human analyst queue), Agentic AI frees analysts to focus on high-value work like complex investigations and threat hunting. This reduces burnout and improves team retention, a critical factor in a competitive market with persistent skills shortage.
Lower costs, higher capacity
Agentic AI boosts alert coverage and investigative speed without adding pressure to already stretched teams. It helps organizations scale security operations and add capacity in the face of ongoing cybersecurity skills shortages.
Improved outcomes, measurable ROI
By investigating every alert thoroughly and consistently, Agentic AI improves key metrics like dwell time and Mean Time to Investigate (MTTI). Faster detection and deeper investigations reduce risk exposure and mitigate the financial and reputational impact of breaches.
A force multiplier for the SOC
Agentic AI doesn't replace analysts, it amplifies them. It helps teams scale efficiently, operate more effectively, and achieve better outcomes with fewer resources. The result: stronger security and a healthier bottom line.
Key Considerations for Evaluating Agentic AI for your SOC
Not all agentic solutions are equal. Security leaders must assess solutions based on:
Transparency and Explainability: Ensure the solution clearly documents how decisions are made, enabling analysts and auditors to validate results confidently.
Accuracy and Investigative Depth: High accuracy and thorough, multi-dimensional investigations across all relevant data sources are essential.
Seamless Integration: The solution should easily connect to your existing tools and fit within established workflows, minimizing disruption.
Customization and Adaptability: Seek AI solutions capable of learning and adapting to your unique security context.
Impact and ROI: Measure the impact of the AI using the key SOC metrics that matter to your business. Ultimately, you want an Agentic AI tool for your SOC that improves business performance (i.e., lowers risk, lowers costs) and the metrics you track should be aligned with that.
How Prophet Security Redefines Alert Triage: Autonomous but Human-Driven
The introduction of Agentic AI represents a fundamental evolution for SOC teams, not a replacement of human analysts, but an augmentation enabling them to perform at their best. As organizations evaluate this transformative technology, choosing a transparent, accurate, and adaptive solution ensures that the SOC remains effective, efficient, and human-centric.
By handling routine investigations autonomously, Agentic AI empowers human analysts to focus on higher-value tasks, transforming the SOC from reactive to proactive and precise. Embracing this evolution today positions security teams to remain resilient against tomorrow's advanced threats.
Prophet Security exemplifies this evolution by automating alert triage and investigations with exceptional speed and accuracy. Powered by AI Agents, Prophet AI eliminates repetitive manual tasks, reduces analyst burnout, and significantly improves security outcomes. Visit Prophet Security today to request a demo and see firsthand how Prophet AI can elevate your SOC operations.
Found this article interesting? This article is a contributed piece from one of our valued partners. Follow us on Twitter and LinkedIn to read more exclusive content we post.
from The Hacker News https://ift.tt/l1UqtfZ
via IFTTT