Posts on Security, Cloud, DevOps, Citrix, VMware and others.
Words and views are my own and do not reflect my company's views.
Disclaimer: some of the links on this site are affiliate links; if you click one and make a purchase, I earn a commission.
A sophisticated China-nexus advanced persistent threat (APT) group has been attributed to attacks targeting government entities in South America since at least late 2024 and government agencies in southeastern Europe in 2025.
The activity is being tracked by Cisco Talos under the moniker UAT-8302, with post-exploitation involving the deployment of custom-made malware families that have been put to use by other China-aligned hacking groups.
Notable among the malware families is a .NET-based backdoor dubbed NetDraft (aka NosyDoor), a C# variant of FINALDRAFT (aka Squidoor) that has been previously linked to threat clusters known as Ink Dragon, CL-STA-0049, Earth Alux, Jewelbug, and REF7707.
ESET attributes the use of NosyDoor to a group it calls LongNosedGoblin. Interestingly, the same malware has also been deployed against Russian IT organizations by a threat actor referred to as Erudite Mogwai (aka Space Pirates and Webworm), per Russian cybersecurity company Solar, which has given it the name LuckyStrike Agent.
"Malware deployed by UAT-8302 connects it to several previously publicly disclosed threat clusters, indicating a close operating relationship between them at the very least," Talos researchers Jungsoo An, Asheer Malhotra, and Brandon White said in a technical report published today.
"Overall, the various malicious artifacts deployed by UAT-8302 indicate that the group has access to tools used by other sophisticated APT actors, all of which have been assessed as China-nexus or Chinese-speaking by various third-party industry reports."
It's currently not known what initial access methods the adversary employs to break into target networks, but it's suspected to involve the tried-and-tested approach of weaponizing zero-day and N-day exploits in web applications.
Upon gaining a foothold, the attackers are known to conduct extensive reconnaissance to map out the network, run open-source tools like gogo to perform automated scanning, and move laterally across the environment. The attack chains culminate in the deployment of NetDraft, CloudSorcerer (version 3.0), and VShell.
UAT-8302 has also been observed using a Rust-based variant of SNOWLIGHT called SNOWRUST to download the VShell payload from a remote server and execute it. Besides using custom malware, the threat actor sets up alternative means of backdoor access using proxy and VPN tools like Stowaway and SoftEther VPN.
The findings underscore the trend of advanced collaboration tactics between multiple China-aligned groups. In October 2025, Trend Micro shed light on a phenomenon called Premier Pass-as-a-Service, where initial access obtained by Earth Estries is passed to Earth Naga for follow-on exploitation, clouding attribution efforts. This partnership is assessed to have existed since at least late 2023.
"Premier Pass-as-a-Service provides direct access to critical assets, reducing the time spent on reconnaissance, initial exploitation and lateral movement phases," Trend Micro said. "Although the full extent of this model is not yet known, the limited number of observed incidents, combined with the substantial risk of exposure such a service entails, suggests that access is likely restricted to a small circle of threat actors."
from The Hacker News https://ift.tt/Bxzc5Rn
via IFTTT
Veeam Backup & Replication v13 adds High Availability for the backup server. Our latest guide covers the setup path, so you don’t hit dead ends mid-deploy.
Veeam Backup & Replication v13 introduces the High Availability Cluster for Linux-based backup servers, helping keep the backup infrastructure available if the primary Backup Server becomes unavailable. The feature uses two Veeam Software Appliances, a cluster DNS name, a virtual IP address, and configuration database synchronization between the primary and secondary nodes.
In this first part of the series, we’ll walk through the initial HA cluster configuration: checking the prerequisites, enabling High Availability on both appliances, creating the cluster in the Veeam Backup & Replication Console, and connecting to the new cluster endpoint.
Before starting the configuration, make sure both nodes meet the required conditions: the same Veeam version, proper forward and reverse DNS records, static IP addresses in the same subnet, and a Veeam Data Platform Premium license. Local repositories should not be used within the HA cluster.
Prerequisites
To configure a working Veeam High Availability Cluster, you need the following:
Linux-based Veeam Backup Server – the configuration of the HA cluster works only with the Linux appliance version of the Veeam Backup Server. A mixed cluster with Windows and Linux Backup Servers is not supported. If a Veeam appliance is already configured and in use in the backup infrastructure, you can deploy a new appliance to use as a secondary node. Keep in mind that local repositories cannot be used within the HA cluster.
Veeam version – both nodes must have the same Veeam version installed before creating the HA cluster.
Veeam Backup console – to manage the HA cluster, the Veeam console installed on a Windows machine is required.
DNS Server – the cluster, primary node and secondary node must be defined in the DNS records in both forward and reverse zones.
Layer 2 network – both nodes must reside in the same subnet (layer 2) to establish proper communication.
License – the Veeam Data Platform Premium License is required to create the HA cluster.
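The layer-2 requirement is easy to pre-check before building the cluster. The sketch below is illustrative (the helper name and sample addresses are hypothetical, not part of any Veeam tooling); it only verifies that two node IPs fall inside the same subnet:

```python
import ipaddress

def same_subnet(primary_ip: str, secondary_ip: str, prefix: int) -> bool:
    """Check that both HA nodes sit in the same layer-2 subnet."""
    net = ipaddress.ip_network(f"{primary_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(secondary_ip) in net

# Example: two nodes in the same /24 pass, different /24s fail.
print(same_subnet("10.0.10.21", "10.0.10.22", 24))   # True
print(same_subnet("10.0.10.21", "10.0.20.22", 24))   # False
```

Forward and reverse DNS records for the cluster name and both nodes still need to be verified against your DNS server separately.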
How the Veeam High Availability cluster works
Once the HA cluster has been created, Veeam leverages a PostgreSQL function to establish which is the primary and which is the secondary node of the cluster. Then, the synchronization between the HA nodes takes place. Changes in the Veeam configuration are always written to the primary node first and then replicated to the secondary node.
If the secondary node goes offline for more than 10 minutes, a warning is displayed on the notification bar of the primary node and an email alert is sent to the address specified in the global email notification settings. Another email will be sent once the secondary node comes back online.
Upgrading the HA cluster
Veeam leverages the Veeam Updater service to upgrade the HA cluster. During the upgrade operation, the primary node is upgraded first, and the updates are then synchronized with the secondary node. Automatic updates on the secondary node are disabled.
Updates available on the secondary node are compared with the updates installed on the primary node, and only the necessary updates are then installed on the secondary node.
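That comparison step boils down to a set difference that preserves the primary node's install order. A simplified sketch (the function and version strings are illustrative, not Veeam's actual implementation):

```python
def updates_to_install(primary_installed, secondary_installed):
    """Return only the updates the secondary node is missing,
    in the order they were applied on the primary."""
    missing = set(primary_installed) - set(secondary_installed)
    return [u for u in primary_installed if u in missing]

# Secondary already has 13.0.1, so only the two newer updates are applied.
print(updates_to_install(["13.0.1", "13.0.2", "13.0.3"], ["13.0.1"]))
# ['13.0.2', '13.0.3']
```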
Configure the Veeam High Availability cluster
Deploy two Veeam Software Appliances that will serve as primary and secondary node of the High Availability Cluster. The secondary node must be a new installation and cannot contain any existing backup data.
Enable High Availability
To create the HA cluster, you must submit a request to enable High Availability on both servers. Using your preferred browser, access Veeam Host Management at https://<IP_Address>:10443. Enter the correct credentials (veeamadmin in this example) and click Sign in.
Enter the MFA code, then click Sign in.
Go to the Backup Infrastructure area and click Submit Request in the High Availability section.
Click OK.
The request will now show a Waiting for approval status, pending approval by the Security Officer.
Log in to Veeam Host Management using the Security Officer credentials and click Sign in.
Enter the MFA code, then click Sign in.
Select the pending request and click Approve.
Once approved, exit Veeam Host Management.
If you log in again using the Backup Administrator credentials, the request will now be displayed as Request approved.
Repeat the same procedure for the secondary node.
Create the HA cluster
Using the Veeam Backup & Replication Console, access the primary node. In the Backup Infrastructure area, go to Managed Servers > Linux. Right-click the Veeam Backup Server and select Create HA cluster.
If the Create HA cluster option is not available, it means the Veeam Data Platform Premium License is not installed on the Backup Server.
Verify that the installed license is the Premium edition.
Enter the Cluster DNS name created in the DNS Server and the Virtual IP address. Click Next.
Select the Primary node IP address and type the Secondary node IP address. Specify the correct Credentials to use and click Next.
Click Continue.
Click Finish to start the initialization.
The Veeam High Availability Cluster is being created.
After the HA cluster has been created, you can easily identify the primary and secondary node.
Connect to the High Availability Cluster
Once the HA cluster has been created, connect to the infrastructure by entering the HA cluster DNS name in the Veeam Console and clicking Connect.
With the Veeam Backup Server in HA mode, you can quickly restore backup infrastructure functionality if anything goes wrong with the primary node.
Part 2 will cover the manual failover procedure and the configuration required to trigger the failover operation automatically using Veeam ONE.
from StarWind Blog https://ift.tt/8a7SxRU
via IFTTT
We’ve all been there: you need to generate a few images for a project, you fire up an AI image service, and suddenly you’re wondering what happens to your prompts, how many credits you have left, or why that “safe content” filter rejected your perfectly reasonable request for a dragon wearing a business suit. What if you could skip all of that and run the whole thing on your own machine, with a slick chat UI on top?
That’s exactly what Docker Model Runner now makes possible. With a couple of commands you can pull an image-generation model, connect it to Open WebUI, and start generating images right from a chat interface: fully local, fully private, fully yours.
Let’s build it. Your own private DALL-E, no cloud subscription required.
What You’ll Need
Docker Desktop (macOS) or Docker Engine (Linux)
~8 GB of free RAM for a small model (more is better)
GPU (optional but highly recommended): NVIDIA (CUDA), Apple Silicon (MPS), or CPU fallback
If you can run docker model version without errors, you’re good to go.
How Docker Model Runner works with Open WebUI
Before we dive in, here’s the big picture:
Docker Model Runner acts as the control plane. It downloads the model, manages the inference backend lifecycle, and exposes a 100% OpenAI-compatible API — including the POST /v1/images/generations endpoint that Open WebUI already knows how to talk to.
Step 1: Pull an Image Generation Model
Docker Model Runner uses a compact packaging format called DDUF (Diffusers Unified Format) to distribute image generation models through Docker Hub, just like any other OCI artifact.
What’s happening under the hood? The model is stored locally as a DDUF file, a single-file format that bundles all the components of a diffusion model (text encoder, VAE, UNet/DiT, scheduler config) into one portable artifact. Docker Model Runner knows how to unpack it at runtime.
Step 2: Launch Open WebUI
This is a magic trick. Docker Model Runner has a built-in launch command that knows exactly how to wire up Open WebUI against the local inference endpoint:
The model-runner.docker.internal hostname is a special DNS entry that Docker Desktop containers use to reach the Model Runner running on the host, no port-forwarding gymnastics required. If you use Docker CE, you’ll see the docker/model-runner container address instead of model-runner.docker.internal.
Open your browser at http://localhost:3000, create a local account (it stays offline), and you’ll land on the chat interface.
Tip: Want to run it in the background? Add --detach:
Step 3: Configure the Image Generation Endpoint
Open WebUI already uses Docker Model Runner for text chat automatically (it reads the OPENAI_API_BASE env var). For image generation, you need to point it at the images endpoint too, a 30-second job in the settings UI.
Why the dummy API key? Docker Model Runner doesn't require authentication; it's a local service. The key is only there because Open WebUI's form requires one. Any non-empty string works.
Step 4: Pull a Chat Model
Open WebUI is also a full-featured chat interface, and one of its best tricks is letting you ask the LLM to generate an image right from the conversation. For that to work, you need a language model too.
# Lightweight option — runs on almost any machine
docker model pull smollm2
# Recommended — more capable, better at understanding creative prompts
docker model pull gpt-oss
Both will show up automatically in the Open WebUI model selector. Use smollm2 if you’re tight on RAM, or gpt-oss if you want richer, more creative responses before image generation.
No extra configuration is needed; Open WebUI picks up text models from the same OPENAI_API_BASE endpoint it was already configured with.
Step 5: Generate Your First Image
Head back to the main chat view. You’ll notice a small image icon in the message input bar.
Click it to toggle image generation mode, type your prompt, and send.
Try something like:
Create an image of a whale.
The first request takes a little longer while the backend loads the model into memory. After that, subsequent images generate much faster.
Open WebUI will automatically route image-generation requests to the diffusers backend and text requests to the language model, seamlessly, in the same conversation.
Step 6: Generate Images Directly via the API
For developers who want to integrate image generation into their own apps, Docker Model Runner exposes the standard OpenAI Images API directly:
curl -s -X POST http://localhost:12434/engines/diffusers/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "stable-diffusion",
"prompt": "A cat sitting on a couch",
"size": "512x512"
}'
The response follows the OpenAI Images API format exactly. The request body supports these parameters:
size – resolution as WIDTHxHEIGHT (e.g., 512x512, 768x512)
n – number of images to generate (1–10)
num_inference_steps – more steps = higher quality, slower (default: 50)
guidance_scale – how closely to follow the prompt (1–20, default: 7.5)
seed – integer for reproducible results; omit for random
Pro tip: Set a seed while you’re iterating on a prompt. Once you’re happy with the composition, remove it to get unique variations.
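The same request is just as easy to build from Python. The sketch below reuses the endpoint and model name from the curl example above; the helper functions themselves are hypothetical conveniences, not part of any SDK:

```python
import json
import urllib.request

# Endpoint and model name taken from the curl example above.
ENDPOINT = "http://localhost:12434/engines/diffusers/v1/images/generations"

def build_request(prompt, size="512x512", n=1, steps=None, guidance=None, seed=None):
    """Assemble an OpenAI-style image-generation payload."""
    body = {"model": "stable-diffusion", "prompt": prompt, "size": size, "n": n}
    if steps is not None:
        body["num_inference_steps"] = steps
    if guidance is not None:
        body["guidance_scale"] = guidance
    if seed is not None:
        body["seed"] = seed  # fixed seed -> reproducible composition
    return body

def generate(prompt, **kwargs):
    """POST the payload to the local Model Runner (must be running)."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(build_request("A cat sitting on a couch", steps=30, seed=42))
```

Omitting seed, guidance, and steps falls back to the defaults listed above.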
Under the Hood: How the Diffusers Backend Works
When you first request an image, Docker Model Runner:
Unpacks the DDUF file: extracts the model components and loads them via DiffusionPipeline.from_pretrained()
Starts a FastAPI server: this is the server that Open WebUI and your curl commands talk to through Docker Model Runner
The server is installed on first use by downloading a self-contained Python environment from Docker Hub (version-pinned, so updates are explicit). It lives at ~/.docker/model-runner/diffusers/ — no Python version conflicts, no virtualenv setup.
Troubleshooting
The model takes forever to load on first use. That's normal: the model weights are being loaded from disk and transferred to GPU memory. Subsequent requests in the same session are much faster because the backend stays warm.
I get a "No model loaded" 503 error. Make sure the model is fully downloaded (docker model list) and that you're sending the correct model name in the model field.
Image quality is poor / generations are too fast. Increase num_inference_steps (try 20–50 steps). Higher values = slower but sharper results.
Open WebUI can’t connect to the image endpoint Double-check the URL in Admin Panel → Settings → Images. Inside a Docker container it must be http://model-runner.docker.internal/engines/diffusers/v1, not localhost.
Conclusion and What’s Next
Docker Model Runner makes local image generation simple. It packages and serves image models through an OpenAI-compatible API, while Open WebUI provides an easy chat interface on top. Together, they let you generate images privately on your own machine, either through the browser or directly through the API, without relying on a cloud service.
This feature opens up a lot of possibilities:
Multimodal workflows: Chat with a text model about an idea, then immediately generate an image of it — in the same Open WebUI conversation
RAG + image generation: Build a pipeline that generates illustrations for your documents
Custom models: The diffusers backend supports any DDUF-packaged model, so you can package your own fine-tuned models using Docker’s model packaging tools
The Docker Model Runner team is actively expanding model support on Docker Hub. Check docker model search for the latest available models.
Threat actors are actively exploiting a critical security flaw impacting an open-source content management system (CMS) known as MetInfo, according to new findings from VulnCheck.
The vulnerability in question is CVE-2026-29014 (CVSS score: 9.8), a code injection flaw that could result in arbitrary code execution.
"MetInfo CMS versions 7.9, 8.0, and 8.1 contain an unauthenticated PHP code injection vulnerability that allows remote attackers to execute arbitrary code by sending crafted requests with malicious PHP code," the NIST National Vulnerability Database (NVD) states.
"Attackers can exploit insufficient input neutralization in the execution path to achieve remote code execution and gain full control over the affected server."
Per security researcher Egidio Romano, who discovered the vulnerability, the problem is rooted in the "/app/system/weixin/include/class/weixinreply.class.php" script, and stems from a lack of adequate sanitization of user-supplied input when issuing Weixin (aka WeChat) API requests.
As a result, remote, unauthenticated attackers could exploit this loophole to inject and execute arbitrary PHP code. One key prerequisite for successful exploitation when MetInfo is running on non-Windows servers is that the "/cache/weixin/" directory has to exist beforehand. The directory is created when installing and configuring the official WeChat plugin.
Patches for CVE-2026-29014 were released by MetInfo on April 7, 2026. The vulnerability has since come under exploitation as of April 25, with a "small number of exploits" deployed against susceptible honeypots located in the U.S. and Singapore.
Although these efforts were initially sparse and associated with automated probing, the activity witnessed a surge on May 1, 2026, focusing on China and Hong Kong IP addresses, Caitlin Condon, vice president of security research at VulnCheck, said. As many as 2,000 instances of MetInfo CMS are accessible online, most of which are in China.
from The Hacker News https://ift.tt/jdylYz0
via IFTTT
Cisco Talos discovered an intrusion, active since at least January 2026, where an unknown attacker implanted a CloudZ remote access tool (RAT) and a previously undocumented plugin called “Pheno.”
Based on the functionality of the CloudZ RAT and Pheno plugin, the intrusion was carried out with the intention of stealing victims' credentials and potentially one-time passwords (OTPs).
CloudZ utilizes the custom Pheno plugin to hijack the established PC-to-phone bridge by abusing the Microsoft Phone Link application, allowing the plugin to continuously scan for active Phone Link processes and potentially intercept sensitive mobile data like SMS and OTPs without deploying malware on the phone.
CloudZ evades detection by executing critical malicious functions dynamically in system memory and performing checks to avoid debuggers and sandbox environments.
Attacker abuses the Windows Phone Link application
Windows Phone Link (formerly "Your Phone") is a synchronization tool developed by Microsoft and built directly into Windows 10 and 11 that bridges a PC and a smartphone (Android or iPhone). By establishing a secure connection via Wi-Fi and Bluetooth, the application mirrors essential phone activities (such as application notifications and SMS messages) onto the computer screen, reducing the user’s need to physically interact with the mobile device while working on the computer. The Phone Link application writes synchronized phone data such as SMS messages, call logs, and the application notification history to the Windows PC in the application’s SQLite database file.
Talos observed that during an intrusion, an attacker attempted to abuse the Windows Phone Link application using the CloudZ RAT and its Pheno plugin. The Pheno plugin is designed to monitor an active PC-to-phone bridge established by the Phone Link application on the victim machine. With a confirmed Phone Link activity on the victim's machine, the attacker using the CloudZ RAT can potentially intercept the Phone Link application’s SQLite database file (e.g., “PhoneExperiences-*.db”) on the victim machine, potentially compromising SMS-based OTP messages and other authenticator application notification messages.
Intrusion summary of CloudZ infection
Talos discovered from telemetry data that the intrusion had begun with an unknown initial access vector to the victim's environment, which led to the execution of a fake ScreenConnect application update executable. This malicious executable drops and executes an intermediate .NET loader executable, which subsequently deploys the modular CloudZ RAT on the victim's machine. Upon execution, the RAT decrypts its configuration data, establishes an encrypted socket connection to the command-and-control (C2) server, and enters its command dispatcher mode.
CloudZ facilitates the C2 commands to exfiltrate credentials from the victim machine browser data, and it downloads and implants a plugin. The plugin performs reconnaissance of the Microsoft Phone Link application on the victim machine and writes the reconnaissance data to an output file in a staging folder. CloudZ reads back the Phone Link application data from the staging folder and sends it to the C2 server.
Rust-compiled executable used as a dropper
Talos discovered a Rust-compiled 64-bit executable, disguised with file names such as “systemupdates.exe” or “Windows-interactive-update.exe”, functioning as a loader. The malicious loader was compiled on Jan. 1, 2026, and contains the developer string “rustextractor.pdb”.
When the loader is run on the victim machine, it decrypts and drops an embedded .NET loader binary disguised as a text file with the file names “update.txt” or “msupdate.txt” in the folder “C:\ProgramData\Microsoft\windosDoc\”.
Figure 1. Excerpt of rusty dropper code.
In another instance, Talos observed that the .NET loader was implanted in the victim machine by downloading it from an attacker-controlled staging server using the command shown below:
The dropper executes an embedded PowerShell script to establish persistence on the victim machine through a Windows scheduled task that executes the dropped malicious .NET loader. The PowerShell script achieves this by first performing a runtime check to determine whether the dropped .NET loader is already active on the system. It queries all running processes using the Get-CimInstance Win32_Process command and filters for any instance of regasm.exe with command-line parameters that include the string update.txt. If such an instance is found, the script silently exits without taking any action.
If the check indicates that the .NET loader is not running, the script proceeds to establish persistence by creating a scheduled task named SystemWindowsApis in the scheduled task folder \Microsoft\Windows\. It configures the task to trigger at system startup /sc onstart, execute under the SYSTEM account /ru SYSTEM with the highest privilege level /rl HIGHEST, and the /f flag ensures it will silently overwrite any existing task with the same name, allowing the malware to update its persistence mechanism. The script configures the task scheduler action to run the .NET loader by utilizing the living-off-the-land binary (LOLBin) regasm.exe, which is the .NET Framework Assembly Registration Utility located at “C:\WINDOWS\Microsoft.NET\Framework64\v4.0.30319\”. It provides the path of the dropped .NET loader as the argument to regasm.exe with the /nologo flag. After creating the task, the script immediately triggers it with schtasks /run, ensuring it executes immediately and survives future reboots.
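The persistence command described above can be reconstructed for detection engineering. This Python sketch reassembles the schtasks invocation from the flags, task name, and regasm path reported in the article; the helper function itself is hypothetical, useful for building hunting queries rather than for execution:

```python
def build_persistence_command(loader_path: str) -> list[str]:
    """Reassemble the schtasks invocation described above (illustrative only)."""
    regasm = r"C:\WINDOWS\Microsoft.NET\Framework64\v4.0.30319\regasm.exe"
    return [
        "schtasks", "/create",
        "/tn", r"\Microsoft\Windows\SystemWindowsApis",  # task name from the intrusion
        "/sc", "onstart",       # trigger at system startup
        "/ru", "SYSTEM",        # run under the SYSTEM account
        "/rl", "HIGHEST",       # highest privilege level
        "/f",                   # silently overwrite an existing task
        "/tr", f'{regasm} /nologo "{loader_path}"',
    ]

cmd = build_persistence_command(r"C:\ProgramData\Microsoft\windosDoc\update.txt")
print(" ".join(cmd))
```

A scheduled task named SystemWindowsApis running regasm.exe against a .txt file is a strong hunting signal on its own.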
Figure 2. Excerpt of the PowerShell script to establish persistence on victim machines.
.NET loader implants the CloudZ RAT
Talos found that the attacker embedded CloudZ, an encrypted .NET-compiled RAT, in the .NET loader executable.
When the .NET loader is triggered through the Windows Task Scheduler, it performs detection-evasion checks, beginning with a timing-based check in which it measures the actual elapsed time of a sleep command to detect whether it is executing in an analysis environment. It then enumerates running processes on the victim machine against a list of security tools, including network sniffers like Wireshark and Fiddler, as well as system monitors like Procmon and Sysmon. The .NET loader exits if any of these are detected.
Figure 3. Excerpt of the .NET loader binary with detection evasion instructions.
The loader then conducts hardware and environment checks to identify virtual machine (VM) or sandbox characteristics. It verifies that the system has at least two processor cores and searches for strings like “VIRTUAL” or “SANDBOX” within the system directory path, computer name, user domain, and the current victim username.
Figure 4. Excerpt of the .NET loader binary with detection evasion instructions.
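The two evasion techniques can be modeled with a simplified Python analog of the loader's .NET checks. The marker strings come from the article; the 50% timing threshold and the exact fields inspected are assumptions for illustration:

```python
import os
import platform
import time

SUSPECT_MARKERS = ("VIRTUAL", "SANDBOX")

def sleep_skipped(seconds: float = 0.2) -> bool:
    """Timing check: sandboxes that fast-forward sleeps report less
    elapsed wall-clock time than requested (threshold is an assumption)."""
    start = time.monotonic()
    time.sleep(seconds)
    return (time.monotonic() - start) < seconds * 0.5

def env_looks_virtual() -> bool:
    """String check over hostname/username, analogous to the loader's scan
    of the system path, computer name, domain, and user name."""
    haystack = (platform.node() + os.environ.get("USER", "")).upper()
    return any(marker in haystack for marker in SUSPECT_MARKERS)

print(sleep_skipped(), env_looks_virtual())
```

On a normal host the sleep elapses in full, so the timing check returns False; in an emulated environment that patches sleeps, it flips to True.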
The loader executable embeds multiple chunks of hexadecimal strings in the binary, which are concatenated sequentially during execution, reassembling a massive hexadecimal data blob. The loader converts the hexadecimal strings to bytes and performs byte-wise XOR decryption using the key 0xCA. If the decrypted payload is a .NET assembly, the loader runs it reflectively. Otherwise, it writes the decrypted payload to the folder “%TEMP%\{GUID}” and runs it as a process.
Figure 5. Excerpt of the .NET loader to execute the .NET payload module.
Figure 6. Excerpt of the .NET loader to execute the non-.NET payload executables.
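The hex-concatenate-then-XOR scheme is trivial to replicate for payload extraction during analysis. A minimal sketch, using the 0xCA key from the loader (the round-trip payload below is made up):

```python
def decrypt_blob(hex_chunks, key=0xCA):
    """Concatenate the embedded hex chunks, then XOR-decrypt byte-wise."""
    blob = bytes.fromhex("".join(hex_chunks))
    return bytes(b ^ key for b in blob)

# Round-trip demo with a made-up payload: a PE file starts with 'MZ'.
payload = b"MZ\x90\x00"
encrypted = bytes(b ^ 0xCA for b in payload).hex()
print(decrypt_blob([encrypted[:4], encrypted[4:]]))  # b'MZ\x90\x00'
```

Checking the first two decrypted bytes for the MZ magic is a quick way to confirm the key when carving the blob out of a sample.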
Modular CloudZ RAT delivered as payload
Talos discovered that CloudZ, a modular RAT, is delivered as the payload in the current intrusion. CloudZ is a .NET executable compiled on Jan. 13, 2026, and obfuscated with ConfuserEx.
Figure 7. The RAT binary shows the malware name, CloudZ.
CloudZ employs layers of defense against analysis environments and reverse engineering. It queries the _ENABLE_PROFILING environment variable via the GetEnvironmentVariable Windows API to detect whether a .NET profiler or debugger is attached to the RAT process on the victim machine. It uses the .NET System.Reflection.Emit.DynamicMethod class combined with ILGenerator to create executable functions dynamically during RAT execution.
The operation of CloudZ utilizes its configuration data, which is embedded in the binary, as a resource that it decrypts and loads into memory during execution. The decrypted configuration data includes various C2 commands, PowerShell scripts for data archive extraction, multiple file download methods, paths and names of staging folders, multiple HTTP headers, and the URLs of the staging servers.
Figure 7. CloudZ primary configuration data decrypted in memory.
After the decryption of the configuration data, CloudZ decodes the Base64-encoded strings to get the URL of the staging server where the secondary configuration is stored.
Figure 8. CloudZ function that downloads the secondary configuration data from the staging server.
Talos found that the RAT downloads and processes secondary configuration data through the URLs “hxxps[://]round-cherry-4418[.]hellohiall[.]workers[.]dev/?t=1773406370” or “hxxps[://]pastebin[.]com/raw/8pYAgF0Z?t=1771833517” and extracts the C2 server IP address “185[.]196[.]10[.]136” and port number 8089, establishing connections through TCP sockets.
Pivoting on the Pastebin URL indicator, we found that the attacker used the Pastebin handler name “HELLOHIALL” and hosted the secondary configuration data at several Pastebin URLs.
The RAT rotates between three hardcoded user-agent strings to blend its HTTP traffic with the legitimate browser requests of the victim machine. Every HTTP request includes anti-caching headers consisting of “Cache-Control: no-cache, no-store, must-revalidate", “Pragma: no-cache", and “Expires: 0”, which prevents intermediate proxies and CDN infrastructure from caching C2 or the staging server details.
User-agent headers used by the CloudZ are:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
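Putting the rotation and the anti-caching headers together, a defender can model the RAT's HTTP header construction as follows. This is an illustrative sketch, not the malware's actual code; the user-agent strings and header values are taken from the analysis above:

```python
import itertools

# The three hardcoded user-agent strings listed above, rotated per request.
USER_AGENTS = itertools.cycle([
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
])

# Anti-caching headers sent with every request.
ANTI_CACHING = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}

def next_headers() -> dict:
    """Each request gets the next UA in rotation plus the anti-caching set."""
    return {"User-Agent": next(USER_AGENTS), **ANTI_CACHING}

print(next_headers()["User-Agent"])
```

The combination of a rotating, dated browser UA with aggressive anti-caching headers is itself a usable network-detection heuristic.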
After the RAT establishes the C2 connection, it enters the command dispatcher module, which relies on decrypted configuration data loaded into memory. The configuration data contains Base64-encoded command identifiers, which the RAT matches against the commands received from the C2 server to perform its various functions. The commands supported by CloudZ are shown in the table below:
Base64-encoded command | Decoded command | Purpose
cG9uZw== | pong | Heartbeat response
UElORyE= | PING! | Heartbeat request
Q0xPU0U= | CLOSE | Terminate RAT process
SU5GTw== | INFO | Collect OS edition, architecture, and hardware details from the victim machine
UnVuU2hlbGw= | RunShell | Execute shell command
QnJvd3NlclNlYXJjaA== | BrowserSearch | Browser data exfiltration
R2V0V2lkZ2V0TG9n | GetWidgetLog | Phone Link recon logs and data exfiltration
cGx1Z2lu | plugin | Load plugin
c2F2ZVBsdWdpbg== | savePlugin | Save plugin to disk at the staging directory C:\ProgramData\Microsoft\whealth\
c2VuZFBsdWdpbg== | sendPlugin | Upload plugin to C2
UmVtb3ZlUGx1Z2lucw== | RemovePlugins | Remove all deployed plugin modules
UmVjb3Zlcnk= | Recovery | Recovery or reconnect routine
RFc= | DW | Download and write file operations
Rk0= | FM | File management operations (delete file)
TE4= | LN | Unknown
TXNn | Msg | Send message to C2
RXJyb3I= | Error | Error reporting back to C2
cmVj | rec | Screen recording
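The identifiers in the table are plain Base64 and can be verified in a few lines of Python (a sample of the entries above):

```python
import base64

# A few of the command identifiers from the table above.
COMMANDS = {
    "cG9uZw==": "pong",
    "Q0xPU0U=": "CLOSE",
    "UnVuU2hlbGw=": "RunShell",
    "QnJvd3NlclNlYXJjaA==": "BrowserSearch",
    "c2F2ZVBsdWdpbg==": "savePlugin",
    "UmVtb3ZlUGx1Z2lucw==": "RemovePlugins",
}

for encoded, expected in COMMANDS.items():
    decoded = base64.b64decode(encoded).decode()
    assert decoded == expected, (encoded, decoded)
print("all sampled command identifiers decode as listed")
```

The same one-liner pattern is handy when new samples ship additional encoded identifiers.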
The RAT employs various methods to download and execute the plugins. The plugin download feature of the RAT uses a three-method fallback approach. It first checks for the presence of the curl utility. If found, it attempts to download the file from a specified URL to a target path while following redirects. If curl is missing or the command fails, it falls back to PowerShell, where it first tries to download the file using the Invoke-WebRequest command. If that method also fails, it executes a final method that uses the LOLBin "bitsadmin" tool to download and save the plugin payloads to the victim machine.
Figure 11. CloudZ’s embedded PowerShell command with three different approaches to download operation.
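The fallback chain can be expressed compactly. This Python sketch mirrors the order described above for detection-logic modeling; it is not the RAT's embedded PowerShell, and the return values are illustrative labels:

```python
FALLBACK_ORDER = ("curl", "powershell Invoke-WebRequest", "bitsadmin")

def pick_downloader(available):
    """Walk the RAT's fallback chain and return the first usable method."""
    for method in FALLBACK_ORDER:
        if method.split()[0] in available:
            return method
    return None

# If curl is missing, the RAT falls back to PowerShell:
print(pick_downloader({"powershell", "bitsadmin"}))
# powershell Invoke-WebRequest
```

For defenders, this means a single blocked tool does not stop the download; all three LOLBins are worth monitoring for unusual outbound transfers.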
Talos observed from the telemetry data that the attacker has downloaded and implanted the Pheno plugin through the curl command from the staging server.
Pheno plugin to perform the Phone Link application recon
In this intrusion, Talos observed that the attacker used a plugin called Pheno to perform reconnaissance of the Windows Phone Link application in the victim machine.
Pheno is designed to detect whether a user is currently syncing their mobile device to a Windows machine through the Phone Link application. It scans all running processes for specific keywords such as "YourPhone," "PhoneExperienceHost," or "Link to Windows," and, if matches are found, logs their process IDs and file paths to files named “phonelink-<COMPUTERNAME>.txt”, created in two staging folders:
C:\programdata\Microsoft\feedback\cm
%TEMP%\Microsoft\feedback\cm
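The process sweep amounts to a keyword match over the running process list. A short sketch of that logic, reconstructed from the report's description (the process list here is mocked rather than gathered from a live system):

```python
# Phone Link-related process name keywords, per the report
KEYWORDS = ("YourPhone", "PhoneExperienceHost", "Link to Windows")

def find_phonelink_processes(processes):
    """`processes` is an iterable of (pid, name, path) tuples, e.g. gathered
    from a process-listing API; returns (pid, path) for every keyword hit."""
    hits = []
    for pid, name, path in processes:
        if any(k.lower() in name.lower() for k in KEYWORDS):
            hits.append((pid, path))
    return hits

# Example with a mock process list:
sample = [
    (4321, "PhoneExperienceHost.exe", r"C:\Apps\PhoneExperienceHost.exe"),
    (1000, "notepad.exe", r"C:\Windows\notepad.exe"),
]
print(find_phonelink_processes(sample))  # → [(4321, 'C:\\Apps\\PhoneExperienceHost.exe')]
```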
Figure 12. Pheno recon plugin that monitors an active PC-to-phone bridge through the Phone Link application.
After checking Phone Link processes and writing its results, Pheno executes a secondary check that reads back the contents of the previously written files and searches, case-insensitively, for the keyword "proxy". The plugin performs this check because the Microsoft Phone Link application creates a local proxy connection to relay traffic between the PC and the paired mobile device. The presence of "proxy" in the output files, whether generated by a previous or the current execution of the Pheno plugin, indicates that the Phone Link session is actively routing traffic through its relay channel.
When the keyword is detected, the Pheno plugin writes "Maybe connected" to its output file in the staging folders, which eventually allows the attacker, with the help of the CloudZ RAT, to potentially monitor SMS or OTP requests that appear in the Phone Link application.
Figure 13. Pheno checking for a previous instance of the PC-to-phone bridge through the Phone Link application.
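The secondary check reduces to a case-insensitive substring search over Pheno's own earlier output files. A minimal sketch, assuming the file paths are passed in and the "Maybe connected" marker is returned rather than written back to disk:

```python
from pathlib import Path

def phone_link_proxy_check(output_files):
    """Scan previously written recon files for the 'proxy' keyword,
    case-insensitively, mirroring the check the report describes."""
    for f in output_files:
        p = Path(f)
        if p.exists() and "proxy" in p.read_text(errors="ignore").lower():
            return "Maybe connected"  # Phone Link relay likely active
    return None
```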
Coverage
The following ClamAV signatures detect and block this threat:
Win.Packed.Msilheracles-10030690-0
Win.Trojan.CloudZRAT-10059935-0
Win.Trojan.CloudZRAT-10059959-0
The following Snort Rules (SIDs) detect and block this threat:
Snort 2: 66409, 66410, 66408
Snort 3: 301492, 66408
Indicators of compromise (IOCs)
The IOCs for this threat are available in our GitHub repository.
from Cisco Talos Blog https://ift.tt/wMZeYEB
via IFTTT
While the software industry has made genuine strides over the past few decades to deliver products securely, the furious pace of AI adoption is putting that progress at risk. Businesses are moving fast to self-host LLM infrastructure, drawn by the promise of AI as a force multiplier and the pressure to deliver more value faster. But speed is coming at the expense of security.
In the wake of the ClawdBot fiasco — the viral self-hosted AI assistant that’s averaging an eye-watering 2.6 CVEs per day — the Intruder team wanted to investigate how bad the security of AI infrastructure actually is.
To scope the attack surface, we used certificate transparency logs to pull just over 2 million hosts with 1 million exposed services. What we found wasn’t pretty. In fact, the AI infrastructure we scanned was more vulnerable, exposed, and misconfigured than any other software we've ever investigated.
No authentication by default
It didn’t take long to spot an alarming pattern: a significant number of hosts had been deployed straight out of the box, with no authentication in place. Looking into the source code revealed why: authentication simply isn't enabled by default in many of these projects.
Real user data and company tooling were sitting exposed to anyone who looked. In the wrong hands, the consequences range from reputational damage to full compromise.
Here are some of the most striking examples of what was exposed.
Freely accessible chatbots
A number of instances involved chatbots that left user conversations exposed. One example, based on OpenUI, exposed a user's full LLM conversation history. It might seem relatively innocent on the surface, but chat histories in enterprise environments can reveal a lot.
More concerning were generic chatbots hosting a wide range of models — including multimodal LLMs — freely available to use. Malicious users can jailbreak most models to bypass safety guardrails for nefarious purposes — like generating illegal imagery, or soliciting advice with intent to commit a crime — and do so without fear of repercussion, since they're using someone else's infrastructure. This isn't hypothetical. People are finding creative ways to abuse company chatbots to access more capable models without paying or having requests logged to their own accounts.
There were also some questionable chatbots exposing large volumes of personal NSFW conversations. If that wasn't bad enough, the software running the Claude-powered goon-bots also disclosed their API keys in plaintext.
Wide open agent management platforms
We also discovered exposed instances of agent management platforms, including n8n and Flowise. Some instances that users clearly thought were internal had been exposed to the internet without authentication. One of the most egregious examples was a Flowise instance that exposed the entire business logic of an LLM chatbot service.
The instance's credential list was exposed too. Flowise was hardened enough not to reveal the stored values to an unauthenticated visitor, which limits the immediate damage, but an attacker could still use the tools connected to those credentials to exfiltrate sensitive information.
This is what makes these platforms particularly dangerous. There's a distinct absence of proper access management controls in AI tooling, meaning access to a bot that's integrated with a third-party system often means access to everything it touches.
In another example, the setup exposed a number of internet parsing tools and potentially dangerous local functions, such as file writes and code interpreting, making server-side code execution a realistic prospect.
We identified over 90 exposed instances across sectors such as government, marketing, and finance. In each case, the chatbots, their workflows, prompts, and outbound access were wide open. An attacker could modify the workflows, redirect traffic, expose user data, or poison responses.
Saying hello to unsecured Ollama APIs
One of the more surprising findings was the sheer number of exposed Ollama APIs accessible without authentication, with a model connected. We fired a single prompt ("Hello") to every server that listed a connected model, to see if we’d be prompted to authenticate. Of the 5,200+ servers queried, 31% answered.
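A probe of this kind can be reproduced against your own servers using Ollama's documented /api/generate endpoint (default port 11434). The host and model names below are placeholders, and only the payload construction is exercised here:

```python
import json
import urllib.request

def build_payload(model: str) -> dict:
    # One benign, non-streaming "Hello" prompt, per Ollama's /api/generate schema
    return {"model": model, "prompt": "Hello", "stream": False}

def probe(host: str, model: str, timeout: float = 10.0):
    """POST a single prompt; a JSON reply means the API answered with no
    authentication in front of it."""
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(build_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("response")

# probe("my-ollama-host.example", "llama3")  # only against hosts you operate
print(build_payload("llama3"))  # → {'model': 'llama3', 'prompt': 'Hello', 'stream': False}
```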
The responses gave a window into what these APIs were being used for. We couldn't morally explore any further, but the implications are far-reaching. A few examples:
"Greetings, Master. Your command is my law. What is your desire? Speak freely. I am here to fulfill it, without hesitation or question."
"I am here to assist you in any way I can with your health and wellbeing issues. Whether it's anxiety, sleep problems, or other concerns, don't hesitate to ask me for help."
"Welcome! I'm an AI assistant integrated with our cloud management systems. I can help you with operational tasks, infrastructure deployment, and service queries."
Ollama doesn't store messages directly, so there's no immediate risk of conversation data being exposed. But many of these instances were wrapping paid frontier models from Anthropic, Deepseek, Moonshot, Google, and OpenAI. Of all the models identified across all servers, 518 were wrapping well-known frontier models.
Insecure by design
After triaging the results, it was clear that some of the tech warranted a closer look. We spent time analyzing a subset of the applications in a lab environment — and found repeated insecure patterns throughout:
No authentication on fresh installs: Many projects drop users straight into a high-privilege account with full management access
Hardcoded and static credentials: Embedded in setup examples and docker-compose files rather than generated on installation
New technical vulnerabilities: Within a couple of days of lab work, we had already found arbitrary code execution in one popular AI project
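The hardcoded-credential pattern has a simple alternative: mint a per-install secret at setup time and surface it once. A minimal sketch (the function name is illustrative, not taken from any of the audited projects):

```python
import secrets

def generate_initial_admin_password(nbytes: int = 24) -> str:
    # A unique, unguessable credential generated at install time, instead of
    # a static value shipped in docker-compose examples or setup docs.
    return secrets.token_urlsafe(nbytes)

pw = generate_initial_admin_password()
print(f"Initial admin password (rotate after first login): {pw}")
```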
These misconfigurations are made even worse when agents have access to tools like code interpretation. The blast radius gets significantly larger when sandboxing is weak, and the infrastructure isn't sitting in a DMZ.
Speed is winning. Security is lagging behind
Some of the projects powering LLM infrastructure have clearly abandoned decades of hard-won security best practices in favour of shipping fast. That said, it's not purely a vendor problem. The speed of AI adoption and the pressure to beat competitors to market are what’s driving it.
This article is a contributed piece from one of our valued partners.
from The Hacker News https://ift.tt/OVwm1tn
via IFTTT