Overview
This guide provides developers with complete technical documentation for the RM-01 portable supercomputer, covering system architecture, network configuration, model deployment, and other core topics:

- Network Configuration: configure network connections for data interaction between the device and the host
- System Architecture: understand how the Inference Module, Application Module, and Management Chip work together
- Model Deployment: master AI model deployment, configuration, and optimization methods
Read Before Use
RM-01 consists of an Inference Module, an Application Module, and an Encryption and Management Chip (hereinafter referred to as the Management Module), interconnected via an onboard Ethernet switch chip, forming an internal LAN subnet. When a user connects a host (e.g., PC, smartphone, iPad) via the USB Type-C interface, RM-01 virtualizes an Ethernet interface for the host through USB Ethernet functionality. The host then obtains an IP address and automatically joins the subnet for data interaction.
After the device is powered on and connected to the host via the USB Type-C interface, the system automatically configures the local network subnet. The user host is assigned the static IP address 10.10.99.100, and the Out-of-Band Management Chip has the static IP address 10.10.99.97. The Inference Module (IP: 10.10.99.98) and the Application Module (IP: 10.10.99.99) each run an independent SSH service, so users can access them directly with a standard SSH client (e.g., OpenSSH, PuTTY). The Management Module, however, must be accessed via a serial port tool.
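As a quick sanity check, the following is a minimal sketch that confirms each module is reachable from the host. It assumes the host has already joined the 10.10.99.0/24 subnet and that the services listen on conventional ports (22 for SSH, 80 for the management web UI); these port choices are assumptions, not values stated by this guide.

```python
# Sketch: verify that the RM-01 modules are reachable from the host.
# Port choices are assumptions: 22 for the SSH services, 80 for the management web UI.
import socket

TARGETS = {
    "Management Chip (web UI)": ("10.10.99.97", 80),
    "Inference Module (SSH)": ("10.10.99.98", 22),
    "Application Module (SSH)": ("10.10.99.99", 22),
}

for name, (ip, port) in TARGETS.items():
    try:
        with socket.create_connection((ip, port), timeout=3):
            print(f"{name}: reachable at {ip}:{port}")
    except OSError:
        print(f"{name}: NOT reachable at {ip}:{port}")
```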
Network Configuration
About the Out-of-Band Management Chip
Network Configuration
- IP Address: 10.10.99.97
- Access Method: Web browser

You can access 10.10.99.97 through a web browser to monitor the connection status and operational status of each module in real time.

How to Provide Internet Access to RM-01 from the Host (Using macOS as an Example)
After connecting the user host via USB Type-C, RM-01 will appear in the network interface list as:
- AX88179A (Developer Version)
- RMinte RM-01 (Commercial Release Version)
1. Open System Settings
2. Go to Network → Sharing
3. Enable Internet Sharing
4. Configure Sharing Settings: click the "i" icon next to the sharing settings to open the configuration interface:
   - Set "Share your connection from" to: Wi-Fi
   - In "To computers using", select: AX88179A or RMinte RM-01 (depending on the device model)
5. Complete Configuration: click Done
6. Manually Configure Network Interface: return to the Network settings page and manually configure the RM-01 network interface:
   - IP Address: 10.10.99.100
   - Subnet Mask: 255.255.255.0
   - Router: 10.10.99.100 (i.e., the host's own IP)
This configuration sets the host up as a gateway, providing NAT network access for RM-01. The default gateway and DNS for RM-01 are assigned automatically by the host via DHCP. Manually setting the host's IP ensures that it remains within the 10.10.99.0/24 subnet, consistent with the device's internal service communication.

System Architecture
About the CFexpress Type-B Storage Card
The CFexpress Type-B storage card is one of the core components of the RM-01 device, responsible for system boot, deployment of the inference framework, and key functions such as ISV/SV software distribution and authorization authentication. The storage card is divided into three independent partitions:

rm01rootfs (System Partition)
The operating system and core runtime environment of the Inference Module are installed in this partition.
Users or developers are strictly prohibited from accessing, modifying, or deleting the contents of this partition. Any unauthorized changes may cause the Inference Module to fail to boot or render inference functions inoperable, and any resulting hardware or software damage is not covered by any warranty services.
rm01app (Application Partition)
This partition is used to temporarily store Docker image files submitted by users or developers. After an image is written to rm01app, the RM-01 system automatically migrates it to the device's built-in NVMe SSD storage and completes containerized deployment. Do not directly run or modify application files in this partition.
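As an illustration, the following sketch exports a locally built Docker image to the rm01app partition using the Docker SDK for Python. The mount point /Volumes/rm01app, the image tag, and the assumption that the image is submitted as a tar archive are all illustrative guesses, not values specified by this guide.

```python
# Sketch: export a local Docker image onto the rm01app partition (Docker SDK for Python).
# The mount point, image tag, and tar file name are assumptions for illustration only.
import docker

CARD_MOUNT = "/Volumes/rm01app"   # assumed host-side mount point of the rm01app partition
IMAGE_TAG = "my-app:latest"       # hypothetical image built by the developer

client = docker.from_env()
image = client.images.get(IMAGE_TAG)

# Stream the image to the card as a tar archive; RM-01 migrates and deploys it after insertion.
with open(f"{CARD_MOUNT}/my-app.tar", "wb") as f:
    for chunk in image.save(named=True):
        f.write(chunk)
```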
rm01models (Model Partition)
Dedicated to storing large-scale AI models (e.g., LLMs, multimodal models) loaded by users or developers.
For details on model formats, size limitations, loading procedures, and compatibility requirements, refer to the “Model Deployment” section below.
About the Application Module
Network Configuration
- IP Address: 10.10.99.99
- Port Range: 59000–59299
Application Module Hardware Specifications
Application Module SSH Access Credentials
Security Notice
To ensure system security, immediately use the passwd command to change the default password after the first SSH login. The default password is only for initial configuration and must not be used in production or deployment environments.
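As a hedged illustration, the sketch below connects to the Application Module over SSH with the Python paramiko library. The username and password are placeholders, not the device's documented default credentials; substitute the credentials shipped with your unit.

```python
# Sketch: open an SSH session to the Application Module (10.10.99.99) using paramiko.
# USERNAME / PASSWORD are placeholders; use the credentials provided with your device.
import paramiko

USERNAME = "rm01"        # placeholder, not a documented default
PASSWORD = "change-me"   # placeholder, not a documented default

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("10.10.99.99", username=USERNAME, password=PASSWORD, timeout=10)

# Run a harmless command to confirm the session works.
_, stdout, _ = client.exec_command("uname -a")
print(stdout.read().decode().strip())
client.close()
```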
Pre-installed Software
The Application Module has Open WebUI pre-installed on port 80 to facilitate simple model debugging and conversational use.
You can access Open WebUI by navigating to 10.10.99.99 in your web browser.

About the Inference Module
Network Configuration
- IP Address: 10.10.99.98
- Service Port Range: 58000–58999
Hardware Configuration Options
| Memory | Memory Bandwidth | Compute Power | Tensor Core Count |
|---|---|---|---|
| 32 GB | 204.8 GB/s | 200 TOPS (INT8) | 56 |
| 64 GB | 204.8 GB/s | 275 TOPS (INT8) | 64 |
| 64 GB | 273 GB/s | 1,200 TFLOPS (FP4) | 64 |
| 128 GB | 273 GB/s | 2,070 TFLOPS (FP4) | 96 |
Pre-installed Inference Frameworks
RM-01 comes pre-installed with the following two inference frameworks on the CFexpress Type-B storage card, both running on the Inference Module:

- vLLM
  - Status: starts automatically
  - Default Port: 58000
  - Function: provides OpenAI-compatible API interfaces
  - Supported Requests: standard POST /v1/chat/completions, etc.
- TEI (Text Embedding Inference)
API Access Method
After successfully loading a model, the vLLM inference service can be accessed at the Inference Module's IP and default port, http://10.10.99.98:58000, and supports direct calls using standard OpenAI clients (e.g., openai-python, curl, Postman).
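A minimal sketch using the openai Python client against this endpoint follows. The model name is a placeholder for whichever model is currently loaded, and the API key is a dummy value since this guide does not describe an authentication scheme.

```python
# Sketch: call the RM-01 vLLM service through its OpenAI-compatible API.
# "loaded-model" is a placeholder; use the name of the model actually deployed.
from openai import OpenAI

client = OpenAI(
    base_url="http://10.10.99.98:58000/v1",  # Inference Module IP, default vLLM port
    api_key="not-needed",                     # placeholder; no auth scheme is documented
)

response = client.chat.completions.create(
    model="loaded-model",
    messages=[{"role": "user", "content": "Hello, RM-01!"}],
)
print(response.choices[0].message.content)
```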
Security Notice
To ensure system security and stability, the Inference Module does not provide SSH access. Users and developers cannot directly log in to, or interactively operate, this module's underlying operating system. Any attempt to bypass security policies or directly access the Inference Module may result in system anomalies, data corruption, or service interruptions, which are not covered by warranty services.
Model Deployment
About Models
RM-01 supports inference for various AI models, including but not limited to:

- LLM: Large Language Models
- MLM: Multimodal Models
- VLM: Vision-Language Models
- Embedding: Text Embedding Models
- Reranker: Reranker Models
All model files must be stored on the device's built-in CFexpress Type-B storage card, and users need a compatible CFexpress Type-B card reader to upload, manage, and update models from the host side.

The system accesses models at the path /home/rm01/models. The standard file structure is as follows:
- The auto/ directory is used for lightweight, standardized model deployment and is automatically recognized by the system
- The dev/ directory is used for fine-grained control of model behavior by developers and has higher priority than auto/; the system will ignore models in auto/ if dev/ is used
Deployment Mode Selection
- Automatic Mode (auto): a simplified mode suitable for quick verification and standardized deployment (described below)
- Manual Mode (dev): gives developers fine-grained control over model behavior
Usage
Place the complete weight files of the model (e.g., .safetensors, .bin, .pt, .awq, etc.) directly in the auto/llm/ directory, without nesting them in subfolders.
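For instance, the following small sketch lists the weight files that automatic mode would pick up from auto/llm/. It assumes the card is mounted at /Volumes/rm01models on the host; that mount point is not specified by this guide, so adjust it to your setup.

```python
# Sketch: list candidate weight files under auto/llm/ on the model partition.
# The host-side mount point below is an assumption; change it to where the card is mounted.
from pathlib import Path

CARD_MOUNT = Path("/Volumes/rm01models")   # assumed mount point on the host
WEIGHT_SUFFIXES = {".safetensors", ".bin", ".pt", ".awq"}

auto_llm = CARD_MOUNT / "auto" / "llm"
weights = [p for p in auto_llm.iterdir() if p.suffix in WEIGHT_SUFFIXES]

for p in sorted(weights):
    print(f"{p.name}\t{p.stat().st_size / 1e9:.2f} GB")
```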
System Behavior
- Upon device startup, the system scans the auto/llm/ directory and automatically loads models in compatible formats
- Automatic loading is supported only for LLMs, not for embedding or reranker models
- After loading, the model provides basic inference capabilities by default and does not enable the following advanced features:
  - Speculative Decoding
  - Prefix Caching
  - Chunked Prefill
- The maximum context length (max_model_len) is restricted to the system's safe threshold (typically ≤ 8192 tokens)
- Limited performance optimization: to ensure system stability and multitasking concurrency, models in automatic mode use a conservative memory allocation strategy (gpu_memory_utilization ≤ 0.8)
Important Note
Automatic mode is suitable for quickly verifying model compatibility and for standardized deployment scenarios, but it is not suitable for high-performance inference in production environments. For full performance, use manual mode (dev).
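To make these limits concrete, here is a hedged sketch of roughly equivalent vLLM engine arguments. It illustrates what the conservative auto-mode defaults correspond to; it is not the configuration RM-01 itself uses, and the model path is hypothetical.

```python
# Sketch: vLLM engine arguments roughly matching automatic mode's conservative limits.
# Illustrative only; RM-01's real auto-mode configuration is managed by the system.
from vllm import LLM

llm = LLM(
    model="/home/rm01/models/auto/llm",   # hypothetical path to the auto-loaded weights
    max_model_len=8192,                   # the documented safe context threshold
    gpu_memory_utilization=0.8,           # conservative memory allocation
    enable_prefix_caching=False,          # advanced features stay disabled in auto mode
    enable_chunked_prefill=False,
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```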
Security and Maintenance Notes
- No SSH Login to the Inference Module: all model management must be done via the CFexpress Type-B storage card
- Model Files Must Be Raw Weights: do not use compressed files (.zip/.tar.gz), encrypted packages, or non-standard formats
- File Permissions: all model files must be readable (chmod 644), and directories must be executable (chmod 755); see the sketch after this list
- Version Control: it is recommended to use Git or file naming conventions (e.g., Qwen3-30B-A3B-Instruct-v1.2-20250930) to manage model versions
- Backup Recommendation: back up the dev/ and auto/ directories before updating models to avoid configuration loss
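The following minimal sketch normalizes permissions to the values above. It assumes the model partition is mounted at /Volumes/rm01models on the host; adjust the path to your setup, and note that some card filesystems may not honor POSIX permission bits.

```python
# Sketch: set files to 644 and directories to 755 under the model partition.
# The mount point is an assumption; change it to wherever the card is mounted.
import os
from pathlib import Path

CARD_MOUNT = Path("/Volumes/rm01models")   # assumed host-side mount point

for root, dirs, files in os.walk(CARD_MOUNT):
    for d in dirs:
        os.chmod(Path(root) / d, 0o755)    # directories: rwxr-xr-x
    for f in files:
        os.chmod(Path(root) / f, 0o644)    # files: rw-r--r--
```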
Mode Selection Recommendations
| Scenario | Recommended Mode | Description |
|---|---|---|
| Quickly verify model compatibility | Automatic Mode (auto) | No configuration required, plug and play |
| High-performance inference in production | Manual Mode (dev) + fine-tuned configuration | Full performance optimization |
| Multi-model parallel deployment | Manual Mode (dev) + multiple .yaml files | Flexible service orchestration |
| Development debugging, prototype validation | Manual Mode (dev) | Complete control |
Technical Support

- Developer Documentation: complete API reference and technical documentation
- GitHub Repository: sample code and open-source tools
- Technical Forum: developer community and technical discussions
- Technical Support: professional technical support services
© 2025 Panidea (Chengdu) Artificial Intelligence Technology Co., Ltd. All rights reserved.