Overview
This guide provides developers with complete technical documentation for the RM-01 portable supercomputer, covering the following core topics:
- Network Configuration: configure the network connection for data exchange between the device and the host.
- System Architecture: understand how the Inference Module, Application Module, and management chip work together.
- Model Deployment: master AI model deployment, configuration, and optimization methods.
After the device is powered on and connected to the host via the USB Type-C interface, the system automatically configures the local network subnet. The user host is assigned the static IP address 10.10.99.100, and the Out-of-Band Management Chip uses the static IP address 10.10.99.97. The Inference Module (IP: 10.10.99.98) and the Application Module (IP: 10.10.99.99) each run an independent SSH service, so users can access them directly with standard SSH clients (e.g., OpenSSH, PuTTY). The Management Module, however, must be accessed via a serial port tool.

Network Configuration
About the Out-of-Band Management Chip
Network Configuration
- IP Address: 10.10.99.97
- Access Method: Web browser
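As a quick sanity check of the addresses above, the minimal Python sketch below probes the two SSH endpoints and the management chip's web interface from the host. It assumes the SSH services listen on the default port 22 and the management chip's web interface on port 80; neither port number is stated in this guide, so adjust as needed.

```python
"""Probe RM-01's internal services from the host (10.10.99.100)."""
import socket

# (address, port, description) -- ports 22 and 80 are assumptions,
# not values taken from this guide.
TARGETS = [
    ("10.10.99.98", 22, "Inference Module SSH"),
    ("10.10.99.99", 22, "Application Module SSH"),
    ("10.10.99.97", 80, "Out-of-Band Management Chip web UI"),
]

for host, port, label in TARGETS:
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"[ OK ] {label} reachable at {host}:{port}")
    except OSError as exc:
        print(f"[FAIL] {label} at {host}:{port} -- {exc}")
```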
How to Provide Internet Access to RM-01 from the Host (Using macOS as an Example)
After connecting the user host via USB Type-C, RM-01 will appear in the network interface list as:
- AX88179A (Developer Version)
- RMinte RM-01 (Commercial Release Version)
1. Open System Settings.
2. Go to Network → Sharing.
3. Enable Internet Sharing.
4. Configure sharing settings: click the “i” icon next to the sharing settings to enter the configuration interface, then:
   - Set “Share your connection from” to: Wi-Fi
   - In “To computers using”, select: AX88179A or RMinte RM-01 (depending on the device model)
5. Click Done to complete the configuration.
6. Manually configure the network interface: return to the Network settings page and configure the RM-01 network interface with:
   - IP Address: 10.10.99.100
   - Subnet Mask: 255.255.255.0
   - Router: 10.10.99.100 (i.e., the host’s own IP)
This configuration sets the host up as a gateway, providing NAT network access for RM-01. The default gateway and DNS for RM-01 are automatically assigned by the host via DHCP. Manually setting the IP ensures that it remains within the 10.10.99.0/24 subnet, consistent with the device’s internal service communication.

System Architecture
About the CFexpress Type-B Storage Card
The CFexpress Type-B storage card is one of the core components of the RM-01 device, responsible for system boot, deployment of the inference framework, and key functions such as ISV/SV software distribution and authorization authentication. The storage card is divided into three independent partitions:

- rm01rootfs (System Partition): the operating system and core runtime environment of the Inference Module are installed in this partition.
- rm01app (Application Partition): used to temporarily store Docker image files submitted by users or developers. After an image is written to rm01app, the RM-01 system automatically migrates it to the device’s built-in NVMe SSD storage and completes containerized deployment (a staging sketch follows below).
- rm01models (Model Partition): dedicated to storing large-scale AI models (e.g., LLMs, multimodal models) loaded by users or developers.
For details on model formats, size limitations, loading procedures, and compatibility requirements, refer to the “Model Deployment” section below.
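For illustration, the sketch below stages a locally built Docker image onto the rm01app partition from a macOS host. The /Volumes/rm01app mount point, the image name, and the tarball path are assumptions for a card attached via a card reader; the actual workflow for writing images may differ, so treat this as a sketch rather than the official procedure.

```python
"""Sketch: stage a Docker image tarball onto the rm01app partition."""
import shutil
import subprocess
from pathlib import Path

MOUNT_POINT = Path("/Volumes/rm01app")   # assumption: macOS auto-mount path for the card
IMAGE = "my-app:latest"                  # placeholder image name
TARBALL = Path("/tmp/my-app.tar")        # placeholder export path

# Export the local Docker image to a tar archive (standard `docker save`).
subprocess.run(["docker", "save", "-o", str(TARBALL), IMAGE], check=True)

# Copy the archive onto rm01app; RM-01 then migrates it to the built-in
# NVMe SSD and completes containerized deployment (see above).
if not MOUNT_POINT.is_dir():
    raise SystemExit(f"{MOUNT_POINT} not found -- is the card connected via a reader?")
shutil.copy2(TARBALL, MOUNT_POINT / TARBALL.name)
print("Image staged; return the card to RM-01 to trigger deployment.")
```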
About the Application Module
Network Configuration
- IP Address: 10.10.99.99
- Port Range: 59000–59299
Application Module Hardware Specifications
Application Module SSH Access Credentials
Pre-installed Software
The Application Module has Open WebUI pre-installed on port 80 to facilitate simple model debugging and conversational work.
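Open WebUI is reached by pointing a browser at the Application Module’s address. The minimal sketch below simply verifies that the service answers on port 80 before you open it.

```python
"""Check that Open WebUI on the Application Module responds on port 80."""
import urllib.request

URL = "http://10.10.99.99/"  # Open WebUI, pre-installed on port 80

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        print(f"Open WebUI responded with HTTP {resp.status}")
except OSError as exc:
    print(f"Open WebUI not reachable: {exc}")
```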
About the Inference Module
Network Configuration
- IP Address: 10.10.99.98
- Service Port Range: 58000–58999
Hardware Configuration Options
| Memory | Memory Bandwidth | Compute Power | Tensor Core Count |
|---|---|---|---|
| 32 GB | 204.8 GB/s | 200 TOPS (INT8) | 56 |
| 64 GB | 204.8 GB/s | 275 TOPS (INT8) | 64 |
| 64 GB | 273 GB/s | 1,200 TFLOPS (FP4) | 64 |
| 128 GB | 273 GB/s | 2,070 TFLOPS (FP4) | 96 |
Pre-installed Inference Frameworks
RM-01 comes pre-installed with the following two inference frameworks on the CFexpress Type-B storage card, both running on the Inference Module:
- vLLM
- TEI (Text Embedding Inference)

vLLM
- Status: Starts automatically
- Default Port: 58000
- Function: Provides OpenAI-compatible API interfaces
- Supported Requests: Standard POST /v1/chat/completions, etc.
API Access Method
After successfully loading a model, the vLLM inference service can be accessed at the Inference Module’s address on the default port: http://10.10.99.98:58000/v1
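The sketch below shows a minimal client-side request against that endpoint using the official openai Python package, which works with vLLM’s OpenAI-compatible server; the prompt, generation parameters, and API key value are placeholders, and the loaded model’s name is queried from the server rather than hard-coded.

```python
"""Minimal chat request against the vLLM service on the Inference Module."""
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the client library requires an
# API key string, so a placeholder value is passed here.
client = OpenAI(base_url="http://10.10.99.98:58000/v1", api_key="EMPTY")

# Ask the server which model is currently loaded instead of hard-coding a name.
model_id = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Any other OpenAI-compatible client can issue the same standard POST request to /v1/chat/completions.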
Model Deployment

About Models
RM-01 supports inference for various AI models, including but not limited to:
- LLM (Large Language Models)
- MLM (Multimodal Models)
- VLM (Vision-Language Models)
- Embedding (Text Embedding Models)
- Reranker (Reranker Models)
All model files must be stored on the device’s built-in CFexpress Type-B storage card, and users need to use a compatible CFexpress Type-B card reader to upload, manage, and update models on the host side.
The Inference Module accesses models at the path /home/rm01/models. Its standard file structure is as follows:
- The auto/ directory is used for lightweight, standardized model deployment and is automatically recognized by the system.
- The dev/ directory is used by developers for fine-grained control of model behavior and has higher priority than auto/: the system will ignore models in auto/ if dev/ is used (see the illustration below).
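The priority rule above can be restated in the short illustrative sketch below; this is not the device’s actual implementation, and it assumes dev/ configurations are .yaml files as described in the mode table further down.

```python
"""Illustration of the auto/ vs dev/ priority rule (not the actual implementation)."""
from pathlib import Path

MODELS = Path("/home/rm01/models")

def active_mode(models_root: Path = MODELS) -> str:
    """Return which deployment mode would be honored."""
    dev_configs = list((models_root / "dev").glob("*.yaml"))
    if dev_configs:
        # dev/ takes priority: models placed in auto/ are ignored.
        return f"dev ({len(dev_configs)} config file(s))"
    return "auto"

print(active_mode())
```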
Deployment Mode Selection
Two deployment modes are available: Automatic Mode (auto) and Manual Mode (dev).

Automatic Mode (auto)
A simplified mode suitable for quick verification and standardized deployment.

Usage
Place the complete weight files of the model (e.g., .safetensors, .bin, .pt, .awq, etc.) directly in the auto/llm/ directory, without nesting in subfolders (a staging sketch follows below).
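As a concrete illustration of the step above, the sketch below copies a model’s weight and config files flat into auto/llm/ on the card from a macOS host. The source directory, model name, and the /Volumes/rm01models mount point are assumptions; adjust them for your environment.

```python
"""Sketch: stage model files into auto/llm/ on the rm01models partition."""
import shutil
from pathlib import Path

SOURCE = Path("~/models/my-llm").expanduser()   # placeholder local model directory
DEST = Path("/Volumes/rm01models/auto/llm")     # assumption: macOS mount path for the card

DEST.mkdir(parents=True, exist_ok=True)

# Copy weight and config files directly into auto/llm/ -- no nested
# subfolders, as automatic mode requires.
for item in SOURCE.iterdir():
    if item.is_file():
        shutil.copy2(item, DEST / item.name)
        print(f"copied {item.name}")
```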
System Behavior
- Upon device startup, the system scans the auto/llm/ directory and automatically loads models in compatible formats.
- Automatic loading is supported only for LLMs; embedding and reranker models are not loaded automatically.
- After loading, the model provides basic inference capabilities by default; the following advanced features are not enabled:
  - Speculative Decoding
  - Prefix Caching
  - Chunked Prefill
- The maximum context length (max_model_len) is restricted to the system’s safe threshold (typically ≤ 8192 tokens).
- Limited performance optimization: to ensure system stability and multitasking concurrency, models in automatic mode use a conservative memory allocation strategy (gpu_memory_utilization ≤ 0.8); an illustrative sketch follows this list.
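For reference, the automatic-mode limits above roughly correspond to the vLLM engine settings sketched below. This is an illustrative mapping run on the Inference Module, not the device’s actual internal configuration; the model path assumes the flat auto/llm/ layout described earlier.

```python
"""Illustrative vLLM settings mirroring the automatic-mode limits above."""
from vllm import LLM, SamplingParams

llm = LLM(
    model="/home/rm01/models/auto/llm",   # flat model directory (assumed layout)
    max_model_len=8192,                   # safe context-length ceiling
    gpu_memory_utilization=0.8,           # conservative memory allocation
)

outputs = llm.generate(["Hello, RM-01!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```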
Security and Maintenance Notes
Mode Selection Recommendations
| Scenario | Recommended Mode | Description |
|---|---|---|
| Quickly verify model compatibility | Automatic Mode (auto) | No configuration required, plug and play |
| High-performance inference in production | Manual Mode (dev) + fine-tuned configuration | Full performance optimization |
| Multi-model parallel deployment | Manual Mode (dev) + multiple .yaml files | Flexible service orchestration |
| Development debugging, prototype validation | Manual Mode (dev) | Complete control |
Technical Support
- Developer Documentation: complete API reference and technical documentation
- GitHub Repository: sample code and open-source tools
- Technical Forum: developer community and technical discussions
- Technical Support: professional technical support services
© 2025 Panidea (Chengdu) Artificial Intelligence Technology Co., Ltd. All rights reserved.