
Overview

This guide provides developers with complete technical documentation for the RM-01 portable supercomputer, covering system architecture, network configuration, model deployment, and other core topics.
Read Before Use: RM-01 consists of an Inference Module, an Application Module, and an Encryption and Management Chip (hereinafter referred to as the Management Module), interconnected via an onboard Ethernet switch chip to form an internal LAN subnet. When a user connects a host (e.g., PC, smartphone, iPad) via the USB Type-C interface, RM-01 virtualizes an Ethernet interface for the host through USB Ethernet functionality. The host then obtains an IP address and automatically joins the subnet for data interaction.
After the device is powered on and connected to the host via the USB Type-C interface, the system automatically configures the local network subnet. The user host is assigned the static IP address 10.10.99.100, and the Out-of-Band Management Chip has the static IP address 10.10.99.97. The Inference Module (IP: 10.10.99.98) and the Application Module (IP: 10.10.99.99) each have fixed addresses on this subnet; the Application Module runs an SSH service that users can access directly via standard SSH clients (e.g., OpenSSH, PuTTY), while the Inference Module does not expose SSH access (see the Inference Module section below). The Management Module requires access via a serial port tool.
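As a quick sanity check after connecting, each module's address can be probed from the host. A minimal sketch, assuming a Unix-like host with ping and an SSH client available:

# Confirm the host can reach each module on the internal subnet
ping -c 1 10.10.99.97    # Out-of-Band Management Chip (RobOS)
ping -c 1 10.10.99.98    # Inference Module
ping -c 1 10.10.99.99    # Application Module
ssh rm01@10.10.99.99     # Interactive login to the Application Module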

Network Configuration

About the Out-of-Band Management Chip

Network Configuration
  • IP Address: 10.10.99.97
  • Access Method: Web browser
In addition to handling certain encryption tasks, the Out-of-Band Management Chip also hosts RobOS, the RM-01’s real-time system performance monitoring dashboard.
You can open 10.10.99.97 in a web browser to monitor the connection status and operational status of each module in real time.
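If you prefer the command line, the same address can be probed with curl before opening a browser (a minimal check, not part of the official tooling):

curl -sI http://10.10.99.97/ | head -n 1    # Expect an HTTP response line when RobOS is reachable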

How to Provide Internet Access to RM-01 from the Host (Using macOS as an Example)

After connecting the user host via USB Type-C, RM-01 will appear in the network interface list as:
  • AX88179A (Developer Version)
  • RMinte RM-01 (Commercial Release Version)
1. Open System Settings

2. Go to Network Sharing

3. Enable Internet Sharing

4. Configure Sharing Settings

Click the “i” icon next to the sharing settings to enter the configuration interface:
  • Set “Share your connection from” to: Wi-Fi
  • In “To computers using”, select: AX88179A or RMinte RM-01 (depending on the device model)

5. Complete Configuration

Click Done.

6. Manually Configure Network Interface

Return to the Network settings page and manually configure the RM-01 network interface:
  • IP Address: 10.10.99.100
  • Subnet Mask: 255.255.255.0
  • Router: 10.10.99.100 (i.e., the host’s own IP)
This configuration sets the host as a gateway, providing NAT network access for RM-01. The default gateway and DNS for RM-01 are automatically assigned by the host via DHCP. Manually setting the IP ensures that it remains within the 10.10.99.0/24 subnet, consistent with the device’s internal service communication.
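The same manual step can also be scripted with macOS’s built-in networksetup utility. A minimal sketch, assuming the service is named “AX88179A” (use “RMinte RM-01” on commercial units, and confirm the exact name first):

# List network services to confirm the RM-01 interface name (the name below is an assumption)
networksetup -listallnetworkservices
# Set the static IP, subnet mask, and router described above
networksetup -setmanual "AX88179A" 10.10.99.100 255.255.255.0 10.10.99.100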

System Architecture

About the CFexpress Type-B Storage Card

The CFexpress Type-B storage card is one of the core components of the RM-01 device, responsible for system boot, deployment of the inference framework, and key functions such as ISV/SV software distribution and license authentication. The storage card is divided into three independent partitions:

rm01rootfs

System Partition: The operating system and core runtime environment of the Inference Module are installed in this partition.
Users and developers are strictly prohibited from accessing, modifying, or deleting the contents of this partition. Unauthorized changes may prevent the Inference Module from booting or render inference functions inoperable, and any resulting hardware or software damage is not covered by warranty service.

rm01app

Application Partition: This partition temporarily stores Docker image files submitted by users or developers. After an image is written to rm01app, the RM-01 system automatically migrates it to the device’s built-in NVMe SSD storage and completes containerized deployment.
Do not directly run or modify application files in this partition.

rm01models

Model Partition: Dedicated to storing large-scale AI models (e.g., LLMs, multimodal models) loaded by users or developers.
For details on model formats, size limitations, loading procedures, and compatibility requirements, refer to the “Model Deployment” section below.

About the Application Module

Network Configuration
  • IP Address: 10.10.99.99
  • Port Range: 59000–59299

Application Module Hardware Specifications

Processor: Intel Core i3-N305 (8 cores, 8 threads, base frequency 1.8 GHz, max turbo frequency 3.8 GHz)
Memory: 16 GB / 24 GB LPDDR5-4800MT/s (onboard, non-expandable)
Storage: 512 GB / 1 TB / 2 TB (optional) NVMe SSD

Application Module SSH Access Credentials

ssh rm01@10.10.99.99
# Default Username: rm01
# Default Password: rm01 (factory preset, for initial login only)
Security Notice: To ensure system security, use the passwd command to change the default password immediately after the first SSH login. The default password is only for initial configuration and must not be used in production or deployment environments.
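A minimal first-login sequence might look like this (key installation is optional; the commands are run from the host):

ssh rm01@10.10.99.99             # Log in with the factory-preset credentials
passwd                           # Change the default password immediately, then exit
ssh-copy-id rm01@10.10.99.99     # Optional: install your public key for key-based login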

Pre-installed Software

The Application Module comes with Open WebUI pre-installed on port 80 for simple model debugging and conversational testing.
You can access Open WebUI by navigating to 10.10.99.99 in your web browser.

About the Inference Module

Network Configuration
  • IP Address: 10.10.99.98
  • Service Port Range: 58000–58999
The Inference Module is the core computing unit of RM-01, supporting various high-performance AI inference configurations. Users can select the appropriate model based on model scale and performance requirements.

Hardware Configuration Options

Memory    Memory Bandwidth    Compute Power         Tensor Core Count
32 GB     204.8 GB/s          200 TOPS (INT8)       56
64 GB     204.8 GB/s          275 TOPS (INT8)       64
64 GB     273 GB/s            1,200 TFLOPS (FP4)    64
128 GB    273 GB/s            2,070 TFLOPS (FP4)    96

Pre-installed Inference Frameworks

RM-01 comes pre-installed with the following two inference frameworks on the CFexpress Type-B storage card, both running on the Inference Module:
  • vLLM
    • Status: Starts automatically
    • Default Port: 58000
    • Function: Provides OpenAI-compatible API interfaces
    • Supported Requests: Standard endpoints such as POST /v1/chat/completions
  • TEI (Text Embedding Inference)

API Access Method

After successfully loading a model, the vLLM inference service can be accessed via the following address:
http://10.10.99.98:58000/v1/chat/completions
Supports direct calls using standard OpenAI clients (e.g., openai-python, curl, Postman).
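For example, a raw curl call might look like the following; the model name shown is an assumption (on vLLM’s OpenAI-compatible server, GET /v1/models reports what is actually loaded):

# Discover the loaded model, then send a chat completion request (model name is an assumption)
curl http://10.10.99.98:58000/v1/models
curl http://10.10.99.98:58000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-30B-A3B-Instruct-2507-AWQ", "messages": [{"role": "user", "content": "Hello"}]}'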
Security Notice: To ensure system security and stability, the Inference Module does not provide SSH access. Users and developers cannot directly log in to or interactively operate this module’s underlying operating system. Any attempt to bypass security policies or directly access the Inference Module may result in system anomalies, data corruption, or service interruptions, which are not covered by warranty service.

Model Deployment

About Models

RM-01 supports inference for various AI models, including but not limited to:

  • LLM: Large Language Models
  • MLM: Multimodal Models
  • VLM: Vision-Language Models
  • Embedding: Text Embedding Models
  • Reranker: Reranker Models
All model files must be stored on the device’s CFexpress Type-B storage card; users need a compatible CFexpress Type-B card reader to upload, manage, and update models from the host side.
When the CFexpress Type-B storage card is connected to RM-01, the system mounts it as a read-only data volume named models at the path /home/rm01/models. Its standard file structure is as follows:
models/
├── auto/                  # Directory for automatic model loading (production-grade deployment)
│   ├── embedding/         # Embedding models (not automatically loaded, see below)
│   ├── llm/               # Large language models (weight files stored directly, see below)
│   └── reranker/          # Reranker models (not automatically loaded, see below)
└── dev/                   # Developer-defined model directory (higher priority)
    ├── embedding/
    ├── llm/
    └── reranker/
  • The auto/ directory is used for lightweight, standardized model deployment, automatically recognized by the system
  • The dev/ directory is used for fine-grained control of model behavior by developers, with higher priority than auto/. The system will ignore models in auto/ if dev/ is used
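When preparing a card from the host, the expected skeleton can be created in one step. A minimal sketch, assuming the card's model partition is mounted at /Volumes/rm01models (the mount path depends on your OS and card reader):

# Create the auto/ and dev/ directory skeletons on the mounted card (mount path is an assumption)
mkdir -p /Volumes/rm01models/auto/{embedding,llm,reranker}
mkdir -p /Volumes/rm01models/dev/{embedding,llm,reranker}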

Deployment Mode Selection

  • Automatic Mode (auto)
  • Manual Mode (dev)

Automatic Mode (auto) is a simplified mode suitable for quick verification and standardized deployment.
Place the model’s complete weight files (e.g., .safetensors, .bin, .pt, .awq, etc.) in a single model folder directly under the auto/llm/ directory, without further nesting:
auto/llm/Qwen3-30B-A3B-Instruct-2507-AWQ/model.safetensors
  • Upon device startup, the system scans the auto/llm/ directory and automatically loads models in compatible formats
  • Automatic loading is supported only for LLMs; embedding and reranker models are not loaded automatically
  • After loading, the model provides basic inference capabilities by default and does not enable the following advanced features:
    • Speculative Decoding
    • Prefix Caching
    • Chunked Prefill
  • The maximum context length (max_model_len) is restricted to the system’s safe threshold (typically ≤ 8192 tokens)
  • Limited performance optimization: To ensure system stability and multitasking concurrency, models in automatic mode use a conservative memory allocation strategy (gpu_memory_utilization ≤ 0.8)
Important Note: Automatic mode is suitable for quickly verifying model compatibility or for standardized deployment scenarios, but not for high-performance inference in production environments. For full performance, use manual mode (dev).
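As a concrete example of the auto-mode layout, copying a downloaded model folder onto the card might look like this (the local source path and the /Volumes/rm01models mount point are assumptions):

# Copy one model folder directly under auto/llm/ — no extra nesting (paths are assumptions)
cp -R ~/models/Qwen3-30B-A3B-Instruct-2507-AWQ /Volumes/rm01models/auto/llm/
ls /Volumes/rm01models/auto/llm/Qwen3-30B-A3B-Instruct-2507-AWQ    # Expect weight files such as model.safetensors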

Security and Maintenance Notes

  • Prohibited SSH Login to Inference Module: All model management must be done via the CFexpress Type-B storage card
  • Model Files Must Be Raw Weights: Do not use compressed files (.zip/.tar.gz), encrypted packages, or non-standard formats
  • File Permissions: All model files must be readable (chmod 644) and directories executable (chmod 755); see the sketch after this list
  • Version Control: It is recommended to use Git or file naming conventions (e.g., Qwen3-30B-A3B-Instruct-v1.2-20250930) to manage model versions
  • Backup Recommendation: Back up the dev/ and auto/ directories before updating models to avoid configuration loss
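For the File Permissions item above, a one-shot way to normalize an entire model tree (again assuming the /Volumes/rm01models mount point):

# Make every file readable and every directory traversable (mount path is an assumption)
find /Volumes/rm01models -type f -exec chmod 644 {} +
find /Volumes/rm01models -type d -exec chmod 755 {} +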

Mode Selection Recommendations

Scenario                                       Recommended Mode                                Description
Quickly verify model compatibility             Automatic Mode (auto)                           No configuration required, plug and play
High-performance inference in production       Manual Mode (dev) + fine-tuned configuration    Full performance optimization
Multi-model parallel deployment                Manual Mode (dev) + multiple .yaml files        Flexible service orchestration
Development debugging, prototype validation    Manual Mode (dev)                               Complete control

Technical Support


© 2025 Panidea (Chengdu) Artificial Intelligence Technology Co., Ltd. All rights reserved.