This article contains everything you need to know about managing Iconik Storage Gateway Pro (ISG Pro) clusters and nodes.
Node roles
Each node has one or more roles that determine what kinds of jobs it will handle:
| Role | Handles |
|---|---|
| MAIN | Scanning, ingest, deletion, event handling. Polls Iconik for events. |
| CHECKSUM | File checksum calculation |
| TRANSCODE | Transcoding jobs |
| TRANSFER | Uploads, downloads, archives, exports |
A node can hold any combination of roles. Roles can also be split: one node could be dedicated to transcoding while another handles transfers. This is how customers tune the cluster for their workload: a customer with heavy transcoding demand can spin up multiple TRANSCODE-role nodes; a customer with heavy ingest can scale CHECKSUM-role nodes.
Each role must be held by at least one node for the cluster to be fully operational. The default cluster setup gives the first node every role.
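The coverage rule above can be checked mechanically. The sketch below is a minimal illustration, assuming a hypothetical in-memory map of node names to roles; the `Role` enum, the node names and the `missing_roles` helper are illustrative and not part of ISG Pro.

```python
from enum import Enum


class Role(Enum):
    MAIN = "MAIN"
    CHECKSUM = "CHECKSUM"
    TRANSCODE = "TRANSCODE"
    TRANSFER = "TRANSFER"


def missing_roles(nodes: dict[str, set[Role]]) -> set[Role]:
    """Return the roles that no node in the cluster holds (hypothetical helper)."""
    held = set().union(*nodes.values())  # empty set when there are no nodes
    return set(Role) - held


# Example: no node holds TRANSFER, so uploads, downloads, archives and
# exports have nowhere to run.
cluster = {
    "node-a": {Role.MAIN, Role.CHECKSUM},
    "node-b": {Role.TRANSCODE},
}
print(missing_roles(cluster))  # {<Role.TRANSFER: 'TRANSFER'>}
```

A non-empty result means the cluster is not fully operational until some node is given the missing role.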
ISG Pro high availability, Leader election & failure handling
Only one MAIN-role node is the Leader at any time. The Leader runs MAIN-role jobs (scanning, ingest). Other MAIN-role nodes are Followers, ready to take over if the Leader fails. Election is PostgreSQL-based.
If the Leader goes offline — the machine dies, loses network access, loses Iconik access, or loses shared storage access — a Follower wins the next election and picks up MAIN-role work. CHECKSUM, TRANSCODE, and TRANSFER work continues uninterrupted on healthy nodes throughout, because that work doesn't require Leader status.
Each node runs continuous health checks: database connectivity, Iconik API connectivity, shared storage access. A node that fails health checks steps back; another node takes over.
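The interplay between the lock, the MAIN role and health checks can be sketched from a single node's point of view. This is an illustration only: ISG Pro's election is PostgreSQL-based, and the article does not say which PostgreSQL mechanism it uses (an advisory lock would be one option), so `LeaderLock` here is a hypothetical in-memory stand-in and `Node.tick` is a hypothetical name for one election round.

```python
from __future__ import annotations

from dataclasses import dataclass


class LeaderLock:
    """In-memory stand-in for the PostgreSQL-backed leader lock (assumption)."""

    def __init__(self) -> None:
        self.holder: str | None = None

    def try_acquire(self, node_id: str) -> bool:
        if self.holder is None:
            self.holder = node_id
        return self.holder == node_id

    def release(self, node_id: str) -> None:
        if self.holder == node_id:
            self.holder = None


@dataclass
class Node:
    node_id: str
    roles: set[str]
    healthy: bool = True  # outcome of DB, iconik API and shared storage checks

    def tick(self, lock: LeaderLock) -> str:
        """One election round as seen by this node."""
        if "MAIN" not in self.roles:
            return "not eligible"
        if not self.healthy:
            # A node failing health checks steps back and frees the lock.
            lock.release(self.node_id)
            return "follower"
        return "leader" if lock.try_acquire(self.node_id) else "follower"


lock = LeaderLock()
a = Node("node-a", {"MAIN", "CHECKSUM"})
b = Node("node-b", {"MAIN", "TRANSFER"})
print(a.tick(lock), b.tick(lock))  # leader follower
a.healthy = False  # e.g. node-a loses shared storage access
print(a.tick(lock), b.tick(lock))  # follower leader
```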
Failure scenarios:
- Leader machine dies: a Follower wins the next election after passing all health checks. The new Leader updates the old Leader's `availability_status`, since the old node can't.
- Leader loses iconik access (proxy or external gateway down): the Leader stops MAIN-role work, releases the lock and becomes a Follower. Another healthy node takes over.
- Leader loses shared storage access: same outcome; the Leader drops leadership and releases the lock.
- Leader loses all network access: equivalent to "machine died" from the cluster's perspective.
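The bookkeeping in the first scenario can be sketched as follows. This is a hypothetical illustration: `NodeRecord`, `on_election_won` and the `"online"`/`"offline"` status values are assumptions, since the article names the `availability_status` field but not its values. The sketch also assumes that a Leader which stepped down voluntarily (lost iconik or storage access) still reports its own state, so only a vanished Leader is marked offline by its successor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeRecord:
    node_id: str
    availability_status: str  # hypothetical values: "online" / "offline"


def on_election_won(new_leader: str, old_leader: str | None,
                    old_leader_stepped_down: bool,
                    registry: dict[str, NodeRecord]) -> None:
    """Bookkeeping a new Leader performs after winning (illustrative only)."""
    registry[new_leader].availability_status = "online"
    # Assumption: a Leader that stepped down is still running and updates its
    # own status. One that vanished (machine died, lost all network) cannot,
    # so the new Leader records it as offline on its behalf.
    if old_leader and old_leader != new_leader and not old_leader_stepped_down:
        registry[old_leader].availability_status = "offline"


registry = {n: NodeRecord(n, "online") for n in ("node-a", "node-b")}
on_election_won("node-b", "node-a", old_leader_stepped_down=False, registry=registry)
print(registry["node-a"].availability_status)  # offline
```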
Health checks each node performs:
- Database connectivity
- iconik API connectivity
- Shared storage access (per configured storage)
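A node counts as healthy only when every one of these checks passes, including one check per configured storage. A minimal sketch, assuming each check is a hypothetical zero-argument callable (a real one would, for example, open a database connection or list a storage path); the storage names are made up:

```python
from __future__ import annotations

from typing import Callable

Check = Callable[[], bool]


def run_health_checks(db: Check, iconik_api: Check,
                      storages: dict[str, Check]) -> tuple[bool, list[str]]:
    """Run every check; the node is healthy only if all of them pass.

    Returns (healthy, failed_check_names). A check that raises counts as failed.
    """
    checks: dict[str, Check] = {"database": db, "iconik_api": iconik_api}
    checks.update({f"storage:{name}": fn for name, fn in storages.items()})
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return not failed, failed


healthy, failed = run_health_checks(
    db=lambda: True,
    iconik_api=lambda: True,
    storages={"nas-1": lambda: True, "nas-2": lambda: False},
)
print(healthy, failed)  # False ['storage:nas-2']
```

Returning the failed names, not just a boolean, mirrors what an admin needs when a node steps back: which dependency to investigate.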
Work that does not require Leader status (it continues on Followers as long as the node is healthy and holds the matching role):
- Checksum calculation
- Transcoding
- Transfers / archives (uploads, downloads)
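Putting roles, leadership and health together gives a simple dispatch rule. The sketch below assumes hypothetical job-type labels; the mapping of job types to roles follows the roles table, and only MAIN-role work is gated on Leader status.

```python
# Job-type to role mapping from the roles table; the job-type strings
# themselves are hypothetical labels.
JOB_ROLE = {
    "scan": "MAIN", "ingest": "MAIN", "delete": "MAIN", "event": "MAIN",
    "checksum": "CHECKSUM",
    "transcode": "TRANSCODE",
    "upload": "TRANSFER", "download": "TRANSFER",
    "archive": "TRANSFER", "export": "TRANSFER",
}


def can_run(job_type: str, roles: set[str], is_leader: bool, healthy: bool) -> bool:
    """Decide whether a node may pick up a job (illustrative sketch)."""
    if not healthy:
        return False
    role = JOB_ROLE[job_type]
    if role not in roles:
        return False
    # Only MAIN-role work is restricted to the Leader; everything else runs
    # on any healthy node that holds the role, Leader or Follower.
    return is_leader if role == "MAIN" else True


follower_roles = {"MAIN", "TRANSCODE"}
print(can_run("transcode", follower_roles, is_leader=False, healthy=True))  # True
print(can_run("scan", follower_roles, is_leader=False, healthy=True))       # False
```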
Settings & Precedence
Settings can be defined in three places. A higher level overrides a lower one:
config.ini (highest) → ISG node settings → ISG cluster settings (lowest)
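This lookup order maps directly onto Python's `collections.ChainMap`, which searches its maps left to right and returns the first match. The values below are hypothetical, and `config.ini` is shown as an already-parsed dict because the article does not describe its section layout.

```python
from collections import ChainMap

# Hypothetical values; in a real deployment these come from the node's
# config.ini, the node's settings and the cluster's settings.
config_ini = {"checksum_max_workers": 8}
node_settings = {"checksum_max_workers": 4, "max_transcoding_jobs": 2}
cluster_settings = {
    "checksum_max_workers": 2,
    "max_transcoding_jobs": 1,
    "file_upload_parallel_uploads_num": 3,
}

# First map listed wins: config.ini (highest) -> node -> cluster (lowest).
effective = ChainMap(config_ini, node_settings, cluster_settings)

print(effective["checksum_max_workers"])              # 8, from config.ini
print(effective["max_transcoding_jobs"])              # 2, from node settings
print(effective["file_upload_parallel_uploads_num"])  # 3, from cluster settings
```

A practical consequence: a value pinned in a node's `config.ini` silently wins over anything changed later in the admin panel for that node.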
Settings currently available in the admin panel:
- `checksum_max_workers`: checksum calculation concurrency (applied to nodes with the CHECKSUM role)
- `scanner_concurrency_value`: scanner concurrency (applied to the Leader MAIN node)
- `file_download_parallel_downloads_num`: maximum number of download jobs per node (applied to nodes with the TRANSFER role)
- `file_upload_parallel_uploads_num`: maximum number of upload jobs per node (applied to nodes with the TRANSFER role)
- `max_transcoding_jobs`: maximum number of transcoding jobs per transcoder profile per node (applied to nodes with the TRANSCODE role)
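Because each setting only takes effect on nodes with a particular role (or on the Leader), the same value can matter on one node and be ignored on another. A minimal sketch of that scoping, assuming a hypothetical `applicable_settings` helper and a made-up `"LEADER"` scope marker:

```python
# Which nodes each admin-panel setting applies to, per the list above.
SETTING_SCOPE = {
    "checksum_max_workers": "CHECKSUM",
    "scanner_concurrency_value": "LEADER",  # Leader MAIN node only
    "file_download_parallel_downloads_num": "TRANSFER",
    "file_upload_parallel_uploads_num": "TRANSFER",
    "max_transcoding_jobs": "TRANSCODE",
}


def applicable_settings(roles: set[str], is_leader: bool) -> list[str]:
    """List the settings that take effect on a node with these roles."""
    result = []
    for setting, scope in SETTING_SCOPE.items():
        if scope == "LEADER":
            if is_leader and "MAIN" in roles:
                result.append(setting)
        elif scope in roles:
            result.append(setting)
    return result


print(applicable_settings({"MAIN", "TRANSFER"}, is_leader=False))
# ['file_download_parallel_downloads_num', 'file_upload_parallel_uploads_num']
```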
Cluster-only settings:
- `db_connection_uri`: connection string a node uses for opening a database connection.
- `visibility_timeout`: how long a node holds a lease on a queued job before another node can pick it up.
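The lease behind `visibility_timeout` can be modelled as a queue where a claimed job stays invisible to other nodes until its lease expires. This is a simplified sketch with hypothetical `Job` and `JobQueue` types; the article does not say whether ISG Pro renews leases during long jobs, so none of that is modelled. In this model a timeout that is too short lets a second node pick up a job that is still running, and one that is too long delays recovery after a node disappears.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    leased_by: str | None = None
    lease_expires: float = 0.0


@dataclass
class JobQueue:
    visibility_timeout: float  # seconds a claimed job stays invisible to others
    jobs: list[Job] = field(default_factory=list)

    def claim(self, node_id: str, now: float | None = None) -> Job | None:
        """Lease the first job that is unleased or whose lease has expired."""
        now = time.monotonic() if now is None else now
        for job in self.jobs:
            if job.leased_by is None or job.lease_expires <= now:
                job.leased_by = node_id
                job.lease_expires = now + self.visibility_timeout
                return job
        return None

    def complete(self, job: Job) -> None:
        self.jobs.remove(job)


q = JobQueue(visibility_timeout=300, jobs=[Job("upload-1")])
job = q.claim("node-a", now=0)      # node-a leases the job
print(q.claim("node-b", now=100))   # None: node-a's lease is still valid
# node-a disappears mid-job; after the lease expires the job is visible again.
print(q.claim("node-b", now=301).leased_by)  # node-b
```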
Monitoring & telemetry
What admins should watch for in the UI:
- A cluster node with the Primary label is the current Leader.
- Node-related logs are available on the Error log and Log lines tabs.
- Linked storage monitoring remains on the storage's Logs tab.
- Check local logs for more details: https://help.iconik.backlight.co/hc/en-us/articles/25304282187415-FAQ-for-iconik-Storage-Gateway#where-can-i-find-the-logs-from-the-isg.
Upgrades
Recommended order:
1. Disable the cluster (via the web UI).
2. Upgrade the MAIN-role nodes.
3. Upgrade the remaining nodes.
4. Enable the cluster.
ISG Pro Troubleshooting
- No node becomes Primary: check database connectivity from every node, shared storage reachability, and iconik connectivity.
- Jobs stuck "in progress" forever: check that `visibility_timeout` is set to a sane value, check that at least one node holds the required role, and look for nodes that disappeared mid-job (their lease will expire and the job becomes available again).
- Two nodes claim the same `storage_gateway_id`: visible in telemetry as multiple `worker_id`s for one gateway. Stop the duplicate.
- PostgreSQL TLS errors: verify `sslmode`, the certificate paths, and that pgBouncer (if used) is reachable.
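Spotting a duplicated `storage_gateway_id` amounts to grouping telemetry by gateway and flagging any gateway reported by more than one `worker_id`. A minimal sketch, assuming telemetry has already been exported as hypothetical `(storage_gateway_id, worker_id)` pairs; the article does not describe the telemetry format, and the IDs below are made up:

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable

# Hypothetical telemetry rows: (storage_gateway_id, worker_id).
telemetry = [
    ("gw-1", "worker-a"),
    ("gw-1", "worker-a"),
    ("gw-2", "worker-b"),
    ("gw-2", "worker-c"),
]


def duplicate_gateways(rows: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each gateway ID reported by more than one worker_id to those workers."""
    workers: dict[str, set[str]] = defaultdict(set)
    for gateway_id, worker_id in rows:
        workers[gateway_id].add(worker_id)
    return {gw: sorted(ids) for gw, ids in workers.items() if len(ids) > 1}


print(duplicate_gateways(telemetry))  # {'gw-2': ['worker-b', 'worker-c']}
```

Repeated rows from the same worker (as for `gw-1`) are expected and are not flagged; only distinct workers sharing one gateway ID are.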