Security and Architecture
How do we prevent DDoS attacks?
We follow the sentry architecture for our validator node. It splits the network into two layers: a private network over which our sentries communicate with our validator, and a public network over which our sentries communicate with other nodes. This shields our validator node from DoS attempts, because the only nodes that interact with it directly are our own sentries.
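The two-layer idea can be expressed as a simple check on the peer graph. The sketch below is illustrative only: the node names, the number of sentries, and the connection list are hypothetical and are not taken from our actual configuration. It verifies that every peer of the validator is one of our own sentries.

```python
# Hypothetical node names; the real topology is not listed in this document.
VALIDATOR = "validator"
SENTRIES = {"sentry-1", "sentry-2", "sentry-3"}

# Each pair is an undirected peer connection.
connections = {
    ("validator", "sentry-1"),      # private layer
    ("validator", "sentry-2"),      # private layer
    ("validator", "sentry-3"),      # private layer
    ("sentry-1", "public-peer-a"),  # public layer
    ("sentry-2", "public-peer-b"),  # public layer
    ("sentry-3", "public-peer-a"),  # public layer
}


def peers_of(node, edges):
    """Return every node that shares a connection with `node`."""
    peers = set()
    for a, b in edges:
        if a == node:
            peers.add(b)
        elif b == node:
            peers.add(a)
    return peers


def validator_is_shielded(edges):
    """True if the validator has peers and all of them are our sentries."""
    peers = peers_of(VALIDATOR, edges)
    return bool(peers) and peers <= SENTRIES
```

With the connections above the check passes. Adding any direct link between the validator and a public peer makes it fail, which is the property the sentry layout is meant to preserve.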
How do we safely store the validator signing key?
We run Horcrux by Strangelove Ventures, a remote signer, in a setup of 3 sentries and 3 signers. For Horcrux to work, each signer holds its own share of the private validator key, so the validator node does not need to store the private key at all. Because the key is divided and each signer stores only its share, the private key material is more secure than it would be as a single file on one machine.
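To illustrate the general idea of dividing a key into shares, here is a minimal sketch of Shamir secret sharing. It is not Horcrux's code: the 2-of-3 threshold, the field prime, and the function names are assumptions chosen for the example. In a threshold signer the full key would not be reassembled in normal operation; the recovery function is included only to show that a subset of shares carries enough information.

```python
import secrets

# Assumed field modulus for this sketch: a Mersenne prime larger than any
# 256-bit secret. A real signer uses the arithmetic of its signature scheme.
PRIME = 2**521 - 1


def split_secret(secret, shares=3, threshold=2):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, reduced modulo PRIME at every step.
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For example, `split_secret(key)` returns three shares, and `recover_secret` applied to any two of them returns `key`, while a single share on its own does not determine the key.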
Each signer node keeps a record of the last state it signed itself as well as the last state the whole cluster signed. This lets us ensure that the cluster never double-signs.
We prioritize top-tier infrastructure providers for our operations. Our setup follows a hybrid model that combines bare metal and cloud machines: validators run primarily on bare metal servers, while sentries are deployed on cloud platforms. This gives us geographical distribution across the globe, which improves fault tolerance and peer-to-peer connectivity and supports the performance, reliability, and resilience of the network.
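The double-sign protection described above can be sketched as a high-water mark on the signing state. This is a simplified illustration, not Horcrux's implementation: the `SignState` fields (height, round, step) and the `SignerNode` class are assumptions for the example, and a real signer also has to handle cases such as re-signing identical bytes, which are left out here.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class SignState:
    """Position in consensus; instances compare field by field, like a tuple."""
    height: int
    round: int
    step: int


class SignerNode:
    """Tracks what this signer and the whole cluster last signed (hypothetical)."""

    def __init__(self):
        self.own_last = SignState(0, 0, 0)
        self.cluster_last = SignState(0, 0, 0)

    def may_sign(self, requested):
        # Only sign states strictly newer than both high-water marks.
        return requested > self.own_last and requested > self.cluster_last

    def record_own(self, signed):
        # Marks only ever move forward.
        self.own_last = max(self.own_last, signed)

    def record_cluster(self, signed):
        self.cluster_last = max(self.cluster_last, signed)
```

A request at or below either mark is refused, so a replayed or stale request for an already-signed height, round, and step cannot produce a second signature.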
Location and Backup
We operate our infrastructure from Bangalore, Karnataka, India, which adds to the geographic decentralization of the network's nodes, since the majority of node infrastructure on most networks is typically located in Europe.
We have a rigorous policy for key backups: the private validator key is stored with a redundancy of 3. Even if the private validator key in use is lost, we would experience only a short period of downtime, and it would take us no longer than a few hours to recover our validator from the backed-up key.
Furthermore, we encrypt our private keys with AES-256 before storing them in our backup buckets. This adds a layer of protection for the keys while they are in storage.
Monitoring and Alerting
We use Grafana for monitoring and have set up a number of dashboards that display various metrics about validator health and machine resources.
We configure a range of alerts on our nodes, which can automatically place a phone call to us and send an email. The alerts are defined in Grafana Alerting, and PagerDuty delivers the notifications for alerting events and manages on-call duties.
Each alert has a severity level, which helps us reduce TTD (Time to Detect). We also maintain a documented standard operating procedure (SOP) for responding to alerts, which helps us reduce TTR (Time to Recover).
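As a rough illustration of severity-based routing, the sketch below maps a severity level to notification channels. The severity names, the channel names, and the routing table are hypothetical; our actual rules live in Grafana Alerting and PagerDuty and are not reproduced here.

```python
# Hypothetical routing table: severity level -> notification channels.
ROUTES = {
    "critical": ["phone_call", "email"],
    "warning": ["email"],
    "info": [],
}


def channels_for(severity):
    """Return the channels for a severity; unknown levels escalate as critical."""
    return ROUTES.get(severity.lower(), ROUTES["critical"])
```

Treating an unrecognized severity as critical is a deliberate choice in this sketch: a mislabeled alert pages someone instead of being silently dropped.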