Reducing Network Load

Team Spacemesh

Jun 3, 2024

In the past, we haven’t always been transparent about what we’re dealing with behind the scenes or about the solutions we’re considering for the challenges we face. After hearing the concerns of our community, though, we’re making an effort to communicate these things in a more timely fashion.

While reading, please bear in mind that these are emergency measures that we will avoid using if at all possible, and will roll out only if absolutely necessary.

Background

Since the launch of the Spacemesh Mainnet in July 2023, we’ve been seeing an ever-increasing load on smeshers. We’ve taken many steps over this time to reduce this load. We’ve improved our P2P protocol and made it more efficient where that was possible. We’ve applied incremental improvements to the sync protocol over time and eventually we completely rewrote it. We’ve redesigned our database and made it more efficient. We’ve reduced the ATX validation load by performing sampled validation and we’ve introduced a malfeasance proof for bad ATXs so that nodes still detect bad ATXs, even if they haven’t validated those ATXs themselves.

But despite all these measures, the load on smeshers keeps increasing. The factor that causes this increasing load, and that we can’t easily control, is the number of smesher identities on the network. More smesher identities don’t just mean that more ATXs are published each epoch. Beyond a certain number, they also mean that more valid ballots and block proposals appear each layer, because of the way Spacemesh guarantees every miner at least one reward per epoch.

Before launching mainnet we tried to estimate the number of identities we expected to see. The analysis we performed made some assumptions about the distribution of weight between home smeshers, semi-professional private smeshers and large, industrial-scale smeshers. While we underestimated demand for smeshing, we weren’t that far off. The safety margins we took would have been wide enough, if it hadn’t been for one emergent behavior we missed.

When designing the PoST protocol we were careful to make it “fair”. We wanted to ensure that all smeshers, large and small, get the same reward for allocating space and that industrial smeshers don’t get an unfair advantage, something that’s common in other protocols. To this end we introduced k2pow, a mechanism to prevent a specific attack on the network where a sophisticated actor uses an ASIC (specialized hardware that can do one kind of computing job extremely quickly and cheaply) to efficiently use a large number of nonces in parallel. That would allow them to “boost” their storage and make it appear like they have much more than they actually do. This solution, k2pow, requires all smeshers to perform a small amount of work for every nonce they intend to use. The actual amount of work depends on the space allocation, since the financial benefit of the “attack” also grows with the allocated space. This creates an incentive, especially for large industrial smeshers, to optimize their use of nonces.

A number of sophisticated miners and mining pools have done just that. They found a way, within the rules of the protocol, to use fewer nonces than the reference implementation. They achieve this by splitting smeshers’ space allocation into smaller identities, each of which uses fewer nonces than the reference implementation would. The trade-off they accept is that some of these identities fail to produce a valid proof on the first attempt. Since most identities do produce a valid proof on the first attempt, however, they can then use all of their resources to more quickly perform a second attempt for the fraction of identities that failed. This approach is smart and doesn’t break any explicit protocol rules. But it has caused a dramatic increase in the rate of identity growth.
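To illustrate the trade-off, here is a toy model in Python. It is only a sketch: it assumes that each nonce independently yields a valid proof with some fixed probability, and the function names and the numbers used are hypothetical, not actual Spacemesh parameters.

```python
def first_attempt_failure(p_nonce: float, nonces: int) -> float:
    """Probability that none of `nonces` independent nonces yields a valid proof.

    Assumption: every nonce succeeds independently with probability `p_nonce`.
    """
    return (1.0 - p_nonce) ** nonces


def expected_retries(identities: int, p_nonce: float, nonces: int) -> float:
    """Expected number of identities that need a second attempt."""
    return identities * first_attempt_failure(p_nonce, nonces)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 identities, each nonce succeeds half the time.
    for nonces in (1, 3, 10):
        print(nonces, expected_retries(1000, 0.5, nonces))
```

In this model, using fewer nonces per identity raises the share of identities that must retry, but with many small identities that share is predictable, so the retry capacity can be planned in advance.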

As noted above, we assumed that a large portion of the allocated space would belong to industrial smeshers. We assumed that an industrial smesher with several petabytes of storage would publish one or maybe a handful of ATXs each epoch. Instead, such smeshers today end up publishing hundreds, sometimes thousands, of ATXs.

As a result we’re seeing more than 100x the number of ATXs we expected to see for this amount of total allocated storage. And it keeps growing.

A Solution is On the Horizon

The solution needs to have two effects.

It must remove the (implicit) incentive to split identities.
It has to allow smeshers to merge their existing identities, since the number of active smesher identities today is already untenable in the long term.

As a first step, we’re currently working on a protocol upgrade called “ATX Merge”. This will allow smeshers to put multiple PoST proofs from different identities into a single ATX. By doing so, instead of each small identity being eligible to produce 1 ballot + block proposal each epoch (we currently guarantee eligibility for each individual identity), each combined identity would instead get a number of eligibilities depending on its total weight. So large smeshers will produce fewer block proposals, but each proposal will have significantly greater weight and reward (the total will remain exactly the same).
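The effect on proposal counts can be sketched with a small Python example. The proportional-share formula, the function name and all numbers below are assumptions made for illustration; the post does not specify how eligibilities are actually computed.

```python
def eligibilities(weight: int, total_weight: int, slots_per_epoch: int) -> int:
    """Proportional share of the epoch's slots, with a floor of one.

    Assumption: the floor of one models the guaranteed eligibility for each
    identity; the real protocol formula may differ.
    """
    return max(1, weight * slots_per_epoch // total_weight)


if __name__ == "__main__":
    total_weight = 100_000   # hypothetical network weight
    slots = 10_000           # hypothetical eligibility slots per epoch

    # One large smesher split into 1,000 identities of weight 1 each.
    split = sum(eligibilities(1, total_weight, slots) for _ in range(1000))
    # The same smesher after merging into a single identity of weight 1,000.
    merged = eligibilities(1000, total_weight, slots)
    print(split, merged)
```

With these made-up numbers, the split smesher produces 1,000 proposals per epoch because of the one-per-identity floor, while the merged identity produces 100, each carrying ten times the weight.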

This still doesn’t reduce the number of PoST proofs. It also doesn’t fix the incentive to split identities (even if they’re later merged). We need a deeper change. We’re therefore working on a new version of the PoST protocol. It will not require anyone to reinitialize their storage. Based on the same files, smeshers will be able to create a different kind of proof that will become the only valid PoST proof when this change goes into effect. We’ll write more about the new protocol closer to the rollout, but we want to describe it at a very high level here, to give the community a sense of what’s coming.

Today smeshers declare their allocated size and then prove they indeed have that amount of storage. With too few nonces they may fail to produce this proof, but if they succeed their weight will be directly proportional to the declared storage. In other words, to create a PoST proof, a smesher tries to find “good labels” in their PoST data: labels whose hash, when the label is concatenated with a challenge, is below a threshold. The result is binary: they either find enough good labels or they don’t. How good the labels are (how far below the threshold) doesn’t matter, as long as they’re below the threshold.
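The binary check described above can be sketched in Python as follows. This is a simplified illustration, not the actual PoST code: SHA-256 stands in for whatever hash the protocol uses, the label-then-challenge concatenation order is an assumption, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def is_good_label(label: bytes, challenge: bytes, threshold: int) -> bool:
    """A label is "good" if hash(label || challenge) is below the threshold.

    Assumptions: SHA-256 as the hash, big-endian integer comparison.
    """
    digest = hashlib.sha256(label + challenge).digest()
    return int.from_bytes(digest, "big") < threshold


def has_valid_proof(labels: Iterable[bytes], challenge: bytes,
                    threshold: int, required: int) -> bool:
    """Binary outcome: either enough good labels were found or not.

    How far below the threshold each hash falls is deliberately ignored.
    """
    good = sum(1 for label in labels if is_good_label(label, challenge, threshold))
    return good >= required
```

Note that `has_valid_proof` only counts good labels; nothing in the result reflects their quality, which is the property the new PoST is meant to change.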

Under the new PoST, we need to make the weight proportional to the quality of the found labels. This means that if you try fewer nonces, you’re expected to have less weight. You might get lucky sometimes, but over time you’ll end up losing. Trying more nonces might get you a tiny bit of extra weight, but we’ll set the parameters such that the effect is negligible.

In addition, instead of increasing the amount of work required for each nonce linearly with the allocated storage, we’ll keep the amount of work for each nonce fixed (or close to fixed). To compensate, we’ll increase the number of labels included in the proof. The number of labels in the proof has an exponential impact on the number of nonces required to “fake” a larger allocation than the smesher actually has. Since industrial smeshers today create fixed-size identities, the total number of PoST labels across all ATXs grows linearly with the total allocated space. With this change, the number of labels will still grow with storage, but far more slowly (logarithmically).
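The claim about the number of labels can be made tangible with a toy model in Python. It assumes that the number of good labels found with one nonce follows a Poisson distribution, and that a smesher holding the full declared storage finds, on average, exactly the required number of labels per nonce. Both assumptions, and the function names, are illustrative only and are not the protocol’s real parameters.

```python
import math


def poisson_tail(mean: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # add P(X = i)
        term *= mean / (i + 1)  # advance to P(X = i + 1)
    return 1.0 - cdf


def expected_nonces(storage_fraction: float, required_labels: int) -> float:
    """Expected nonces needed by a smesher holding only part of the declared space.

    Assumption: the mean number of good labels per nonce scales with the
    fraction of storage actually held.
    """
    mean_good = storage_fraction * required_labels
    return 1.0 / poisson_tail(mean_good, required_labels)


if __name__ == "__main__":
    # A smesher holding half of what they declare, for growing proof sizes.
    for k in (5, 10, 20, 40):
        print(k, round(expected_nonces(0.5, k), 1))
```

In this model, each doubling of the required number of labels multiplies the nonces a half-storage smesher needs by a growing factor, while a smesher with the full allocation still needs only a few.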

When this change is rolled out, smeshers will still get rewarded in proportion to the amount of storage they allocated, but the incentive to split identities will be gone. There will actually be a small indirect incentive to aggregate identities and merge them. The incentive is not in the reward one can potentially receive, but in the amount of total work required for creating the proofs. The work required can now be consolidated across all of a smesher’s allocated space.

A side effect of this new approach is that a smesher’s weight will no longer depend on what they declare, but on the actual quality of the labels they can find. Since the quality of labels is a statistical property, smeshers’ weight will fluctuate slightly from epoch to epoch. Some epochs you’ll find really good labels and get a little more weight, while in other epochs you’ll get a little less. On average, over multiple epochs, smeshers are still expected to get the same reward as today.

We’re hard at work making these changes a reality. We’ll roll them out when they’re implemented and tested. ATX Merge is already in advanced stages of implementation and our research team is now finalizing the exact design and parameters for the new version of PoST. We’ll do our best to finish designing and implementing the solution, but the work is complex and these things take time. We don’t yet know exactly how long this will take, but we’ll keep the community updated.

What Happens in the Meantime?

The number of ATXs, ballots and block proposals is expected to increase in the meantime. We hope we can roll out ATX Merge while smeshers can still handle the load. The change must also be adopted quickly by large smeshers.

Note: We don’t recommend that smaller smeshers, who only control a handful of identities, perform a merge. There’s technical risk involved (you might accidentally invalidate your identities) and for small smeshers the benefit to both the smesher and the network is small.

While there are technical risks involved and the benefits accrue more to the network than to individual smeshers, we’re still confident that most large smeshers will choose to merge identities for the sake of overall network health, which is in everyone’s interest. This will buy us time to finish the implementation of the new PoST. Since identities will already be merged by then, it will also mean instant adoption.

Things can go wrong, though:

We may face issues and ATX Merge could get delayed.
The rate of ATX growth per epoch might increase.

We might get to a point where we must enact emergency measures to keep load under control and the network alive.

Currently the best emergency measure we have is to temporarily reduce the cadence of eligibility from every epoch to every other epoch. This means smeshers will still publish ATXs every epoch, but each epoch only half the population of smeshers will be eligible. This has no impact on the amount of rewards received over time. Each epoch, half the identities share the reward, so they get twice as much, but then the next epoch they get nothing. This has the effect of instantly cutting the number of ballots and block proposals in half, and it’s quick and easy to implement.
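The arithmetic behind “no impact on rewards over time” can be checked with a short Python sketch. It assumes equally weighted identities, an even number of them, and a split into halves by index parity; how the halves would really be chosen is not stated here, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import List


def total_rewards(identities: int, epoch_reward: int, epochs: int,
                  alternate: bool) -> List[Fraction]:
    """Total reward per identity after `epochs` epochs.

    With `alternate` set, only identities whose index parity matches the
    epoch's parity are eligible (an assumed way of halving the population).
    """
    totals = [Fraction(0)] * identities
    for epoch in range(epochs):
        eligible = [i for i in range(identities)
                    if not alternate or i % 2 == epoch % 2]
        share = Fraction(epoch_reward, len(eligible))
        for i in eligible:
            totals[i] += share
    return totals


if __name__ == "__main__":
    print(total_rewards(4, 100, 2, alternate=False))
    print(total_rewards(4, 100, 2, alternate=True))
```

Both calls give every identity a total of 50 over two epochs: in the alternating case each identity receives 50 in one epoch and nothing in the other, while only half as many identities produce ballots and proposals in any given epoch.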

But we might also be forced to resort to more extreme, dramatic measures, like significantly increasing the minimum space allocation or removing the guaranteed eligibility entirely.

We hope that none of these emergency measures need to be applied. But it’s important to us to keep the community in the loop and to make sure people are aware that these options are on the table.

Thank you for taking the time to read this. If you have any comments or questions, please share them on our Discord server. We also plan to hold a town hall meeting where the community can engage with us on this and where we can answer any questions. Details will be published shortly.
