RAM Rekt! EOS Storage Pitfalls

Keith Mukai, May 23

A case study on disastrous smart contract data design… and how to fix it.

This discussion will be technical, but it’s written so that anyone can mostly follow along regardless of whether or not you’re a coder.

I’m currently building Achieveos — an open platform for an Achievement, Trophy, or Badge system that writes its data to the EOS blockchain.

The idea is that your achievements should live on forever, even if the organization that awarded them has long since disappeared.

And since data on the blockchain is forever, well, there you go.

Achieve(ments) + EOS = Achieveos

So let’s figure out how we’re going to store these eternal achievements.

EOS storage basics

At first glance the EOS smart contract storage system looks just like a traditional database (just ignore the C++ syntax clutter):

    struct [[eosio::table]] Organization {
        uint64_t key;
        string organization_name;

        uint64_t primary_key() const { return key; }
    };

My Organization table has a key field which is a unique integer identifier for each row of data we insert into Organization.

It also has an organization_name for display purposes.

So the Organization with key == 0 might be “Keith’s Gymnastics Team” while key == 1 could be “Barb’s Book Club”.

I’d like to be able to set up my Organization's various possible achievements within customizable categories.

For my gymnasts I want to have “Strength Goals” as well as specific accomplishments on each apparatus (e.g. “Floor Exercise Achievements”).

So that’s simple:

    struct [[eosio::table]] Category {
        uint64_t key;
        uint64_t organization_id;
        string category_name;

        uint64_t primary_key() const { return key; }
    };

Looks just like the Organization table, but this one has an organization_id which is a reference to the Organization this Category belongs to.

So I can create a Category that belongs to organization_id = 0 (“Keith’s Gymnastics Team”) and give it a category_name of “Strength Goals”.

Great, so far this is exactly like a traditional database that most programmers are already comfortable with.

But there’s a “but”.

Gotta pay the piper (…more than you think)

Data storage on any blockchain is always going to be a little expensive; you’re asking the entire network to store your data.

Forever.

They ain’t gonna do that for free.

Luckily our data structures so far are pretty small.

The uint64_t fields are 64-bit integers (there are 8 bits in 1 byte so each uint64_t takes up 8 bytes).

And each character in a string is a single byte.

So for “Keith’s Gymnastics Team” (23 characters) the total memory footprint is:

8 + 23 = 31 bytes

EOS stores your data in RAM (using hard drives would be too slow).

The cost of EOS RAM storage is currently about $0.293/kB.

There are 1,024 bytes in 1 kB.

So the cost of storing our first Organization is:

31 bytes / 1024 bytes/kB * $0.293/kB = $0.00887

Less than 1 cent.

Not bad for eternity.
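Here is a minimal sketch of that back-of-the-envelope estimate in plain C++. The function names and the constant are made up for illustration, the price is simply the figure quoted above, and the size model is the same simplification used in the text (8 bytes for the key plus one byte per character).

```cpp
#include <cstddef>

// Naive estimate from the text: one 8-byte uint64_t key plus one byte per
// character of the name. Any serialization details are deliberately ignored.
constexpr std::size_t naive_row_bytes(std::size_t name_length) {
    return 8 + name_length;
}

// RAM price quoted in the article (USD per kB); it changes with the market.
constexpr double kUsdPerKb = 0.293;

// Converts a byte count into dollars at the quoted price.
constexpr double ram_cost_usd(std::size_t bytes) {
    return static_cast<double>(bytes) / 1024.0 * kUsdPerKb;
}

// "Keith's Gymnastics Team" is 23 characters long.
static_assert(naive_row_bytes(23) == 31, "8-byte key + 23-character name");
```

Calling ram_cost_usd(naive_row_bytes(23)) gives a value just under nine tenths of a cent, in line with the figure above.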

But when I actually created that Organization, my user’s RAM hit was 255 bytes.

Wait, that’s 8x more than I was expecting!!

The real storage costs

After a ton of testing and lots of googling I finally found an explanation: Issue #4532 on the EOSIO/eos GitHub repo.

From what I can tell the costs for the first Organization entry are:

Creating the table for a new user: 112 bytes.

Overhead for every new row: 112 bytes.

The actual data for that row: 31 bytes.

The table creation fee is a one-time only expense for each user.

But 112 bytes of overhead for each new row is significant.
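To see how those three numbers add up to the 255 bytes I was charged, here is a small sketch. The constant and function names are hypothetical, and the 112-byte figures are my reading of the GitHub issue rather than documented constants.

```cpp
#include <cstddef>

// One-time charge when a user's table is first created (assumed, per the issue).
constexpr std::size_t kTableCreationBytes = 112;
// Charge applied to every row, regardless of its contents (assumed, per the issue).
constexpr std::size_t kRowOverheadBytes = 112;

// RAM billed for the very first row a user stores in a table.
constexpr std::size_t first_row_bytes(std::size_t data_bytes) {
    return kTableCreationBytes + kRowOverheadBytes + data_bytes;
}

// RAM billed for each later row in the same table.
constexpr std::size_t additional_row_bytes(std::size_t data_bytes) {
    return kRowOverheadBytes + data_bytes;
}

static_assert(first_row_bytes(31) == 255, "matches the observed RAM hit");
```

Under the same assumptions, a second Organization with the same 31 bytes of data would be billed 143 bytes.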

Whelp, this ain’t gonna work

Let’s do some quick math to see how the rest of our data storage needs pan out.

We’ll want an Achievements table to store the different possible goals, trophies, etc.

Each Achievement would reference its parent Organization and which Category it belongs to.

We’ll need a Users table for all the people who will be pursuing these achievements.

And finally a UserAchievements table to indicate which Users were granted which Achievements.

Now let’s say we offer 100 possible Achievements to our 500 Users and we expect the average user to be granted, say, 30 UserAchievements each.

For simplicity’s sake, let’s only focus on the overhead costs we saw above: 112 bytes per row.

100 + 500 + 30*500 = 15,600 rows

15,600 rows * 112 bytes/row / 1024 bytes/kB = 1,706 kB

1,706 kB * $0.293/kB = $499.93

And remember, this was just the overhead.

We haven’t even counted the cost of storing our actual data!

Holy hell! Forget it!

Vectors<are here to save the day>!

Notice that it’s all the rows that are expensive.

The actual data — a 23-byte string here, a couple 8-byte integers there — are tiny and inexpensive.
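The row-overhead arithmetic for the 100 Achievements, 500 Users and 30 grants per user can be sketched like this. All names are hypothetical, and the 112 bytes per row and $0.293/kB are just the figures used earlier in this article.

```cpp
#include <cstddef>

constexpr std::size_t kRowOverheadBytes = 112;  // per-row overhead assumed in the text
constexpr double kUsdPerKb = 0.293;             // RAM price quoted in the text

// One row per Achievement, one per User, one per granted UserAchievement.
constexpr std::size_t total_rows(std::size_t achievements, std::size_t users,
                                 std::size_t grants_per_user) {
    return achievements + users + users * grants_per_user;
}

// Overhead only, in kB; the actual row data is not counted.
constexpr double overhead_kb(std::size_t rows) {
    return static_cast<double>(rows * kRowOverheadBytes) / 1024.0;
}

// Dollar cost of that overhead at the quoted price.
constexpr double overhead_cost_usd(std::size_t rows) {
    return overhead_kb(rows) * kUsdPerKb;
}

static_assert(total_rows(100, 500, 30) == 15600, "row count from the text");
```

Changing any of the three inputs shows how quickly the overhead scales with row count rather than with the size of the data itself.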

What if we rearchitected our data structures to minimize the number of rows required?

Let’s revisit our first two tables: Organization and Category.

The only reason Category is in its own table is to have a unique key to reference:

    Achievement.category_id = 5  // File this under Category #5

It turns out that we can accomplish the same thing using a C++ vector inside the Organization table:

    struct [[eosio::table]] Organization {
        uint64_t key;
        string organization_name;
        vector<string> categories;

        uint64_t primary_key() const { return key; }
    };

A vector is just a list.

So the new categories vector for my “Keith’s Gymnastics Team” Organization could look like:

    categories = [
        "Strength Goals",
        "Floor Exercise Achievements",
        "Pommel Horse Achievements",
        ...
    ]

Each of those entries has an integer index based on its position:

    categories[0] == "Strength Goals"
    categories[1] == "Floor Exercise Achievements"
    categories[2] == "Pommel Horse Achievements"

Those indices (0, 1, 2) are unique, consistent ways to refer to each category within that Organization.

In effect they serve the same purpose as the prior Category.key.

But the crucial difference is that all those categories are now contained within a single row — within the Organization that they belong to.

The previous Category table and all of its expensive per-row overhead has been completely eliminated.

More concretely

In the original table structure a new Category row would have cost us 112 bytes in overhead, plus the size of the data itself.

So adding “Post-Season Training Goals” (26 characters) plus its associated 8-byte key plus its 8-byte reference to its parent organization_id would cost:

112 + 26 + 8 + 8 = 154 bytes

But in our new approach we would just modify() my existing Organization row and add a new element to the end of the categories vector.

The total net change of storage would simply be:

(existing row size) + 26 bytes = 26 byte increase

That’s a whopping 83% savings vs the previous approach!

Lesson Learned: Vectors, vectors, everywhere!

So forget everything you know about structuring data for a database.
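The comparison between a dedicated Category row and an extra element in the categories vector can be written down as a quick sketch. The function names are invented for this example, and the model is the same simplified one used in the text: one byte per character, with no allowance for any extra bytes the serializer may add per string.

```cpp
#include <cstddef>

constexpr std::size_t kRowOverheadBytes = 112;  // per-row overhead assumed in the text

// Old design: a new Category row with its own key and organization_id.
constexpr std::size_t category_row_bytes(std::size_t name_length) {
    return kRowOverheadBytes + name_length + 8 + 8;
}

// New design: the name is appended to a vector inside an existing row.
constexpr std::size_t vector_append_bytes(std::size_t name_length) {
    return name_length;
}

// Fraction of RAM saved by appending instead of adding a row.
constexpr double savings_fraction(std::size_t name_length) {
    return 1.0 - static_cast<double>(vector_append_bytes(name_length)) /
                     static_cast<double>(category_row_bytes(name_length));
}

// "Post-Season Training Goals" is 26 characters: 154 bytes versus 26 bytes.
static_assert(category_row_bytes(26) == 154, "row-per-category cost");
```

Note that the shorter the name, the larger the share of the old cost that was pure overhead.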

We are in a brave new world, folks!

The final Organization table can hold EVERYTHING related to that organization, in a single row.

It’s a bit beyond the scope of this article to go into the full details here, but the final version uses a few utility structs within a single EOS storage table to rule them all:

    struct Category {
        string name;
        vector<string> achievements;
    };

    struct UserAchievementsList {
        vector<uint64_t> userachievements;
    };

    struct User {
        string name;
        map<uint64_t, UserAchievementsList> bycategory;
    };

    // The ONE and ONLY storage table!
    struct [[eosio::table]] Organization {
        uint64_t key;
        string organization_name;
        vector<Category> categories;
        vector<User> users;

        uint64_t primary_key() const { return key; }
    };

Cost improvements

Let’s add a new Achievement to my first Category, “Strength Goals”:

    org.categories[0].achievements.push_back("5-Minute Plank");

Total net RAM expenditure: 14 bytes.

Let’s say that new Achievement ended up as index ID 4 within its Category.

Now we’ll grant the new Achievement to user_id 53:

    org.users[53].bycategory[0].userachievements.push_back(4);

If this was the first Achievement that user_id 53 was granted from Category 0, then we’ll have to write the 0 and the 4 to the blockchain.

Total net RAM expenditure: 16 bytes.
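For anyone who wants to play with the index-as-ID idea outside of a contract, here is a plain C++ stand-in for the structs above. It is only a sketch: it uses no EOSIO headers, the helper functions add_achievement and grant_achievement are hypothetical, and in a real contract these changes would happen while modifying the stored Organization row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Category {
    std::string name;
    std::vector<std::string> achievements;
};

struct UserAchievementsList {
    std::vector<std::uint64_t> userachievements;
};

struct User {
    std::string name;
    std::map<std::uint64_t, UserAchievementsList> bycategory;
};

// Plain struct standing in for the single storage row.
struct Organization {
    std::uint64_t key;
    std::string organization_name;
    std::vector<Category> categories;
    std::vector<User> users;
};

// Appends an achievement to a category and returns its index,
// which doubles as the achievement's ID within that category.
std::uint64_t add_achievement(Organization& org, std::size_t category_id,
                              const std::string& name) {
    assert(category_id < org.categories.size());
    auto& list = org.categories[category_id].achievements;
    list.push_back(name);
    return list.size() - 1;
}

// Records that a user earned an achievement; the map entry for the
// category is created on first use.
void grant_achievement(Organization& org, std::size_t user_id,
                       std::uint64_t category_id, std::uint64_t achievement_id) {
    assert(user_id < org.users.size());
    assert(category_id < org.categories.size());
    assert(achievement_id < org.categories[category_id].achievements.size());
    org.users[user_id].bycategory[category_id].userachievements.push_back(achievement_id);
}
```

One design consequence worth keeping in mind: because the IDs are vector positions, removing or reordering entries would silently change what every stored ID refers to, so entries should only ever be appended.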

We have completely eliminated the killer overhead costs.

Boom! Tell your RAM quota, “You’re welcome.”

Caveat: it does remain to be seen if we end up hitting other limits by centering everything around an increasingly bloated single data object.

But I’m still claiming victory until proven otherwise.

Bonus: Secondary Indexes are a double whammy

This is even more geeky, but each table in EOS is indexed on its primary key.

Indexing just means that it’s optimized to locate the row quickly based on that key.

As with regular databases, it’s often advantageous to index on other fields to facilitate fast lookups on those fields as well.

Imagine a version of a UserAchievements table that has a reference to a user_id, an achievement_id, and its own primary key field, key.

You could easily imagine wanting to quickly retrieve all the UserAchievements for a specific user (index on user_id) or the list of everyone who was granted a particular Achievement (index on achievement_id).

Great, EOS can do that.

But you pay additional overhead for each secondary index.

That 112 bytes per row overhead is basically the cost of indexing the primary key.

Each additional index you add incurs another 112 bytes per row!

So for our UserAchievements table we’d be paying:

112 byte primary key index per row

112 byte secondary index on user_id per row

112 byte secondary index on achievement_id per row

8 + 8 + 8 = 24 bytes for the actual data (three 8-byte integers)

That’s 360 bytes to record just 24 bytes of actual data!!

Use secondary indexes sparingly, if at all!
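The per-index arithmetic generalizes to any number of secondary indexes. Here is a minimal sketch, with a made-up function name and the same assumed 112 bytes per index that I observed above.

```cpp
#include <cstddef>

constexpr std::size_t kIndexOverheadBytes = 112;  // per index, per row (assumed)

// RAM billed per row: one primary index, plus each secondary index,
// plus the row's actual data.
constexpr std::size_t indexed_row_bytes(std::size_t data_bytes,
                                        std::size_t secondary_indexes) {
    return kIndexOverheadBytes * (1 + secondary_indexes) + data_bytes;
}

// Three 8-byte integers with secondary indexes on user_id and achievement_id.
static_assert(indexed_row_bytes(24, 2) == 360, "matches the total in the text");
```

Under these assumptions, dropping both secondary indexes brings the same row down to 136 bytes.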
