Data Modeling Best Practices

These additional topics provide a broader perspective on data modeling, query design, schema design, and best practices when working with ScyllaDB or similar distributed NoSQL databases.

Partition Key Selection

Choose partition keys that spread data evenly across the cluster. Imbalanced partitions concentrate storage and traffic on a few nodes, and those nodes become bottlenecks that limit the performance of the whole cluster. Balancing the distribution of data across partitions is crucial to ensure that all nodes are effectively utilized.

Let’s consider a scenario with poor partition key selection:

CREATE TABLE my_keyspace.messages_bad (
  user_id uuid,
  message_id uuid,
  message_text text,
  created_at timestamp,
  PRIMARY KEY (user_id, message_id)
);

In this model, the partition key is user_id and message_id is the clustering key, so every message from a given user is stored in that user's single partition. Partition sizes therefore follow user activity instead of being even: a popular user with many messages produces one large, hot partition, while the partitions of less active users stay small.
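The skew can be illustrated with a minimal sketch that counts rows per partition when rows are grouped by user_id. The workload (one heavy user with 5,000 messages and 99 users with 10 messages each) is an invented assumption for illustration, not a measurement, and the sketch does not talk to a real cluster.

```python
from collections import Counter
import uuid

# Hypothetical workload: 1 very active user and 99 light users.
# These numbers are illustrative assumptions, not measurements.
heavy_user = uuid.UUID(int=0)
light_users = [uuid.UUID(int=i) for i in range(1, 100)]

# One (user_id, message_id) pair per stored message.
messages = [(heavy_user, uuid.UUID(int=10_000 + n)) for n in range(5_000)]
for u_idx, user in enumerate(light_users):
    for n in range(10):
        messages.append((user, uuid.UUID(int=100_000 + u_idx * 10 + n)))

# With PRIMARY KEY (user_id, message_id), all rows sharing a user_id
# belong to the same partition, so partition size == messages per user.
rows_per_partition = Counter(user_id for user_id, _ in messages)

largest = max(rows_per_partition.values())
total = len(messages)
print(f"partitions: {len(rows_per_partition)}")
print(f"largest partition: {largest} of {total} rows ({largest / total:.0%})")
```

Under these assumptions a single partition holds most of the table, regardless of how many nodes the cluster has.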

A better choice of partition key looks like this:

CREATE TABLE my_keyspace.messages_good (
  message_id uuid PRIMARY KEY,
  user_id uuid,
  message_text text,
  created_at timestamp
);

In this improved model, the partition key is message_id, the unique identifier of each message. Every message is stored in its own partition, so a user's messages are spread across many partitions and, through them, across the nodes of the cluster. Popular users with many messages no longer create hot partitions, and all nodes are effectively utilized, which prevents performance bottlenecks.
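The difference between the two keys can be sketched by placing rows on nodes with a toy hash function. Everything here is an assumption for illustration: the three-node cluster, the workload, and the `node_for` helper, which uses MD5 modulo the node count as a stand-in for the real partitioner and token ring. It only counts stored rows per node and says nothing about request rates.

```python
from collections import Counter
import hashlib
import uuid

NODES = 3  # hypothetical cluster size


def node_for(partition_key: uuid.UUID) -> int:
    """Toy placement: hash the partition key, take it modulo NODES.

    This is a simplified stand-in, not the database's actual partitioner.
    """
    digest = hashlib.md5(partition_key.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NODES


# Hypothetical workload: one user with 5,000 messages, 99 users with 10 each.
heavy_user = uuid.UUID(int=0)
messages = [(heavy_user, uuid.UUID(int=10_000 + n)) for n in range(5_000)]
for u in range(1, 100):
    for n in range(10):
        messages.append((uuid.UUID(int=u), uuid.UUID(int=100_000 + u * 10 + n)))

# Rows per node when the partition key is user_id vs. message_id.
load_by_user = Counter(node_for(user_id) for user_id, _ in messages)
load_by_message = Counter(node_for(message_id) for _, message_id in messages)


def busiest_share(load: Counter) -> float:
    """Fraction of all rows held by the most loaded node."""
    return max(load.values()) / sum(load.values())


print(f"user_id key:    busiest node holds {busiest_share(load_by_user):.0%}")
print(f"message_id key: busiest node holds {busiest_share(load_by_message):.0%}")
```

With user_id as the key, whichever node owns the heavy user's partition holds most of the rows. With message_id as the key, each node ends up with roughly a third of them.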

Last updated on 07 May 2025.