Skip to content

Commit 4fc1ea7

Browse files
aliuyIEvangelistrobvet
authored
Update relational-vs-nosql-data.md (dotnet#23846)
* Update relational-vs-nosql-data.md Improve accuracy of comparison * Apply suggestions from code review Co-authored-by: Rob Vettor <[email protected]> Co-authored-by: David Pine <[email protected]> Co-authored-by: Rob Vettor <[email protected]>
1 parent 4b700cc commit 4fc1ea7

File tree

1 file changed

+17
-17
lines changed

1 file changed

+17
-17
lines changed

docs/architecture/cloud-native/relational-vs-nosql-data.md

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -44,21 +44,23 @@ The theorem states that distributed data systems will offer a trade-off between
4444

4545
- *Partition Tolerance.* Guarantees the system continues to operate even if a replicated data node fails or loses connectivity with other replicated data nodes.
4646

47+
CAP theorem explains the tradeoffs associated with managing consistency and availability during a network partition; however tradeoffs with respect to consistency and performance also exist with the absence of a network partition. CAP theorem is often further extended to [PACELC](http://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf) to explain the tradeoffs more comprehensively.
48+
4749
Relational databases typically provide consistency and availability, but not partition tolerance. They're typically provisioned to a single server and scale vertically by adding more resources to the machine.
4850

4951
Many relational database systems support built-in replication features where copies of the primary database can be made to other secondary server instances. Write operations are made to the primary instance and replicated to each of the secondaries. Upon a failure, the primary instance can fail over to a secondary to provide high availability. Secondaries can also be used to distribute read operations. While writes operations always go against the primary replica, read operations can be routed to any of the secondaries to reduce system load.
5052

51-
Data can also be horizontally partitioned across multiple nodes, such as with [sharding](/azure/sql-database/sql-database-elastic-scale-introduction). But, sharding dramatically increases operational overhead by spitting data across many pieces that cannot easily communicate. It can be costly and time consuming to manage. It can end up impacting performance, table joins, and referential integrity.
53+
Data can also be horizontally partitioned across multiple nodes, such as with [sharding](/azure/sql-database/sql-database-elastic-scale-introduction). But, sharding dramatically increases operational overhead by spitting data across many pieces that cannot easily communicate. It can be costly and time consuming to manage. Relational features that include table joins, transactions, and referential integrity require steep performance penalties in sharded deployments.
5254

53-
If data replicas were to lose network connectivity in a "highly consistent" relational database cluster, you wouldn't be able to write to the database. The system would reject the write operation as it can't replicate that change to the other data replica. Every data replica has to update before the transaction can complete.
55+
Replication consistency and recovery point objectives can be tuned by configuring whether replication occurs synchronously or asynchronously. If data replicas were to lose network connectivity in a "highly consistent" or synchronous relational database cluster, you wouldn't be able to write to the database. The system would reject the write operation as it can't replicate that change to the other data replica. Every data replica has to update before the transaction can complete.
5456

55-
NoSQL databases typically support high availability and partition tolerance. They scale out horizontally, often across commodity servers. This approach provides tremendous availability, both within and across geographical regions at a reduced cost. You partition and replicate data across these machines, or nodes, providing redundancy and fault tolerance. The downside is consistency. A change to data on one NoSQL node can take some time to propagate to other nodes. Typically, a NoSQL database node will provide an immediate response to a query - even if the data that is presented is stale and hasn't updated yet.
57+
NoSQL databases typically support high availability and partition tolerance. They scale out horizontally, often across commodity servers. This approach provides tremendous availability, both within and across geographical regions at a reduced cost. You partition and replicate data across these machines, or nodes, providing redundancy and fault tolerance. Consistency is typically tuned through consensus protocols or quorum mechanisms. They provide more control when navigating tradeoffs between tuning synchronous versus asynchronous replication in relational systems.
5658

57-
If data replicas were to lose connectivity in a "highly available" NoSQL database cluster, you could still complete a write operation to the database. The database cluster would allow the write operation and update each data replica as it becomes available.
59+
If data replicas were to lose connectivity in a "highly available" NoSQL database cluster, you could still complete a write operation to the database. The database cluster would allow the write operation and update each data replica as it becomes available. NoSQL databases that support multiple writable replicas can further strengthen high availability by avoiding the need for failover when optimizing recovery time objective.
5860

59-
This kind of result is known as eventual consistency, a characteristic of distributed data systems where ACID transactions aren't supported. It's a brief delay between the update of a data item and time that it takes to propagate that update to each of the replica nodes. Under normal conditions, the lag is typically short, but can increase when problems arise. For example, what would happen if you were to update a product item in a NoSQL database in the United States and query that same data item from a replica node in Europe? You would receive the earlier product information, until the cluster updates the European node with the product change. By immediately returning a query result and not waiting for all replica nodes to update, you gain enormous scale and volume, but with the possibility of presenting older data.
61+
Moden NoSQL databases typically implement partitioning capabilities as a feature of their system design. Partition management is often built-in to the database, and routing is achieved through placement hints - often called partition keys. A flexible data models enables the NoSQL databases to lower the burden of schema management and improve availablity when deploying application updates that require data model changes.
6062

61-
High availability and massive scalability are often more critical to the business than strong consistency. Developers can implement techniques and patterns such as Sagas, CQRS, and asynchronous messaging to embrace eventual consistency.
63+
High availability and massive scalability are often more critical to the business than relational table joins and referential integrity. Developers can implement techniques and patterns such as Sagas, CQRS, and asynchronous messaging to embrace eventual consistency.
6264

6365
> Nowadays, care must be taken when considering the CAP theorem constraints. A new type of database, called NewSQL, has emerged which extends the relational database engine to support both horizontal scalability and the scalable performance of NoSQL systems.
6466
@@ -68,13 +70,11 @@ Based upon specific data requirements, a cloud-native-based microservice can imp
6870

6971
| Consider a NoSQL datastore when: | Consider a relational database when: |
7072
| :-------- | :-------- |
71-
| You have high volume workloads that require large scale | Your workload volume is consistent and requires medium to large scale |
72-
| Your workloads don't require ACID guarantees | ACID guarantees are required |
73-
| Your data is dynamic and frequently changes | Your data is predictable and highly structured |
74-
| Data can be expressed without relationships | Data is best expressed relationally |
75-
| You need fast writes and write safety isn't critical | Write safety is a requirement |
76-
| Data retrieval is simple and tends to be flat | You work with complex queries and reports|
77-
| Your data requires a wide geographic distribution | Your users are more centralized |
73+
| You have high volume workloads that require predictable latency at large scale (e.g. latency measured in milliseconds while performing millions of transactions per second) | Your workload volume generally fits within thousands of transactions per second |
74+
| Your data is dynamic and frequently changes | Your data is highly structured and requires referential integrity |
75+
| Relationships can be de-normalized data models | Relationships are expressed through table joins on normalized data models |
76+
| Data retrieval is simple and expressed without table joins | You work with complex queries and reports|
77+
| Data is typically replicated across geographies and requires finer control over consistency, availablity, and performance | Data is typically centralized, or can be replicated regions asynchronously |
7878
| Your application will be deployed to commodity hardware, such as with public clouds | Your application will be deployed to large, high-end hardware |
7979

8080
In the next sections, we'll explore the options available in the Azure cloud for storing and managing your cloud-native data.
@@ -159,7 +159,7 @@ If your services require fast response from anywhere in the world, high availabi
159159

160160
![Overview of Cosmos DB](./media/cosmos-db-overview.png)
161161

162-
**Figure 5-12**: Overview of Cosmos DB
162+
**Figure 5-12**: Overview of Azure Cosmos DB
163163

164164
The previous figure presents many of the built-in cloud-native capabilities available in Cosmos DB. In this section, we’ll take a closer look at them.
165165

@@ -203,10 +203,10 @@ In the previous table, note the [Table API](/azure/cosmos-db/table-introduction)
203203
| | Azure Table Storage | Azure Cosmos DB |
204204
| :-------- | :-------- |:-------- |
205205
| Latency | Fast | Single-digit millisecond latency for reads and writes anywhere in the world |
206-
| Throughput | Limit of 20,000 operations per table | 10 Million operations per table |
206+
| Throughput | Limit of 20,000 operations per table | Unlimited operations per table |
207207
| Global Distribution | Single region with optional single secondary read region | Turnkey distributions to all regions with automatic failover |
208208
| Indexing | Available for partition and row key properties only | Automatic indexing of all properties |
209-
| Pricing | Based on storage | Based on throughput |
209+
| Pricing | Optimized for cold workloads (low throughput : storage ratio) | Optimizied for hot workloads (high throughput : storage ratio) |
210210

211211
Microservices that consume Azure Table storage can easily migrate to the Cosmos DB Table API. No code changes are required.
212212

@@ -285,7 +285,7 @@ One of the more time-consuming tasks is migrating data from one data platform to
285285
- Azure Database for MySQL
286286
- Azure Database for MariaDB
287287
- Azure Database for PostgreSQL
288-
- CosmosDB
288+
- Azure Cosmos DB
289289

290290
The service provides recommendations to guide you through the changes required to execute a migration, both small or large.
291291

0 commit comments

Comments
 (0)