Databases
Database types
-
SQL (Structured Query Language): technically differen from RDBMS (Relational Database Management System) but used interchangeably: Rigid schemas AND rigid relations between tables.
-
NoSQL: weak or no schemas. Relationships are handled differently.
Key-Value Store
Lists of key-value pairs. Keys must be unique. No schema or structures.
Used for Unstructured Data.
Strengths:
-
Very scalable because pairs can be distributed
-
Very fast
-
They handle multiple data types
-
Keys are linked directly to their values, no need for indexing or complex join operations
-
The content of a key can easily be copied to other systems without reprogramming the data, because they support many data types
Weaknesses:
-
Analytical queries are difficult since there are no Joins
-
Access pattern must be known in advance for optimum performance
Use cases:
-
Simple requirements
-
Data is made up of just names/keys and values, no or little structure
AWS services:
-
DynamoDB
In-memory Databases
For workloads that need ultrafast access to data that’s frequently accessed but rarely updated.
Data is stored as Key-Value pairs.
Strengths:
-
sub-milliseconds response times
-
Extremely scalable with no downtime
Weaknesses:
-
Not suitable for seldom changing data
-
Applications must tolerate stale data
Use cases:
-
Real-time data access
-
In-memory caching
-
Games
-
Session store
AWS services:
-
ElastiCache
-
MemoryDB for Redis
Wide Column Store
They contain a Partition key and optionally items can have additional keys.
Used for Structured Data.
Every item in a table has to have the same layout: the same number of keys. That keys combination is unique in that table.
Those are not the same tables as SQL databases. Each item doesn’t have to have the same attributes as the others.
Document
Entities are stored in collections and cannot be accessed using a key. Each document has a unique ID and is very flexible when it comes to fields (as opposed to SQL).
Used for Semi-structured and Unstructured Data.
Strengths:
-
Flexible
-
Don’t require planning data structures too much
-
Scalable
Weaknesses:
-
Flexibility comes with lack of ACID compliance
-
Cannot perform cross-file queries natively
Use cases:
-
Purchase orders
-
Contacts
-
Structures with deeply nested attributes
-
Structures that need to be linked to each other
AWS Services:
-
Amazon DocumentDB (with MongoDB compatibility)
Row Store
Like MySQL.
Used for Structured Data.
Also called OLTP (Online Transaction Processing)
Use cases:
-
Adding/updating/deleting things that are managed as items
Column Store
They’re not designed to describe an item, because they’re optimized to perform queries in a columnar way.
Used for Structured Data.
In row stores you may have | id | name | size | and add items in the form "{ 001, "this", 45 }", you then query for one or more fields (columns);
In column stores you add data and are not interested in the item as a whole, rather you want to perform queries on attributes. They’re very good for reporting. You may be interested in making a chart of items for a particular size you sold, so you query for all the sizes.
They’re very inefficient for transactions.
Strengths:
-
They support a large volume of data
-
Scalable
-
Fast write speeds
AWS Services:
-
Amazon Keyspaces (for Apache Cassandra)
Graph
They focus on relations.
Used for Structured, Semi-structured and Unstructured Data.
Relations, in a graph database, are data themselves, they don’t need to be computed. Relations can have attributes that define them.
Strengths:
-
Straightforward, fast retrieval of complex hierarchical structures
Weaknesses:
-
Not ideal for transactional data
-
Not efficient for analytics
Use cases:
-
DB for social media
-
Data with complex relationships
-
real-time big data mining
-
Recommendations
AWS Services:
-
Neptune
OLTP and OLAP (for Relational Databases)
Online transaction processing (OLTP) databases focus on recording Update, Insertion, and Deletion data transactions. OLTP queries are simple and short, which require less time and space to process.
Online analytical processing (OLAP) databases store historical data that has been input by OLTP. With OLAP databases, users can view different summaries of multidimensional data. Using OLAP, users can extract information from a large database and analyze it for decision making. A good example of an OLAP system is a business intelligence tool.
Transaction models
ACID and BASE a transaction models.
CAP Theorem
CAP stands for Consistency, Availability, Partition tolerant. Where:
-
Consistency: Every read either receives the most recent data written or an error. This is this definition is different from the that in ACID design (see below).
-
Availability: Every read from a non-failing has a response (not an error).
-
Partition tolerance: The system keeps working despite an arbitrary number of messages being dropped or delayed by the network between nodes.
"Any distributed data store can provide only two of the above".
In case of a network partition fail you need to decide between:
-
Consistency: cancel the transaction and throw an error or cause a timeout, because otherwise the next read is not guaranteed to read the most recent value. Availability drops.
-
Availability: proceed with the transaction and return the value, though it’s not guaranteed to be the most recently written.
In the absence of partitioning both Consistency and Availability can be guaranteed. E.g.: a single node database, where you either reach the node and DO get the most up-to-date value, or you can’t reach the node.
ACIDs are designed to favor consistency. BASEs favor availability.
ACID
In the ACID model transactions are:
-
Atomic: All the parts of the transaction are successful or the whole transaction is discarded and the state is reverted/unchanged. E.g.: a bank wants all transfers (reduce the amount on A, increase the amount on B) to be atomic.
-
Consistent: Given 2 valid states, transactions move the system from one to the other but at no time an in-between state exists.
-
Isolated: Multiple transactions happening at the same time don’t interfere with each other. They execute as if each were the only transaction running on the database.
-
Durable: Once a transaction has been committed, it remains committed even in the case of a system failure.
ACID usually refers to RDS/Aurora database and comes with a limited scalability.
BASE
BASE stands for Basically Available, Soft state, Eventually consistent.
-
Basically Available: read and write are available as much as possible, with no consistency guarantees. Those databases aim to spread data over the nodes.
-
Soft state: It delegates consistency to the developer. If the database optionally allows it, the developer must ask it to provide consistent data (I.e.: most up-to-date) or the application has to tolerate potential inconsistencies.
-
Eventually consistent (Optimistic replication): The system cannot guarantee that, immediately after a write operation, consistency is achieved.
BASE database are highly scalable and provide high performance.
Some BASE databases offer ACID functionality. E.g.: DynamoDB transactions.
Databases on EC2
They’re generally a very bad practice. It should be avoided and only allowed under very few circumstances.
Even running the application on one instance and the database on a dedicated instance is problematic, because it introduces a network dependency. Also, if the two instances are running in two different AZs this introduces cross-AZs networking costs.
Downsides
-
Admin overhead
-
You need to take care about backups and DR. ⇒ Admin overhead
-
EC2 is AZ resilient.
-
You can’t benefit from AWS solutions features
-
You have no serverless or scaling
-
Replication is on you ⇒ Admin overhead
-
You can’t have AWS solutions performance
When you’re forced to do so
-
You want to access the instance OS, maybe with root account permissions. Sometimes clients request this, and you need to understand if the need it, want it, or think they want it.
-
You need DB root account permissions (are you sure?)
-
You need to run a DB or DB version that AWS don’t support.
-
You requirea a specific DB and OS combination.
-
You require an architecture that AWS don’t provide, like some types of replication or resilience.
-
Decision makers say so.
Database Migration Service
A managed migration service to migrate from one DB to another, one of which must be in AWS. It uses an EC2 instance where you can define replication tasks, source endpoint and destination endpoint. It’s NOT limited to SQL.
It provides no-downtime migrations.
Many sources are supported, and include: MySQL, Aurora. MSSQL, MariaDB, MongoDB, PostgreSQL, Oracle, Azure SQL and others.
There are different kinds of jobs:
-
Full load: one-off migration of all data
-
Full load + CDC (Change Data Capture): migrates data and captures ongoing changes.
-
CDC only: allows for using custom migration tools like those provided by the source DB.
DMS does not offer schema conversion betweem DB versions or engines but you can use AWS SCT (Schema Conversion Tool) to do so.
UPDATE: Now it does. See DMS Schema Conversion
DMS can also be used for large migrations where data is serveral TB in size, in which case transferring over the network consumes time and capacity. It must be used with the Snowball family of products (That uses SCT):
-
Use SCT to extract data
-
Move the data to the snowball device you ordered
-
Ship the device to AWS (Data will be available in S3)
-
(optional) Use DMS CDC to monitor data changes
-
DMS migrates data from the S3 bucket to the target
Schema Conversion Tool (SCT)
It’s used only when the DB engine is incompatible, including a generic file format (no db related) for transfer.
Used for:
-
Convert schemas from one database engine to another.
-
Larger Migrations: when you need an alternative way to migrate data to AWS (including DB to S3) rather than using a data link.
It’s not used to move data across compatible engines.
Works with:
-
OLTP: MySQL, MSSQL, Oracle.
-
OLAP: Teradata, Oracle, Vertica, Greenplum.
Examples:
-
On-prem MySQL to AWS Aurora MySQL: compatible engines. Use DMS
-
On-prem MSSQL to AWS Aurora MySQL: incompatible engines. Use SCT
-
On-prem Oracle to AWS Aurora (only MySQL and Postgres supported): incompatible engines. Use SCT