It’s not headline news anymore that we’re producing large amounts of data. From 500 million tweets posted per day to the terabytes of data collected by the Sloan Digital Sky Survey to map the universe, big data is here to be processed.
How do we store and process all this data efficiently? Big data is often messy and we can’t just query it in raw form. We can use sample data, but that doesn’t give us the most accurate results. To mine data accurately, we need to filter the data, find correlations, and run predictive analysis algorithms on it. The challenge is that relational databases were not designed to scale to this level or process data this quickly. Does this mean the end of the relational database era and a move toward something new, something other than SQL?
Why not relational databases?
Traditionally, we have relied on relational database systems for storing data. Relational database systems provide data integrity and consistency by enforcing atomicity, consistency, isolation and durability (ACID) properties. This is essential in many scenarios. For example, it avoids contention should an ATM withdrawal and a deposit transaction happen on the same account at the same time. The problem is that in many scenarios, such as caching shopping cart history, ACID properties are a significant performance overhead, which leads to problems with scalability.
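To make the atomicity part of ACID concrete, here is a minimal sketch of an account withdrawal as a single transaction, using Python’s built-in sqlite3 module as a stand-in for a full relational database. The table name, account id, and amounts are invented for the example; the point is that the balance check and the update either succeed together or roll back together.

```python
import sqlite3

# Illustrative only: sqlite3 stands in for a relational database;
# the accounts table and amounts are made up for this sketch.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

def withdraw(conn, account_id, amount):
    """Withdraw atomically; any failure rolls the whole transaction back."""
    try:
        conn.execute("BEGIN")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

If a second withdrawal would overdraw the account, the transaction is rolled back and the balance is left untouched, which is exactly the guarantee the ATM scenario above relies on.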
Another aspect of many modern applications is that they work with unstructured data. Many applications use JSON to store their data. Relational database management systems (RDBMS) don’t provide an efficient way to perform create, read, update, and delete (CRUD) operations on this data.
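The document-style CRUD operations mentioned above can be sketched in a few lines. This is a hypothetical in-memory store (the `DocumentStore` class and its method names are invented for illustration, not any product’s API): documents are arbitrary JSON objects, keyed by a generated id, with no fixed schema.

```python
import json
import uuid

# Hypothetical in-memory document store, for illustration only.
# Documents are schemaless JSON objects keyed by a generated id.
class DocumentStore:
    def __init__(self):
        self._docs = {}

    def create(self, doc):
        doc_id = str(uuid.uuid4())
        # Round-trip through JSON to store a deep copy of the document.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        return doc_id

    def read(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)
```

Note that no table definition or schema migration is needed to add a new field; that flexibility is what document-oriented NoSQL databases offer natively.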
Is NoSQL the answer?
NoSQL databases were designed to scale efficiently, so shouldn’t big data be handled solely by NoSQL databases? The answer is not so black and white. Most NoSQL databases excel in scalability and speed. However, they have limitations that don’t make them a good solution in all cases.
The most important restriction is that, in general, NoSQL databases don’t support multi-document atomic transactions. This makes it harder for applications that require two-phase commit protocol support to move to NoSQL databases.
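For readers unfamiliar with it, the two-phase commit protocol mentioned above can be sketched as follows. The `Participant` and `two_phase_commit` names are invented for the example: a coordinator first asks every participant to prepare (vote), and only if all vote yes does it tell them all to commit; otherwise everything rolls back.

```python
# Sketch of the two-phase commit protocol; names are illustrative.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):           # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):            # phase 2: make changes durable
        self.state = "committed"

    def rollback(self):          # phase 2: undo any prepared work
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: the transaction proceeds only if every participant votes yes.
    if all(p.prepare() for p in participants):
        for p in participants:   # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:       # Phase 2: roll back everywhere
        p.rollback()
    return False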
Also, unlike relational databases, there’s no standardized language that’s used by NoSQL databases. With so many different NoSQL databases, moving from one database to another becomes much harder.
Solution: polyglot persistence
Polyglot persistence leverages the strengths of many kinds of databases in the same system. This has become necessary because different databases are designed to solve different problems. Using a single database engine for all requirements usually leads to non-performant solutions. For example, an e-commerce application may use a key-value store for its shopping cart. Accessing a shopping cart doesn’t require the overhead of transactions and ACID properties. The key aspect is to access the cart quickly. On the other hand, when the user checks out, the transactional data has to be secure and atomic. So a relational database is a better fit here. To store the transaction history, a document-based database may be a good choice. You can search it quickly and it scales well as the e-commerce application grows.
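The e-commerce pattern above can be sketched in a few lines. In this illustrative example (function names like `add_to_cart` and `checkout` are invented), a plain dictionary stands in for the key-value cart store, and Python’s sqlite3 module stands in for the relational side, where the whole order is written in one atomic transaction.

```python
import sqlite3

# Illustrative polyglot sketch: a dict stands in for a key-value
# store (fast cart access, no transactions), SQLite for the
# relational store (atomic checkout). Names are made up.
carts = {}  # user_id -> {item: quantity}

def add_to_cart(user_id, item, qty=1):
    cart = carts.setdefault(user_id, {})
    cart[item] = cart.get(item, 0) + qty

orders = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE orders (user_id TEXT, item TEXT, qty INTEGER)")

def checkout(user_id):
    cart = carts.pop(user_id, {})
    with orders:  # one atomic transaction for the entire order
        orders.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(user_id, item, qty) for item, qty in cart.items()],
        )
```

Cart reads and writes never touch the transactional store, while checkout gets full ACID guarantees — each workload goes to the engine that suits it.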
It makes sense on paper, but the question remains: how do we integrate two vastly different database management systems? This is where SourcePro DB can help. SourcePro DB has a proven track record for accessing relational databases using a high-level, database-independent C++ interface. Using the same interface, SourcePro DB can also interact with NoSQL databases, enabling polyglot persistence without the need to learn new query languages.
Try SourcePro DB now and see for yourself.
• Read this white paper – Overcoming relational database limitations with NoSQL
• Learn how to use SourcePro DB with MongoDB, Cassandra, and Redis.