Skip to main content

What is sharding - Databases

What is Database sharding?

Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.

Here's how Jason Tee explains sharding on The Server Side: "In the simplest sense, sharding your database involves breaking up your big database into many, much smaller databases that share nothing and can be spread across multiple servers."

Technically, sharding is a synonym for horizontal partitioning. In practice, the term is often used to refer to any database partitioning that is meant to make a very large database more manageable.
The governing concept behind sharding is based on the idea that as the size of a database and the number of transactions per unit of time made on the database increase linearly, the response time for querying the database increases exponentially. 

Additionally, the costs of creating and maintaining a very large database in one place can increase exponentially because the database will require high-end computers. In contrast, data shards can be distributed across a number of much less expensive commodity servers. Data shards have comparatively little restriction as far as hardware and software requirements are concerned. 
In some cases, database sharding can be done fairly simply. One common example is splitting a customer database geographically. Customers located on the East Coast can be placed on one server, while customers on the West Coast can be placed on a second  server. Assuming there are no customers with multiple locations, the split is easy to maintain and build rules around.
Data sharding can be a more complex process in some scenarios, however. Sharding a database that holds less structured data, for example, can be very complicated, and the resulting shards may be difficult to maintain.

Database architecture

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.
There are numerous advantages to this partitioning approach. Since the tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables a distribution of the database over a large number of machines, which means that the database performance can be spread out over multiple machines, greatly improving performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.

In practice, sharding is far more complex. Although it has been done for a long time by hand-coding (especially where rows have an obvious grouping, as per the example above), this is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. Consistent hashing is one form of automatic sharding to spread large loads across multiple smaller services and servers.
Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a shard approach may also be useful.

Shards compared to horizontal partitioning

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort) provided that there is some obvious, robust, implicit way to identify in which table a particular row will be found, without first needing to search the index, e.g., the classic example of the 'CustomersEast' and 'CustomersWest' tables, where their zip code already indicates where they will be found.

Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The hoped-for gains in efficiency would be lost, if querying the database required both instances to be queried, just to retrieve a simple dimension table. Beyond partitioning, sharding thus splits large partitionable tables across the servers, while smaller tables are replicated as complete units.

This is also why sharding is related to a shared nothing architecture—once sharded, each shard can live in a totally separate logical schema instance / physical database server / data center / continent. There is no ongoing need to retain shared access (from between shards) to the other unpartitioned tables in other shards.
This makes replication across multiple servers easy (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers would otherwise be a bottleneck.


There is also a requirement for some notification and replication mechanism between schema instances, so that the unpartitioned tables remain as closely synchronized as the application demands. This is a complex choice in the architecture of sharded systems: approaches range from making these effectively read-only (updates are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding) and many options in between.

Source From :

http://en.wikipedia.org/wiki/Shard_%28database_architecture%29

http://www.codefutures.com/database-sharding/

Comments

Popular posts from this blog

Asp.Net AjaxFileUpload Control With Drag Drop And Progress Bar

This Example explains how to use AjaxFileUpload Control With Drag Drop And Progress Bar Functionality In Asp.Net 2.0 3.5 4.0 C# And VB.NET. Previous Post  I was Explained about the   jQuery - Allow Alphanumeric (Alphabets & Numbers) Characters in Textbox using JavaScript  ,  Fileupload show selected file in label when file selected  ,  Check for file size with JavaScript before uploading  . May 2012 release of AjaxControlToolkit includes a new AjaxFileUpload Control  which supports Multiple File Upload, Progress Bar and Drag And Drop functionality. These new features are supported by Google Chrome version 16+, Firefox 8+ , Safari 5+ and Internet explorer 10 + , IE9 or earlier does not support this feature. To start with it, download and put latest AjaxControlToolkit.dll in Bin folder of application, Place ToolkitScriptManager  and AjaxFileUpload on the page. HTML SOURCE < asp:ToolkitScriptManager I...

View online files using the Google Docs Viewer

Use Google Docs Viewer for Document viewing within Browser I was looking for a way to let users see Microsoft Word Doc or PDF files online while using my application without leaving their browser without downloading files and then opening it to view with Word or PDF viewer . I was looking for some way out either via any PHP or Microsoft.NET libraries, did some googling on that; but later on I just got an idea that google already has all code written for me.. when I have any email attachment in PDF or DOC or DOCX google does it for me ..! Even while searching I can see PDFs by converting them in HTML. So I just googled it up and found that Google already has this ability that we can use Google Docs Viewer without any Google Account Login . YES that's true no Google Account login is required. It's damn simple and easy. Just pass document path as attachment as parameter and we are done. Google Docs Viewer gives us ability to embed PDF, DOC/DOCX, PPT, TIFF:...

How to send mail asynchronously in asp.net with MailMessage

With Microsoft.NET Framework 2.0 everything is asynchronous and we can send mail also asynchronously. This features is very useful when you send lots of bulk mails like offers , Discounts , Greetings . You don’t have to wait for response from mail server and you can do other task . By using     SmtpClient . SendAsync Method (MailMessage, Object)    you need to do  System.Net.Mail has also added asynchronous support for sending email. To send asynchronously, you need need to Wire up a SendCompleted event Create the SendCompleted event Call SmtpClient.SendAsync smtpClient.send() will initiate the sending on the main/ui  thread and would block.  smtpClient.SendAsync() will pick a thread from the .NET Thread Pool and execute the method on that thread. So your main UI will not hang or block . Let's create a simple example to send mail. For sending mail asynchronously you need to create a event handler that will notify that mail success...