The problem of too much data

Too big tables in the database is a problem we all face. Still, we are reluctant to delete these extra records because we are afraid to lose data, or we believe that we may use this data in the future.

But, managing very big tables has a tremedous effect on performance:

  • Big tables cause degradation of performance

  • Insert and update operations are too slow

  • Schema changes such as add or remove index or column might result in a table lock

  • Table rebuild is very long

  • Bak and restore operations performance deteriorates

How can PartitionDB help?

PartitionDB lets you keep the existing data and even continue growing it, as much as you need without changing your code. By using the distributed table option.

In order to run the example below, it is required to download PdbLogic.
Here is an example:

In this example, a database is created, and a table, Deposits. The table is filled out by values.
Now, we wish to distribute the table. In this example it would make most sense to do it based on date and month.
We will create three tables: Deposits201701, Deposits201702 and Deposits201703.

This script partitioned the table to three tables.

Have a look into what we do

Note that you can use the optional parameter  IsOutputScript when calling PdbLogic..PdbdistributeTable; if IsOutputScript is set, the function PdbLogic..PdbdistributeTable execution will generate a script, instead of really distribute the table. The script includes the code that is responsible to the table distribution, and you can run it yourself. This gives you a full control over your database manipulation.

The below function call will output the script below it (if you already ran the code above that distributes the table, you cannot run it again with the parameter IsOutputScript set):

exec PdbLogic..PdbdistributeTable @SourceDatabaseName='TestDistrebutedTableDB', @SourceObjectName='Deposits', @TargetObjectName='Deposits201701', @DistributedColumnName='DepositTime', @DistributedDateValue='2017/01/01', @DistributedRangeDateValue='2017/02/01', @IsOutputScript = 1

You can use this script to distribute your table, exactly as it is done using PdbdistributeTable.

Manage partitioned tables

We now have three tables, containing the data from the original table.
We need to manage these tables, with no code change, as we promised …

We will wrap these three tables with a Partition View. Using PartitionDB SDK, we will use DepositsMaster View:

The last step would be to change the name of the view to the name of the original table, so we don’t need any code change:

Here is a list of the parameters to pass to PdbdistributeTable:

Parameter Description Mandatory
SourceDatabaseName Name of source database Yes
SourceLinkedServerName Name of source linked server No
SourceSchemaName Name of source schema No
SourceObjectName Name of table to distribute Yes
TargetDatabaseName Name of target database No
TargetLinkedServerName Name of target linked server for distributed tables No
TargetSchemaName Name of schema for distributed tables No
TargetObjectName Names of distributed tables Yes
DistributedColumnName Name of column to distribute by Yes
DistributedStringValue String value to distribute by (in case of range, this is the minimum value) No
DistributedNumberValue Number value to distribute by (in case of range, this is the minimum value) No
DistributedDecimalValue Decimal number value to distribute by (in case of range, this is the minimum value) No
DistributedDateValue Date value to distribute by (in case of range, this is the minimum date) No
DistributedRangeStringValue Maximum value of string to distribute by No
DistributedRangeNumberValue Maximum number value to distribute by No
DistributedRangeDecimalValue Maximum decimal number value to distribute by No
DistributedRangeDateValue Maximum date value distribute by No
IdentityIncrementValue Inceremnet value for identity No
IsIgnoreIdentity If true, ignore identity in distribution No
IsIgnoreRowVersion If true, ignore row version in distribution No
BulkCount Number of entries to process within a bulk. If set to 0, distribute all at once.  Default is 10,000 rows. No
IsOutputScript Decide if print the command or run it No

Here is a list of the parameters to pass to PdbsetDistributedObject:

Parameter Description Is mandatory
SourceDatabaseName Name of database Yes
SourceLinkedServerName Name of linked server No
SourceSchemaName Name of schema No
SourceObjectName Name of table Yes
TargetDatabaseName Name of database No
TargetLinkedServerName Name of server for distributed tables No
TargetSchemaName Name of schema for distributed tables No
TargetObjectName Names of distributed tables Yes
DistributedColumnName Name of column to distribute by Yes
DistributedStringValue String value to distribute by (in case of range, this is the minimum value) No
DistributedNumberValue Number value to distribute by (in case of range, this is the minimum value) No
DistributedDecimalValue Decimal number value to distribute by (in case of range, this is the minimum value) No
DistributedDateValue Date value to distribute by (in case of range, this is the minimum date) No
DistributedRangeStringValue Maximum value of string to distribute by No
DistributedRangeNumberValue Maximum number value to distribute by No
DistributedRangeDecimalValue Maximum decimal number value to distribute by No
DistributedRangeDateValue Maximum date value distribute by No
IsOutputScript Decide if print the command or run it No

Note:

  • Distributed value should match the type of the distribution column.
  • One distribution value or range must be passed.
  • In case of range, all values that are inside the range or equal to the minimum and maximum values will get in, besides date, in which the upper value will be smaller than the maximum value.

To use table distribution functionality and other PartitionDB utilities, download our PdbLogic and attach to your database.

To read more check out our blog.