The previous article talked about the advantages of adopting the hybrid cloud solution. This article is focusing on how hybrid cloud solution can be applied to companies that want to take advantage of the non operational data they can gather.
Let’s assume a company has already an operational database resides on premise, and it wants to keep it untouched. The company feels it is time to start logging non operational data. Non operational data can be for example data coming from the devices, in case of an IoT companies. Or eCommerce companies are storing clicks and impression data. The value of non operational data is already well known to companies in the market. For IoT, it can help identifying devices’ faults and apply preventive maintenance, in other cases it helps learning about usage trends, and the options are endless. The nature of the non operational data is that it tends to grow very quickly (however, not necessarily steadily), so the best way to go is to have it stored in the cloud.
Working with on premise database and a cloud database can be achieved in different ways. We can have the client managing communication with both databases, which is a very complex solution. We can have the application servers doing this work but it requires massive change in the application level. The solution that is presented here is done in the database level. We create a link between the on premise database to the cloud database, and there is no need to make any changes in the application level.
This solution that we bring in this article answers the following challenges:
- Create a database to store the non operational data
- Keep the operational database untouched
- Manage the distributed database (operational and non operational) as one database
- Run join queries across the two databases
The example below is using an eCommerce database, but it can be applied to any business that is looking to start storing its non operational data.
Create an operational database
This step is required for the sake of this example. As said before, we assume that the company is already using an operational database to support its business and offerings. Here we will create a simple database that resides on premise. It includes tables: customers, products and orders.
Build a Gate
PartitionDB Gate is the core of the system. It is an engine that is responsible to dispatch the queries to the correct table and letting the user the simplicity of managing all his databases as one database.
The user has to choose one of the two options:
- Create a separate Gate database.
- Have the Gate part of the operations database.
Each approach has its benefits. In short, the first option will keep the Gate separate from the data, and if a decision is made to stop using the Gate, it will be simple to apply. The second option is easier for the user, since he has to deal with one database only.
1. Gate is separate from the on premise database
2. Gate is combined with the on premise database
For simplicity, the article follows the second option of Gate combined into the operational database.
Creating a Gate requires running the following commands:
The script takes the following steps:
- Create a Gate by calling Pdbinstall
- Link the Gate to the operational database. Call PdbcreatePartition.
- Lastly call PdbstartSync to publish the tables.
Note that in order to have all databases be able to “see” a table, the table must be published.
Create a non operational database in the cloud
This database stores non operational data, and is created remotely in the cloud. As discussed before, this data varies based on the nature of the company. Here is how to create it:
/*create the remote database. Note, @LinkedServerName=’Cloud’ should be set to the name of your cloud database*/
exec PdbcreatePartition @GateName=’BayMart’, @DatabaseName=’BayMartCloud’,@DatabaseTypeId=11, @PrimaryDatabaseName=’BayMart’;/*,@LinkedServerName=’Cloud’*/
Note that you need to set LinkedServerName parameter with your linked server name in the cloud.
Create a table at the Gate
We now create a new table. The table is created at the Gate level. In the example we create a table to store nonoperational data such as logs, and it will store the non operational data in the cloud.
Publish the table
The table that was created at the Gate should be replicated to the cloud database. This is done by publishing it. When publishing a table we need to pass the Gate name, the table name and the target database:
exec PdbtargetSplitTable @GateName='BayMart', @SchemaName='dbo', @TableName='SaleLogs', @DatabaseName='BayMartCloud'; --@LinkedServerName='Cloud'
Work on the distributed database as one database
To show our point, we now change the RandomSale procedure to use a tables from the two databases; on premise and cloud (originally, it was using the on premise tables only). The fact that there is more than one database is transparent to the user. The user keeps working as there is one database only. The Gate will take care of the dispatching to the correct database.
The following line is taken from the script below and is the most interesting one. It inserts records to the table that is in the cloud. The insert is done as if the table reside on premise:
insert into SaleLogs (OrderId, UserName, FirstName, LastName, CatalogNumber, ItemName, Quantity, BankName, CardNumber, LogTime,IP) values (@OrderId, @UserName, @FirstName, @LastName, @CatalogNumber, @ItemName, @Quantity, @BankName, @CardNumber, @OrderTime,@IP);
We can now run the stored procedure; we run it 1000 times:
Let’s look at the output
We can now see the content of the table in our database. Again, we don’t care where a table resides. We reference both databases as they are one. We can still refer the tables with their database name, as shown below:
-- Display cloud table
--select * from SaleLogs;
-- We can also use the database name to refer these tables
--select * from BayMartCloud..SaleLogs;
But what about performance?
Executing transactions in the cloud in a one by one manner may take longer than we can afford. In order to make things more efficient, we can accumulate the table’s values in a temporary table and update the table in the cloud in bulk mode. When it is time to update the table, we change the table’s name and load it to the corresponding table in the cloud. At the same time we keep working with our table.
More about performance – use compression
Using bulk transfers does improve performance, but we can do even better with compression.
In this case we will create a temporary database, copy the temporary table that we created before to the database, compress the database and then move to the cloud.
Below are the commands to achieve the above flow.
-- Create temporary database
exec PdbcreatePartition @GateName='BayMart', @DatabaseName='BayMartTemp', @DatabaseTypeId=11, @PrimaryDatabaseName='BayMart';
-- Direct temporary table into temp
exec PdbtargetSplitTable @GateName='BayMart', @SchemaName='dbo', @TableName='TempSaleLogs', @DatabaseName='BayMartTemp';
At this point we will change the procedure CloudLoad to accomplish the compression before moving the data to the cloud, and restoring at the cloud. The script is doing the following:
- Create new database for loading BayMartLoad
- Sync BayMartLoad with BayMartTemp
- Switch databases temp and load, so we keep updating temp and send load to the cloud
- Back up and compress BayMartLoad
- Move BayMartLoad database to the cloud
- Restore the database in cloud
- Bulk load the database into the table in the cloud
- Drop the database, it is not necessary anymore
Note that as in the previous script, it is required to set the name of the cloud server instead of the /*cloud*/.
This concludes our example. We can see that choosing a hybrid cloud solution can adds lots of value to a business, and can be fairly simple to implement. We encourage you to try!
The next step – partition non operational database
Taking into account that non operational data tend to be big, we can consider partitioning the non operational database. The partitioning will distribute the database to a few smaller databases (based on the business needs), and they will all be managed by the Gate. Working at the Gate level will make the partitioning transparent to the user. The different partition will also be able to reside anywhere, on premise, on the cloud or on a multi could configuration…. But we will keep something for next time…