Did you know it’s possible to access data stored in cloud object storage directly from your on-premises Oracle database?
You may be wondering why you would want to do this. Well, apart from the fact that object storage is extremely economical at $0.01 USD per GB per month, there are plenty of good reasons. Perhaps you wish to run machine learning algorithms in your database against data stored in Avro or Parquet files staged in the cloud? Or perhaps you want to access data stored in a third-party cloud using Swift APIs? Or perhaps you want to use Oracle Hybrid Partitioning to hold your archived data in partitions mapped to the cloud? These are just some examples, and I'm sure you can think of other applications too.
In this post I will show you how to set up a Hybrid Partitioned Table, but the other approaches are just as straightforward.
At a high level this blog post is broken into the following 5 steps:
- Install the DBMS_CLOUD package in the database
- Create an Object Storage Bucket in the cloud, and upload sample data into the bucket
- Create a credential to access the object storage bucket
- Create an External Table that maps to the bucket
- Create a Hybrid-Partitioned table with partitions stored in the bucket
First off, there are a few requirements for this to work:
- You must be running Oracle Database version 19.9 or higher.
- This is only supported in a Container Database (multitenant architecture); a quick check for these first two requirements follows this list.
- You need to install the DBMS_CLOUD package.
- You will need a C##CLOUD$SERVICE schema to hold the package.
- The database will require an SSL wallet and certificates for accessing the object storage and HTTPS URIs.
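If you want to confirm the first two requirements before going any further, a couple of quick queries will do it (this is just a convenience check, and any method of verifying the version and the multitenant architecture is fine):

-- Confirm the database version (19.9 or higher) and that this is a Container Database
select banner_full from v$version;
select name, cdb from v$database;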
1. Install the DBMS_CLOUD package in the database
In order to access Cloud Object Storage, the database requires the DBMS_CLOUD package. The package provides a nice set of functions to perform the storage operations, and if you've used Oracle Autonomous Database you should already be familiar with the package and what it can do for you.
Installing DBMS_CLOUD is fairly straightforward and is documented here:
Oracle Support Document 2748362.1 – How To Setup And Use DBMS_CLOUD Package.
The support document provides code that you can simply cut and paste into your favorite SQL editor. It even includes a script to create the common user called C##CLOUD$SERVICE, which will own the package. But don't worry: the schema is locked, so you will not be left with a new account in your database to secure.
After creating the DBMS_CLOUD package, the next step is to set up the security components needed to connect to your cloud object store. This includes creating a wallet to store your cloud certificates, downloading the certificates, and defining your Access Control Entries to permit your database to communicate over HTTPS. The good news is that all of these steps are also included in the support document and can likewise be run with a simple cut-and-paste.
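To give you a feel for what the note has you run, here is a minimal sketch of the ACL and wallet steps; the host and wallet path below are placeholders only, so use the exact commands and values from the support document for your environment:

-- Allow the C##CLOUD$SERVICE schema to reach object storage over HTTPS (placeholder host)
BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host       => 'objectstorage.us-ashburn-1.oraclecloud.com',
    lower_port => 443,
    upper_port => 443,
    ace        => xs$ace_type(privilege_list => xs$name_list('http'),
                              principal_name => 'C##CLOUD$SERVICE',
                              principal_type => xs_acl.ptype_db) );
END;
/

-- Point the database at the wallet holding the cloud certificates (placeholder path)
ALTER DATABASE PROPERTY SET ssl_wallet = '/u01/app/oracle/admin/ORCL/ssl_wallet';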
Once you've installed the DBMS_CLOUD package and set up your cloud connection, you're ready to start connecting to your cloud object storage bucket.
Note: Credit for the support document and its great instructions goes to Hermann Baer, Oracle Product Manager.
2. Create an Object Storage Bucket and upload sample data into the bucket
Before we get started, we will create a bucket to store the data that we will be mapping to our database. Rather than guide you step by step, I have provided a short 30-second video to demonstrate how easy this is to do.
The following steps are being performed in the viewlet:
- Navigate to Object Storage
- Create a bucket
- Copy 4 files from my local disk to the bucket
- Get the object-handle for one of the files and paste into a notepad
We now have some sample data to work with.
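As a side note, once the credential from Step 3 exists you can also push files into the bucket directly from the database with DBMS_CLOUD.PUT_OBJECT. The sketch below copies a file from a directory object into the bucket; the directory object and file name are just placeholders for wherever your source file happens to live:

BEGIN
  DBMS_CLOUD.PUT_OBJECT(
    credential_name => 'DBCLOUD_TOKEN',
    object_uri      => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/countygrowth.csv',
    directory_name  => 'DATA_PUMP_DIR',   -- any directory object the schema can read from
    file_name       => 'countygrowth.csv' );
END;
/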
3. Create a credential to access the object storage bucket
Next, we need a way to authenticate our database to our cloud object store. This is done using a Database Credential. In our cloud console we can create an Auth Token to permit the connection. The following animation demonstrates how to do this.
You should have noticed that the very last step was to “copy” the token to my clipboard. It is important to do this as the Token can never be displayed again. Though as you’ve seen, creating a new Token is a piece of cake.
Back to the database…
Once you have copied your token, you are ready to create a Credential:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'DBCLOUD_TOKEN',
    username        => '<your cloud username>',
    password        => '<paste your auth token string here>' );
END;
/
The parameters required here are:
- credential_name – the name the database will use to refer to this credential (it does not have to match the name you gave the Auth Token in the cloud console).
- username – Our cloud account user name. In my case the cloud username includes a prefix: “oracleidentitycloudservice/email.address@email.com”.
- password – the Auth Token string that I copied from the cloud console.
By the way, to drop (and recreate) your credential inside the database you can run the following command:
exec dbms_cloud.drop_credential('DBCLOUD_TOKEN') ;
You can see the credential by running the following query:
select * from USER_CREDENTIALS ;
4. Create an External Table that maps to the object storage bucket
Now that we’ve set up our connection for our cloud account, the next step is to check that we can see the contents of our Cloud Object Storage Bucket. We can use the following query to check this:
select *
from dbms_cloud.list_objects(
'DBCLOUD_TOKEN',
'https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/' )
/
The URI used in this query is the object handle we copied from our bucket in Step 2, minus the file name (everything up to and including the trailing /o/). You should see a listing of the files we uploaded.
In my first example I am creating an External Table that is mapped to a file stored in my Object Storage bucket. This step is not required to use Hybrid Partitioned Tables, but it provides a nice way to demonstrate how easy this is to use.
For this example, the file I am using is relatively small: it has just 100 rows with 5 columns. It contains the population changes for all 100 counties of North Carolina from 2010 to 2020.
For reference, here's a sample of what my .csv file looks like:
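Here are the header and a couple of data rows as plain text; the header matches the field_list used below, and the two rows are the same ones returned by the sample query later in this post:

COUNTY,CY2010,CY2020,CHANGE,PERCENTAGE
Craven,103498,103016,-482,-.5
Robeson,134229,131238,-2991,-2.2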
Creating the External Table is very straightforward: you specify the format being loaded using the json_object function. In my case it's 'csv', but this could just as easily have been 'avro', 'orc' or 'parquet'.
Additional information on the formatting operations is available in the online documentation for the Autonomous Database: https://docs.oracle.com/en/cloud/paas/autonomous-database/adbsa/format-options.html
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'COUNTY_GROWTH',
    credential_name => 'DBCLOUD_TOKEN',
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/countygrowth.csv',
    format          => json_object('type' value 'csv', 'skipheaders' value '1'),
    field_list      => 'COUNTY,
                        CY2010,
                        CY2020,
                        CHANGE,
                        PERCENTAGE',
    column_list     => 'COUNTY     VARCHAR2(20),
                        CY2010     NUMBER,
                        CY2020     NUMBER,
                        CHANGE     NUMBER,
                        PERCENTAGE NUMBER' );
END;
/
It's possible to verify that the external table contents are mapped as expected with the following procedure call:
BEGIN
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE(
    table_name  => 'COUNTY_GROWTH',
    schema_name => 'SCOTT' );
END;
/
The output of this procedure is written to two tables named VALIDATE$1_LOG and VALIDATE$1_BAD. Querying the VALIDATE$1_LOG table should produce something similar to this:
NOTE: The number placeholder in the table names will increment every time you call the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure.
SQL> select * from VALIDATE$1_LOG ;
RECORD
-----------------------------------------------------------------------
LOG file opened at 03/12/21 08:55:23
Bad File: VALIDATE$1_2205.bad
Field Definitions for table COUNTY_GROWTH
Record format DELIMITED BY
Data in file has same endianness as the platform
Rows with all null fields are accepted
Fields in Data Source:
COUNTY CHAR (255)
Terminated by ","
Enclosed by """ and """
CY2010 CHAR (255)
Terminated by ","
Enclosed by """ and """
CY2020 CHAR (255)
Terminated by ","
Enclosed by """ and """
CHANGE CHAR (255)
Terminated by ","
Enclosed by """ and """
PERCENTAGE CHAR (255)
Terminated by ","
Enclosed by """ and """
LOG file opened at 03/12/21 08:55:23
Total Number of Files=1
Data File: https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxy/b/MyBucket/o/countygrowth.csv
Log File: VALIDATE$1_9160.log
38 rows selected.
In my case the External Table was created without any issues, so querying VALIDATE$1_BAD returns no rows:
SQL> select * from VALIDATE$1_BAD
no rows selected
As I mentioned earlier, the file I used for this demo is relatively small: it has just 100 rows with 5 columns of information, so it's possible to run a query against the entire table (or flat file) to see its contents. But where's the fun in that? Instead, I can use a SQL query to be more selective with the data I am returning. This time I'm looking for the counties with over 100,000 residents that experienced a drop in population from 2010 to 2020:
SQL> select *
2 from COUNTY_GROWTH
3 where CY2020 <= CY2010
4 and CY2020 >= 100000
5 /
COUNTY CY2010 CY2020 CHANGE PERCENTAGE
-------------------- ---------- ---------- ---------- ----------
Craven 103498 103016 -482 -.5
Robeson 134229 131238 -2991 -2.2
But I am guessing this is not a very exciting demonstration. So let’s shift gears and take a look at using cloud object storage to host portions of a larger data-set.
5. Create a Hybrid-Partitioned table with partitions stored in the Object Storage bucket
For this example I will create a Hybrid Partitioned Table. The data set spans 17 years, 2005 to 2021, and I am placing the oldest data into 3 separate csv files, one for each five-year grouping. I have uploaded those files to my object storage bucket and am creating partitions to transparently map the data using Range Partitioning, with the most recent partitions remaining locally in my on-premises database.
Here’s the statement I am using to create the new table. You will notice that there is a specific procedure for defining a Hybrid Partitioned Table:
BEGIN
  DBMS_CLOUD.CREATE_HYBRID_PART_TABLE (
    table_name      => 'RESTAURANT_INSPECTIONS',
    credential_name => 'DBCLOUD_TOKEN',
    format          => json_object('type' value 'CSV', 'skipheaders' value '1'),
    column_list     => 'id number, inspect_date date, restaurant_name varchar2(100), address varchar2(200), city varchar2(40), state varchar2(2), zip_code varchar2(12), lat number, lon number',
    partitioning_clause => 'partition by range (INSPECT_DATE) (
       partition inspections_arch_2005 values less than (TO_DATE(''01-01-2006'',''DD-MM-YYYY''))
         external location (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/inspections_arch_2005.csv''),
       partition inspections_arch_2010 values less than (TO_DATE(''01-01-2011'',''DD-MM-YYYY''))
         external location (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/inspections_arch_2010.csv''),
       partition inspections_arch_2015 values less than (TO_DATE(''01-01-2016'',''DD-MM-YYYY''))
         external location (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxy/b/MyBucket/o/inspections_arch_2015.csv''),
       partition inspections_2020 values less than (TO_DATE(''01-01-2021'',''DD-MM-YYYY'')),
       partition inspections_2021 values less than (TO_DATE(''01-01-2022'',''DD-MM-YYYY''))
    )'
  );
END ;
/
After creating the table, it's good practice to verify that the data has been mapped without issue. The good news is that DBMS_CLOUD can do this for you:
BEGIN
DBMS_CLOUD.VALIDATE_HYBRID_PART_TABLE (table_name => 'RESTAURANT_INSPECTIONS') ;
END ;
/
Executing the procedure simply produces the following output:
PL/SQL procedure successfully completed.
But it does create 2 tables with the results: VALIDATE$n_LOG and VALIDATE$n_BAD. (The “n” will be a number that is incremented every time you call the procedure).
When I query the VALIDATE$2_LOG table I see a nice report of the DDL operation as follows:
SQL> select * from VALIDATE$2_LOG ;
RECORD
-------------------------------------------------------------------
LOG file opened at 03/15/21 12:50:27
Bad File: VALIDATE$2_2205.bad
Field Definitions for table RESTAURANT_INSPECTIONS
Record format DELIMITED BY
Data in file has same endianness as the platform
Rows with all null fields are accepted
Fields in Data Source:
ID CHAR (255)
Terminated by ","
Enclosed by """ and """
INSPECT_DATE CHAR (255)
Terminated by ","
Enclosed by """ and """
RESTAURANT_NAME CHAR (255)
Terminated by ","
Enclosed by """ and """
ADDRESS CHAR (255)
Terminated by ","
Enclosed by """ and """
CITY CHAR (255)
Terminated by ","
Enclosed by """ and """
STATE CHAR (255)
Terminated by ","
Enclosed by """ and """
ZIP_CODE CHAR (255)
Terminated by ","
Enclosed by """ and """
LAT CHAR (255)
Terminated by ","
Enclosed by """ and """
LON CHAR (255)
Terminated by ","
Enclosed by """ and """
Date Cache Statistics for table RESTAURANT_INSPECTIONS
Max Size: 1000
Entries : 204
Hits : 672
Misses : 0
LOG file opened at 03/15/21 12:50:27
Total Number of Files=3
Data File: https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxr/b/MyBucket/o/inspections_arch_2005.csv
Data File: https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxr/b/MyBucket/o/inspections_arch_2010.csv
Data File: https://objectstorage.us-ashburn-1.oraclecloud.com/n/ixxxxxxxxxxr/b/MyBucket/o/inspections_arch_2015.csv
Log File: VALIDATE$2_9160.log
LOG file opened at 03/15/21 12:50:27
Bad File: VALIDATE$2_2207.bad
Field Definitions for table RESTAURANT_INSPECTIONS
Record format DELIMITED BY
Data in file has same endianness as the platform
Rows with all null fields are accepted
Fields in Data Source:
ID CHAR (255)
Terminated by ","
Enclosed by """ and """
INSPECT_DATE CHAR (255)
Terminated by ","
Enclosed by """ and """
RESTAURANT_NAME CHAR (255)
Terminated by ","
Enclosed by """ and """
ADDRESS CHAR (255)
Terminated by ","
Enclosed by """ and """
CITY CHAR (255)
Terminated by ","
Enclosed by """ and """
STATE CHAR (255)
Terminated by ","
Enclosed by """ and """
ZIP_CODE CHAR (255)
Terminated by ","
Enclosed by """ and """
LAT CHAR (255)
Terminated by ","
Enclosed by """ and """
LON CHAR (255)
Terminated by ","
Enclosed by """ and """
Date Cache Statistics for table RESTAURANT_INSPECTIONS
Max Size: 1000
Entries : 477
Hits : 1798
Misses : 0
SQL>
But when I query VALIDATE$2_BAD I see no rows returned, which is a good thing:
SQL> select * from VALIDATE$2_BAD ;
no rows selected
Let's go ahead and try a quick query of the table before inserting any data into it:
SQL> select to_char(inspect_date,'YYYY') YEAR, count(*)
2 from restaurant_inspections
3 group by to_char(inspect_date, 'YYYY')
4 order by 1 ;
YEAR COUNT(*)
---- ------------------
2005 876
2010 1,086
2015 1,189
You can see the external data has already been mounted and is visible to query.
A quick query of USER_TABLES shows me the table is a Hybrid Partitioned Table:
SQL> select table_name, hybrid
2 from user_tables
3 where table_name = 'RESTAURANT_INSPECTIONS' ;
TABLE_NAME HYB
------------------------- ---
RESTAURANT_INSPECTIONS YES
I can also query USER_TAB_PARTITIONS:
col table_name form A25
col partition_name form A25
select table_name,
partition_name,
tablespace_name,
segment_created,
indexing,
read_only
from user_tab_partitions
where table_name = 'RESTAURANT_INSPECTIONS' ;
TABLE_NAME PARTITION_NAME TABLESPACE_NAME SEGM INDE READ
------------------------- ------------------------- ------------------------------ ---- ---- ----
RESTAURANT_INSPECTIONS INSPECTIONS_2020 USERS NO ON NO
RESTAURANT_INSPECTIONS INSPECTIONS_2021 USERS NO ON NO
RESTAURANT_INSPECTIONS INSPECTIONS_ARCH_2005 SYSTEM YES OFF YES
RESTAURANT_INSPECTIONS INSPECTIONS_ARCH_2010 SYSTEM YES OFF YES
RESTAURANT_INSPECTIONS INSPECTIONS_ARCH_2015 SYSTEM YES OFF YES
SQL>
You will see that the 3 partitions stored in object storage are read-only and are listed as residing in the SYSTEM tablespace.
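If you want to confirm exactly which object storage files back those read-only partitions, the *_XTERNAL_LOC_PARTITIONS family of dictionary views should show the partition-to-URI mapping (double-check the exact view name and columns in the documentation for your release):

select partition_name, location
  from user_xternal_loc_partitions
 where table_name = 'RESTAURANT_INSPECTIONS' ;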
I'm now ready to insert data into the remaining (in-database) partitions. I happen to have a table called "STAGE_TBL" with the recent/remaining sample data that I will use for this purpose. This table contains entries for every year after 2015, so it's very straightforward to populate my table:
SQL> insert into RESTAURANT_INSPECTIONS
2 select * from STAGE_TBL ;
1206 rows created.
…and if we rerun our query we can see the additional rows of data without any indication that the data is stored in a completely different location:
SQL> select to_char(inspect_date,'YYYY') YEAR, count(*)
2 from restaurant_inspections
3 group by to_char(inspect_date, 'YYYY')
4* order by 1 ;
YEAR COUNT(*)
---- ------------------
2005 876
2010 1,086
2015 1,189
2016 219
2017 213
2018 234
2019 249
2020 274
2021 17
9 rows selected.
The last thing I want to demonstrate is what this looks like to the optimizer. To keep things simple I just enabled AUTOTRACE in my session and reran my query:
set autotrace on
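Then I simply reran the aggregation from earlier; the row counts are unchanged, and AUTOTRACE adds the execution plan and statistics to the output so you can see how the internal and external partitions are accessed:

select to_char(inspect_date,'YYYY') YEAR, count(*)
  from restaurant_inspections
 group by to_char(inspect_date, 'YYYY')
 order by 1 ;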
Wrapping things up
Hopefully you found this post useful. I also hope you can see how straightforward it is to set up your on-premises database to access data hosted in Oracle Cloud Object Storage. The same procedures can also be applied to other cloud storage platforms like Amazon S3 and Azure Blob Storage. Obviously the credentials would need to be defined to work with those platforms and the URI string would follow the Swift protocol for the file mapping, but other than that, the steps remain the same.
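As a rough illustration only, a credential and listing call against an Amazon S3 bucket would look something like the sketch below. The access key pair, bucket name and URI are all placeholders, so confirm the URI format (Swift-style or native) that DBMS_CLOUD expects for your target platform and package version:

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'S3_CRED',
    username        => '<AWS access key ID>',
    password        => '<AWS secret access key>' );
END;
/

select *
  from dbms_cloud.list_objects(
         'S3_CRED',
         'https://s3.us-east-1.amazonaws.com/<my-bucket>/' )
/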