What is Compression

  • Minimizes the disk space used by your data and indexes, which saves money
  • Helps improve I/O
  • Ability to store data rows in compressed format on disk
  • Saves up to 90% of row storage space
  • Ability to estimate the possible compression ratio (see the example after this list)
  • Fits more data onto a page
  • Fits more data into buffer pool
  • Reduces logical log usage
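  • A hedged sketch of the estimate mentioned above, using the SQL administration API in the same form as the uncompress calls shown later in this article (the table, database, and owner names here are placeholders):

    -- Run while connected to the sysadmin database; requires DBSA privileges.
    -- Reports the estimated compression ratio without changing the table.
    EXECUTE FUNCTION task("table estimate_compression",
                          "customer", "stores_demo", "informix");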

How IDS Storage Optimization works!

  • Considering the entire row and all its columns, IDS looks for repeating patterns and stores those patterns as symbols in a compression dictionary

Creating a compression dictionary

  • Before compressing any rows, IDS samples rows from the table or fragment at random and builds a static dictionary of the repeating patterns it finds; one dictionary table per dbspace holds the dictionaries for the compressed tables and fragments in that dbspace

Compressing the data in a table

  • After creating the dictionary, IDS starts a background process that goes through the table or fragment and compresses the table or fragment rows.
  • The process compresses each row and leaves it in the page where it was compressed. Any new rows that are inserted or updated are also compressed.
  • This compress operation runs while other transactions and queries are occurring on the table.
  • Therefore, IDS performs the operation in small transactions and holds locks on the rows being actively compressed for only a short duration.
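  • A minimal sketch of starting that background compress operation through the SQL administration API, in the same form as the uncompress calls shown later (names are placeholders):

    -- Run while connected to the sysadmin database; requires DBSA privileges.
    -- Builds the compression dictionary, then compresses existing rows in the
    -- background while the table remains available to other sessions.
    EXECUTE FUNCTION task("table compress", "customer", "stores_demo", "informix");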

Reclaiming free space

  • After all the rows have been repacked, the shrink operation removes the unused table or fragment space and returns free space to the dbspace that contains the table or fragment.
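  • Repacking and shrinking are exposed as their own admin API operations; a hedged sketch with placeholder names:

    -- Repack: move the remaining rows to the front of the table or fragment.
    EXECUTE FUNCTION task("table repack", "customer", "stores_demo", "informix");
    -- Shrink: return the now-unused space at the end to the dbspace.
    EXECUTE FUNCTION task("table shrink", "customer", "stores_demo", "informix");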

What we are using behind the scenes!

  • Lempel-Ziv (LZ) based algorithm – static dictionary, built by random sampling
  • Frequently repeating patterns replaced with 12-bit symbol numbers
  • Any byte that does not match a pattern is also replaced with a 12-bit reserved symbol number
  • Patterns can be up to 15 bytes long
  • Max possible compression = 90% (15 bytes replaced with 1.5 bytes = 12 bits)

Performance Impact of Compression

  • For I/O-bound workloads
    • Compression may improve performance by reducing I/O (both data page and logical log)
    • More data fits on a page, therefore more in buffer pool
    • Log records are smaller, therefore less logging
  • For CPU-bound workloads
    • Additional CPU used to compress and expand rows
    • Should not be a large impact

What cannot be compressed

  • You cannot compress data in rows in the following database objects:
  • Tables or fragments that are in the sysmaster, sysutils, sysuser, syscdr, and syscdcv1 databases
  • Catalogs
  • Temporary tables
  • The tblspace tblspace
  • Internal partition tables
  • Dictionary tables (these tables, one per dbspace, hold compression dictionaries for the fragments or tables that are compressed in that dbspace and metadata about the dictionaries.)
  • You cannot compress a table while an online index build is occurring on the table
  • Encrypted data, data that is already compressed by another algorithm
  • Small tables are not good candidates for compression, because you might not be able to gain back enough space from compressing the rows to offset the storage cost of the compression dictionary.

Compression is a 3 Phase Operation

  • Compress
  • Repack
  • Shrink
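  • To my knowledge the three phases can also be requested in a single admin API call by combining the keywords in the argument string; a hedged sketch with placeholder names:

    -- Compress, repack, and shrink in sequence with one call.
    EXECUTE FUNCTION task("table compress repack shrink",
                          "customer", "stores_demo", "informix");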

And it has some prerequisites:

  • There must be at least 2,000 rows in each fragment of the table, not just a total of 2,000 rows in the whole table.
  • You must be able to connect to the sysadmin database (by default only user OneDB), and you must be a DBSA.
  • Logical and physical logs must be large enough to handle normal processing plus compression operations. Compression, repacking, and shrinking operations can use large amounts of log space.
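  • A hedged way to check the per-fragment row counts of a fragmented table before compressing it (nrows in sysfragments is an estimate refreshed by UPDATE STATISTICS; customer is a placeholder table name):

    -- Each table fragment should hold at least 2,000 rows to be a candidate.
    SELECT f.partn, f.dbspace, f.nrows
      FROM sysfragments f, systables t
     WHERE f.tabid = t.tabid
       AND t.tabname = "customer"
       AND f.fragtype = "T";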

Storage optimization methods

  • You can use the COMPRESSED option in the CREATE TABLE statement to enable automatic compression of the table once the table has at least 2,000 rows.
  • You can use the COMPRESSED option in the CREATE INDEX statement to enable automatic compression of the index once the index has 2,000 or more keys; compression is not enabled while the index has fewer than 2,000 keys.
  • You can use the SQL administration API task or admin function to perform any type of storage optimization on a table, fragment, or index.
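  • A minimal sketch of the two DDL options described above (table, column, and index names are placeholders; check the exact placement of the COMPRESSED keyword against your OneDB release):

    -- Compression is enabled automatically once the table holds about 2,000 rows.
    CREATE TABLE orders (
        order_id  INTEGER,
        cust_id   INTEGER,
        note      VARCHAR(255)
    ) COMPRESSED;

    -- Compression is enabled automatically once the index has 2,000 or more keys.
    CREATE INDEX ix_orders_cust ON orders (cust_id) COMPRESSED;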

Compression with replication

  • All replication types are supported on compressed tables
  • HDR/RSS/SDS
    • Tables will be compressed on secondary if they are compressed on primary
  • ER
    • Compression status of tables is independent between source and target, specified by user

Compression Benefits

  • Reduce the space occupied by the row
  • Compressing a table can be done online
  • Compress either a table or fragment
  • Smaller Archives
  • More data in the buffer pool
  • Fewer long/forwarded rows
  • Fewer I/Os for the same amount of data read or written
  • You save money

Uncompressing data

  • You can uncompress a previously compressed table or fragment.
  • Uncompressing a table or fragment deactivates compression for new inserts and updates, uncompresses all compressed rows, and marks the compression dictionary as inactive.
  • Once a dictionary is marked as inactive, IDS will no longer use it to compress new or changed rows.
  • The uncompress operation normally needs to allocate new pages for rows that no longer fit on their original pages
  • You can uncompress online or offline, using the uncompress or uncompress_offline argument.
  • An uncompress_offline operation is the same as the uncompress operation, except this operation is performed while holding an exclusive lock on the fragment, preventing all other access to the fragment data until the operation is completed

Uncompress Data APIs

  • EXECUTE FUNCTION task("table uncompress", "table_name", "database_name", "owner_name");
  • EXECUTE FUNCTION task("table uncompress_offline", "table_name", "database_name", "owner_name");
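  • The calls above operate on whole tables; to my knowledge a fragment-level variant exists that takes a list of partition numbers instead of a table name (treat the argument name and format as an assumption and verify it against your release):

    -- "1048712" is a hypothetical partnum; real values can be read from sysfragments.
    EXECUTE FUNCTION task("fragment uncompress", "1048712");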

Monitoring Interface

  • System-Monitoring Interface (SMI)
  • IDS stores compression dictionaries, both active and inactive, in a dictionary repository called rsccompdict.
  • Each dbspace that contains a compressed table or fragment has a dictionary repository.
  • The syscompdicts view in the sysmaster database brings together the information from all dictionary repositories. Each row represents a compression dictionary associated with a table or fragment by its part number (partnum).
  • An active dictionary has only creation information (such as a creation timestamp, the log unique id, and the log position); an inactive dictionary also has drop information.
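  • A minimal way to look at that view, assuming you can connect to the sysmaster database:

    -- One row per compression dictionary; active dictionaries carry only
    -- creation information, inactive ones also carry drop information.
    SELECT * FROM sysmaster:syscompdicts;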

Monitoring Interface: SMI/oncheck

Compression Magic

How a DBA can get a good appraisal!

  • Reduce the number of extents a table contains (see the sketch after this list)
  • Move all rows to the beginning of a table
  • Return unused space at the end of a table to the system
  • Shrink a partially used extent at the end of a table
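  • These space-reclaiming steps map onto admin API calls; a hedged sketch with placeholder names (defragment merges extents; repack and shrink were shown earlier):

    -- Merge the table's extents into fewer, larger extents.
    EXECUTE FUNCTION task("defragment", "stores_demo:informix.customer");
    -- Move rows to the start of the table, then return the freed space.
    EXECUTE FUNCTION task("table repack shrink", "customer", "stores_demo", "informix");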