What is Compression?

  • Minimizes the disk space used by your data and indexes, saving money
  • Helps improve I/O performance
  • Ability to store data rows in compressed format on disk
  • Saves up to 90% of row storage space
  • Ability to estimate possible compression ratio
  • Fits more data onto a page
  • Fits more data into buffer pool
  • Reduces logical log usage
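
The compression-ratio estimate mentioned above is exposed through the SQL administration API. A minimal sketch, assuming a `customer` table in the `stores_demo` database owned by user `informix` (all placeholder names):

```sql
-- Connect to the sysadmin database as a DBSA, then ask the server to
-- sample the table and report its estimated compression ratio.
DATABASE sysadmin;

EXECUTE FUNCTION task('table estimate_compression',
                      'customer',      -- table name (placeholder)
                      'stores_demo',   -- database name (placeholder)
                      'informix');     -- table owner (placeholder)
```

The function returns a report comparing the estimated compressed size with the current size, so you can decide whether compressing the table is worthwhile before committing to it.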

How IDS Storage Optimization Works

  • Considering the entire row and all its columns, IDS looks for repeating patterns and stores those patterns as symbols in a compression dictionary

Creating a compression dictionary

Compressing the data in a table

  • After creating the dictionary, IDS starts a background process that goes through the table or fragment and compresses the table or fragment rows.
  • The process compresses each row and leaves it in the page where it was compressed. Any new rows that are inserted or updated are also compressed.
  • This compress operation runs while other transactions and queries are occurring on the table.
  • Therefore, IDS performs the operation in small transactions and holds locks on the rows being actively compressed for only a short duration.
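
The compress step described above is started through the `task()` function in the sysadmin database. A hedged sketch, using the same placeholder table, database, and owner names:

```sql
-- Build the compression dictionary (if one does not exist yet) and
-- compress all existing rows in place; new inserts and updates are
-- compressed from then on.
EXECUTE FUNCTION task('table compress',
                      'customer', 'stores_demo', 'informix');
```

A corresponding `fragment compress` argument operates on a single fragment instead of the whole table.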

Reclaiming free space

  • After all the rows have been repacked, the shrink operation removes the unused table or fragment space and returns free space to the dbspace that contains the table or fragment.
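
The repack and shrink phases can also be run individually through the admin API. A sketch with placeholder names:

```sql
-- Repack moves rows to the beginning of the table; shrink then
-- returns the freed space at the end of the table to the dbspace.
EXECUTE FUNCTION task('table repack', 'customer', 'stores_demo', 'informix');
EXECUTE FUNCTION task('table shrink', 'customer', 'stores_demo', 'informix');
```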

What we use behind the scenes

  • Lempel-Ziv (LZ) based algorithm – static dictionary, built by random sampling
  • Frequently repeating patterns replaced with 12-bit symbol numbers
  • Any byte that does not match a pattern is also replaced with a 12-bit reserved symbol number
  • Patterns can be up to 15 bytes long
  • Maximum possible compression = 90% (a 15-byte pattern is replaced with a 12-bit symbol, i.e. 1.5 bytes)

Performance Impact of Compression

  • I/O-bound workloads
    • Compression may improve performance by reducing I/O (both data page and logical log)
    • More data fits on a page, therefore more fits in the buffer pool
    • Log records are smaller, therefore less logging
  • CPU-bound workloads
    • Additional CPU is used to compress and expand rows
    • This should not have a large impact

What cannot be compressed

  • You cannot compress data in rows in the following database objects:
    • Tables or fragments in the sysmaster, sysutils, sysuser, syscdr, and syscdcv1 databases
    • Catalogs
    • Temporary tables
    • The tblspace tblspace
    • Internal partition tables
    • Dictionary tables (one per dbspace; these hold the compression dictionaries for the tables or fragments compressed in that dbspace, plus metadata about the dictionaries)
  • You cannot compress a table while an online index build is occurring on the table
  • You cannot compress encrypted data, or data that is already compressed by another algorithm
  • Small tables are not good candidates for compression, because the space you gain back from compressing the rows might not offset the storage cost of the compression dictionary

Compression is a 3-Phase Operation

  • Compress
  • Repack
  • Shrink
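
All three phases can be requested in a single admin API call. A sketch, again with placeholder names:

```sql
-- Compress the rows, repack them to the beginning of the table,
-- and shrink the table to return the freed space, in one operation.
EXECUTE FUNCTION task('table compress repack shrink',
                      'customer', 'stores_demo', 'informix');
```

Running them together is convenient for a one-off optimization pass; running them separately lets you schedule the more disruptive repack and shrink phases for quieter periods.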

And it has some prerequisites:

  • There must be at least 2,000 rows in each fragment of the table, not just a total of 2,000 rows in the whole table.
  • You must be able to connect to the sysadmin database (by default only user OneDB), and you must be a DBSA.
  • The logical and physical logs must be large enough to handle normal processing plus the compression operations. Compress, repack, and shrink operations can use large amounts of log space.

Storage optimization methods

  • You can use the COMPRESSED option in the CREATE TABLE statement to enable automatic compression of the table when the table has at least 2000 rows.
  • You can use the COMPRESSED option in the CREATE INDEX statement to enable automatic compression of an index with 2000 or more keys; indexes with fewer keys are not compressed.
  • You can use the SQL administration API task or admin function to perform any type of storage optimization on a table, fragment, or index.
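
The COMPRESSED option described above can be sketched as follows; the table, column, and index names are placeholders, and the exact placement of the COMPRESSED clause may vary by server version:

```sql
-- Table rows are compressed automatically once the table
-- reaches at least 2,000 rows.
CREATE TABLE orders (
    order_num   SERIAL,
    customer_id INTEGER,
    ship_date   DATE
) COMPRESSED;

-- Index keys are compressed automatically once the index
-- has at least 2,000 keys.
CREATE INDEX ix_orders_cust ON orders (customer_id) COMPRESSED;
```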

Compression with replication

  • High-availability secondaries (HDR, RSS, SDS)
    • All are supported with compressed tables; tables are compressed on the secondary if they are compressed on the primary
  • ER
    • The compression status of tables is independent between source and target, as specified by the user

Compression Benefits

  • Reduce the space occupied by the row
  • Compressing a table can be done online
  • Compress either a table or fragment
  • Smaller Archives
  • More data in the buffer pool
  • Fewer long/forwarded rows
  • Fewer I/Os for the same amount of data read/written
  • Saves you money

Uncompressing data

  • You can uncompress a previously compressed table or fragment.
  • Uncompressing a table or fragment deactivates compression for new inserts and updates, uncompresses all compressed rows, and marks the compression dictionary as inactive.
  • Once a dictionary is marked as inactive, IDS will no longer use it to compress new or changed rows.
  • The uncompress operation normally needs to allocate new pages for rows that no longer fit on their original pages.
  • You can uncompress online or offline, using the uncompress or uncompress_offline argument.
  • An uncompress_offline operation is the same as the uncompress operation, except that it is performed while holding an exclusive lock on the fragment, preventing all other access to the fragment data until the operation is completed.

Uncompress Data APIs

  • EXECUTE FUNCTION task('table uncompress', 'table_name', 'database_name', 'owner_name');
  • EXECUTE FUNCTION task('table uncompress_offline', 'table_name', 'database_name', 'owner_name');

Monitoring Interface

  • System-Monitoring Interface (SMI)
  • IDS stores compression dictionaries, both active and inactive, in a dictionary repository called rsccompdict.
  • Each dbspace that contains a compressed table or fragment has a dictionary repository.
  • The syscompdicts view in the sysmaster database brings together the information from all dictionary repositories. Each row represents a compression dictionary associated with a table or fragment by its part number (partnum).
  • Listing 3 shows an active dictionary that has only creation information (such as a creation timestamp and the log unique ID). An inactive dictionary would have additional drop information.
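
A query against the syscompdicts view can list the dictionaries that are still active. This is a sketch; the column names used here are assumptions based on the documented layout of the view, so check your server's sysmaster reference:

```sql
-- List active compression dictionaries: rows with no drop timestamp.
SELECT dict_partnum, dict_create_timestamp
  FROM sysmaster:syscompdicts
 WHERE dict_drop_timestamp IS NULL;
```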

Monitoring Interface: SMI/oncheck

Compression Magic

How a DBA can get a good appraisal

  • Reduce the number of extents a table contains
  • Move all rows to the beginning of a table
  • Return unused space at the end of a table to the system
  • Shrink a partially used extent at the end of a table
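
The first item above, reducing the number of extents, has its own admin API command. A hedged sketch; the `database:owner.table` argument uses placeholder names:

```sql
-- Merge a table's extents into as few as possible.
EXECUTE FUNCTION task('defragment', 'stores_demo:informix.customer');
```

Combined with the repack and shrink operations shown earlier, this covers all four of the space-reclamation tasks in the list above.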