Salin dan Bagikan
MariaDB ColumnStore: Analytics Storage Engine untuk Big Data - Panduan lengkap menggunakan MariaDB ColumnStore untuk analytics dan big data dengan columnar …

MariaDB ColumnStore: Analytics Storage Engine untuk Big Data

MariaDB ColumnStore: Analytics Storage Engine untuk Big Data

MariaDB ColumnStore adalah storage engine columnar yang dirancang khusus untuk analytical workloads dan big data. Setelah mengimplementasikan ColumnStore untuk data warehouse dengan miliaran rows, saya akan berbagi setup dan best practices.

Apa itu ColumnStore?

Columnar vs Row Storage

Row Storage (InnoDB/MyISAM):

  • Data disimpan per row
  • Optimal untuk OLTP (transactional)
  • Fast untuk point lookups
  • Efficient untuk writes

Columnar Storage (ColumnStore):

  • Data disimpan per column
  • Optimal untuk OLAP (analytical)
  • Fast untuk aggregations
  • Superior compression

Use Cases ColumnStore

  • Data warehousing
  • Analytics dan reporting
  • Big data aggregations
  • Historical data analysis
  • Business intelligence

Instalasi ColumnStore

1. Install dari Repository

# Add MariaDB repository
sudo apt-get install software-properties-common
sudo apt-key adv --fetch-keys 'https://mariadb.org/mariadb_release_signing_key.asc'
sudo add-apt-repository 'deb [arch=amd64] http://downloads.mariadb.com/ColumnStore/ubuntu jammy main'

# Install ColumnStore
sudo apt-get update
sudo apt-get install mariadb-columnstore-server

# Start services
sudo systemctl start mariadb-columnstore
sudo systemctl enable mariadb-columnstore

2. Konfigurasi

# Configure ColumnStore
sudo /usr/local/mariadb/columnstore/bin/postConfigure

# Setup storage
# Local storage atau shared storage

Membuat Tabel ColumnStore

1. Basic ColumnStore Table

-- Gunakan ColumnStore engine
CREATE TABLE sales_data (
    id BIGINT,
    sale_date DATE,
    product_id INT,
    customer_id INT,
    quantity INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    region VARCHAR(50),
    sales_rep VARCHAR(100)
) ENGINE=ColumnStore;

-- ColumnStore tables tidak mendukung:
-- - Primary keys
-- - Auto-increment
-- - Foreign keys
-- - Unique indexes

2. Partitioning

-- Partitioned ColumnStore table
CREATE TABLE large_sales (
    sale_date DATE,
    product_id INT,
    amount DECIMAL(12,2)
) ENGINE=ColumnStore
PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

Loading Data ke ColumnStore

1. Bulk Import dengan cpimport

# Prepare data file (CSV)
cat > sales_data.csv << EOF
2024-01-01,101,1,1001,5,99.99,499.95,North,John
2024-01-01,102,1,1002,3,149.99,449.97,South,Jane
EOF

# Import menggunakan cpimport
/usr/local/mariadb/columnstore/bin/cpimport \
    -s ',' \
    -e 1 \
    mydb \
    sales_data \
    sales_data.csv

2. INSERT dengan SELECT

-- Load dari InnoDB table
INSERT INTO sales_analytics_columnstore
SELECT * FROM sales_transactions_innodb
WHERE sale_date >= '2024-01-01';

3. Parallel Loading

# Split file dan load parallel
split -l 1000000 large_file.csv part_

for file in part_*; do
    /usr/local/mariadb/columnstore/bin/cpimport mydb sales_data "$file" &
done
wait

Query Optimization

1. Aggregation Queries

-- ColumnStore optimal untuk aggregations
SELECT 
    region,
    YEAR(sale_date) as year,
    SUM(total_amount) as total_sales,
    AVG(quantity) as avg_quantity,
    COUNT(*) as transaction_count
FROM sales_data
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region, YEAR(sale_date);

2. Filtered Aggregations

-- WHERE clause optimization
SELECT 
    product_id,
    SUM(total_amount) as revenue
FROM sales_data
WHERE region = 'North'
  AND sale_date >= '2024-01-01'
GROUP BY product_id
HAVING revenue > 10000;

3. Joins dengan ColumnStore

-- Join antara ColumnStore dan InnoDB
SELECT 
    cs.product_id,
    p.name,
    SUM(cs.total_amount) as revenue
FROM sales_data cs
INNER JOIN products p ON cs.product_id = p.id
WHERE cs.sale_date >= '2024-01-01'
GROUP BY cs.product_id, p.name;

Compression dan Storage

1. Compression Ratio

ColumnStore memberikan compression ratio 5:1 hingga 20:1 tergantung data.

-- Check table size
SELECT 
    TABLE_NAME,
    ENGINE,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) as data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) as index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';

2. Storage Management

# Check ColumnStore storage usage
df -h /usr/local/mariadb/columnstore/

# Monitor disk space
/usr/local/mariadb/columnstore/bin/mcsadmin getSystemDiskUsage

Maintenance

1. Statistics Update

-- Update statistics
ANALYZE TABLE sales_data;

-- ColumnStore mengupdate statistics automatic

2. Backup dan Restore

# Backup dengan mysqldump
mysqldump --single-transaction mydb sales_data > backup.sql

-- Restore
mysql mydb < backup.sql

-- Atau gunakan ColumnStore backup tools

Best Practices

  1. Gunakan untuk analytics only: Jangan gunakan untuk OLTP
  2. Bulk load: Avoid individual INSERTs
  3. Partition large tables: By date atau region
  4. Denormalize: Analytics tables tidak perlu fully normalized
  5. Monitor compression: Verify storage savings

Kesimpulan

ColumnStore adalah pilihan tepat untuk:

  • Large-scale analytics
  • Data warehousing
  • Historical data analysis
  • Compression requirements
  • Aggregations dan reporting

Dengan ColumnStore, analytical queries menjadi orders of magnitude lebih cepat.

Artikel Terkait

Link Postingan : https://www.tirinfo.com/mariadb-columnstore-analytics-engine-big-data/

Hendra WIjaya
Tirinfo
3 minutes.
3 February 2026