
How to Import CSV into MySQL: LOAD DATA INFILE and Workbench with Automatic Parsing


Importing data from CSV files is a common task in migrations and data integration. After importing tens of millions of rows from a variety of sources, here are the methods I have found most efficient and reliable.

CSV Preparation

1. An Ideal CSV Format

id,name,email,created_at
1,John Doe,john@example.com,2024-01-15
2,Jane Smith,jane@example.com,2024-01-16
3,Bob Johnson,bob@example.com,2024-01-17

2. Pre-processing CSV

# Check encoding
file -i data.csv

# Convert to UTF-8 if needed
iconv -f ISO-8859-1 -t UTF-8 data.csv > data_utf8.csv

# Remove BOM (Byte Order Mark)
sed '1s/^\xEF\xBB\xBF//' data_utf8.csv > data_clean.csv

# Check line endings
dos2unix data_clean.csv

# Validate CSV structure
head -5 data_clean.csv
wc -l data_clean.csv
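Beyond head and wc, it is worth checking that every row has the same number of columns before importing. A minimal sketch with Python's csv module (the helper name is illustrative):

```python
import csv

def check_csv(path, delimiter=","):
    """Return (row_count, column_count, bad_rows) for a CSV file.

    bad_rows lists (line_number, column_count) for data rows whose
    column count differs from the header's.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        expected = len(header)
        bad, count = [], 0
        # start=2: the header occupies line 1 (assumes no embedded newlines)
        for line_num, row in enumerate(reader, start=2):
            count += 1
            if len(row) != expected:
                bad.append((line_num, len(row)))
    return count, expected, bad
```

A non-empty `bad` list is a signal to fix the file before running LOAD DATA, which would otherwise silently pad or truncate those rows.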

3. Handle Special Characters

# Strip all quotes (simple but lossy)
sed 's/"//g' data.csv > data_noquotes.csv

# Or double quotes instead (naive CSV-style escaping)
awk -F, '{gsub(/"/,"\"\""); print}' data.csv > data_escaped.csv

# Newlines inside fields need a real CSV parser,
# such as csvkit:
csvformat -U 1 data.csv > data_formatted.csv
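The sed/awk one-liners above can corrupt fields that legitimately contain commas or quotes. Python's csv module parses quoting and embedded newlines correctly; a sketch (helper name illustrative) that re-emits a normalized, fully quoted file:

```python
import csv

def normalize_csv(src, dst):
    """Re-write a CSV so every field is quoted and embedded quotes are
    escaped by doubling, which ENCLOSED BY '"' in LOAD DATA expects."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        # lineterminator="\n" matches LINES TERMINATED BY '\n'
        writer = csv.writer(fout, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```

Newlines inside fields survive this round trip because they stay within quoted fields, which LOAD DATA can handle when the enclosure character is declared.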

Method 1: LOAD DATA INFILE (Fastest)

1. Enable File Permissions

-- Check the current setting
SHOW VARIABLES LIKE 'secure_file_priv';

-- If the result is /var/lib/mysql-files/,
-- copy the CSV into that directory (shell):
sudo cp data.csv /var/lib/mysql-files/
sudo chown mysql:mysql /var/lib/mysql-files/data.csv

2. Create Table

CREATE TABLE customers_import (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at DATE,
    INDEX idx_email (email),
    INDEX idx_created (created_at)
) ENGINE=InnoDB;

3. Basic LOAD DATA

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, name, email, created_at);

4. Advanced LOAD DATA with Transformations

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = @id,
    name = TRIM(@name),
    email = LOWER(TRIM(@email)),
    created_at = STR_TO_DATE(@created_at, '%Y-%m-%d');

5. LOAD DATA with Validation

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = @id,
    name = NULLIF(TRIM(@name), ''),
    email = CASE 
        WHEN @email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
        THEN LOWER(TRIM(@email)) 
        ELSE NULL 
    END,
    created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));

6. LOAD DATA for Large Files (Chunked)

-- Split the large file into chunks first
-- Import chunk 1 (rows 1-1000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk1.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Import chunk 2 (rows 1000001-2000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk2.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
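The chunk files referenced above can be produced with a short script. A sketch (function and file names are illustrative) that repeats the header in every chunk, so each one can be loaded with IGNORE 1 ROWS:

```python
import csv

def split_csv(src, rows_per_chunk, prefix="data_chunk"):
    """Split src into prefix1.csv, prefix2.csv, ... with the header
    repeated at the top of every chunk. Returns the chunk file names."""
    written = []
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.reader(fin)
        header = next(reader)
        writer, fout, n = None, None, 0
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                if fout:
                    fout.close()
                n += 1
                name = f"{prefix}{n}.csv"
                fout = open(name, "w", newline="", encoding="utf-8")
                writer = csv.writer(fout, lineterminator="\n")
                writer.writerow(header)
                written.append(name)
            writer.writerow(row)
        if fout:
            fout.close()
    return written
```

For a 2-million-row file, `split_csv("data.csv", 1_000_000)` would produce the two chunk files used in the statements above.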

7. LOAD DATA from a Remote Client (LOCAL)

# From the client machine, upload the file via the MySQL client
# (requires local_infile to be enabled on the server)
mysql --local-infile=1 -u root -p mydb -e "
LOAD DATA LOCAL INFILE '/path/to/local/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
"

Method 2: MySQL Workbench

1. Table Data Import Wizard

Steps:
1. Connect to the database
2. Right-click the schema → Table Data Import Wizard
3. Select the CSV file
4. Configure:
   - Destination table (existing or new)
   - Encoding: utf8mb4
   - Field separator: ,
   - Line separator: \n
   - Enclose strings: "
   - Escape character: \
   - First line is header: ✓
5. Column mapping
6. Review and Import
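The wizard's "new table" option builds the destination table from the CSV header. Outside Workbench, a rough equivalent can be generated with a few lines of Python (helper name illustrative; every column starts as VARCHAR(255) and should be refined by hand):

```python
import csv

def create_table_from_header(csv_path, table_name):
    """Generate a naive CREATE TABLE statement from a CSV header,
    typing every column as VARCHAR(255) as a starting point."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    cols = ",\n".join(f"    `{c.strip()}` VARCHAR(255)" for c in header)
    return f"CREATE TABLE `{table_name}` (\n{cols}\n) ENGINE=InnoDB;"
```

The generated DDL is only a scaffold; tighten the types (INT, DATE, etc.) and add keys before a real import.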

2. Import with a Stored Procedure

DELIMITER $$

CREATE PROCEDURE ImportCSVData(
    IN p_file_path VARCHAR(255),
    IN p_table_name VARCHAR(64)
)
BEGIN
    SET @sql = CONCAT('
        LOAD DATA INFILE ''', p_file_path, '''
        INTO TABLE ', p_table_name, '
        FIELDS TERMINATED BY '',''
        OPTIONALLY ENCLOSED BY ''"''
        LINES TERMINATED BY ''\\n''
        IGNORE 1 ROWS
    ');
    
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END$$

DELIMITER ;

-- Usage
CALL ImportCSVData('/var/lib/mysql-files/data.csv', 'customers_import');

Note: many MySQL versions reject LOAD DATA inside PREPARE (error 1295, "This command is not supported in the prepared statement protocol yet"); if you hit this, run the generated LOAD DATA statement directly instead of through the procedure.

Method 3: Python Script with mysql-connector

1. Basic Python Import

import csv
import mysql.connector
from mysql.connector import Error

def import_csv_to_mysql(csv_file, table_name):
    try:
        # Connect to the database
        conn = mysql.connector.connect(
            host='localhost',
            database='your_database',
            user='your_username',
            password='your_password'
        )
        
        cursor = conn.cursor()
        
        # Read CSV
        with open(csv_file, 'r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)
            
            # Prepare INSERT statement
            columns = csv_reader.fieldnames
            placeholders = ', '.join(['%s'] * len(columns))
            query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"
            
            # Batch inserts for performance
            batch_size = 1000
            batch = []
            
            for row in csv_reader:
                values = tuple(row.values())
                batch.append(values)
                
                if len(batch) >= batch_size:
                    cursor.executemany(query, batch)
                    conn.commit()
                    print(f"Inserted {cursor.rowcount} rows")
                    batch = []
            
            # Insert remaining rows
            if batch:
                cursor.executemany(query, batch)
                conn.commit()
                print(f"Inserted {cursor.rowcount} rows")
        
        print(f"Import completed: {csv_file}")
        
    except Error as e:
        print(f"Error: {e}")
    finally:
        # conn may be unbound if connect() itself failed
        if 'conn' in locals() and conn.is_connected():
            cursor.close()
            conn.close()

# Usage
import_csv_to_mysql('data.csv', 'customers_import')

2. Advanced Python Import with Validation

import csv
import mysql.connector
import re
from datetime import datetime

class CSVImporter:
    def __init__(self, host, database, user, password):
        self.conn = mysql.connector.connect(
            host=host,
            database=database,
            user=user,
            password=password
        )
        self.cursor = self.conn.cursor()
        self.stats = {
            'imported': 0,
            'skipped': 0,
            'errors': 0
        }
    
    def validate_email(self, email):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        return re.match(pattern, email) is not None
    
    def validate_date(self, date_str, fmt='%Y-%m-%d'):
        try:
            datetime.strptime(date_str, fmt)
            return True
        except ValueError:
            return False
    
    def transform_row(self, row):
        """Transform dan validate data"""
        transformed = {}
        
        # ID
        try:
            transformed['id'] = int(row['id'])
        except (ValueError, KeyError):
            return None, "Invalid ID"
        
        # Name
        name = row.get('name', '').strip()
        if not name:
            return None, "Empty name"
        transformed['name'] = name[:100]  # Truncate
        
        # Email
        email = row.get('email', '').strip().lower()
        if not self.validate_email(email):
            return None, f"Invalid email: {email}"
        transformed['email'] = email
        
        # Date
        date_str = row.get('created_at', '')
        if not self.validate_date(date_str):
            return None, f"Invalid date: {date_str}"
        transformed['created_at'] = date_str
        
        return transformed, None
    
    def import_file(self, csv_file, table_name, batch_size=1000):
        with open(csv_file, 'r', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            
            query = f"INSERT INTO {table_name} (id, name, email, created_at) VALUES (%s, %s, %s, %s)"
            batch = []
            
            for row_num, row in enumerate(reader, start=2):  # start=2: the header is row 1
                transformed, error = self.transform_row(row)
                
                if error:
                    print(f"Row {row_num}: {error}")
                    self.stats['skipped'] += 1
                    continue
                
                batch.append((
                    transformed['id'],
                    transformed['name'],
                    transformed['email'],
                    transformed['created_at']
                ))
                
                if len(batch) >= batch_size:
                    self._insert_batch(query, batch)
                    batch = []
            
            if batch:
                self._insert_batch(query, batch)
        
        self._print_stats()
    
    def _insert_batch(self, query, batch):
        try:
            self.cursor.executemany(query, batch)
            self.conn.commit()
            self.stats['imported'] += len(batch)
            print(f"Inserted batch of {len(batch)} rows")
        except Exception as e:
            self.conn.rollback()
            self.stats['errors'] += len(batch)
            print(f"Batch insert error: {e}")
    
    def _print_stats(self):
        print("\n=== Import Statistics ===")
        print(f"Imported: {self.stats['imported']}")
        print(f"Skipped: {self.stats['skipped']}")
        print(f"Errors: {self.stats['errors']}")
    
    def close(self):
        self.cursor.close()
        self.conn.close()

# Usage
importer = CSVImporter('localhost', 'mydb', 'user', 'pass')
importer.import_file('data.csv', 'customers_import')
importer.close()

Method 4: mysqlimport Command Line

# mysqlimport derives the table name from the file name
# Table: customers_import → File: customers_import.txt

# Convert the CSV to tab-separated and drop the header row
csvformat -T data.csv | tail -n +2 > customers_import.txt

# Import
mysqlimport --local \
    --user=root \
    --password \
    --host=localhost \
    --fields-terminated-by='\t' \
    --lines-terminated-by='\n' \
    mydb \
    customers_import.txt
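If csvkit is not available, the same conversion can be done with Python's standard csv module; a minimal sketch (helper name illustrative):

```python
import csv

def csv_to_tsv(src, dst, drop_header=True):
    """Convert a comma-separated file to tab-separated output for
    mysqlimport, optionally dropping the header row."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        # lineterminator="\n" matches --lines-terminated-by='\n'
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        if drop_header:
            next(reader, None)
        for row in reader:
            writer.writerow(row)
```

Fields that contain a tab or quote will be quoted in the output, so either clean those up first or pass matching --fields-optionally-enclosed-by options to mysqlimport.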

Handling Import Errors

1. Duplicate Key Errors

-- Use IGNORE to skip duplicate keys
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
IGNORE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Or REPLACE to overwrite existing rows
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
REPLACE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
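A third option is to deduplicate the file itself before loading. A sketch (helper name illustrative) that keeps the first row seen for each id:

```python
import csv

def dedupe_csv(src, dst, key="id"):
    """Keep only the first row for each value of the key column.
    Returns the number of duplicate rows dropped."""
    dropped = 0
    seen = set()
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames,
                                lineterminator="\n")
        writer.writeheader()
        for row in reader:
            if row[key] in seen:
                dropped += 1
                continue
            seen.add(row[key])
            writer.writerow(row)
    return dropped
```

Unlike IGNORE/REPLACE, this keeps the duplicate handling visible and auditable before the data ever reaches the server; for very large files, note that the `seen` set grows with the number of distinct keys.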

2. Data Type Mismatch

-- Handle mismatches with a SET clause
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = NULLIF(@id, ''),
    name = NULLIF(TRIM(@name), ''),
    email = NULLIF(TRIM(@email), ''),
    created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));

3. Large File Performance

-- Disable non-unique indexes during the import
-- (effective for MyISAM only; InnoDB ignores DISABLE KEYS with a warning)
ALTER TABLE customers_import DISABLE KEYS;

-- Import data
LOAD DATA INFILE ...

-- Re-enable indexes
ALTER TABLE customers_import ENABLE KEYS;

-- Or, for InnoDB, relax redo log flushing during the import
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- Import
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

Best Practices

  1. Always back up before a bulk import
  2. Test with a small data sample first
  3. Use transactions for data integrity
  4. Monitor server resources (CPU, memory, disk)
  5. Import during low-traffic hours
  6. Validate data before and after the import
  7. Use a staging table for complex transformations

Conclusion

Choose a method based on your needs:

  • LOAD DATA INFILE: fastest for large datasets
  • MySQL Workbench: GUI-friendly for small to medium data
  • Python script: the most flexible, with custom validation
  • mysqlimport: command-line convenience

With the right preparation, a CSV import can be smooth and reliable.


Post link: https://www.tirinfo.com/cara-import-csv-ke-mysql-load-data-infile-workbench/

Hendra Wijaya
Tirinfo
3 February 2026