How to Import CSV into MySQL: LOAD DATA INFILE and Workbench with Automatic Parsing
Importing data from CSV files is a common task in migrations and data integration. After importing tens of millions of rows from various sources, I want to share the methods I have found most efficient and reliable.
CSV Preparation
1. Ideal CSV Format
id,name,email,created_at
1,John Doe,john@example.com,2024-01-15
2,Jane Smith,jane@example.com,2024-01-16
3,Bob Johnson,bob@example.com,2024-01-17
2. Pre-processing CSV
# Check encoding
file -i data.csv
# Convert to UTF-8 if needed
iconv -f ISO-8859-1 -t UTF-8 data.csv > data_utf8.csv
# Remove BOM (Byte Order Mark)
sed '1s/^\xEF\xBB\xBF//' data_utf8.csv > data_clean.csv
# Normalize CRLF line endings to LF
dos2unix data_clean.csv
# Validate CSV structure
head -5 data_clean.csv
wc -l data_clean.csv
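The same cleanup can be scripted in Python, which is handy when the shell tools above are not available (for example on Windows). A minimal sketch; the function name is mine:

```python
import codecs

def clean_csv_bytes(raw: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and normalize CRLF line endings to LF,
    mirroring the sed + dos2unix steps above."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.replace(b"\r\n", b"\n")
```

Read the file in binary mode, pass the bytes through this function, and write the result back out before importing.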
3. Handle Special Characters
# Strip all quotes (lossy: discards quoting information entirely)
sed 's/"//g' data.csv > data_noquotes.csv
# Or double embedded quotes (CSV convention; note this also doubles
# enclosing quotes, so only use it on input that is not already quoted)
awk -F, '{gsub(/"/,"\"\""); print}' data.csv > data_escaped.csv
# Fields containing newlines need a real CSV parser such as csvkit
csvformat -U 1 data.csv > data_formatted.csv
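The safest route for quoting is to let a real CSV parser re-emit the file instead of patching it with sed/awk. A sketch using Python's csv module (the function name is illustrative):

```python
import csv
import io

def requote_csv(text: str) -> str:
    """Re-emit CSV with every field quoted. The csv module doubles
    embedded quotes and preserves embedded newlines correctly."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(row)
    return out.getvalue()
```

Unlike the sed one-liner, this never corrupts fields that legitimately contain commas, quotes, or newlines.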
Method 1: LOAD DATA INFILE (Fastest)
1. Enable File Permissions
-- Check the current setting
SHOW VARIABLES LIKE 'secure_file_priv';
-- If the result is /var/lib/mysql-files/,
-- copy the CSV into that directory
# Copy the CSV into MySQL's secure file directory
sudo cp data.csv /var/lib/mysql-files/
sudo chown mysql:mysql /var/lib/mysql-files/data.csv
2. Create Table
CREATE TABLE customers_import (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
created_at DATE,
INDEX idx_email (email),
INDEX idx_created (created_at)
) ENGINE=InnoDB;
3. Basic LOAD DATA
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, name, email, created_at);
4. Advanced LOAD DATA with Transformations
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = @id,
name = TRIM(@name),
email = LOWER(TRIM(@email)),
created_at = STR_TO_DATE(@created_at, '%Y-%m-%d');
5. LOAD DATA with Validation
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = @id,
name = NULLIF(TRIM(@name), ''),
email = CASE
WHEN @email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
THEN LOWER(TRIM(@email))
ELSE NULL
END,
created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));
6. LOAD DATA for Large Files (Chunked)
-- Split the large file into chunks first
-- Import chunk 1 (rows 1-1000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk1.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Import chunk 2 (rows 1000001-2000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk2.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
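The chunking above can be scripted so that each chunk keeps its own copy of the header, letting every chunk be loaded with the same IGNORE 1 ROWS statement. A sketch; the function name is mine:

```python
import itertools

def split_csv(path, rows_per_chunk, out_prefix):
    """Split a large CSV into chunk files, repeating the header line in
    each chunk so every chunk can be loaded with IGNORE 1 ROWS."""
    chunk_paths = []
    with open(path, newline="") as f:
        header = f.readline()
        for i in itertools.count(1):
            rows = list(itertools.islice(f, rows_per_chunk))
            if not rows:
                break
            out_path = f"{out_prefix}{i}.csv"
            with open(out_path, "w", newline="") as out:
                out.write(header)
                out.writelines(rows)
            chunk_paths.append(out_path)
    return chunk_paths
```

For example, `split_csv('data.csv', 1000000, '/var/lib/mysql-files/data_chunk')` produces the data_chunk1.csv, data_chunk2.csv, … files used above.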
7. LOAD DATA from a Remote Client (LOCAL)
# From the client machine, stream a local file through the MySQL client
# (requires local_infile to be enabled on the server)
mysql --local-infile=1 -u root -p -e "
LOAD DATA LOCAL INFILE '/path/to/local/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
"
Method 2: MySQL Workbench
1. Table Data Import Wizard
Steps:
1. Connect to the database
2. Right-click the schema → Table Data Import Wizard
3. Select the CSV file
4. Configure:
- Destination table (existing or new)
- Encoding: utf8mb4
- Field separator: ,
- Line separator: \n
- Enclose strings: "
- Escape character: \
- First line is header: ✓
5. Column mapping
6. Review and import
2. Import via a Stored Procedure
DELIMITER $$
CREATE PROCEDURE ImportCSVData(
IN p_file_path VARCHAR(255),
IN p_table_name VARCHAR(64)
)
BEGIN
SET @sql = CONCAT('
LOAD DATA INFILE ''', p_file_path, '''
INTO TABLE ', p_table_name, '
FIELDS TERMINATED BY '',''
OPTIONALLY ENCLOSED BY ''"''
LINES TERMINATED BY ''\n''
IGNORE 1 ROWS
');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
-- Usage
CALL ImportCSVData('/var/lib/mysql-files/data.csv', 'customers_import');
Method 3: Python Script with mysql-connector
1. Basic Python Import
import csv
import mysql.connector
from mysql.connector import Error

def import_csv_to_mysql(csv_file, table_name):
    conn = None
    try:
        # Connect to the database
        conn = mysql.connector.connect(
            host='localhost',
            database='your_database',
            user='your_username',
            password='your_password'
        )
        cursor = conn.cursor()
        # Read the CSV
        with open(csv_file, 'r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)
            # Prepare the INSERT statement from the header row
            columns = csv_reader.fieldnames
            placeholders = ', '.join(['%s'] * len(columns))
            query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"
            # Batch inserts for performance
            batch_size = 1000
            batch = []
            for row in csv_reader:
                batch.append(tuple(row.values()))
                if len(batch) >= batch_size:
                    cursor.executemany(query, batch)
                    conn.commit()
                    print(f"Inserted {cursor.rowcount} rows")
                    batch = []
            # Insert the remaining rows
            if batch:
                cursor.executemany(query, batch)
                conn.commit()
                print(f"Inserted {cursor.rowcount} rows")
        print(f"Import completed: {csv_file}")
    except Error as e:
        print(f"Error: {e}")
    finally:
        if conn is not None and conn.is_connected():
            cursor.close()
            conn.close()

# Usage
import_csv_to_mysql('data.csv', 'customers_import')
2. Advanced Python Import with Validation
import csv
import mysql.connector
import re
from datetime import datetime

class CSVImporter:
    def __init__(self, host, database, user, password):
        self.conn = mysql.connector.connect(
            host=host,
            database=database,
            user=user,
            password=password
        )
        self.cursor = self.conn.cursor()
        self.stats = {
            'imported': 0,
            'skipped': 0,
            'errors': 0
        }

    def validate_email(self, email):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        return re.match(pattern, email) is not None

    def validate_date(self, date_str, fmt='%Y-%m-%d'):
        try:
            datetime.strptime(date_str, fmt)
            return True
        except ValueError:
            return False

    def transform_row(self, row):
        """Transform and validate one row of data."""
        transformed = {}
        # ID
        try:
            transformed['id'] = int(row['id'])
        except (ValueError, KeyError):
            return None, "Invalid ID"
        # Name
        name = row.get('name', '').strip()
        if not name:
            return None, "Empty name"
        transformed['name'] = name[:100]  # Truncate to column width
        # Email
        email = row.get('email', '').strip().lower()
        if not self.validate_email(email):
            return None, f"Invalid email: {email}"
        transformed['email'] = email
        # Date
        date_str = row.get('created_at', '')
        if not self.validate_date(date_str):
            return None, f"Invalid date: {date_str}"
        transformed['created_at'] = date_str
        return transformed, None

    def import_file(self, csv_file, table_name, batch_size=1000):
        with open(csv_file, 'r', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            query = f"INSERT INTO {table_name} (id, name, email, created_at) VALUES (%s, %s, %s, %s)"
            batch = []
            for row_num, row in enumerate(reader, start=2):  # start=2: the header is row 1
                transformed, error = self.transform_row(row)
                if error:
                    print(f"Row {row_num}: {error}")
                    self.stats['skipped'] += 1
                    continue
                batch.append((
                    transformed['id'],
                    transformed['name'],
                    transformed['email'],
                    transformed['created_at']
                ))
                if len(batch) >= batch_size:
                    self._insert_batch(query, batch)
                    batch = []
            if batch:
                self._insert_batch(query, batch)
        self._print_stats()

    def _insert_batch(self, query, batch):
        try:
            self.cursor.executemany(query, batch)
            self.conn.commit()
            self.stats['imported'] += len(batch)
            print(f"Inserted batch of {len(batch)} rows")
        except Exception as e:
            self.conn.rollback()
            self.stats['errors'] += len(batch)
            print(f"Batch insert error: {e}")

    def _print_stats(self):
        print("\n=== Import Statistics ===")
        print(f"Imported: {self.stats['imported']}")
        print(f"Skipped: {self.stats['skipped']}")
        print(f"Errors: {self.stats['errors']}")

    def close(self):
        self.cursor.close()
        self.conn.close()

# Usage
importer = CSVImporter('localhost', 'mydb', 'user', 'pass')
importer.import_file('data.csv', 'customers_import')
importer.close()
Method 4: mysqlimport Command Line
# The file name must match the target table name
# Table: customers_import → File: customers_import.txt
# Convert the CSV to mysqlimport format (tab-separated)
csvformat -T data.csv > customers_import.txt
# Import (--ignore-lines=1 skips the header row)
mysqlimport --local \
--user=root \
--password \
--host=localhost \
--fields-terminated-by='\t' \
--lines-terminated-by='\n' \
--ignore-lines=1 \
mydb \
customers_import.txt
Handling Import Errors
1. Duplicate Key Errors
-- Use IGNORE to skip duplicate-key rows
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
IGNORE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Or REPLACE to overwrite existing rows
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
REPLACE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
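When the duplicates originate in the file itself, they can also be removed before the data ever reaches MySQL. A sketch that mimics REPLACE semantics by keeping only the last occurrence of each key (the function name is mine; rows are dicts as produced by csv.DictReader):

```python
def dedupe_csv_rows(rows, key):
    """Keep only the last occurrence of each key value, so a later
    plain INSERT cannot hit a duplicate-key error."""
    seen = {}
    for row in rows:
        seen[row[key]] = row  # later rows overwrite earlier ones
    return list(seen.values())
```

Deduplicating up front also makes the import idempotent across retries.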
2. Data Type Mismatch
-- Handle with a SET clause
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = NULLIF(@id, ''),
name = NULLIF(TRIM(@name), ''),
email = NULLIF(TRIM(@email), ''),
created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));
3. Large File Performance
-- Disable non-unique indexes during the import
-- (effective for MyISAM only; InnoDB ignores this with a warning)
ALTER TABLE customers_import DISABLE KEYS;
-- Import data
LOAD DATA INFILE ...
-- Re-enable indexes
ALTER TABLE customers_import ENABLE KEYS;
-- For InnoDB, relax log flushing during the import instead
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- Import, then restore the default
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
Best Practices
- Always back up before a bulk import
- Test with sample data first
- Use transactions for data integrity
- Monitor server resources (CPU, memory, disk)
- Import during low-traffic hours
- Validate data before and after the import
- Use a staging table for complex transformations
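One cheap post-import validation is comparing the file's data-row count against SELECT COUNT(*) on the target table. Note that `wc -l` miscounts files whose fields contain quoted embedded newlines, while a CSV parser does not. A sketch (function name is mine):

```python
import csv

def csv_row_count(path):
    """Count data rows (excluding the header) using a real CSV parser,
    so quoted embedded newlines are counted as one record, not two."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f)) - 1
```

After the import, the returned count should equal `SELECT COUNT(*)` on the target table (minus any rows intentionally skipped by validation).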
Conclusion
Choose a method based on your needs:
- LOAD DATA INFILE: fastest for large datasets
- MySQL Workbench: GUI-friendly for small to medium data
- Python script: most flexible, with custom validation
- mysqlimport: command-line convenience
With the right preparation, CSV imports can be smooth and reliable.
Post link: https://www.tirinfo.com/cara-import-csv-ke-mysql-load-data-infile-workbench/
Editor: Hendra Wijaya
Publisher: Tirinfo
Read: 6 minutes.
Updated: 3 February 2026