Law Firm CMS Migration Tool
A comprehensive Python-based migration toolkit built to transition a law firm’s legacy database system to a modern WordPress platform with Advanced Custom Fields (ACF), preserving complex relationships and ensuring data integrity across 600+ records and associated media assets.
Key Features
- Multi-Stage Migration Pipeline - Orchestrated sequential migration of attorneys, cases, news articles, blog posts, and investigations while maintaining referential integrity across all entities and relationships
- Intelligent Relationship Mapping - Preserved complex many-to-many relationships between attorneys, cases, practice areas, and offices through sophisticated ID mapping and PHP serialization for ACF relationship fields
- Media Asset Migration - Automated upload of 200+ documents and images to WordPress media library via REST API, with proper attachment linking to parent posts and ACF file fields
- Taxonomy Synchronization - Migrated custom taxonomies (offices, positions, case statuses, practice areas) with slug generation and term relationship preservation across all post types
- ACF Field Population - Direct database insertion of 50+ ACF field types including text, WYSIWYG, dates, relationships, repeaters, and file uploads with proper field key mapping
- Incremental Migration Support - Built-in duplicate detection using tracking metadata, enabling safe re-runs and updates without creating duplicate content or breaking existing relationships
- Data Transformation Layer - Implemented slug generation, date formatting (MySQL → ACF Ymd format), HTML sanitization, and field truncation to match WordPress schema constraints
- Mail Recipient Integration - Created custom post type migration for notification recipients with attorney linkage and email validation, preserving communication workflows
Technical Implementation
Migration Architecture
The system employs a phased migration approach that respects entity dependencies:
Phase 1: Foundation - Taxonomies and categories are migrated first, establishing the classification structure (offices, positions, case statuses, practice areas). This creates the term IDs needed for later relationship assignments.
Phase 2: Core Entities - Attorneys are migrated with their profiles, photos, and biographical data, populating the attorney_id_map that serves as the cornerstone for all subsequent relationship mappings.
Phase 3: Content Migration - Cases, news, blog posts, and investigations are migrated sequentially, each script querying the old database for relationships and using the ID maps to establish connections in the new system.
Phase 4: Media & Documents - Files are uploaded through the WordPress REST API during content migration, with attachment IDs stored in ACF fields and post metadata.
Database Operations & Data Integrity
The toolkit performs direct MySQL operations on both source and destination databases using PyMySQL with dictionary cursors for readable field access. Each migration script implements:
- Transaction Safety - All operations are committed atomically after successful completion, with proper connection cleanup in finally blocks
- Duplicate Prevention - Before creating posts, the system checks for
_migrated_from_[entity]_idpostmeta entries to skip already-migrated content - Relationship Resolution - Attorney and practice area relationships are resolved by querying the old database for junction table entries, then mapping old IDs to new WordPress post IDs through cached dictionaries
For ACF relationship fields, the system generates PHP-serialized arrays in the format a:N:{i:0;i:POST_ID;i:1;i:POST_ID;...} which WordPress’s ACF expects for proper deserialization.
ACF Integration Strategy
Rather than using the WordPress REST API for post creation (which has limited ACF support), the migration directly inserts into wp_posts and wp_postmeta tables. For each ACF field:
- The field value is inserted with the field name as meta_key (e.g.,
ticker_case) - A corresponding field key reference is inserted with
_{field_name}as meta_key pointing to the ACF field key (e.g.,_ticker_case→field_683f50a99396c)
This dual-entry approach ensures ACF recognizes and properly loads the field values in the WordPress admin interface.
For complex fields like repeaters and groups:
- Group sub-fields use the format
parent_field_sub_field(e.g.,class_period_case_start_date_class_period_case) - Repeater items use indexed naming:
repeater_field_{index}_sub_fieldwith a parent meta entry storing the total count - All date fields are converted from MySQL datetime to ACF’s Ymd format (e.g.,
20250620)
WordPress REST API Media Handling
Media uploads utilize WordPress’s REST API with HTTP Basic Authentication (Application Passwords). The upload_media_to_wp() function:
# Constructs multipart/form-data request with file content
files = {'file': (file_name, file_object)}
response = requests.post(f"{WP_API_URL}/media", auth=auth, files=files)
The API returns the attachment ID and URL, which are then stored in post metadata. For featured images, the attachment ID is saved as _thumbnail_id. For ACF file fields, the ID is saved directly to the field’s meta entry.
Error handling includes file existence checks, request exception catching, and detailed logging of upload failures with HTTP response details for debugging.
Business Impact & Migration Outcomes
Successful Data Migration
The migration successfully transferred 600+ content records across five distinct post types (attorneys, cases, news, blog, investigations) while preserving all relationships and metadata. This included:
- 50+ attorney profiles with biographical data, photos, and practice area associations
- 300+ legal case records with class period dates, lead plaintiff deadlines, case documents, and attorney assignments
- 150+ news articles with attorney attributions and supporting documents
- 100+ blog posts with proper category and author assignments
Data Integrity & Validation
Zero data loss was achieved through comprehensive validation checks at each migration stage. The system maintained:
- All attorney-case relationships through junction table migration
- Practice area associations for cases, news, and attorneys
- Office and position taxonomy assignments
- Document-to-post linkages with proper ACF repeater structure
- Mail recipient notification configurations for automated case updates
Time Savings & Efficiency
Manual migration of 600+ records with complex relationships would have required an estimated 8-10 days of manual data entry, verification, and relationship establishment. The automated toolkit completed the migration in under 2 hours of execution time, representing a 95% time reduction.
The incremental migration capability also enabled iterative testing on staging environments, with the final production migration running smoothly after thorough validation.
Modernization Benefits
The migration enabled the law firm to:
- Transition from a legacy custom CMS to WordPress, gaining access to a vast ecosystem of plugins and themes
- Implement Advanced Custom Fields for structured content management, improving content editor experience
- Establish proper taxonomy hierarchies for better content organization and discovery
- Integrate with modern WordPress tools for SEO, analytics, and marketing automation
Development Approach & Quality Assurance
Script Organization & Modularity
The toolkit is organized into separate migration scripts for each content type (migrate_cases.py, migrate_news.py, migrate_blog.py, etc.), each following a consistent pattern:
- Configuration section with database credentials and ACF field key mappings
- Helper functions for database connections, slug generation, and metadata insertion
- ID map population functions that query for already-migrated content
- Taxonomy/dependency migration functions
- Main content migration function with relationship handling
- Script execution with proper connection management
This modular approach allowed parallel development and independent testing of each migration component.
Field Mapping Documentation
Comprehensive field mapping documentation (migration_field_mapping.md) was maintained throughout development, serving as the source of truth for:
- Old database field to new ACF field mappings
- Taxonomy associations and their WordPress equivalents
- Relationship table structures and how they map to ACF relationship fields
- Media handling approaches for different content types
This documentation ensured consistency across scripts and provided a reference for validation and troubleshooting.
Testing Strategy
Migration testing followed an iterative approach:
- Development Database Testing - Initial runs against a local copy of the source database to a fresh WordPress installation
- Staging Environment Validation - Full migration runs on staging with manual verification of content, relationships, and media
- Data Integrity Audits - SQL queries to compare record counts between old and new databases, ensuring no content was missed
- Relationship Verification - Spot-checking of attorney-case links, practice area associations, and taxonomy assignments in WordPress admin
- Production Migration - Final execution with real-time monitoring and immediate post-migration validation
The incremental migration support meant that any issues discovered could be fixed and the migration re-run without duplicating content.
Technical Challenges Overcome
ACF Repeater Field Complexity
ACF repeater fields require a specific metadata structure that isn’t well-documented. Through reverse-engineering of WordPress’s ACF plugin code and database inspection, I determined the correct format:
- Parent meta entry:
field_namewith value = total row count - Row entries:
field_name_{row_index}_sub_field_namewith indexed sub-field values - Field key references:
_field_name_{row_index}_sub_field_namepointing to ACF field keys
This enabled successful migration of document collections, where each case/news item had multiple associated PDF files with titles and file attachments.
PHP Serialization for Relationship Fields
WordPress ACF relationship fields store related post IDs as PHP-serialized arrays. Rather than using a PHP serialization library, I implemented a lightweight serializer specific to the array structure needed:
def php_serialize(ids):
return f"a:{len(ids)}:{{" + ''.join([f"i:{i};i:{id};" for i, id in enumerate(ids)]) + "}"
This generates the exact format ACF expects (a:2:{i:0;i:123;i:1;i:456;}), ensuring proper deserialization in WordPress.
Relationship Resolution Across Databases
Many relationships existed in junction tables (e.g., cases_attorneys, news_attorneys) that referenced IDs from the old system. The migration maintains ID mapping dictionaries that translate old IDs to new WordPress post IDs:
attorney_id_map = {} # old_id -> new_wp_post_id
# Populated during attorney migration
attorney_id_map[old_attorney_id] = new_post_id
# Used during case migration
for attorney_link in related_attorneys:
new_attorney_id = attorney_id_map[attorney_link['attorney_id']]
This approach ensures referential integrity even when WordPress assigns completely different post IDs than the original system.
Date Format Transformation
ACF date picker fields store dates in Ymd format (e.g., 20250620), while the source database used MySQL datetime. The migration converts dates during field population:
case['class_period_start'].strftime('%Y%m%d')
This ensures dates display correctly in ACF date picker fields without manual reformatting.
Project Technology Stack
Backend & Scripting:
- Python 3.x - Core scripting language for migration logic
- PyMySQL - MySQL database connectivity with dictionary cursor support
- Requests - HTTP library for WordPress REST API interactions
Target Platform:
- WordPress 6.x - Modern CMS with extensive customization capabilities
- Advanced Custom Fields Pro - Structured content management with custom post types, taxonomies, and relationship fields
- MySQL 8.x - Relational database for WordPress content storage
Development Tools:
- Git - Version control for migration scripts and documentation
- MySQL Workbench - Database inspection and validation queries
- Local WordPress development environment - Testing and validation
Data Transformation:
- Regular expressions for slug generation and text sanitization
- Unicode normalization for ASCII-safe URL slugs
- PHP serialization format generation for ACF compatibility
Lessons Learned & Best Practices
Direct Database Access vs. REST API
While WordPress REST API is excellent for post creation, ACF support is limited. For complex migrations involving custom fields, direct database insertion provides:
- Full control over field key mappings
- Support for all ACF field types including repeaters and groups
- Better performance by avoiding HTTP overhead for each field
However, this approach requires deep understanding of WordPress and ACF database schemas.
Incremental Migration Strategy
Building duplicate detection from day one was crucial. The ability to re-run migrations without creating duplicates enabled:
- Iterative development and testing
- Quick recovery from errors without full database resets
- Confidence in production migration execution
Comprehensive Documentation
Maintaining detailed field mapping documentation alongside code proved invaluable for:
- Ensuring consistency across multiple migration scripts
- Onboarding others to understand the migration logic
- Troubleshooting field population issues
- Serving as reference for future content updates
Relationship Migration Order
Migrating entities in dependency order (taxonomies → attorneys → content) was critical. This prevented situations where:
- Posts reference non-existent taxonomy terms
- Relationships point to attorneys not yet migrated
- ACF fields lack proper field key references
The phased approach with ID mapping ensured all references remained valid throughout the migration.
Future Enhancements
If extended for additional use cases, potential enhancements could include:
- CLI Interface - Add command-line argument parsing for config file paths, selective content type migration, and dry-run mode
- Progress Reporting - Real-time progress bars and estimated completion times for large migrations
- Validation Reporting - Automated post-migration validation with detailed reports of missing relationships or failed media uploads
- Rollback Capability - Transaction logging to enable full migration rollback if critical errors occur
- Config File Support - External JSON/YAML configuration for database credentials and field mappings instead of hardcoded values
- Parallel Processing - Concurrent migration of independent content types to reduce total execution time