In this article lets understand some of the nuances involved while loading data from External systems into the Marketing Cloud. Some of the questions that are always asked include how long does it take to load data into Marketing Cloud. As part of the Import process, any data file that exceeds 20 Mb has to be loaded through the File Transfer Protocol(Ftp) process.
The first step includes Extraction of data from sources such as Teradata/Oracle/Vertica into your internal Ftp site. This could take anywhere from few mins to a few hours depending on the file size. In general, always split the file into multiple files if it has got more number of records. It is advisable to have 6 million to 8 million records in each file and the size would be anywhere from 2 gb to 2.5 gb per file. This extraction process/time depends on the environment you are generating. If you are generating in Unix box it could take more time compared to generating in Amazon S3 server. On an S3 server it could take 1 hour to generate 10 files with 80 million rows(8 million per file).
After the extraction, We should move the file from our ftp server to SFMC site using Filezilla/Mft process depending on the organization’s internal protocols/process. Depending on the session configuration of Filezilla/Mft process it could take an hour to transfer 20gb of data from Internal ftp server to SFMC. There might be limitations on a number of files that could be transferred with each run(3 to 5 files) depending on the organization’s settings/internal process.
Processing in SFMC:
Once the files are moved to SFMC from our internal ftp server, Run the import process to load data into desired Data extensions. Alternatively, we can schedule the import activity to occur once a file is dropped into SFMC. We can map the import activity to load data into Data extensions based on the Column header of the incoming files. This import activity could load 30 million records into data extension in 60 mins. You could monitor this file runs in Automation Studio.
Parallel Runs: SFMC does not support the parallel running of multiple files into the Same Data Extension. I assume it might be having a table lock while one of the file is processing. One Great news is SFMC supports the parallel running of files into different data extensions at once.
Frequently Asked Questions:
1. Can we have one large file instead of multiple files during extraction from Source?
Ans. You could definitely have one large file instead of multiple files. We have extracted 50 gb file and transferred it to SFMC
2. Can we reduce FileZilla/Mft processing times by using smaller files?
Ans. Its advisable to move smaller files quickly through FileZilla/Mft process rather than huge files. It took 3 hours to move 50 gb file(single file) from our internal ftp server to SFMC ftp server
3. Is there any better way to optimize Extraction and filezilla/mft process?
Ans. In general filezilla/mft process could be designed to look at specific file formats instead of picking all the files(complete/incomplete) from the specified folder. One approach could be generating the file with a different file format so that filezilla/mft process will not pick the file until the file is complete. You could rename it once the file extraction is complete. While mft process is initiated to move to sfmc then start the extraction of other files with a different name and rename this file once the mft process has completed file transfer to sfmc.
4. How long it takes for sfmc to load data into data extension?
Ans. It could load 30 million records in an hour. This could vary based on number of columns
5. Can we load multiple files into Same Data extension in Parallel?
Ans. No, sfmc does not load multiple files into same data extension in parallel. It will queue up and load one after the other
6. Can we load multiple files into different Data extensions in Parallel?
Ans. Yes, sfmc loads multiple files into different Data extensions in parallel.
7. Can we work with Account Executive or Success Manager to Expedite the loads into Marketing Cloud?
Ans. No, Success manager could not expedite your loads into Marketing cloud.
8. Can we monitor files loads at SFMC?
Ans. You can check the file progress in the Automation studio. It will display file status/success/ Errors.
9. How can we view error records in SFMC?
Ans. You will receive an email notification once the import activity is complete with Record counts and attachment in case of errors. Configure the notification to be sent to the Distribution list instead of a single person.
10. Do you suggest partial loads before running full loads?
Ans. It is always advisable to test your file run times and data validations with partial loads(5 percent data) instead of Full loads. Your learnings from partial loads will help you save much more production time(Saving time during Extraction/Load)
Note: There are many factors involved while processing files and data and run times provided here are for guidance purposes only as it could change from organization to organization based on network/server capability/Number of columns in a file/etc.