Introduction:
Its a common scenario when you have a huge flat file in GB containing millions of records and you want to split it into small files. There are different ways of doing the same. I will demonstrate you how you can spit the file by using SSIS.Method-1: By using For each loop, Conditional split and Row Number
Control Flow
As you can see there are two Data Flow Tasks, one for loop container and two flat file connection managers are present.
Data Flow Task 1: DFT To get Source Count
This data flow is just to get the record count of the Big flat file. It has a Flat file source that reads data from the Big flat file and counts the row by using a row count transformation.
Variables
These are 4 variables:
- Counter: Used as the Counter in for loop
- endCounter: Upper bound of the counter in for loop
- reCount: Store the record count of the Big flat file
- splitCount: Count of record to be present in each split files
For Loop Container
This the condition placed inside for loop container.
Data Flow Task 2: Data File Splitter
As you can see it fetches the data from the Big flat file and passes it through a Script Component which attach a row number to each of the records.
This is the code present inside the Script component:
Once the row number is attached a conditional split transformation spits the file according to the row number. Following is the condition present inside the conditional split transformation:
After the file get spitted for the first cut it is being saved by a flat file destination. For each cut the flat file connection string changes dynamically and the flat file is saved with a different name. Following is the expression written inside the flat file destination for the Connection string property.
Pro
Simple easy to develop.
Con
Takes a lot of time.
Method-2: By using Script Component
Control Flow
Here as you can see there is only one data flow and inside that one flat file source is present. This flat file source fetches all the records from the file and passes it to a script component. The Script component splits the file into multiple flat files. Following code is present inside that:
#region Namespaces
using System;
using System.Data;
using
Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.IO;
#endregion
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain :
UserComponent
{
string
copiedAddressFile;
private StreamWriter textWriter;
private string columnDelimiter = ",";
private string filepath = @"C:\Work
Area\Innovation\Flatfile spit on a fly\test.txt";
private string[] columns;
public override void
PreExecute()
{
IDTSInput100 input = ComponentMetaData.InputCollection[0];
columns = new string[input.InputColumnCollection.Count];
for (int i = 0; i
< input.InputColumnCollection.Count; i++)
columns[i] = input.InputColumnCollection[i].Name;
}
public override void
PostExecute()
{
}
public
override void
AcquireConnections(object Transaction)
{
}
public override void
ProcessInput(int InputID,
Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer
buffer)
{
while (buffer.NextRow())
{
if (buffer[columns.Length-1].ToString() == "1")
{
filepath = @"C:\Work Area\Innovation\Flatfile
spit on a fly\test" + Convert.ToString(Convert.ToInt32(buffer[columns.Length - 1]) - 1) +
".txt";
textWriter = new StreamWriter(filepath,
false);
for (int i = 0;
i < columns.Length; i++)
{
textWriter.Write(columns[i]);
if (i != columns.Length - 1)
textWriter.Write(columnDelimiter);
}
textWriter.WriteLine();
}
if (Convert.ToInt32(buffer[columns.Length
- 1]) % 2000000 == 1)
{
textWriter.Close();
filepath = @"C:\Work Area\Innovation\Flatfile
spit on a fly\test" + Convert.ToString(Convert.ToInt32(buffer[columns.Length - 1]) - 1) +
".txt";
textWriter = new StreamWriter(filepath,
false);
for (int i = 0;
i < columns.Length; i++)
{
textWriter.Write(columns[i]);
if (i != columns.Length - 1)
textWriter.Write(columnDelimiter);
}
textWriter.WriteLine();
}
for (int j = 0;
j < columns.Length; j++)
{
textWriter.Write(buffer[j]);
if (j != columns.Length-1)
textWriter.Write(columnDelimiter);
else
textWriter.WriteLine();
}
if (buffer[columns.Length - 1].ToString() == "10000000000000")
textWriter.Close();
}
}
}
Pros
Faster to split the file.
Hi you are doing an excellent job. I was looking for this information. I discovered it on your web page it’s really awesome. I like that information. I am sure that these are your own views. They are really awesome.
ReplyDeleteProperty in Nehru Nagar Bhopal
Thanks Bhopal R.
DeleteGood POC Shakti. It is really helpful.
ReplyDeleteThanks Vijay..
DeleteThis comment has been removed by the author.
Deletedo you have this sample package available for learning
ReplyDeletePlease refer : http://radhajyotsna0.blogspot.in/2014/09/execute-sql-task.html
DeleteHi Shakti
DeletePlease send me the sample of above solutions. It would be a great help to me
Here you go..
Deletehttps://drive.google.com/file/d/0BzfpMhz6kh76cXI4NVpGdUVLMzg/view?usp=sharing
Hi Shakti,
DeleteThanks for the reply.But package given by you is not opening.
Thanks,Prabhat
Are you using SSIS 2012 version as it is created in SSIS 2012? I can able to open it using SSDT 2012.
DeleteI am using SQL 2008 R2
DeleteIn that case change the C# code present above to .Net and use it inside a script component of SSIS 2008 R2 package. That should work.
DeleteThank you so much for providing information about SSIS and other utilities.I really feel that their is a need to utilise more of its aspects.
ReplyDeleteSSIS Upsert
That's just what I need. I like to understand Method 2. What does it do ? How many files will it split into ? Is it based on size of the file of the number of rows ?
ReplyDeleteHello, Thanks for the post. We should add 1 to the endCounter variable expression so that the extraction will have batch file for the remaining files. For example, if we have total records 105,000 & split count 50,000 we will have only 2 batch files with the expression. If we add 1 in the expression, the 3rd batch file will be created with 5000 records.
ReplyDelete