Sunday, October 5, 2008

Dataset v/s DataReader

In this post, I will be discussing the behavioral characteristics and the performance differences of the DataSet and the DataReader as well as indicate the suitability of use of these objects in various scenarios.

The Dataset is a "disconnected" data store. What this means is that the DataSet object need not maintain a connection with the database at all times, a connection is needed only at the time of fetching data and updating it. The DataSet can be populated with data using something like this ,

SqlConnection conn=new SqlConnection();
conn.ConnectionString="Data Source=.;Database=TempDB;Integrated Security=true;";
SqlDataAdapter da=new SqlDataAdapter("select * from Temp",conn);
DataSet ds=new DataSet();
//Process data in the DataSet ds

As can be seen from the above snippet, once the data has been read into the DataSet, the connection can be closed immediately. The data can still be accessed from within the DataSet.

DataReader, on the other hand, is a "connected" data store which means that there needs to be a connection maintained to the database in order to be able to access the values in the DataReader. The DataReader can be populated with data using something like this ,

SqlConnection conn = new SqlConnection();
conn.ConnectionString = "Data Source=.;Database=TempDB;Integrated Security=true;";
SqlCommand cmd = conn.CreateCommand();
cmd.CommandText = "select * from Temp";
cmd.CommandType = CommandType.Text;
SqlDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
//Process the data read from the DB

Notice here that the connection is closed only after iterating through the entire record set returned by the query.

As can be seen from above, a DataSet, although providing a lot of flexibility in terms of usage scenarios, is memory intensive. Since it is a disconnected data store, it stores all the data read from the database in memory. As can be inferred, this can really slow down the application if the DataSet is populated with millions of rows of data. On the other hand, the DataSet provides "random" access : any record stored in it can be directly accessed and records can be accessed in any order desired. One can also go back and forth through the DataSet records. Another important flexibility with DataSets is that data within it can be modified and the updates will be percolated to the database automatically. Thus, a DataSet is suitable for scenarios in which data update is needed or where random access is needed but should be used cautiously when huge data is being fetched from the database.

As for the DataReader, it is almost an opposite of the DataSet. Since it maintains an open connection to the database at all times, it needn't store data in local memory. Instead records are read in chunks on a need basis. Thus, it proves to be pretty efficient in terms of memory usage. However, this efficiency comes at a cost : records within a DataReader can be traversed only forward and that too, only once. If a record needs to be read a second time, the query needs to be executed again as there is no provision to move backwards through the DataReader. Also the data read through the DataReader is read only. Thus, a DataReader lends itself to scenarios where huge amounts of data are being read without having the need to update them or have random access over that data.


  1. Some other thing you might want to talk about here is that, if you have a table that is being updated by some other process at the databse end, you might as well get the records using the datareader, as the connection is maintained, in the case of a dataset it doesn't happen, as it is disconnected data architecture..

    Correct me if i am wrong.

  2. Hi nuclear reactor. I tried out the scenario you mention in your comment. What I did was to put a breakpoint immediately after the ExecuteReader() statement and when that breakpoint is hit, I updated the table and then ran the program till the end. However, I did not see the changes reflected in the reader. This might be because the reader might read certain number of records into a buffer so if you modify something in the records which have already been read into the buffer, the changes might not be reflected. If however, you modify a record which has not yet been read, it might reflect in the reader for the reason you mentioned : the connection is live and records are read on a need basis

  3. UPDATE : @nuclear : I tried the experiment with 10K records and it did take the updated values