⚠ **Deprecated:** the package namespace was refactored to better match its features.
Suggested alternative: **Parquet.Data.Ado**
A high-performance ADO.NET provider for Parquet files enabling seamless integration with .NET applications. Implements standard DbConnection/DbCommand/DbDataReader patterns for working with Parquet data through familiar ADO.NET abstractions. Features include SQL query support with filtering and projection, parallel reading of row groups, virtual column support, batch processing capabilities, and full async/await compatibility. Ideal for data analytics, ETL operations, and big data processing within the .NET ecosystem.
```
$ dotnet add package Parquet.Data.Reader
```

A .NET library that provides ADO.NET support for Parquet files, enabling seamless integration of Parquet data into .NET applications through familiar ADO.NET abstractions. This library is part of SQLFlow, a data automation framework.
Parquet.Data.Ado bridges the gap between the Parquet file format and .NET applications by implementing standard ADO.NET interfaces. This allows developers to work with Parquet files using the same patterns they use for traditional database access.
```
dotnet add package Parquet.Data.Ado
```
```csharp
// Connect to a Parquet file
using var connection = new ParquetConnection("path/to/file.parquet");
connection.Open();

// Create a command
using var command = connection.CreateCommand();
command.CommandText = "SELECT * FROM data";

// Execute and read the data
using var reader = command.ExecuteReader();
while (reader.Read())
{
    // Access data by column index or name
    var value = reader["column_name"];
    // Process the data...
}
```
```csharp
using var connection = new ParquetConnection("path/to/file.parquet");
connection.Open();

// Create an SQL command
using var sqlCommand = connection.CreateSqlCommand(
    "SELECT column1, column2 FROM data WHERE column3 > 100");

// Execute the query and get a DataTable
var dataTable = await ParquetExtensions.ToDataTableAsync(sqlCommand.ExecuteReader());
```
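The result of `ToDataTableAsync` is a standard `System.Data.DataTable`, so it can be consumed with the usual ADO.NET table API. A minimal sketch of the iteration pattern, using an in-memory stand-in table (the column names `column1`/`column2` are illustrative, not part of the library):

```csharp
using System;
using System.Data;

// Stand-in for the DataTable produced by the query above.
var dataTable = new DataTable("data");
dataTable.Columns.Add("column1", typeof(int));
dataTable.Columns.Add("column2", typeof(string));
dataTable.Rows.Add(1, "a");
dataTable.Rows.Add(2, "b");

// Iterate the materialized rows with the standard DataTable API.
foreach (DataRow row in dataTable.Rows)
{
    int c1 = (int)row["column1"];
    string c2 = (string)row["column2"];
    Console.WriteLine($"{c1}: {c2}");
}
```

Because the data is fully materialized at this point, the `DataTable` can also be bound to UI grids or re-exported without holding the Parquet file open.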
```csharp
// Create a virtual column definition
var virtualColumn = new VirtualColumn("ComputedColumn", typeof(decimal), 0.0m);

// Open a reader with the virtual column
using var reader = ParquetDataReaderFactory.CreateWithVirtualColumns(
    "path/to/file.parquet",
    new[] { virtualColumn });

// Now the reader includes the virtual column
while (reader.Read())
{
    decimal computedValue = reader.GetDecimal(reader.GetOrdinal("ComputedColumn"));
    // Use the virtual column value...
}
```
```csharp
// Create a batch reader for large files
using var batchReader = new ParquetBatchReader("path/to/large_file.parquet");

// Process batches asynchronously
await foreach (var batch in batchReader.ReadAllAsync())
{
    Console.WriteLine($"Processing batch {batch.RowGroupIndex} with {batch.RowCount} rows");

    // Process the batch...
    foreach (var column in batch.Columns)
    {
        // Access column data...
    }
}
```
```csharp
// Create or obtain a DataTable
var dataTable = new DataTable("MyData");
// ... populate the table ...

// Export to Parquet
await dataTable.ExportToParquetAsync("output.parquet");
```
The library includes a SQL parser that supports:

- SELECT statements with column selection
- WHERE clauses with complex conditions
- Comparison operators (`=`, `<>`, `<`, `>`, `<=`, `>=`)
- Logical operators (`AND`, `OR`, `NOT`)
- `LIKE` pattern matching
- `IN`, `BETWEEN`, and `IS NULL` predicates

The GitHub repository includes extensive tests that demonstrate the library's functionality. These tests can be used as examples and adapted to suit your specific requirements.
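The supported predicates can be combined in a single query. A hedged sketch, reusing the `ParquetConnection`/`CreateSqlCommand` API shown earlier and assuming the operators follow standard SQL syntax (the column names `name`, `amount`, and `comment` are illustrative):

```csharp
using var connection = new ParquetConnection("path/to/file.parquet");
connection.Open();

// One WHERE clause exercising several supported predicate forms:
// BETWEEN, IN, LIKE, IS NULL, and the AND/OR/NOT logical operators.
using var sqlCommand = connection.CreateSqlCommand(
    "SELECT name, amount FROM data " +
    "WHERE (amount BETWEEN 10 AND 100 OR amount IN (500, 1000)) " +
    "AND name LIKE 'A%' " +
    "AND NOT (comment IS NULL)");

using var reader = sqlCommand.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine($"{reader["name"]}: {reader["amount"]}");
}
```

Since filtering and projection happen in the reader, only the matching rows and the two selected columns are materialized.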
This library is a component of SQLFlow, a comprehensive Data Automation framework that enables seamless data integration, transformation, and analysis across various data sources.
This project is licensed under the MIT License - see the LICENSE file for details.