17 packages tagged with “dataprocessing”
Open source bioinformatics and computational biology toolbox written in F#.
BioFSharp.IO contains read/write functions for a diverse set of biological file formats such as Fasta, FastQ, GenBank or GFF, as well as helper functions for searching or transforming the input data.
Core of RT Processing
BatchFlow is a .NET library that takes care of all the plumbing you need around large batch processing, especially the producer/consumer pattern that you (probably) should use.
A versatile pipelining library, created with media organization in mind and meant to be run as a service.
SmartParser is an open-source utility library designed to transform unstructured data into structured, strongly-typed objects using LLMs.
Make your workflow ML ready with BioFSharp.ML. Currently contains helper functions for CNTK and a pre-trained model we used in our publication about predicting peptide observability.
BioFSharp.BioContainers gives you the possibility to leverage containerized applications without leaving your F# environment. We build on the foundation of Docker.DotNet to programmatically access the REST API on top of the Docker daemon. We provide special functions to use with biocontainers, which is a standardized way to create containerized bioinformatics software.
Image recognition and analysis using wavelet transformations
GPU parallelized functions from BioFSharp
API access to powerful, popular bioinformatics databases
VersaTul Pipeline Infrastructure offers a powerful and elegant solution for object processing and transformation. This project fully implements the Pipeline and Filter pattern in a generic and flexible manner, allowing you to efficiently apply a series of filters to convert objects into their desired state. Ideal for scenarios where complex processing sequences are needed, VersaTul Pipeline Infrastructure simplifies and enhances your development workflow with its intuitive design and robust capabilities.
The VersaTul Extensions project provides a variety of methods for manipulating arrays, performing conversions and other common functionalities.
Data preprocessing library for CSV
DuckDB provider for ServiceStack.OrmLite - A fast, simple, and typed ORM for .NET. Enables high-performance analytical queries and data processing with DuckDB's columnar storage engine. NEW: BulkInsertWithDeduplication for massive datasets (845M+ rows)! Type-safe LINQ expressions, staging tables, BulkInsert (10-100x faster), multi-database, async/await. Community-maintained, not officially endorsed by ServiceStack.
WebSvc.UnifiedMessaging.Core is a .NET library for building scalable and extensible unified messaging workflows. It provides interfaces, repositories, and services for outbound message queuing, rate limiting, and quota enforcement. Designed with a pluggable architecture, it supports multiple data sources, delivery channels, and processing pipelines for validation, transformation, and enrichment.