Changelog
All notable changes to SecActPy will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.3.1] - 2026-05-14
Fixed
- Docker `with-r` / `gpu-with-r` images: pin `BiRewire 3.40.0` from the Bioconductor 3.21 archive. `BiRewire` was deprecated in Bioc 3.22 and removed in 3.23. The R 4.6 base image now ships Bioc 3.23, so `BiocManager::install('BiRewire')` returned `package 'BiRewire' is not available for Bioconductor version '3.23'` and the R-image builds aborted at the verification step. `SpaCET 1.4.0` still hard-imports `BiRewire` (`Imports: BiRewire`), so dropping it isn't an option. The Dockerfile now installs `BiRewire 3.40.0` (last release, 2025-04-15) from `bioconductor.org/packages/3.21/bioc/src/contrib/` via `remotes::install_url()`, with CRAN deps (`igraph`, `slam`, `Rtsne`, `Matrix`) resolving from RSPM. `BiRewire` is no longer in the Bioc 3.23 bulk-install list. Verified end-to-end (`BiRewire 3.40.0 OK` → `All 38 required R packages verified OK` → `SpaCET 1.4.0 OK`) on the amd64 and arm64 R variants.
Notes
- Python package contents are unchanged from v0.3.0. The patch is Dockerfile-only: the `with-r` and `gpu-with-r` image builds were broken on v0.3.0 because of the upstream Bioconductor deprecation. CPU/GPU Python-only images and the PyPI wheel are unaffected.
[0.3.0] - 2026-05-14
Added
- `cuda_native` ridge backend: ctypes wrapper around RidgeCuda's compiled CUDA kernel (`libridgecuda_native.so`, vendored at `secactpy/_libs/`). A single `cudaLaunchKernel` covers the full permutation sweep, making it ~14× faster at small `m` than the per-iteration CuPy path. `resolve_backend('auto')` prefers `cuda_native > cupy > numpy` when the library is present.
- Real sparse path via `cusparseSpMM` (`ridge_cuda_sparse`), with no host densify and no CuPy fallback. In-kernel column normalization (`β = (β_raw − c⊗μ)/σ`) means `col_center=True, col_scale=True` is handled in C. Older `.so` builds without the sparse symbol fall back to CuPy automatically.
- C-side `build_inv_perm_table_srand()` for `rng_method='srand'`. It is byte-identical to the Python `CStdlibRNG` builder and ~200× faster (<50 ms vs ~11 s at `n=8141, n_rand=1000`). End-to-end, `cuda_native` is 5.6× faster than `cupy` on the GSE100093 fixture while staying bit-equivalent on β/SE/z/p.
- Dash web app (`secactpy-app` CLI, optional `[app]` extra):
  - Spatial tab: Visium / CosMx / Xenium upload, SecAct inference, and visualization via `secactpy.visualization` and `spatial-gpu` I/O.
  - Single-cell tab: pseudo-bulk inference and activity plots.
  - Bulk tab: activity change and Kaplan-Meier cohort survival.
  - Cache eviction on each new upload, button disable during inference, and temp-file cleanup after upload reads.
- `secactpy.visualization` (optional `[viz]` extra): nine SecAct-specific plotly functions (`activity_distribution`, `celltype_activity_boxplot`, `activity_correlation`, `gene_expression_stats`, `celltype_expression_boxplot`, `celltype_distribution`, `spatial_density`, `activity_change_bar`, `risk_lollipop`). Re-exported from the package root.
- `secactpy.downstream` (optional `[downstream]` extra): post-inference analyses mirroring `RSecAct/R/downstream.R`.
  - `coxph_regression`: delegates to `spatial-gpu`'s `secact_coxph_regression` when available, with a standalone `lifelines` fallback otherwise.
  - `signaling_pattern` / `signaling_pattern_gene`: NMF pattern discovery. Delegates to `spatial-gpu` or falls back to sklearn NMF. In the standalone path, a KDTree replaces the `O(n²)` `cdist`.
  - `ccc_scrnaseq`: bulk scRNA-seq cell-cell communication (SecActPy-unique, no spatial-gpu equivalent).
  - `ccc_spatial`: thin wrapper around `spatial-gpu`'s `secact_spatial_ccc` (spatial-gpu required, no standalone fallback).
- `logistic_regression` and `logit` (from `secactpy.glm`) are now exported from the package root.
- Bulk vignette replication examples: `bulkChange` and `bulkCohort`.
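The in-kernel column normalization above rests on a simple identity. Let μ and σ be the column means and standard deviations of Y. Then `T @ ((Y − 1μᵀ)/σ) = (T @ Y − c⊗μ)/σ`, where `c = T @ 1` is the vector of row sums of T. Because of this identity, centering and scaling can be applied to the small output instead of to a densified Y. The SciPy sketch below checks the identity on toy data. Several details are assumptions for illustration, not a description of the CUDA kernel itself: the helper `normalized_product`, the population (`ddof=0`) standard deviation, and the reading of `c` as the row sums of T.

```python
import numpy as np
import scipy.sparse as sp


def normalized_product(T, Y, col_center=True, col_scale=True):
    """Hypothetical helper (not SecActPy API): return T @ Z, where Z is the
    sparse matrix Y with its columns optionally centered and/or scaled,
    computed without ever densifying Y."""
    n = Y.shape[0]
    # Sparse-friendly product: (Y.T @ T.T).T == T @ Y; the result is dense (k, n_cols).
    beta = np.asarray((Y.T @ T.T).T, dtype=float)
    # Column statistics straight from the sparse matrix (population std, ddof=0).
    mu = np.asarray(Y.sum(axis=0)).ravel() / n
    sq_mean = np.asarray(Y.multiply(Y).sum(axis=0)).ravel() / n
    sigma = np.sqrt(np.maximum(sq_mean - mu**2, 0.0))
    sigma[sigma == 0] = 1.0            # leave constant columns unscaled
    if col_center:
        c = T.sum(axis=1)              # assumed c = T @ 1, one entry per row of T
        beta -= np.outer(c, mu)        # the c ⊗ μ correction
    if col_scale:
        beta /= sigma                  # broadcasts over the columns
    return beta


rng = np.random.default_rng(0)
dense = rng.random((50, 8))
dense[dense < 0.6] = 0.0               # roughly 60% zeros
Y = sp.csr_matrix(dense)
T = rng.standard_normal((4, 50))

Z = (dense - dense.mean(axis=0)) / dense.std(axis=0)
print(np.allclose(normalized_product(T, Y), T @ Z))  # True
```

The two flags apply independently: centering only subtracts the rank-1 term, and scaling only divides each output column by its σ.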
Changed
- Docker / R stack: replaced the legacy `beibeiru/RidgeR` (now archived) with the optional accelerators `data2intelligence/RidgeFast` (CPU, cross-platform) and `data2intelligence/RidgeCuda` (GPU, Linux + NVIDIA only).
  - `INSTALL_R=true` now installs SecAct + RidgeFast by default. The `with-r` image gains RidgeFast, and the `gpu-with-r` image gains RidgeFast + RidgeCuda. The legacy RidgeR is no longer installed in any image.
  - New build args `INSTALL_RIDGEFAST` and `INSTALL_RIDGECUDA` (default `auto`) let users force-disable the accelerators to test SecAct's pure-R fallback.
  - `apptainer/build_sif.sh` verification now checks for `RidgeFast` (cpu-r) or `RidgeFast` + `RidgeCuda` (gpu-r) instead of `RidgeR`.
- Documentation: added a native R install matrix for Linux / macOS / Windows in Installation.
- Docstrings across `secactpy/` updated from "RidgeR" to "R SecAct" / "RidgeFast" where appropriate. Behavior is unchanged.
- New build arg `CUPY_PACKAGE` decouples the CuPy version from the hardcoded `cupy-cuda11x`. A future CUDA base-image bump to 12.x then only needs `--build-arg CUPY_PACKAGE=cupy-cuda12x`.
- `rng_method` default flipped from `None` (`use_gsl_rng=True` → `CStdlibRNG`) to an explicit `'srand'` across all ridge entry points: `ridge`, `ridge_with_precomputed_T`, the four high-level inference functions, `ridge_batch`, and the three streaming entry points. This is behaviorally equivalent by default and matches RidgeFast / RidgeCuda. `use_gsl_rng` is kept as a legacy fallback when `rng_method=None` is passed explicitly.
- `ridge` NumPy backend now auto-picks Y-row vs T-column permutation by comparing `m` and `p`. When `m < p`, it permutes Y rows (`β = T @ Y[fwd_perm[i], :]`); otherwise it permutes T columns (existing path). This gives a 3.3× speedup on the GSE100093 fixture (`m=17, p=1248`). The operand-order difference moves cross-implementation drift from bit-identical to ulp-level (still within `1e-10` tolerance).
- `ridge` CuPy backend: the perm-table H2D copy is hoisted out of the permutation loop, and the per-batch `mempool.free_all_blocks()` is dropped (final cleanup only). Output is bit-identical.
- Streaming inference: pass 2 (cross-term) and pass 3 (inference) now share a single H5AD read (3 reads → 2). `normalize_chunk` and `accumulate` are vectorized, and H5AD string decoding uses `np.char.decode`.
- `_free_gpu_memory()` and `_format_ridge_results()` helpers consolidate 15 duplicated blocks across ridge/batch/streaming.
- `resolve_backend()` extracted in `ridge.py` (dedupes 3 inline blocks); `_validate_batch_inputs()` in `batch.py`; `_load_sig_matrix()` in `inference.py`; `_get_h5_index()` and `_read_h5_sparse_matrix()` in `cli.py`. Type hints modernized (`List`/`Tuple` → `list`/`tuple`).
- `rng.py` perm-table cache moved from a hardcoded path to the XDG-compliant `~/.cache/secactpy/`.
- CI: GitHub Actions bumped to Node 24-compatible majors (`actions/checkout` v6, `setup-python` v6, `upload-artifact` v7, `download-artifact` v8, `docker/build-push-action` v7, `docker/login-action` v4, `docker/setup-buildx-action` v4, `peter-evans/dockerhub-description` v5).
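The Y-row / T-column switch is possible because permuting the rows of Y is algebraically the same as applying the inverse permutation to the columns of T. The backend can therefore gather whichever operand is cheaper. The NumPy sketch below checks that equivalence on made-up shapes. `fwd_perm` mirrors the name in the entry above; `inv_perm` and the shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sig, n_genes, n_samples = 5, 30, 7       # T: (n_sig, n_genes), Y: (n_genes, n_samples)
T = rng.standard_normal((n_sig, n_genes))
Y = rng.standard_normal((n_genes, n_samples))

fwd_perm = rng.permutation(n_genes)
inv_perm = np.argsort(fwd_perm)            # inverse permutation: inv_perm[fwd_perm[k]] == k

beta_y_rows = T @ Y[fwd_perm, :]           # permute the rows of Y
beta_t_cols = T[:, inv_perm] @ Y           # or permute the columns of T instead

print(np.allclose(beta_y_rows, beta_t_cols))  # True (equal up to floating-point rounding)
```

The fixture shape suggests, though this is an inference, that `m` counts the columns of Y and `p` the rows of T. Under that reading, gathering Y rows copies `n_genes × m` values per permutation versus `p × n_genes` for T columns, which is why `m < p` favors the Y-row path. The different summation order is also what produces the ulp-level drift mentioned above.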
Fixed
- Dockerfile: added `libuv1-dev` so the R `fs` package builds from source (uncovered by the `cpu-with-r` build). Without it, `fs` and six transitive CRAN deps (`networkD3`, `scatterpie`, `shiny`, `plotly`, `DT`, `factoextra`) failed to install.
- `H5ADChunkReader`: `read_obs_names()` / `read_var_names()` handle H5AD files where the index column name is stored in `obs.attrs['_index']` (common in large consortium datasets like the Inflammation Atlas). Added `"symbol"` to the gene-column fallback list. Negative categorical codes are handled in `read_var_column()`, and categorical reconstruction is vectorized in `read_obs_column()`.
- Streaming: replaced `Y_chunk.T @ row_means` with `row_means @ Y_chunk` to avoid an unnecessary CSC→CSR copy on every chunk.
- `secactpy.visualization.activity_correlation`: the first-subplot annotation now uses `x domain` instead of `x1 domain` for the correct xref.
- Dash app: temp files from uploaded payloads are now cleaned up after the read, and the spatial callback no longer imports the unused `UI_COLORS`.
- `secactpy.glm`: renamed the Fisher information matrix `I` / `I_inv` → `info` / `info_inv` to avoid the visually ambiguous `I` (ruff E741) and to document intent explicitly.
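The streaming fix swaps two expressions that produce the same vector; only the sparse layout the product runs against changes. The small SciPy sketch below uses a hypothetical 6×4 CSR chunk to check numerical equivalence. It does not measure the copy that the fix avoids.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(7)
dense = rng.random((6, 4))
dense[dense < 0.5] = 0.0
Y_chunk = sp.csr_matrix(dense)   # hypothetical chunk: 6 rows x 4 columns, CSR layout
row_means = rng.random(6)        # one value per row of the chunk

old = Y_chunk.T @ row_means      # product against the transposed (CSC) view
new = row_means @ Y_chunk        # vector-matrix product against the CSR chunk itself

print(np.allclose(old, new))     # True
```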
Migration notes
- Reference H5AD files under `dataset/output/signature/*` were generated with legacy RidgeR. They remain numerically valid (RidgeFast matches RidgeR to better than `2e-14`). To fully switch the source of truth, re-run `sbatch scripts/regenerate_r_reference.sh` against the new image once it's published. The script now installs RidgeFast automatically.
- The R reference fixture under `tests/` was regenerated on Biowulf (R 4.5.2, glibc 2.28). The previous fixture was generated on a different platform with a different `rand()` implementation, which caused SE/zscore/pvalue mismatches on test machines. All 37 tests now pass with machine-precision agreement (SE max diff `9.99e-17`).
0.2.5 - 2026-02-26
Added
- Streaming H5AD processing for datasets with >5M cells that exceed available RAM. A two-pass chunk-reading algorithm uses h5py to read CSR rows from the H5AD file without loading the full matrix. Pass 1 accumulates row/column statistics; pass 2 performs inference in chunks. Peak memory drops from ~200 GB to ~3 GB for 5M-cell datasets. Results are numerically identical to the non-streaming path.
- `streaming=True` and `streaming_chunk_size=50_000` parameters on `secact_activity_inference_scrnaseq()` and `secact_activity_inference_st()`
- New `H5ADChunkReader` class for memory-efficient H5AD chunk reading
- New `ridge_batch_streaming()` orchestrator for two-pass streaming inference
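The two-pass structure can be illustrated with plain SciPy. Pass 1 walks the row chunks once and accumulates per-column sums and sums of squares, so the global means and standard deviations are known before any chunk is processed in pass 2. This is a minimal sketch with several simplifications. It assumes the matrix is already in memory as CSR and is sliced into row blocks; the real reader pulls CSR rows from the H5AD file through h5py, which is not shown. It uses a plain projection `z @ W` as a stand-in for the inference step. `iter_row_chunks` and `two_pass_standardize_project` are hypothetical names.

```python
import numpy as np
import scipy.sparse as sp


def iter_row_chunks(X, chunk_size):
    """Yield consecutive row blocks of a CSR matrix (stand-in for an H5AD reader)."""
    for start in range(0, X.shape[0], chunk_size):
        yield X[start:start + chunk_size]


def two_pass_standardize_project(X, W, chunk_size):
    n_rows, n_cols = X.shape
    # Pass 1: accumulate column sums and sums of squares, one chunk at a time.
    col_sum = np.zeros(n_cols)
    col_sq = np.zeros(n_cols)
    for chunk in iter_row_chunks(X, chunk_size):
        col_sum += np.asarray(chunk.sum(axis=0)).ravel()
        col_sq += np.asarray(chunk.multiply(chunk).sum(axis=0)).ravel()
    mean = col_sum / n_rows
    std = np.sqrt(np.maximum(col_sq / n_rows - mean**2, 0.0))
    std[std == 0] = 1.0
    # Pass 2: standardize each chunk with the global statistics, then project it.
    outputs = []
    for chunk in iter_row_chunks(X, chunk_size):
        z = (chunk.toarray() - mean) / std   # only one chunk is dense at a time
        outputs.append(z @ W)
    return np.vstack(outputs)


rng = np.random.default_rng(3)
dense = rng.random((103, 10))
dense[dense < 0.7] = 0.0
X = sp.csr_matrix(dense)
W = rng.standard_normal((10, 3))

streamed = two_pass_standardize_project(X, W, chunk_size=25)
full = ((dense - dense.mean(axis=0)) / dense.std(axis=0)) @ W
print(np.allclose(streamed, full))  # True
```

Peak memory scales with `chunk_size` rather than with the number of rows, which is the property the streaming path relies on.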
Fixed
- `H5ADChunkReader.read_obs_names()` and `read_var_names()` now handle H5AD files where the index column name is stored in `obs.attrs['_index']` (e.g., `'cellID'`) rather than as a literal `_index` dataset. This is common in large consortium datasets like the Inflammation Atlas.
0.2.4 - 2026-02-20
Added
- `col_center` and `col_scale` parameters for independent control of sparse in-flight column normalization in `ridge_batch()`.
0.2.3 - 2026-02-15
Added
- `rng_method` parameter for explicit RNG backend selection (`'srand'`, `'gsl'`, `'numpy'`) on all high-level inference functions.
- `is_group_sig=True` as the default (previously `False`).
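`'srand'` refers to the C standard library's `srand()` / `rand()` pair, whose output sequence depends on the C library. This is the platform dependence behind the fixture mismatches described in the 0.3.0 migration notes. For reference, below is a pure-Python sketch of glibc's default `rand()`, the TYPE_3 additive-feedback generator. It models glibc's algorithm; it is not SecActPy's `CStdlibRNG` code, and whether that class follows it step for step is an assumption. The sketch only covers seeds in `0 <= seed < 2**31 - 1`.

```python
from collections import deque


class GlibcRand:
    """Illustrative pure-Python model of glibc's default rand() (TYPE_3,
    additive feedback). Not SecActPy code; valid for 0 <= seed < 2**31 - 1."""

    def __init__(self, seed: int = 1):
        seed = seed or 1                        # glibc maps seed 0 to 1
        r = [seed]
        for i in range(1, 31):
            # r[i] = 16807 * r[i-1] mod (2**31 - 1); glibc computes this via Schrage's method
            r.append((16807 * r[i - 1]) % 2147483647)
        r.extend(r[0:3])                        # r[31..33] = r[0..2]
        self._state = deque(r, maxlen=34)       # only the last 34 values are ever needed
        for _ in range(310):                    # glibc discards the first 310 outputs
            self._step()

    def _step(self) -> int:
        # r[i] = (r[i-31] + r[i-3]) mod 2**32
        value = (self._state[-31] + self._state[-3]) & 0xFFFFFFFF
        self._state.append(value)
        return value

    def rand(self) -> int:
        return self._step() >> 1                # 31-bit result (glibc RAND_MAX == 2**31 - 1)


gen = GlibcRand(1)
print([gen.rand() for _ in range(3)])  # [1804289383, 846930886, 1681692777]
```

Other C libraries (for example macOS or Windows CRTs) use different `rand()` algorithms. An RNG pinned to one of them must therefore be emulated in software, as here, to stay reproducible across platforms.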
0.2.2 - 2026-02-08
Added
- `sparse_mode=True` parameter in `ridge()`, `ridge_batch()`, and all high-level inference functions for memory-efficient processing of sparse Y matrices. It uses `(Y.T @ T.T).T` to compute `T @ Y` without densifying Y, with column z-scoring applied as corrections on the small output matrix.
- End-to-end sparse pipeline in `secact_activity_inference_scrnaseq()` and `secact_activity_inference_st()`: when `sparse_mode=True`, CPM normalization and the log2 transform are applied directly on sparse matrices (both are zero-preserving), bypassing the dense `secact_activity_inference()` path.
- `row_center=True` parameter in `ridge_batch()` for in-flight row-mean centering without densifying Y. It computes row-centered column statistics from sparse Y analytically and applies a `T @ row_means` correction per permutation.
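The three sparse-mode entries rely on three small facts:

- Per-column CPM scaling and a `log2(x + 1)` transform both map zero to zero, so they can act on stored values only.
- `(Y.T @ T.T).T` equals `T @ Y` while keeping the sparse operand on the left of the product.
- Row centering folds into the output as `T @ (Y − r·1ᵀ) = T @ Y − (T @ r)·1ᵀ`, where `r` holds the row means.

The SciPy sketch below checks all three on toy counts. The `log2(x + 1)` form, the per-column CPM, and the helper names are assumptions for illustration, not SecActPy's internals.

```python
import numpy as np
import scipy.sparse as sp


def cpm_log2_sparse(counts):
    """Hypothetical helper: per-column CPM, then log2(x + 1), without densifying."""
    counts = sp.csc_matrix(counts, dtype=float)
    lib_size = np.asarray(counts.sum(axis=0)).ravel()
    cpm = sp.csc_matrix(counts @ sp.diags(1e6 / lib_size))  # column scaling keeps zeros at zero
    cpm.data = np.log2(cpm.data + 1.0)                        # log2(0 + 1) == 0: only stored entries change
    return cpm


def sparse_T_at_Y(T, Y):
    """T @ Y with the sparse operand on the left: (Y.T @ T.T).T."""
    return np.asarray((Y.T @ T.T).T)


def row_centered_product(T, Y):
    """T @ (Y - r 1^T) for row means r: sparse product plus a rank-1 correction."""
    r = np.asarray(Y.mean(axis=1)).ravel()   # row means of the sparse matrix
    correction = T @ r                       # small vector, one entry per row of T
    return sparse_T_at_Y(T, Y) - correction[:, None]


rng = np.random.default_rng(11)
counts = rng.poisson(0.8, size=(40, 6))      # genes x samples, many zeros
Y = cpm_log2_sparse(counts)
T = rng.standard_normal((3, 40))

dense = Y.toarray()
print(np.allclose(sparse_T_at_Y(T, Y), T @ dense))  # True
print(np.allclose(row_centered_product(T, Y),
                  T @ (dense - dense.mean(axis=1, keepdims=True))))  # True
```

Only `T @ r`, a vector with one entry per signature row, is subtracted, so no dense, centered copy of Y is ever formed.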
Fixed
- `from .ridge import ridge_batch` in `inference.py`: `ridge_batch` is defined in `batch.py`, not `ridge.py`. This caused an `ImportError` when calling `secact_activity_inference()` or `secact_activity_inference_st()` with `batch_size` set.
0.2.1 - 2026-02-08
Added
- Streaming output (`output_path`, `output_compression`) in all high-level inference functions: `secact_activity_inference()`, `secact_activity_inference_scrnaseq()`, and `secact_activity_inference_st()`
- `use_gsl_rng` parameter in `ridge_batch()`: enables the ~70x faster NumPy RNG path for batch processing (previously hardcoded to the GSL RNG)
Fixed
- `use_gsl_rng` was accepted by `secact_activity_inference` but silently ignored by `ridge_batch`, which always used the slower GSL RNG. `ridge_batch` now respects the flag in both the dense and sparse paths.
Changed
- Expanded README batch processing documentation: explains what batch processing is, in-memory vs streaming modes, and dense vs sparse handling, and includes downloadable example data
0.2.0 - 2025-01-06
Changed
- Official Release: Migrated to `data2intelligence`
- PyPI Package: Now available via `pip install secactpy`
- Updated all documentation and URLs to point to the official repository
- Docker images now published to `psychemistz/secactpy`
Added
- Comprehensive CI/CD pipeline with GitHub Actions
- Automated PyPI publishing on releases
- Automated Docker image builds (CPU, GPU, with-R variants)
- Enhanced test suite covering all major functionality
0.1.2 - 2024-12-XX
Added
- Ridge regression with permutation-based significance testing
- GPU acceleration via CuPy backend (9-34x speedup)
- Batch processing with streaming H5AD output for million-sample datasets
- Automatic sparse matrix handling in `ridge_batch()`
- Built-in SecAct and CytoSig signature matrices
- GSL-compatible RNG for R/RidgeR reproducibility
- Support for Bulk RNA-seq, scRNA-seq, and Spatial Transcriptomics
- Cell type resolution for ST data (`cell_type_col`, `is_spot_level`)
- Optional permutation table caching (`use_cache`)
- Command-line interface for common workflows
- Docker support with CPU, GPU, and R variants
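The core idea behind "ridge regression with permutation-based significance testing" fits in a few lines of NumPy. The coefficients are fitted once as β = T·Y with T = (XᵀX + λI)⁻¹Xᵀ. The rows (genes) of Y are then shuffled many times to build a null distribution for every coefficient. This is a generic sketch, not SecActPy's `ridge()`. The λ default, the permutation count, the plain NumPy RNG, and the SE / z / p conventions used here are all assumptions, and the output is not claimed to match the R-compatible results.

```python
import numpy as np


def ridge_permutation_test(X, Y, lam=1.0, n_rand=200, seed=0):
    """Illustrative sketch: ridge coefficients with a gene-permutation null.

    X: (n_genes, n_features) signature matrix; Y: (n_genes, n_samples) expression.
    """
    n_genes, n_features = X.shape
    # T maps Y to coefficients: beta = T @ Y, with T = (X^T X + lam I)^-1 X^T.
    T = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T)
    beta = T @ Y

    rng = np.random.default_rng(seed)
    null_sum = np.zeros_like(beta)
    null_sq = np.zeros_like(beta)
    exceed = np.zeros_like(beta)
    for _ in range(n_rand):
        beta_rand = T @ Y[rng.permutation(n_genes), :]   # shuffle genes, reuse T
        null_sum += beta_rand
        null_sq += beta_rand ** 2
        exceed += np.abs(beta_rand) >= np.abs(beta)      # two-sided exceedance count

    null_mean = null_sum / n_rand
    se = np.sqrt(np.maximum(null_sq / n_rand - null_mean ** 2, 0.0))
    z = (beta - null_mean) / se
    p = (exceed + 1) / (n_rand + 1)                      # add-one permutation p-value
    return beta, se, z, p


rng = np.random.default_rng(5)
X = rng.standard_normal((100, 4))
true_beta = np.array([[2.0], [0.0], [0.0], [-1.5]])
Y = X @ true_beta + 0.1 * rng.standard_normal((100, 1))
beta, se, z, p = ridge_permutation_test(X, Y)
print(beta.shape, p.shape)  # (4, 1) (4, 1)
```

T is computed once and reused for every permutation, so each permutation costs a single matrix product. This reuse is what the GPU and batch backends listed above accelerate.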
Features
- High-Level API:
  - `secact_activity_inference()` - Bulk RNA-seq inference
  - `secact_activity_inference_st()` - Spatial transcriptomics inference
  - `secact_activity_inference_scrnaseq()` - scRNA-seq inference
- Core API:
  - `ridge()` - Single-call ridge regression
  - `ridge_batch()` - Batch processing for large datasets
  - `load_signature()` - Load built-in signature matrices
- Performance:
  - GPU acceleration achieving 9-34x speedup
  - Memory-efficient sparse matrix processing
  - Streaming output for very large datasets
- Compatibility:
  - Produces identical results to R SecAct/RidgeR
  - GSL-compatible random number generator
  - Cross-platform support (Linux, macOS, Windows)