YODA is a small set of data analysis (specifically histogramming) classes
being developed by MCnet members as
a lightweight common system for MC event generator validation analyses,
particularly as the core histogramming system in Rivet.
YODA is a refreshingly clean, natural and powerful way to do histogramming... and
there are plenty of improvements still to come. Our mission is to make the most
powerful, expressive, and focused approach to binned computational data handling,
with the nicest possible balance of power and simplicity in the user interface.
We hope you'll agree it's a good thing, but if not (or even if so) please get in
touch and let us know about your thoughts, problems, and feature requests.
2018-09-24: YODA release 1.7.3
The 1.7 YODA release series sets in place several forward compatibility features with the
eventual YODA 2.0 series, such as an explicit yoda1 Python module, use of YAML for
persistent annotations, and a persistency versioning system (now on v2). YODA now also
supports zipped reading and writing of data files/streams. In addition there have been
many small bugfixes and script/API improvements. YODA 1.7.1 supports multiple error bars
on Scatter points; version 1.7.2 improves yodadiff and enables use of the binAt() method on
const histograms; 1.7.3 fixes various minor bugs and adds more Pythonic accessors to aid
plotting via matplotlib.
See the ChangeLog for details.
We recommend this version of YODA for immediate use. Please let us know of any issues
or improvement suggestions, and we will try to get them into a new version as soon as possible.
Storage of all information needed for statistically correct run combination and
reweighting up to second-order correlations (e.g. variances, std devs, etc.) not just in
the number of entries in a bin, but also the correlations of that with the x and y fill
Separation of statistics and data handling from presentation. YODA is primarily a
library for doing the data part correctly: while we love really high-quality
data presentation, that's a separate goal.
A sensible class hierarchy for histogramming, recognising that a histogram
contains details of fill history beyond the pure visual height of a bin, and that simply
counting weights or binning arbitrary types on an axis are valuable operations.
Flexible data format support, including a new text-based, compact, and human-readable
Proper and convenient treatment of "details" like irregular bin widths, gaps
in contiguous binning, and overflows/underflows/etc. (including how they impact
normalisation and calculation of histo-wide stat quantities)
Carefully designed programming interfaces in C++ and Python. We are very welcoming of
feedback and design evolution, too!
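As a rough illustration of the run-combination feature listed above, the Python sketch below stores weighted moments for a single bin and merges two runs by adding those moments. The class name BinMoments, its field names, and the scaled() helper are hypothetical and chosen for this example only; they are not YODA's actual classes or API.

```python
from dataclasses import dataclass
import math


@dataclass
class BinMoments:
    """Hypothetical per-bin accumulator (illustrative, not YODA's API)."""
    n: int = 0            # number of fill calls
    sum_w: float = 0.0    # sum of weights
    sum_w2: float = 0.0   # sum of squared weights
    sum_wx: float = 0.0   # sum of weight * x
    sum_wx2: float = 0.0  # sum of weight * x^2

    def fill(self, x, w=1.0):
        """Record one weighted fill at position x."""
        self.n += 1
        self.sum_w += w
        self.sum_w2 += w * w
        self.sum_wx += w * x
        self.sum_wx2 += w * x * x

    def __add__(self, other):
        """Combine two runs: every stored moment is simply additive."""
        return BinMoments(
            self.n + other.n,
            self.sum_w + other.sum_w,
            self.sum_w2 + other.sum_w2,
            self.sum_wx + other.sum_wx,
            self.sum_wx2 + other.sum_wx2,
        )

    def scaled(self, factor):
        """Reweight as if every fill weight had been multiplied by factor."""
        return BinMoments(
            self.n,
            self.sum_w * factor,
            self.sum_w2 * factor * factor,
            self.sum_wx * factor,
            self.sum_wx2 * factor,
        )

    @property
    def mean(self):
        """Weighted mean of the filled x values."""
        return self.sum_wx / self.sum_w

    @property
    def variance(self):
        """Weighted (population) variance of the filled x values."""
        return self.sum_wx2 / self.sum_w - self.mean ** 2


# Two independent runs filling the same bin
run_a = BinMoments()
run_a.fill(1.0, 1.0)
run_a.fill(2.0, 0.5)

run_b = BinMoments()
run_b.fill(3.0, 2.0)

# Merging the stored moments ...
combined = run_a + run_b

# ... matches a single run that saw all the fills
single = BinMoments()
for x, w in [(1.0, 1.0), (2.0, 0.5), (3.0, 2.0)]:
    single.fill(x, w)
```

Because only sums are stored, combining runs needs no access to the original events, and a global rescaling of the weights leaves the mean and variance unchanged.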
Several feature areas are planned for extension and redevelopment. Please get in
touch if you can contribute code or design ideas to these:
Plotting: plotting is currently included as a rudimentary matplotlib
interface in yoda.plot, but is mostly performed via the Rivet make-plots
scripts. These will be incorporated into Rivet, and the YODA type interfaces optimised
for plotting via matplotlib, e.g. by adding methods which return exactly the plot-point
arrays required for various mpl structures. For now, this
script may be a helpful starting point for matplotlib plotting of YODA data types.
Generalised binned data types: histograms are currently the only binned
data type, and they can only be assembled through weighted fills. This means that
uses such as reference data (and error) storage need to use Scatters and then manually
match bin edges to scatter errors: yuck. YODA will be redeveloped so that histograms
are built on a generic "binned storage" type, which can contain any sort of object --
and perform higher-level operations such as bin-merging if the stored type supports a
First-class overflows: overflow bins are not currently treated as bins,
since they have no meaningful width and therefore no meaningful "height" measure.
Overflows are also second-class citizens in 2D (and higher) histograms, since a full
indexing scheme is needed for them, in order to combine them with "real" bins when
projecting or profiling in either variable. YODA will be extended with a binning
scheme which always covers the whole real-number space in each variable, making
overflows fully fledged bins and enabling more complex operations in N-dim space.
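The idea of a binning scheme that covers the whole real line can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the planned YODA design: the names FullAxis and Grid2D are hypothetical, and only sums of weights are stored.

```python
import bisect
import math


class FullAxis:
    """Hypothetical axis whose bins tile the whole real line."""

    def __init__(self, edges):
        # Pad the finite edges with -inf and +inf, so that the
        # underflow and overflow regions become ordinary bins.
        self.edges = [-math.inf] + sorted(edges) + [math.inf]

    @property
    def num_bins(self):
        return len(self.edges) - 1

    def index(self, x):
        """Return the index of the bin [edges[i], edges[i+1]) containing x."""
        if math.isnan(x):
            raise ValueError("cannot bin NaN")
        i = bisect.bisect_right(self.edges, x) - 1
        # x == +inf would land one past the end: put it in the last bin
        return min(i, self.num_bins - 1)


class Grid2D:
    """Hypothetical 2D sum-of-weights store built on two full axes."""

    def __init__(self, xedges, yedges):
        self.xaxis = FullAxis(xedges)
        self.yaxis = FullAxis(yedges)
        self.sum_w = [[0.0] * self.yaxis.num_bins
                      for _ in range(self.xaxis.num_bins)]

    def fill(self, x, y, w=1.0):
        self.sum_w[self.xaxis.index(x)][self.yaxis.index(y)] += w

    def project_x(self):
        """Sum over every y bin, including the outermost (overflow) ones."""
        return [sum(row) for row in self.sum_w]


grid = Grid2D(xedges=[0.0, 1.0, 2.0], yedges=[0.0, 1.0])
grid.fill(0.5, 0.5)   # inside the finite range in both x and y
grid.fill(0.5, 7.0)   # y overflow, but still indexed like any other bin
grid.fill(-3.0, 0.5)  # x underflow

projection = grid.project_x()
```

Since every fill lands in an indexed bin, projecting onto x needs no special-casing of overflows. The outermost bins still have infinite width, so a width-normalised "height" remains undefined for them.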