YODA is hosted by Hepforge, IPPP Durham

YODA — Yet more Objects for Data Analysis

small, mean and full of Jedi magic

YODA is a small set of data analysis (specifically histogramming) classes being developed by MCnet members as a lightweight common system for MC event generator validation analyses, particularly as the core histogramming system in Rivet.

YODA is a refreshingly clean, natural and powerful way to do histogramming... and there are plenty of improvements still to come. Our mission is to provide the most powerful, expressive, and focused approach to binned computational data handling, with the nicest possible balance of power and simplicity in the user interface. We hope you'll agree it's a good thing, but whether or not you do, please get in touch and let us know your thoughts, problems, and feature requests.

2018-08-14: YODA release 1.7.1

The 1.7 YODA release series sets in place several forward compatibility features with the eventual YODA 2.0 series, such as an explicit yoda1 Python module, use of YAML for persistent annotations, and a persistency versioning system (now on v2). YODA now also supports zipped reading and writing of data files/streams. In addition there have been many small bugfixes and script/API improvements. YODA 1.7.1 supports multiple error bars on Scatter points. See the ChangeLog for details.

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a new version as soon as possible.

Get YODA now!

Features

The key features of YODA are as follows:

  • Storage of all the information needed for statistically correct run combination and reweighting, up to second-order correlations (e.g. variances, standard deviations, etc.): not just the number of entries in a bin, but also its correlations with the x and y fill values.
  • Separation of statistics and data handling from presentation. YODA is primarily a library for doing the data part correctly: while we love really high quality data presentation, that's a separate goal.
  • A sensible class hierarchy for histogramming, recognising that a histogram contains details of fill history beyond the pure visual height of a bin, and that just counting weights, or binning arbitrary types on an axis are valuable operations.
  • Flexible data format support, including a new text-based, compact, and human-readable YODA format.
  • Proper and convenient treatment of "details" like irregular bin widths, gaps in contiguous binning, and overflows/underflows/etc. (including how they impact normalisation and the calculation of histogram-wide statistical quantities).
  • Carefully designed programming interfaces in C++ and Python. We are very welcoming of feedback and design evolution, too!
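
The first feature above, exact run combination and reweighting, can be illustrated with a small standalone sketch. It is not the YODA implementation: the class name WeightedMoments and its methods are hypothetical, and it tracks only one fill variable. The idea is that a bin which stores plain sums of weights can be merged or rescaled later without loss.

```python
from dataclasses import dataclass


@dataclass
class WeightedMoments:
    """Running weighted sums for one bin (hypothetical, not a YODA class)."""
    n: int = 0            # number of fill calls
    sum_w: float = 0.0    # sum of weights
    sum_w2: float = 0.0   # sum of squared weights
    sum_wx: float = 0.0   # sum of weight * x
    sum_wx2: float = 0.0  # sum of weight * x^2

    def fill(self, x, weight=1.0):
        self.n += 1
        self.sum_w += weight
        self.sum_w2 += weight * weight
        self.sum_wx += weight * x
        self.sum_wx2 += weight * x * x

    def __add__(self, other):
        # Combining runs is exact because every stored quantity is a plain sum.
        return WeightedMoments(
            self.n + other.n,
            self.sum_w + other.sum_w,
            self.sum_w2 + other.sum_w2,
            self.sum_wx + other.sum_wx,
            self.sum_wx2 + other.sum_wx2,
        )

    def scaled(self, factor):
        # Reweighting every fill by a constant factor: w -> factor * w.
        return WeightedMoments(
            self.n,
            factor * self.sum_w,
            factor * factor * self.sum_w2,
            factor * self.sum_wx,
            factor * self.sum_wx2,
        )

    def mean(self):
        # Weighted mean of x; assumes sum_w is non-zero.
        return self.sum_wx / self.sum_w

    def variance(self):
        # Weighted (population) variance of x; assumes sum_w is non-zero.
        return self.sum_wx2 / self.sum_w - self.mean() ** 2


# Two separate "runs" filling the same bin...
run_a = WeightedMoments()
for x in (1.0, 2.0):
    run_a.fill(x)

run_b = WeightedMoments()
for x in (3.0, 4.0):
    run_b.fill(x, weight=2.0)

combined = run_a + run_b

# ...give the same state as one run that saw all the fills.
single = WeightedMoments()
for x, w in ((1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (4.0, 2.0)):
    single.fill(x, w)
```

Storing only a bin height and an error would not allow this: the combined mean and variance could not be recovered from the two runs' final values alone.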

Development areas

Several feature areas are planned for extension and redevelopment. Please get in touch if you can contribute code or design ideas to these:

  • Plotting: plotting is currently included as a rudimentary matplotlib interface in yoda.plot, but is mostly performed via the Rivet make-plots scripts. These will be incorporated into Rivet, and the YODA type interfaces optimised for plotting via matplotlib, e.g. by adding methods which return exactly the plot-point arrays required for various mpl structures. For now, this script may be a helpful starting point for matplotlib plotting of YODA data types.
  • Generalised binned data types: histograms are currently the only binned data type, and they can only be assembled through weighted fills. This means that data types such as reference data (and error) storage need to use Scatters and then manually match bin edges to scatter errors: yuck. YODA will be redeveloped so that histograms are built on a generic "binned storage" type, which can contain any sort of object -- and perform higher-level operations such as bin-merging if the stored type supports a merge operation.
  • First-class overflows: overflow bins are not currently treated as bins, since they have no meaningful width and therefore no meaningful "height" measure. Overflows are also second-class citizens in 2D (and higher) histograms, since a full indexing scheme is needed for them, in order to combine them with "real" bins when projecting or profiling in either variable. YODA will be extended with a binning scheme which always covers the whole real-number space in each variable, making overflows fully fledged bins and enabling more complex operations in N-dim space.
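
As a rough illustration of the "first-class overflows" idea, the sketch below extends a 1D list of bin edges to cover the whole real line, so that underflow and overflow become ordinary bins with an index. This is only one possible reading of the plan, not YODA code: the function names make_full_axis and bin_index are hypothetical, and bins are assumed to be half-open intervals [low, high).

```python
import bisect
import math


def make_full_axis(edges):
    """Extend finite bin edges so the bins cover (-inf, +inf)."""
    return [-math.inf] + sorted(edges) + [math.inf]


def bin_index(full_edges, x):
    """Index of the half-open bin [low, high) that contains x."""
    if math.isnan(x):
        raise ValueError("NaN cannot be assigned to a bin")
    idx = bisect.bisect_right(full_edges, x) - 1
    # x == +inf would land one past the last bin, so clamp it into the overflow.
    return min(idx, len(full_edges) - 2)


full_edges = make_full_axis([0.0, 1.0, 2.0])
counts = [0] * (len(full_edges) - 1)  # underflow, [0,1), [1,2), overflow

for x in (-5.0, 0.0, 0.5, 1.5, 2.0, 99.0):
    counts[bin_index(full_edges, x)] += 1
```

With every value mapped to a valid index, the same lookup can be applied per axis in higher dimensions, which is what projecting or profiling over overflows would need.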

Previous releases

2017-12-21: YODA release 1.7.0

2017-06-18: YODA release 1.6.7

2016-12-13: YODA release 1.6.6
2016-09-28: YODA release 1.6.5
2016-09-25: YODA release 1.6.4
2016-08-09: YODA release 1.6.3
2016-07-06: YODA release 1.6.2
2016-04-20: YODA release 1.6.1
2016-04-20: YODA release 1.6.0

The 1.6 YODA release series moves the codebase to use C++11 and eliminate dependence on the Boost library. We also now return NaNs from invalid statistical computations, to allow the user to choose how to handle the result -- matplotlib will by default mask plot elements with NaN values, for example. The C++ I/O interface has been generalised in neat ways, and several minor bug fixes have also made their way in. See the ChangeLog for details.
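
The NaN convention described above can be sketched in a few lines of plain Python. The function weighted_mean is a hypothetical stand-in, not a YODA call: it returns NaN for an empty bin, and the caller then decides whether to drop or replace the invalid values.

```python
import math


def weighted_mean(sum_w, sum_wx):
    """Weighted mean from stored sums; NaN signals 'no valid result'."""
    if sum_w == 0:
        return math.nan
    return sum_wx / sum_w


# (sum of weights, sum of weight * x) for three bins; the middle one is empty.
bins = [(2.0, 3.0), (0.0, 0.0), (4.0, 2.0)]
means = [weighted_mean(w, wx) for w, wx in bins]

# Policy 1: skip bins with no valid mean.
finite_means = [m for m in means if not math.isnan(m)]

# Policy 2: substitute a default value instead.
filled_means = [0.0 if math.isnan(m) else m for m in means]
```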

2016-03-09: YODA release 1.5.9

2015-12-21: YODA release 1.5.8
2015-12-13: YODA release 1.5.7
2015-11-22: YODA release 1.5.6
2015-10-07: YODA release 1.5.5
2015-10-06: YODA release 1.5.4
2015-09-23: YODA release 1.5.3
2015-09-11: YODA release 1.5.2
2015-09-03: YODA release 1.5.1
2015-08-28: YODA release 1.5.0

The 1.5 release adds several new convenient ways to read and write generic collections of analysis objects, simplifies and improves the YODA and FLAT format parsers (you can now read Scatter3Ds... at last!), and fixes a few rare issues in histogram division and the Python interface. The new I/O methods require Boost 1.48 or later.

The latest patch release also includes major speed improvements in the new parser, improved 1D axis rebinning tools, and better conversion routines between YODA and ROOT objects. See the ChangeLog for details.

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a new version as soon as possible.

2015-07-01: YODA release 1.4.0

YODA 1.4.0 is now available!

This release cleans up some – but not all! – design errors that we made early on in YODA development, such as arithmetic operations on Scatters, which assumed special meanings of the X and Y axes. We've also improved many mappings of functions from C++ to the Python interface and increased the function coverage: many thanks to Adrian Buzatu for providing a comprehensive list of unmapped functions. The yodamerge script has also been improved following a lot of discussion, and the Python read() functions now accept optional "patterns" and "antipatterns" arguments to load only a subset of the analysis objects in a file, via path regexes.
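
The path-regex selection can be pictured with the standalone sketch below. It does not call YODA: the helper select_paths, its search-anywhere matching rule, and the example paths are all assumptions made for illustration, and the real read() functions may differ in argument handling and matching semantics.

```python
import re


def select_paths(paths, patterns=None, antipatterns=None):
    """Keep paths matching any pattern and no antipattern (hypothetical helper)."""
    keep = [re.compile(p) for p in (patterns or [])]
    veto = [re.compile(p) for p in (antipatterns or [])]
    selected = []
    for path in paths:
        # With no patterns given, every path is a candidate.
        if keep and not any(r.search(path) for r in keep):
            continue
        if any(r.search(path) for r in veto):
            continue
        selected.append(path)
    return selected


# Made-up analysis object paths, for illustration only.
paths = [
    "/ANALYSIS_A/d01-x01-y01",
    "/ANALYSIS_A/d02-x01-y01",
    "/ANALYSIS_B/d01-x01-y01",
    "/RAW/ANALYSIS_A/d01-x01-y01",
]

selected = select_paths(paths, patterns=[r"ANALYSIS_A"], antipatterns=[r"^/RAW/"])
```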

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a version 1.4.1 in time for the Rivet 2.3.0 release by the end of July 2015.

2015-03-19: YODA release 1.3.1

YODA 1.3.1 is now available!

This is mainly a bugfix and minor improvement release, affecting internals such as how overflow bin filling is triggered, bin edge treatment, Python interface improvements, and better script functionality. A major change is a new yoda.plotting Python sub-module which adds preliminary plotting functionality via the matplotlib library -- we expect to extend and improve this substantially in future releases. New yodacmp and yodaplot scripts are provided, which make use of this module. The yodascale script is now also much more powerful, allowing scaling and normalisation specific to histogram path patterns and optional bin ranges.

Compared to the 1.2.x releases, 1.3.x also provides an efficiency method for 2D histograms, fixes the statistical sanity-checking logic in the efficiency calculation routines, adds two-argument setting methods for 3D points, and makes a few other changes.

2014-09-30: YODA release 1.3.0

YODA 1.3.0 is now available!

This release provides an efficiency method for 2D histograms, fixes the statistical sanity-checking logic in the efficiency calculation routines, adds two-argument setting methods for 3D points, and makes a few other changes. The major version number reflects the still-growing API. Have fun!

2014-09-01: YODA release 1.2.1

This release includes a significant bug fix for a problem introduced in 1.2.0 for binnings starting below zero. It should only have turned up as a performance hit, but did appear when running code in some special instrumented modes. This version also restricts direct access to bins to be read-only, to avoid direct calls to fill() leaving the histogram in an inconsistent state. The 1.2 series also contains more improvements to the API, a Scatter1D type, and read/write support for Scatter1D and Counter. I/O support for Scatter3D and for 2D outflows will come as soon as possible.


Development is currently coordinated through the bug tracker and wiki, and is documented via Doxygen, which covers design discussion and motivation as well as class documentation.