****************** Analyzing PHA data ****************** As mentioned in :ref:`filter_enhance`, PHA datasets add extra features to the standard Sherpa data class. These are: * creating a PHA data set requires significantly more metadata than other data classes, which is automatically handled when reading the data from a FITS file that follows the :term:`OGIP` standards, but have to be explicitly set when creating a :py:class:`~sherpa.astro.data.DataPHA` object; * the :py:class:`sherpa.astro.data.DataPHA` class is derived from :py:class:`sherpa.data.Data1D`, since the "raw" data being fit is channel number and counts, but much analysis requires an "integrated" data set; * a source region can also have one or more associated background regions which provide an estimate of the background contamination in the source region and this can either be subtracted from the data or a separate model fit to it; * analysis can be done using one of three "coordinates" - `"channel"`, `"energy"`, or `"wavelength"` - that control the plot appearance and units for the :py:meth:`~sherpa.astro.data.DataPHA.notice` and :py:meth:`~sherpa.astro.data.DataPHA.ignore` calls; * data can be dynamically regrouped to change the signal-to-noise during the analysis; * and model evaluation uses classes related to the :term:`ARF` and :term:`RMF` to convert between channels and energies. The :ref:`model_evaluate_example_pha` section of the :doc:`model evaluation <../evaluation/index>` page shows how many of these features work, while below we focus on an overview of these changes. The following imports have been made:: >>> import numpy as np >>> from matplotlib import pyplot as plt >>> from sherpa.astro.data import DataPHA Creating a DataPHA object ========================= The :py:func:`sherpa.astro.io.read_pha` routine will read in a PHA FITS file:: >>> from sherpa.astro.io import read_pha >>> pha = read_pha('3c273.pi') WARNING: systematic errors were not found in file '3c273.pi' statistical errors were found in file '3c273.pi' but not used; to use them, re-read with use_errors=True read ARF file 3c273.arf read RMF file 3c273.rmf WARNING: systematic errors were not found in file '3c273_bg.pi' statistical errors were found in file '3c273_bg.pi' but not used; to use them, re-read with use_errors=True read background file 3c273_bg.pi >>> print(pha) name = 3c273.pi channel = Float64[1024] counts = Float64[1024] staterror = None syserror = None bin_lo = None bin_hi = None grouping = Int16[1024] quality = Int16[1024] exposure = 38564.608926889 backscal = 2.5264364698914e-06 areascal = 1.0 grouped = True subtracted = False units = energy rate = True plot_fac = 0 response_ids = [1] background_ids = [1] .. note:: The :py:mod:`sherpa.astro.io` module requires that a FITS backend is available. The :term:`Astropy` package can be used for this. As well as reading in the data it has also automatically loaded in the background data and response information (:term:`ARF` and :term:`RMF`) that are set in this file's FITS metadata:: >>> print(pha.get_background()) name = 3c273_bg.pi channel = Float64[1024] counts = Float64[1024] staterror = None syserror = None bin_lo = None bin_hi = None grouping = Int16[1024] quality = Int16[1024] exposure = 38564.608926889 backscal = 1.872535141462e-05 areascal = 1.0 grouped = True subtracted = False units = energy rate = True plot_fac = 0 response_ids = [1] background_ids = [] >>> print(pha.get_arf()) name = 3c273.arf energ_lo = Float64[1090] energ_hi = Float64[1090] specresp = Float64[1090] bin_lo = None bin_hi = None exposure = 38564.141454905 ethresh = 1e-10 >>> print(pha.get_rmf()) name = 3c273.rmf detchans = 1024 energ_lo = Float64[1090] energ_hi = Float64[1090] n_grp = UInt64[1090] f_chan = UInt64[2002] n_chan = UInt64[2002] matrix = Float64[61834] offset = 1 e_min = Float64[1024] e_max = Float64[1024] ethresh = 1e-10 A PHA object can also be created directly - all that is needed at first are the channel and counts arrays as other metadata can be added after the :py:class:`~sherpa.astro.data.DataPHA` object has been created:: >>> chans = np.arange(1, 1025, dtype=int) >>> counts = np.ones(1024, dtype=int) >>> test = DataPHA('example', chans, counts) >>> print(test) name = example channel = Int64[1024] counts = Int64[1024] staterror = None syserror = None bin_lo = None bin_hi = None grouping = None quality = None exposure = None backscal = None areascal = None grouped = False subtracted = False units = channel rate = True plot_fac = 0 response_ids = [] background_ids = [] Visualizing the data ==================== The :py:mod:`sherpa.astro.plot` module contains classes for visualizing the data (the :doc:`visualization section <../plots/index>` provides more information on how to use the plot classes), so we can visualize the data with:: >>> from sherpa.astro.plot import DataPHAPlot >>> plot = DataPHAPlot() >>> plot.histo_prefs['linestyle'] = '-' >>> plot.prepare(pha) >>> plot.plot() .. image:: ../_static/data/pha_initial.png .. note:: The :py:mod:`sherpa.astro.plot` module requires that a plotting backend is available. The :term:`matplotlib` package can be used for this. The default plot style has been adjusted to also include the bin edges (by setting the matplotlib ``linestyle`` option). .. _analysis_units: Analysis units ============== The "native" units for analyzing PHA data is in channel space, but this is often not the most instructive, so normally the analysis is done with energy or wavelength units. If a PHA file is loaded in from disk and contains a response then it will default to energy units, otherwise it will use channel units. For this example, the ``3c273.pi`` file includes response information in its header - via the ``RESPFILE`` and ``ANCRFILE`` keywords - and so the selected units are ``energy``, which can be found using either the :py:meth:`~sherpa.astro.data.DataPHA.get_analysis` method or the :py:attr:`~sherpa.astro.data.DataPHA.units` attribute:: >>> pha.get_analysis() 'energy' >>> print(pha.units) energy The :py:attr:`~sherpa.astro.data.DataPHA.set_analysis` method or :py:attr:`~sherpa.astro.data.DataPHA.units` attribute can be used to switch between ``channel``, ``energy``, and ``wavelength`` (the latter two will raise an error if no response has been set). .. _pha_filtering: Filtering ========= The default is for all the data to be included and, because the file contained grouping information, the data has been grouped (this is shown in the plot above, where there are only of order 50 data points shown rather than the 1024 channels this data set has):: >>> print(pha.mask) True >>> print(pha.grouped) True The "dependent axis" - so in this case, the counts - can be retrieved with the :py:meth:`~sherpa.astro.data.DataPHA.get_dep` method, and we can see the difference the ``filter`` flag makes:: >>> y1 = pha.get_dep() >>> y2 = pha.get_dep(filter=True) >>> print(y1.size) 1024 >>> print(y2.size) 46 So when `filter=False` (the default) then the ungrouped data is returned, but when `filter=True` the grouped data is returned. Since the analysis units for this data set are `energy`, we can select a subset of the data, which means that the grouped data size is reduced, as the points with energies below 0.5 keV or above 7 keV have been removed:: >>> pha.notice(0.5, 7) >>> print(pha.get_dep(filter=True).size) 42 >>> plot.prepare(pha) >>> plot.plot() .. image:: ../_static/data/pha_filtered.png Although the requested range was 0.5 to 7.0 keV, the selected range is wider, as shown above and with the :py:meth:`~sherpa.astro.data.DataPHA.get_filter` method:: >>> print(pha.get_filter()) 0.467200011015:9.869600296021 .. note:: Each channel covers a finite energy range, and so when determining what value to display, the :py:meth:`~sherpa.astro.data.DataPHA.get_filter` call uses the full range (this was changed in Sherpa 4.14.0, in earlier versions the mid-point was used so the expression would appear to cover a smaller range but would still reflect the same filter). The reason for this change is two fold: * as mentioned, each channel has a finite energy range, so the selected energy range is unlikely to exactly match the requested range, * and thanks to grouping, the selected channel is unlikely to fall at the start (for the low limit) and end (for the high limit) values for the groups, so this further changes the selected limit range. Consider the following highly-simplified case where there are 7 channels that have been grouped into 4 bins. +---------+-------+--------------------+ | Channel | Group | Energy range (keV) | +=========+=======+====================+ | 1 | 1 | 0.10 - 0.11 | +---------+ +--------------------+ | 2 | | 0.11 - 0.14 | +---------+-------+--------------------+ | 3 | 2 | 0.14 - 0.16 | +---------+ +--------------------+ | 4 | | 0.16 - 0.20 | +---------+-------+--------------------+ | 5 | 3 | 0.20 - 0.22 | +---------+ +--------------------+ | 6 | | 0.22 - 0.24 | +---------+-------+--------------------+ | 7 | 4 | 0.24 - 0.26 | +---------+-------+--------------------+ In this case a filter to notice the range 0.15 to 0.21 keV would select channels 3 to 5 and then end up selecting groups 2 and 3, with the final channel selection being 3 to 6. A call to then ignore the range 0.18 to 0.19 keV would select channel 4 and hence group 2, so resulting in a final filter of just group 3 (channels 5 to 6). We can switch temporarily to channel units and see differences in the ``get_filter`` call and the plot:: >>> pha.units = 'channel' >>> print(pha.get_filter()) 33:676 >>> plot.prepare(pha) >>> plot.plot(xlog=True, ylog=True) >>> pha.units = 'energy' .. image:: ../_static/data/pha_filtered_channel.png .. note:: The counts (dependent) axis is drawn with a logarithmic scale primarily because the values are small enough that the Y-axis label would disappear with a linear scale. The channel (independent) axis has been drawn with a log scale because the effective area of this instrument is higher at lower energies which tends to result in smaller groups at low channel values. Filtering and limits -------------------- When using energy or wavelength units - e.g. with :py:meth:`~sherpa.astro.data.DataPHA.set_analysis`, as :ref:`described above ` - the meaning of the arguments to :py:meth:`~sherpa.astro.data.DataPHA.notice` and :py:meth:`~sherpa.astro.data.DataPHA.ignore` are slightly different than when using channel units. When using channels: - the low and high limits must be integers, - and the limits are inclusive on both ends (so ``[lo, hi]``) whereas for energy and wavelength analysis - the low and high limits must be >= 0, - and the limits are only inclusive on the lower value (so ``[lo, hi)``, that is a `half-open interval `_). That is (for an ungrouped dataset):: >>> pha.notice() >>> pha.units = 'channel' >>> pha.notice(20, 200) will select the channels 20 to 200 (inclusive), but >>> pha.notice() >>> pha.units = 'energy' >>> pha.notice(0.5, 7.0) will only select those bins which lie in the range ``0.5 <= energy < 7.0`` (although as :ref:`just explained `, the finite width of each channel in energy or wavelength units means that the distinction between ``0.5 <= energy < 7.0`` and ``0.5 <= energy <= 7.0`` makes no difference unless the bin edge is at 7 keV). .. note:: The PHA filtering in Sherpa 4.14.0 has been updated to fix a number of corner cases which can result in filter expressions changing the first or last selected bin. This can then cause fit differences as the number of degrees of freedom and the fit parameters can change slightly. Grouping ======== The dynamic grouping can be changed by setting the :py:attr:`~sherpa.astro.data.DataPHA.grouping` attribute - and then calling the :py:meth:`~sherpa.astro.data.DataPHA.group` if necessary - or with one of the dynamic-routines methods: :py:meth:`~sherpa.astro.data.DataPHA.group_adapt`, :py:meth:`~sherpa.astro.data.DataPHA.group_adapt_snr`, :py:meth:`~sherpa.astro.data.DataPHA.group_bins`, :py:meth:`~sherpa.astro.data.DataPHA.group_counts`, :py:meth:`~sherpa.astro.data.DataPHA.group_snr`, and :py:meth:`~sherpa.astro.data.DataPHA.group_width`. For this example we will compare the same data with different grouping schemes by loading the data in three times. The :py:class:`~sherpa.utils.logging.SherpaVerbosity` class is used to temporarily hide the output of :py:func:`~sherpa.astro.io.read_pha`:: >>> from sherpa.utils.logging import SherpaVerbosity >>> with SherpaVerbosity('ERROR'): ... pha1 = read_pha('3c273.pi') ... pha2 = read_pha('3c273.pi') ... pha3 = read_pha('3c273.pi') The same energy range will be used for all three data sets:: >>> pha1.notice(0.5, 7) >>> pha2.notice(0.5, 7) >>> pha3.notice(0.5, 7) The first data set is to be ungrouped, the second data set will use the on-disk grouping settings, and the third data set is grouped so that each bin contains at least 40 counts:: >>> pha1.ungroup() >>> pha3.group_counts(40) For display the ungrouped data is shown in a separate plot as it makes it easier to compare:: >>> plt.subplot(2, 1, 1) >>> plot.prepare(pha1) >>> plot.plot(clearwindow=False) >>> plt.subplot(2, 1, 2) >>> plot.prepare(pha2) >>> plot.plot(xlog=True, alpha=0.7, clearwindow=False) >>> plot.prepare(pha3) >>> plot.overplot(alpha=0.7) >>> plt.title('') >>> plt.subplots_adjust(hspace=0.4) .. image:: ../_static/data/pha_grouping_comparison.png In general, as the grouped bins become larger then the difference of the filtered range to the requested range becomes larger:: >>> print(pha1.get_filter()) 0.467200011015:9.869600296021 >>> print(pha2.get_filter()) 0.467200011015:9.869600296021 >>> print(pha3.get_filter()) 0.394199997187:14.950400352478 Manipulating data ================= Methods like :py:meth:`~sherpa.astro.data.DataPHA.get_dep` will apply the necessary grouping and filters to the data. It can be useful to convert other arrays - such as counts or energy bins - directly, which can be done with :py:meth:`~sherpa.astro.data.DataPHA.apply_filter` and :py:meth:`~sherpa.astro.data.DataPHA.apply_grouping`. The default behavior for ``apply_filter`` is to sum the data values with each group, so we can re-create the ``get_dep`` call:: >>> d1 = pha.get_dep(filter=True) >>> d2 = pha.apply_filter(pha.counts) >>> np.all(d1 == d2) True The behavior can be changed with the ``groupfunc`` argument, which takes a limited set of functions that describe how the data within a group is combined (the default is ``np.sum``). For instance, the first and last channel value of each group can be calculated with:: >>> clo = pha.apply_filter(pha.channel, groupfunc=pha._min) >>> chi = pha.apply_filter(pha.channel, groupfunc=pha._max) >>> clo[0:7] [33. 40. 45. 49. 52. 55. 57.] >>> chi[0:7] [39. 44. 48. 51. 54. 56. 59.] The :py:meth:`~sherpa.astro.data.DataPHA.apply_grouping` method is similar but it does not apply any filter, so all channels are used. So to get the group boundaries for all channels, not just the filtered ones, we can say:: >>> alo = pha.apply_grouping(pha.channel, pha._min) >>> ahi = pha.apply_grouping(pha.channel, pha._max) >>> alo[0:7] [ 1. 18. 22. 33. 40. 45. 49.] >>> ahi[0:7] [17. 21. 32. 39. 44. 48. 51.] Background ========== A PHA data set may have one or more associated background data sets. For this example there is one, and the :py:meth:`~sherpa.astro.data.DataPHA.get_background` method will return a :py:class:`~sherpa.astro.data.DataPHA` object representing the background region. >>> print(pha.background_ids) [1] >>> bkg = pha.get_background() >>> print(bkg) name = 3c273_bg.pi channel = Float64[1024] counts = Float64[1024] staterror = None syserror = None bin_lo = None bin_hi = None grouping = Int16[1024] quality = Int16[1024] exposure = 38564.608926889 backscal = 1.872535141462e-05 areascal = 1.0 grouped = True subtracted = False units = energy rate = True plot_fac = 0 response_ids = [1] background_ids = [] .. note:: In this example the background data has the same exposure time as the source, which is often the case (the source and background spectra are extracted from the same event file), but this does not need to hold. Often all that is done is to subtract the background from the source data, which is achieved with the :py:meth:`~sherpa.astro.data.DataPHA.subtract` method, but you can instead fit a model to just the background data, and have this then included in the source region (with appropriate scaling to account for differences in the source and background apertures). Filtering and grouping changes to a source region are automatically propagated to the associated background regions, but they can be applied to the background data set directly if needed. For example, the un-subtracted group counts are:: >>> pha.get_dep(filter=True) [15. 16. 15. 18. 18. 15. 18. 15. 15. 19. 15. 15. 17. 16. 16. 17. 15. 19. 15. 16. 15. 16. 17. 15. 18. 16. 15. 15. 16. 15. 15. 15. 16. 16. 15. 15. 16. 16. 15. 16. 15. 15.] and after background subtractuion the values are:: >>> pha.subtract() >>> pha.get_dep(filter=True) [14.86507936 15.86507936 15. 18. 18. 14.86507936 17.86507936 15. 14.59523807 18.86507936 14.86507936 14.86507936 17. 16. 15.73015871 16.73015871 15. 18.59523807 14.86507936 15.59523807 14.86507936 16. 16.59523807 15. 17.59523807 16. 15. 14.46031742 16. 14.59523807 14.59523807 14.46031742 15.86507936 15.59523807 14.32539678 14.32539678 15.32539678 15.19047614 14.32539678 15.19047614 14.05555549 12.43650777]