Description
Is your feature request related to a problem? Please describe.
Jupyter notebooks support multiple output formats for cell execution; depending on what's desired, one or more different entries in the data section, identified by MIME type, will contain the output in the corresponding format.
This is useful in general, but becomes critical when using any document production system that uses Jupyter kernels as the engine to create different types of documents (i.e. most of the solutions that fall within the "literate programming" approach): they will pick HTML if the desired output is HTML, and pick e.g. a PNG representation of a plot if the output is a PDF.
I will use Quarto as an example, but the idea is generic and applicable to other solutions. Consider the following source document:
---
title: "Quarto and SAS"
format:
  html:
    code-fold: true
engine: jupyter
---
Define a dataset:
```{sas}
*| output: false;
*| echo: true;
data grade;
input subject gender $
exam1 exam2 hwgrade $;
datalines;
10 m 80 84 a
7 . 85 89 a
4 f 90 . b
20 m 82 85 b
26 f 94 94 a
11 f 88 84 c
;
run;
```
Print it:
```{sas}
proc print data=grade;
var subject gender; * print student id and gender;
run;
```
Plot it:
```{sas}
ods graphics on / width=3in;
proc sgplot data=grade;
hbar gender / response=exam1 stat=mean datalabel categoryorder=respdesc;
run;
```
This works without any change for HTML output, because sas_kernel uses HTML(output), which automatically creates a text/html entry in the outputs array, and the HTML target makes use of HTML.
$ quarto preview sas.qmd --to html
This doesn't work if we specify PDF as the output, because the toolchain (in the case of Quarto, using pandoc and LaTeX) will have no way to render the HTML, and there is no alternative representation:
$ quarto preview sas.qmd --to pdf
The table works because Quarto has some automation that parses HTML tables and converts them to LaTeX, but it isn't able to convert the HTML plot into something that can be included in the PDF.
Quarto here is, and I must stress this, just an example: in general, the ability to have more outputs in the MIME bundle of the Jupyter cell outputs will be usable by any other tool.
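To make the point concrete, here is a minimal sketch of how a consuming tool might choose among the entries of a cell output's MIME bundle. The preference lists and the `pick_representation` helper are assumptions made up for illustration; real tools such as Quarto or nbconvert define their own ordering.

```python
# Hypothetical preference orders per target format; real tools define their own.
PREFERENCES = {
    "html": ["text/html", "image/png", "text/plain"],
    "pdf": ["text/latex", "image/png", "text/plain"],
}


def pick_representation(data, target):
    """Return (mime_type, payload) for the first preferred MIME type found in
    the bundle, or None when the bundle has nothing the target can use."""
    for mime in PREFERENCES[target]:
        if mime in data:
            return mime, data[mime]
    return None


# A bundle with only HTML (roughly what the kernel produces today) ...
html_only = {"text/html": "<img src='data:image/png;base64,...'>"}
# ... and the same bundle with an additional PNG entry.
bundle = {**html_only, "image/png": "iVBORw0KGgo..."}

print(pick_representation(html_only, "pdf"))  # nothing usable for PDF
print(pick_representation(bundle, "pdf"))     # falls back to the PNG entry
print(pick_representation(bundle, "html"))    # HTML is still preferred
```

With only text/html in the bundle, a PDF target has nothing to pick; adding a second entry fixes that without changing what an HTML target selects.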
Describe the solution you'd like
The solution I would like is different from the one I have prototyped. I think the better approach would be to make the change at the SASpy level, using the extremely rich capabilities of ODS to allow other output formats to be specified there, which sas_kernel would then use.
That said, I have put together a quick prototype that shows how this could work with changes to sas_kernel alone. After studying the code and its use of MetaKernel, I found that MetaKernel already has some plumbing in its _formatter method that goes through the methods of an object and creates the necessary outputs. I've created a SASOutput class that implements _repr_png_ and _repr_latex_.
import re

from bs4 import BeautifulSoup as BS


class SASOutput(object):
    """Wraps the HTML returned by SASpy and exposes extra MIME representations."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # Plain-text fallback: strip the markup. __repr__ must return a str.
        try:
            soup = BS(self.data, 'html.parser')
            return soup.get_text()
        except Exception:
            return str(self.data)

    def _repr_html_(self):
        return self.data

    def _repr_png_(self):
        # Return the base64 payload of the first embedded <img>, if any.
        try:
            soup = BS(self.data, 'html.parser')
            img_tag = soup.find('img')
            base64_data = img_tag['src'].split(',')[1]
            return base64_data
        except Exception:
            return None

    def _repr_latex_(self):
        # Extract a complete LaTeX document embedded in the HTML, if any.
        start_marker = r'\\documentclass\[10pt\]{article}'
        end_marker = r'\\end{document}'
        match = re.search(f'{start_marker}(.*?){end_marker}', self.data, re.DOTALL)
        if match:
            return match.group()
        return None

(Consider this code an MVP rather than something I am proposing as a PR; it is meant to illustrate the possibilities more than anything.)
This assumes that the input is HTML, which seems to always be the case in SASpy. With this, the previous PDF example works, because the sas.ipynb that Quarto creates contains an image/png entry with the plot (the entry is also emitted for "regular" Jupyter notebook usage, but Jupyter would prefer the HTML version).
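As a small sanity check on the PNG extraction idea, the sketch below pulls the first data-URI image out of an HTML string and confirms that the decoded payload starts with the PNG signature. It uses only the standard library instead of BeautifulSoup, and the `FirstImageSrc` and `extract_png_bytes` names are hypothetical helpers, not part of sas_kernel or SASpy. The sample input is synthetic: it contains just the 8-byte PNG signature, not a real plot.

```python
import base64
from html.parser import HTMLParser

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class FirstImageSrc(HTMLParser):
    """Record the src attribute of the first <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def extract_png_bytes(html):
    """Return the decoded bytes of the first base64 PNG data URI, or None."""
    parser = FirstImageSrc()
    parser.feed(html)
    src = parser.src
    if not src or not src.startswith("data:image/png;base64,"):
        return None
    try:
        raw = base64.b64decode(src.split(",", 1)[1])
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    # Only accept payloads that really look like a PNG file.
    return raw if raw.startswith(PNG_SIGNATURE) else None


# Synthetic input: the payload is only the PNG signature, not a real image.
fake_png = base64.b64encode(PNG_SIGNATURE).decode("ascii")
sample = f'<html><body><img src="data:image/png;base64,{fake_png}"/></body></html>'

print(extract_png_bytes(sample) == PNG_SIGNATURE)
print(extract_png_bytes("<p>no image here</p>"))
```

Checking the media type and the signature avoids labelling an SVG or a broken payload as image/png, which a downstream PDF toolchain would otherwise fail on later.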
The .ipynb will contain the different formats, so the existing behaviour (text/html) would be unchanged:
{
  "cell_type": "code",
  "execution_count": 3,
  "id": "a6bf524a",
  "metadata": {},
  "outputs": [
    {
      "data": {
        "image/png": "iVBORw0KGgoAA(...rest of base64 data)=",
        "text/html": [
          "<!DOCTYPE html>\n",
          "<html lang=\"en\" xml:lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\">\n",
          (...)
        ]
      },

Describe alternatives you've considered
As mentioned, this approach relies on how SASpy currently works, which seems to hardcode ods html5 for non-text output (I could be completely wrong here; I'm basing this on this documentation). The LaTeX parser above, for example, extracts the LaTeX that is returned in the middle of a lot of HTML code. Making it possible to request the desired output format through SASpy would likely be better. Currently, using things like ods latex in the cells gives the expected output in the middle of HTML code, but I haven't tested this extensively.
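The extraction step described above can be tried in isolation with a few lines of standard-library Python. This is only a sketch on a made-up sample string: the `extract_latex` helper is hypothetical, and unlike the prototype it accepts any \documentclass options rather than only [10pt]. Whether the LaTeX embedded in real SASpy output is HTML-escaped is something I have not verified.

```python
import re

# Match from \documentclass up to the first \end{document}, across lines.
LATEX_DOC = re.compile(r"\\documentclass.*?\\end\{document\}", re.DOTALL)


def extract_latex(mixed):
    """Return the first complete LaTeX document found in the text, or None."""
    match = LATEX_DOC.search(mixed)
    return match.group(0) if match else None


# Made-up sample: a LaTeX document wrapped in HTML, as described above.
sample = (
    "<html><body><pre>\n"
    "\\documentclass[10pt]{article}\n"
    "\\begin{document}\n"
    "Hello\n"
    "\\end{document}\n"
    "</pre></body></html>"
)

print(extract_latex(sample))
print(extract_latex("<p>plain HTML only</p>"))
```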
Ideally, we would also be able to pass additional formatting options down the line, such as the plot title or the image width. Tools of this kind generally implement these options consistently so that the same syntax can be used everywhere. For Quarto, execution options support things like fig-width that are applied to R, Python, and Julia. This must be supported in Quarto itself, but it also requires a way to pass that information on.
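One way this could look, purely as a sketch: read the `*| key: value;` option comments at the top of a SAS cell (the syntax used in the example document above) and turn a width option into the `ods graphics` statement shown earlier. Both helper names are hypothetical, and using fig-width in a SAS cell is an assumption for illustration, not something Quarto or sas_kernel is known to support today.

```python
import re

# Matches lines such as "*| fig-width: 3;" (the trailing semicolon is optional).
OPTION_LINE = re.compile(r"^\s*\*\|\s*([\w-]+)\s*:\s*(.*?)\s*;?\s*$")


def parse_cell_options(source):
    """Collect the leading '*| key: value;' lines of a cell into a dict.

    Options are assumed to sit at the very top of the cell, so parsing
    stops at the first line that is not an option comment."""
    options = {}
    for line in source.splitlines():
        match = OPTION_LINE.match(line)
        if match is None:
            break
        options[match.group(1)] = match.group(2)
    return options


def ods_graphics_statement(options):
    """Build an 'ods graphics' statement from a fig-width option (in inches),
    or return None when the option is absent."""
    width = options.get("fig-width")
    if width is None:
        return None
    return f"ods graphics on / width={width}in;"


cell = "*| echo: true;\n*| fig-width: 3;\nproc sgplot data=grade;\nrun;"
opts = parse_cell_options(cell)
print(opts)
print(ods_graphics_statement(opts))
```

Keeping the parsing separate from the statement generation means the same option dictionary could drive other settings, such as a title, without changing how the cell is read.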
Additional context
I have been using and following this for a while. It has come up in several earlier discussions, and interest seems to be growing.
- Initial discussion on SASpy about supporting non-HTML targets: Support for Markdown, Asciidoc and other document markup languages saspy#412
- Another comment in sas_kernel related to PDF export: Enabling inline for SAS outputs in notebook #13
- Request for specific outputs for use with Quarto: Extending the SAS kernel to collapse the log like Quarto can do with code chunks #86


