diff --git a/HISTORY.rst b/HISTORY.rst index 925cd95be..a51e27043 100644 --- a/HISTORY.rst +++ b/HISTORY.rst @@ -3,6 +3,104 @@ Release History --------------- +0.8.5 (2015-02-21) +++++++++++++++++++ + +- Fix #149: KeyError on Document.add_table() +- Fix #78: feature: add_table() sets cell widths +- Add #106: feature: Table.direction (i.e. right-to-left) +- Add #102: feature: add CT_Row.trPr + + +0.8.4 (2015-02-20) +++++++++++++++++++ + +- Fix #151: tests won't run on PyPI distribution +- Fix #124: default to inches on no TIFF resolution unit + + +0.8.3 (2015-02-19) +++++++++++++++++++ + +- Add #121, #135, #139: feature: Font.color + + +0.8.2 (2015-02-16) +++++++++++++++++++ + +- Fix #94: picture prints at wrong size when scaled +- Extract `docx.document.Document` object from `DocumentPart` + + Refactor `docx.Document` from an object into a factory function for new + `docx.document.Document` object. Extract methods from prior `docx.Document` + and `docx.parts.document.DocumentPart` to form the new API class and retire + `docx.Document` class. + +- Migrate `Document.numbering_part` to `DocumentPart.numbering_part`. The + `numbering_part` property is not part of the published API and is an + interim internal feature to be replaced in a future release, perhaps with + something like `Document.numbering_definitions`. In the meantime, it can + now be accessed using ``Document.part.numbering_part``. + + +0.8.1 (2015-02-10) +++++++++++++++++++ + +- Fix #140: Warning triggered on Document.add_heading/table() + + +0.8.0 (2015-02-08) +++++++++++++++++++ + +- Add styles. Provides general capability to access and manipulate paragraph, + character, and table styles.
+ +- Add ParagraphFormat object, accessible on Paragraph.paragraph_format, and + providing the following paragraph formatting properties: + + + paragraph alignment (justification) + + space before and after paragraph + + line spacing + + indentation + + keep together, keep with next, page break before, and widow control + +- Add Font object, accessible on Run.font, providing character-level + formatting including: + + + typeface (e.g. 'Arial') + + point size + + underline + + italic + + bold + + superscript and subscript + +The following issues were retired: + +- Add feature #56: superscript/subscript +- Add feature #67: lookup style by UI name +- Add feature #98: Paragraph indentation +- Add feature #120: Document.styles + +**Backward incompatibilities** + +Paragraph.style now returns a Style object. Previously it returned the style +name as a string. The name can now be retrieved using the Style.name +property, for example, `paragraph.style.name`. + + +0.7.6 (2014-12-14) +++++++++++++++++++ + +- Add feature #69: Table.alignment +- Add feature #29: Document.core_properties + + +0.7.5 (2014-11-29) +++++++++++++++++++ + +- Add feature #65: _Cell.merge() + + 0.7.4 (2014-07-18) ++++++++++++++++++ diff --git a/MANIFEST.in b/MANIFEST.in index 2c4f97c0d..6419bc8a0 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,5 +1,5 @@ include HISTORY.rst LICENSE README.rst tox.ini -include tests/*.py +recursive-include tests *.py recursive-include features * recursive-include docx/templates * recursive-include tests/test_files * diff --git a/docs/api/dml.rst b/docs/api/dml.rst new file mode 100644 index 000000000..79b314844 --- /dev/null +++ b/docs/api/dml.rst @@ -0,0 +1,16 @@ + +.. _dml_api: + +DrawingML objects +================= + +Low-level drawing elements like color that appear in various document +contexts. + + +|ColorFormat| objects +--------------------- + +.. 
autoclass:: docx.dml.color.ColorFormat() + :members: + :undoc-members: diff --git a/docs/api/document.rst b/docs/api/document.rst index accab05b3..8ab9ecfe4 100644 --- a/docs/api/document.rst +++ b/docs/api/document.rst @@ -7,24 +7,111 @@ Document objects The main Document and related objects. -.. currentmodule:: docx.api +|Document| constructor +---------------------- + +.. autofunction:: docx.Document |Document| objects ------------------ - -.. autoclass:: Document +.. autoclass:: docx.document.Document() :members: - :exclude-members: numbering_part, styles_part + :exclude-members: styles_part -.. currentmodule:: docx.parts.document +|CoreProperties| objects +------------------------- +Each |Document| object provides access to its |CoreProperties| object via its +:attr:`core_properties` attribute. A |CoreProperties| object provides +read/write access to the so-called *core properties* for the document. The +core properties are author, category, comments, content_status, created, +identifier, keywords, language, last_modified_by, last_printed, modified, +revision, subject, title, and version. -|Sections| objects ------------------- +Each property is one of three types, |str|, |datetime|, or |int|. String +properties are limited in length to 255 characters and return an empty string +('') if not set. Date properties are assigned and returned as |datetime| +objects without timezone, i.e. in UTC. Any timezone conversions are the +responsibility of the client. Date properties return |None| if not set. +|docx| does not automatically set any of the document core properties other +than to add a core properties part to a document that doesn't have one +(very uncommon). If |docx| adds a core properties part, it contains default +values for the title, last_modified_by, revision, and modified properties. +Client code should update properties like revision and last_modified_by +if that behavior is desired. -.. autoclass:: Sections - :members: +.. 
currentmodule:: docx.opc.coreprops + +.. class:: CoreProperties + + .. attribute:: author + + *string* -- An entity primarily responsible for making the content of the + resource. + + .. attribute:: category + + *string* -- A categorization of the content of this package. Example + values might include: Resume, Letter, Financial Forecast, Proposal, + or Technical Presentation. + + .. attribute:: comments + + *string* -- An account of the content of the resource. + + .. attribute:: content_status + + *string* -- completion status of the document, e.g. 'draft' + + .. attribute:: created + + *datetime* -- time of initial creation of the document + + .. attribute:: identifier + + *string* -- An unambiguous reference to the resource within a given + context, e.g. ISBN. + + .. attribute:: keywords + + *string* -- descriptive words or short phrases likely to be used as + search terms for this document + + .. attribute:: language + + *string* -- language the document is written in + + .. attribute:: last_modified_by + + *string* -- name or other identifier (such as email address) of person + who last modified the document + + .. attribute:: last_printed + + *datetime* -- time the document was last printed + + .. attribute:: modified + + *datetime* -- time the document was last modified + + .. attribute:: revision + + *int* -- number of this revision, incremented by Word each time the + document is saved. Note however |docx| does not automatically increment + the revision number when it saves a document. + + .. attribute:: subject + + *string* -- The topic of the content of the resource. + + .. attribute:: title + + *string* -- The name given to the resource. + + .. attribute:: version + + *string* -- free-form version string diff --git a/docs/api/enum/MsoColorType.rst b/docs/api/enum/MsoColorType.rst new file mode 100644 index 000000000..62a94d6aa --- /dev/null +++ b/docs/api/enum/MsoColorType.rst @@ -0,0 +1,23 @@ +.. 
_MsoColorType: + +``MSO_COLOR_TYPE`` +================== + +Specifies the color specification scheme + +Example:: + + from docx.enum.dml import MSO_COLOR_TYPE + + assert font.color.type == MSO_COLOR_TYPE.THEME + +---- + +RGB + Color is specified by an |RGBColor| value. + +THEME + Color is one of the preset theme colors. + +AUTO + Color is determined automatically by the application. diff --git a/docs/api/enum/MsoThemeColorIndex.rst b/docs/api/enum/MsoThemeColorIndex.rst new file mode 100644 index 000000000..02436f2c1 --- /dev/null +++ b/docs/api/enum/MsoThemeColorIndex.rst @@ -0,0 +1,71 @@ +.. _MsoThemeColorIndex: + +``MSO_THEME_COLOR_INDEX`` +========================= + +Indicates the Office theme color, one of those shown in the color gallery on +the formatting ribbon. + +Alias: ``MSO_THEME_COLOR`` + +Example:: + + from docx.enum.dml import MSO_THEME_COLOR + + font.color.theme_color = MSO_THEME_COLOR.ACCENT_1 + +---- + +NOT_THEME_COLOR + Indicates the color is not a theme color. + +ACCENT_1 + Specifies the Accent 1 theme color. + +ACCENT_2 + Specifies the Accent 2 theme color. + +ACCENT_3 + Specifies the Accent 3 theme color. + +ACCENT_4 + Specifies the Accent 4 theme color. + +ACCENT_5 + Specifies the Accent 5 theme color. + +ACCENT_6 + Specifies the Accent 6 theme color. + +BACKGROUND_1 + Specifies the Background 1 theme color. + +BACKGROUND_2 + Specifies the Background 2 theme color. + +DARK_1 + Specifies the Dark 1 theme color. + +DARK_2 + Specifies the Dark 2 theme color. + +FOLLOWED_HYPERLINK + Specifies the theme color for a clicked hyperlink. + +HYPERLINK + Specifies the theme color for a hyperlink. + +LIGHT_1 + Specifies the Light 1 theme color. + +LIGHT_2 + Specifies the Light 2 theme color. + +TEXT_1 + Specifies the Text 1 theme color. + +TEXT_2 + Specifies the Text 2 theme color. + +MIXED + Indicates multiple theme colors are used.
diff --git a/docs/api/enum/WdBuiltinStyle.rst b/docs/api/enum/WdBuiltinStyle.rst new file mode 100644 index 000000000..b7aa682d4 --- /dev/null +++ b/docs/api/enum/WdBuiltinStyle.rst @@ -0,0 +1,415 @@ +.. _WdBuiltinStyle: + +``WD_BUILTIN_STYLE`` +==================== + +alias: **WD_STYLE** + +Specifies a built-in Microsoft Word style. + +Example:: + + from docx import Document + from docx.enum.style import WD_STYLE + + document = Document() + styles = document.styles + style = styles[WD_STYLE.BODY_TEXT] + +---- + +BLOCK_QUOTATION + Block Text. + +BODY_TEXT + Body Text. + +BODY_TEXT_2 + Body Text 2. + +BODY_TEXT_3 + Body Text 3. + +BODY_TEXT_FIRST_INDENT + Body Text First Indent. + +BODY_TEXT_FIRST_INDENT_2 + Body Text First Indent 2. + +BODY_TEXT_INDENT + Body Text Indent. + +BODY_TEXT_INDENT_2 + Body Text Indent 2. + +BODY_TEXT_INDENT_3 + Body Text Indent 3. + +BOOK_TITLE + Book Title. + +CAPTION + Caption. + +CLOSING + Closing. + +COMMENT_REFERENCE + Comment Reference. + +COMMENT_TEXT + Comment Text. + +DATE + Date. + +DEFAULT_PARAGRAPH_FONT + Default Paragraph Font. + +EMPHASIS + Emphasis. + +ENDNOTE_REFERENCE + Endnote Reference. + +ENDNOTE_TEXT + Endnote Text. + +ENVELOPE_ADDRESS + Envelope Address. + +ENVELOPE_RETURN + Envelope Return. + +FOOTER + Footer. + +FOOTNOTE_REFERENCE + Footnote Reference. + +FOOTNOTE_TEXT + Footnote Text. + +HEADER + Header. + +HEADING_1 + Heading 1. + +HEADING_2 + Heading 2. + +HEADING_3 + Heading 3. + +HEADING_4 + Heading 4. + +HEADING_5 + Heading 5. + +HEADING_6 + Heading 6. + +HEADING_7 + Heading 7. + +HEADING_8 + Heading 8. + +HEADING_9 + Heading 9. + +HTML_ACRONYM + HTML Acronym. + +HTML_ADDRESS + HTML Address. + +HTML_CITE + HTML Cite. + +HTML_CODE + HTML Code. + +HTML_DFN + HTML Definition. + +HTML_KBD + HTML Keyboard. + +HTML_NORMAL + Normal (Web). + +HTML_PRE + HTML Preformatted. + +HTML_SAMP + HTML Sample. + +HTML_TT + HTML Typewriter. + +HTML_VAR + HTML Variable. + +HYPERLINK + Hyperlink. 
+ +HYPERLINK_FOLLOWED + Followed Hyperlink. + +INDEX_1 + Index 1. + +INDEX_2 + Index 2. + +INDEX_3 + Index 3. + +INDEX_4 + Index 4. + +INDEX_5 + Index 5. + +INDEX_6 + Index 6. + +INDEX_7 + Index 7. + +INDEX_8 + Index 8. + +INDEX_9 + Index 9. + +INDEX_HEADING + Index Heading + +INTENSE_EMPHASIS + Intense Emphasis. + +INTENSE_QUOTE + Intense Quote. + +INTENSE_REFERENCE + Intense Reference. + +LINE_NUMBER + Line Number. + +LIST + List. + +LIST_2 + List 2. + +LIST_3 + List 3. + +LIST_4 + List 4. + +LIST_5 + List 5. + +LIST_BULLET + List Bullet. + +LIST_BULLET_2 + List Bullet 2. + +LIST_BULLET_3 + List Bullet 3. + +LIST_BULLET_4 + List Bullet 4. + +LIST_BULLET_5 + List Bullet 5. + +LIST_CONTINUE + List Continue. + +LIST_CONTINUE_2 + List Continue 2. + +LIST_CONTINUE_3 + List Continue 3. + +LIST_CONTINUE_4 + List Continue 4. + +LIST_CONTINUE_5 + List Continue 5. + +LIST_NUMBER + List Number. + +LIST_NUMBER_2 + List Number 2. + +LIST_NUMBER_3 + List Number 3. + +LIST_NUMBER_4 + List Number 4. + +LIST_NUMBER_5 + List Number 5. + +LIST_PARAGRAPH + List Paragraph. + +MACRO_TEXT + Macro Text. + +MESSAGE_HEADER + Message Header. + +NAV_PANE + Document Map. + +NORMAL + Normal. + +NORMAL_INDENT + Normal Indent. + +NORMAL_OBJECT + Normal (applied to an object). + +NORMAL_TABLE + Normal (applied within a table). + +NOTE_HEADING + Note Heading. + +PAGE_NUMBER + Page Number. + +PLAIN_TEXT + Plain Text. + +QUOTE + Quote. + +SALUTATION + Salutation. + +SIGNATURE + Signature. + +STRONG + Strong. + +SUBTITLE + Subtitle. + +SUBTLE_EMPHASIS + Subtle Emphasis. + +SUBTLE_REFERENCE + Subtle Reference. + +TABLE_COLORFUL_GRID + Colorful Grid. + +TABLE_COLORFUL_LIST + Colorful List. + +TABLE_COLORFUL_SHADING + Colorful Shading. + +TABLE_DARK_LIST + Dark List. + +TABLE_LIGHT_GRID + Light Grid. + +TABLE_LIGHT_GRID_ACCENT_1 + Light Grid Accent 1. + +TABLE_LIGHT_LIST + Light List. + +TABLE_LIGHT_LIST_ACCENT_1 + Light List Accent 1. + +TABLE_LIGHT_SHADING + Light Shading. 
+ +TABLE_LIGHT_SHADING_ACCENT_1 + Light Shading Accent 1. + +TABLE_MEDIUM_GRID_1 + Medium Grid 1. + +TABLE_MEDIUM_GRID_2 + Medium Grid 2. + +TABLE_MEDIUM_GRID_3 + Medium Grid 3. + +TABLE_MEDIUM_LIST_1 + Medium List 1. + +TABLE_MEDIUM_LIST_1_ACCENT_1 + Medium List 1 Accent 1. + +TABLE_MEDIUM_LIST_2 + Medium List 2. + +TABLE_MEDIUM_SHADING_1 + Medium Shading 1. + +TABLE_MEDIUM_SHADING_1_ACCENT_1 + Medium Shading 1 Accent 1. + +TABLE_MEDIUM_SHADING_2 + Medium Shading 2. + +TABLE_MEDIUM_SHADING_2_ACCENT_1 + Medium Shading 2 Accent 1. + +TABLE_OF_AUTHORITIES + Table of Authorities. + +TABLE_OF_FIGURES + Table of Figures. + +TITLE + Title. + +TOAHEADING + TOA Heading. + +TOC_1 + TOC 1. + +TOC_2 + TOC 2. + +TOC_3 + TOC 3. + +TOC_4 + TOC 4. + +TOC_5 + TOC 5. + +TOC_6 + TOC 6. + +TOC_7 + TOC 7. + +TOC_8 + TOC 8. + +TOC_9 + TOC 9. diff --git a/docs/api/enum/WdLineSpacing.rst b/docs/api/enum/WdLineSpacing.rst new file mode 100644 index 000000000..b03e7dd17 --- /dev/null +++ b/docs/api/enum/WdLineSpacing.rst @@ -0,0 +1,36 @@ +.. _WdLineSpacing: + +``WD_LINE_SPACING`` +=================== + +Specifies a line spacing format to be applied to a paragraph. + +Example:: + + from docx.enum.text import WD_LINE_SPACING + + paragraph = document.add_paragraph() + paragraph.line_spacing_rule = WD_LINE_SPACING.EXACTLY + +---- + +ONE_POINT_FIVE + Space-and-a-half line spacing. + +AT_LEAST + Line spacing is always at least the specified amount. The amount is + specified separately. + +DOUBLE + Double spaced. + +EXACTLY + Line spacing is exactly the specified amount. The amount is specified + separately. + +MULTIPLE + Line spacing is specified as a multiple of line heights. Changing the font + size will change the line spacing proportionately. + +SINGLE + Single spaced (default). diff --git a/docs/api/enum/WdRowAlignment.rst b/docs/api/enum/WdRowAlignment.rst new file mode 100644 index 000000000..4459df5d3 --- /dev/null +++ b/docs/api/enum/WdRowAlignment.rst @@ -0,0 +1,24 @@ +.. 
_WdRowAlignment: + +``WD_TABLE_ALIGNMENT`` +====================== + +Specifies table justification type. + +Example:: + + from docx.enum.table import WD_TABLE_ALIGNMENT + + table = document.add_table(3, 3) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + +---- + +LEFT + Left-aligned + +CENTER + Center-aligned. + +RIGHT + Right-aligned. diff --git a/docs/api/enum/WdStyleType.rst b/docs/api/enum/WdStyleType.rst new file mode 100644 index 000000000..4a4a3213b --- /dev/null +++ b/docs/api/enum/WdStyleType.rst @@ -0,0 +1,29 @@ +.. _WdStyleType: + +``WD_STYLE_TYPE`` +================= + +Specifies one of the four style types: paragraph, character, list, or +table. + +Example:: + + from docx import Document + from docx.enum.style import WD_STYLE_TYPE + + styles = Document().styles + assert styles[0].type == WD_STYLE_TYPE.PARAGRAPH + +---- + +CHARACTER + Character style. + +LIST + List style. + +PARAGRAPH + Paragraph style. + +TABLE + Table style. diff --git a/docs/api/enum/WdTableDirection.rst b/docs/api/enum/WdTableDirection.rst new file mode 100644 index 000000000..9a7b66c45 --- /dev/null +++ b/docs/api/enum/WdTableDirection.rst @@ -0,0 +1,24 @@ +.. _WdTableDirection: + +``WD_TABLE_DIRECTION`` +====================== + +Specifies the direction in which an application orders cells in the +specified table or row. + +Example:: + + from docx.enum.table import WD_TABLE_DIRECTION + + table = document.add_table(3, 3) + table.direction = WD_TABLE_DIRECTION.RTL + +---- + +LTR + The table or row is arranged with the first column in the leftmost + position. + +RTL + The table or row is arranged with the first column in the rightmost + position. diff --git a/docs/api/enum/index.rst b/docs/api/enum/index.rst index 576f45856..cd0cba8e1 100644 --- a/docs/api/enum/index.rst +++ b/docs/api/enum/index.rst @@ -8,7 +8,14 @@ can be found here: .. 
toctree:: :titlesonly: + MsoColorType + MsoThemeColorIndex WdAlignParagraph + WdBuiltinStyle + WdLineSpacing WdOrientation WdSectionStart + WdStyleType + WdRowAlignment + WdTableDirection WdUnderline diff --git a/docs/api/section.rst b/docs/api/section.rst index 478f80423..9aeb6ca1a 100644 --- a/docs/api/section.rst +++ b/docs/api/section.rst @@ -1,14 +1,21 @@ .. _section_api: + Section objects =============== Provides access to section properties such as margins and page orientation. +|Sections| objects +------------------ + .. currentmodule:: docx.section +.. autoclass:: Sections + :members: + |Section| objects ----------------- diff --git a/docs/api/shape.rst b/docs/api/shape.rst index 0ce406b3d..200b34977 100644 --- a/docs/api/shape.rst +++ b/docs/api/shape.rst @@ -4,7 +4,7 @@ Shape-related objects ===================== -.. currentmodule:: docx.parts.document +.. currentmodule:: docx.shape |InlineShapes| objects @@ -12,9 +12,7 @@ Shape-related objects .. autoclass:: InlineShapes :members: - - -.. currentmodule:: docx.shape + :exclude-members: add_picture |InlineShape| objects diff --git a/docs/api/shared.rst b/docs/api/shared.rst index 161abfb4f..215e5338c 100644 --- a/docs/api/shared.rst +++ b/docs/api/shared.rst @@ -35,5 +35,25 @@ allowing values to be expressed in the units most appropriate to the context. .. autoclass:: Mm :members: +.. autoclass:: Pt + :members: + +.. autoclass:: Twips + :members: + .. autoclass:: Emu :members: + + +|RGBColor| objects +------------------ + +.. autoclass:: RGBColor(r, g, b) + :members: + :undoc-members: + + *r*, *g*, and *b* are each an integer in the range 0-255 inclusive. Using + the hexadecimal integer notation, e.g. `0x42`, may enhance readability + where hex RGB values are in use:: + + >>> lavender = RGBColor(0xff, 0x99, 0xcc) diff --git a/docs/api/style.rst b/docs/api/style.rst new file mode 100644 index 000000000..e1647caac --- /dev/null +++ b/docs/api/style.rst @@ -0,0 +1,97 @@ + +.. 
_style_api: + +Style-related objects +===================== + +A style is used to collect a set of formatting properties under a single name +and apply those properties to a content object all at once. This promotes +formatting consistency throughout a document and across related documents +and allows formatting changes to be made globally by changing the definition +in the appropriate style. + + +|Styles| objects +---------------- + +.. currentmodule:: docx.styles.styles + +.. autoclass:: Styles() + :members: + :inherited-members: + :exclude-members: + get_by_id, get_style_id, part + + +|BaseStyle| objects +------------------- + +.. currentmodule:: docx.styles.style + +.. autoclass:: BaseStyle() + :members: + :inherited-members: + :exclude-members: + part, style_id + + +|_CharacterStyle| objects +------------------------- + +.. autoclass:: _CharacterStyle() + :show-inheritance: + :members: + :inherited-members: + :exclude-members: + element, part, style_id, type + + +|_ParagraphStyle| objects +------------------------- + +.. autoclass:: _ParagraphStyle() + :show-inheritance: + :members: + :inherited-members: + :exclude-members: + element, part, style_id, type + + +|_TableStyle| objects +--------------------- + +.. autoclass:: _TableStyle() + :show-inheritance: + :members: + :inherited-members: + :exclude-members: + element, part, style_id, type + + +|_NumberingStyle| objects +------------------------- + +.. autoclass:: _NumberingStyle() + :members: + + +|LatentStyles| objects +---------------------- + +.. currentmodule:: docx.styles.latent + +.. autoclass:: LatentStyles() + :members: + :inherited-members: + :exclude-members: + part + + +|_LatentStyle| objects +---------------------- + +.. 
autoclass:: _LatentStyle() + :members: + :inherited-members: + :exclude-members: + part diff --git a/docs/api/table.rst b/docs/api/table.rst index e3c9da952..215bf807c 100644 --- a/docs/api/table.rst +++ b/docs/api/table.rst @@ -15,6 +15,7 @@ Table objects are constructed using the ``add_table()`` method on |Document|. .. autoclass:: Table :members: + :exclude-members: table |_Cell| objects diff --git a/docs/api/text.rst b/docs/api/text.rst index cdb55ff61..180701bb4 100644 --- a/docs/api/text.rst +++ b/docs/api/text.rst @@ -4,18 +4,30 @@ Text-related objects ==================== -.. currentmodule:: docx.text - |Paragraph| objects ------------------- -.. autoclass:: Paragraph +.. autoclass:: docx.text.paragraph.Paragraph() + :members: + + +|ParagraphFormat| objects +------------------------- + +.. autoclass:: docx.text.parfmt.ParagraphFormat() :members: |Run| objects ------------- -.. autoclass:: Run +.. autoclass:: docx.text.run.Run() + :members: + + +|Font| objects +-------------- + +.. autoclass:: docx.text.run.Font() :members: diff --git a/docs/conf.py b/docs/conf.py index 5fb91ca12..ec5367370 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -69,61 +69,107 @@ rst_epilog = """ .. |api-Document| replace:: :class:`docx.api.Document` -.. |_Body| replace:: :class:`_Body` +.. |AttributeError| replace:: :exc:`.AttributeError` -.. |_Cell| replace:: :class:`_Cell` +.. |BaseStyle| replace:: :class:`.BaseStyle` -.. |_Column| replace:: :class:`_Column` +.. |_Body| replace:: :class:`._Body` -.. |_Columns| replace:: :class:`_Columns` +.. |_Cell| replace:: :class:`._Cell` + +.. |_CharacterStyle| replace:: :class:`._CharacterStyle` + +.. |Cm| replace:: :class:`.Cm` + +.. |ColorFormat| replace:: :class:`.ColorFormat` + +.. |_Column| replace:: :class:`._Column` + +.. |_Columns| replace:: :class:`._Columns` + +.. |CoreProperties| replace:: :class:`.CoreProperties` + +.. |datetime| replace:: :class:`.datetime.datetime` .. |Document| replace:: :class:`.Document` +.. 
|DocumentPart| replace:: :class:`.DocumentPart` + .. |docx| replace:: ``python-docx`` .. |Emu| replace:: :class:`.Emu` -.. |False| replace:: ``False`` +.. |False| replace:: :class:`False` + +.. |float| replace:: :class:`.float` + +.. |Font| replace:: :class:`.Font` + +.. |Inches| replace:: :class:`.Inches` .. |InlineShape| replace:: :class:`.InlineShape` .. |InlineShapes| replace:: :class:`.InlineShapes` -.. |int| replace:: :class:`int` +.. |InvalidSpanError| replace:: :class:`.InvalidSpanError` + +.. |int| replace:: :class:`.int` + +.. |_LatentStyle| replace:: :class:`._LatentStyle` + +.. |LatentStyles| replace:: :class:`.LatentStyles` .. |Length| replace:: :class:`.Length` -.. |OpcPackage| replace:: :class:`OpcPackage` +.. |None| replace:: :class:`.None` + +.. |NumberingPart| replace:: :class:`.NumberingPart` -.. |None| replace:: ``None`` +.. |_NumberingStyle| replace:: :class:`._NumberingStyle` -.. |NumberingPart| replace:: :class:`NumberingPart` +.. |OpcPackage| replace:: :class:`.OpcPackage` .. |Paragraph| replace:: :class:`.Paragraph` -.. |Part| replace:: :class:`Part` +.. |ParagraphFormat| replace:: :class:`.ParagraphFormat` + +.. |_ParagraphStyle| replace:: :class:`._ParagraphStyle` + +.. |Part| replace:: :class:`.Part` + +.. |Pt| replace:: :class:`.Pt` -.. |_Relationship| replace:: :class:`_Relationship` +.. |_Relationship| replace:: :class:`._Relationship` -.. |Relationships| replace:: :class:`_Relationships` +.. |Relationships| replace:: :class:`._Relationships` -.. |_Row| replace:: :class:`_Row` +.. |RGBColor| replace:: :class:`.RGBColor` -.. |_Rows| replace:: :class:`_Rows` +.. |_Row| replace:: :class:`._Row` -.. |Run| replace:: :class:`Run` +.. |_Rows| replace:: :class:`._Rows` + +.. |Run| replace:: :class:`.Run` + +.. |Hyperlink| replace:: :class:`Hyperlink` .. |Section| replace:: :class:`.Section` .. |Sections| replace:: :class:`.Sections` +.. |str| replace:: :class:`.str` + +.. |Styles| replace:: :class:`.Styles` + .. 
|StylesPart| replace:: :class:`.StylesPart` .. |Table| replace:: :class:`.Table` -.. |Text| replace:: :class:`Text` +.. |_TableStyle| replace:: :class:`._TableStyle` + +.. |_Text| replace:: :class:`._Text` -.. |True| replace:: ``True`` +.. |True| replace:: :class:`True` .. |ValueError| replace:: :class:`ValueError` """ diff --git a/docs/dev/analysis/features/cell-merge.rst b/docs/dev/analysis/features/cell-merge.rst new file mode 100644 index 000000000..2b432dfbf --- /dev/null +++ b/docs/dev/analysis/features/cell-merge.rst @@ -0,0 +1,572 @@ + +Table - Merge Cells +=================== + +Word allows contiguous table cells to be merged, such that two or more cells +appear to be a single cell. Cells can be merged horizontally (spanning +multiple columns) or vertically (spanning multiple rows). Cells can also be +merged both horizontally and vertically at the same time, producing a cell +that spans both rows and columns. Only rectangular ranges of cells can be +merged. + + +Table diagrams +-------------- + +Diagrams like the one below are used to depict tables in this analysis. +Horizontal spans are depicted as a continuous horizontal cell without +vertical dividers within the span. Vertical spans are depicted as a vertical +sequence of cells of the same width where continuation cells are separated by +a dashed top border and contain a caret ('^') to symbolize the continuation +of the cell above. Cell 'addresses' are depicted at the column and row grid +lines. This is conceptually convenient as it reuses the notion of list +indices (and slices) and makes certain operations more intuitive to specify.
+The merged cell `A` below has top, left, bottom, and right values of 0, 0, 2, +and 2 respectively:: + + \ 0 1 2 3 + 0 +---+---+---+ + | A | | + 1 + - - - +---+ + | ^ | | + 2 +---+---+---+ + | | | | + 3 +---+---+---+ + + +Basic cell access protocol +-------------------------- + +There are three ways to access a table cell: + +* ``Table.cell(row_idx, col_idx)`` +* ``Row.cells[col_idx]`` +* ``Column.cells[row_idx]`` + + +Accessing the middle cell of a 3 x 3 table:: + + >>> table = document.add_table(3, 3) + >>> middle_cell = table.cell(1, 1) + >>> table.rows[1].cells[1] == middle_cell + True + >>> table.columns[1].cells[1] == middle_cell + True + + +Basic merge protocol +-------------------- + +A merge is specified using two diagonal cells:: + + >>> table = document.add_table(3, 3) + >>> a = table.cell(0, 0) + >>> b = table.cell(1, 1) + >>> A = a.merge(b) + +:: + + \ 0 1 2 3 + 0 +---+---+---+ +---+---+---+ + | a | | | | A | | + 1 +---+---+---+ + - - - +---+ + | | b | | --> | ^ | | + 2 +---+---+---+ +---+---+---+ + | | | | | | | | + 3 +---+---+---+ +---+---+---+ + + +Accessing a merged cell +----------------------- + +A cell is accessed by its "layout grid" position regardless of any spans that +may be present. A grid address that falls in a span returns the top-leftmost +cell in that span. This means a span has as many addresses as layout grid +cells it spans. For example, the merged cell `A` above can be addressed as +(0, 0), (0, 1), (1, 0), or (1, 1). This addressing scheme leads to desirable +access behaviors when spans are present in the table. + +The length of Row.cells is always equal to the number of grid columns, +regardless of any spans that are present. Likewise, the length of +Column.cells is always equal to the number of table rows, regardless of any +spans.
+ +:: + + >>> table = document.add_table(2, 3) + >>> row = table.rows[0] + >>> len(row.cells) + 3 + >>> row.cells[0] == row.cells[1] + False + + >>> a, b = row.cells[:2] + >>> a.merge(b) + + >>> len(row.cells) + 3 + >>> row.cells[0] == row.cells[1] + True + +:: + + \ 0 1 2 3 + 0 +---+---+---+ +---+---+---+ + | a | b | | | A | | + 1 +---+---+---+ --> +---+---+---+ + | | | | | | | | + 2 +---+---+---+ +---+---+---+ + + +Cell content behavior on merge +------------------------------ + +When two or more cells are merged, any existing content is concatenated and +placed in the resulting merged cell. Content from each original cell is +separated from that in the prior original cell by a paragraph mark. An +original cell having no content is skipped in the concatenation process. In +Python, the procedure would look roughly like this:: + + merged_cell_text = '\n'.join( + cell.text for cell in original_cells if cell.text + ) + +Merging four cells with content ``'a'``, ``'b'``, ``''``, and ``'d'`` +respectively results in a merged cell having text ``'a\nb\nd'``. + + +Cell size behavior on merge +--------------------------- + +Cell width and height, if present, are added when cells are merged:: + + >>> a, b = row.cells[:2] + >>> a.width.inches, b.width.inches + (1.0, 1.0) + >>> A = a.merge(b) + >>> A.width.inches + 2.0 + + +Removing a redundant row or column +---------------------------------- + +**Collapsing a column.** When all cells in a grid column share the same +``w:gridSpan`` specification, the spanned columns can be collapsed into +a single column by removing the ``w:gridSpan`` attributes. + + +Word behavior +------------- + +* Row and Column access in the MS API just plain breaks when the table is not + uniform. `Table.Rows(n)` and `Cell.Row` raise `EnvironmentError` when + a table contains a vertical span, and `Table.Columns(n)` and `Cell.Column` + unconditionally raise `EnvironmentError` when the table contains + a horizontal span. We can do better.
+ +* `Table.Cell(n, m)` works on any non-uniform table, although it uses + a *visual grid* that greatly complicates access. It raises an error for `n` + or `m` out of visual range, and provides no way other than try/except to + determine what that visual range is, since `Row.Count` and `Column.Count` + are unavailable. + +* In a merge operation, the text of the continuation cells is appended to + that of the origin cell as separate paragraph(s). + +* If a merge range contains previously merged cells, the range must + completely enclose the merged cells. + +* Word resizes a table (adds rows) when a cell is referenced by an + out-of-bounds row index. If the column identifier is out of bounds, an + exception is raised. This behavior will not be implemented in |docx|. + + +Glossary +-------- + +layout grid + The regular two-dimensional matrix of rows and columns that determines + the layout of cells in the table. The grid is primarily defined by the + `w:gridCol` elements that define the layout columns for the table. Each + row essentially duplicates that layout for an additional row, although + its height can differ from other rows. Every actual cell in the table + must begin and end on a layout grid "line", whether the cell is merged or + not. + +span + The single "combined" cell occupying the area of a set of merged cells. + +skipped cell + The WordprocessingML (WML) spec allows for 'skipped' cells, where + a layout cell location contains no actual cell. I can't find a way to + make a table like this using the Word UI and haven't experimented yet to + see whether Word will load one constructed by hand in the XML. + +uniform table + A table in which each cell corresponds exactly to a layout cell. + A uniform table contains no spans or skipped cells. + +non-uniform table + A table that contains one or more spans, such that not every cell + corresponds to a single layout cell. 
I suppose it would apply when there + was one or more skipped cells too, but in this analysis the term is only + used to indicate a table with one or more spans. + +uniform cell + A cell not part of a span, occupying a single cell in the layout grid. + +origin cell + The top-leftmost cell in a span. Contrast with *continuation cell*. + +continuation cell + A layout cell that has been subsumed into a span. A continuation cell is + mostly an abstract concept, although a actual `w:tc` element will always + exist in the XML for each continuation cell in a vertical span. + + +Understanding merge XML intuitively +----------------------------------- + +A key insight is that merged cells always look like the diagram below. +Horizontal spans are accomplished with a single `w:tc` element in each row, +using the `gridSpan` attribute to span additional grid columns. Vertical +spans are accomplished with an identical cell in each continuation row, +having the same `gridSpan` value, and having vMerge set to `continue` (the +default). These vertical continuation cells are depicted in the diagrams +below with a dashed top border and a caret ('^') in the left-most grid column +to symbolize the continuation of the cell above.:: + + \ 0 1 2 3 + 0 +---+---+---+ + | A | | + 1 + - - - +---+ + | ^ | | + 2 +---+---+---+ + | | | | + 3 +---+---+---+ + +.. highlight:: xml + +The table depicted above corresponds to this XML (minimized for clarity):: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +XML Semantics +------------- + +In a horizontal merge, the ```` attribute indicates the +number of columns the cell should span. Only the leftmost cell is preserved; +the remaining cells in the merge are deleted. + +For merging vertically, the ``w:vMerge`` table cell property of the uppermost +cell of the column is set to the value "restart" of type ``w:ST_Merge``. 
The +following, lower cells included in the vertical merge must have the +``w:vMerge`` element present in their cell property (``w:TcPr``) element. Its +value should be set to "continue", although it is not necessary to +explicitly define it, as it is the default value. A vertical merge ends as +soon as a cell ``w:TcPr`` element lacks the ``w:vMerge`` element. Similarly +to the ``w:gridSpan`` element, the ``w:vMerge`` elements are only required +when the table's layout is not uniform across its different columns. In the +case it is, only the topmost cell is kept; the other lower cells in the +merged area are deleted along with their ``w:vMerge`` elements and the +``w:trHeight`` table row property is used to specify the combined height of +the merged cells. + + +len() implementation for Row.cells and Column.cells +--------------------------------------------------- + +Each ``Row`` and ``Column`` object provides access to the collection of cells +it contains. The length of these cell collections is unaffected by the +presence of merged cells. + +`len()` always bases its count on the layout grid, as though there were no +merged cells. + +* ``len(Table.columns)`` is the number of `w:gridCol` elements, representing + the number of grid columns, without regard to the presence of merged cells + in the table. + +* ``len(Table.rows)`` is the number of `w:tr` elements, regardless of any + merged cells that may be present in the table. + +* ``len(Row.cells)`` is the number of grid columns, regardless of whether any + cells in the row are merged. + +* ``len(Column.cells)`` is the number of rows in the table, regardless of + whether any cells in the column are merged. + + +Merging a cell already containing a span +---------------------------------------- + +One or both of the "diagonal corner" cells in a merge operation may itself be +a merged cell, as long as the specified region is rectangular.
+ +For example:: + + \ 0 1 2 3 + +---+---+---+---+ +---+---+---+---+ + 0 | a | b | | | a\nb\nC | | + + - - - +---+---+ + - - - - - +---+ + 1 | ^ | C | | | ^ | | + +---+---+---+---+ --> +---+---+---+---+ + 2 | | | | | | | | | | + +---+---+---+---+ +---+---+---+---+ + 3 | | | | | | | | | | + +---+---+---+---+ +---+---+---+---+ + + cell(0, 0).merge(cell(1, 2)) + +or:: + + 0 1 2 3 4 + +---+---+---+---+---+ +---+---+---+---+---+ + 0 | a | b | c | | | abcD | | + + - - - +---+---+---+ + - - - - - - - +---+ + 1 | ^ | D | | | ^ | | + +---+---+---+---+---+ --> +---+---+---+---+---+ + 2 | | | | | | | | | | | | + +---+ - - - +---+---+ +---+---+---+---+---+ + 3 | | | | | | | | | | | | + +---+---+---+---+---+ +---+---+---+---+---+ + + cell(0, 0).merge(cell(1, 2)) + + +Conversely, either of these two merge operations would be illegal:: + + \ 0 1 2 3 4 0 1 2 3 4 + 0 +---+---+---+---+ 0 +---+---+---+---+ + | | | b | | | | | | | + 1 +---+---+ - +---+ 1 +---+---+---+---+ + | | a | ^ | | | | a | | | + 2 +---+---+ - +---+ 2 +---+---+---+---+ + | | | ^ | | | b | | + 3 +---+---+---+---+ 3 +---+---+---+---+ + | | | | | | | | | | + 4 +---+---+---+---+ 4 +---+---+---+---+ + + a.merge(b) + + +General algorithm +~~~~~~~~~~~~~~~~~ + +* find top-left and target width, height +* for each tr in target height, tc.grow_right(target_width) + + +Specimen XML +------------ + +.. highlight:: xml + +A 3 x 3 table where an area defined by the 2 x 2 topleft cells has been +merged, demonstrating the combined use of the ``w:gridSpan`` as well as the +``w:vMerge`` elements, as produced by Word:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Schema excerpt +-------------- + +.. 
highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Open Issues +----------- + +* Does Word allow "skipped" cells at the beginning of a row (`w:gridBefore` + element)? These are described in the spec, but I don't see a way in the + Word UI to create such a table. + + +Ressources +---------- + +* `Cell.Merge Method on MSDN`_ + +.. _`Cell.Merge Method on MSDN`: + http://msdn.microsoft.com/en-us/library/office/ff821310%28v=office.15%29.aspx + +Relevant sections in the ISO Spec +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* 17.4.17 gridSpan (Grid Columns Spanned by Current Table Cell) +* 17.4.84 vMerge (Vertically Merged Cell) +* 17.18.57 ST_Merge (Merged Cell Type) diff --git a/docs/dev/analysis/features/char-style.rst b/docs/dev/analysis/features/char-style.rst deleted file mode 100644 index 9b62b9983..000000000 --- a/docs/dev/analysis/features/char-style.rst +++ /dev/null @@ -1,138 +0,0 @@ - -Character Style -=============== - -Word allows a set of run-level properties to be given a name. The set of -properties is called a *character style*. All the settings may be applied to -a run in a single action by setting the style of the run. - -Example: - - The normal font of a document is 10 point Times Roman. From time to time, - a Python class name appears in-line in the text. These short runs of - Python text are to appear in 9 point Courier. A character style named "Code" - is defined such that these words or phrases can be set to the distinctive - font and size in a single step. - - Later, it is decided that 10 point Menlo should be used for inline code - instead. The "Code" character style is updated to the new settings and all - instances of inline code in the document immediately appear in the new - font. 
- - -Protocol --------- - -There are two call protocols related to character style: getting and setting -the character style of a run, and specifying a style when creating a run. - -Getting and setting the style of a run:: - - >>> run = p.add_run() - >>> run.style - None - >>> run.style = 'Emphasis' - >>> run.style - 'Emphasis' - >>> run.style = None - >>> run.style - None - -Assigning |None| to ``Run.style`` causes any applied character style to be -removed. A run without a character style inherits the character style of its -containing paragraph. - -Specifying the style of a run on creation:: - - >>> run = p.add_run() - >>> run.style - None - >>> run = p.add_run(style='Emphasis') - >>> run.style - 'Emphasis' - >>> run = p.add_run('text in this run', 'Strong') - >>> run.style - 'Strong' - - - -Specimen XML ------------- - -.. highlight:: xml - -A baseline regular run:: - - - - This is a regular paragraph. - - - -Adding *Emphasis* character style:: - - - - - - - This paragraph appears in Emphasis character style. - - - -A style that appears in the Word user interface (UI) with one or more spaces -in its name, such as "Subtle Emphasis", will generally have a style ID with -those spaces removed. In this example, "Subtle Emphasis" becomes -"SubtleEmphasis":: - - - - - - - a few words in Subtle Emphasis style - - - - - -Schema excerpt --------------- - -.. highlight:: xml - -:: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/dev/analysis/features/coreprops.rst b/docs/dev/analysis/features/coreprops.rst new file mode 100644 index 000000000..d4100864d --- /dev/null +++ b/docs/dev/analysis/features/coreprops.rst @@ -0,0 +1,199 @@ + +Core Document Properties +======================== + +The Open XML format provides for a set of descriptive properties to be +maintained with each document. One of these is the *core file properties*. 
+The core properties are common to all Open XML formats and appear in +document, presentation, and spreadsheet files. The 'Core' in core document +properties refers to `Dublin Core`_, a metadata standard that defines a core +set of elements to describe resources. + +The core properties are described in Part 2 of the ISO/IEC 29500 spec, in +Section 11. The names of some core properties in |docx| are changed from +those in the spec to conform to the MS API. + +Other properties such as company name are custom properties, held in +``app.xml``. + + +Candidate Protocol +------------------ + +:: + + >>> document = Document() + >>> core_properties = document.core_properties + >>> core_properties.author + 'python-docx' + >>> core_properties.author = 'Brian' + >>> core_properties.author + 'Brian' + + +Properties +---------- + +15 properties are supported. All unicode values are limited to 255 characters +(not bytes). + +author *(unicode)* + Note: named 'creator' in spec. An entity primarily responsible for making + the content of the resource. (Dublin Core) + +category *(unicode)* + A categorization of the content of this package. Example values for this + property might include: Resume, Letter, Financial Forecast, Proposal, + Technical Presentation, and so on. (Open Packaging Conventions) + +comments *(unicode)* + Note: named 'description' in spec. An explanation of the content of the + resource. Values might include an abstract, table of contents, reference + to a graphical representation of content, and a free-text account of the + content. (Dublin Core) + +content_status *(unicode)* + The status of the content. Values might include “Draft”, “Reviewed”, and + “Final”. (Open Packaging Conventions) + +created *(datetime)* + Date of creation of the resource. (Dublin Core) + +identifier *(unicode)* + An unambiguous reference to the resource within a given context. + (Dublin Core) + +keywords *(unicode)* + A delimited set of keywords to support searching and indexing. 
This is + typically a list of terms that are not available elsewhere in the + properties. (Open Packaging Conventions) + +language *(unicode)* + The language of the intellectual content of the resource. (Dublin Core) + +last_modified_by *(unicode)* + The user who performed the last modification. The identification is + environment-specific. Examples include a name, email address, or employee + ID. It is recommended that this value be as concise as possible. + (Open Packaging Conventions) + +last_printed *(datetime)* + The date and time of the last printing. (Open Packaging Conventions) + +modified *(datetime)* + Date on which the resource was changed. (Dublin Core) + +revision *(int)* + The revision number. This value might indicate the number of saves or + revisions, provided the application updates it after each revision. + (Open Packaging Conventions) + +subject *(unicode)* + The topic of the content of the resource. (Dublin Core) + +title *(unicode)* + The name given to the resource. (Dublin Core) + +version *(unicode)* + The version designator. This value is set by the user or by the + application. (Open Packaging Conventions) + + +Specimen XML +------------ + +.. highlight:: xml + +core.xml produced by Microsoft Word:: + + + + Core Document Properties Exploration + PowerPoint core document properties + Steve Canny + powerpoint; open xml; dublin core; microsoft office + + One thing I'd like to discover is just how line wrapping is handled + in the comments. This paragraph is all on a single + line._x000d__x000d_This is a second paragraph separated from the + first by two line feeds. + + Steve Canny + 2 + 2013-04-06T06:03:36Z + 2013-06-15T06:09:18Z + analysis + + + +Schema Excerpt +-------------- + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +.. 
_Dublin Core: + http://en.wikipedia.org/wiki/Dublin_Core diff --git a/docs/dev/analysis/features/par-alignment.rst b/docs/dev/analysis/features/par-alignment.rst deleted file mode 100644 index 48c29cf67..000000000 --- a/docs/dev/analysis/features/par-alignment.rst +++ /dev/null @@ -1,174 +0,0 @@ - -Paragraph alignment -=================== - -In Word, each paragraph has an *alignment* attribute that specifies how to -justify the lines of the paragraph when the paragraph is laid out on the -page. Common values are left, right, centered, and justified. - - -Protocol --------- - -The protocol for getting and setting paragraph alignment is illustrated in -this interactive session:: - - >>> paragraph = body.add_paragraph() - >>> paragraph.alignment - None - >>> paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT - >>> paragraph.alignment - RIGHT (2) - >>> paragraph.alignment = None - >>> paragraph.alignment - None - - -Semantics ---------- - -If the ```` element is not present on a paragraph, the alignment value -for that paragraph is inherited from its style hierarchy. If the element is -present, its value overrides any inherited value. From the API, a value of -|None| on the ``Paragraph.alignment`` property corresponds to no ```` -element being present. If |None| is assigned to ``Paragraph.alignment``, the -```` element is removed. 
- - -Enumerations ------------- - -WD_ALIGN_PARAGRAPH -~~~~~~~~~~~~~~~~~~ - -`WdParagraphAlignment Enumeration on MSDN`_ - -+--------------+------+----------------+ -| Name | enum | attr | -+==============+======+================+ -| LEFT | 0 | left | -+--------------+------+----------------+ -| CENTER | 1 | center | -+--------------+------+----------------+ -| RIGHT | 2 | right | -+--------------+------+----------------+ -| JUSTIFY | 3 | both | -+--------------+------+----------------+ -| DISTRIBUTE | 4 | distribute | -+--------------+------+----------------+ -| JUSTIFY_MED | 5 | mediumKashida | -+--------------+------+----------------+ -| JUSTIFY_HI | 7 | highKashida | -+--------------+------+----------------+ -| JUSTIFY_LOW | 8 | lowKashida | -+--------------+------+----------------+ -| THAI_JUSTIFY | 9 | thaiDistribute | -+--------------+------+----------------+ - -.. _WdParagraphAlignment Enumeration on MSDN: - http://msdn.microsoft.com/en-us/library/office/ff835817(v=office.15).aspx - - -Specimen XML ------------- - -.. highlight:: xml - -A paragraph with inherited alignment:: - - - - Inherited paragraph alignment. - - - -A right-aligned paragraph:: - - - - - - - Right-aligned paragraph. - - - - -Schema excerpt --------------- - -:: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/dev/analysis/features/styles/character-style.rst b/docs/dev/analysis/features/styles/character-style.rst new file mode 100644 index 000000000..d06046daa --- /dev/null +++ b/docs/dev/analysis/features/styles/character-style.rst @@ -0,0 +1,161 @@ + +Character Style +=============== + +Word allows a set of run-level properties to be given a name. The set of +properties is called a *character style*. All the settings may be applied to +a run in a single action by setting the style of the run. 
+ + +Protocol +-------- + +There are two call protocols related to character style: getting and setting +the character style of a run, and specifying a style when creating a run. + +Get run style:: + + >>> run = p.add_run() + + >>> run.style + + >>> run.style.name + 'Default Paragraph Font' + +Set run style using character style name:: + + >>> run.style = 'Emphasis' + >>> run.style.name + 'Emphasis' + +Set run style using character style object:: + + >>> run.style = document.styles['Strong'] + >>> run.style.name + 'Strong' + +Assigning |None| to :attr:`.Run.style` causes any applied character style to +be removed. A run without a character style inherits the default character +style of the document:: + + >>> run.style = None + >>> run.style.name + 'Default Paragraph Font' + +Specifying the style of a run on creation:: + + >>> run = p.add_run(style='Strong') + >>> run.style.name + 'Strong' + + +Specimen XML +------------ + +.. highlight:: xml + +A baseline regular run:: + + + + This is a regular paragraph. + + + +Adding *Emphasis* character style:: + + + + + + + This paragraph appears in Emphasis character style. + + + +A style that appears in the Word user interface (UI) with one or more spaces +in its name, such as "Subtle Emphasis", will generally have a style ID with +those spaces removed. In this example, "Subtle Emphasis" becomes +"SubtleEmphasis":: + + + + + + + a few words in Subtle Emphasis style + + + + +Schema excerpt +-------------- + +.. highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/styles/index.rst b/docs/dev/analysis/features/styles/index.rst new file mode 100644 index 000000000..3cbfcbb27 --- /dev/null +++ b/docs/dev/analysis/features/styles/index.rst @@ -0,0 +1,330 @@ + +Styles +====== + +.. 
toctree:: + :titlesonly: + + styles + style + paragraph-style + character-style + latent-styles + +Word supports the definition of *styles* to allow a group of formatting +properties to be easily and consistently applied to a paragraph, run, table, +or numbering scheme, all at once. The mechanism is similar to how Cascading +Style Sheets (CSS) works with HTML. + +Styles are defined in the ``styles.xml`` package part and are keyed to +a paragraph, run, or table using the `styleId` string. + +Style visual behavior +--------------------- + +* **Sort order.** Built-in styles appear in order of the effective value of + their `uiPriority` attribute. By default, a custom style will not receive + a `uiPriority` attribute, causing its effective value to default to 0. This + will generally place custom styles at the top of the sort order. A set of + styles having the same `uiPriority` value will be sub-sorted in + alphabetical order. + + If a `uiPriority` attribute is defined for a custom style, that style is + interleaved with the built-in styles, according to their `uiPriority` + value. The `uiPriority` attribute takes a signed integer, and accepts + negative numbers. Note that Word does not allow the use of negative + integers via its UI; rather it allows the `uiPriority` number of built-in + types to be increased to produce the desired sorting behavior. + +* **Identification.** A style is identified by its name, not its styleId + attribute. The styleId is used only for internal linking of an object like + a paragraph to a style. The styleId may be changed by the application, and + in fact is routinely changed by Word on each save to be a transformation of + the name. + + *Hypothesis.* Word calculates the `styleId` by removing all spaces from the + style name. + +* **List membership.** There are four style list options in the styles panel: + + + *Recommended.* The recommended list contains all latent and defined + styles that have `semiHidden` == |False|.
+ + + *Styles in Use.* The styles-in-use list contains all styles that have + been applied to content in the document (implying they are defined) that + also have `semiHidden` == |False|. + + + *In Current Document.* The in-current-document list contains all defined + styles in the document having `semiHidden` == |False|. + + + *All Styles.* The all-styles list contains all latent and defined + styles in the document. + +* **Definition of built-in style.** When a built-in style is added to + a document (upon first use), the value of each of the `locked`, + `uiPriority` and `qFormat` attributes from its latent style definition (the + `latentStyles` attributes overridden by those of any `lsdException` + element) is used to override the corresponding value in the inserted style + definition from their built-in defaults. + +* Each built-in style has default attributes that can be revealed by setting + the `latentStyles/@count` attribute to 0 and inspecting the style in the + style manager. This may include default behavioral properties. + +* Anomaly. Style "No Spacing" does not appear in the recommended list even + though its behavioral attributes indicate it should. (Google indicates it + may be a legacy style from Word 2003). + +* Word has 267 built-in styles, listed here: + http://www.thedoctools.com/downloads/DocTools_List_Of_Built-in_Style_English_Danish_German_French.pdf + + Note that at least one other source puts the number at 276 rather than 267. + +* **Appearance in the Style Gallery.** A style appears in the style gallery + when: `semiHidden` == |False| and `qFormat` == |True| + + +Glossary +-------- + +built-in style + One of a set of standard styles known to Word, such as "Heading 1". + Built-in styles are presented in Word's style panel whether or not they + are actually defined in the styles part. + +latent style + A built-in style having no definition in a particular document is known + as a *latent style* in that document.
+ +style definition + A ```` element in the styles part that explicitly defines the + attributes of a style. + +recommended style list + A list of styles that appears in the styles toolbox or panel when + "Recommended" is selected from the "List:" dropdown box. + + +Word behavior +------------- + +If no style having an assigned style id is defined in the styles part, the +style application has no effect. + +Word does not add a formatting definition (```` element) for a +built-in style until it is used. + +Once a built-in style definition is present in the styles part, Word does not +remove it, even if it is no longer applied to any content. The definitions of +all styles ever used in a document accumulate in its ``styles.xml``. + + +Related MS API *(partial)* +-------------------------- + +* Document.Styles +* Styles.Add, .Item, .Count, access by name, e.g. Styles("Foobar") +* Style.BaseStyle +* Style.Builtin +* Style.Delete() +* Style.Description +* Style.Font +* Style.Linked +* Style.LinkStyle +* Style.LinkToListTemplate() +* Style.ListLevelNumber +* Style.ListTemplate +* Style.Locked +* Style.NameLocal +* Style.NameParagraphStyle +* Style.NoSpaceBetweenParagraphsOfSameStyle +* Style.ParagraphFormat +* Style.Priority +* Style.QuickStyle +* Style.Shading +* Style.Table(Style) +* Style.Type +* Style.UnhideWhenUsed +* Style.Visibility + + +Enumerations +------------ + +* WdBuiltinStyle + + +Example XML +----------- + +..
highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Schema excerpt +-------------- + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/styles/latent-styles.rst b/docs/dev/analysis/features/styles/latent-styles.rst new file mode 100644 index 000000000..1423e5303 --- /dev/null +++ b/docs/dev/analysis/features/styles/latent-styles.rst @@ -0,0 +1,266 @@ + +Latent Styles +============= + +Latent style definitions are a "stub" style definition specifying behavioral +(UI display) attributes for built-in styles. + + +Latent style collection +----------------------- + +The latent style collection for a document is accessed using the +:attr:`~.Styles.latent_styles` property on |Styles|:: + + >>> latent_styles = document.styles.latent_styles + >>> latent_styles + + +**Iteration.** |LatentStyles| should support iteration of contained +|_LatentStyle| objects in document order. + +**Latent style access.** A latent style can be accessed by name using +dictionary-style notation. + +**len().** |LatentStyles| supports :meth:`len`, reporting the number of +|_LatentStyle| objects it contains. + + +|LatentStyles| properties +------------------------- + + +default_priority +~~~~~~~~~~~~~~~~ + +**XML semantics**. According to ISO 29500, the default value if the +`w:defUIPriority` attribute is omitted is 99. 99 is explictly set in the +default Word `styles.xml`, so will generally be what one finds. 
+ +**Protocol**:: + + >>> # return None if attribute is omitted + >>> latent_styles.default_priority + None + >>> # but expect it will almost always be explicitly 99 + >>> latent_styles.default_priority + 99 + >>> latent_styles.default_priority = 42 + >>> latent_styles.default_priority + 42 + + +load_count +~~~~~~~~~~ + +**XML semantics**. No default is stated in the spec. Don't allow assignment +of |None|. + +**Protocol**:: + + >>> latent_styles.load_count + 276 + >>> latent_styles.load_count = 242 + >>> latent_styles.load_count + 242 + + +Boolean properties +~~~~~~~~~~~~~~~~~~ + +There are four boolean properties that all share the same protocol: + +* default_to_hidden +* default_to_locked +* default_to_quick_style +* default_to_unhide_when_used + +**XML semantics**. Defaults to |False| if the attribute is omitted. However, +the attribute should always be written explicitly on update. + +**Protocol**:: + + >>> latent_styles.default_to_hidden + False + >>> latent_styles.default_to_hidden = True + >>> latent_styles.default_to_hidden + True + + +Specimen XML +~~~~~~~~~~~~ + +.. highlight:: xml + +The `w:latentStyles` element used in the default Word 2011 template:: + + + + +|_LatentStyle| properties +------------------------- + +.. highlight:: python + +:: + + >>> latent_style = latent_styles.latent_styles[0] + + >>> latent_style.name + 'Normal' + + >>> latent_style.priority + None + >>> latent_style.priority = 10 + >>> latent_style.priority + 10 + + >>> latent_style.locked + None + >>> latent_style.locked = True + >>> latent_style.locked + True + + >>> latent_style.quick_style + None + >>> latent_style.quick_style = True + >>> latent_style.quick_style + True + + +Latent style behavior +--------------------- + +* A style has two categories of attribute, *behavioral* and *formatting*. + Behavioral attributes specify where and when the style should appear in the + user interface.
Behavioral attributes can be specified for latent styles + using the ```` element and its ```` child + elements. The 5 behavioral attributes are: + + + locked + + uiPriority + + semiHidden + + unhideWhenUsed + + qFormat + +* **locked**. The `locked` attribute specifies that the style should not + appear in any list or the gallery and may not be applied to content. This + behavior is only active when restricted formatting is turned on. + + Locking is turned on via the menu: Developer Tab > Protect Document > + Formatting Restrictions (Windows only). + +* **uiPriority**. The `uiPriority` attribute acts as a sort key for + sequencing style names in the user interface. Both the lists in the styles + panel and the Style Gallery are sensitive to this setting. Its effective + value is 0 if not specified. + +* **semiHidden**. The `semiHidden` attribute causes the style to be excluded + from the recommended list. The notion of *semi* in this context is that + while the style is hidden from the recommended list, it still appears in + the "All Styles" list. This attribute is removed on first application of + the style if an `unhideWhenUsed` attribute set |True| is also present. + +* **unhideWhenUsed**. The `unhideWhenUsed` attribute causes any `semiHidden` + attribute to be removed when the style is first applied to content. Word + does *not* remove the `semiHidden` attribute just because there exists an + object in the document having that style. The `unhideWhenUsed` attribute is + not removed along with the `semiHidden` attribute when the style is + applied. + + The `semiHidden` and `unhideWhenUsed` attributes operate in combination to + produce *hide-until-used* behavior. + + *Hypothesis.* The persistence of the `unhideWhenUsed` attribute after + removing the `semiHidden` attribute on first application of the style is + necessary to produce appropriate behavior in style inheritance situations.
+ In that case, the `semiHidden` attribute may be explictly set to |False| to + override an inherited value. Or it could allow the `semiHidden` attribute + to be re-set to |True| later while preserving the hide-until-used behavior. + +* **qFormat**. The `qFormat` attribute specifies whether the style should + appear in the Style Gallery when it appears in the recommended list. + A style will never appear in the gallery unless it also appears in the + recommended list. + +* Latent style attributes are only operative for latent styles. Once a style + is defined, the attributes of the definition exclusively determine style + behavior; no attributes are inherited from its corresponding latent style + definition. + + +Specimen XML +------------ + +.. highlight:: xml + +:: + + + + + + + + + + + +Schema excerpt +-------------- + +.. highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/styles/paragraph-style.rst b/docs/dev/analysis/features/styles/paragraph-style.rst new file mode 100644 index 000000000..cc134b236 --- /dev/null +++ b/docs/dev/analysis/features/styles/paragraph-style.rst @@ -0,0 +1,142 @@ + +Paragraph Style +=============== + +A paragraph style provides character formatting (font) as well as paragraph +formatting properties. Character formatting is inherited from +|_CharacterStyle| and is predominantly embodied in the :attr:`font` property. +Likewise, most paragraph-specific properties come from the |ParagraphFormat| +object available on the :attr:`paragraph_format` property. + +A handful of other properties are specific to a paragraph style. + + +next_paragraph_style +-------------------- + +The `next_paragraph_style` property provides access to the style that will +automatically be assigned by Word to a new paragraph inserted after +a paragraph with this style. 
This property is most useful for a style that +would normally appear only once in a sequence, such as a heading. + +The default is to use the same style for an inserted paragraph. This +addresses the most common case; for example, a body paragraph having `Body +Text` style would normally be followed by a paragraph of the same style. + + +Expected usage +~~~~~~~~~~~~~~ + +The priority use case for this property is to provide a working style that +can be assigned to a paragraph. The property will always provide a valid +paragraph style, defaulting to the current style whenever a more specific one +cannot be determined. + +While this obscures some specifics of the situation from the API, it +addresses the expected most common use case. Developers needing to detect, +for example, missing styles can readily use the oxml layer to inspect the +XML and further features can be added if those use cases turn out to be more +common than expected. + + +Behavior +~~~~~~~~ + +**Default.** The default next paragraph style is the same paragraph style. + +The default is used whenever the next paragraph style is not specified or is +invalid, including these conditions: + +* No `w:next` child element is present +* A style having the styleId specified in `w:next/@w:val` is not present in + the document. +* The style specified in `w:next/@w:val` is not a paragraph style. + +In all these cases the current style (`self`) is returned. + + +Example XML +~~~~~~~~~~~ + +.. highlight:: xml + +paragraph_style.next_paragraph_style is styles['Bar']:: + + + + + + +**Semantics.** The `w:next` child element is optional. + +* When omitted, the next style is the same as the current style. +* If no style with a matching styleId exists, the `w:next` element is ignored + and the next style is the same as the current style. +* If a style is found but is of a style type other than paragraph, the + `w:next` element is ignored and the next style is the same as the current + style. 
+ + +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +.. highlight:: python + +:: + + >>> styles = document.styles + + >>> paragraph_style = styles['Foo'] + >>> paragraph_style.next_paragraph_style == paragraph_style + True + + >>> paragraph_style.next_paragraph_style = styles['Bar'] + >>> paragraph_style.next_paragraph_style == styles['Bar'] + True + + >>> paragraph_style.next_paragraph_style = None + >>> paragraph_style.next_paragraph_style == paragraph_style + True + + +Schema excerpt +-------------- + +.. highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/styles/style.rst b/docs/dev/analysis/features/styles/style.rst new file mode 100644 index 000000000..5121b074b --- /dev/null +++ b/docs/dev/analysis/features/styles/style.rst @@ -0,0 +1,493 @@ + +Style objects +============= + +A style is one of four types; character, paragraph, table, or numbering. All +style objects have behavioral properties and formatting properties. The set of +formatting properties varies depending on the style type. In general, +formatting properties are inherited along this hierarchy: character -> +paragraph -> table. A numbering style has no formatting properties and does +not inherit. + +Behavioral properties +--------------------- + +There are six behavior properties: + +hidden + Style operates to assign formatting properties, but does not appear in + the UI under any circumstances. Used for *internal* styles assigned by an + application that should not be under the control of an end-user. + +priority + Determines the sort order of the style in sequences presented by the UI. + +semi-hidden + The style is hidden from the so-called "main" user interface. In Word + this means the *recommended list* and the style gallery. The style still + appears in the *all styles* list. + +unhide_when_used + Flag to the application to set semi-hidden False when the style is next + used. 
+ +quick_style + Show the style in the style gallery when it is not hidden. + +locked + Style is hidden and cannot be applied when document formatting protection + is active. + + +hidden +------ + +The `hidden` attribute doesn't work on built-in styles and its behavior on +custom styles is spotty. Skipping this attribute for now. Will reconsider if +someone requests it and can provide a specific use case. + +Behavior +~~~~~~~~ + +**Scope.** `hidden` doesn't work at all on 'Normal' or 'Heading 1' style. It +doesn't work on Salutation either. There is no `w:defHidden` attribute on +`w:latentStyles`, lending credence to the hypothesis it is not enabled for +built-in styles. *Hypothesis:* Doesn't work on built-in styles. + +**UI behavior.** A custom style having `w:hidden` set |True| is hidden from +the gallery and all styles pane lists. It does however appear in the "Current +style of selected text" box in the styles pane when the cursor is on +a paragraph of that style. The style can be modified by the user from this +current style UI element. The user can assign a new style to a paragraph +having a hidden style. + + +priority +-------- + +The `priority` attribute is the integer primary sort key determining the +position of a style in a UI list. The secondary sort is alphabetical by name. +Negative values are valid, although not assigned by Word itself and appear to +be treated as 0. + +Behavior +~~~~~~~~ + +**Default.** Word behavior appears to default priority to 0 for custom +styles. The spec indicates the effective default value is conceptually +infinity, such that the style appears at the end of the styles list, +presumably alphabetically among other styles having no priority assigned. 
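The sort order described above can be sketched as a sort key. This is a rough model under stated assumptions: it follows the spec's reading that an unassigned priority sorts last (Word itself appears to use 0 for custom styles), it clamps negative values to 0 as observed, and it assumes the alphabetical comparison is case-insensitive. The names `ui_sort_key()` and `sort_for_ui()` are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def ui_sort_key(name: str, priority: Optional[int]) -> Tuple[float, str]:
    """Primary key is priority, secondary key is the style name."""
    if priority is None:
        effective = math.inf          # spec reading: no priority sorts last
    else:
        effective = max(priority, 0)  # negative values appear to be treated as 0
    return (effective, name.lower())  # case-insensitive compare is an assumption


def sort_for_ui(styles: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Return style names in the order a UI list would present them."""
    ordered = sorted(styles, key=lambda item: ui_sort_key(item[0], item[1]))
    return [name for name, _ in ordered]


# Hypothetical (name, priority) pairs.
sample = [
    ("Title", 10),
    ("Normal", 0),
    ("Citation", None),
    ("Body Text", -5),
    ("Heading 1", 9),
]
```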
+ +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +:: + + >>> style = document.styles['Foobar'] + >>> style.priority + None + >>> style.priority = 7 + >>> style.priority + 7 + >>> style.priority = -42 + >>> style.priority + 0 + + +semi-hidden +----------- + +The `w:semiHidden` element specifies visibility of the style in the so-called +*main* user interface. For Word, this means the style gallery and the +recommended, styles-in-use, and in-current-document lists. The all-styles +list and current-style dropdown in the styles pane would then be considered +part of an *advanced* user interface. + +Behavior +~~~~~~~~ + +**Default.** If the `w:semiHidden` element is omitted, its effective value is +|False|. There is no inheritance of this value. + +**Scope.** Works on both built-in and custom styles. + +**Word behavior.** Word does not use the `@w:val` attribute. It writes +`<w:semiHidden/>` for |True| and omits the element for |False|. + +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +:: + + >>> style = document.styles['Foo'] + >>> style.hidden + False + >>> style.hidden = True + >>> style.hidden + True + +Example XML +~~~~~~~~~~~ + +.. highlight:: xml + +style.hidden = True:: + + + + + + +style.hidden = False:: + + + + + +Alternate constructions should also report the proper value but not be +used when writing XML:: + + + + + + + + + + + + +unhide-when-used +---------------- + +The `w:unhideWhenUsed` element signals an application that this style should +be made visible the next time it is used. + +Behavior +~~~~~~~~ + +**Default.** If the `w:unhideWhenUsed` element is omitted, its effective +value is |False|. There is no inheritance of this value. + +**Word behavior.** The `w:unhideWhenUsed` element is not changed or removed +when the style is next used. Only the `w:semiHidden` element is affected, if +present. Presumably this is so a style can be re-hidden, to be unhidden on +the subsequent use. + +Note that this behavior in Word is only triggered by a user actually applying +a style.
Merely loading a document having the style applied somewhere in its +contents does not cause the `w:semiHidden` element to be removed. + +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +:: + + >>> style = document.styles['Foo'] + >>> style.unhide_when_used + False + >>> style.unhide_when_used = True + >>> style.unhide_when_used + True + +Example XML +~~~~~~~~~~~ + +.. highlight:: xml + +style.unhide_when_used = True:: + + + + + + + +style.unhide_when_used = False:: + + + + + +Alternate constructions should also report the proper value but not be +used when writing XML:: + + + + + + + + + + + + +quick-style +----------- + +The `w:qFormat` element specifies whether Word should display this style in +the style gallery. In order to appear in the gallery, this attribute must be +|True| and `hidden` must be |False|. + +Behavior +~~~~~~~~ + +**Default.** If the `w:qFormat` element is omitted, its effective value is +|False|. There is no inheritance of this value. + +**Word behavior.** If `w:qFormat` is |True| and the style is not hidden, it +will appear in the gallery in the order specified by `w:uiPriority`. + +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +:: + + >>> style = document.styles['Foo'] + >>> style.quick_style + False + >>> style.quick_style = True + >>> style.quick_style + True + +Example XML +~~~~~~~~~~~ + +.. highlight:: xml + +style.quick_style = True:: + + + + + + +style.quick_style = False:: + + + + + +Alternate constructions should also report the proper value but not be +used when writing XML:: + + + + + + + + + + + + +locked +------ + +The `w:locked` element specifies whether Word should prevent this style from +being applied to content. This behavior is only active if formatting +protection is turned on. + +Behavior +~~~~~~~~ + +**Default.** If the `w:locked` element is omitted, its effective value is +|False|. There is no inheritance of this value. 
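The "omitted means |False|" default shared by `w:locked` and the other on/off elements in this section can be sketched with the standard library. This is a minimal illustration, not python-docx code; the helper name `on_off_value()` is hypothetical, and the accepted `w:val` strings are assumed to be those of the transitional `ST_OnOff` simple type.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def on_off_value(style_xml: str, tag: str) -> bool:
    """Effective value of an on/off child element such as `w:locked`."""
    style = ET.fromstring(style_xml)
    child = style.find("{%s}%s" % (W, tag))
    if child is None:
        return False          # element omitted: effective value is False
    val = child.get("{%s}val" % W)
    if val is None:
        return True           # bare element, no `w:val`: True
    return val in ("1", "true", "on")


# Hypothetical specimens built for this sketch.
NS = 'xmlns:w="%s"' % W
OMITTED = "<w:style %s/>" % NS
BARE = "<w:style %s><w:locked/></w:style>" % NS
EXPLICIT_OFF = '<w:style %s><w:locked w:val="0"/></w:style>' % NS
EXPLICIT_ON = '<w:style %s><w:locked w:val="on"/></w:style>' % NS
```

Reading accepts every construction, while writing would use only the bare element for |True| and omit it for |False|.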
+ +Candidate protocol +~~~~~~~~~~~~~~~~~~ + +:: + + >>> style = document.styles['Foo'] + >>> style.locked + False + >>> style.locked = True + >>> style.locked + True + +Example XML +~~~~~~~~~~~ + +.. highlight:: xml + +style.locked = True:: + + + + + + +style.locked = False:: + + + + + +Alternate constructions should also report the proper value but not be +used when writing XML:: + + + + + + + + + + + + +Candidate protocols +------------------- + +Identification:: + + >>> style = document.styles['Body Text'] + >>> style.name + 'Body Text' + >>> style.style_id + 'BodyText' + >>> style.type + WD_STYLE_TYPE.PARAGRAPH (1) + +`delete()`:: + + >>> len(styles) + 6 + >>> style.delete() + >>> len(styles) + 5 + >>> styles['Citation'] + KeyError: no style with id or name 'Citation' + +Style.base_style:: + + >>> style = styles.add_style('Citation', WD_STYLE_TYPE.PARAGRAPH) + >>> style.base_style + None + >>> style.base_style = styles['Normal'] + >>> style.base_style + + >>> style.base_style.name + 'Normal' + + +Example XML +----------- + +.. 
highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Schema excerpt +-------------- + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/styles/styles.rst b/docs/dev/analysis/features/styles/styles.rst new file mode 100644 index 000000000..96bdd3243 --- /dev/null +++ b/docs/dev/analysis/features/styles/styles.rst @@ -0,0 +1,222 @@ + +Styles collection +================= + + +Candidate protocols +------------------- + +Access:: + + >>> styles = document.styles # default styles part added if not present + >>> styles + + +Iteration and length:: + + >>> len(styles) + 10 + >>> list_styles = [s for s in styles if s.type == WD_STYLE_TYPE.LIST] + >>> len(list_styles) + 3 + +Access style by name (or style id):: + + >>> styles['Normal'] + + + >>> styles['undefined-style'] + KeyError: no style with id or name 'undefined-style' + +:meth:`.Styles.add_style()`:: + + >>> style = styles.add_style('Citation', WD_STYLE_TYPE.PARAGRAPH) + >>> style.name + 'Citation' + >>> style.type + PARAGRAPH (1) + >>> style.builtin + False + + +Feature Notes +------------- + +* could add a default builtin style from known specs on first access via + WD_BUILTIN_STYLE enumeration:: + + >>> style = document.styles['Heading1'] + KeyError: no style with id or name 'Heading1' + >>> style = document.styles[WD_STYLE.HEADING_1] + >>> assert style == document.styles['Heading1'] + + +Example XML +----------- + +.. 
highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Schema excerpt +-------------- + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/table-props.rst b/docs/dev/analysis/features/table-props.rst index b2f8fbeba..8485c7bc8 100644 --- a/docs/dev/analysis/features/table-props.rst +++ b/docs/dev/analysis/features/table-props.rst @@ -3,6 +3,23 @@ Table Properties ================ +Alignment +--------- + +Word allows a table to be aligned between the page margins either left, +right, or center. + +The read/write :attr:`Table.alignment` property specifies the alignment for +a table:: + + >>> table = document.add_table(rows=2, cols=2) + >>> table.alignment + None + >>> table.alignment = WD_TABLE_ALIGNMENT.RIGHT + >>> table.alignment + RIGHT (2) + + Autofit ------- @@ -28,12 +45,13 @@ Specimen XML .. highlight:: xml -The following XML is generated by Word when inserting a 2x2 table:: +The following XML represents a 2x2 table:: + @@ -151,6 +169,22 @@ Schema Definitions + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/breaks.rst b/docs/dev/analysis/features/text/breaks.rst similarity index 100% rename from docs/dev/analysis/features/breaks.rst rename to docs/dev/analysis/features/text/breaks.rst diff --git a/docs/dev/analysis/features/text/font-color.rst b/docs/dev/analysis/features/text/font-color.rst new file mode 100644 index 000000000..443bc3af0 --- /dev/null +++ b/docs/dev/analysis/features/text/font-color.rst @@ -0,0 +1,288 @@ + +Font Color +========== + +Color, as a topic, extends beyond the |Font| object; font color is just the +first place it's come up. 
Accordingly, it bears a little deeper thought than +usual since we'll want to reuse the same objects and protocol to specify +color in the other contexts; it makes sense to craft a general solution that +will bear the expected reuse. + +There are three historical sources to draw from for this API. + +1. The `w:rPr/w:color` element. This is used by default when applying color + directly to text or when setting the text color of a style. This + corresponds to the `Font.Color` property (undocumented, unfortunately). + This element supports RGB colors, theme colors, and a tint or shade of + a theme color. + +2. The `w:rPr/w14:textFill` element. This is used by Word for fancy text like + gradient and shadow effects. This corresponds to the `Font.Fill` property. + +3. The PowerPoint font color UI. This seems like a reasonable compromise + between the prior two, allowing direct-ish access to common color options + while holding the door open for the `Font.fill` operations to be added + later if required. + +Candidate Protocol +~~~~~~~~~~~~~~~~~~ + +:class:`docx.text.run.Run` has a font property:: + + >>> from docx import Document + >>> from docx.text.run import Font, Run + >>> run = Document().add_paragraph().add_run() + >>> isinstance(run, Run) + True + >>> font = run.font + >>> isinstance(font, Font) + True + +:class:`docx.text.run.Font` has a read-only color property, returning +a :class:`docx.dml.color.ColorFormat` object:: + + >>> from docx.dml.color import ColorFormat + >>> color = font.color + >>> isinstance(font.color, ColorFormat) + True + >>> font.color = 'anything' + AttributeError: can't set attribute + + +:class:`docx.dml.color.ColorFormat` has a read-only :attr:`type` property and +read/write :attr:`rgb`, :attr:`theme_color`, and :attr:`brightness` +properties. 
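One way to picture a read-only `type` alongside read/write color values is to derive `type` from whichever value is currently set. The class below is a hypothetical model of the protocol proposed in this section, not `docx.dml.color.ColorFormat`; it uses plain strings in place of the `MSO_COLOR_TYPE` members, a tuple in place of |RGBColor|, and omits the `AUTO` case.

```python
from typing import Optional, Tuple


class ColorFormatSketch:
    """Hypothetical model: `type` is derived, never assigned."""

    def __init__(self) -> None:
        self._rgb: Optional[Tuple[int, int, int]] = None
        self._theme_color: Optional[str] = None
        self._brightness = 0.0

    @property
    def type(self) -> Optional[str]:
        # A theme color wins over any RGB value stored alongside it.
        if self._theme_color is not None:
            return "THEME"
        if self._rgb is not None:
            return "RGB"
        return None

    @property
    def rgb(self) -> Optional[Tuple[int, int, int]]:
        return self._rgb

    @rgb.setter
    def rgb(self, value: Tuple[int, int, int]) -> None:
        # Assigning an RGB value makes the color a plain RGB color.
        self._rgb = value
        self._theme_color = None
        self._brightness = 0.0

    @property
    def theme_color(self) -> Optional[str]:
        return self._theme_color

    @theme_color.setter
    def theme_color(self, value: str) -> None:
        self._theme_color = value

    @property
    def brightness(self) -> float:
        return self._brightness

    @brightness.setter
    def brightness(self, value: float) -> None:
        # Tint and shade only make sense relative to a theme color.
        if self._theme_color is None:
            raise ValueError("not a theme color")
        self._brightness = value
```

Because `type` has no setter, assigning to it raises `AttributeError`, which matches the read-only behavior described for the property.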
+ +:attr:`ColorFormat.type` returns one of `MSO_COLOR_TYPE.RGB`, +`MSO_COLOR_TYPE.THEME`, `MSO_COLOR_TYPE.AUTO`, or |None|, the latter +indicating font has no directly-applied color:: + + >>> font.color.type + None + +:attr:`ColorFormat.rgb` returns an |RGBColor| object when `type` is +`MSO_COLOR_TYPE.RGB`. It may also report an RGBColor value when `type` is +`MSO_COLOR_TYPE.THEME`, since an RGB color may also be present in that case. +According to the spec, the RGB color value is ignored when a theme color is +specified, but Word writes the current RGB value of the theme color along +with the theme color name (e.g. 'accent1') when assigning a theme color; +perhaps as a convenient value for a file browser to use. The value of `.type` +must be consulted to determine whether the RGB value is operative or +a "best-guess":: + + >>> font.color.type + RGB (1) + >>> font.color.rgb + RGBColor(0x3f, 0x2c, 0x36) + +Assigning an |RGBColor| value to :attr:`ColorFormat.rgb` causes +:attr:`ColorFormat.type` to become `MSO_COLOR_TYPE.RGB`:: + + >>> font.color.type + None + >>> font.color.rgb = RGBColor(0x3f, 0x2c, 0x36) + >>> font.color.type + RGB (1) + >>> font.color.rgb + RGBColor(0x3f, 0x2c, 0x36) + +:attr:`ColorFormat.theme_color` returns a member of :ref:`MsoThemeColorIndex` +when `type` is `MSO_COLOR_TYPE.THEME`:: + + >>> font.color.type + THEME (2) + >>> font.color.theme_color + ACCENT_1 (5) + +Assigning a member of :ref:`MsoThemeColorIndex` to +:attr:`ColorFormat.theme_color` causes :attr:`ColorFormat.type` to become +`MSO_COLOR_TYPE.THEME`:: + + >>> font.color.type + RGB (1) + >>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_2 + >>> font.color.type + THEME (2) + >>> font.color.theme_color + ACCENT_2 (6) + +The :attr:`ColorFormat.brightness` attribute can be used to select a tint or +shade of a theme color. 
Assigning the value 0.1 produces a color 10% brighter +(a tint); assigning -0.1 produces a color 10% darker (a shade):: + + >>> font.color.type + None + >>> font.color.brightness + 0.0 + >>> font.color.brightness = 0.4 + ValueError: not a theme color + + >>> font.color.theme_color = MSO_THEME_COLOR.TEXT_1 + >>> font.color.brightness = 0.4 + >>> font.color.brightness + 0.4 + + +Specimen XML +------------ + +.. highlight:: xml + +Baseline paragraph with no font color:: + + + + Text with no color. + + + +Paragraph with directly-applied RGB color:: + + + + + + + + + + + + Directly-applied color Blue. + + + +Run with directly-applied theme color:: + + + + + + Theme color Accent 1. + + +Run with 40% tint of Text 2 theme color:: + + + + + + Theme color with 40% tint. + + +Run with 25% shade of Accent 2 theme color:: + + + + + + Theme color with 25% shade. + + + +Schema excerpt +-------------- + +.. highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/bool-run-props.rst b/docs/dev/analysis/features/text/font.rst similarity index 65% rename from docs/dev/analysis/features/bool-run-props.rst rename to docs/dev/analysis/features/text/font.rst index d811f2e49..1682b5c76 100644 --- a/docs/dev/analysis/features/bool-run-props.rst +++ b/docs/dev/analysis/features/text/font.rst @@ -1,6 +1,61 @@ -Boolean Run properties -====================== +Font +==== + +Word supports a rich variety of character formatting. Character formatting +can be applied at various levels in the *style hierarchy*. At the lowest +level, it can be applied directly to a run of text content. Above that, it +can be applied to character, paragraph and table styles. It can also be +applied to an abstract numbering definition. At the highest levels it can be +applied via a theme or document defaults. 
+ + +Typeface name +------------- + +Word allows multiple typefaces to be specified for character content in +a single run. This allows different Unicode character ranges such as ASCII +and Arabic to be used in a single run, each being rendered in the typeface +specified for that range. + +Up to eight distinct typefaces may be specified for a font. Four are used to +specify a typeface for a distinct code point range. These are: + +* `w:ascii` - used for the first 128 Unicode code points +* `w:cs` - used for complex script code points +* `w:eastAsia` - used for East Asian code points +* `w:hAnsi` - standing for *high ANSI*, but effectively the catch-all for any + code points not specified by one of the other three. + +The other four, `w:asciiTheme`, `w:csTheme`, `w:eastAsiaTheme`, and +`w:hAnsiTheme` are used to indirectly specify a theme-defined font. This +allows the typeface to be set centrally in the document. These four attributes +have lower precedence than the first four, so for example the value of +`w:asciiTheme` is ignored if a `w:ascii` attribute is also present. + +The typeface name used for a run is specified in the `w:rPr/w:rFonts` +element. There are 8 attributes that in combination specify the typeface to +be used. + +Protocol +~~~~~~~~ + +Initially, only the base typeface name is supported by the API, using the +:attr:`~.Font.name` property. Its value is that of the `w:rFonts/@w:ascii` +attribute or |None| if not present. Assignment to this property sets both the +`w:ascii` and the `w:hAnsi` attribute to the assigned string or removes them +both if |None| is assigned:: + + >>> font = document.styles['Normal'].font + >>> font.name + None + >>> font.name = 'Arial' + >>> font.name + 'Arial' + + +Boolean run properties +---------------------- Character formatting that is either on or off, such as bold, italic, and small caps.
Certain of these properties are *toggle properties* that may @@ -96,6 +151,55 @@ The semantics of the three values are as follows: +-------+---------------------------------------------------------------+ +Toggle properties +----------------- + +Certain of the boolean run properties are *toggle properties*. A toggle +property is one that behaves like a *toggle* at certain places in the style +hierarchy. Toggle here means that setting the property on has the effect of +reversing the prior setting rather than unconditionally setting the property +on. + +This behavior allows these properties to be overridden (turned off) in +inheriting styles. For example, consider a character style `emphasized` that +sets bold on. Another style, `strong`, inherits from `emphasized`, but should +display in italic rather than bold. Setting bold off in `strong` has no effect because it +is overridden by the bold inherited from `emphasized` (I think). Because bold is a toggle +property, setting bold on in `strong` causes its value to be toggled, to +False, achieving the desired effect. See §17.7.3 for more details on toggle +properties.
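The toggle idea as described here can be sketched as a fold over the levels of the hierarchy, base style first. This models only the description in this section, where an "on" setting reverses the running value and "off" or unspecified leaves it alone; §17.7.3 defines exactly where in the hierarchy toggling applies, which this sketch does not attempt to capture. The function name `effective_toggle()` is hypothetical.

```python
from functools import reduce
from typing import Iterable, Optional


def effective_toggle(levels: Iterable[Optional[bool]]) -> bool:
    """Fold a toggle property over hierarchy levels, base first.

    Each item is True (set on), False (set off), or None (not specified).
    """
    def step(current: bool, setting: Optional[bool]) -> bool:
        # "On" reverses the prior value; "off" and None have no effect.
        return (not current) if setting else current

    return reduce(step, levels, False)
```

With a base style that sets bold on, `[True, False]` stays bold while `[True, True]` ends up not bold, which is the overriding effect the example in the text is after.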
+ +The following run properties are toggle properties: + ++----------------+------------+-------------------------------------------+ +| element | spec | name | ++================+============+===========================================+ +| `` | §17.3.2.1 | Bold | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.2 | Complex Script Bold | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.5 | Display All Characters as Capital Letters | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.13 | Embossing | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.16 | Italics | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.17 | Complex Script Italics | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.18 | Imprinting | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.23 | Display Character Outline | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.31 | Shadow | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.33 | Small Caps | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.37 | Single Strikethrough | ++----------------+------------+-------------------------------------------+ +| `` | §17.3.2.41 | Hidden Text | ++----------------+------------+-------------------------------------------+ + + Specimen XML ------------ @@ -103,7 +207,7 @@ Specimen XML :: - + @@ -113,8 +217,7 @@ Specimen XML - bold, italic, small caps, strike, size, and underline, applied in - reverse order but not to paragraph mark + bold, italic, small caps, strike, 14 pt, and underline @@ -128,16 +231,6 @@ times each. 
Not sure what the semantics of that would be or why one would want to do it, but something to note. Word seems to place them in the order below when it writes the file.:: - - - - - - - - - - @@ -185,10 +278,61 @@ below when it writes the file.:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -200,64 +344,60 @@ below when it writes the file.:: + + + + + -Toggle properties ------------------ - -Certain of the boolean run properties are *toggle properties*. A toggle -property is one that behaves like a *toggle* at certain places in the style -hierarchy. Toggle here means that setting the property on has the effect of -reversing the prior setting rather than unconditionally setting the property -on. - -This behavior allows these properties to be overridden (turned off) in -inheriting styles. For example, consider a character style `emphasized` that -sets bold on. Another style, `strong` inherits from `emphasized`, but should -display in italic rather than bold. Setting bold off has no effect because it -is overridden by the bold in `strong` (I think). Because bold is a toggle -property, setting bold on in `emphasized` causes its value to be toggled, to -False, achieving the desired effect. See §17.7.3 for more details on toggle -properties. 
- -The following run properties are toggle properties: - -+----------------+------------+-------------------------------------------+ -| element | spec | name | -+================+============+===========================================+ -| `` | §17.3.2.1 | Bold | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.2 | Complex Script Bold | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.5 | Display All Characters as Capital Letters | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.13 | Embossing | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.16 | Italics | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.17 | Complex Script Italics | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.18 | Imprinting | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.23 | Display Character Outline | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.31 | Shadow | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.33 | Small Caps | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.37 | Single Strikethrough | -+----------------+------------+-------------------------------------------+ -| `` | §17.3.2.41 | Hidden Text | -+----------------+------------+-------------------------------------------+ - + + + -Resources ---------- + + + + + + + + + + + + -* `WdBreakType Enumeration on MSDN`_ -* `Range.InsertBreak Method (Word) on MSDN`_ + + + + + + + + + + + + + + + + + + + + + + -.. _WdBreakType Enumeration on MSDN: - http://msdn.microsoft.com/en-us/library/office/ff195905.aspx + + + -.. 
_Range.InsertBreak Method (Word) on MSDN: - http://msdn.microsoft.com/en-us/library/office/ff835132.aspx + + + + + + + diff --git a/docs/dev/analysis/features/text/hyperlink.rst b/docs/dev/analysis/features/text/hyperlink.rst new file mode 100644 index 000000000..667d0bd6c --- /dev/null +++ b/docs/dev/analysis/features/text/hyperlink.rst @@ -0,0 +1,212 @@ + +Hyperlink +========= + +Word allows hyperlinks to be placed in the document. + +A hyperlink may link to an external location, for example, a URL. It may link to +a location within the document, for example, a bookmark. These two cases are +handled differently. + +Hyperlinks can contain multiple runs of text. + +Candidate protocol +------------------ + +The hyperlink feature supports only external links today (03/2016). + +Add a simple hyperlink with text and url: + + >>> hyperlink = paragraph.add_hyperlink(text='Google', address='https://google.com') + >>> hyperlink + + >>> hyperlink.text + 'Google' + >>> hyperlink.address + 'https://google.com' + >>> hyperlink.runs + [] + +Add multiple runs to a hyperlink: + + >>> hyperlink = paragraph.add_hyperlink(address='https://github.com') + >>> hyperlink.add_run('A') + >>> hyperlink.add_run('formatted').italic = True + >>> hyperlink.add_run('link').bold = True + >>> hyperlink.runs + [, + , + ] + +Retrieve a paragraph's content: + + >>> paragraph = document.add_paragraph('A plain paragraph having some ') + >>> paragraph.add_run('link such as ') + >>> paragraph.add_hyperlink(address='http://github.com', text='github') + >>> paragraph.iter_p_content() + [, + ] + + + +Specimen XML +------------ + +.. highlight:: xml + + +External links +~~~~~~~~~~~~~~ + +An external link is specified by the attribute r:id. The location of the link +is defined in the relationships part of the document.
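Resolving an `r:id` to its target amounts to a lookup in the relationships part. The sketch below uses only the standard library; the helper `hyperlink_target()` and the sample `RELS_XML` are hypothetical and written for illustration, while the namespace and the `Id`, `Target`, and `TargetMode` attribute names are those of the package relationships format.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hand-written sample of a relationships part, for illustration only.
RELS_XML = """\
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId4"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
      Target="https://google.com" TargetMode="External"/>
</Relationships>
"""


def hyperlink_target(rels_xml: str, r_id: str) -> Optional[str]:
    """Return the Target of the relationship having Id `r_id`, or None."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall("{%s}Relationship" % PKG_RELS_NS):
        if rel.get("Id") == r_id:
            return rel.get("Target")
    return None
```

Keeping the URL in the relationships part rather than inline is why the hyperlink element itself carries only the relationship id.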
+ +A simple hyperlink to an external url:: + + + + This is an external link to + + + + + + + Google + + + + + +The r:id="rId4" references the following relationship within the relationships +part for the document (document.xml.rels):: + + + + + +A hyperlink with multiple runs of text:: + + + + + + + + A + + + + + + + formatted + + + + + + + link + + + + + +Internal links +~~~~~~~~~~~~~~ + +An internal link, one that links to a location in the document, does not have the r:id attribute +and is specified by the anchor attribute instead. +The value of the anchor attribute is the name of a bookmark in the document. + +Example:: + + + + This is an + + + + + + + internal link + + + + + ... + + + + This is text with a + + + + bookmark + + + + + +Schema excerpt +-------------- + +.. highlight:: xml + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/text/index.rst b/docs/dev/analysis/features/text/index.rst new file mode 100644 index 000000000..0275dcbbc --- /dev/null +++ b/docs/dev/analysis/features/text/index.rst @@ -0,0 +1,14 @@ + +Text +==== + +.. toctree:: + :titlesonly: + + paragraph-format + font + font-color + underline + run-content + hyperlink + breaks diff --git a/docs/dev/analysis/features/text/paragraph-format.rst b/docs/dev/analysis/features/text/paragraph-format.rst new file mode 100644 index 000000000..febc9300a --- /dev/null +++ b/docs/dev/analysis/features/text/paragraph-format.rst @@ -0,0 +1,473 @@ + +Paragraph formatting +==================== + +WordprocessingML supports a variety of paragraph formatting attributes to +control layout characteristics such as justification, indentation, line +spacing, space before and after, and widow/orphan control. + + +Alignment (justification) +------------------------- + +In Word, each paragraph has an *alignment* attribute that specifies how to +justify the lines of the paragraph when the paragraph is laid out on the +page.
Common values are left, right, centered, and justified. + +Protocol +~~~~~~~~ + +Getting and setting paragraph alignment:: + + >>> paragraph = body.add_paragraph() + >>> paragraph.alignment + None + >>> paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT + >>> paragraph.alignment + RIGHT (2) + >>> paragraph.alignment = None + >>> paragraph.alignment + None + +XML Semantics +~~~~~~~~~~~~~ + +If the ```` element is not present on a paragraph, the alignment value +for that paragraph is inherited from its style hierarchy. If the element is +present, its value overrides any inherited value. From the API, a value of +|None| on the ``Paragraph.alignment`` property corresponds to no ```` +element being present. If |None| is assigned to ``Paragraph.alignment``, the +```` element is removed. + + +Paragraph spacing +----------------- + +Spacing between subsequent paragraphs is controlled by the paragraph spacing +attributes. Spacing can be applied either before the paragraph, after it, or +both. The concept is similar to that of *padding* or *margin* in CSS. +WordprocessingML supports paragraph spacing specified as either a length +value or as a multiple of the line height; however only a length value is +supported via the Word UI. Inter-paragraph spacing "overlaps", such that the +rendered spacing between two paragraphs is the maximum of the space after the +first paragraph and the space before the second. + +Protocol +~~~~~~~~ + +Getting and setting paragraph spacing:: + + >>> paragraph_format = document.styles['Normal'].paragraph_format + >>> paragraph_format.space_before + None + >>> paragraph_format.space_before = Pt(12) + >>> paragraph_format.space_before.pt + 12.0 + +XML Semantics +~~~~~~~~~~~~~ + +* Paragraph spacing is specified using the `w:pPr/w:spacing` element, which + also controls line spacing. Spacing is specified in twips. +* If the `w:spacing` element is not present, paragraph spacing is inherited + from the style hierarchy. 
+* If not present in the style hierarchy, the paragraph will have no spacing. +* If the `w:spacing` element is present but the specific attribute (e.g. + `w:before`) is not, its value is inherited. + +Specimen XML +~~~~~~~~~~~~ + +.. highlight:: xml + +12 pt space before, 0 after:: + + + + + + +Line spacing +------------ + +Line spacing can be specified either as a specific length or as a multiple of +the line height (font size). Line spacing is specified by the combination of +values in `w:spacing/@w:line` and `w:spacing/@w:lineRule`. The +:attr:`.ParagraphFormat.line_spacing` property determines which method to use +based on whether the assigned value is an instance of |Length|. + +Protocol +~~~~~~~~ + +.. highlight:: python + +Getting and setting line spacing:: + + >>> paragraph_format.line_spacing, paragraph_format.line_spacing_rule + (None, None) + + >>> paragraph_format.line_spacing = Pt(18) + >>> paragraph_format.line_spacing, paragraph_format.line_spacing_rule + (228600, WD_LINE_SPACING.EXACTLY (4)) + + >>> paragraph_format.line_spacing = 1 + >>> paragraph_format.line_spacing, paragraph_format.line_spacing_rule + (152400, WD_LINE_SPACING.SINGLE (0)) + + >>> paragraph_format.line_spacing = 0.9 + >>> paragraph_format.line_spacing, paragraph_format.line_spacing_rule + (137160, WD_LINE_SPACING.MULTIPLE (5)) + +XML Semantics +~~~~~~~~~~~~~ + +* Line spacing is specified by the combination of the values in + `w:spacing/@w:line` and `w:spacing/@w:lineRule`. +* `w:spacing/@w:line` is specified in twips. If `@w:lineRule` is 'auto' (or + missing), `@w:line` is interpreted as 240ths of a line. For all other + values of `@w:lineRule`, the value of `@w:line` is interpreted as + a specific length in twips. +* If the `w:spacing` element is not present, line spacing is inherited. +* If `@w:line` is not present, line spacing is inherited. +* If not present, `@w:lineRule` defaults to 'auto'. +* If not present in the style hierarchy, line spacing defaults to single + spaced. 
+* The 'atLeast' value for `@w:lineRule` indicates the line spacing will be + `@w:line` twips or single spaced, whichever is greater. + +Specimen XML +~~~~~~~~~~~~ + +.. highlight:: xml + +14 points:: + + + + + +double-spaced:: + + + + + + +Indentation +----------- + +Paragraph indentation is specified using the `w:pPr/w:ind` element. Left, +right, first line, and hanging indent can be specified. Indentation can be +specified as a length or in hundredths of a character width. Only length is +supported by |docx|. Both first line indent and hanging indent are specified +using the :attr:`.ParagraphFormat.first_line_indent` property. Assigning +a positive value produces an indented first line. A negative value produces +a hanging indent. + +Protocol +~~~~~~~~ + +.. highlight:: python + +Getting and setting indentation:: + + >>> paragraph_format.left_indent + None + >>> paragraph_format.right_indent + None + >>> paragraph_format.first_line_indent + None + + >>> paragraph_format.left_indent = Pt(36) + >>> paragraph_format.left_indent.pt + 36.0 + + >>> paragraph_format.right_indent = Inches(0.25) + >>> paragraph_format.right_indent.pt + 18.0 + + >>> paragraph_format.first_line_indent = Pt(-18) + >>> paragraph_format.first_line_indent.pt + -18.0 + +XML Semantics +~~~~~~~~~~~~~ + +* Indentation is specified by `w:ind/@w:start`, `w:ind/@w:end`, + `w:ind/@w:firstLine`, and `w:ind/@w:hanging`. + +* `w:firstLine` and `w:hanging` are mutually exclusive, if both are + specified, `w:firstLine` is ignored. + +* All four attributes are specified in twips. + +* `w:start` controls left indent for a left-to-right paragraph or right + indent for a right-to-left paragraph. `w:end` controls the other side. If + mirrorIndents is specified, `w:start` controls the inside margin and + `w:end` the outside. Negative values are permitted and cause the text to + move past the text margin. + +* If `w:ind` is not present, indentation is inherited. + +* Any omitted attributes are inherited. 
+ +* If not present in the style hierarchy, indentation values default to zero. + +Specimen XML +~~~~~~~~~~~~ + +.. highlight:: xml + +1 inch left, 0.5 inch (additional) first line, 0.5 inch right:: + + + + + +0.5 inch left, 0.5 inch hanging indent:: + + + + + + +Page placement +-------------- + +There are a handful of page placement properties that control such things as +keeping the lines of a paragraph together on the same page, keeping +a paragraph (such as a heading) on the same page as the subsequent paragraph, +and placing the paragraph at the top of a new page. Each of these is a +tri-state boolean property where |None| indicates "inherit". + +Protocol +~~~~~~~~ + +.. highlight:: python + +Getting and setting page placement properties:: + + >>> paragraph_format.keep_with_next + None + >>> paragraph_format.keep_together + None + >>> paragraph_format.page_break_before + None + >>> paragraph_format.widow_control + None + + >>> paragraph_format.keep_with_next = True + >>> paragraph_format.keep_with_next + True + + >>> paragraph_format.keep_together = False + >>> paragraph_format.keep_together + False + + >>> paragraph_format.page_break_before = True + >>> paragraph_format.widow_control = None + + +XML Semantics +~~~~~~~~~~~~~ + +* All four elements have "On/Off" semantics. + +* If not present, their value is inherited. + +* If not present in the style hierarchy, values default to False. + +Specimen XML +~~~~~~~~~~~~ + +.. highlight:: xml + +keep with next, keep together, no page break before, and widow/orphan +control:: + + + + + + + + + +Enumerations +------------ + +* :ref:`WdLineSpacing` +* :ref:`WdParagraphAlignment` + + +Specimen XML +------------ + +.. highlight:: xml + +A paragraph with inherited alignment:: + + + + Inherited paragraph alignment. + + + +A right-aligned paragraph:: + + + + + + + Right-aligned paragraph.
+ + + + + +Schema excerpt +-------------- + +:: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/dev/analysis/features/run-content.rst b/docs/dev/analysis/features/text/run-content.rst similarity index 100% rename from docs/dev/analysis/features/run-content.rst rename to docs/dev/analysis/features/text/run-content.rst diff --git a/docs/dev/analysis/features/underline.rst b/docs/dev/analysis/features/text/underline.rst similarity index 99% rename from docs/dev/analysis/features/underline.rst rename to docs/dev/analysis/features/text/underline.rst index 6cad0c25e..4ed1b7652 100644 --- a/docs/dev/analysis/features/underline.rst +++ b/docs/dev/analysis/features/text/underline.rst @@ -1,6 +1,6 @@ -Run underline -============= +Underline +========= Text in a Word document can be underlined in a variety of styles. diff --git a/docs/dev/analysis/index.rst b/docs/dev/analysis/index.rst index 7e4d7589e..07460ef88 100644 --- a/docs/dev/analysis/index.rst +++ b/docs/dev/analysis/index.rst @@ -8,23 +8,21 @@ Feature Analysis ---------------- .. toctree:: - :maxdepth: 1 + :titlesonly: + features/text/index + features/styles/index + features/coreprops + features/cell-merge features/table features/table-props features/table-cell - features/par-alignment - features/run-content features/numbering - features/underline - features/char-style - features/breaks features/sections features/shapes features/shapes-inline features/shapes-inline-size features/picture - features/bool-run-props Schema Analysis @@ -39,5 +37,3 @@ ISO/IEC 29500 spec. 
schema/ct_document schema/ct_body schema/ct_p - schema/ct_ppr - schema/ct_styles diff --git a/docs/dev/analysis/schema/ct_ppr.rst b/docs/dev/analysis/schema/ct_ppr.rst deleted file mode 100644 index a872dcca9..000000000 --- a/docs/dev/analysis/schema/ct_ppr.rst +++ /dev/null @@ -1,189 +0,0 @@ -########## -``CT_PPr`` -########## - -.. highlight:: xml - -.. csv-table:: - :header-rows: 0 - :stub-columns: 1 - :widths: 15, 50 - - Schema Name , CT_PPr - Spec Name , Paragraph Properties - Tag(s) , w:pPr - Namespace , wordprocessingml (wml.xsd) - Spec Section , 17.3.1.26 - - -Analysis -======== - - - -attributes -^^^^^^^^^^ - -None. - - -child elements -^^^^^^^^^^^^^^ - -========= === ================ -name # type -========= === ================ -xyz ? CT_abc -abc ? CT_TextListStyle -p ? CT_TextParagraph -========= === ================ - - -Spec text -^^^^^^^^^ - - This element specifies a set of paragraph properties which shall be applied - to the contents of the parent paragraph after all style/numbering/table - properties have been applied to the text. These properties are defined as - direct formatting, since they are directly applied to the paragraph and - supersede any formatting from styles. - - Consider a paragraph which should have a set of paragraph formatting - properties. This set of properties is specified in the paragraph properties - as follows:: - - - - - - - - - - - - The pPr element specifies the properties which are applied to the current - paragraph - in this case, a bottom paragraph border using the bottom - element (§17.3.1.7), spacing after the paragraph using the spacing element - (§17.3.1.33), and that spacing should be ignored for paragraphs above/below - of the same style using the contextualSpacing element (§17.3.1.9). 
- - -Schema excerpt -^^^^^^^^^^^^^^ - -:: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/dev/analysis/schema/ct_styles.rst b/docs/dev/analysis/schema/ct_styles.rst deleted file mode 100644 index 1977acc8a..000000000 --- a/docs/dev/analysis/schema/ct_styles.rst +++ /dev/null @@ -1,120 +0,0 @@ - -``CT_Styles`` -============= - -.. highlight:: xml - -.. csv-table:: - :header-rows: 0 - :stub-columns: 1 - :widths: 15, 50 - - Schema Name, CT_Styles - Spec Name, Styles - Tag(s), w:styles - Namespace, wordprocessingml (wml.xsd) - Spec Section, 17.7.4.18 - - -Analysis --------- - -Only styles with an explicit ```` definition affect the formatting -of paragraphs that are assigned that style. - -Word includes behavior definitions (```` elements) for the -"latent" styles that are built in to the Word client. These are present in a -new document created from install defaults. - -Word does not add a formatting definition (```` element) for a -built-in style until it is used. - -Once present in ``styles.xml``, Word does not remove a style element when it -is no longer used by any paragraphs. The definition of each of the styles -ever used in a document are accumulated in ``styles.xml``. - - -Spec text ---------- - - This element specifies all of the style information stored in the - WordprocessingML document: style definitions as well as latent style - information. - - Example: The Normal paragraph style in a word processing document can have - any number of formatting properties, e.g. font face = Times New Roman; font - size = 12pt; paragraph justification = left. All paragraphs which reference - this paragraph style would automatically inherit these properties. 
- - -Schema excerpt --------------- - -:: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/index.rst b/docs/index.rst index a79fa9644..eb922f510 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -68,11 +68,12 @@ User Guide user/install user/quickstart user/documents + user/text user/sections user/api-concepts - user/styles + user/styles-understanding + user/styles-using user/shapes - user/text API Documentation @@ -82,10 +83,12 @@ API Documentation :maxdepth: 2 api/document - api/table + api/style api/text + api/table api/section api/shape + api/dml api/shared api/enum/index diff --git a/docs/user/quickstart.rst b/docs/user/quickstart.rst index 01c5f2729..1c6f419ab 100644 --- a/docs/user/quickstart.rst +++ b/docs/user/quickstart.rst @@ -308,7 +308,9 @@ settings, Word has *character styles* which specify a group of run-level settings. In general you can think of a character style as specifying a font, including its typeface, size, color, bold, italic, etc. -Like paragraph styles, a character style must already be defined in the document you open with the ``Document()`` call (*see* :doc:`styles`). +Like paragraph styles, a character style must already be defined in the +document you open with the ``Document()`` call (*see* +:ref:`understanding_styles`). A character style can be specified when adding a new run:: diff --git a/docs/user/styles-understanding.rst b/docs/user/styles-understanding.rst new file mode 100644 index 000000000..e49fdea83 --- /dev/null +++ b/docs/user/styles-understanding.rst @@ -0,0 +1,382 @@ +.. _understanding_styles: + +Understanding Styles +==================== + +**Grasshopper:** + *"Master, why doesn't my paragraph appear with the style I specified?"* + +**Master:** + *"You have come to the right page Grasshopper; read on ..."* + + +What is a style in Word? 
+------------------------ + +Documents communicate better when like elements are formatted consistently. To +achieve that consistency, professional document designers develop a *style +sheet* which defines the document element types and specifies how each should +be formatted. For example, perhaps body paragraphs are to be set in 9 pt Times +Roman with a line height of 11 pt, justified flush left, ragged right. When +these specifications are applied to each of the elements of the document, +a consistent and polished look is achieved. + +A style in Word is such a set of specifications that may be applied, all at +once, to a document element. Word has paragraph styles, character styles, table +styles, and numbering definitions. These are applied to a paragraph, a span of +text, a table, and a list, respectively. + +Experienced programmers will recognize styles as a level of indirection. The +great thing about indirection is that it allows you to define something once, +then apply that definition many times. This saves the work of defining the +same thing over and over; but more importantly it allows you to change the +definition and have that change reflected in all the places you have applied it. + + +Why doesn't the style I applied show up? +---------------------------------------- + +This is likely to show up quite a bit until I can add some fancier features to +work around it, so here it is up top. + +#. When you're working in Word, there are all these styles you can apply to + things, pretty good looking ones that look all the better because you don't + have to make them yourself. Most folks never look further than the built-in + styles. + +#. Although those styles show up in the UI, they're not actually in the + document you're creating, at least not until you use one for the first time. + That's kind of a good thing. They take up room and there's a lot of them. + The file would get a little bloated if it contained all the style + definitions you could use but haven't.
+ +#. If you apply a style using |docx| that's not defined in your file (in the + styles.xml part if you're curious), Word just ignores it. It doesn't + complain, it just doesn't change how things are formatted. I'm sure + there's a good reason for this. But it can present as a bit of a puzzle if + you don't understand how Word works that way. + +#. When you use a style, Word adds it to the file. Once there, it stays. + I imagine there's a way to get rid of it, but you have to work at it. If + you apply a style, delete the content you applied it to, and then save the + document; the style definition stays in the saved file. + +All this adds up to the following: If you want to use a style in a document you +create with |docx|, the document you start with must contain the style +definition. Otherwise it just won't work. It won't raise an exception, it just +won't work. + +If you use the "default" template document, it contains the styles listed +below, most of the ones you're likely to want if you're not designing your own. +If you're using your own starting document, you need to use each of the styles +you want at least once in it. You don't have to keep the content, but you need +to apply the style to something at least once before saving the document. +Creating a one-word paragraph, applying five styles to it in succession and +then deleting the paragraph works fine. That's how I got the ones below into +the default template :). + + +Glossary +-------- + +style definition + A ```` element in the styles part of a document that explicitly + defines the attributes of a style. + +defined style + A style that is explicitly defined in a document. Contrast with *latent + style*. + +built-in style + One of the set of 276 pre-set styles built into Word, such as "Heading + 1". A built-in style can be either defined or latent. A built-in style + that is not yet defined is known as a *latent style*. 
Both defined and + latent built-in styles may appear as options in Word's style panel and + style gallery. + +custom style + Also known as a *user defined style*, any style defined in a Word + document that is not a built-in style. Note that a custom style cannot be + a latent style. + +latent style + A built-in style having no definition in a particular document is known + as a *latent style* in that document. A latent style can appear as an + option in the Word UI depending on the settings in the |LatentStyles| + object for the document. + +recommended style list + A list of styles that appears in the styles toolbox or panel when + "Recommended" is selected from the "List:" dropdown box. + +Style Gallery + The selection of example styles that appear in the ribbon of the Word UI + and which may be applied by clicking on one of them. + + +Identifying a style +------------------- + +A style has three identifying properties, `name`, `style_id`, and `type`. + +Each style's :attr:`name` property is its stable, unique identifier for +access purposes. + +A style's :attr:`style_id` is used internally to key a content object such as +a paragraph to its style. However this value is generated automatically by +Word and is not guaranteed to be stable across saves. In general, the style +id is formed simply by removing spaces from the *localized* style name, +however there are exceptions. Users of |docx| should generally avoid using +the style id unless they are confident with the internals involved. + +A style's :attr:`type` is set at creation time and cannot be changed. + + +.. _builtin_styles: + +Built-in styles +--------------- + +Word comes with almost 300 so-called *built-in* styles like `Normal`, +`Heading 1`, and `List Bullet`. Style definitions are stored in the +`styles.xml` part of a .docx package, but built-in style definitions are +stored in the Word application itself and are not written to `styles.xml` +until they are actually used. 
This is a sensible strategy because they take +up considerable room and would be largely redundant and useless overhead in +every .docx file otherwise. + +The fact that built-in styles are not written to the .docx package until used +gives rise to the need for *latent style* definitions, explained below. + + +.. _style_behavior: + +Style Behavior +-------------- + +In addition to collecting a set of formatting properties, a style has five +properties that specify its *behavior*. This behavior is relatively simple, +basically amounting to when and where the style appears in the Word or +LibreOffice UI. + +The key notion to understanding style behavior is the recommended list. In +the style pane in Word, the user can select which list of styles they want to +see. One of these is named *Recommended* and is known as the *recommended +list*. All five behavior properties affect some aspect of the style’s +appearance in this list and in the style gallery. + +In brief, a style appears in the recommended list if its :attr:`hidden` +property is |False| (the default). If a style is not hidden and its +:attr:`quick_style` property is |True|, it also appears in the style gallery. +If a hidden style's :attr:`unhide_when_used` property is |True|, its hidden +property is set |False| the first time it is used. Styles in the style lists +and style gallery are sorted in :attr:`priority` order, then alphabetically +for styles of the same priority. If a style's :attr:`locked` property is +|True| and formatting restrictions are turned on for the document, the style +will not appear in any list or the style gallery and cannot be applied to +content. + + +.. _latent_styles: + +Latent styles +------------- + +The need to specify the UI behavior of built-in styles not defined in +`styles.xml` gives rise to the need for *latent style* definitions. A latent +style definition is basically a stub style definition that has at most the +five behavior attributes in addition to the style name. 
Additional space is +saved by defining defaults for each of the behavior attributes, so only those +that differ from the default need be defined and styles that match all +defaults need no latent style definition. + +Latent style definitions are specified using the `w:latentStyles` and +`w:lsdException` elements appearing in `styles.xml`. + +A latent style definition is only required for a built-in style because only +a built-in style can appear in the UI without a style definition in +`styles.xml`. + + +Style inheritance +----------------- + +A style can inherit properties from another style, somewhat similarly to how +Cascading Style Sheets (CSS) works. Inheritance is specified using the +:attr:`~.BaseStyle.base_style` attribute. By basing one style on another, an +inheritance hierarchy of arbitrary depth can be formed. A style having no +base style inherits properties from the document defaults. + + +Paragraph styles in default template +------------------------------------ + +* Normal +* Body Text +* Body Text 2 +* Body Text 3 +* Caption +* Heading 1 +* Heading 2 +* Heading 3 +* Heading 4 +* Heading 5 +* Heading 6 +* Heading 7 +* Heading 8 +* Heading 9 +* Intense Quote +* List +* List 2 +* List 3 +* List Bullet +* List Bullet 2 +* List Bullet 3 +* List Continue +* List Continue 2 +* List Continue 3 +* List Number +* List Number 2 +* List Number 3 +* List Paragraph +* Macro Text +* No Spacing +* Quote +* Subtitle +* TOCHeading +* Title + + +Character styles in default template +------------------------------------ + +* Body Text Char +* Body Text 2 Char +* Body Text 3 Char +* Book Title +* Default Paragraph Font +* Emphasis +* Heading 1 Char +* Heading 2 Char +* Heading 3 Char +* Heading 4 Char +* Heading 5 Char +* Heading 6 Char +* Heading 7 Char +* Heading 8 Char +* Heading 9 Char +* Intense Emphasis +* Intense Quote Char +* Intense Reference +* Macro Text Char +* Quote Char +* Strong +* Subtitle Char +* Subtle Emphasis +* Subtle Reference +* Title Char + + 
+Table styles in default template +-------------------------------- + +* Table Normal +* Colorful Grid +* Colorful Grid Accent 1 +* Colorful Grid Accent 2 +* Colorful Grid Accent 3 +* Colorful Grid Accent 4 +* Colorful Grid Accent 5 +* Colorful Grid Accent 6 +* Colorful List +* Colorful List Accent 1 +* Colorful List Accent 2 +* Colorful List Accent 3 +* Colorful List Accent 4 +* Colorful List Accent 5 +* Colorful List Accent 6 +* Colorful Shading +* Colorful Shading Accent 1 +* Colorful Shading Accent 2 +* Colorful Shading Accent 3 +* Colorful Shading Accent 4 +* Colorful Shading Accent 5 +* Colorful Shading Accent 6 +* Dark List +* Dark List Accent 1 +* Dark List Accent 2 +* Dark List Accent 3 +* Dark List Accent 4 +* Dark List Accent 5 +* Dark List Accent 6 +* Light Grid +* Light Grid Accent 1 +* Light Grid Accent 2 +* Light Grid Accent 3 +* Light Grid Accent 4 +* Light Grid Accent 5 +* Light Grid Accent 6 +* Light List +* Light List Accent 1 +* Light List Accent 2 +* Light List Accent 3 +* Light List Accent 4 +* Light List Accent 5 +* Light List Accent 6 +* Light Shading +* Light Shading Accent 1 +* Light Shading Accent 2 +* Light Shading Accent 3 +* Light Shading Accent 4 +* Light Shading Accent 5 +* Light Shading Accent 6 +* Medium Grid 1 +* Medium Grid 1 Accent 1 +* Medium Grid 1 Accent 2 +* Medium Grid 1 Accent 3 +* Medium Grid 1 Accent 4 +* Medium Grid 1 Accent 5 +* Medium Grid 1 Accent 6 +* Medium Grid 2 +* Medium Grid 2 Accent 1 +* Medium Grid 2 Accent 2 +* Medium Grid 2 Accent 3 +* Medium Grid 2 Accent 4 +* Medium Grid 2 Accent 5 +* Medium Grid 2 Accent 6 +* Medium Grid 3 +* Medium Grid 3 Accent 1 +* Medium Grid 3 Accent 2 +* Medium Grid 3 Accent 3 +* Medium Grid 3 Accent 4 +* Medium Grid 3 Accent 5 +* Medium Grid 3 Accent 6 +* Medium List 1 +* Medium List 1 Accent 1 +* Medium List 1 Accent 2 +* Medium List 1 Accent 3 +* Medium List 1 Accent 4 +* Medium List 1 Accent 5 +* Medium List 1 Accent 6 +* Medium List 2 +* Medium List 2 Accent 1 +* Medium List 2 
Accent 2 +* Medium List 2 Accent 3 +* Medium List 2 Accent 4 +* Medium List 2 Accent 5 +* Medium List 2 Accent 6 +* Medium Shading 1 +* Medium Shading 1 Accent 1 +* Medium Shading 1 Accent 2 +* Medium Shading 1 Accent 3 +* Medium Shading 1 Accent 4 +* Medium Shading 1 Accent 5 +* Medium Shading 1 Accent 6 +* Medium Shading 2 +* Medium Shading 2 Accent 1 +* Medium Shading 2 Accent 2 +* Medium Shading 2 Accent 3 +* Medium Shading 2 Accent 4 +* Medium Shading 2 Accent 5 +* Medium Shading 2 Accent 6 +* Table Grid diff --git a/docs/user/styles-using.rst b/docs/user/styles-using.rst new file mode 100644 index 000000000..93dd7a344 --- /dev/null +++ b/docs/user/styles-using.rst @@ -0,0 +1,391 @@ + +Working with Styles +=================== + +This page uses concepts developed in the prior page without introduction. If +a term is unfamiliar, consult the prior page :ref:`understanding_styles` for +a definition. + + +Access a style +-------------- + +Styles are accessed using the :attr:`.Document.styles` attribute:: + + >>> document = Document() + >>> styles = document.styles + >>> styles + + +The |Styles| object provides dictionary-style access to defined styles by +name:: + + >>> styles['Normal'] + + +.. note:: Built-in styles are stored in a WordprocessingML file using their + English name, e.g. 'Heading 1', even though users working on a localized + version of Word will see native language names in the UI, e.g. 'Kop 1'. + Because |docx| operates on the WordprocessingML file, style lookups must + use the English name. A document available on this external site allows + you to create a mapping between local language names and English style + names: + http://www.thedoctools.com/index.php?show=mt_create_style_name_list + + User-defined styles, also known as *custom styles*, are not localized and + are accessed with the name exactly as it appears in the Word UI. + +The |Styles| object is also iterable. 
By using the identification properties +on |BaseStyle|, various subsets of the defined styles can be generated. For +example, this code will produce a list of the defined paragraph styles:: + + >>> from docx.enum.style import WD_STYLE_TYPE + >>> styles = document.styles + >>> paragraph_styles = [ + ... s for s in styles if s.type == WD_STYLE_TYPE.PARAGRAPH + ... ] + >>> for style in paragraph_styles: + ... print(style.name) + ... + Normal + Body Text + List Bullet + + +Apply a style +------------- + +The |Paragraph|, |Run|, and |Table| objects each have a :attr:`style` +attribute. Assigning a style object to this attribute applies that style:: + + >>> document = Document() + >>> paragraph = document.add_paragraph() + >>> paragraph.style + + >>> paragraph.style.name + 'Normal' + >>> paragraph.style = document.styles['Heading 1'] + >>> paragraph.style.name + 'Heading 1' + +A style name can also be assigned directly, in which case |docx| will do the +lookup for you:: + + >>> paragraph.style = 'List Bullet' + >>> paragraph.style + + >>> paragraph.style.name + 'List Bullet' + +A style can also be applied at creation time using either the style object or +its name:: + + >>> paragraph = document.add_paragraph(style='Body Text') + >>> paragraph.style.name + 'Body Text' + >>> body_text_style = document.styles['Body Text'] + >>> paragraph = document.add_paragraph(style=body_text_style) + >>> paragraph.style.name + 'Body Text' + + +Add or delete a style +--------------------- + +A new style can be added to the document by specifying a unique name and +a style type:: + + >>> from docx.enum.style import WD_STYLE_TYPE + >>> styles = document.styles + >>> style = styles.add_style('Citation', WD_STYLE_TYPE.PARAGRAPH) + >>> style.name + 'Citation' + >>> style.type + PARAGRAPH (1) + +Use the :attr:`~.BaseStyle.base_style` property to specify a style the new +style should inherit formatting settings from:: + + >>> style.base_style + None + >>> style.base_style = styles['Normal'] + 
 >>> style.base_style + + >>> style.base_style.name + 'Normal' + +A style can be removed from the document simply by calling its +:meth:`~.BaseStyle.delete` method:: + + >>> styles = document.styles + >>> len(styles) + 10 + >>> styles['Citation'].delete() + >>> len(styles) + 9 + +.. note:: The :meth:`.Style.delete` method removes the style's definition + from the document. It does not affect content in the document to which + that style is applied. Content having a style not defined in the document + is rendered using the default style for that content object, e.g. + 'Normal' in the case of a paragraph. + + +Define character formatting +--------------------------- + +Character, paragraph, and table styles can all specify character formatting +to be applied to content with that style. All the character formatting that +can be applied directly to text can be specified in a style. Examples include +font typeface and size, bold, italic, and underline. + +Each of these three style types has a :attr:`~._CharacterStyle.font` +attribute providing access to a |Font| object. A style's |Font| object +provides properties for getting and setting the character formatting for that +style. + +Several examples are provided here. For a complete set of the available +properties, see the |Font| API documentation. + +The font for a style can be accessed like this:: + + >>> from docx import Document + >>> document = Document() + >>> style = document.styles['Normal'] + >>> font = style.font + +Typeface and size are set like this:: + + >>> from docx.shared import Pt + >>> font.name = 'Calibri' + >>> font.size = Pt(12) + +Many font properties are *tri-state*, meaning they can take the values +|True|, |False|, and |None|. |True| means the property is "on", |False| means +it is "off". Conceptually, the |None| value means "inherit".
Because a style +exists in an inheritance hierarchy, it is important to have the ability to +specify a property at the right place in the hierarchy, generally as far up +the hierarchy as possible. For example, if all headings should be in the +Arial typeface, it makes more sense to set that property on the `Heading 1` +style and have `Heading 2` inherit from `Heading 1`. + +Bold and italic are tri-state properties, as are all-caps, strikethrough, +superscript, and many others. See the |Font| API documentation for a full +list:: + + >>> font.bold, font.italic + (None, None) + >>> font.italic = True + >>> font.italic + True + >>> font.italic = False + >>> font.italic + False + >>> font.italic = None + >>> font.italic + None + +Underline is a bit of a special case. It is a hybrid of a tri-state property +and an enumerated value property. |True| means single underline, by far the +most common. |False| means no underline, but more often |None| is the right +choice if no underlining is wanted since it is rare to inherit it from a base +style. The other forms of underlining, such as double or dashed, are +specified with a member of the :ref:`WdUnderline` enumeration:: + + >>> font.underline + None + >>> font.underline = True + >>> # or perhaps + >>> font.underline = WD_UNDERLINE.DOT_DASH + + +Define paragraph formatting +--------------------------- + +Both a paragraph style and a table style allow paragraph formatting to be +specified. These styles provide access to a |ParagraphFormat| object via +their :attr:`~._ParagraphStyle.paragraph_format` property. + +Paragraph formatting includes layout behaviors such as justification, +indentation, space before and after, page break before, and widow/orphan +control. For a complete list of the available properties, consult the API +documentation page for the |ParagraphFormat| object. 
+ +Here's an example of how you would create a paragraph style having hanging +indentation of 1/4 inch, 12 points spacing above, and widow/orphan control:: + + >>> from docx.enum.style import WD_STYLE_TYPE + >>> from docx.shared import Inches, Pt + >>> document = Document() + >>> style = document.styles.add_style('Indent', WD_STYLE_TYPE.PARAGRAPH) + >>> paragraph_format = style.paragraph_format + >>> paragraph_format.left_indent = Inches(0.25) + >>> paragraph_format.first_line_indent = Inches(-0.25) + >>> paragraph_format.space_before = Pt(12) + >>> paragraph_format.widow_control = True + + +Use paragraph-specific style properties +--------------------------------------- + +A paragraph style has a :attr:`~._ParagraphStyle.next_paragraph_style` +property that specifies the style to be applied to new paragraphs inserted +after a paragraph of that style. This is most useful when the style would +normally appear only once in a sequence, such as a heading. In that case, the +paragraph style can automatically be set back to a body style after +completing the heading. + +In the most common case (body paragraphs), subsequent paragraphs should +receive the same style as the current paragraph. The default handles this +case well by applying the same style if a next paragraph style is not +specified. 
+ +Here's an example of how you would change the next paragraph style of the +*Heading 1* style to *Body Text*:: + + >>> from docx import Document + >>> document = Document() + >>> styles = document.styles + + >>> styles['Heading 1'].next_paragraph_style = styles['Body Text'] + +The default behavior can be restored by assigning |None| or the style itself:: + + >>> heading_1_style = styles['Heading 1'] + >>> heading_1_style.next_paragraph_style.name + 'Body Text' + + >>> heading_1_style.next_paragraph_style = heading_1_style + >>> heading_1_style.next_paragraph_style.name + 'Heading 1' + + >>> heading_1_style.next_paragraph_style = None + >>> heading_1_style.next_paragraph_style.name + 'Heading 1' + + +Control how a style appears in the Word UI +------------------------------------------ + +The properties of a style fall into two categories, *behavioral properties* +and *formatting properties*. Its behavioral properties control when and where +the style appears in the Word UI. Its formatting properties determine the +formatting of content to which the style is applied, such as the size of the +font and its paragraph indentation. + +There are five behavioral properties of a style: + +* :attr:`~.BaseStyle.hidden` +* :attr:`~.BaseStyle.unhide_when_used` +* :attr:`~.BaseStyle.priority` +* :attr:`~.BaseStyle.quick_style` +* :attr:`~.BaseStyle.locked` + +See the :ref:`style_behavior` section in :ref:`understanding_styles` for +a description of how these behavioral properties interact to determine when +and where a style appears in the Word UI. + +The :attr:`priority` property takes an integer value. The other four style +behavior properties are *tri-state*, meaning they can take the value |True| +(on), |False| (off), or |None| (inherit). 
+ +Display a style in the style gallery +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following code will cause the 'Body Text' paragraph style to appear first +in the style gallery:: + + >>> from docx import Document + >>> document = Document() + >>> style = document.styles['Body Text'] + + >>> style.hidden = False + >>> style.quick_style = True + >>> style.priority = 1 + +Remove a style from the style gallery +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This code will remove the 'Normal' paragraph style from the style gallery, +but allow it to remain in the recommended list:: + + >>> style = document.styles['Normal'] + + >>> style.hidden = False + >>> style.quick_style = False + + +Working with Latent Styles +-------------------------- + +See the :ref:`builtin_styles` and :ref:`latent_styles` sections in +:ref:`understanding_styles` for a description of how latent styles define the +behavioral properties of built-in styles that are not yet defined in the +`styles.xml` part of a .docx file. + +Access the latent styles in a document +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The latent styles in a document are accessed from the styles object:: + + >>> document = Document() + >>> latent_styles = document.styles.latent_styles + +A |LatentStyles| object supports :meth:`len`, iteration, and dictionary-style +access by style name:: + + >>> len(latent_styles) + 161 + + >>> latent_style_names = [ls.name for ls in latent_styles] + >>> latent_style_names + ['Normal', 'Heading 1', 'Heading 2', ... 'TOC Heading'] + + >>> latent_quote = latent_styles['Quote'] + >>> latent_quote + + >>> latent_quote.priority + 29 + +Change latent style defaults +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The |LatentStyles| object also provides access to the default behavioral +properties for built-in styles in the current document. 
These defaults +provide the value for any undefined attributes of the |_LatentStyle| +definitions and to all behavioral properties of built-in styles having no +explicit latent style definition. See the API documentation for the +|LatentStyles| object for the complete set of available properties:: + + >>> latent_styles.default_to_locked + False + >>> latent_styles.default_to_locked = True + >>> latent_styles.default_to_locked + True + +Add a latent style definition +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A new latent style can be added using the +:meth:`~.LatentStyles.add_latent_style` method on |LatentStyles|. This code +adds a new latent style for the builtin style 'List Bullet', setting it to +appear in the style gallery:: + + >>> latent_style = latent_styles['List Bullet'] + KeyError: no latent style with name 'List Bullet' + >>> latent_style = latent_styles.add_latent_style('List Bullet') + >>> latent_style.hidden = False + >>> latent_style.priority = 2 + >>> latent_style.quick_style = True + +Delete a latent style definition +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A latent style definition can be deleted by calling its +:meth:`~.LatentStyle.delete` method:: + + >>> latent_styles['Light Grid'] + + >>> latent_styles['Light Grid'].delete() + >>> latent_styles['Light Grid'] + KeyError: no latent style with name 'Light Grid' diff --git a/docs/user/styles.rst b/docs/user/styles.rst deleted file mode 100644 index 87e34272d..000000000 --- a/docs/user/styles.rst +++ /dev/null @@ -1,249 +0,0 @@ - -Understanding Styles -==================== - -**Grasshopper:** - *"Master, why doesn't my paragraph appear with the style I specified?"* - -**Master:** - *"You have come to the right page Grasshopper; read on ..."* - - -What is a style in Word? ------------------------- - -Documents communicate better when like elements are formatted consistently. 
To -achieve that consistency, professional document designers develop a *style -sheet* which defines the document element types and specifies how each should -be formatted. For example, perhaps body paragraphs are to be set in 9 pt Times -Roman with a line height of 11 pt, justified flush left, ragged right. When -these specifications are applied to each of the elements of the document, -a consistent and polished look is achieved. - -A style in Word is such a set of specifications that may be applied, all at -once, to a document element. Word has paragraph styles, character styles, table -styles, and numbering definitions. These are applied to a paragraph, a span of -text, a table, and a list, respectively. - -Experienced programmers will recognize styles as a level of indirection. The -great thing about those is it allows you to define something once, then apply -that definition many times. This saves the work of defining the same thing over -an over; but more importantly it allows you to change it the definition and -have that change reflected in all the places you originally applied it. - - -Why doesn't the style I applied show up? ----------------------------------------- - -This is likely to show up quite a bit until I can add some fancier features to -work around it, so here it is up top. - -#. When you're working in Word, there are all these styles you can apply to - things, pretty good looking ones that look all the better because you don't - have to make them yourself. Most folks never look further than the built-in - styles. - -#. Although those styles show up in the UI, they're not actually in the - document you're creating, at least not until you use it for the first time. - That's kind of a good thing. They take up room and there's a lot of them. - The file would get a little bloated if it contained all the style - definitions you could use but haven't. - -#. 
If you apply a style that's not defined in your file (in the styles.xml part - if you're curious), Word just ignores it. It doesn't complain, it just - doesn't change how things are formatted. I'm sure there's a good reason for - this. But it can present as a bit of a puzzle if you don't understand how - Word works that way. - -#. When you use a style, Word adds it to the file. Once there, it stays. - I imagine there's a way to get rid of it, but you have to work at it. If - you apply a style, delete the content you applied it to, and then save the - document; the style definition stays in the saved file. - -All this adds up to the following: If you want to use a style in a document you -create with |docx|, the document you start with must contain the style -definition. Otherwise it just won't work. It won't raise an exception, it just -won't work. - -If you use the "default" template document, it contains the styles listed -below, most of the ones you're likely to want if you're not designing your own. -If you're using your own starting document, you need to use each of the styles -you want at least once in it. You don't have to keep the content, but you need -to apply the style to something at least once before saving the document. -Creating a one-word paragraph, applying five styles to it in succession and -then deleting the paragraph works fine. That's how I got the ones below into -the default template :). 
- - -Paragraph styles in default template -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* Normal -* BodyText -* BodyText2 -* BodyText3 -* Caption -* Heading1 -* Heading2 -* Heading3 -* Heading4 -* Heading5 -* Heading6 -* Heading7 -* Heading8 -* Heading9 -* IntenseQuote -* List -* List2 -* List3 -* ListBullet -* ListBullet2 -* ListBullet3 -* ListContinue -* ListContinue2 -* ListContinue3 -* ListNumber -* ListNumber2 -* ListNumber3 -* ListParagraph -* MacroText -* NoSpacing -* Quote -* Subtitle -* TOCHeading -* Title - - -Table styles in default template -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* TableNormal -* ColorfulGrid -* ColorfulGrid-Accent1 -* ColorfulGrid-Accent2 -* ColorfulGrid-Accent3 -* ColorfulGrid-Accent4 -* ColorfulGrid-Accent5 -* ColorfulGrid-Accent6 -* ColorfulList -* ColorfulList-Accent1 -* ColorfulList-Accent2 -* ColorfulList-Accent3 -* ColorfulList-Accent4 -* ColorfulList-Accent5 -* ColorfulList-Accent6 -* ColorfulShading -* ColorfulShading-Accent1 -* ColorfulShading-Accent2 -* ColorfulShading-Accent3 -* ColorfulShading-Accent4 -* ColorfulShading-Accent5 -* ColorfulShading-Accent6 -* DarkList -* DarkList-Accent1 -* DarkList-Accent2 -* DarkList-Accent3 -* DarkList-Accent4 -* DarkList-Accent5 -* DarkList-Accent6 -* LightGrid -* LightGrid-Accent1 -* LightGrid-Accent2 -* LightGrid-Accent3 -* LightGrid-Accent4 -* LightGrid-Accent5 -* LightGrid-Accent6 -* LightList -* LightList-Accent1 -* LightList-Accent2 -* LightList-Accent3 -* LightList-Accent4 -* LightList-Accent5 -* LightList-Accent6 -* LightShading -* LightShading-Accent1 -* LightShading-Accent2 -* LightShading-Accent3 -* LightShading-Accent4 -* LightShading-Accent5 -* LightShading-Accent6 -* MediumGrid1 -* MediumGrid1-Accent1 -* MediumGrid1-Accent2 -* MediumGrid1-Accent3 -* MediumGrid1-Accent4 -* MediumGrid1-Accent5 -* MediumGrid1-Accent6 -* MediumGrid2 -* MediumGrid2-Accent1 -* MediumGrid2-Accent2 -* MediumGrid2-Accent3 -* MediumGrid2-Accent4 -* MediumGrid2-Accent5 -* MediumGrid2-Accent6 -* MediumGrid3 -* 
MediumGrid3-Accent1 -* MediumGrid3-Accent2 -* MediumGrid3-Accent3 -* MediumGrid3-Accent4 -* MediumGrid3-Accent5 -* MediumGrid3-Accent6 -* MediumList1 -* MediumList1-Accent1 -* MediumList1-Accent2 -* MediumList1-Accent3 -* MediumList1-Accent4 -* MediumList1-Accent5 -* MediumList1-Accent6 -* MediumList2 -* MediumList2-Accent1 -* MediumList2-Accent2 -* MediumList2-Accent3 -* MediumList2-Accent4 -* MediumList2-Accent5 -* MediumList2-Accent6 -* MediumShading1 -* MediumShading1-Accent1 -* MediumShading1-Accent2 -* MediumShading1-Accent3 -* MediumShading1-Accent4 -* MediumShading1-Accent5 -* MediumShading1-Accent6 -* MediumShading2 -* MediumShading2-Accent1 -* MediumShading2-Accent2 -* MediumShading2-Accent3 -* MediumShading2-Accent4 -* MediumShading2-Accent5 -* MediumShading2-Accent6 -* TableGrid - - -Character styles in default template -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* BodyText2Char -* BodyText3Char -* BodyTextChar -* BookTitle -* DefaultParagraphFont -* Emphasis -* Heading1Char -* Heading2Char -* Heading3Char -* Heading4Char -* Heading5Char -* Heading6Char -* Heading7Char -* Heading8Char -* Heading9Char -* IntenseEmphasis -* IntenseQuoteChar -* IntenseReference -* MacroTextChar -* QuoteChar -* Strong -* SubtitleChar -* SubtleEmphasis -* SubtleReference -* TitleChar diff --git a/docs/user/text.rst b/docs/user/text.rst index 25ab8f742..113501fa4 100644 --- a/docs/user/text.rst +++ b/docs/user/text.rst @@ -1,28 +1,342 @@ -Low-level text API -================== +Working with Text +================= + +To work effectively with text, it's important to first understand a little +about block-level elements like paragraphs and inline-level objects like +runs. -For the greatest control over inserted text, an understanding of the low-level -text API is required. Block-level vs. inline text objects ----------------------------------- -The paragraph is the primary block-level object in Word. 
A table is also -a block-level object, however its acts primarily as a container rather than -content. Each cell of a table is a block-level container, much like the -document body itself. Its rows and columns simply provide structure to the -cells. +The paragraph is the primary block-level object in Word. + +A block-level item flows the text it contains between its left and right +edges, adding an additional line each time the text extends beyond its right +boundary. For a paragraph, the boundaries are generally the page margins, but +they can also be column boundaries if the page is laid out in columns, or +cell boundaries if the paragraph occurs inside a table cell. + +A table is also a block-level object. + +An inline object is a portion of the content that occurs inside a block-level +item. An example would be a word that appears in bold or a sentence in +all-caps. The most common inline object is a *run*. All content within +a block container is inside of an inline object. Typically, a paragraph +contains one or more runs, each of which contains some part of the paragraph's +text. + +The attributes of a block-level item specify its placement on the page, such +items as indentation and space before and after a paragraph. The attributes +of an inline item generally specify the font in which the content appears, +things like typeface, font size, bold, and italic. + + +Paragraph properties +-------------------- + +A paragraph has a variety of properties that specify its placement within its +container (typically a page) and the way it divides its content into separate +lines. + +In general, it's best to define a *paragraph style* collecting these +attributes into a meaningful group and apply the appropriate style to each +paragraph, rather than repeatedly apply those properties directly to each +paragraph. This is analogous to how Cascading Style Sheets (CSS) work with +HTML. 
All the paragraph properties described here can be set using a style as +well as applied directly to a paragraph. + +The formatting properties of a paragraph are accessed using the +|ParagraphFormat| object available using the paragraph's +:attr:`~.Paragraph.paragraph_format` property. + + +Horizontal alignment (justification) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Also known as *justification*, the horizontal alignment of a paragraph can be +set to left, centered, right, or fully justified (aligned on both the left +and right sides) using values from the enumeration +:ref:`WdParagraphAlignment`:: + + >>> from docx.enum.text import WD_ALIGN_PARAGRAPH + >>> document = Document() + >>> paragraph = document.add_paragraph() + >>> paragraph_format = paragraph.paragraph_format + + >>> paragraph_format.alignment + None # indicating alignment is inherited from the style hierarchy + >>> paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER + >>> paragraph_format.alignment + CENTER (1) + + +Indentation +~~~~~~~~~~~ + +Indentation is the horizontal space between a paragraph and edge of its +container, typically the page margin. A paragraph can be indented separately +on the left and right side. The first line can also have a different +indentation than the rest of the paragraph. A first line indented further +than the rest of the paragraph has *first line indent*. A first line indented +less has a *hanging indent*. + +Indentation is specified using a |Length| value, such as |Inches|, |Pt|, or +|Cm|. Negative values are valid and cause the paragraph to overlap the margin +by the specified amount. A value of |None| indicates the indentation value is +inherited from the style hierarchy. 
Assigning |None| to an indentation +property removes any directly-applied indentation setting and restores +inheritance from the style hierarchy:: + + >>> from docx.shared import Inches + >>> paragraph = document.add_paragraph() + >>> paragraph_format = paragraph.paragraph_format + + >>> paragraph_format.left_indent + None # indicating indentation is inherited from the style hierarchy + >>> paragraph_format.left_indent = Inches(0.5) + >>> paragraph_format.left_indent + 457200 + >>> paragraph_format.left_indent.inches + 0.5 + + +Right-side indent works in a similar way:: + + >>> from docx.shared import Pt + >>> paragraph_format.right_indent + None + >>> paragraph_format.right_indent = Pt(24) + >>> paragraph_format.right_indent + 304800 + >>> paragraph_format.right_indent.pt + 24.0 + + +First-line indent is specified using the +:attr:`~.ParagraphFormat.first_line_indent` property and is interpreted +relative to the left indent. A negative value indicates a hanging indent:: + + >>> paragraph_format.first_line_indent + None + >>> paragraph_format.first_line_indent = Inches(-0.25) + >>> paragraph_format.first_line_indent + -228600 + >>> paragraph_format.first_line_indent.inches + -0.25 + + +Paragraph spacing +~~~~~~~~~~~~~~~~~ + +The :attr:`~.ParagraphFormat.space_before` and +:attr:`~.ParagraphFormat.space_after` properties control the spacing between +subsequent paragraphs, controlling the spacing before and after a paragraph, +respectively. Inter-paragraph spacing is *collapsed* during page layout, +meaning the spacing between two paragraphs is the maximum of the +`space_after` for the first paragraph and the `space_before` of the second +paragraph. 
Paragraph spacing is specified as a |Length| value, often using +|Pt|:: + + >>> paragraph_format.space_before, paragraph_format.space_after + (None, None) # inherited by default + + >>> paragraph_format.space_before = Pt(18) + >>> paragraph_format.space_before.pt + 18.0 + + >>> paragraph_format.space_after = Pt(12) + >>> paragraph_format.space_after.pt + 12.0 + + +Line spacing +~~~~~~~~~~~~ + +Line spacing is the distance between subsequent baselines in the lines of +a paragraph. Line spacing can be specified either as an absolute distance or +relative to the line height (essentially the point size of the font used). +A typical absolute measure would be 18 points. A typical relative measure +would be double-spaced (2.0 line heights). The default line spacing is +single-spaced (1.0 line heights). + +Line spacing is controlled by the interaction of the +:attr:`~.ParagraphFormat.line_spacing` and +:attr:`~.ParagraphFormat.line_spacing_rule` properties. +:attr:`~.ParagraphFormat.line_spacing` is either a |Length| value, +a (small-ish) |float|, or None. A |Length| value indicates an absolute +distance. A |float| indicates a number of line heights. |None| indicates line +spacing is inherited. 
:attr:`~.ParagraphFormat.line_spacing_rule` is a member +of the :ref:`WdLineSpacing` enumeration or |None|:: + + >>> from docx.shared import Length + >>> paragraph_format.line_spacing + None + >>> paragraph_format.line_spacing_rule + None + + >>> paragraph_format.line_spacing = Pt(18) + >>> isinstance(paragraph_format.line_spacing, Length) + True + >>> paragraph_format.line_spacing.pt + 18.0 + >>> paragraph_format.line_spacing_rule + EXACTLY (4) + + >>> paragraph_format.line_spacing = 1.75 + >>> paragraph_format.line_spacing + 1.75 + >>> paragraph_format.line_spacing_rule + MULTIPLE (5) + + +Pagination properties +~~~~~~~~~~~~~~~~~~~~~ + +Four paragraph properties, :attr:`~.ParagraphFormat.keep_together`, +:attr:`~.ParagraphFormat.keep_with_next`, +:attr:`~.ParagraphFormat.page_break_before`, and +:attr:`~.ParagraphFormat.widow_control` control aspects of how the paragraph +behaves near page boundaries. + +:attr:`~.ParagraphFormat.keep_together` causes the entire paragraph to appear +on the same page, issuing a page break before the paragraph if it would +otherwise be broken across two pages. + +:attr:`~.ParagraphFormat.keep_with_next` keeps a paragraph on the same page +as the subsequent paragraph. This can be used, for example, to keep a section +heading on the same page as the first paragraph of the section. + +:attr:`~.ParagraphFormat.page_break_before` causes a paragraph to be placed +at the top of a new page. This could be used on a chapter heading to ensure +chapters start on a new page. + +:attr:`~.ParagraphFormat.widow_control` breaks a page to avoid placing the +first or last line of the paragraph on a separate page from the rest of the +paragraph. + +All four of these properties are *tri-state*, meaning they can take the value +|True|, |False|, or |None|. |None| indicates the property value is inherited +from the style hierarchy. 
|True| means "on" and |False| means "off":: + + >>> paragraph_format.keep_together + None # all four inherit by default + >>> paragraph_format.keep_with_next = True + >>> paragraph_format.keep_with_next + True + >>> paragraph_format.page_break_before = False + >>> paragraph_format.page_break_before + False + + +Apply character formatting +-------------------------- + +Character formatting is applied at the Run level. Examples include font +typeface and size, bold, italic, and underline. + +A |Run| object has a read-only :attr:`~.Run.font` property providing access +to a |Font| object. A run's |Font| object provides properties for getting +and setting the character formatting for that run. + +Several examples are provided here. For a complete set of the available +properties, see the |Font| API documentation. + +The font for a run can be accessed like this:: + + >>> from docx import Document + >>> document = Document() + >>> run = document.add_paragraph().add_run() + >>> font = run.font + +Typeface and size are set like this:: + + >>> from docx.shared import Pt + >>> font.name = 'Calibri' + >>> font.size = Pt(12) + +Many font properties are *tri-state*, meaning they can take the values +|True|, |False|, and |None|. |True| means the property is "on", |False| means +it is "off". Conceptually, the |None| value means "inherit". A run exists in +the style inheritance hierarchy and by default inherits its character +formatting from that hierarchy. Any character formatting directly applied +using the |Font| object overrides the inherited values. + +Bold and italic are tri-state properties, as are all-caps, strikethrough, +superscript, and many others. See the |Font| API documentation for a full +list:: + + >>> font.bold, font.italic + (None, None) + >>> font.italic = True + >>> font.italic + True + >>> font.italic = False + >>> font.italic + False + >>> font.italic = None + >>> font.italic + None + +Underline is a bit of a special case. 
It is a hybrid of a tri-state property +and an enumerated value property. |True| means single underline, by far the +most common. |False| means no underline, but more often |None| is the right +choice if no underlining is wanted. The other forms of underlining, such as +double or dashed, are specified with a member of the :ref:`WdUnderline` +enumeration:: + + >>> font.underline + None + >>> font.underline = True + >>> # or perhaps + >>> font.underline = WD_UNDERLINE.DOT_DASH + +Font color +~~~~~~~~~~ + +Each |Font| object has a |ColorFormat| object that provides access to its +color, accessed via its read-only :attr:`~.Font.color` property. + +Apply a specific RGB color to a font:: + + >>> from docx.shared import RGBColor + >>> font.color.rgb = RGBColor(0x42, 0x24, 0xE9) + +A font can also be set to a theme color by assigning a member of the +:ref:`MsoThemeColorIndex` enumeration:: + + >>> from docx.enum.dml import MSO_THEME_COLOR + >>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_1 + +A font's color can be restored to its default (inherited) value by assigning +|None| to either the :attr:`~.ColorFormat.rgb` or +:attr:`~.ColorFormat.theme_color` attribute of |ColorFormat|:: + + >>> font.color.rgb = None + +Determining the color of a font begins with determining its color type:: + + >>> font.color.type + RGB (1) + +The value of the :attr:`~.ColorFormat.type` property can be a member of the +:ref:`MsoColorType` enumeration or None. `MSO_COLOR_TYPE.RGB` indicates it is +an RGB color. `MSO_COLOR_TYPE.THEME` indicates a theme color. +`MSO_COLOR_TYPE.AUTO` indicates its value is determined automatically by the +application, usually set to black. (This value is relatively rare.) |None| +indicates no color is applied and the color is inherited from the style +hierarchy; this is the most common case. 
+ +When the color type is `MSO_COLOR_TYPE.RGB`, the :attr:`~.ColorFormat.rgb` +property will be an |RGBColor| value indicating the RGB color:: -A paragraph contains one or more inline elements called *runs*. It is the -run that actually contains text content. + >>> font.color.rgb + RGBColor(0x42, 0x24, 0xe9) -The main purpose of a run it to carry character formatting information, such as -font typeface and size. Bold, italic, and underline formatting are also -examples. All text within a run shares the same character formatting. So -a three-word paragraph having the middle word bold would require three runs. +When the color type is `MSO_COLOR_TYPE.THEME`, the +:attr:`~.ColorFormat.theme_color` property will be a member of +:ref:`MsoThemeColorIndex` indicating the theme color:: -Producing paragraphs containing so-called "rich" text requires building the -paragraph up out of multiple runs. Runs can also contain other content objects -such as line breaks and fields, so there are other reasons you may need to use -the low-level text API. 
+ >>> font.color.theme_color + ACCENT_1 (5) diff --git a/docx/__init__.py b/docx/__init__.py index 4e4fdfda0..1bf421391 100644 --- a/docx/__init__.py +++ b/docx/__init__.py @@ -2,13 +2,14 @@ from docx.api import Document # noqa -__version__ = '0.7.4' +__version__ = '0.8.5' # register custom Part classes with opc package reader from docx.opc.constants import CONTENT_TYPE as CT, RELATIONSHIP_TYPE as RT -from docx.opc.package import PartFactory +from docx.opc.part import PartFactory +from docx.opc.parts.coreprops import CorePropertiesPart from docx.parts.document import DocumentPart from docx.parts.image import ImagePart @@ -23,8 +24,12 @@ def part_class_selector(content_type, reltype): PartFactory.part_class_selector = part_class_selector +PartFactory.part_type_for[CT.OPC_CORE_PROPERTIES] = CorePropertiesPart PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart PartFactory.part_type_for[CT.WML_NUMBERING] = NumberingPart PartFactory.part_type_for[CT.WML_STYLES] = StylesPart -del CT, DocumentPart, PartFactory, part_class_selector +del ( + CT, CorePropertiesPart, DocumentPart, NumberingPart, PartFactory, + StylesPart, part_class_selector +) diff --git a/docx/api.py b/docx/api.py index c1ac093b7..63e18c406 100644 --- a/docx/api.py +++ b/docx/api.py @@ -10,181 +10,28 @@ import os -from docx.enum.section import WD_SECTION -from docx.enum.text import WD_BREAK -from docx.opc.constants import CONTENT_TYPE as CT, RELATIONSHIP_TYPE as RT +from docx.opc.constants import CONTENT_TYPE as CT from docx.package import Package -from docx.parts.numbering import NumberingPart -from docx.parts.styles import StylesPart -from docx.shared import lazyproperty -_thisdir = os.path.split(__file__)[0] -_default_docx_path = os.path.join(_thisdir, 'templates', 'default.docx') - - -class Document(object): +def Document(docx=None): """ - Return a |Document| instance loaded from *docx*, where *docx* can be + Return a |Document| object loaded from *docx*, where *docx* can be either a path to 
a ``.docx`` file (a string) or a file-like object. If *docx* is missing or ``None``, the built-in default document "template" is loaded. """ - def __init__(self, docx=None): - super(Document, self).__init__() - document_part, package = self._open(docx) - self._document_part = document_part - self._package = package - - def add_heading(self, text='', level=1): - """ - Return a heading paragraph newly added to the end of the document, - populated with *text* and having the heading paragraph style - determined by *level*. If *level* is 0, the style is set to - ``'Title'``. If *level* is 1 (or not present), ``'Heading1'`` is used. - Otherwise the style is set to ``'Heading{level}'``. If *level* is - outside the range 0-9, |ValueError| is raised. - """ - if not 0 <= level <= 9: - raise ValueError("level must be in range 0-9, got %d" % level) - style = 'Title' if level == 0 else 'Heading%d' % level - return self.add_paragraph(text, style) - - def add_page_break(self): - """ - Return a paragraph newly added to the end of the document and - containing only a page break. - """ - p = self._document_part.add_paragraph() - r = p.add_run() - r.add_break(WD_BREAK.PAGE) - return p - - def add_paragraph(self, text='', style=None): - """ - Return a paragraph newly added to the end of the document, populated - with *text* and having paragraph style *style*. *text* can contain - tab (``\\t``) characters, which are converted to the appropriate XML - form for a tab. *text* can also include newline (``\\n``) or carriage - return (``\\r``) characters, each of which is converted to a line - break. - """ - return self._document_part.add_paragraph(text, style) - - def add_picture(self, image_path_or_stream, width=None, height=None): - """ - Return a new picture shape added in its own paragraph at the end of - the document. The picture contains the image at - *image_path_or_stream*, scaled based on *width* and *height*. 
If - neither width nor height is specified, the picture appears at its - native size. If only one is specified, it is used to compute - a scaling factor that is then applied to the unspecified dimension, - preserving the aspect ratio of the image. The native size of the - picture is calculated using the dots-per-inch (dpi) value specified - in the image file, defaulting to 72 dpi if no value is specified, as - is often the case. - """ - run = self.add_paragraph().add_run() - picture = run.add_picture(image_path_or_stream, width, height) - return picture - - def add_section(self, start_type=WD_SECTION.NEW_PAGE): - """ - Return a |Section| object representing a new section added at the end - of the document. The optional *start_type* argument must be a member - of the :ref:`WdSectionStart` enumeration defaulting to - ``WD_SECTION.NEW_PAGE`` if not provided. - """ - return self._document_part.add_section(start_type) - - def add_table(self, rows, cols, style='LightShading-Accent1'): - """ - Add a table having row and column counts of *rows* and *cols* - respectively and table style of *style*. If *style* is |None|, a - table with no style is produced. - """ - table = self._document_part.add_table(rows, cols) - if style: - table.style = style - return table + docx = _default_docx_path() if docx is None else docx + document_part = Package.open(docx).main_document_part + if document_part.content_type != CT.WML_DOCUMENT_MAIN: + tmpl = "file '%s' is not a Word file, content type is '%s'" + raise ValueError(tmpl % (docx, document_part.content_type)) + return document_part.document - @property - def inline_shapes(self): - """ - Return a reference to the |InlineShapes| instance for this document. - """ - return self._document_part.inline_shapes - @lazyproperty - def numbering_part(self): - """ - Instance of |NumberingPart| for this document. Creates an empty - numbering part if one is not present. 
- """ - try: - return self._document_part.part_related_by(RT.NUMBERING) - except KeyError: - numbering_part = NumberingPart.new() - self._document_part.relate_to(numbering_part, RT.NUMBERING) - return numbering_part - - @property - def paragraphs(self): - """ - A list of |Paragraph| instances corresponding to the paragraphs in - the document, in document order. Note that paragraphs within revision - marks such as ```` or ```` do not appear in this list. - """ - return self._document_part.paragraphs - - def save(self, path_or_stream): - """ - Save this document to *path_or_stream*, which can be either a path to - a filesystem location (a string) or a file-like object. - """ - self._package.save(path_or_stream) - - @property - def sections(self): - """ - Return a reference to the |Sections| instance for this document. - """ - return self._document_part.sections - - @lazyproperty - def styles_part(self): - """ - Instance of |StylesPart| for this document. Creates an empty styles - part if one is not present. - """ - try: - return self._document_part.part_related_by(RT.STYLES) - except KeyError: - styles_part = StylesPart.new() - self._document_part.relate_to(styles_part, RT.STYLES) - return styles_part - - @property - def tables(self): - """ - A list of |Table| instances corresponding to the tables in the - document, in document order. Note that tables within revision marks - such as ```` or ```` do not appear in this list. - """ - return self._document_part.tables - - @staticmethod - def _open(docx): - """ - Return a (document_part, package) 2-tuple loaded from *docx*, where - *docx* can be either a path to a ``.docx`` file (a string) or a - file-like object. If *docx* is ``None``, the built-in default - document "template" is loaded. 
- """ - docx = _default_docx_path if docx is None else docx - package = Package.open(docx) - document_part = package.main_document - if document_part.content_type != CT.WML_DOCUMENT_MAIN: - tmpl = "file '%s' is not a Word file, content type is '%s'" - raise ValueError(tmpl % (docx, document_part.content_type)) - return document_part, package +def _default_docx_path(): + """ + Return the path to the built-in default .docx package. + """ + _thisdir = os.path.split(__file__)[0] + return os.path.join(_thisdir, 'templates', 'default.docx') diff --git a/docx/blkcntnr.py b/docx/blkcntnr.py index b11f3a50d..d57a0cd0f 100644 --- a/docx/blkcntnr.py +++ b/docx/blkcntnr.py @@ -8,8 +8,9 @@ from __future__ import absolute_import, print_function +from .oxml.table import CT_Tbl from .shared import Parented -from .text import Paragraph +from .text.paragraph import Paragraph class BlockItemContainer(Parented): @@ -30,27 +31,23 @@ def add_paragraph(self, text='', style=None): paragraph style *style*. If *style* is |None|, no paragraph style is applied, which has the same effect as applying the 'Normal' style. """ - p = self._element.add_p() - paragraph = Paragraph(p, self) + paragraph = self._add_paragraph() if text: paragraph.add_run(text) if style is not None: paragraph.style = style return paragraph - def add_table(self, rows, cols): + def add_table(self, rows, cols, width): """ - Return a newly added table having *rows* rows and *cols* cols, - appended to the content in this container. + Return a table of *width* having *rows* rows and *cols* columns, + newly appended to the content in this container. *width* is evenly + distributed between the table columns. 
""" from .table import Table - tbl = self._element.add_tbl() - table = Table(tbl, self) - for i in range(cols): - table.add_column() - for i in range(rows): - table.add_row() - return table + tbl = CT_Tbl.new_tbl(rows, cols, width) + self._element._insert_tbl(tbl) + return Table(tbl, self) @property def paragraphs(self): @@ -68,3 +65,10 @@ def tables(self): """ from .table import Table return [Table(tbl, self) for tbl in self._element.tbl_lst] + + def _add_paragraph(self): + """ + Return a paragraph newly added to the end of the content in this + container. + """ + return Paragraph(self._element.add_p(), self) diff --git a/docx/oxml/parts/__init__.py b/docx/dml/__init__.py similarity index 100% rename from docx/oxml/parts/__init__.py rename to docx/dml/__init__.py diff --git a/docx/dml/color.py b/docx/dml/color.py new file mode 100644 index 000000000..2f2f25cb2 --- /dev/null +++ b/docx/dml/color.py @@ -0,0 +1,116 @@ +# encoding: utf-8 + +""" +DrawingML objects related to color, ColorFormat being the most prominent. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from ..enum.dml import MSO_COLOR_TYPE +from ..oxml.simpletypes import ST_HexColorAuto +from ..shared import ElementProxy + + +class ColorFormat(ElementProxy): + """ + Provides access to color settings such as RGB color, theme color, and + luminance adjustments. + """ + + __slots__ = () + + def __init__(self, rPr_parent): + super(ColorFormat, self).__init__(rPr_parent) + + @property + def rgb(self): + """ + An |RGBColor| value or |None| if no RGB color is specified. + + When :attr:`type` is `MSO_COLOR_TYPE.RGB`, the value of this property + will always be an |RGBColor| value. It may also be an |RGBColor| + value if :attr:`type` is `MSO_COLOR_TYPE.THEME`, as Word writes the + current value of a theme color when one is assigned. 
In that case, + the RGB value should be interpreted as no more than a good guess + however, as the theme color takes precedence at rendering time. Its + value is |None| whenever :attr:`type` is either |None| or + `MSO_COLOR_TYPE.AUTO`. + + Assigning an |RGBColor| value causes :attr:`type` to become + `MSO_COLOR_TYPE.RGB` and any theme color is removed. Assigning |None| + causes any color to be removed such that the effective color is + inherited from the style hierarchy. + """ + color = self._color + if color is None: + return None + if color.val == ST_HexColorAuto.AUTO: + return None + return color.val + + @rgb.setter + def rgb(self, value): + if value is None and self._color is None: + return + rPr = self._element.get_or_add_rPr() + rPr._remove_color() + if value is not None: + rPr.get_or_add_color().val = value + + @property + def theme_color(self): + """ + A member of :ref:`MsoThemeColorIndex` or |None| if no theme color is + specified. When :attr:`type` is `MSO_COLOR_TYPE.THEME`, the value of + this property will always be a member of :ref:`MsoThemeColorIndex`. + When :attr:`type` has any other value, the value of this property is + |None|. + + Assigning a member of :ref:`MsoThemeColorIndex` causes :attr:`type` + to become `MSO_COLOR_TYPE.THEME`. Any existing RGB value is retained + but ignored by Word. Assigning |None| causes any color specification + to be removed such that the effective color is inherited from the + style hierarchy. + """ + color = self._color + if color is None or color.themeColor is None: + return None + return color.themeColor + + @theme_color.setter + def theme_color(self, value): + if value is None: + if self._color is not None: + self._element.rPr._remove_color() + return + self._element.get_or_add_rPr().get_or_add_color().themeColor = value + + @property + def type(self): + """ + Read-only. A member of :ref:`MsoColorType`, one of RGB, THEME, or + AUTO, corresponding to the way this color is defined. 
Its value is + |None| if no color is applied at this level, which causes the + effective color to be inherited from the style hierarchy. + """ + color = self._color + if color is None: + return None + if color.themeColor is not None: + return MSO_COLOR_TYPE.THEME + if color.val == ST_HexColorAuto.AUTO: + return MSO_COLOR_TYPE.AUTO + return MSO_COLOR_TYPE.RGB + + @property + def _color(self): + """ + Return `w:rPr/w:color` or |None| if not present. Helper to factor out + repetitive element access. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.color diff --git a/docx/document.py b/docx/document.py new file mode 100644 index 000000000..655a70e95 --- /dev/null +++ b/docx/document.py @@ -0,0 +1,207 @@ +# encoding: utf-8 + +""" +|Document| and closely related objects +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from .blkcntnr import BlockItemContainer +from .enum.section import WD_SECTION +from .enum.text import WD_BREAK +from .section import Section, Sections +from .shared import ElementProxy, Emu + + +class Document(ElementProxy): + """ + WordprocessingML (WML) document. Not intended to be constructed directly. + Use :func:`docx.Document` to open or create a document. + """ + + __slots__ = ('_part', '__body') + + def __init__(self, element, part): + super(Document, self).__init__(element) + self._part = part + self.__body = None + + def add_heading(self, text='', level=1): + """ + Return a heading paragraph newly added to the end of the document, + containing *text* and having its paragraph style determined by + *level*. If *level* is 0, the style is set to `Title`. If *level* is + 1 (or omitted), `Heading 1` is used. Otherwise the style is set to + `Heading {level}`. Raises |ValueError| if *level* is outside the + range 0-9. 
+ """ + if not 0 <= level <= 9: + raise ValueError("level must be in range 0-9, got %d" % level) + style = 'Title' if level == 0 else 'Heading %d' % level + return self.add_paragraph(text, style) + + def add_page_break(self): + """ + Return a paragraph newly added to the end of the document and + containing only a page break. + """ + paragraph = self.add_paragraph() + paragraph.add_run().add_break(WD_BREAK.PAGE) + return paragraph + + def add_paragraph(self, text='', style=None): + """ + Return a paragraph newly added to the end of the document, populated + with *text* and having paragraph style *style*. *text* can contain + tab (``\\t``) characters, which are converted to the appropriate XML + form for a tab. *text* can also include newline (``\\n``) or carriage + return (``\\r``) characters, each of which is converted to a line + break. + """ + return self._body.add_paragraph(text, style) + + def add_picture(self, image_path_or_stream, width=None, height=None): + """ + Return a new picture shape added in its own paragraph at the end of + the document. The picture contains the image at + *image_path_or_stream*, scaled based on *width* and *height*. If + neither width nor height is specified, the picture appears at its + native size. If only one is specified, it is used to compute + a scaling factor that is then applied to the unspecified dimension, + preserving the aspect ratio of the image. The native size of the + picture is calculated using the dots-per-inch (dpi) value specified + in the image file, defaulting to 72 dpi if no value is specified, as + is often the case. + """ + run = self.add_paragraph().add_run() + return run.add_picture(image_path_or_stream, width, height) + + def add_section(self, start_type=WD_SECTION.NEW_PAGE): + """ + Return a |Section| object representing a new section added at the end + of the document. 
The optional *start_type* argument must be a member + of the :ref:`WdSectionStart` enumeration, and defaults to + ``WD_SECTION.NEW_PAGE`` if not provided. + """ + new_sectPr = self._element.body.add_section_break() + new_sectPr.start_type = start_type + return Section(new_sectPr) + + def add_table(self, rows, cols, style=None): + """ + Add a table having row and column counts of *rows* and *cols* + respectively and table style of *style*. *style* may be a table + style object or a table style name. If *style* is |None|, the + table inherits the default table style of the document. + """ + table = self._body.add_table(rows, cols, self._block_width) + table.style = style + return table + + @property + def core_properties(self): + """ + A |CoreProperties| object providing read/write access to the core + properties of this document. + """ + return self._part.core_properties + + @property + def inline_shapes(self): + """ + An |InlineShapes| object providing access to the inline shapes in + this document. An inline shape is a graphical object, such as + a picture, contained in a run of text and behaving like a character + glyph, being flowed like other text in a paragraph. + """ + return self._part.inline_shapes + + @property + def paragraphs(self): + """ + A list of |Paragraph| instances corresponding to the paragraphs in + the document, in document order. Note that paragraphs within revision + marks such as ``<w:ins>`` or ``<w:del>`` do not appear in this list. + """ + return self._body.paragraphs + + @property + def part(self): + """ + The |DocumentPart| object of this document. + """ + return self._part + + def save(self, path_or_stream): + """ + Save this document to *path_or_stream*, which can be either a path to + a filesystem location (a string) or a file-like object. + """ + self._part.save(path_or_stream) + + @property + def sections(self): + """ + A |Sections| object providing access to each section in this + document.
+ """ + return Sections(self._element) + + @property + def styles(self): + """ + A |Styles| object providing access to the styles in this document. + """ + return self._part.styles + + @property + def tables(self): + """ + A list of |Table| instances corresponding to the tables in the + document, in document order. Note that only tables appearing at the + top level of the document appear in this list; a table nested inside + a table cell does not appear. A table within revision marks such as + ``<w:ins>`` or ``<w:del>`` will also not appear in the list. + """ + return self._body.tables + + @property + def _block_width(self): + """ + Return a |Length| object specifying the width of available "writing" + space between the margins of the last section of this document. + """ + section = self.sections[-1] + return Emu( + section.page_width - section.left_margin - section.right_margin + ) + + @property + def _body(self): + """ + The |_Body| instance containing the content for this document. + """ + if self.__body is None: + self.__body = _Body(self._element.body, self) + return self.__body + + +class _Body(BlockItemContainer): + """ + Proxy for ``<w:body>`` element in this document, having primarily a + container role. + """ + def __init__(self, body_elm, parent): + super(_Body, self).__init__(body_elm, parent) + self._body = body_elm + + def clear_content(self): + """ + Return this |_Body| instance after clearing it of all content. + Section properties for the main document story, if present, are + preserved.
+ """ + self._body.clear_content() + return self diff --git a/docx/enum/dml.py b/docx/enum/dml.py new file mode 100644 index 000000000..1ad0eaa87 --- /dev/null +++ b/docx/enum/dml.py @@ -0,0 +1,124 @@ +# encoding: utf-8 + +""" +Enumerations used by DrawingML objects +""" + +from __future__ import absolute_import + +from .base import ( + alias, Enumeration, EnumMember, XmlEnumeration, XmlMappedEnumMember +) + + +class MSO_COLOR_TYPE(Enumeration): + """ + Specifies the color specification scheme. + + Example:: + + from docx.enum.dml import MSO_COLOR_TYPE + + assert font.color.type == MSO_COLOR_TYPE.THEME + """ + + __ms_name__ = 'MsoColorType' + + __url__ = ( + 'http://msdn.microsoft.com/en-us/library/office/ff864912(v=office.15' + ').aspx' + ) + + __members__ = ( + EnumMember( + 'RGB', 1, 'Color is specified by an |RGBColor| value.' + ), + EnumMember( + 'THEME', 2, 'Color is one of the preset theme colors.' + ), + EnumMember( + 'AUTO', 101, 'Color is determined automatically by the ' + 'application.' + ), + ) + + +@alias('MSO_THEME_COLOR') +class MSO_THEME_COLOR_INDEX(XmlEnumeration): + """ + Indicates the Office theme color, one of those shown in the color gallery + on the formatting ribbon. + + Alias: ``MSO_THEME_COLOR`` + + Example:: + + from docx.enum.dml import MSO_THEME_COLOR + + font.color.theme_color = MSO_THEME_COLOR.ACCENT_1 + """ + + __ms_name__ = 'MsoThemeColorIndex' + + __url__ = ( + 'http://msdn.microsoft.com/en-us/library/office/ff860782(v=office.15' + ').aspx' + ) + + __members__ = ( + EnumMember( + 'NOT_THEME_COLOR', 0, 'Indicates the color is not a theme color.' + ), + XmlMappedEnumMember( + 'ACCENT_1', 5, 'accent1', 'Specifies the Accent 1 theme color.' + ), + XmlMappedEnumMember( + 'ACCENT_2', 6, 'accent2', 'Specifies the Accent 2 theme color.' + ), + XmlMappedEnumMember( + 'ACCENT_3', 7, 'accent3', 'Specifies the Accent 3 theme color.' + ), + XmlMappedEnumMember( + 'ACCENT_4', 8, 'accent4', 'Specifies the Accent 4 theme color.'
+ ), + XmlMappedEnumMember( + 'ACCENT_5', 9, 'accent5', 'Specifies the Accent 5 theme color.' + ), + XmlMappedEnumMember( + 'ACCENT_6', 10, 'accent6', 'Specifies the Accent 6 theme color.' + ), + XmlMappedEnumMember( + 'BACKGROUND_1', 14, 'background1', 'Specifies the Background 1 ' + 'theme color.' + ), + XmlMappedEnumMember( + 'BACKGROUND_2', 16, 'background2', 'Specifies the Background 2 ' + 'theme color.' + ), + XmlMappedEnumMember( + 'DARK_1', 1, 'dark1', 'Specifies the Dark 1 theme color.' + ), + XmlMappedEnumMember( + 'DARK_2', 3, 'dark2', 'Specifies the Dark 2 theme color.' + ), + XmlMappedEnumMember( + 'FOLLOWED_HYPERLINK', 12, 'followedHyperlink', 'Specifies the ' + 'theme color for a clicked hyperlink.' + ), + XmlMappedEnumMember( + 'HYPERLINK', 11, 'hyperlink', 'Specifies the theme color for a ' + 'hyperlink.' + ), + XmlMappedEnumMember( + 'LIGHT_1', 2, 'light1', 'Specifies the Light 1 theme color.' + ), + XmlMappedEnumMember( + 'LIGHT_2', 4, 'light2', 'Specifies the Light 2 theme color.' + ), + XmlMappedEnumMember( + 'TEXT_1', 13, 'text1', 'Specifies the Text 1 theme color.' + ), + XmlMappedEnumMember( + 'TEXT_2', 15, 'text2', 'Specifies the Text 2 theme color.' + ), + ) diff --git a/docx/enum/style.py b/docx/enum/style.py new file mode 100644 index 000000000..515c594ce --- /dev/null +++ b/docx/enum/style.py @@ -0,0 +1,466 @@ +# encoding: utf-8 + +""" +Enumerations related to styles +""" + +from __future__ import absolute_import, print_function, unicode_literals + +from .base import alias, EnumMember, XmlEnumeration, XmlMappedEnumMember + + +@alias('WD_STYLE') +class WD_BUILTIN_STYLE(XmlEnumeration): + """ + alias: **WD_STYLE** + + Specifies a built-in Microsoft Word style. 
+ + Example:: + + from docx import Document + from docx.enum.style import WD_STYLE + + document = Document() + styles = document.styles + style = styles[WD_STYLE.BODY_TEXT] + """ + + __ms_name__ = 'WdBuiltinStyle' + + __url__ = 'http://msdn.microsoft.com/en-us/library/office/ff835210.aspx' + + __members__ = ( + EnumMember( + 'BLOCK_QUOTATION', -85, 'Block Text.' + ), + EnumMember( + 'BODY_TEXT', -67, 'Body Text.' + ), + EnumMember( + 'BODY_TEXT_2', -81, 'Body Text 2.' + ), + EnumMember( + 'BODY_TEXT_3', -82, 'Body Text 3.' + ), + EnumMember( + 'BODY_TEXT_FIRST_INDENT', -78, 'Body Text First Indent.' + ), + EnumMember( + 'BODY_TEXT_FIRST_INDENT_2', -79, 'Body Text First Indent 2.' + ), + EnumMember( + 'BODY_TEXT_INDENT', -68, 'Body Text Indent.' + ), + EnumMember( + 'BODY_TEXT_INDENT_2', -83, 'Body Text Indent 2.' + ), + EnumMember( + 'BODY_TEXT_INDENT_3', -84, 'Body Text Indent 3.' + ), + EnumMember( + 'BOOK_TITLE', -265, 'Book Title.' + ), + EnumMember( + 'CAPTION', -35, 'Caption.' + ), + EnumMember( + 'CLOSING', -64, 'Closing.' + ), + EnumMember( + 'COMMENT_REFERENCE', -40, 'Comment Reference.' + ), + EnumMember( + 'COMMENT_TEXT', -31, 'Comment Text.' + ), + EnumMember( + 'DATE', -77, 'Date.' + ), + EnumMember( + 'DEFAULT_PARAGRAPH_FONT', -66, 'Default Paragraph Font.' + ), + EnumMember( + 'EMPHASIS', -89, 'Emphasis.' + ), + EnumMember( + 'ENDNOTE_REFERENCE', -43, 'Endnote Reference.' + ), + EnumMember( + 'ENDNOTE_TEXT', -44, 'Endnote Text.' + ), + EnumMember( + 'ENVELOPE_ADDRESS', -37, 'Envelope Address.' + ), + EnumMember( + 'ENVELOPE_RETURN', -38, 'Envelope Return.' + ), + EnumMember( + 'FOOTER', -33, 'Footer.' + ), + EnumMember( + 'FOOTNOTE_REFERENCE', -39, 'Footnote Reference.' + ), + EnumMember( + 'FOOTNOTE_TEXT', -30, 'Footnote Text.' + ), + EnumMember( + 'HEADER', -32, 'Header.' + ), + EnumMember( + 'HEADING_1', -2, 'Heading 1.' + ), + EnumMember( + 'HEADING_2', -3, 'Heading 2.' + ), + EnumMember( + 'HEADING_3', -4, 'Heading 3.' 
+ ), + EnumMember( + 'HEADING_4', -5, 'Heading 4.' + ), + EnumMember( + 'HEADING_5', -6, 'Heading 5.' + ), + EnumMember( + 'HEADING_6', -7, 'Heading 6.' + ), + EnumMember( + 'HEADING_7', -8, 'Heading 7.' + ), + EnumMember( + 'HEADING_8', -9, 'Heading 8.' + ), + EnumMember( + 'HEADING_9', -10, 'Heading 9.' + ), + EnumMember( + 'HTML_ACRONYM', -96, 'HTML Acronym.' + ), + EnumMember( + 'HTML_ADDRESS', -97, 'HTML Address.' + ), + EnumMember( + 'HTML_CITE', -98, 'HTML Cite.' + ), + EnumMember( + 'HTML_CODE', -99, 'HTML Code.' + ), + EnumMember( + 'HTML_DFN', -100, 'HTML Definition.' + ), + EnumMember( + 'HTML_KBD', -101, 'HTML Keyboard.' + ), + EnumMember( + 'HTML_NORMAL', -95, 'Normal (Web).' + ), + EnumMember( + 'HTML_PRE', -102, 'HTML Preformatted.' + ), + EnumMember( + 'HTML_SAMP', -103, 'HTML Sample.' + ), + EnumMember( + 'HTML_TT', -104, 'HTML Typewriter.' + ), + EnumMember( + 'HTML_VAR', -105, 'HTML Variable.' + ), + EnumMember( + 'HYPERLINK', -86, 'Hyperlink.' + ), + EnumMember( + 'HYPERLINK_FOLLOWED', -87, 'Followed Hyperlink.' + ), + EnumMember( + 'INDEX_1', -11, 'Index 1.' + ), + EnumMember( + 'INDEX_2', -12, 'Index 2.' + ), + EnumMember( + 'INDEX_3', -13, 'Index 3.' + ), + EnumMember( + 'INDEX_4', -14, 'Index 4.' + ), + EnumMember( + 'INDEX_5', -15, 'Index 5.' + ), + EnumMember( + 'INDEX_6', -16, 'Index 6.' + ), + EnumMember( + 'INDEX_7', -17, 'Index 7.' + ), + EnumMember( + 'INDEX_8', -18, 'Index 8.' + ), + EnumMember( + 'INDEX_9', -19, 'Index 9.' + ), + EnumMember( + 'INDEX_HEADING', -34, 'Index Heading' + ), + EnumMember( + 'INTENSE_EMPHASIS', -262, 'Intense Emphasis.' + ), + EnumMember( + 'INTENSE_QUOTE', -182, 'Intense Quote.' + ), + EnumMember( + 'INTENSE_REFERENCE', -264, 'Intense Reference.' + ), + EnumMember( + 'LINE_NUMBER', -41, 'Line Number.' + ), + EnumMember( + 'LIST', -48, 'List.' + ), + EnumMember( + 'LIST_2', -51, 'List 2.' + ), + EnumMember( + 'LIST_3', -52, 'List 3.' + ), + EnumMember( + 'LIST_4', -53, 'List 4.' 
+ ), + EnumMember( + 'LIST_5', -54, 'List 5.' + ), + EnumMember( + 'LIST_BULLET', -49, 'List Bullet.' + ), + EnumMember( + 'LIST_BULLET_2', -55, 'List Bullet 2.' + ), + EnumMember( + 'LIST_BULLET_3', -56, 'List Bullet 3.' + ), + EnumMember( + 'LIST_BULLET_4', -57, 'List Bullet 4.' + ), + EnumMember( + 'LIST_BULLET_5', -58, 'List Bullet 5.' + ), + EnumMember( + 'LIST_CONTINUE', -69, 'List Continue.' + ), + EnumMember( + 'LIST_CONTINUE_2', -70, 'List Continue 2.' + ), + EnumMember( + 'LIST_CONTINUE_3', -71, 'List Continue 3.' + ), + EnumMember( + 'LIST_CONTINUE_4', -72, 'List Continue 4.' + ), + EnumMember( + 'LIST_CONTINUE_5', -73, 'List Continue 5.' + ), + EnumMember( + 'LIST_NUMBER', -50, 'List Number.' + ), + EnumMember( + 'LIST_NUMBER_2', -59, 'List Number 2.' + ), + EnumMember( + 'LIST_NUMBER_3', -60, 'List Number 3.' + ), + EnumMember( + 'LIST_NUMBER_4', -61, 'List Number 4.' + ), + EnumMember( + 'LIST_NUMBER_5', -62, 'List Number 5.' + ), + EnumMember( + 'LIST_PARAGRAPH', -180, 'List Paragraph.' + ), + EnumMember( + 'MACRO_TEXT', -46, 'Macro Text.' + ), + EnumMember( + 'MESSAGE_HEADER', -74, 'Message Header.' + ), + EnumMember( + 'NAV_PANE', -90, 'Document Map.' + ), + EnumMember( + 'NORMAL', -1, 'Normal.' + ), + EnumMember( + 'NORMAL_INDENT', -29, 'Normal Indent.' + ), + EnumMember( + 'NORMAL_OBJECT', -158, 'Normal (applied to an object).' + ), + EnumMember( + 'NORMAL_TABLE', -106, 'Normal (applied within a table).' + ), + EnumMember( + 'NOTE_HEADING', -80, 'Note Heading.' + ), + EnumMember( + 'PAGE_NUMBER', -42, 'Page Number.' + ), + EnumMember( + 'PLAIN_TEXT', -91, 'Plain Text.' + ), + EnumMember( + 'QUOTE', -181, 'Quote.' + ), + EnumMember( + 'SALUTATION', -76, 'Salutation.' + ), + EnumMember( + 'SIGNATURE', -65, 'Signature.' + ), + EnumMember( + 'STRONG', -88, 'Strong.' + ), + EnumMember( + 'SUBTITLE', -75, 'Subtitle.' + ), + EnumMember( + 'SUBTLE_EMPHASIS', -261, 'Subtle Emphasis.' + ), + EnumMember( + 'SUBTLE_REFERENCE', -263, 'Subtle Reference.' 
+ ), + EnumMember( + 'TABLE_COLORFUL_GRID', -172, 'Colorful Grid.' + ), + EnumMember( + 'TABLE_COLORFUL_LIST', -171, 'Colorful List.' + ), + EnumMember( + 'TABLE_COLORFUL_SHADING', -170, 'Colorful Shading.' + ), + EnumMember( + 'TABLE_DARK_LIST', -169, 'Dark List.' + ), + EnumMember( + 'TABLE_LIGHT_GRID', -161, 'Light Grid.' + ), + EnumMember( + 'TABLE_LIGHT_GRID_ACCENT_1', -175, 'Light Grid Accent 1.' + ), + EnumMember( + 'TABLE_LIGHT_LIST', -160, 'Light List.' + ), + EnumMember( + 'TABLE_LIGHT_LIST_ACCENT_1', -174, 'Light List Accent 1.' + ), + EnumMember( + 'TABLE_LIGHT_SHADING', -159, 'Light Shading.' + ), + EnumMember( + 'TABLE_LIGHT_SHADING_ACCENT_1', -173, 'Light Shading Accent 1.' + ), + EnumMember( + 'TABLE_MEDIUM_GRID_1', -166, 'Medium Grid 1.' + ), + EnumMember( + 'TABLE_MEDIUM_GRID_2', -167, 'Medium Grid 2.' + ), + EnumMember( + 'TABLE_MEDIUM_GRID_3', -168, 'Medium Grid 3.' + ), + EnumMember( + 'TABLE_MEDIUM_LIST_1', -164, 'Medium List 1.' + ), + EnumMember( + 'TABLE_MEDIUM_LIST_1_ACCENT_1', -178, 'Medium List 1 Accent 1.' + ), + EnumMember( + 'TABLE_MEDIUM_LIST_2', -165, 'Medium List 2.' + ), + EnumMember( + 'TABLE_MEDIUM_SHADING_1', -162, 'Medium Shading 1.' + ), + EnumMember( + 'TABLE_MEDIUM_SHADING_1_ACCENT_1', -176, + 'Medium Shading 1 Accent 1.' + ), + EnumMember( + 'TABLE_MEDIUM_SHADING_2', -163, 'Medium Shading 2.' + ), + EnumMember( + 'TABLE_MEDIUM_SHADING_2_ACCENT_1', -177, + 'Medium Shading 2 Accent 1.' + ), + EnumMember( + 'TABLE_OF_AUTHORITIES', -45, 'Table of Authorities.' + ), + EnumMember( + 'TABLE_OF_FIGURES', -36, 'Table of Figures.' + ), + EnumMember( + 'TITLE', -63, 'Title.' + ), + EnumMember( + 'TOAHEADING', -47, 'TOA Heading.' + ), + EnumMember( + 'TOC_1', -20, 'TOC 1.' + ), + EnumMember( + 'TOC_2', -21, 'TOC 2.' + ), + EnumMember( + 'TOC_3', -22, 'TOC 3.' + ), + EnumMember( + 'TOC_4', -23, 'TOC 4.' + ), + EnumMember( + 'TOC_5', -24, 'TOC 5.' + ), + EnumMember( + 'TOC_6', -25, 'TOC 6.' + ), + EnumMember( + 'TOC_7', -26, 'TOC 7.' 
+ ), + EnumMember( + 'TOC_8', -27, 'TOC 8.' + ), + EnumMember( + 'TOC_9', -28, 'TOC 9.' + ), + ) + + +class WD_STYLE_TYPE(XmlEnumeration): + """ + Specifies one of the four style types: paragraph, character, list, or + table. + + Example:: + + from docx import Document + from docx.enum.style import WD_STYLE_TYPE + + styles = Document().styles + assert styles[0].type == WD_STYLE_TYPE.PARAGRAPH + """ + + __ms_name__ = 'WdStyleType' + + __url__ = 'http://msdn.microsoft.com/en-us/library/office/ff196870.aspx' + + __members__ = ( + XmlMappedEnumMember( + 'CHARACTER', 2, 'character', 'Character style.' + ), + XmlMappedEnumMember( + 'LIST', 4, 'numbering', 'List style.' + ), + XmlMappedEnumMember( + 'PARAGRAPH', 1, 'paragraph', 'Paragraph style.' + ), + XmlMappedEnumMember( + 'TABLE', 3, 'table', 'Table style.' + ), + ) diff --git a/docx/enum/table.py b/docx/enum/table.py new file mode 100644 index 000000000..bc201346c --- /dev/null +++ b/docx/enum/table.py @@ -0,0 +1,71 @@ +# encoding: utf-8 + +""" +Enumerations related to tables in WordprocessingML files +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from .base import ( + Enumeration, EnumMember, XmlEnumeration, XmlMappedEnumMember +) + + +class WD_TABLE_ALIGNMENT(XmlEnumeration): + """ + Specifies table justification type. + + Example:: + + from docx.enum.table import WD_TABLE_ALIGNMENT + + table = document.add_table(3, 3) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + """ + + __ms_name__ = 'WdRowAlignment' + + __url__ = 'http://office.microsoft.com/en-us/word-help/HV080607259.aspx' + + __members__ = ( + XmlMappedEnumMember( + 'LEFT', 0, 'left', 'Left-aligned.' + ), + XmlMappedEnumMember( + 'CENTER', 1, 'center', 'Center-aligned.' + ), + XmlMappedEnumMember( + 'RIGHT', 2, 'right', 'Right-aligned.' + ), + ) + + +class WD_TABLE_DIRECTION(Enumeration): + """ + Specifies the direction in which an application orders cells in the + specified table or row.
+ + Example:: + + from docx.enum.table import WD_TABLE_DIRECTION + + table = document.add_table(3, 3) + table.direction = WD_TABLE_DIRECTION.RTL + """ + + __ms_name__ = 'WdTableDirection' + + __url__ = 'http://msdn.microsoft.com/en-us/library/ff835141.aspx' + + __members__ = ( + EnumMember( + 'LTR', 0, 'The table or row is arranged with the first column ' + 'in the leftmost position.' + ), + EnumMember( + 'RTL', 1, 'The table or row is arranged with the first column ' + 'in the rightmost position.' + ), + ) diff --git a/docx/enum/text.py b/docx/enum/text.py index 713597fc6..3bb16d308 100644 --- a/docx/enum/text.py +++ b/docx/enum/text.py @@ -6,7 +6,7 @@ from __future__ import absolute_import, print_function, unicode_literals -from .base import alias, XmlEnumeration, XmlMappedEnumMember +from .base import alias, EnumMember, XmlEnumeration, XmlMappedEnumMember @alias('WD_ALIGN_PARAGRAPH') @@ -84,6 +84,48 @@ class WD_BREAK_TYPE(object): WD_BREAK = WD_BREAK_TYPE +class WD_LINE_SPACING(XmlEnumeration): + """ + Specifies a line spacing format to be applied to a paragraph. + + Example:: + + from docx.enum.text import WD_LINE_SPACING + + paragraph = document.add_paragraph() + paragraph.paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY + """ + + __ms_name__ = 'WdLineSpacing' + + __url__ = 'http://msdn.microsoft.com/en-us/library/office/ff844910.aspx' + + __members__ = ( + EnumMember( + 'ONE_POINT_FIVE', 1, 'Space-and-a-half line spacing.' + ), + XmlMappedEnumMember( + 'AT_LEAST', 3, 'atLeast', 'Line spacing is always at least the s' + 'pecified amount. The amount is specified separately.' + ), + EnumMember( + 'DOUBLE', 2, 'Double spaced.' + ), + XmlMappedEnumMember( + 'EXACTLY', 4, 'exact', 'Line spacing is exactly the specified am' + 'ount. The amount is specified separately.' + ), + XmlMappedEnumMember( + 'MULTIPLE', 5, 'auto', 'Line spacing is specified as a multiple ' + 'of line heights. Changing the font size will change the line sp' + 'acing proportionately.'
+ ), + EnumMember( + 'SINGLE', 0, 'Single spaced (default).' + ), + ) + + class WD_UNDERLINE(XmlEnumeration): """ Specifies the style of underline applied to a run of characters. diff --git a/docx/exceptions.py b/docx/exceptions.py index 00215615b..7a8b99c81 100644 --- a/docx/exceptions.py +++ b/docx/exceptions.py @@ -13,6 +13,13 @@ class PythonDocxError(Exception): """ +class InvalidSpanError(PythonDocxError): + """ + Raised when an invalid merge region is specified in a request to merge + table cells. + """ + + class InvalidXmlError(PythonDocxError): """ Raised when invalid XML is encountered, such as on attempt to access a diff --git a/docx/image/image.py b/docx/image/image.py index 692ea5860..ba2158e72 100644 --- a/docx/image/image.py +++ b/docx/image/image.py @@ -11,8 +11,8 @@ import os from ..compat import BytesIO, is_string -from ..shared import lazyproperty from .exceptions import UnrecognizedImageError +from ..shared import Emu, Inches, lazyproperty class Image(object): @@ -117,6 +117,49 @@ def vert_dpi(self): """ return self._image_header.vert_dpi + @property + def width(self): + """ + A |Length| value representing the native width of the image, + calculated from the values of `px_width` and `horz_dpi`. + """ + return Inches(self.px_width / self.horz_dpi) + + @property + def height(self): + """ + A |Length| value representing the native height of the image, + calculated from the values of `px_height` and `vert_dpi`. + """ + return Inches(self.px_height / self.vert_dpi) + + def scaled_dimensions(self, width=None, height=None): + """ + Return a (cx, cy) 2-tuple representing the native dimensions of this + image scaled by applying the following rules to *width* and *height*. + If both *width* and *height* are specified, the return value is + (*width*, *height*); no scaling is performed. If only one is + specified, it is used to compute a scaling factor that is then + applied to the unspecified dimension, preserving the aspect ratio of + the image. 
If both *width* and *height* are |None|, the native + dimensions are returned. The native dimensions are calculated using + the dots-per-inch (dpi) value embedded in the image, defaulting to 72 + dpi if no value is specified, as is often the case. The returned + values are both |Length| objects. + """ + if width is None and height is None: + return self.width, self.height + + if width is None: + scaling_factor = float(height) / float(self.height) + width = round(self.width * scaling_factor) + + if height is None: + scaling_factor = float(width) / float(self.width) + height = round(self.height * scaling_factor) + + return Emu(width), Emu(height) + @lazyproperty def sha1(self): """ diff --git a/docx/image/tiff.py b/docx/image/tiff.py index d6561eca8..c38242360 100644 --- a/docx/image/tiff.py +++ b/docx/image/tiff.py @@ -116,14 +116,22 @@ def _dpi(self, resolution_tag): calculation is based on the values of both that tag and the TIFF_TAG.RESOLUTION_UNIT tag in this parser's |_IfdEntries| instance. 
""" - if resolution_tag not in self._ifd_entries: + ifd_entries = self._ifd_entries + + if resolution_tag not in ifd_entries: return 72 - resolution_unit = self._ifd_entries[TIFF_TAG.RESOLUTION_UNIT] + + # resolution unit defaults to inches (2) + resolution_unit = ( + ifd_entries[TIFF_TAG.RESOLUTION_UNIT] + if TIFF_TAG.RESOLUTION_UNIT in ifd_entries else 2 + ) + if resolution_unit == 1: # aspect ratio only return 72 # resolution_unit == 2 for inches, 3 for centimeters units_per_inch = 1 if resolution_unit == 2 else 2.54 - dots_per_unit = self._ifd_entries[resolution_tag] + dots_per_unit = ifd_entries[resolution_tag] return int(round(dots_per_unit * units_per_inch)) @classmethod diff --git a/docx/opc/coreprops.py b/docx/opc/coreprops.py new file mode 100644 index 000000000..2d38dabd3 --- /dev/null +++ b/docx/opc/coreprops.py @@ -0,0 +1,139 @@ +# encoding: utf-8 + +""" +The :mod:`pptx.packaging` module coheres around the concerns of reading and +writing presentations to and from a .pptx file. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + + +class CoreProperties(object): + """ + Corresponds to part named ``/docProps/core.xml``, containing the core + document properties for this document package. 
+ """ + def __init__(self, element): + self._element = element + + @property + def author(self): + return self._element.author_text + + @author.setter + def author(self, value): + self._element.author_text = value + + @property + def category(self): + return self._element.category_text + + @category.setter + def category(self, value): + self._element.category_text = value + + @property + def comments(self): + return self._element.comments_text + + @comments.setter + def comments(self, value): + self._element.comments_text = value + + @property + def content_status(self): + return self._element.contentStatus_text + + @content_status.setter + def content_status(self, value): + self._element.contentStatus_text = value + + @property + def created(self): + return self._element.created_datetime + + @created.setter + def created(self, value): + self._element.created_datetime = value + + @property + def identifier(self): + return self._element.identifier_text + + @identifier.setter + def identifier(self, value): + self._element.identifier_text = value + + @property + def keywords(self): + return self._element.keywords_text + + @keywords.setter + def keywords(self, value): + self._element.keywords_text = value + + @property + def language(self): + return self._element.language_text + + @language.setter + def language(self, value): + self._element.language_text = value + + @property + def last_modified_by(self): + return self._element.lastModifiedBy_text + + @last_modified_by.setter + def last_modified_by(self, value): + self._element.lastModifiedBy_text = value + + @property + def last_printed(self): + return self._element.lastPrinted_datetime + + @last_printed.setter + def last_printed(self, value): + self._element.lastPrinted_datetime = value + + @property + def modified(self): + return self._element.modified_datetime + + @modified.setter + def modified(self, value): + self._element.modified_datetime = value + + @property + def revision(self): + return 
self._element.revision_number + + @revision.setter + def revision(self, value): + self._element.revision_number = value + + @property + def subject(self): + return self._element.subject_text + + @subject.setter + def subject(self, value): + self._element.subject_text = value + + @property + def title(self): + return self._element.title_text + + @title.setter + def title(self, value): + self._element.title_text = value + + @property + def version(self): + return self._element.version_text + + @version.setter + def version(self, value): + self._element.version_text = value diff --git a/docx/opc/package.py b/docx/opc/package.py index 6c44453ce..b0ea37ea5 100644 --- a/docx/opc/package.py +++ b/docx/opc/package.py @@ -7,13 +7,13 @@ from __future__ import absolute_import, print_function, unicode_literals -from .compat import cls_method_fn from .constants import RELATIONSHIP_TYPE as RT -from .oxml import CT_Relationships, serialize_part_xml -from ..oxml import parse_xml -from .packuri import PACKAGE_URI, PackURI +from .packuri import PACKAGE_URI +from .part import PartFactory +from .parts.coreprops import CorePropertiesPart from .pkgreader import PackageReader from .pkgwriter import PackageWriter +from .rel import Relationships from .shared import lazyproperty @@ -35,6 +35,14 @@ def after_unmarshal(self): # subclass pass + @property + def core_properties(self): + """ + |CoreProperties| object providing read/write access to the Dublin + Core properties for this document. + """ + return self._core_properties_part.core_properties + def iter_rels(self): """ Generate exactly one reference to each relationship in the package by @@ -90,7 +98,7 @@ def load_rel(self, reltype, target, rId, is_external=False): return self.rels.add_relationship(reltype, target, rId, is_external) @property - def main_document(self): + def main_document_part(self): """ Return a reference to the main document part for this package. 
Examples include a document part for a WordprocessingML package, a @@ -151,343 +159,18 @@ def save(self, pkg_file): part.before_marshal() PackageWriter.write(pkg_file, self.rels, self.parts) - -class Part(object): - """ - Base class for package parts. Provides common properties and methods, but - intended to be subclassed in client code to implement specific part - behaviors. - """ - def __init__(self, partname, content_type, blob=None, package=None): - super(Part, self).__init__() - self._partname = partname - self._content_type = content_type - self._blob = blob - self._package = package - - def after_unmarshal(self): - """ - Entry point for post-unmarshaling processing, for example to parse - the part XML. May be overridden by subclasses without forwarding call - to super. - """ - # don't place any code here, just catch call if not overridden by - # subclass - pass - - def before_marshal(self): - """ - Entry point for pre-serialization processing, for example to finalize - part naming if necessary. May be overridden by subclasses without - forwarding call to super. - """ - # don't place any code here, just catch call if not overridden by - # subclass - pass - @property - def blob(self): - """ - Contents of this package part as a sequence of bytes. May be text or - binary. Intended to be overridden by subclasses. Default behavior is - to return load blob. - """ - return self._blob - - @property - def content_type(self): - """ - Content type of this part. - """ - return self._content_type - - def drop_rel(self, rId): - """ - Remove the relationship identified by *rId* if its reference count - is less than 2. Relationships with a reference count of 0 are - implicit relationships. 
- """ - if self._rel_ref_count(rId) < 2: - del self.rels[rId] - - @classmethod - def load(cls, partname, content_type, blob, package): - return cls(partname, content_type, blob, package) - - def load_rel(self, reltype, target, rId, is_external=False): - """ - Return newly added |_Relationship| instance of *reltype* between this - part and *target* with key *rId*. Target mode is set to - ``RTM.EXTERNAL`` if *is_external* is |True|. Intended for use during - load from a serialized package, where the rId is well-known. Other - methods exist for adding a new relationship to a part when - manipulating a part. - """ - return self.rels.add_relationship(reltype, target, rId, is_external) - - @property - def partname(self): - """ - |PackURI| instance holding partname of this part, e.g. - '/ppt/slides/slide1.xml' - """ - return self._partname - - @partname.setter - def partname(self, partname): - if not isinstance(partname, PackURI): - tmpl = "partname must be instance of PackURI, got '%s'" - raise TypeError(tmpl % type(partname).__name__) - self._partname = partname - - @property - def package(self): - """ - |OpcPackage| instance this part belongs to. - """ - return self._package - - def part_related_by(self, reltype): - """ - Return part to which this part has a relationship of *reltype*. - Raises |KeyError| if no such relationship is found and |ValueError| - if more than one such relationship is found. Provides ability to - resolve implicitly related part, such as Slide -> SlideLayout. - """ - return self.rels.part_with_reltype(reltype) - - def relate_to(self, target, reltype, is_external=False): - """ - Return rId key of relationship of *reltype* to *target*, from an - existing relationship if there is one, otherwise a newly created one. 
- """ - if is_external: - return self.rels.get_or_add_ext_rel(reltype, target) - else: - rel = self.rels.get_or_add(reltype, target) - return rel.rId - - @property - def related_parts(self): - """ - Dictionary mapping related parts by rId, so child objects can resolve - explicit relationships present in the part XML, e.g. sldIdLst to a - specific |Slide| instance. - """ - return self.rels.related_parts - - @lazyproperty - def rels(self): - """ - |Relationships| instance holding the relationships for this part. - """ - return Relationships(self._partname.baseURI) - - def target_ref(self, rId): - """ - Return URL contained in target ref of relationship identified by - *rId*. - """ - rel = self.rels[rId] - return rel.target_ref - - def _rel_ref_count(self, rId): + def _core_properties_part(self): """ - Return the count of references in this part's XML to the relationship - identified by *rId*. + |CorePropertiesPart| object related to this package. Creates + a default core properties part if one is not present (not common). """ - rIds = self._element.xpath('//@r:id') - return len([_rId for _rId in rIds if _rId == rId]) - - -class XmlPart(Part): - """ - Base class for package parts containing an XML payload, which is most of - them. Provides additional methods to the |Part| base class that take care - of parsing and reserializing the XML payload and managing relationships - to other parts. - """ - def __init__(self, partname, content_type, element, package): - super(XmlPart, self).__init__( - partname, content_type, package=package - ) - self._element = element - - @property - def blob(self): - return serialize_part_xml(self._element) - - @classmethod - def load(cls, partname, content_type, blob, package): - element = parse_xml(blob) - return cls(partname, content_type, element, package) - - @property - def part(self): - """ - Part of the parent protocol, "children" of the document will not know - the part that contains them so must ask their parent object. 
That - chain of delegation ends here for child objects. - """ - return self - - -class PartFactory(object): - """ - Provides a way for client code to specify a subclass of |Part| to be - constructed by |Unmarshaller| based on its content type and/or a custom - callable. Setting ``PartFactory.part_class_selector`` to a callable - object will cause that object to be called with the parameters - ``content_type, reltype``, once for each part in the package. If the - callable returns an object, it is used as the class for that part. If it - returns |None|, part class selection falls back to the content type map - defined in ``PartFactory.part_type_for``. If no class is returned from - either of these, the class contained in ``PartFactory.default_part_type`` - is used to construct the part, which is by default ``opc.package.Part``. - """ - part_class_selector = None - part_type_for = {} - default_part_type = Part - - def __new__(cls, partname, content_type, reltype, blob, package): - PartClass = None - if cls.part_class_selector is not None: - part_class_selector = cls_method_fn(cls, 'part_class_selector') - PartClass = part_class_selector(content_type, reltype) - if PartClass is None: - PartClass = cls._part_cls_for(content_type) - return PartClass.load(partname, content_type, blob, package) - - @classmethod - def _part_cls_for(cls, content_type): - """ - Return the custom part class registered for *content_type*, or the - default part class if no custom class is registered for - *content_type*. - """ - if content_type in cls.part_type_for: - return cls.part_type_for[content_type] - return cls.default_part_type - - -class Relationships(dict): - """ - Collection object for |_Relationship| instances, having list semantics. 
- """ - def __init__(self, baseURI): - super(Relationships, self).__init__() - self._baseURI = baseURI - self._target_parts_by_rId = {} - - def add_relationship(self, reltype, target, rId, is_external=False): - """ - Return a newly added |_Relationship| instance. - """ - rel = _Relationship(rId, reltype, target, self._baseURI, is_external) - self[rId] = rel - if not is_external: - self._target_parts_by_rId[rId] = target - return rel - - def get_or_add(self, reltype, target_part): - """ - Return relationship of *reltype* to *target_part*, newly added if not - already present in collection. - """ - rel = self._get_matching(reltype, target_part) - if rel is None: - rId = self._next_rId - rel = self.add_relationship(reltype, target_part, rId) - return rel - - def get_or_add_ext_rel(self, reltype, target_ref): - """ - Return rId of external relationship of *reltype* to *target_ref*, - newly added if not already present in collection. - """ - rel = self._get_matching(reltype, target_ref, is_external=True) - if rel is None: - rId = self._next_rId - rel = self.add_relationship( - reltype, target_ref, rId, is_external=True - ) - return rel.rId - - def part_with_reltype(self, reltype): - """ - Return target part of rel with matching *reltype*, raising |KeyError| - if not found and |ValueError| if more than one matching relationship - is found. - """ - rel = self._get_rel_of_type(reltype) - return rel.target_part - - @property - def related_parts(self): - """ - dict mapping rIds to target parts for all the internal relationships - in the collection. - """ - return self._target_parts_by_rId - - @property - def xml(self): - """ - Serialize this relationship collection into XML suitable for storage - as a .rels file in an OPC package. 
- """ - rels_elm = CT_Relationships.new() - for rel in self.values(): - rels_elm.add_rel( - rel.rId, rel.reltype, rel.target_ref, rel.is_external - ) - return rels_elm.xml - - def _get_matching(self, reltype, target, is_external=False): - """ - Return relationship of matching *reltype*, *target*, and - *is_external* from collection, or None if not found. - """ - def matches(rel, reltype, target, is_external): - if rel.reltype != reltype: - return False - if rel.is_external != is_external: - return False - rel_target = rel.target_ref if rel.is_external else rel.target_part - if rel_target != target: - return False - return True - - for rel in self.values(): - if matches(rel, reltype, target, is_external): - return rel - return None - - def _get_rel_of_type(self, reltype): - """ - Return single relationship of type *reltype* from the collection. - Raises |KeyError| if no matching relationship is found. Raises - |ValueError| if more than one matching relationship is found. - """ - matching = [rel for rel in self.values() if rel.reltype == reltype] - if len(matching) == 0: - tmpl = "no relationship of type '%s' in collection" - raise KeyError(tmpl % reltype) - if len(matching) > 1: - tmpl = "multiple relationships of type '%s' in collection" - raise ValueError(tmpl % reltype) - return matching[0] - - @property - def _next_rId(self): - """ - Next available rId in collection, starting from 'rId1' and making use - of any gaps in numbering, e.g. 'rId2' for rIds ['rId1', 'rId3']. 
- """ - for n in range(1, len(self)+2): - rId_candidate = 'rId%d' % n # like 'rId19' - if rId_candidate not in self: - return rId_candidate + try: + return self.part_related_by(RT.CORE_PROPERTIES) + except KeyError: + core_properties_part = CorePropertiesPart.default(self) + self.relate_to(core_properties_part, RT.CORE_PROPERTIES) + return core_properties_part class Unmarshaller(object): @@ -536,42 +219,3 @@ def _unmarshal_relationships(pkg_reader, package, parts): target = (srel.target_ref if srel.is_external else parts[srel.target_partname]) source.load_rel(srel.reltype, target, srel.rId, srel.is_external) - - -class _Relationship(object): - """ - Value object for relationship to part. - """ - def __init__(self, rId, reltype, target, baseURI, external=False): - super(_Relationship, self).__init__() - self._rId = rId - self._reltype = reltype - self._target = target - self._baseURI = baseURI - self._is_external = bool(external) - - @property - def is_external(self): - return self._is_external - - @property - def reltype(self): - return self._reltype - - @property - def rId(self): - return self._rId - - @property - def target_part(self): - if self._is_external: - raise ValueError("target_part property on _Relationship is undef" - "ined when target mode is External") - return self._target - - @property - def target_ref(self): - if self._is_external: - return self._target - else: - return self._target.partname.relative_ref(self._baseURI) diff --git a/docx/opc/part.py b/docx/opc/part.py new file mode 100644 index 000000000..928d3c183 --- /dev/null +++ b/docx/opc/part.py @@ -0,0 +1,241 @@ +# encoding: utf-8 + +""" +Open Packaging Convention (OPC) objects related to package parts. 
+""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from .compat import cls_method_fn +from .oxml import serialize_part_xml +from ..oxml import parse_xml +from .packuri import PackURI +from .rel import Relationships +from .shared import lazyproperty + + +class Part(object): + """ + Base class for package parts. Provides common properties and methods, but + intended to be subclassed in client code to implement specific part + behaviors. + """ + def __init__(self, partname, content_type, blob=None, package=None): + super(Part, self).__init__() + self._partname = partname + self._content_type = content_type + self._blob = blob + self._package = package + + def after_unmarshal(self): + """ + Entry point for post-unmarshaling processing, for example to parse + the part XML. May be overridden by subclasses without forwarding call + to super. + """ + # don't place any code here, just catch call if not overridden by + # subclass + pass + + def before_marshal(self): + """ + Entry point for pre-serialization processing, for example to finalize + part naming if necessary. May be overridden by subclasses without + forwarding call to super. + """ + # don't place any code here, just catch call if not overridden by + # subclass + pass + + @property + def blob(self): + """ + Contents of this package part as a sequence of bytes. May be text or + binary. Intended to be overridden by subclasses. Default behavior is + to return load blob. + """ + return self._blob + + @property + def content_type(self): + """ + Content type of this part. + """ + return self._content_type + + def drop_rel(self, rId): + """ + Remove the relationship identified by *rId* if its reference count + is less than 2. Relationships with a reference count of 0 are + implicit relationships. 
+ """ + if self._rel_ref_count(rId) < 2: + del self.rels[rId] + + @classmethod + def load(cls, partname, content_type, blob, package): + return cls(partname, content_type, blob, package) + + def load_rel(self, reltype, target, rId, is_external=False): + """ + Return newly added |_Relationship| instance of *reltype* between this + part and *target* with key *rId*. Target mode is set to + ``RTM.EXTERNAL`` if *is_external* is |True|. Intended for use during + load from a serialized package, where the rId is well-known. Other + methods exist for adding a new relationship to a part when + manipulating a part. + """ + return self.rels.add_relationship(reltype, target, rId, is_external) + + @property + def package(self): + """ + |OpcPackage| instance this part belongs to. + """ + return self._package + + @property + def partname(self): + """ + |PackURI| instance holding partname of this part, e.g. + '/ppt/slides/slide1.xml' + """ + return self._partname + + @partname.setter + def partname(self, partname): + if not isinstance(partname, PackURI): + tmpl = "partname must be instance of PackURI, got '%s'" + raise TypeError(tmpl % type(partname).__name__) + self._partname = partname + + def part_related_by(self, reltype): + """ + Return part to which this part has a relationship of *reltype*. + Raises |KeyError| if no such relationship is found and |ValueError| + if more than one such relationship is found. Provides ability to + resolve implicitly related part, such as Slide -> SlideLayout. + """ + return self.rels.part_with_reltype(reltype) + + def relate_to(self, target, reltype, is_external=False): + """ + Return rId key of relationship of *reltype* to *target*, from an + existing relationship if there is one, otherwise a newly created one. 
+ """ + if is_external: + return self.rels.get_or_add_ext_rel(reltype, target) + else: + rel = self.rels.get_or_add(reltype, target) + return rel.rId + + @property + def related_parts(self): + """ + Dictionary mapping related parts by rId, so child objects can resolve + explicit relationships present in the part XML, e.g. sldIdLst to a + specific |Slide| instance. + """ + return self.rels.related_parts + + @lazyproperty + def rels(self): + """ + |Relationships| instance holding the relationships for this part. + """ + return Relationships(self._partname.baseURI) + + def target_ref(self, rId): + """ + Return URL contained in target ref of relationship identified by + *rId*. + """ + rel = self.rels[rId] + return rel.target_ref + + def _rel_ref_count(self, rId): + """ + Return the count of references in this part's XML to the relationship + identified by *rId*. + """ + rIds = self._element.xpath('//@r:id') + return len([_rId for _rId in rIds if _rId == rId]) + + +class PartFactory(object): + """ + Provides a way for client code to specify a subclass of |Part| to be + constructed by |Unmarshaller| based on its content type and/or a custom + callable. Setting ``PartFactory.part_class_selector`` to a callable + object will cause that object to be called with the parameters + ``content_type, reltype``, once for each part in the package. If the + callable returns an object, it is used as the class for that part. If it + returns |None|, part class selection falls back to the content type map + defined in ``PartFactory.part_type_for``. If no class is returned from + either of these, the class contained in ``PartFactory.default_part_type`` + is used to construct the part, which is by default ``opc.package.Part``. 
+ """ + part_class_selector = None + part_type_for = {} + default_part_type = Part + + def __new__(cls, partname, content_type, reltype, blob, package): + PartClass = None + if cls.part_class_selector is not None: + part_class_selector = cls_method_fn(cls, 'part_class_selector') + PartClass = part_class_selector(content_type, reltype) + if PartClass is None: + PartClass = cls._part_cls_for(content_type) + return PartClass.load(partname, content_type, blob, package) + + @classmethod + def _part_cls_for(cls, content_type): + """ + Return the custom part class registered for *content_type*, or the + default part class if no custom class is registered for + *content_type*. + """ + if content_type in cls.part_type_for: + return cls.part_type_for[content_type] + return cls.default_part_type + + +class XmlPart(Part): + """ + Base class for package parts containing an XML payload, which is most of + them. Provides additional methods to the |Part| base class that take care + of parsing and reserializing the XML payload and managing relationships + to other parts. + """ + def __init__(self, partname, content_type, element, package): + super(XmlPart, self).__init__( + partname, content_type, package=package + ) + self._element = element + + @property + def blob(self): + return serialize_part_xml(self._element) + + @property + def element(self): + """ + The root XML element of this XML part. + """ + return self._element + + @classmethod + def load(cls, partname, content_type, blob, package): + element = parse_xml(blob) + return cls(partname, content_type, element, package) + + @property + def part(self): + """ + Part of the parent protocol, "children" of the document will not know + the part that contains them so must ask their parent object. That + chain of delegation ends here for child objects. 
+ """ + return self diff --git a/docx/opc/parts/__init__.py b/docx/opc/parts/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/docx/opc/parts/coreprops.py b/docx/opc/parts/coreprops.py new file mode 100644 index 000000000..3c692fb99 --- /dev/null +++ b/docx/opc/parts/coreprops.py @@ -0,0 +1,54 @@ +# encoding: utf-8 + +""" +Core properties part, corresponds to ``/docProps/core.xml`` part in package. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from datetime import datetime + +from ..constants import CONTENT_TYPE as CT +from ..coreprops import CoreProperties +from ...oxml.coreprops import CT_CoreProperties +from ..packuri import PackURI +from ..part import XmlPart + + +class CorePropertiesPart(XmlPart): + """ + Corresponds to part named ``/docProps/core.xml``, containing the core + document properties for this document package. + """ + @classmethod + def default(cls, package): + """ + Return a new |CorePropertiesPart| object initialized with default + values for its base properties. + """ + core_properties_part = cls._new(package) + core_properties = core_properties_part.core_properties + core_properties.title = 'Word Document' + core_properties.last_modified_by = 'python-docx' + core_properties.revision = 1 + core_properties.modified = datetime.utcnow() + return core_properties_part + + @property + def core_properties(self): + """ + A |CoreProperties| object providing read/write access to the core + properties contained in this core properties part. 
+ """ + return CoreProperties(self.element) + + @classmethod + def _new(cls, package): + partname = PackURI('/docProps/core.xml') + content_type = CT.OPC_CORE_PROPERTIES + coreProperties = CT_CoreProperties.new() + return CorePropertiesPart( + partname, content_type, coreProperties, package + ) diff --git a/docx/opc/rel.py b/docx/opc/rel.py new file mode 100644 index 000000000..7dba2af8e --- /dev/null +++ b/docx/opc/rel.py @@ -0,0 +1,170 @@ +# encoding: utf-8 + +""" +Relationship-related objects. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from .oxml import CT_Relationships + + +class Relationships(dict): + """ + Collection object for |_Relationship| instances, having list semantics. + """ + def __init__(self, baseURI): + super(Relationships, self).__init__() + self._baseURI = baseURI + self._target_parts_by_rId = {} + + def add_relationship(self, reltype, target, rId, is_external=False): + """ + Return a newly added |_Relationship| instance. + """ + rel = _Relationship(rId, reltype, target, self._baseURI, is_external) + self[rId] = rel + if not is_external: + self._target_parts_by_rId[rId] = target + return rel + + def get_or_add(self, reltype, target_part): + """ + Return relationship of *reltype* to *target_part*, newly added if not + already present in collection. + """ + rel = self._get_matching(reltype, target_part) + if rel is None: + rId = self._next_rId + rel = self.add_relationship(reltype, target_part, rId) + return rel + + def get_or_add_ext_rel(self, reltype, target_ref): + """ + Return rId of external relationship of *reltype* to *target_ref*, + newly added if not already present in collection. 
+ """ + rel = self._get_matching(reltype, target_ref, is_external=True) + if rel is None: + rId = self._next_rId + rel = self.add_relationship( + reltype, target_ref, rId, is_external=True + ) + return rel.rId + + def part_with_reltype(self, reltype): + """ + Return target part of rel with matching *reltype*, raising |KeyError| + if not found and |ValueError| if more than one matching relationship + is found. + """ + rel = self._get_rel_of_type(reltype) + return rel.target_part + + @property + def related_parts(self): + """ + dict mapping rIds to target parts for all the internal relationships + in the collection. + """ + return self._target_parts_by_rId + + @property + def xml(self): + """ + Serialize this relationship collection into XML suitable for storage + as a .rels file in an OPC package. + """ + rels_elm = CT_Relationships.new() + for rel in self.values(): + rels_elm.add_rel( + rel.rId, rel.reltype, rel.target_ref, rel.is_external + ) + return rels_elm.xml + + def _get_matching(self, reltype, target, is_external=False): + """ + Return relationship of matching *reltype*, *target*, and + *is_external* from collection, or None if not found. + """ + def matches(rel, reltype, target, is_external): + if rel.reltype != reltype: + return False + if rel.is_external != is_external: + return False + rel_target = rel.target_ref if rel.is_external else rel.target_part + if rel_target != target: + return False + return True + + for rel in self.values(): + if matches(rel, reltype, target, is_external): + return rel + return None + + def _get_rel_of_type(self, reltype): + """ + Return single relationship of type *reltype* from the collection. + Raises |KeyError| if no matching relationship is found. Raises + |ValueError| if more than one matching relationship is found. 
+ """ + matching = [rel for rel in self.values() if rel.reltype == reltype] + if len(matching) == 0: + tmpl = "no relationship of type '%s' in collection" + raise KeyError(tmpl % reltype) + if len(matching) > 1: + tmpl = "multiple relationships of type '%s' in collection" + raise ValueError(tmpl % reltype) + return matching[0] + + @property + def _next_rId(self): + """ + Next available rId in collection, starting from 'rId1' and making use + of any gaps in numbering, e.g. 'rId2' for rIds ['rId1', 'rId3']. + """ + for n in range(1, len(self)+2): + rId_candidate = 'rId%d' % n # like 'rId19' + if rId_candidate not in self: + return rId_candidate + + +class _Relationship(object): + """ + Value object for relationship to part. + """ + def __init__(self, rId, reltype, target, baseURI, external=False): + super(_Relationship, self).__init__() + self._rId = rId + self._reltype = reltype + self._target = target + self._baseURI = baseURI + self._is_external = bool(external) + + @property + def is_external(self): + return self._is_external + + @property + def reltype(self): + return self._reltype + + @property + def rId(self): + return self._rId + + @property + def target_part(self): + if self._is_external: + raise ValueError("target_part property on _Relationship is undef" + "ined when target mode is External") + return self._target + + @property + def target_ref(self): + if self._is_external: + return self._target + else: + return self._target.partname.relative_ref(self._baseURI) diff --git a/docx/oxml/__init__.py b/docx/oxml/__init__.py index c5938c7c8..b6f2b747e 100644 --- a/docx/oxml/__init__.py +++ b/docx/oxml/__init__.py @@ -64,34 +64,17 @@ def OxmlElement(nsptag_str, attrs=None, nsdecls=None): # custom element class mappings # =========================================================================== -from docx.oxml.shared import CT_DecimalNumber, CT_OnOff, CT_String +from .shared import CT_DecimalNumber, CT_OnOff, CT_String -from docx.oxml.shape import ( - CT_Blip, 
CT_BlipFillProperties, CT_GraphicalObject, - CT_GraphicalObjectData, CT_Inline, CT_NonVisualDrawingProps, CT_Picture, - CT_PictureNonVisual, CT_Point2D, CT_PositiveSize2D, CT_ShapeProperties, - CT_Transform2D -) -register_element_cls('a:blip', CT_Blip) -register_element_cls('a:ext', CT_PositiveSize2D) -register_element_cls('a:graphic', CT_GraphicalObject) -register_element_cls('a:graphicData', CT_GraphicalObjectData) -register_element_cls('a:off', CT_Point2D) -register_element_cls('a:xfrm', CT_Transform2D) -register_element_cls('pic:blipFill', CT_BlipFillProperties) -register_element_cls('pic:cNvPr', CT_NonVisualDrawingProps) -register_element_cls('pic:nvPicPr', CT_PictureNonVisual) -register_element_cls('pic:pic', CT_Picture) -register_element_cls('pic:spPr', CT_ShapeProperties) -register_element_cls('wp:docPr', CT_NonVisualDrawingProps) -register_element_cls('wp:extent', CT_PositiveSize2D) -register_element_cls('wp:inline', CT_Inline) -from docx.oxml.parts.document import CT_Body, CT_Document +from .coreprops import CT_CoreProperties +register_element_cls('cp:coreProperties', CT_CoreProperties) + +from .document import CT_Body, CT_Document register_element_cls('w:body', CT_Body) register_element_cls('w:document', CT_Document) -from docx.oxml.parts.numbering import ( +from .numbering import ( CT_Num, CT_Numbering, CT_NumLvl, CT_NumPr ) register_element_cls('w:abstractNumId', CT_DecimalNumber) @@ -103,52 +86,83 @@ def OxmlElement(nsptag_str, attrs=None, nsdecls=None): register_element_cls('w:numbering', CT_Numbering) register_element_cls('w:startOverride', CT_DecimalNumber) -from docx.oxml.parts.styles import CT_Style, CT_Styles -register_element_cls('w:style', CT_Style) -register_element_cls('w:styles', CT_Styles) - -from docx.oxml.section import CT_PageMar, CT_PageSz, CT_SectPr, CT_SectType +from .section import CT_PageMar, CT_PageSz, CT_SectPr, CT_SectType register_element_cls('w:pgMar', CT_PageMar) register_element_cls('w:pgSz', CT_PageSz) 
register_element_cls('w:sectPr', CT_SectPr) register_element_cls('w:type', CT_SectType) -from docx.oxml.table import ( +from .shape import ( + CT_Blip, CT_BlipFillProperties, CT_GraphicalObject, + CT_GraphicalObjectData, CT_Inline, CT_NonVisualDrawingProps, CT_Picture, + CT_PictureNonVisual, CT_Point2D, CT_PositiveSize2D, CT_ShapeProperties, + CT_Transform2D +) +register_element_cls('a:blip', CT_Blip) +register_element_cls('a:ext', CT_PositiveSize2D) +register_element_cls('a:graphic', CT_GraphicalObject) +register_element_cls('a:graphicData', CT_GraphicalObjectData) +register_element_cls('a:off', CT_Point2D) +register_element_cls('a:xfrm', CT_Transform2D) +register_element_cls('pic:blipFill', CT_BlipFillProperties) +register_element_cls('pic:cNvPr', CT_NonVisualDrawingProps) +register_element_cls('pic:nvPicPr', CT_PictureNonVisual) +register_element_cls('pic:pic', CT_Picture) +register_element_cls('pic:spPr', CT_ShapeProperties) +register_element_cls('wp:docPr', CT_NonVisualDrawingProps) +register_element_cls('wp:extent', CT_PositiveSize2D) +register_element_cls('wp:inline', CT_Inline) + +from .styles import CT_LatentStyles, CT_LsdException, CT_Style, CT_Styles +register_element_cls('w:basedOn', CT_String) +register_element_cls('w:latentStyles', CT_LatentStyles) +register_element_cls('w:locked', CT_OnOff) +register_element_cls('w:lsdException', CT_LsdException) +register_element_cls('w:name', CT_String) +register_element_cls('w:next', CT_String) +register_element_cls('w:qFormat', CT_OnOff) +register_element_cls('w:semiHidden', CT_OnOff) +register_element_cls('w:style', CT_Style) +register_element_cls('w:styles', CT_Styles) +register_element_cls('w:uiPriority', CT_DecimalNumber) +register_element_cls('w:unhideWhenUsed', CT_OnOff) + +from .table import ( CT_Row, CT_Tbl, CT_TblGrid, CT_TblGridCol, CT_TblLayoutType, CT_TblPr, - CT_TblWidth, CT_Tc, CT_TcPr + CT_TblWidth, CT_Tc, CT_TcPr, CT_VMerge ) -register_element_cls('w:gridCol', CT_TblGridCol) 
-register_element_cls('w:tbl', CT_Tbl) -register_element_cls('w:tblGrid', CT_TblGrid) -register_element_cls('w:tblLayout', CT_TblLayoutType) -register_element_cls('w:tblPr', CT_TblPr) -register_element_cls('w:tblStyle', CT_String) -register_element_cls('w:tc', CT_Tc) -register_element_cls('w:tcPr', CT_TcPr) -register_element_cls('w:tcW', CT_TblWidth) -register_element_cls('w:tr', CT_Row) - -from docx.oxml.text import ( - CT_Br, CT_Jc, CT_P, CT_PPr, CT_R, CT_RPr, CT_Text, CT_Underline +register_element_cls('w:bidiVisual', CT_OnOff) +register_element_cls('w:gridCol', CT_TblGridCol) +register_element_cls('w:gridSpan', CT_DecimalNumber) +register_element_cls('w:tbl', CT_Tbl) +register_element_cls('w:tblGrid', CT_TblGrid) +register_element_cls('w:tblLayout', CT_TblLayoutType) +register_element_cls('w:tblPr', CT_TblPr) +register_element_cls('w:tblStyle', CT_String) +register_element_cls('w:tc', CT_Tc) +register_element_cls('w:tcPr', CT_TcPr) +register_element_cls('w:tcW', CT_TblWidth) +register_element_cls('w:tr', CT_Row) +register_element_cls('w:vMerge', CT_VMerge) + +from .text.font import ( + CT_Color, CT_Fonts, CT_HpsMeasure, CT_RPr, CT_Underline, + CT_VerticalAlignRun ) register_element_cls('w:b', CT_OnOff) register_element_cls('w:bCs', CT_OnOff) -register_element_cls('w:br', CT_Br) register_element_cls('w:caps', CT_OnOff) +register_element_cls('w:color', CT_Color) register_element_cls('w:cs', CT_OnOff) register_element_cls('w:dstrike', CT_OnOff) register_element_cls('w:emboss', CT_OnOff) register_element_cls('w:i', CT_OnOff) register_element_cls('w:iCs', CT_OnOff) register_element_cls('w:imprint', CT_OnOff) -register_element_cls('w:jc', CT_Jc) register_element_cls('w:noProof', CT_OnOff) register_element_cls('w:oMath', CT_OnOff) register_element_cls('w:outline', CT_OnOff) -register_element_cls('w:p', CT_P) -register_element_cls('w:pPr', CT_PPr) -register_element_cls('w:pStyle', CT_String) -register_element_cls('w:r', CT_R) +register_element_cls('w:rFonts', CT_Fonts) 
register_element_cls('w:rPr', CT_RPr) register_element_cls('w:rStyle', CT_String) register_element_cls('w:rtl', CT_OnOff) @@ -157,7 +171,30 @@ def OxmlElement(nsptag_str, attrs=None, nsdecls=None): register_element_cls('w:snapToGrid', CT_OnOff) register_element_cls('w:specVanish', CT_OnOff) register_element_cls('w:strike', CT_OnOff) -register_element_cls('w:t', CT_Text) +register_element_cls('w:sz', CT_HpsMeasure) register_element_cls('w:u', CT_Underline) register_element_cls('w:vanish', CT_OnOff) +register_element_cls('w:vertAlign', CT_VerticalAlignRun) register_element_cls('w:webHidden', CT_OnOff) + +from .text.paragraph import CT_P +register_element_cls('w:p', CT_P) + +from .text.hyperlink import CT_Hyperlink +register_element_cls('w:hyperlink', CT_Hyperlink) + +from .text.parfmt import CT_Ind, CT_Jc, CT_PPr, CT_Spacing +register_element_cls('w:ind', CT_Ind) +register_element_cls('w:jc', CT_Jc) +register_element_cls('w:keepLines', CT_OnOff) +register_element_cls('w:keepNext', CT_OnOff) +register_element_cls('w:pageBreakBefore', CT_OnOff) +register_element_cls('w:pPr', CT_PPr) +register_element_cls('w:pStyle', CT_String) +register_element_cls('w:spacing', CT_Spacing) +register_element_cls('w:widowControl', CT_OnOff) + +from .text.run import CT_Br, CT_R, CT_Text +register_element_cls('w:br', CT_Br) +register_element_cls('w:r', CT_R) +register_element_cls('w:t', CT_Text) diff --git a/docx/oxml/coreprops.py b/docx/oxml/coreprops.py new file mode 100644 index 000000000..b53807443 --- /dev/null +++ b/docx/oxml/coreprops.py @@ -0,0 +1,318 @@ +# encoding: utf-8 + +""" +lxml custom element classes for core properties-related XML elements. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +import re + +from datetime import datetime, timedelta + +from . 
import parse_xml +from .ns import nsdecls, qn +from .xmlchemy import BaseOxmlElement, ZeroOrOne + + +class CT_CoreProperties(BaseOxmlElement): + """ + ``<cp:coreProperties>`` element, the root element of the Core Properties + part stored as ``/docProps/core.xml``. Implements many of the Dublin Core + document metadata elements. String elements resolve to an empty string + ('') if the element is not present in the XML. String elements are + limited in length to 255 unicode characters. + """ + category = ZeroOrOne('cp:category', successors=()) + contentStatus = ZeroOrOne('cp:contentStatus', successors=()) + created = ZeroOrOne('dcterms:created', successors=()) + creator = ZeroOrOne('dc:creator', successors=()) + description = ZeroOrOne('dc:description', successors=()) + identifier = ZeroOrOne('dc:identifier', successors=()) + keywords = ZeroOrOne('cp:keywords', successors=()) + language = ZeroOrOne('dc:language', successors=()) + lastModifiedBy = ZeroOrOne('cp:lastModifiedBy', successors=()) + lastPrinted = ZeroOrOne('cp:lastPrinted', successors=()) + modified = ZeroOrOne('dcterms:modified', successors=()) + revision = ZeroOrOne('cp:revision', successors=()) + subject = ZeroOrOne('dc:subject', successors=()) + title = ZeroOrOne('dc:title', successors=()) + version = ZeroOrOne('cp:version', successors=()) + + _coreProperties_tmpl = ( + '<cp:coreProperties %s/>\n' % nsdecls('cp', 'dc', 'dcterms') + ) + + @classmethod + def new(cls): + """ + Return a new ``<cp:coreProperties>`` element + """ + xml = cls._coreProperties_tmpl + coreProperties = parse_xml(xml) + return coreProperties + + @property + def author_text(self): + """ + The text in the `dc:creator` child element. 
+ """ + return self._text_of_element('creator') + + @author_text.setter + def author_text(self, value): + self._set_element_text('creator', value) + + @property + def category_text(self): + return self._text_of_element('category') + + @category_text.setter + def category_text(self, value): + self._set_element_text('category', value) + + @property + def comments_text(self): + return self._text_of_element('description') + + @comments_text.setter + def comments_text(self, value): + self._set_element_text('description', value) + + @property + def contentStatus_text(self): + return self._text_of_element('contentStatus') + + @contentStatus_text.setter + def contentStatus_text(self, value): + self._set_element_text('contentStatus', value) + + @property + def created_datetime(self): + return self._datetime_of_element('created') + + @created_datetime.setter + def created_datetime(self, value): + self._set_element_datetime('created', value) + + @property + def identifier_text(self): + return self._text_of_element('identifier') + + @identifier_text.setter + def identifier_text(self, value): + self._set_element_text('identifier', value) + + @property + def keywords_text(self): + return self._text_of_element('keywords') + + @keywords_text.setter + def keywords_text(self, value): + self._set_element_text('keywords', value) + + @property + def language_text(self): + return self._text_of_element('language') + + @language_text.setter + def language_text(self, value): + self._set_element_text('language', value) + + @property + def lastModifiedBy_text(self): + return self._text_of_element('lastModifiedBy') + + @lastModifiedBy_text.setter + def lastModifiedBy_text(self, value): + self._set_element_text('lastModifiedBy', value) + + @property + def lastPrinted_datetime(self): + return self._datetime_of_element('lastPrinted') + + @lastPrinted_datetime.setter + def lastPrinted_datetime(self, value): + self._set_element_datetime('lastPrinted', value) + + @property + def 
modified_datetime(self): + return self._datetime_of_element('modified') + + @modified_datetime.setter + def modified_datetime(self, value): + self._set_element_datetime('modified', value) + + @property + def revision_number(self): + """ + Integer value of revision property. + """ + revision = self.revision + if revision is None: + return 0 + revision_str = revision.text + try: + revision = int(revision_str) + except ValueError: + # non-integer revision strings also resolve to 0 + revision = 0 + # as do negative integers + if revision < 0: + revision = 0 + return revision + + @revision_number.setter + def revision_number(self, value): + """ + Set revision property to string value of integer *value*. + """ + if not isinstance(value, int) or value < 1: + tmpl = "revision property requires positive int, got '%s'" + raise ValueError(tmpl % value) + revision = self.get_or_add_revision() + revision.text = str(value) + + @property + def subject_text(self): + return self._text_of_element('subject') + + @subject_text.setter + def subject_text(self, value): + self._set_element_text('subject', value) + + @property + def title_text(self): + return self._text_of_element('title') + + @title_text.setter + def title_text(self, value): + self._set_element_text('title', value) + + @property + def version_text(self): + return self._text_of_element('version') + + @version_text.setter + def version_text(self, value): + self._set_element_text('version', value) + + def _datetime_of_element(self, property_name): + element = getattr(self, property_name) + if element is None: + return None + datetime_str = element.text + try: + return self._parse_W3CDTF_to_datetime(datetime_str) + except ValueError: + # invalid datetime strings are ignored + return None + + def _get_or_add(self, prop_name): + """ + Return element returned by 'get_or_add_' method for *prop_name*. 
+ """ + get_or_add_method_name = 'get_or_add_%s' % prop_name + get_or_add_method = getattr(self, get_or_add_method_name) + element = get_or_add_method() + return element + + @classmethod + def _offset_dt(cls, dt, offset_str): + """ + Return a |datetime| instance that is offset from datetime *dt* by + the timezone offset specified in *offset_str*, a string like + ``'-07:00'``. + """ + match = cls._offset_pattern.match(offset_str) + if match is None: + raise ValueError( + "'%s' is not a valid offset string" % offset_str + ) + sign, hours_str, minutes_str = match.groups() + sign_factor = -1 if sign == '+' else 1 + hours = int(hours_str) * sign_factor + minutes = int(minutes_str) * sign_factor + td = timedelta(hours=hours, minutes=minutes) + return dt + td + + _offset_pattern = re.compile('([+-])(\d\d):(\d\d)') + + @classmethod + def _parse_W3CDTF_to_datetime(cls, w3cdtf_str): + # valid W3CDTF date cases: + # yyyy e.g. '2003' + # yyyy-mm e.g. '2003-12' + # yyyy-mm-dd e.g. '2003-12-31' + # UTC timezone e.g. '2003-12-31T10:14:55Z' + # numeric timezone e.g. '2003-12-31T10:14:55-08:00' + templates = ( + '%Y-%m-%dT%H:%M:%S', + '%Y-%m-%d', + '%Y-%m', + '%Y', + ) + # strptime isn't smart enough to parse literal timezone offsets like + # '-07:30', so we have to do it ourselves + parseable_part = w3cdtf_str[:19] + offset_str = w3cdtf_str[19:] + dt = None + for tmpl in templates: + try: + dt = datetime.strptime(parseable_part, tmpl) + except ValueError: + continue + if dt is None: + tmpl = "could not parse W3CDTF datetime string '%s'" + raise ValueError(tmpl % w3cdtf_str) + if len(offset_str) == 6: + return cls._offset_dt(dt, offset_str) + return dt + + def _set_element_datetime(self, prop_name, value): + """ + Set date/time value of child element having *prop_name* to *value*. 
+ """ + if not isinstance(value, datetime): + tmpl = ( + "property requires object, got %s" + ) + raise ValueError(tmpl % type(value)) + element = self._get_or_add(prop_name) + dt_str = value.strftime('%Y-%m-%dT%H:%M:%SZ') + element.text = dt_str + if prop_name in ('created', 'modified'): + # These two require an explicit 'xsi:type="dcterms:W3CDTF"' + # attribute. The first and last line are a hack required to add + # the xsi namespace to the root element rather than each child + # element in which it is referenced + self.set(qn('xsi:foo'), 'bar') + element.set(qn('xsi:type'), 'dcterms:W3CDTF') + del self.attrib[qn('xsi:foo')] + + def _set_element_text(self, prop_name, value): + """ + Set string value of *name* property to *value*. + """ + value = str(value) + if len(value) > 255: + tmpl = ( + "exceeded 255 char limit for property, got:\n\n'%s'" + ) + raise ValueError(tmpl % value) + element = self._get_or_add(prop_name) + element.text = value + + def _text_of_element(self, property_name): + """ + Return the text in the element matching *property_name*, or an empty + string if the element is not present or contains no text. + """ + element = getattr(self, property_name) + if element is None: + return '' + if element.text is None: + return '' + return element.text diff --git a/docx/oxml/parts/document.py b/docx/oxml/document.py similarity index 92% rename from docx/oxml/parts/document.py rename to docx/oxml/document.py index ff5eedb91..e1cb4ac55 100644 --- a/docx/oxml/parts/document.py +++ b/docx/oxml/document.py @@ -5,8 +5,7 @@ . """ -from ..table import CT_Tbl -from ..xmlchemy import BaseOxmlElement, ZeroOrOne, ZeroOrMore +from .xmlchemy import BaseOxmlElement, ZeroOrOne, ZeroOrMore class CT_Document(BaseOxmlElement): @@ -47,9 +46,6 @@ def add_section_break(self): p.set_sectPr(cloned_sectPr) return sentinel_sectPr - def _new_tbl(self): - return CT_Tbl.new() - def clear_content(self): """ Remove all content child elements from this element. 
Leave diff --git a/docx/oxml/ns.py b/docx/oxml/ns.py index d4b3014db..e6f6a4acc 100644 --- a/docx/oxml/ns.py +++ b/docx/oxml/ns.py @@ -10,6 +10,11 @@ nsmap = { 'a': ('http://schemas.openxmlformats.org/drawingml/2006/main'), 'c': ('http://schemas.openxmlformats.org/drawingml/2006/chart'), + 'cp': ('http://schemas.openxmlformats.org/package/2006/metadata/core-pr' + 'operties'), + 'dc': ('http://purl.org/dc/elements/1.1/'), + 'dcmitype': ('http://purl.org/dc/dcmitype/'), + 'dcterms': ('http://purl.org/dc/terms/'), 'dgm': ('http://schemas.openxmlformats.org/drawingml/2006/diagram'), 'pic': ('http://schemas.openxmlformats.org/drawingml/2006/picture'), 'r': ('http://schemas.openxmlformats.org/officeDocument/2006/relations' @@ -17,7 +22,8 @@ 'w': ('http://schemas.openxmlformats.org/wordprocessingml/2006/main'), 'wp': ('http://schemas.openxmlformats.org/drawingml/2006/wordprocessing' 'Drawing'), - 'xml': ('http://www.w3.org/XML/1998/namespace') + 'xml': ('http://www.w3.org/XML/1998/namespace'), + 'xsi': ('http://www.w3.org/2001/XMLSchema-instance'), } pfxmap = dict((value, key) for key, value in nsmap.items()) diff --git a/docx/oxml/parts/numbering.py b/docx/oxml/numbering.py similarity index 96% rename from docx/oxml/parts/numbering.py rename to docx/oxml/numbering.py index 31d97dbce..aeedfa9a0 100644 --- a/docx/oxml/parts/numbering.py +++ b/docx/oxml/numbering.py @@ -4,10 +4,10 @@ Custom element classes related to the numbering part """ -from .. import OxmlElement -from ..shared import CT_DecimalNumber -from ..simpletypes import ST_DecimalNumber -from ..xmlchemy import ( +from . 
import OxmlElement +from .shared import CT_DecimalNumber +from .simpletypes import ST_DecimalNumber +from .xmlchemy import ( BaseOxmlElement, OneAndOnlyOne, RequiredAttribute, ZeroOrMore, ZeroOrOne ) diff --git a/docx/oxml/parts/styles.py b/docx/oxml/parts/styles.py deleted file mode 100644 index ed3054f13..000000000 --- a/docx/oxml/parts/styles.py +++ /dev/null @@ -1,35 +0,0 @@ -# encoding: utf-8 - -""" -Custom element classes related to the styles part -""" - -from ..xmlchemy import BaseOxmlElement, ZeroOrMore, ZeroOrOne - - -class CT_Style(BaseOxmlElement): - """ - A ```` element, representing a style definition - """ - pPr = ZeroOrOne('w:pPr', successors=( - 'w:rPr', 'w:tblPr', 'w:trPr', 'w:tcPr', 'w:tblStylePr' - )) - - -class CT_Styles(BaseOxmlElement): - """ - ```` element, the root element of a styles part, i.e. - styles.xml - """ - style = ZeroOrMore('w:style', successors=()) - - def style_having_styleId(self, styleId): - """ - Return the ```` child element having ``styleId`` attribute - matching *styleId*. - """ - xpath = './w:style[@w:styleId="%s"]' % styleId - try: - return self.xpath(xpath)[0] - except IndexError: - raise KeyError('no element with styleId %d' % styleId) diff --git a/docx/oxml/shape.py b/docx/oxml/shape.py index ae58dd59d..77ca7db8a 100644 --- a/docx/oxml/shape.py +++ b/docx/oxml/shape.py @@ -74,6 +74,18 @@ def new(cls, cx, cy, shape_id, pic): inline.graphic.graphicData._insert_pic(pic) return inline + @classmethod + def new_pic_inline(cls, shape_id, rId, filename, cx, cy): + """ + Return a new `wp:inline` element containing the `pic:pic` element + specified by the argument values. 
+ """ + pic_id = 0 # Word doesn't seem to use this, but does not omit it + pic = CT_Picture.new(pic_id, filename, rId, cx, cy) + inline = cls.new(cx, cy, shape_id, pic) + inline.graphic.graphicData._insert_pic(pic) + return inline + @classmethod def _inline_xml(cls): return ( diff --git a/docx/oxml/simpletypes.py b/docx/oxml/simpletypes.py index 07b51d533..400a23700 100644 --- a/docx/oxml/simpletypes.py +++ b/docx/oxml/simpletypes.py @@ -6,10 +6,12 @@ type in the associated XML schema. """ -from __future__ import absolute_import, print_function +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) from ..exceptions import InvalidXmlError -from ..shared import Emu, Twips +from ..shared import Emu, Pt, RGBColor, Twips class BaseSimpleType(object): @@ -54,34 +56,45 @@ def validate_string(cls, value): ) -class BaseStringType(BaseSimpleType): +class BaseIntType(BaseSimpleType): @classmethod def convert_from_xml(cls, str_value): - return str_value + return int(str_value) @classmethod def convert_to_xml(cls, value): - return value + return str(value) @classmethod def validate(cls, value): - cls.validate_string(value) + cls.validate_int(value) -class BaseIntType(BaseSimpleType): +class BaseStringType(BaseSimpleType): @classmethod def convert_from_xml(cls, str_value): - return int(str_value) + return str_value @classmethod def convert_to_xml(cls, value): - return str(value) + return value @classmethod def validate(cls, value): - cls.validate_int(value) + cls.validate_string(value) + + +class BaseStringEnumerationType(BaseStringType): + + @classmethod + def validate(cls, value): + cls.validate_string(value) + if value not in cls._members: + raise ValueError( + "must be one of %s, got '%s'" % (cls._members, value) + ) class XsdAnyUri(BaseStringType): @@ -144,6 +157,12 @@ class XsdString(BaseStringType): pass +class XsdStringEnumeration(BaseStringEnumerationType): + """ + Set of enumerated xsd:string values. 
+ """ + + class XsdToken(BaseStringType): """ xsd:string with whitespace collapsing, e.g. multiple spaces reduced to @@ -218,6 +237,68 @@ class ST_DrawingElementId(XsdUnsignedInt): pass +class ST_HexColor(BaseStringType): + + @classmethod + def convert_from_xml(cls, str_value): + if str_value == 'auto': + return ST_HexColorAuto.AUTO + return RGBColor.from_string(str_value) + + @classmethod + def convert_to_xml(cls, value): + """ + Keep alpha hex numerals all uppercase just for consistency. + """ + # expecting 3-tuple of ints in range 0-255 + return '%02X%02X%02X' % value + + @classmethod + def validate(cls, value): + # must be an RGBColor object --- + if not isinstance(value, RGBColor): + raise ValueError( + "rgb color value must be RGBColor object, got %s %s" + % (type(value), value) + ) + + +class ST_HexColorAuto(XsdStringEnumeration): + """ + Value for `w:color/[@val="auto"] attribute setting + """ + AUTO = 'auto' + + _members = (AUTO,) + + +class ST_HpsMeasure(XsdUnsignedLong): + """ + Half-point measure, e.g. 24.0 represents 12.0 points. + """ + @classmethod + def convert_from_xml(cls, str_value): + if 'm' in str_value or 'n' in str_value or 'p' in str_value: + return ST_UniversalMeasure.convert_from_xml(str_value) + return Pt(int(str_value)/2.0) + + @classmethod + def convert_to_xml(cls, value): + emu = Emu(value) + half_points = int(emu.pt * 2) + return str(half_points) + + +class ST_Merge(XsdStringEnumeration): + """ + Valid values for attribute + """ + CONTINUE = 'continue' + RESTART = 'restart' + + _members = (CONTINUE, RESTART) + + class ST_OnOff(XsdBoolean): @classmethod @@ -315,3 +396,14 @@ def convert_from_xml(cls, str_value): }[units_part] emu_value = Emu(int(round(quantity * multiplier))) return emu_value + + +class ST_VerticalAlignRun(XsdStringEnumeration): + """ + Valid values for `w:vertAlign/@val`. 
+ """ + BASELINE = 'baseline' + SUPERSCRIPT = 'superscript' + SUBSCRIPT = 'subscript' + + _members = (BASELINE, SUPERSCRIPT, SUBSCRIPT) diff --git a/docx/oxml/styles.py b/docx/oxml/styles.py new file mode 100644 index 000000000..6f27e45eb --- /dev/null +++ b/docx/oxml/styles.py @@ -0,0 +1,351 @@ +# encoding: utf-8 + +""" +Custom element classes related to the styles part +""" + +from ..enum.style import WD_STYLE_TYPE +from .simpletypes import ST_DecimalNumber, ST_OnOff, ST_String +from .xmlchemy import ( + BaseOxmlElement, OptionalAttribute, RequiredAttribute, ZeroOrMore, + ZeroOrOne +) + + +def styleId_from_name(name): + """ + Return the style id corresponding to *name*, taking into account + special-case names such as 'Heading 1'. + """ + return { + 'caption': 'Caption', + 'heading 1': 'Heading1', + 'heading 2': 'Heading2', + 'heading 3': 'Heading3', + 'heading 4': 'Heading4', + 'heading 5': 'Heading5', + 'heading 6': 'Heading6', + 'heading 7': 'Heading7', + 'heading 8': 'Heading8', + 'heading 9': 'Heading9', + }.get(name, name.replace(' ', '')) + + +class CT_LatentStyles(BaseOxmlElement): + """ + `w:latentStyles` element, defining behavior defaults for latent styles + and containing `w:lsdException` child elements that each override those + defaults for a named latent style. + """ + lsdException = ZeroOrMore('w:lsdException', successors=()) + + count = OptionalAttribute('w:count', ST_DecimalNumber) + defLockedState = OptionalAttribute('w:defLockedState', ST_OnOff) + defQFormat = OptionalAttribute('w:defQFormat', ST_OnOff) + defSemiHidden = OptionalAttribute('w:defSemiHidden', ST_OnOff) + defUIPriority = OptionalAttribute('w:defUIPriority', ST_DecimalNumber) + defUnhideWhenUsed = OptionalAttribute('w:defUnhideWhenUsed', ST_OnOff) + + def bool_prop(self, attr_name): + """ + Return the boolean value of the attribute having *attr_name*, or + |False| if not present. 
+ """ + value = getattr(self, attr_name) + if value is None: + return False + return value + + def get_by_name(self, name): + """ + Return the `w:lsdException` child having *name*, or |None| if not + found. + """ + found = self.xpath('w:lsdException[@w:name="%s"]' % name) + if not found: + return None + return found[0] + + def set_bool_prop(self, attr_name, value): + """ + Set the on/off attribute having *attr_name* to *value*. + """ + setattr(self, attr_name, bool(value)) + + +class CT_LsdException(BaseOxmlElement): + """ + ``<w:lsdException>`` element, defining override visibility behaviors for + a named latent style. + """ + locked = OptionalAttribute('w:locked', ST_OnOff) + name = RequiredAttribute('w:name', ST_String) + qFormat = OptionalAttribute('w:qFormat', ST_OnOff) + semiHidden = OptionalAttribute('w:semiHidden', ST_OnOff) + uiPriority = OptionalAttribute('w:uiPriority', ST_DecimalNumber) + unhideWhenUsed = OptionalAttribute('w:unhideWhenUsed', ST_OnOff) + + def delete(self): + """ + Remove this `w:lsdException` element from the XML document. + """ + self.getparent().remove(self) + + def on_off_prop(self, attr_name): + """ + Return the boolean value of the attribute having *attr_name*, or + |None| if not present. + """ + return getattr(self, attr_name) + + def set_on_off_prop(self, attr_name, value): + """ + Set the on/off attribute having *attr_name* to *value*. 
+ """ + setattr(self, attr_name, value) + + +class CT_Style(BaseOxmlElement): + """ + A ``<w:style>`` element, representing a style definition + """ + _tag_seq = ( + 'w:name', 'w:aliases', 'w:basedOn', 'w:next', 'w:link', + 'w:autoRedefine', 'w:hidden', 'w:uiPriority', 'w:semiHidden', + 'w:unhideWhenUsed', 'w:qFormat', 'w:locked', 'w:personal', + 'w:personalCompose', 'w:personalReply', 'w:rsid', 'w:pPr', 'w:rPr', + 'w:tblPr', 'w:trPr', 'w:tcPr', 'w:tblStylePr' + ) + name = ZeroOrOne('w:name', successors=_tag_seq[1:]) + basedOn = ZeroOrOne('w:basedOn', successors=_tag_seq[3:]) + next = ZeroOrOne('w:next', successors=_tag_seq[4:]) + uiPriority = ZeroOrOne('w:uiPriority', successors=_tag_seq[8:]) + semiHidden = ZeroOrOne('w:semiHidden', successors=_tag_seq[9:]) + unhideWhenUsed = ZeroOrOne('w:unhideWhenUsed', successors=_tag_seq[10:]) + qFormat = ZeroOrOne('w:qFormat', successors=_tag_seq[11:]) + locked = ZeroOrOne('w:locked', successors=_tag_seq[12:]) + pPr = ZeroOrOne('w:pPr', successors=_tag_seq[17:]) + rPr = ZeroOrOne('w:rPr', successors=_tag_seq[18:]) + del _tag_seq + + type = OptionalAttribute('w:type', WD_STYLE_TYPE) + styleId = OptionalAttribute('w:styleId', ST_String) + default = OptionalAttribute('w:default', ST_OnOff) + customStyle = OptionalAttribute('w:customStyle', ST_OnOff) + + @property + def basedOn_val(self): + """ + Value of `w:basedOn/@w:val` or |None| if not present. + """ + basedOn = self.basedOn + if basedOn is None: + return None + return basedOn.val + + @basedOn_val.setter + def basedOn_val(self, value): + if value is None: + self._remove_basedOn() + else: + self.get_or_add_basedOn().val = value + + @property + def base_style(self): + """ + Sibling CT_Style element this style is based on or |None| if no base + style or base style not found. 
+ """ + basedOn = self.basedOn + if basedOn is None: + return None + styles = self.getparent() + base_style = styles.get_by_id(basedOn.val) + if base_style is None: + return None + return base_style + + def delete(self): + """ + Remove this `w:style` element from its parent `w:styles` element. + """ + self.getparent().remove(self) + + @property + def locked_val(self): + """ + Value of `w:locked/@w:val` or |False| if not present. + """ + locked = self.locked + if locked is None: + return False + return locked.val + + @locked_val.setter + def locked_val(self, value): + self._remove_locked() + if bool(value) is True: + locked = self._add_locked() + locked.val = value + + @property + def name_val(self): + """ + Value of ``<w:name>`` child or |None| if not present. + """ + name = self.name + if name is None: + return None + return name.val + + @name_val.setter + def name_val(self, value): + self._remove_name() + if value is not None: + name = self._add_name() + name.val = value + + @property + def next_style(self): + """ + Sibling CT_Style element identified by the value of `w:next/@w:val` + or |None| if no value is present or no style with that style id + is found. + """ + next = self.next + if next is None: + return None + styles = self.getparent() + return styles.get_by_id(next.val) # None if not found + + @property + def qFormat_val(self): + """ + Value of `w:qFormat/@w:val` or |False| if not present. + """ + qFormat = self.qFormat + if qFormat is None: + return False + return qFormat.val + + @qFormat_val.setter + def qFormat_val(self, value): + self._remove_qFormat() + if bool(value): + self._add_qFormat() + + @property + def semiHidden_val(self): + """ + Value of ``<w:semiHidden>`` child or |False| if not present. 
+ """ + semiHidden = self.semiHidden + if semiHidden is None: + return False + return semiHidden.val + + @semiHidden_val.setter + def semiHidden_val(self, value): + self._remove_semiHidden() + if bool(value) is True: + semiHidden = self._add_semiHidden() + semiHidden.val = value + + @property + def uiPriority_val(self): + """ + Value of ``<w:uiPriority>`` child or |None| if not present. + """ + uiPriority = self.uiPriority + if uiPriority is None: + return None + return uiPriority.val + + @uiPriority_val.setter + def uiPriority_val(self, value): + self._remove_uiPriority() + if value is not None: + uiPriority = self._add_uiPriority() + uiPriority.val = value + + @property + def unhideWhenUsed_val(self): + """ + Value of `w:unhideWhenUsed/@w:val` or |False| if not present. + """ + unhideWhenUsed = self.unhideWhenUsed + if unhideWhenUsed is None: + return False + return unhideWhenUsed.val + + @unhideWhenUsed_val.setter + def unhideWhenUsed_val(self, value): + self._remove_unhideWhenUsed() + if bool(value) is True: + unhideWhenUsed = self._add_unhideWhenUsed() + unhideWhenUsed.val = value + + +class CT_Styles(BaseOxmlElement): + """ + ``<w:styles>`` element, the root element of a styles part, i.e. + styles.xml + """ + _tag_seq = ('w:docDefaults', 'w:latentStyles', 'w:style') + latentStyles = ZeroOrOne('w:latentStyles', successors=_tag_seq[2:]) + style = ZeroOrMore('w:style', successors=()) + del _tag_seq + + def add_style_of_type(self, name, style_type, builtin): + """ + Return a newly added `w:style` element having *name* and + *style_type*. `w:style/@customStyle` is set based on the value of + *builtin*. + """ + style = self.add_style() + style.type = style_type + style.customStyle = None if builtin else True + style.styleId = styleId_from_name(name) + style.name_val = name + return style + + def default_for(self, style_type): + """ + Return `w:style[@w:type="*{style_type}*"][-1]` or |None| if not found. 
+ """ + default_styles_for_type = [ + s for s in self._iter_styles() + if s.type == style_type and s.default + ] + if not default_styles_for_type: + return None + # spec calls for last default in document order + return default_styles_for_type[-1] + + def get_by_id(self, styleId): + """ + Return the ``<w:style>`` child element having ``styleId`` attribute + matching *styleId*, or |None| if not found. + """ + xpath = 'w:style[@w:styleId="%s"]' % styleId + try: + return self.xpath(xpath)[0] + except IndexError: + return None + + def get_by_name(self, name): + """ + Return the ``<w:style>`` child element having ``<w:name>`` child + element with value *name*, or |None| if not found. + """ + xpath = 'w:style[w:name/@w:val="%s"]' % name + try: + return self.xpath(xpath)[0] + except IndexError: + return None + + def _iter_styles(self): + """ + Generate each of the `w:style` child elements in document order. + """ + return (style for style in self.xpath('w:style')) diff --git a/docx/oxml/table.py b/docx/oxml/table.py index f2fbd540f..30d349373 100644 --- a/docx/oxml/table.py +++ b/docx/oxml/table.py @@ -4,13 +4,16 @@ Custom element classes for tables """ -from __future__ import absolute_import, print_function, unicode_literals +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) from . import parse_xml -from .ns import nsdecls +from ..exceptions import InvalidSpanError +from .ns import nsdecls, qn from ..shared import Emu, Twips from .simpletypes import ( - ST_TblLayoutType, ST_TblWidth, ST_TwipsMeasure, XsdInt + ST_Merge, ST_TblLayoutType, ST_TblWidth, ST_TwipsMeasure, XsdInt ) from .xmlchemy import ( BaseOxmlElement, OneAndOnlyOne, OneOrMore, OptionalAttribute, @@ -22,8 +25,42 @@ class CT_Row(BaseOxmlElement): """ ``<w:tr>`` element """ + tblPrEx = ZeroOrOne('w:tblPrEx') # custom inserter below + trPr = ZeroOrOne('w:trPr') # custom inserter below tc = ZeroOrMore('w:tc') + def tc_at_grid_col(self, idx): + """ + The ``<w:tc>`` element appearing at grid column *idx*. 
Raises + |ValueError| if no ``w:tc`` element begins at that grid column. + """ + grid_col = 0 + for tc in self.tc_lst: + if grid_col == idx: + return tc + grid_col += tc.grid_span + if grid_col > idx: + raise ValueError('no cell on grid column %d' % idx) + raise ValueError('index out of bounds') + + @property + def tr_idx(self): + """ + The index of this ```` element within its parent ```` + element. + """ + return self.getparent().tr_lst.index(self) + + def _insert_tblPrEx(self, tblPrEx): + self.insert(0, tblPrEx) + + def _insert_trPr(self, trPr): + tblPrEx = self.tblPrEx + if tblPrEx is not None: + tblPrEx.addnext(trPr) + else: + self.insert(0, trPr) + def _new_tc(self): return CT_Tc.new() @@ -36,26 +73,127 @@ class CT_Tbl(BaseOxmlElement): tblGrid = OneAndOnlyOne('w:tblGrid') tr = ZeroOrMore('w:tr') + @property + def bidiVisual_val(self): + """ + Value of `w:tblPr/w:bidiVisual/@w:val` or |None| if not present. + Controls whether table cells are displayed right-to-left or + left-to-right. + """ + bidiVisual = self.tblPr.bidiVisual + if bidiVisual is None: + return None + return bidiVisual.val + + @bidiVisual_val.setter + def bidiVisual_val(self, value): + tblPr = self.tblPr + if value is None: + tblPr._remove_bidiVisual() + else: + tblPr.get_or_add_bidiVisual().val = value + + @property + def col_count(self): + """ + The number of grid columns in this table. + """ + return len(self.tblGrid.gridCol_lst) + + def iter_tcs(self): + """ + Generate each of the `w:tc` elements in this table, left to right and + top to bottom. Each cell in the first row is generated, followed by + each cell in the second row, etc. + """ + for tr in self.tr_lst: + for tc in tr.tc_lst: + yield tc + @classmethod - def new(cls): + def new_tbl(cls, rows, cols, width): """ - Return a new ```` element, containing the required - ```` and ```` child elements. + Return a new `w:tbl` element having *rows* rows and *cols* columns + with *width* distributed evenly between the columns. 
""" - tbl = parse_xml(cls._tbl_xml()) - return tbl + return parse_xml(cls._tbl_xml(rows, cols, width)) + + @property + def tblStyle_val(self): + """ + Value of `w:tblPr/w:tblStyle/@w:val` (a table style id) or |None| if + not present. + """ + tblStyle = self.tblPr.tblStyle + if tblStyle is None: + return None + return tblStyle.val + + @tblStyle_val.setter + def tblStyle_val(self, styleId): + """ + Set the value of `w:tblPr/w:tblStyle/@w:val` (a table style id) to + *styleId*. If *styleId* is None, remove the `w:tblStyle` element. + """ + tblPr = self.tblPr + tblPr._remove_tblStyle() + if styleId is None: + return + tblPr._add_tblStyle().val = styleId @classmethod - def _tbl_xml(cls): + def _tbl_xml(cls, rows, cols, width): + col_width = Emu(width/cols) if cols > 0 else Emu(0) return ( '\n' ' \n' ' \n' + ' \n' ' \n' - ' \n' - '' % nsdecls('w') + '%s' # tblGrid + '%s' # trs + '\n' + ) % ( + nsdecls('w'), + cls._tblGrid_xml(cols, col_width), + cls._trs_xml(rows, cols, col_width) ) + @classmethod + def _tblGrid_xml(cls, col_count, col_width): + xml = ' \n' + for i in range(col_count): + xml += ' \n' % col_width.twips + xml += ' \n' + return xml + + @classmethod + def _trs_xml(cls, row_count, col_count, col_width): + xml = '' + for i in range(row_count): + xml += ( + ' \n' + '%s' + ' \n' + ) % cls._tcs_xml(col_count, col_width) + return xml + + @classmethod + def _tcs_xml(cls, col_count, col_width): + xml = '' + for i in range(col_count): + xml += ( + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ) % col_width.twips + return xml + class CT_TblGrid(BaseOxmlElement): """ @@ -72,6 +210,14 @@ class CT_TblGridCol(BaseOxmlElement): """ w = OptionalAttribute('w:w', ST_TwipsMeasure) + @property + def gridCol_idx(self): + """ + The index of this ```` element within its parent + ```` element. 
+ """ + return self.getparent().gridCol_lst.index(self) + class CT_TblLayoutType(BaseOxmlElement): """ @@ -86,16 +232,38 @@ class CT_TblPr(BaseOxmlElement): ``<w:tblPr>`` element, child of ``<w:tbl>``, holds child elements that define table properties such as style and borders. """ - tblStyle = ZeroOrOne('w:tblStyle', successors=( - 'w:tblpPr', 'w:tblOverlap', 'w:bidiVisual', 'w:tblStyleRowBandSize', - 'w:tblStyleColBandSize', 'w:tblW', 'w:jc', 'w:tblCellSpacing', - 'w:tblInd', 'w:tblBorders', 'w:shd', 'w:tblLayout', 'w:tblCellMar', - 'w:tblLook', 'w:tblCaption', 'w:tblDescription', 'w:tblPrChange' - )) - tblLayout = ZeroOrOne('w:tblLayout', successors=( + _tag_seq = ( + 'w:tblStyle', 'w:tblpPr', 'w:tblOverlap', 'w:bidiVisual', + 'w:tblStyleRowBandSize', 'w:tblStyleColBandSize', 'w:tblW', 'w:jc', + 'w:tblCellSpacing', 'w:tblInd', 'w:tblBorders', 'w:shd', 'w:tblLayout', 'w:tblCellMar', 'w:tblLook', 'w:tblCaption', 'w:tblDescription', 'w:tblPrChange' - )) + ) + tblStyle = ZeroOrOne('w:tblStyle', successors=_tag_seq[1:]) + bidiVisual = ZeroOrOne('w:bidiVisual', successors=_tag_seq[4:]) + jc = ZeroOrOne('w:jc', successors=_tag_seq[8:]) + tblLayout = ZeroOrOne('w:tblLayout', successors=_tag_seq[13:]) + del _tag_seq + + @property + def alignment(self): + """ + Member of :ref:`WdRowAlignment` enumeration or |None|, based on the + contents of the `w:val` attribute of `./w:jc`. |None| if no `w:jc` + element is present. + """ + jc = self.jc + if jc is None: + return None + return jc.val + + @alignment.setter + def alignment(self, value): + self._remove_jc() + if value is None: + return + jc = self.get_or_add_jc() + jc.val = value @property def autofit(self): @@ -167,17 +335,19 @@ class CT_Tc(BaseOxmlElement): p = OneOrMore('w:p') tbl = OneOrMore('w:tbl') - def _insert_tcPr(self, tcPr): + @property + def bottom(self): """ - ``tcPr`` has a bunch of successors, but it comes first if it appears, - so just overriding and using insert(0, ...) rather than spelling out - successors.
+ The row index that marks the bottom extent of the vertical span of + this cell. This is one greater than the index of the bottom-most row + of the span, similar to how a slice of the cell's rows would be + specified. """ - self.insert(0, tcPr) - return tcPr - - def _new_tbl(self): - return CT_Tbl.new() + if self.vMerge is not None: + tc_below = self._tc_below + if tc_below is not None and tc_below.vMerge == ST_Merge.CONTINUE: + return tc_below.bottom + return self._tr_idx + 1 def clear_content(self): """ @@ -193,6 +363,50 @@ def clear_content(self): new_children.append(tcPr) self[:] = new_children + @property + def grid_span(self): + """ + The integer number of columns this cell spans. Determined by + ./w:tcPr/w:gridSpan/@val, it defaults to 1. + """ + tcPr = self.tcPr + if tcPr is None: + return 1 + return tcPr.grid_span + + @grid_span.setter + def grid_span(self, value): + tcPr = self.get_or_add_tcPr() + tcPr.grid_span = value + + def iter_block_items(self): + """ + Generate a reference to each of the block-level content elements in + this cell, in the order they appear. + """ + block_item_tags = (qn('w:p'), qn('w:tbl'), qn('w:sdt')) + for child in self: + if child.tag in block_item_tags: + yield child + + @property + def left(self): + """ + The grid column index at which this ```` element appears. + """ + return self._grid_col + + def merge(self, other_tc): + """ + Return the top-left ```` element of a new span formed by + merging the rectangular region defined by using this tc element and + *other_tc* as diagonal corners. + """ + top, left, height, width = self._span_dimensions(other_tc) + top_tc = self._tbl.tr_lst[top].tc_at_grid_col(left) + top_tc._grow_to(width, height) + return top_tc + @classmethod def new(cls): """ @@ -205,6 +419,41 @@ def new(cls): '' % nsdecls('w') ) + @property + def right(self): + """ + The grid column index that marks the right-side extent of the + horizontal span of this cell. 
This is one greater than the index of + the right-most column of the span, similar to how a slice of the + cell's columns would be specified. + """ + return self._grid_col + self.grid_span + + @property + def top(self): + """ + The top-most row index in the vertical span of this cell. + """ + if self.vMerge is None or self.vMerge == ST_Merge.RESTART: + return self._tr_idx + return self._tc_above.top + + @property + def vMerge(self): + """ + The value of the ./w:tcPr/w:vMerge/@val attribute, or |None| if the + w:vMerge element is not present. + """ + tcPr = self.tcPr + if tcPr is None: + return None + return tcPr.vMerge_val + + @vMerge.setter + def vMerge(self, value): + tcPr = self.get_or_add_tcPr() + tcPr.vMerge_val = value + @property def width(self): """ @@ -221,17 +470,294 @@ def width(self, value): tcPr = self.get_or_add_tcPr() tcPr.width = value + def _add_width_of(self, other_tc): + """ + Add the width of *other_tc* to this cell. Does nothing if either this + tc or *other_tc* does not have a specified width. + """ + if self.width and other_tc.width: + self.width += other_tc.width + + @property + def _grid_col(self): + """ + The grid column at which this cell begins. + """ + tr = self._tr + idx = tr.tc_lst.index(self) + preceding_tcs = tr.tc_lst[:idx] + return sum(tc.grid_span for tc in preceding_tcs) + + def _grow_to(self, width, height, top_tc=None): + """ + Grow this cell to *width* grid columns and *height* rows by expanding + horizontal spans and creating continuation cells to form vertical + spans. 
+ """ + def vMerge_val(top_tc): + if top_tc is not self: + return ST_Merge.CONTINUE + if height == 1: + return None + return ST_Merge.RESTART + + top_tc = self if top_tc is None else top_tc + self._span_to_width(width, top_tc, vMerge_val(top_tc)) + if height > 1: + self._tc_below._grow_to(width, height-1, top_tc) + + def _insert_tcPr(self, tcPr): + """ + ``tcPr`` has a bunch of successors, but it comes first if it appears, + so just overriding and using insert(0, ...) rather than spelling out + successors. + """ + self.insert(0, tcPr) + return tcPr + + @property + def _is_empty(self): + """ + True if this cell contains only a single empty ``<w:p>`` element. + """ + block_items = list(self.iter_block_items()) + if len(block_items) > 1: + return False + p = block_items[0] # cell must include at least one element + if len(p.r_lst) == 0: + return True + return False + + def _move_content_to(self, other_tc): + """ + Append the content of this cell to *other_tc*, leaving this cell with + a single empty ``<w:p>`` element. + """ + if other_tc is self: + return + if self._is_empty: + return + other_tc._remove_trailing_empty_p() + # appending moves each element from self to other_tc + for block_element in self.iter_block_items(): + other_tc.append(block_element) + # add back the required minimum single empty element + self.append(self._new_p()) + + def _new_tbl(self): + return CT_Tbl.new() + + @property + def _next_tc(self): + """ + The `w:tc` element immediately following this one in this row, or + |None| if this is the last `w:tc` element in the row. + """ + following_tcs = self.xpath('./following-sibling::w:tc') + return following_tcs[0] if following_tcs else None + + def _remove(self): + """ + Remove this `w:tc` element from the XML tree. + """ + self.getparent().remove(self) + + def _remove_trailing_empty_p(self): + """ + Remove the last content element from this cell if it is an empty + ``<w:p>`` element.
+ """ + block_items = list(self.iter_block_items()) + last_content_elm = block_items[-1] + if last_content_elm.tag != qn('w:p'): + return + p = last_content_elm + if len(p.r_lst) > 0: + return + self.remove(p) + + def _span_dimensions(self, other_tc): + """ + Return a (top, left, height, width) 4-tuple specifying the extents of + the merged cell formed by using this tc and *other_tc* as opposite + corner extents. + """ + def raise_on_inverted_L(a, b): + if a.top == b.top and a.bottom != b.bottom: + raise InvalidSpanError('requested span not rectangular') + if a.left == b.left and a.right != b.right: + raise InvalidSpanError('requested span not rectangular') + + def raise_on_tee_shaped(a, b): + top_most, other = (a, b) if a.top < b.top else (b, a) + if top_most.top < other.top and top_most.bottom > other.bottom: + raise InvalidSpanError('requested span not rectangular') + + left_most, other = (a, b) if a.left < b.left else (b, a) + if left_most.left < other.left and left_most.right > other.right: + raise InvalidSpanError('requested span not rectangular') + + raise_on_inverted_L(self, other_tc) + raise_on_tee_shaped(self, other_tc) + + top = min(self.top, other_tc.top) + left = min(self.left, other_tc.left) + bottom = max(self.bottom, other_tc.bottom) + right = max(self.right, other_tc.right) + + return top, left, bottom - top, right - left + + def _span_to_width(self, grid_width, top_tc, vMerge): + """ + Incorporate and then remove `w:tc` elements to the right of this one + until this cell spans *grid_width*. Raises |ValueError| if + *grid_width* cannot be exactly achieved, such as when a merged cell + would drive the span width greater than *grid_width* or if not enough + grid columns are available to make this cell that wide. All content + from incorporated cells is appended to *top_tc*. The val attribute of + the vMerge element on the single remaining cell is set to *vMerge*. + If *vMerge* is |None|, the vMerge element is removed if present. 
+ """ + self._move_content_to(top_tc) + while self.grid_span < grid_width: + self._swallow_next_tc(grid_width, top_tc) + self.vMerge = vMerge + + def _swallow_next_tc(self, grid_width, top_tc): + """ + Extend the horizontal span of this `w:tc` element to incorporate the + following `w:tc` element in the row and then delete that following + `w:tc` element. Any content in the following `w:tc` element is + appended to the content of *top_tc*. The width of the following + `w:tc` element is added to this one, if present. Raises + |InvalidSpanError| if the width of the resulting cell is greater than + *grid_width* or if there is no next `` element in the row. + """ + def raise_on_invalid_swallow(next_tc): + if next_tc is None: + raise InvalidSpanError('not enough grid columns') + if self.grid_span + next_tc.grid_span > grid_width: + raise InvalidSpanError('span is not rectangular') + + next_tc = self._next_tc + raise_on_invalid_swallow(next_tc) + next_tc._move_content_to(top_tc) + self._add_width_of(next_tc) + self.grid_span += next_tc.grid_span + next_tc._remove() + + @property + def _tbl(self): + """ + The tbl element this tc element appears in. + """ + return self.xpath('./ancestor::w:tbl')[0] + + @property + def _tc_above(self): + """ + The `w:tc` element immediately above this one in its grid column. + """ + return self._tr_above.tc_at_grid_col(self._grid_col) + + @property + def _tc_below(self): + """ + The tc element immediately below this one in its grid column. + """ + tr_below = self._tr_below + if tr_below is None: + return None + return tr_below.tc_at_grid_col(self._grid_col) + + @property + def _tr(self): + """ + The tr element this tc element appears in. + """ + return self.xpath('./ancestor::w:tr')[0] + + @property + def _tr_above(self): + """ + The tr element prior in sequence to the tr this cell appears in. + Raises |ValueError| if called on a cell in the top-most row. 
+ """ + tr_lst = self._tbl.tr_lst + tr_idx = tr_lst.index(self._tr) + if tr_idx == 0: + raise ValueError('no tr above topmost tr') + return tr_lst[tr_idx-1] + + @property + def _tr_below(self): + """ + The tr element next in sequence after the tr this cell appears in, or + |None| if this cell appears in the last row. + """ + tr_lst = self._tbl.tr_lst + tr_idx = tr_lst.index(self._tr) + try: + return tr_lst[tr_idx+1] + except IndexError: + return None + + @property + def _tr_idx(self): + """ + The row index of the tr element this tc element appears in. + """ + return self._tbl.tr_lst.index(self._tr) + class CT_TcPr(BaseOxmlElement): """ ```` element, defining table cell properties """ - tcW = ZeroOrOne('w:tcW', successors=( - 'w:gridSpan', 'w:hMerge', 'w:vMerge', 'w:tcBorders', 'w:shd', - 'w:noWrap', 'w:tcMar', 'w:textDirection', 'w:tcFitText', 'w:vAlign', - 'w:hideMark', 'w:headers', 'w:cellIns', 'w:cellDel', 'w:cellMerge', - 'w:tcPrChange' - )) + _tag_seq = ( + 'w:cnfStyle', 'w:tcW', 'w:gridSpan', 'w:hMerge', 'w:vMerge', + 'w:tcBorders', 'w:shd', 'w:noWrap', 'w:tcMar', 'w:textDirection', + 'w:tcFitText', 'w:vAlign', 'w:hideMark', 'w:headers', 'w:cellIns', + 'w:cellDel', 'w:cellMerge', 'w:tcPrChange' + ) + tcW = ZeroOrOne('w:tcW', successors=_tag_seq[2:]) + gridSpan = ZeroOrOne('w:gridSpan', successors=_tag_seq[3:]) + vMerge = ZeroOrOne('w:vMerge', successors=_tag_seq[5:]) + del _tag_seq + + @property + def grid_span(self): + """ + The integer number of columns this cell spans. Determined by + ./w:gridSpan/@val, it defaults to 1. + """ + gridSpan = self.gridSpan + if gridSpan is None: + return 1 + return gridSpan.val + + @grid_span.setter + def grid_span(self, value): + self._remove_gridSpan() + if value > 1: + self.get_or_add_gridSpan().val = value + + @property + def vMerge_val(self): + """ + The value of the ./w:vMerge/@val attribute, or |None| if the + w:vMerge element is not present. 
+ """ + vMerge = self.vMerge + if vMerge is None: + return None + return vMerge.val + + @vMerge_val.setter + def vMerge_val(self, value): + self._remove_vMerge() + if value is not None: + self._add_vMerge().val = value @property def width(self): @@ -248,3 +774,10 @@ def width(self): def width(self, value): tcW = self.get_or_add_tcW() tcW.width = value + + +class CT_VMerge(BaseOxmlElement): + """ + ``<w:vMerge>`` element, specifying vertical merging behavior of a cell. + """ + val = OptionalAttribute('w:val', ST_Merge, default=ST_Merge.CONTINUE) diff --git a/docx/oxml/text.py b/docx/oxml/text.py deleted file mode 100644 index 9fdd1d64b..000000000 --- a/docx/oxml/text.py +++ /dev/null @@ -1,431 +0,0 @@ -# encoding: utf-8 - -""" -Custom element classes related to text, such as paragraph (CT_P) and runs -(CT_R). -""" - -from ..enum.text import WD_ALIGN_PARAGRAPH, WD_UNDERLINE -from .ns import qn -from .simpletypes import ST_BrClear, ST_BrType -from .xmlchemy import ( - BaseOxmlElement, OptionalAttribute, OxmlElement, RequiredAttribute, - ZeroOrMore, ZeroOrOne -) - - -class CT_Br(BaseOxmlElement): - """ - ``<w:br>`` element, indicating a line, page, or column break in a run. - """ - type = OptionalAttribute('w:type', ST_BrType) - clear = OptionalAttribute('w:clear', ST_BrClear) - - -class CT_Jc(BaseOxmlElement): - """ - ``<w:jc>`` element, specifying paragraph justification. - """ - val = RequiredAttribute('w:val', WD_ALIGN_PARAGRAPH) - - -class CT_P(BaseOxmlElement): - """ - ``<w:p>`` element, containing the properties and text for a paragraph. - """ - pPr = ZeroOrOne('w:pPr') - r = ZeroOrMore('w:r') - - def _insert_pPr(self, pPr): - self.insert(0, pPr) - return pPr - - def add_p_before(self): - """ - Return a new ``<w:p>`` element inserted directly prior to this one. - """ - new_p = OxmlElement('w:p') - self.addprevious(new_p) - return new_p - - @property - def alignment(self): - """ - The value of the ``<w:jc>`` grandchild element or |None| if not - present.
- """ - pPr = self.pPr - if pPr is None: - return None - return pPr.alignment - - @alignment.setter - def alignment(self, value): - pPr = self.get_or_add_pPr() - pPr.alignment = value - - def clear_content(self): - """ - Remove all child elements, except the ```` element if present. - """ - for child in self[:]: - if child.tag == qn('w:pPr'): - continue - self.remove(child) - - def set_sectPr(self, sectPr): - """ - Unconditionally replace or add *sectPr* as a grandchild in the - correct sequence. - """ - pPr = self.get_or_add_pPr() - pPr._remove_sectPr() - pPr._insert_sectPr(sectPr) - - @property - def style(self): - """ - String contained in w:val attribute of ./w:pPr/w:pStyle grandchild, - or |None| if not present. - """ - pPr = self.pPr - if pPr is None: - return None - return pPr.style - - @style.setter - def style(self, style): - pPr = self.get_or_add_pPr() - pPr.style = style - - -class CT_PPr(BaseOxmlElement): - """ - ```` element, containing the properties for a paragraph. - """ - __child_sequence__ = ( - 'w:pStyle', 'w:keepNext', 'w:keepLines', 'w:pageBreakBefore', - 'w:framePr', 'w:widowControl', 'w:numPr', 'w:suppressLineNumbers', - 'w:pBdr', 'w:shd', 'w:tabs', 'w:suppressAutoHyphens', 'w:kinsoku', - 'w:wordWrap', 'w:overflowPunct', 'w:topLinePunct', 'w:autoSpaceDE', - 'w:autoSpaceDN', 'w:bidi', 'w:adjustRightInd', 'w:snapToGrid', - 'w:spacing', 'w:ind', 'w:contextualSpacing', 'w:mirrorIndents', - 'w:suppressOverlap', 'w:jc', 'w:textDirection', 'w:textAlignment', - 'w:textboxTightWrap', 'w:outlineLvl', 'w:divId', 'w:cnfStyle', - 'w:rPr', 'w:sectPr', 'w:pPrChange' - ) - pStyle = ZeroOrOne('w:pStyle') - numPr = ZeroOrOne('w:numPr', successors=__child_sequence__[7:]) - jc = ZeroOrOne('w:jc', successors=__child_sequence__[27:]) - sectPr = ZeroOrOne('w:sectPr', successors=('w:pPrChange',)) - - def _insert_pStyle(self, pStyle): - self.insert(0, pStyle) - return pStyle - - @property - def alignment(self): - """ - The value of the ```` child element or |None| if 
not present. - """ - jc = self.jc - if jc is None: - return None - return jc.val - - @alignment.setter - def alignment(self, value): - if value is None: - self._remove_jc() - return - jc = self.get_or_add_jc() - jc.val = value - - @property - def style(self): - """ - String contained in child, or None if that element is not - present. - """ - pStyle = self.pStyle - if pStyle is None: - return None - return pStyle.val - - @style.setter - def style(self, style): - """ - Set val attribute of child element to *style*, adding a - new element if necessary. If *style* is |None|, remove the - element if present. - """ - if style is None: - self._remove_pStyle() - return - pStyle = self.get_or_add_pStyle() - pStyle.val = style - - -class CT_R(BaseOxmlElement): - """ - ```` element, containing the properties and text for a run. - """ - rPr = ZeroOrOne('w:rPr') - t = ZeroOrMore('w:t') - br = ZeroOrMore('w:br') - cr = ZeroOrMore('w:cr') - tab = ZeroOrMore('w:tab') - drawing = ZeroOrMore('w:drawing') - - def _insert_rPr(self, rPr): - self.insert(0, rPr) - return rPr - - def add_t(self, text): - """ - Return a newly added ```` element containing *text*. - """ - t = self._add_t(text=text) - if len(text.strip()) < len(text): - t.set(qn('xml:space'), 'preserve') - return t - - def add_drawing(self, inline_or_anchor): - """ - Return a newly appended ``CT_Drawing`` (````) child - element having *inline_or_anchor* as its child. - """ - drawing = self._add_drawing() - drawing.append(inline_or_anchor) - return drawing - - def clear_content(self): - """ - Remove all child elements except the ```` element if present. - """ - content_child_elms = self[1:] if self.rPr is not None else self[:] - for child in content_child_elms: - self.remove(child) - - @property - def style(self): - """ - String contained in w:val attribute of grandchild, or - |None| if that element is not present. 
- """ - rPr = self.rPr - if rPr is None: - return None - return rPr.style - - @style.setter - def style(self, style): - """ - Set the character style of this element to *style*. If *style* - is None, remove the style element. - """ - rPr = self.get_or_add_rPr() - rPr.style = style - - @property - def text(self): - """ - A string representing the textual content of this run, with content - child elements like ```` translated to their Python - equivalent. - """ - text = '' - for child in self: - if child.tag == qn('w:t'): - t_text = child.text - text += t_text if t_text is not None else '' - elif child.tag == qn('w:tab'): - text += '\t' - elif child.tag in (qn('w:br'), qn('w:cr')): - text += '\n' - return text - - @text.setter - def text(self, text): - self.clear_content() - _RunContentAppender.append_to_run_from_text(self, text) - - @property - def underline(self): - """ - String contained in w:val attribute of ./w:rPr/w:u grandchild, or - |None| if not present. - """ - rPr = self.rPr - if rPr is None: - return None - return rPr.underline - - @underline.setter - def underline(self, value): - rPr = self.get_or_add_rPr() - rPr.underline = value - - -class CT_RPr(BaseOxmlElement): - """ - ```` element, containing the properties for a run. 
- """ - rStyle = ZeroOrOne('w:rStyle', successors=('w:rPrChange',)) - b = ZeroOrOne('w:b', successors=('w:rPrChange',)) - bCs = ZeroOrOne('w:bCs', successors=('w:rPrChange',)) - caps = ZeroOrOne('w:caps', successors=('w:rPrChange',)) - cs = ZeroOrOne('w:cs', successors=('w:rPrChange',)) - dstrike = ZeroOrOne('w:dstrike', successors=('w:rPrChange',)) - emboss = ZeroOrOne('w:emboss', successors=('w:rPrChange',)) - i = ZeroOrOne('w:i', successors=('w:rPrChange',)) - iCs = ZeroOrOne('w:iCs', successors=('w:rPrChange',)) - imprint = ZeroOrOne('w:imprint', successors=('w:rPrChange',)) - noProof = ZeroOrOne('w:noProof', successors=('w:rPrChange',)) - oMath = ZeroOrOne('w:oMath', successors=('w:rPrChange',)) - outline = ZeroOrOne('w:outline', successors=('w:rPrChange',)) - rtl = ZeroOrOne('w:rtl', successors=('w:rPrChange',)) - shadow = ZeroOrOne('w:shadow', successors=('w:rPrChange',)) - smallCaps = ZeroOrOne('w:smallCaps', successors=('w:rPrChange',)) - snapToGrid = ZeroOrOne('w:snapToGrid', successors=('w:rPrChange',)) - specVanish = ZeroOrOne('w:specVanish', successors=('w:rPrChange',)) - strike = ZeroOrOne('w:strike', successors=('w:rPrChange',)) - u = ZeroOrOne('w:u', successors=('w:rPrChange',)) - vanish = ZeroOrOne('w:vanish', successors=('w:rPrChange',)) - webHidden = ZeroOrOne('w:webHidden', successors=('w:rPrChange',)) - - @property - def style(self): - """ - String contained in child, or None if that element is not - present. - """ - rStyle = self.rStyle - if rStyle is None: - return None - return rStyle.val - - @style.setter - def style(self, style): - """ - Set val attribute of child element to *style*, adding a - new element if necessary. If *style* is |None|, remove the - element if present. - """ - if style is None: - self._remove_rStyle() - elif self.rStyle is None: - self._add_rStyle(val=style) - else: - self.rStyle.val = style - - @property - def underline(self): - """ - Underline type specified in child, or None if that element is - not present. 
- """ - u = self.u - if u is None: - return None - return u.val - - @underline.setter - def underline(self, value): - self._remove_u() - if value is not None: - u = self._add_u() - u.val = value - - -class CT_Text(BaseOxmlElement): - """ - ```` element, containing a sequence of characters within a run. - """ - - -class CT_Underline(BaseOxmlElement): - """ - ```` element, specifying the underlining style for a run. - """ - @property - def val(self): - """ - The underline type corresponding to the ``w:val`` attribute value. - """ - val = self.get(qn('w:val')) - underline = WD_UNDERLINE.from_xml(val) - if underline == WD_UNDERLINE.SINGLE: - return True - if underline == WD_UNDERLINE.NONE: - return False - return underline - - @val.setter - def val(self, value): - # works fine without these two mappings, but only because True == 1 - # and False == 0, which happen to match the mapping for WD_UNDERLINE - # .SINGLE and .NONE respectively. - if value is True: - value = WD_UNDERLINE.SINGLE - elif value is False: - value = WD_UNDERLINE.NONE - - val = WD_UNDERLINE.to_xml(value) - self.set(qn('w:val'), val) - - -class _RunContentAppender(object): - """ - Service object that knows how to translate a Python string into run - content elements appended to a specified ```` element. Contiguous - sequences of regular characters are appended in a single ```` - element. Each tab character ('\t') causes a ```` element to be - appended. Likewise a newline or carriage return character ('\n', '\r') - causes a ```` element to be appended. - """ - def __init__(self, r): - self._r = r - self._bfr = [] - - @classmethod - def append_to_run_from_text(cls, r, text): - """ - Create a "one-shot" ``_RunContentAppender`` instance and use it to - append the run content elements corresponding to *text* to the - ```` element *r*. 
- """ - appender = cls(r) - appender.add_text(text) - - def add_text(self, text): - """ - Append the run content elements corresponding to *text* to the - ```` element of this instance. - """ - for char in text: - self.add_char(char) - self.flush() - - def add_char(self, char): - """ - Process the next character of input through the translation finite - state maching (FSM). There are two possible states, buffer pending - and not pending, but those are hidden behind the ``.flush()`` method - which must be called at the end of text to ensure any pending - ```` element is written. - """ - if char == '\t': - self.flush() - self._r.add_tab() - elif char in '\r\n': - self.flush() - self._r.add_br() - else: - self._bfr.append(char) - - def flush(self): - text = ''.join(self._bfr) - if text: - self._r.add_t(text) - del self._bfr[:] diff --git a/docx/oxml/text/__init__.py b/docx/oxml/text/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/docx/oxml/text/font.py b/docx/oxml/text/font.py new file mode 100644 index 000000000..f92ad0426 --- /dev/null +++ b/docx/oxml/text/font.py @@ -0,0 +1,293 @@ +# encoding: utf-8 + +""" +Custom element classes related to run properties (font). +""" + +from .. import parse_xml +from ...enum.dml import MSO_THEME_COLOR +from ...enum.text import WD_UNDERLINE +from ..ns import nsdecls, qn +from ..simpletypes import ( + ST_HexColor, ST_HpsMeasure, ST_String, ST_VerticalAlignRun +) +from ..xmlchemy import ( + BaseOxmlElement, OptionalAttribute, RequiredAttribute, ZeroOrOne +) + + +class CT_Color(BaseOxmlElement): + """ + `w:color` element, specifying the color of a font and perhaps other + objects. + """ + val = RequiredAttribute('w:val', ST_HexColor) + themeColor = OptionalAttribute('w:themeColor', MSO_THEME_COLOR) + + +class CT_Fonts(BaseOxmlElement): + """ + ```` element, specifying typeface name for the various language + types. 
+ """ + ascii = OptionalAttribute('w:ascii', ST_String) + hAnsi = OptionalAttribute('w:hAnsi', ST_String) + + +class CT_HpsMeasure(BaseOxmlElement): + """ + Used for ```` element and others, specifying font size in + half-points. + """ + val = RequiredAttribute('w:val', ST_HpsMeasure) + + +class CT_RPr(BaseOxmlElement): + """ + ```` element, containing the properties for a run. + """ + _tag_seq = ( + 'w:rStyle', 'w:rFonts', 'w:b', 'w:bCs', 'w:i', 'w:iCs', 'w:caps', + 'w:smallCaps', 'w:strike', 'w:dstrike', 'w:outline', 'w:shadow', + 'w:emboss', 'w:imprint', 'w:noProof', 'w:snapToGrid', 'w:vanish', + 'w:webHidden', 'w:color', 'w:spacing', 'w:w', 'w:kern', 'w:position', + 'w:sz', 'w:szCs', 'w:highlight', 'w:u', 'w:effect', 'w:bdr', 'w:shd', + 'w:fitText', 'w:vertAlign', 'w:rtl', 'w:cs', 'w:em', 'w:lang', + 'w:eastAsianLayout', 'w:specVanish', 'w:oMath' + ) + rStyle = ZeroOrOne('w:rStyle', successors=_tag_seq[1:]) + rFonts = ZeroOrOne('w:rFonts', successors=_tag_seq[2:]) + b = ZeroOrOne('w:b', successors=_tag_seq[3:]) + bCs = ZeroOrOne('w:bCs', successors=_tag_seq[4:]) + i = ZeroOrOne('w:i', successors=_tag_seq[5:]) + iCs = ZeroOrOne('w:iCs', successors=_tag_seq[6:]) + caps = ZeroOrOne('w:caps', successors=_tag_seq[7:]) + smallCaps = ZeroOrOne('w:smallCaps', successors=_tag_seq[8:]) + strike = ZeroOrOne('w:strike', successors=_tag_seq[9:]) + dstrike = ZeroOrOne('w:dstrike', successors=_tag_seq[10:]) + outline = ZeroOrOne('w:outline', successors=_tag_seq[11:]) + shadow = ZeroOrOne('w:shadow', successors=_tag_seq[12:]) + emboss = ZeroOrOne('w:emboss', successors=_tag_seq[13:]) + imprint = ZeroOrOne('w:imprint', successors=_tag_seq[14:]) + noProof = ZeroOrOne('w:noProof', successors=_tag_seq[15:]) + snapToGrid = ZeroOrOne('w:snapToGrid', successors=_tag_seq[16:]) + vanish = ZeroOrOne('w:vanish', successors=_tag_seq[17:]) + webHidden = ZeroOrOne('w:webHidden', successors=_tag_seq[18:]) + color = ZeroOrOne('w:color', successors=_tag_seq[19:]) + sz = ZeroOrOne('w:sz', 
successors=_tag_seq[24:]) + u = ZeroOrOne('w:u', successors=_tag_seq[27:]) + vertAlign = ZeroOrOne('w:vertAlign', successors=_tag_seq[32:]) + rtl = ZeroOrOne('w:rtl', successors=_tag_seq[33:]) + cs = ZeroOrOne('w:cs', successors=_tag_seq[34:]) + specVanish = ZeroOrOne('w:specVanish', successors=_tag_seq[38:]) + oMath = ZeroOrOne('w:oMath', successors=_tag_seq[39:]) + del _tag_seq + + def _new_color(self): + """ + Override metaclass method to set `w:color/@val` to RGB black on + create. + """ + return parse_xml('' % nsdecls('w')) + + @property + def rFonts_ascii(self): + """ + The value of `w:rFonts/@w:ascii` or |None| if not present. Represents + the assigned typeface name. The rFonts element also specifies other + special-case typeface names; this method handles the case where just + the common name is required. + """ + rFonts = self.rFonts + if rFonts is None: + return None + return rFonts.ascii + + @rFonts_ascii.setter + def rFonts_ascii(self, value): + if value is None: + self._remove_rFonts() + return + rFonts = self.get_or_add_rFonts() + rFonts.ascii = value + + @property + def rFonts_hAnsi(self): + """ + The value of `w:rFonts/@w:hAnsi` or |None| if not present. + """ + rFonts = self.rFonts + if rFonts is None: + return None + return rFonts.hAnsi + + @rFonts_hAnsi.setter + def rFonts_hAnsi(self, value): + if value is None and self.rFonts is None: + return + rFonts = self.get_or_add_rFonts() + rFonts.hAnsi = value + + @property + def style(self): + """ + String contained in child, or None if that element is not + present. + """ + rStyle = self.rStyle + if rStyle is None: + return None + return rStyle.val + + @style.setter + def style(self, style): + """ + Set val attribute of child element to *style*, adding a + new element if necessary. If *style* is |None|, remove the + element if present. 
+ """ + if style is None: + self._remove_rStyle() + elif self.rStyle is None: + self._add_rStyle(val=style) + else: + self.rStyle.val = style + + @property + def subscript(self): + """ + |True| if `w:vertAlign/@w:val` is 'subscript'. |False| if + `w:vertAlign/@w:val` contains any other value. |None| if + `w:vertAlign` is not present. + """ + vertAlign = self.vertAlign + if vertAlign is None: + return None + if vertAlign.val == ST_VerticalAlignRun.SUBSCRIPT: + return True + return False + + @subscript.setter + def subscript(self, value): + if value is None: + self._remove_vertAlign() + elif bool(value) is True: + self.get_or_add_vertAlign().val = ST_VerticalAlignRun.SUBSCRIPT + elif self.vertAlign is None: + return + elif self.vertAlign.val == ST_VerticalAlignRun.SUBSCRIPT: + self._remove_vertAlign() + + @property + def superscript(self): + """ + |True| if `w:vertAlign/@w:val` is 'superscript'. |False| if + `w:vertAlign/@w:val` contains any other value. |None| if + `w:vertAlign` is not present. + """ + vertAlign = self.vertAlign + if vertAlign is None: + return None + if vertAlign.val == ST_VerticalAlignRun.SUPERSCRIPT: + return True + return False + + @superscript.setter + def superscript(self, value): + if value is None: + self._remove_vertAlign() + elif bool(value) is True: + self.get_or_add_vertAlign().val = ST_VerticalAlignRun.SUPERSCRIPT + elif self.vertAlign is None: + return + elif self.vertAlign.val == ST_VerticalAlignRun.SUPERSCRIPT: + self._remove_vertAlign() + + @property + def sz_val(self): + """ + The value of `w:sz/@w:val` or |None| if not present. + """ + sz = self.sz + if sz is None: + return None + return sz.val + + @sz_val.setter + def sz_val(self, value): + if value is None: + self._remove_sz() + return + sz = self.get_or_add_sz() + sz.val = value + + @property + def u_val(self): + """ + Value of `w:u/@val`, or None if not present. 
+ """ + u = self.u + if u is None: + return None + return u.val + + @u_val.setter + def u_val(self, value): + self._remove_u() + if value is not None: + self._add_u().val = value + + def _get_bool_val(self, name): + """ + Return the value of the boolean child element having *name*, e.g. + 'b', 'i', and 'smallCaps'. + """ + element = getattr(self, name) + if element is None: + return None + return element.val + + def _set_bool_val(self, name, value): + if value is None: + getattr(self, '_remove_%s' % name)() + return + element = getattr(self, 'get_or_add_%s' % name)() + element.val = value + + +class CT_Underline(BaseOxmlElement): + """ + ``<w:u>`` element, specifying the underlining style for a run. + """ + @property + def val(self): + """ + The underline type corresponding to the ``w:val`` attribute value. + """ + val = self.get(qn('w:val')) + underline = WD_UNDERLINE.from_xml(val) + if underline == WD_UNDERLINE.SINGLE: + return True + if underline == WD_UNDERLINE.NONE: + return False + return underline + + @val.setter + def val(self, value): + # works fine without these two mappings, but only because True == 1 + # and False == 0, which happen to match the mapping for WD_UNDERLINE + # .SINGLE and .NONE respectively. + if value is True: + value = WD_UNDERLINE.SINGLE + elif value is False: + value = WD_UNDERLINE.NONE + + val = WD_UNDERLINE.to_xml(value) + self.set(qn('w:val'), val) + + +class CT_VerticalAlignRun(BaseOxmlElement): + """ + ``<w:vertAlign>`` element, specifying subscript or superscript. + """ + val = RequiredAttribute('w:val', ST_VerticalAlignRun) diff --git a/docx/oxml/text/hyperlink.py b/docx/oxml/text/hyperlink.py new file mode 100644 index 000000000..98b861814 --- /dev/null +++ b/docx/oxml/text/hyperlink.py @@ -0,0 +1,48 @@ +# encoding: utf-8 + +""" +Custom element classes related to hyperlinks (CT_Hyperlink). 
+""" + +from ..ns import qn +from ..simpletypes import ST_RelationshipId +from ..xmlchemy import ( + BaseOxmlElement, RequiredAttribute, ZeroOrMore +) + + +class CT_Hyperlink(BaseOxmlElement): + """ + ``<w:hyperlink>`` element, containing the properties and text for a hyperlink. + + The ``<w:hyperlink>`` contains a ``<w:r>`` element which holds all the + visible content. The ``<w:hyperlink>`` has an attribute ``r:id`` which + holds an ID relating a URL in the document's relationships. + """ + r = ZeroOrMore('w:r') + rid = RequiredAttribute('r:id', ST_RelationshipId) + + @property + def relationship(self): + """ + String contained in ``r:id`` attribute of <w:hyperlink>. It should + point to a URL in the document's relationships. + """ + val = self.get(qn('r:id')) + return val + + @relationship.setter + def relationship(self, rId): + self.set(qn('r:id'), rId) + self.set(qn('w:history'), '1') + + def clear_content(self): + """ + Remove all child r elements + """ + r_to_rm = [] + for child in self[:]: + if child.tag == qn('w:r'): + r_to_rm.append(child) + for r in r_to_rm: + self.remove(r) diff --git a/docx/oxml/text/paragraph.py b/docx/oxml/text/paragraph.py new file mode 100644 index 000000000..7a29adc13 --- /dev/null +++ b/docx/oxml/text/paragraph.py @@ -0,0 +1,79 @@ +# encoding: utf-8 + +""" +Custom element classes related to paragraphs (CT_P). +""" + +from ..ns import qn +from ..xmlchemy import BaseOxmlElement, OxmlElement, ZeroOrMore, ZeroOrOne + + +class CT_P(BaseOxmlElement): + """ + ``<w:p>`` element, containing the properties and text for a paragraph. + """ + pPr = ZeroOrOne('w:pPr') + r = ZeroOrMore('w:r') + hyperlink = ZeroOrMore('w:hyperlink') + + def _insert_pPr(self, pPr): + self.insert(0, pPr) + return pPr + + def add_p_before(self): + """ + Return a new ``<w:p>`` element inserted directly prior to this one. 
+ """ + new_p = OxmlElement('w:p') + self.addprevious(new_p) + return new_p + + @property + def alignment(self): + """ + The value of the ``<w:jc>`` grandchild element or |None| if not + present. 
+ """ + pPr = self.pPr + if pPr is None: + return None + return pPr.jc_val + + @alignment.setter + def alignment(self, value): + pPr = self.get_or_add_pPr() + pPr.jc_val = value + + def clear_content(self): + """ + Remove all child elements, except the ``<w:pPr>`` element if present. + """ + for child in self[:]: + if child.tag == qn('w:pPr'): + continue + self.remove(child) + + def set_sectPr(self, sectPr): + """ + Unconditionally replace or add *sectPr* as a grandchild in the + correct sequence. + """ + pPr = self.get_or_add_pPr() + pPr._remove_sectPr() + pPr._insert_sectPr(sectPr) + + @property + def style(self): + """ + String contained in w:val attribute of ./w:pPr/w:pStyle grandchild, + or |None| if not present. + """ + pPr = self.pPr + if pPr is None: + return None + return pPr.style + + @style.setter + def style(self, style): + pPr = self.get_or_add_pPr() + pPr.style = style diff --git a/docx/oxml/text/parfmt.py b/docx/oxml/text/parfmt.py new file mode 100644 index 000000000..f34cb0e5d --- /dev/null +++ b/docx/oxml/text/parfmt.py @@ -0,0 +1,313 @@ +# encoding: utf-8 + +""" +Custom element classes related to paragraph properties (CT_PPr). +""" + +from ...enum.text import WD_ALIGN_PARAGRAPH, WD_LINE_SPACING +from ...shared import Length +from ..simpletypes import ST_SignedTwipsMeasure, ST_TwipsMeasure +from ..xmlchemy import ( + BaseOxmlElement, OptionalAttribute, RequiredAttribute, ZeroOrOne +) + + +class CT_Ind(BaseOxmlElement): + """ + ``<w:ind>`` element, specifying paragraph indentation. + """ + left = OptionalAttribute('w:left', ST_SignedTwipsMeasure) + right = OptionalAttribute('w:right', ST_SignedTwipsMeasure) + firstLine = OptionalAttribute('w:firstLine', ST_TwipsMeasure) + hanging = OptionalAttribute('w:hanging', ST_TwipsMeasure) + + +class CT_Jc(BaseOxmlElement): + """ + ``<w:jc>`` element, specifying paragraph justification. 
+ """ + val = RequiredAttribute('w:val', WD_ALIGN_PARAGRAPH) + + +class CT_PPr(BaseOxmlElement): + """ + ``<w:pPr>`` element, containing the properties for a paragraph. + """ + _tag_seq = ( + 'w:pStyle', 'w:keepNext', 'w:keepLines', 'w:pageBreakBefore', + 'w:framePr', 'w:widowControl', 'w:numPr', 'w:suppressLineNumbers', + 'w:pBdr', 'w:shd', 'w:tabs', 'w:suppressAutoHyphens', 'w:kinsoku', + 'w:wordWrap', 'w:overflowPunct', 'w:topLinePunct', 'w:autoSpaceDE', + 'w:autoSpaceDN', 'w:bidi', 'w:adjustRightInd', 'w:snapToGrid', + 'w:spacing', 'w:ind', 'w:contextualSpacing', 'w:mirrorIndents', + 'w:suppressOverlap', 'w:jc', 'w:textDirection', 'w:textAlignment', + 'w:textboxTightWrap', 'w:outlineLvl', 'w:divId', 'w:cnfStyle', + 'w:rPr', 'w:sectPr', 'w:pPrChange' + ) + pStyle = ZeroOrOne('w:pStyle', successors=_tag_seq[1:]) + keepNext = ZeroOrOne('w:keepNext', successors=_tag_seq[2:]) + keepLines = ZeroOrOne('w:keepLines', successors=_tag_seq[3:]) + pageBreakBefore = ZeroOrOne('w:pageBreakBefore', successors=_tag_seq[4:]) + widowControl = ZeroOrOne('w:widowControl', successors=_tag_seq[6:]) + numPr = ZeroOrOne('w:numPr', successors=_tag_seq[7:]) + spacing = ZeroOrOne('w:spacing', successors=_tag_seq[22:]) + ind = ZeroOrOne('w:ind', successors=_tag_seq[23:]) + jc = ZeroOrOne('w:jc', successors=_tag_seq[27:]) + sectPr = ZeroOrOne('w:sectPr', successors=_tag_seq[35:]) + del _tag_seq + + @property + def first_line_indent(self): + """ + A |Length| value calculated from the values of `w:ind/@w:firstLine` + and `w:ind/@w:hanging`. Returns |None| if the `w:ind` child is not + present. 
+ """ + ind = self.ind + if ind is None: + return None + hanging = ind.hanging + if hanging is not None: + return Length(-hanging) + firstLine = ind.firstLine + if firstLine is None: + return None + return firstLine + + @first_line_indent.setter + def first_line_indent(self, value): + if self.ind is None and value is None: + return + ind = self.get_or_add_ind() + ind.firstLine = ind.hanging = None + if value is None: + return + elif value < 0: + ind.hanging = -value + else: + ind.firstLine = value + + @property + def ind_left(self): + """ + The value of `w:ind/@w:left` or |None| if not present. + """ + ind = self.ind + if ind is None: + return None + return ind.left + + @ind_left.setter + def ind_left(self, value): + if value is None and self.ind is None: + return + ind = self.get_or_add_ind() + ind.left = value + + @property + def ind_right(self): + """ + The value of `w:ind/@w:right` or |None| if not present. + """ + ind = self.ind + if ind is None: + return None + return ind.right + + @ind_right.setter + def ind_right(self, value): + if value is None and self.ind is None: + return + ind = self.get_or_add_ind() + ind.right = value + + @property + def jc_val(self): + """ + The value of the ``<w:jc>`` child element or |None| if not present. + """ + jc = self.jc + if jc is None: + return None + return jc.val + + @jc_val.setter + def jc_val(self, value): + if value is None: + self._remove_jc() + return + self.get_or_add_jc().val = value + + @property + def keepLines_val(self): + """ + The value of `keepLines/@val` or |None| if not present. + """ + keepLines = self.keepLines + if keepLines is None: + return None + return keepLines.val + + @keepLines_val.setter + def keepLines_val(self, value): + if value is None: + self._remove_keepLines() + else: + self.get_or_add_keepLines().val = value + + @property + def keepNext_val(self): + """ + The value of `keepNext/@val` or |None| if not present. 
+ """ + keepNext = self.keepNext + if keepNext is None: + return None + return keepNext.val + + @keepNext_val.setter + def keepNext_val(self, value): + if value is None: + self._remove_keepNext() + else: + self.get_or_add_keepNext().val = value + + @property + def pageBreakBefore_val(self): + """ + The value of `pageBreakBefore/@val` or |None| if not present. + """ + pageBreakBefore = self.pageBreakBefore + if pageBreakBefore is None: + return None + return pageBreakBefore.val + + @pageBreakBefore_val.setter + def pageBreakBefore_val(self, value): + if value is None: + self._remove_pageBreakBefore() + else: + self.get_or_add_pageBreakBefore().val = value + + @property + def spacing_after(self): + """ + The value of `w:spacing/@w:after` or |None| if not present. + """ + spacing = self.spacing + if spacing is None: + return None + return spacing.after + + @spacing_after.setter + def spacing_after(self, value): + if value is None and self.spacing is None: + return + self.get_or_add_spacing().after = value + + @property + def spacing_before(self): + """ + The value of `w:spacing/@w:before` or |None| if not present. + """ + spacing = self.spacing + if spacing is None: + return None + return spacing.before + + @spacing_before.setter + def spacing_before(self, value): + if value is None and self.spacing is None: + return + self.get_or_add_spacing().before = value + + @property + def spacing_line(self): + """ + The value of `w:spacing/@w:line` or |None| if not present. + """ + spacing = self.spacing + if spacing is None: + return None + return spacing.line + + @spacing_line.setter + def spacing_line(self, value): + if value is None and self.spacing is None: + return + self.get_or_add_spacing().line = value + + @property + def spacing_lineRule(self): + """ + The value of `w:spacing/@w:lineRule` as a member of the + :ref:`WdLineSpacing` enumeration. Only the `MULTIPLE`, `EXACTLY`, and + `AT_LEAST` members are used. 
It is the responsibility of the client + to calculate the use of `SINGLE`, `DOUBLE`, and `MULTIPLE` based on + the value of `w:spacing/@w:line` if that behavior is desired. + """ + spacing = self.spacing + if spacing is None: + return None + lineRule = spacing.lineRule + if lineRule is None and spacing.line is not None: + return WD_LINE_SPACING.MULTIPLE + return lineRule + + @spacing_lineRule.setter + def spacing_lineRule(self, value): + if value is None and self.spacing is None: + return + self.get_or_add_spacing().lineRule = value + + @property + def style(self): + """ + String contained in <w:pStyle> child, or None if that element is not + present. + """ + pStyle = self.pStyle + if pStyle is None: + return None + return pStyle.val + + @style.setter + def style(self, style): + """ + Set val attribute of <w:pStyle> child element to *style*, adding a + new element if necessary. If *style* is |None|, remove the <w:pStyle> + element if present. + """ + if style is None: + self._remove_pStyle() + return + pStyle = self.get_or_add_pStyle() + pStyle.val = style + + @property + def widowControl_val(self): + """ + The value of `widowControl/@val` or |None| if not present. + """ + widowControl = self.widowControl + if widowControl is None: + return None + return widowControl.val + + @widowControl_val.setter + def widowControl_val(self, value): + if value is None: + self._remove_widowControl() + else: + self.get_or_add_widowControl().val = value + + +class CT_Spacing(BaseOxmlElement): + """ + ``<w:spacing>`` element, specifying paragraph spacing attributes such as + space before and line spacing. 
+ """ + after = OptionalAttribute('w:after', ST_TwipsMeasure) + before = OptionalAttribute('w:before', ST_TwipsMeasure) + line = OptionalAttribute('w:line', ST_SignedTwipsMeasure) + lineRule = OptionalAttribute('w:lineRule', WD_LINE_SPACING) diff --git a/docx/oxml/text/run.py b/docx/oxml/text/run.py new file mode 100644 index 000000000..8f0a62e82 --- /dev/null +++ b/docx/oxml/text/run.py @@ -0,0 +1,166 @@ +# encoding: utf-8 + +""" +Custom element classes related to text runs (CT_R). +""" + +from ..ns import qn +from ..simpletypes import ST_BrClear, ST_BrType +from ..xmlchemy import ( + BaseOxmlElement, OptionalAttribute, ZeroOrMore, ZeroOrOne +) + + +class CT_Br(BaseOxmlElement): + """ + ``<w:br>`` element, indicating a line, page, or column break in a run. + """ + type = OptionalAttribute('w:type', ST_BrType) + clear = OptionalAttribute('w:clear', ST_BrClear) + + +class CT_R(BaseOxmlElement): + """ + ``<w:r>`` element, containing the properties and text for a run. + """ + rPr = ZeroOrOne('w:rPr') + t = ZeroOrMore('w:t') + br = ZeroOrMore('w:br') + cr = ZeroOrMore('w:cr') + tab = ZeroOrMore('w:tab') + drawing = ZeroOrMore('w:drawing') + + def _insert_rPr(self, rPr): + self.insert(0, rPr) + return rPr + + def add_t(self, text): + """ + Return a newly added ``<w:t>`` element containing *text*. + """ + t = self._add_t(text=text) + if len(text.strip()) < len(text): + t.set(qn('xml:space'), 'preserve') + return t + + def add_drawing(self, inline_or_anchor): + """ + Return a newly appended ``CT_Drawing`` (``<w:drawing>``) child + element having *inline_or_anchor* as its child. + """ + drawing = self._add_drawing() + drawing.append(inline_or_anchor) + return drawing + + def clear_content(self): + """ + Remove all child elements except the ``<w:rPr>`` element if present. 
+ """ + content_child_elms = self[1:] if self.rPr is not None else self[:] + for child in content_child_elms: + self.remove(child) + + @property + def style(self): + """ + String contained in w:val attribute of <w:rStyle> grandchild, or + |None| if that element is not present. + """ + rPr = self.rPr + if rPr is None: + return None + return rPr.style + + @style.setter + def style(self, style): + """ + Set the character style of this <w:r> element to *style*. If *style* + is None, remove the style element. + """ + rPr = self.get_or_add_rPr() + rPr.style = style + + @property + def text(self): + """ + A string representing the textual content of this run, with content + child elements like ``<w:tab/>`` translated to their Python + equivalent. + """ + text = '' + for child in self: + if child.tag == qn('w:t'): + t_text = child.text + text += t_text if t_text is not None else '' + elif child.tag == qn('w:tab'): + text += '\t' + elif child.tag in (qn('w:br'), qn('w:cr')): + text += '\n' + return text + + @text.setter + def text(self, text): + self.clear_content() + _RunContentAppender.append_to_run_from_text(self, text) + + +class CT_Text(BaseOxmlElement): + """ + ``<w:t>`` element, containing a sequence of characters within a run. + """ + + +class _RunContentAppender(object): + """ + Service object that knows how to translate a Python string into run + content elements appended to a specified ``<w:r>`` element. Contiguous + sequences of regular characters are appended in a single ``<w:t>`` + element. Each tab character ('\t') causes a ``<w:tab/>`` element to be + appended. Likewise a newline or carriage return character ('\n', '\r') + causes a ``<w:cr>`` element to be appended. 
+ """ + def __init__(self, r): + self._r = r + self._bfr = [] + + @classmethod + def append_to_run_from_text(cls, r, text): + """ + Create a "one-shot" ``_RunContentAppender`` instance and use it to + append the run content elements corresponding to *text* to the + ``<w:r>`` element *r*. 
+ """ + appender = cls(r) + appender.add_text(text) + + def add_text(self, text): + """ + Append the run content elements corresponding to *text* to the + ``<w:r>`` element of this instance. + """ + for char in text: + self.add_char(char) + self.flush() + + def add_char(self, char): + """ + Process the next character of input through the translation finite + state machine (FSM). There are two possible states, buffer pending + and not pending, but those are hidden behind the ``.flush()`` method + which must be called at the end of text to ensure any pending + ``<w:t>`` element is written. + """ + if char == '\t': + self.flush() + self._r.add_tab() + elif char in '\r\n': + self.flush() + self._r.add_br() + else: + self._bfr.append(char) + + def flush(self): + text = ''.join(self._bfr) + if text: + self._r.add_t(text) + del self._bfr[:] diff --git a/docx/parts/document.py b/docx/parts/document.py index e7ff08e8b..2225da130 100644 --- a/docx/parts/document.py +++ b/docx/parts/document.py @@ -8,62 +8,71 @@ absolute_import, division, print_function, unicode_literals ) -from collections import Sequence - -from ..blkcntnr import BlockItemContainer -from ..enum.section import WD_SECTION +from ..document import Document +from .numbering import NumberingPart from ..opc.constants import RELATIONSHIP_TYPE as RT -from ..opc.package import XmlPart -from ..section import Section -from ..shape import InlineShape -from ..shared import lazyproperty, Parented +from ..opc.part import XmlPart +from ..oxml.shape import CT_Inline +from ..shape import InlineShapes +from ..shared import lazyproperty +from .styles import StylesPart class DocumentPart(XmlPart): """ Main document part of a WordprocessingML (WML) package, aka a .docx file. + Acts as broker to other parts such as image, core properties, and style + parts. It also acts as a convenient delegate when a mid-document object + needs a service involving a remote ancestor. 
The `Parented.part` property + inherited by many content objects provides access to this part object for + that purpose. """ - def add_paragraph(self, text='', style=None): + @property + def core_properties(self): """ - Return a paragraph newly added to the end of body content. + A |CoreProperties| object providing read/write access to the core + properties of this document. """ - return self.body.add_paragraph(text, style) + return self.package.core_properties - def add_section(self, start_type=WD_SECTION.NEW_PAGE): + @property + def document(self): """ - Return a |Section| object representing a new section added at the end - of the document. + A |Document| object providing access to the content of this document. """ - new_sectPr = self._element.body.add_section_break() - new_sectPr.start_type = start_type - return Section(new_sectPr) + return Document(self._element, self) - def add_table(self, rows, cols): + def get_or_add_image(self, image_descriptor): """ - Return a table having *rows* rows and *cols* columns, newly appended - to the main document story. + Return an (rId, image) 2-tuple for the image identified by + *image_descriptor*. *image* is an |Image| instance providing access + to the properties of the image, such as dimensions and image type. + *rId* is the key for the relationship between this document part and + the image part, reused if already present, newly created if not. """ - return self.body.add_table(rows, cols) + image_part = self._package.image_parts.get_or_add_image_part( + image_descriptor + ) + rId = self.relate_to(image_part, RT.IMAGE) + return rId, image_part.image - @lazyproperty - def body(self): + def get_style(self, style_id, style_type): """ - The |_Body| instance containing the content for this document. + Return the style in this document matching *style_id*. Returns the + default style for *style_type* if *style_id* is |None| or does not + match a defined style of *style_type*. 
""" - return _Body(self._element.body, self) + return self.styles.get_by_id(style_id, style_type) - def get_or_add_image_part(self, image_descriptor): + def get_style_id(self, style_or_name, style_type): """ - Return an ``(image_part, rId)`` 2-tuple for the image identified by - *image_descriptor*. *image_part* is an |Image| instance corresponding - to the image, newly created if no matching image part is found. *rId* - is the key for the relationship between this document part and the - image part, reused if already present, newly created if not. + Return the style_id (|str|) of the style of *style_type* matching + *style_or_name*. Returns |None| if the style resolves to the default + style for *style_type* or if *style_or_name* is itself |None|. Raises + if *style_or_name* is a style of the wrong type or names a style not + present in the document. """ - image_parts = self._package.image_parts - image_part = image_parts.get_or_add_image_part(image_descriptor) - rId = self.relate_to(image_part, RT.IMAGE) - return (image_part, rId) + return self.styles.get_style_id(style_or_name, style_type) @lazyproperty def inline_shapes(self): @@ -73,6 +82,17 @@ def inline_shapes(self): """ return InlineShapes(self._element.body, self) + def new_pic_inline(self, image_descriptor, width, height): + """ + Return a newly-created `w:inline` element containing the image + specified by *image_descriptor* and scaled based on the values of + *width* and *height*. + """ + rId, image = self.get_or_add_image(image_descriptor) + cx, cy = image.scaled_dimensions(width, height) + shape_id, filename = self.next_id, image.filename + return CT_Inline.new_pic_inline(shape_id, rId, filename, cx, cy) + @property def next_id(self): """ @@ -86,116 +106,44 @@ def next_id(self): if n not in used_ids: return n - @property - def paragraphs(self): - """ - A list of |Paragraph| instances corresponding to the paragraphs in - the document, in document order. 
Note that paragraphs within revision - marks such as inserted or deleted do not appear in this list. - """ - return self.body.paragraphs - @lazyproperty - def sections(self): + def numbering_part(self): """ - The |Sections| instance organizing the sections in this document. + A |NumberingPart| object providing access to the numbering + definitions for this document. Creates an empty numbering part if one + is not present. """ - return Sections(self._element) + try: + return self.part_related_by(RT.NUMBERING) + except KeyError: + numbering_part = NumberingPart.new() + self.relate_to(numbering_part, RT.NUMBERING) + return numbering_part - @property - def tables(self): + def save(self, path_or_stream): """ - A list of |Table| instances corresponding to the tables in the - document, in document order. Note that tables within revision marks - such as ``<w:ins>`` or ``<w:del>`` do not appear in this list. + Save this document to *path_or_stream*, which can be either a path to + a filesystem location (a string) or a file-like object. """ - return self.body.tables - - -class _Body(BlockItemContainer): - """ - Proxy for ``<w:body>`` element in this document, having primarily a - container role. - """ - def __init__(self, body_elm, parent): - super(_Body, self).__init__(body_elm, parent) - self._body = body_elm + self.package.save(path_or_stream) - def clear_content(self): + @property + def styles(self): """ - Return this |_Body| instance after clearing it of all content. - Section properties for the main document story, if present, are - preserved. + A |Styles| object providing access to the styles in the styles part + of this document. """ - self._body.clear_content() - return self - + return self._styles_part.styles -class InlineShapes(Parented): - """ - Sequence of |InlineShape| instances, supporting len(), iteration, and - indexed access. 
- """ - def __init__(self, body_elm, parent): - super(InlineShapes, self).__init__(parent) - self._body = body_elm - - def __getitem__(self, idx): + @property + def _styles_part(self): """ - Provide indexed access, e.g. 'inline_shapes[idx]' + Instance of |StylesPart| for this document. Creates an empty styles + part if one is not present. """ try: - inline = self._inline_lst[idx] - except IndexError: - msg = "inline shape index [%d] out of range" % idx - raise IndexError(msg) - return InlineShape(inline) - - def __iter__(self): - return (InlineShape(inline) for inline in self._inline_lst) - - def __len__(self): - return len(self._inline_lst) - - def add_picture(self, image_descriptor, run): - """ - Return an |InlineShape| instance containing the picture identified by - *image_descriptor* and added to the end of *run*. The picture shape - has the native size of the image. *image_descriptor* can be a path (a - string) or a file-like object containing a binary image. - """ - image_part, rId = self.part.get_or_add_image_part(image_descriptor) - shape_id = self.part.next_id - r = run._r - picture = InlineShape.new_picture(r, image_part, rId, shape_id) - return picture - - @property - def _inline_lst(self): - body = self._body - xpath = '//w:p/w:r/w:drawing/wp:inline' - return body.xpath(xpath) - - -class Sections(Sequence): - """ - Sequence of |Section| objects corresponding to the sections in the - document. Supports ``len()``, iteration, and indexed access. 
- """ - def __init__(self, document_elm): - super(Sections, self).__init__() - self._document_elm = document_elm - - def __getitem__(self, key): - if isinstance(key, slice): - sectPr_lst = self._document_elm.sectPr_lst[key] - return [Section(sectPr) for sectPr in sectPr_lst] - sectPr = self._document_elm.sectPr_lst[key] - return Section(sectPr) - - def __iter__(self): - for sectPr in self._document_elm.sectPr_lst: - yield Section(sectPr) - - def __len__(self): - return len(self._document_elm.sectPr_lst) + return self.part_related_by(RT.STYLES) + except KeyError: + styles_part = StylesPart.default(self.package) + self.relate_to(styles_part, RT.STYLES) + return styles_part diff --git a/docx/parts/image.py b/docx/parts/image.py index 9cc698697..6ece20d80 100644 --- a/docx/parts/image.py +++ b/docx/parts/image.py @@ -11,7 +11,7 @@ import hashlib from docx.image.image import Image -from docx.opc.package import Part +from docx.opc.part import Part from docx.shared import Emu, Inches diff --git a/docx/parts/numbering.py b/docx/parts/numbering.py index e9c8f713d..e324c5aac 100644 --- a/docx/parts/numbering.py +++ b/docx/parts/numbering.py @@ -8,7 +8,7 @@ absolute_import, division, print_function, unicode_literals ) -from ..opc.package import XmlPart +from ..opc.part import XmlPart from ..shared import lazyproperty diff --git a/docx/parts/styles.py b/docx/parts/styles.py index d9f4cfda9..00c7cb3c3 100644 --- a/docx/parts/styles.py +++ b/docx/parts/styles.py @@ -8,8 +8,13 @@ absolute_import, division, print_function, unicode_literals ) -from ..opc.package import XmlPart -from ..shared import lazyproperty +import os + +from ..opc.constants import CONTENT_TYPE as CT +from ..opc.packuri import PackURI +from ..opc.part import XmlPart +from ..oxml import parse_xml +from ..styles.styles import Styles class StylesPart(XmlPart): @@ -18,30 +23,33 @@ class StylesPart(XmlPart): or glossary. 
""" @classmethod - def new(cls): + def default(cls, package): """ - Return newly created empty styles part, containing only the root - ``<w:styles>`` element. + Return a newly created styles part, containing a default set of + elements. """ - raise NotImplementedError + partname = PackURI('/word/styles.xml') + content_type = CT.WML_STYLES + element = parse_xml(cls._default_styles_xml()) + return cls(partname, content_type, element, package) - @lazyproperty + @property def styles(self): """ The |_Styles| instance containing the styles (<w:style> element proxies) for this styles part. """ - return _Styles(self._element) - - -class _Styles(object): - """ - Collection of |_Style| instances corresponding to the ``<w:style>`` - elements in a styles part. - """ - def __init__(self, styles_elm): - super(_Styles, self).__init__() - self._styles_elm = styles_elm + return Styles(self.element) - def __len__(self): - return len(self._styles_elm.style_lst) + @classmethod + def _default_styles_xml(cls): + """ + Return a bytestream containing XML for a default styles part. + """ + path = os.path.join( + os.path.split(__file__)[0], '..', 'templates', + 'default-styles.xml' + ) + with open(path, 'rb') as f: + xml_bytes = f.read() + return xml_bytes diff --git a/docx/section.py b/docx/section.py index 0bdcd17dd..16221243b 100644 --- a/docx/section.py +++ b/docx/section.py @@ -6,6 +6,32 @@ from __future__ import absolute_import, print_function, unicode_literals +from collections import Sequence + + +class Sections(Sequence): + """ + Sequence of |Section| objects corresponding to the sections in the + document. Supports ``len()``, iteration, and indexed access. 
+ """ + def __init__(self, document_elm): + super(Sections, self).__init__() + self._document_elm = document_elm + + def __getitem__(self, key): + if isinstance(key, slice): + sectPr_lst = self._document_elm.sectPr_lst[key] + return [Section(sectPr) for sectPr in sectPr_lst] + sectPr = self._document_elm.sectPr_lst[key] + return Section(sectPr) + + def __iter__(self): + for sectPr in self._document_elm.sectPr_lst: + yield Section(sectPr) + + def __len__(self): + return len(self._document_elm.sectPr_lst) + class Section(object): """ diff --git a/docx/shape.py b/docx/shape.py index c1fe9742a..e4f885d73 100644 --- a/docx/shape.py +++ b/docx/shape.py @@ -10,8 +10,41 @@ ) from .enum.shape import WD_INLINE_SHAPE -from .oxml.shape import CT_Inline, CT_Picture from .oxml.ns import nsmap +from .shared import Parented + + +class InlineShapes(Parented): + """ + Sequence of |InlineShape| instances, supporting len(), iteration, and + indexed access. + """ + def __init__(self, body_elm, parent): + super(InlineShapes, self).__init__(parent) + self._body = body_elm + + def __getitem__(self, idx): + """ + Provide indexed access, e.g. 
'inline_shapes[idx]' + """ + try: + inline = self._inline_lst[idx] + except IndexError: + msg = "inline shape index [%d] out of range" % idx + raise IndexError(msg) + return InlineShape(inline) + + def __iter__(self): + return (InlineShape(inline) for inline in self._inline_lst) + + def __len__(self): + return len(self._inline_lst) + + @property + def _inline_lst(self): + body = self._body + xpath = '//w:p/w:r/w:drawing/wp:inline' + return body.xpath(xpath) class InlineShape(object): @@ -33,25 +66,8 @@ def height(self): @height.setter def height(self, cy): - assert isinstance(cy, int) - assert 0 < cy self._inline.extent.cy = cy - - @classmethod - def new_picture(cls, r, image_part, rId, shape_id): - """ - Return a new |InlineShape| instance containing an inline picture - placement of *image_part* appended to run *r* and uniquely identified - by *shape_id*. - """ - cx, cy, filename = ( - image_part.default_cx, image_part.default_cy, image_part.filename - ) - pic_id = 0 - pic = CT_Picture.new(pic_id, filename, rId, cx, cy) - inline = CT_Inline.new(cx, cy, shape_id, pic) - r.add_drawing(inline) - return cls(inline) + self._inline.graphic.graphicData.pic.spPr.cy = cy @property def type(self): @@ -83,6 +99,5 @@ def width(self): @width.setter def width(self, cx): - assert isinstance(cx, int) - assert 0 < cx self._inline.extent.cx = cx + self._inline.graphic.graphicData.pic.spPr.cx = cx diff --git a/docx/shared.py b/docx/shared.py index f7cd4e147..919964325 100644 --- a/docx/shared.py +++ b/docx/shared.py @@ -17,7 +17,7 @@ class Length(int): _EMUS_PER_INCH = 914400 _EMUS_PER_CM = 360000 _EMUS_PER_MM = 36000 - _EMUS_PER_PX = 12700 + _EMUS_PER_PT = 12700 _EMUS_PER_TWIP = 635 def __new__(cls, emu): @@ -52,10 +52,11 @@ def mm(self): return self / float(self._EMUS_PER_MM) @property - def px(self): - # round can somtimes return values like x.999999 which are truncated - # to x by int(); adding the 0.1 prevents this - return int(round(self / float(self._EMUS_PER_PX)) + 0.1) + 
def pt(self): + """ + Floating point length in points + """ + return self / float(self._EMUS_PER_PT) @property def twips(self): @@ -104,23 +105,12 @@ def __new__(cls, mm): return Length.__new__(cls, emu) -class Pt(int): - """ - Convenience class for setting font sizes in points - """ - _UNITS_PER_POINT = 100 - - def __new__(cls, pts): - units = int(pts * Pt._UNITS_PER_POINT) - return int.__new__(cls, units) - - -class Px(Length): +class Pt(Length): """ - Convenience constructor for length in pixels. + Convenience value class for specifying a length in points """ - def __new__(cls, px): - emu = int(px * Length._EMUS_PER_PX) + def __new__(cls, points): + emu = int(points * Length._EMUS_PER_PT) return Length.__new__(cls, emu) @@ -134,6 +124,37 @@ def __new__(cls, twips): return Length.__new__(cls, emu) +class RGBColor(tuple): + """ + Immutable value object defining a particular RGB color. + """ + def __new__(cls, r, g, b): + msg = 'RGBColor() takes three integer values 0-255' + for val in (r, g, b): + if not isinstance(val, int) or val < 0 or val > 255: + raise ValueError(msg) + return super(RGBColor, cls).__new__(cls, (r, g, b)) + + def __repr__(self): + return 'RGBColor(0x%02x, 0x%02x, 0x%02x)' % self + + def __str__(self): + """ + Return a hex string rgb value, like '3C2F80' + """ + return '%02X%02X%02X' % self + + @classmethod + def from_string(cls, rgb_hex_str): + """ + Return a new instance from an RGB color hex string like ``'3C2F80'``. + """ + r = int(rgb_hex_str[:2], 16) + g = int(rgb_hex_str[2:4], 16) + b = int(rgb_hex_str[4:], 16) + return cls(r, g, b) + + def lazyproperty(f): """ @lazyprop decorator. Decorated method will be called only on first access @@ -164,6 +185,52 @@ def write_only_property(f): return property(fset=f, doc=docstring) +class ElementProxy(object): + """ + Base class for lxml element proxy classes. 
An element proxy class is one + whose primary responsibilities are fulfilled by manipulating the + attributes and child elements of an XML element. They are the most common + type of class in python-docx other than custom element (oxml) classes. + """ + + __slots__ = ('_element', '_parent') + + def __init__(self, element, parent=None): + self._element = element + self._parent = parent + + def __eq__(self, other): + """ + Return |True| if this proxy object refers to the same oxml element as + does *other*. ElementProxy objects are value objects and should + maintain no mutable local state. Equality for proxy objects is + defined as referring to the same XML element, whether or not they are + the same proxy object instance. + """ + if not isinstance(other, ElementProxy): + return False + return self._element is other._element + + def __ne__(self, other): + if not isinstance(other, ElementProxy): + return True + return self._element is not other._element + + @property + def element(self): + """ + The lxml element proxied by this object. + """ + return self._element + + @property + def part(self): + """ + The package part containing this object + """ + return self._parent.part + + class Parented(object): """ Provides common services for document elements that occur below a part diff --git a/docx/styles/__init__.py b/docx/styles/__init__.py new file mode 100644 index 000000000..3eff43e55 --- /dev/null +++ b/docx/styles/__init__.py @@ -0,0 +1,50 @@ +# encoding: utf-8 + +""" +Sub-package module for docx.styles sub-package. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + + +class BabelFish(object): + """ + Translates special-case style names from UI name (e.g. Heading 1) to + internal/styles.xml name (e.g. heading 1) and back. 
+ """ + + style_aliases = ( + ('Caption', 'caption'), + ('Heading 1', 'heading 1'), + ('Heading 2', 'heading 2'), + ('Heading 3', 'heading 3'), + ('Heading 4', 'heading 4'), + ('Heading 5', 'heading 5'), + ('Heading 6', 'heading 6'), + ('Heading 7', 'heading 7'), + ('Heading 8', 'heading 8'), + ('Heading 9', 'heading 9'), + ) + + internal_style_names = dict(style_aliases) + ui_style_names = dict((item[1], item[0]) for item in style_aliases) + + @classmethod + def ui2internal(cls, ui_style_name): + """ + Return the internal style name corresponding to *ui_style_name*, such + as 'heading 1' for 'Heading 1'. + """ + return cls.internal_style_names.get(ui_style_name, ui_style_name) + + @classmethod + def internal2ui(cls, internal_style_name): + """ + Return the user interface style name corresponding to + *internal_style_name*, such as 'Heading 1' for 'heading 1'. + """ + return cls.ui_style_names.get( + internal_style_name, internal_style_name + ) diff --git a/docx/styles/latent.py b/docx/styles/latent.py new file mode 100644 index 000000000..99b1514ff --- /dev/null +++ b/docx/styles/latent.py @@ -0,0 +1,224 @@ +# encoding: utf-8 + +""" +Latent style-related objects. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from . import BabelFish +from ..shared import ElementProxy + + +class LatentStyles(ElementProxy): + """ + Provides access to the default behaviors for latent styles in this + document and to the collection of |_LatentStyle| objects that define + overrides of those defaults for a particular named latent style. + """ + + __slots__ = () + + def __getitem__(self, key): + """ + Enables dictionary-style access to a latent style by name. 
+ """ + style_name = BabelFish.ui2internal(key) + lsdException = self._element.get_by_name(style_name) + if lsdException is None: + raise KeyError("no latent style with name '%s'" % key) + return _LatentStyle(lsdException) + + def __iter__(self): + return (_LatentStyle(ls) for ls in self._element.lsdException_lst) + + def __len__(self): + return len(self._element.lsdException_lst) + + def add_latent_style(self, name): + """ + Return a newly added |_LatentStyle| object to override the inherited + defaults defined in this latent styles object for the built-in style + having *name*. + """ + lsdException = self._element.add_lsdException() + lsdException.name = BabelFish.ui2internal(name) + return _LatentStyle(lsdException) + + @property + def default_priority(self): + """ + Integer between 0 and 99 inclusive specifying the default sort order + for latent styles in style lists and the style gallery. |None| if no + value is assigned, which causes Word to use the default value 99. + """ + return self._element.defUIPriority + + @default_priority.setter + def default_priority(self, value): + self._element.defUIPriority = value + + @property + def default_to_hidden(self): + """ + Boolean specifying whether the default behavior for latent styles is + to be hidden. A hidden style does not appear in the recommended list + or in the style gallery. + """ + return self._element.bool_prop('defSemiHidden') + + @default_to_hidden.setter + def default_to_hidden(self, value): + self._element.set_bool_prop('defSemiHidden', value) + + @property + def default_to_locked(self): + """ + Boolean specifying whether the default behavior for latent styles is + to be locked. A locked style does not appear in the styles panel or + the style gallery and cannot be applied to document content. This + behavior is only active when formatting protection is turned on for + the document (via the Developer menu). 
+ """ + return self._element.bool_prop('defLockedState') + + @default_to_locked.setter + def default_to_locked(self, value): + self._element.set_bool_prop('defLockedState', value) + + @property + def default_to_quick_style(self): + """ + Boolean specifying whether the default behavior for latent styles is + to appear in the style gallery when not hidden. + """ + return self._element.bool_prop('defQFormat') + + @default_to_quick_style.setter + def default_to_quick_style(self, value): + self._element.set_bool_prop('defQFormat', value) + + @property + def default_to_unhide_when_used(self): + """ + Boolean specifying whether the default behavior for latent styles is + to be unhidden when first applied to content. + """ + return self._element.bool_prop('defUnhideWhenUsed') + + @default_to_unhide_when_used.setter + def default_to_unhide_when_used(self, value): + self._element.set_bool_prop('defUnhideWhenUsed', value) + + @property + def load_count(self): + """ + Integer specifying the number of built-in styles to initialize to the + defaults specified in this |LatentStyles| object. |None| if there is + no setting in the XML (very uncommon). The default Word 2011 template + sets this value to 276, accounting for the built-in styles in Word + 2010. + """ + return self._element.count + + @load_count.setter + def load_count(self, value): + self._element.count = value + + +class _LatentStyle(ElementProxy): + """ + Proxy for an `w:lsdException` element, which specifies display behaviors + for a built-in style when no definition for that style is stored yet in + the `styles.xml` part. The values in this element override the defaults + specified in the parent `w:latentStyles` element. + """ + + __slots__ = () + + def delete(self): + """ + Remove this latent style definition such that the defaults defined in + the containing |LatentStyles| object provide the effective value for + each of its attributes. 
Attempting to access any attributes on this + object after calling this method will raise |AttributeError|. + """ + self._element.delete() + self._element = None + + @property + def hidden(self): + """ + Tri-state value specifying whether this latent style should appear in + the recommended list. |None| indicates the effective value is + inherited from the parent ``<w:latentStyles>`` element. + """ + return self._element.on_off_prop('semiHidden') + + @hidden.setter + def hidden(self, value): + self._element.set_on_off_prop('semiHidden', value) + + @property + def locked(self): + """ + Tri-state value specifying whether this latent style is locked. + A locked style does not appear in the styles panel or the style + gallery and cannot be applied to document content. This behavior is + only active when formatting protection is turned on for the document + (via the Developer menu). + """ + return self._element.on_off_prop('locked') + + @locked.setter + def locked(self, value): + self._element.set_on_off_prop('locked', value) + + @property + def name(self): + """ + The name of the built-in style this exception applies to. + """ + return BabelFish.internal2ui(self._element.name) + + @property + def priority(self): + """ + The integer sort key for this latent style in the Word UI. + """ + return self._element.uiPriority + + @priority.setter + def priority(self, value): + self._element.uiPriority = value + + @property + def quick_style(self): + """ + Tri-state value specifying whether this latent style should appear in + the Word styles gallery when not hidden. |None| indicates the + effective value should be inherited from the default values in its + parent |LatentStyles| object. 
+ """ + return self._element.on_off_prop('qFormat') + + @quick_style.setter + def quick_style(self, value): + self._element.set_on_off_prop('qFormat', value) + + @property + def unhide_when_used(self): + """ + Tri-state value specifying whether this style should have its + :attr:`hidden` attribute set |False| the next time the style is + applied to content. |None| indicates the effective value should be + inherited from the default specified by its parent |LatentStyles| + object. + """ + return self._element.on_off_prop('unhideWhenUsed') + + @unhide_when_used.setter + def unhide_when_used(self, value): + self._element.set_on_off_prop('unhideWhenUsed', value) diff --git a/docx/styles/style.py b/docx/styles/style.py new file mode 100644 index 000000000..24371b231 --- /dev/null +++ b/docx/styles/style.py @@ -0,0 +1,265 @@ +# encoding: utf-8 + +""" +Style object hierarchy. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from . import BabelFish +from ..enum.style import WD_STYLE_TYPE +from ..shared import ElementProxy +from ..text.font import Font +from ..text.parfmt import ParagraphFormat + + +def StyleFactory(style_elm): + """ + Return a style object of the appropriate |BaseStyle| subclass, according + to the type of *style_elm*. + """ + style_cls = { + WD_STYLE_TYPE.PARAGRAPH: _ParagraphStyle, + WD_STYLE_TYPE.CHARACTER: _CharacterStyle, + WD_STYLE_TYPE.TABLE: _TableStyle, + WD_STYLE_TYPE.LIST: _NumberingStyle + }[style_elm.type] + + return style_cls(style_elm) + + +class BaseStyle(ElementProxy): + """ + Base class for the various types of style object, paragraph, character, + table, and numbering. These properties and methods are inherited by all + style objects. + """ + + __slots__ = () + + @property + def builtin(self): + """ + Read-only. |True| if this style is a built-in style. |False| + indicates it is a custom (user-defined) style. 
Note this value is + based on the presence of a `customStyle` attribute in the XML, not on + specific knowledge of which styles are built into Word. + """ + return not self._element.customStyle + + def delete(self): + """ + Remove this style definition from the document. Note that calling + this method does not remove or change the style applied to any + document content. Content items having the deleted style will be + rendered using the default style, as is any content with a style not + defined in the document. + """ + self._element.delete() + self._element = None + + @property + def hidden(self): + """ + |True| if display of this style in the style gallery and list of + recommended styles is suppressed. |False| otherwise. In order to be + shown in the style gallery, this value must be |False| and + :attr:`.quick_style` must be |True|. + """ + return self._element.semiHidden_val + + @hidden.setter + def hidden(self, value): + self._element.semiHidden_val = value + + @property + def locked(self): + """ + Read/write Boolean. |True| if this style is locked. A locked style + does not appear in the styles panel or the style gallery and cannot + be applied to document content. This behavior is only active when + formatting protection is turned on for the document (via the + Developer menu). + """ + return self._element.locked_val + + @locked.setter + def locked(self, value): + self._element.locked_val = value + + @property + def name(self): + """ + The UI name of this style. + """ + name = self._element.name_val + if name is None: + return None + return BabelFish.internal2ui(name) + + @name.setter + def name(self, value): + self._element.name_val = value + + @property + def priority(self): + """ + The integer sort key governing display sequence of this style in the + Word UI. |None| indicates no setting is defined, causing Word to use + the default value of 0. Style name is used as a secondary sort key to + resolve ordering of styles having the same priority value. 
+ """ + return self._element.uiPriority_val + + @priority.setter + def priority(self, value): + self._element.uiPriority_val = value + + @property + def quick_style(self): + """ + |True| if this style should be displayed in the style gallery when + :attr:`.hidden` is |False|. Read/write Boolean. + """ + return self._element.qFormat_val + + @quick_style.setter + def quick_style(self, value): + self._element.qFormat_val = value + + @property + def style_id(self): + """ + The unique key name (string) for this style. This value is subject to + rewriting by Word and should generally not be changed unless you are + familiar with the internals involved. + """ + return self._element.styleId + + @style_id.setter + def style_id(self, value): + self._element.styleId = value + + @property + def type(self): + """ + Member of :ref:`WdStyleType` corresponding to the type of this style, + e.g. ``WD_STYLE_TYPE.PARAGRAPH``. + """ + type = self._element.type + if type is None: + return WD_STYLE_TYPE.PARAGRAPH + return type + + @property + def unhide_when_used(self): + """ + |True| if an application should make this style visible the next time + it is applied to content. False otherwise. Note that |docx| does not + automatically unhide a style having |True| for this attribute when it + is applied to content. + """ + return self._element.unhideWhenUsed_val + + @unhide_when_used.setter + def unhide_when_used(self, value): + self._element.unhideWhenUsed_val = value + + +class _CharacterStyle(BaseStyle): + """ + A character style. A character style is applied to a |Run| object and + primarily provides character-level formatting via the |Font| object in + its :attr:`.font` property. + """ + + __slots__ = () + + @property + def base_style(self): + """ + Style object this style inherits from or |None| if this style is + not based on another style. 
+ """ + base_style = self._element.base_style + if base_style is None: + return None + return StyleFactory(base_style) + + @base_style.setter + def base_style(self, style): + style_id = style.style_id if style is not None else None + self._element.basedOn_val = style_id + + @property + def font(self): + """ + The |Font| object providing access to the character formatting + properties for this style, such as font name and size. + """ + return Font(self._element) + + +class _ParagraphStyle(_CharacterStyle): + """ + A paragraph style. A paragraph style provides both character formatting + and paragraph formatting such as indentation and line-spacing. + """ + + __slots__ = () + + def __repr__(self): + return '_ParagraphStyle(\'%s\') id: %s' % (self.name, id(self)) + + @property + def next_paragraph_style(self): + """ + |_ParagraphStyle| object representing the style to be applied + automatically to a new paragraph inserted after a paragraph of this + style. Returns self if no next paragraph style is defined. Assigning + |None| or *self* removes the setting such that new paragraphs are + created using this same style. + """ + next_style_elm = self._element.next_style + if next_style_elm is None: + return self + if next_style_elm.type != WD_STYLE_TYPE.PARAGRAPH: + return self + return StyleFactory(next_style_elm) + + @next_paragraph_style.setter + def next_paragraph_style(self, style): + if style is None or style.style_id == self.style_id: + self._element._remove_next() + else: + self._element.get_or_add_next().val = style.style_id + + @property + def paragraph_format(self): + """ + The |ParagraphFormat| object providing access to the paragraph + formatting properties for this style such as indentation. + """ + return ParagraphFormat(self._element) + + +class _TableStyle(_ParagraphStyle): + """ + A table style. A table style provides character and paragraph formatting + for its contents as well as special table formatting properties. 
+ """ + + __slots__ = () + + def __repr__(self): + return '_TableStyle(\'%s\') id: %s' % (self.name, id(self)) + + +class _NumberingStyle(BaseStyle): + """ + A numbering style. Not yet implemented. + """ + + __slots__ = () diff --git a/docx/styles/styles.py b/docx/styles/styles.py new file mode 100644 index 000000000..eabe53b20 --- /dev/null +++ b/docx/styles/styles.py @@ -0,0 +1,157 @@ +# encoding: utf-8 + +""" +Styles object, container for all objects in the styles part. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from warnings import warn + +from . import BabelFish +from .latent import LatentStyles +from ..shared import ElementProxy +from .style import BaseStyle, StyleFactory + + +class Styles(ElementProxy): + """ + A collection providing access to the styles defined in a document. + Accessed using the :attr:`.Document.styles` property. Supports ``len()``, + iteration, and dictionary-style access by style name. + """ + + __slots__ = () + + def __contains__(self, name): + """ + Enables `in` operator on style name. + """ + internal_name = BabelFish.ui2internal(name) + for style in self._element.style_lst: + if style.name_val == internal_name: + return True + return False + + def __getitem__(self, key): + """ + Enables dictionary-style access by UI name. Lookup by style id is + deprecated, triggers a warning, and will be removed in a near-future + release. + """ + style_elm = self._element.get_by_name(BabelFish.ui2internal(key)) + if style_elm is not None: + return StyleFactory(style_elm) + + style_elm = self._element.get_by_id(key) + if style_elm is not None: + msg = ( + 'style lookup by style_id is deprecated. Use style name as ' + 'key instead.' 
+ ) + warn(msg, UserWarning) + return StyleFactory(style_elm) + + raise KeyError("no style with name '%s'" % key) + + def __iter__(self): + return (StyleFactory(style) for style in self._element.style_lst) + + def __len__(self): + return len(self._element.style_lst) + + def add_style(self, name, style_type, builtin=False): + """ + Return a newly added style object of *style_type* and identified + by *name*. A builtin style can be defined by passing True for the + optional *builtin* argument. + """ + style_name = BabelFish.ui2internal(name) + if style_name in self: + raise ValueError("document already contains style '%s'" % name) + style = self._element.add_style_of_type( + style_name, style_type, builtin + ) + return StyleFactory(style) + + def default(self, style_type): + """ + Return the default style for *style_type* or |None| if no default is + defined for that type (not common). + """ + style = self._element.default_for(style_type) + if style is None: + return None + return StyleFactory(style) + + def get_by_id(self, style_id, style_type): + """ + Return the style of *style_type* matching *style_id*. Returns the + default for *style_type* if *style_id* is not found or is |None|, or + if the style having *style_id* is not of *style_type*. + """ + if style_id is None: + return self.default(style_type) + return self._get_by_id(style_id, style_type) + + def get_style_id(self, style_or_name, style_type): + """ + Return the id of the style corresponding to *style_or_name*, or + |None| if *style_or_name* is |None|. If *style_or_name* is not + a style object, the style is looked up using *style_or_name* as + a style name, raising |ValueError| if no style with that name is + defined. Raises |ValueError| if the target style is not of + *style_type*. 
+ """ + if style_or_name is None: + return None + elif isinstance(style_or_name, BaseStyle): + return self._get_style_id_from_style(style_or_name, style_type) + else: + return self._get_style_id_from_name(style_or_name, style_type) + + @property + def latent_styles(self): + """ + A |LatentStyles| object providing access to the default behaviors for + latent styles and the collection of |_LatentStyle| objects that + define overrides of those defaults for a particular named latent + style. + """ + return LatentStyles(self._element.get_or_add_latentStyles()) + + def _get_by_id(self, style_id, style_type): + """ + Return the style of *style_type* matching *style_id*. Returns the + default for *style_type* if *style_id* is not found or if the style + having *style_id* is not of *style_type*. + """ + style = self._element.get_by_id(style_id) + if style is None or style.type != style_type: + return self.default(style_type) + return StyleFactory(style) + + def _get_style_id_from_name(self, style_name, style_type): + """ + Return the id of the style of *style_type* corresponding to + *style_name*. Returns |None| if that style is the default style for + *style_type*. Raises |ValueError| if the named style is not found in + the document or does not match *style_type*. + """ + return self._get_style_id_from_style(self[style_name], style_type) + + def _get_style_id_from_style(self, style, style_type): + """ + Return the id of *style*, or |None| if it is the default style of + *style_type*. Raises |ValueError| if style is not of *style_type*. 
+ """ + if style.type != style_type: + raise ValueError( + "assigned style is type %s, need type %s" % + (style.type, style_type) + ) + if style == self.default(style_type): + return None + return style.style_id diff --git a/docx/table.py b/docx/table.py index 544553b1e..d0b472fc8 100644 --- a/docx/table.py +++ b/docx/table.py @@ -7,7 +7,9 @@ from __future__ import absolute_import, print_function, unicode_literals from .blkcntnr import BlockItemContainer -from .shared import lazyproperty, Parented, write_only_property +from .enum.style import WD_STYLE_TYPE +from .oxml.simpletypes import ST_Merge +from .shared import Inches, lazyproperty, Parented class Table(Parented): @@ -16,17 +18,20 @@ class Table(Parented): """ def __init__(self, tbl, parent): super(Table, self).__init__(parent) - self._tbl = tbl + self._element = self._tbl = tbl - def add_column(self): + def add_column(self, width): """ - Return a |_Column| instance, newly added rightmost to the table. + Return a |_Column| object of *width*, newly added rightmost to the + table. """ tblGrid = self._tbl.tblGrid gridCol = tblGrid.add_gridCol() + gridCol.w = width for tr in self._tbl.tr_lst: - tr.add_tc() - return _Column(gridCol, self._tbl, self) + tc = tr.add_tc() + tc.width = width + return _Column(gridCol, self) def add_row(self): """ @@ -35,9 +40,24 @@ def add_row(self): tbl = self._tbl tr = tbl.add_tr() for gridCol in tbl.tblGrid.gridCol_lst: - tr.add_tc() + tc = tr.add_tc() + tc.width = gridCol.w return _Row(tr, self) + @property + def alignment(self): + """ + Read/write. A member of :ref:`WdRowAlignment` or None, specifying the + positioning of this table between the page margins. |None| if no + setting is specified, causing the effective value to be inherited + from the style hierarchy. 
+ """ + return self._tblPr.alignment + + @alignment.setter + def alignment(self, value): + self._tblPr.alignment = value + @property def autofit(self): """ @@ -57,16 +77,34 @@ def cell(self, row_idx, col_idx): Return |_Cell| instance correponding to table cell at *row_idx*, *col_idx* intersection, where (0, 0) is the top, left-most cell. """ - row = self.rows[row_idx] - return row.cells[col_idx] + cell_idx = col_idx + (row_idx * self._column_count) + return self._cells[cell_idx] + + def column_cells(self, column_idx): + """ + Sequence of cells in the column at *column_idx* in this table. + """ + cells = self._cells + idxs = range(column_idx, len(cells), self._column_count) + return [cells[idx] for idx in idxs] @lazyproperty def columns(self): """ - |_Columns| instance containing the sequence of rows in this table. + |_Columns| instance representing the sequence of columns in this + table. """ return _Columns(self._tbl, self) + def row_cells(self, row_idx): + """ + Sequence of cells in the row at *row_idx* in this table. + """ + column_count = self._column_count + start = row_idx * column_count + end = start + column_count + return self._cells[start:end] + @lazyproperty def rows(self): """ @@ -77,15 +115,74 @@ def rows(self): @property def style(self): """ - String name of style to be applied to this table, e.g. - 'LightShading-Accent1'. Name is derived by removing spaces from the - table style name displayed in the Word UI. + Read/write. A |_TableStyle| object representing the style applied to + this table. The default table style for the document (often `Normal + Table`) is returned if the table has no directly-applied style. + Assigning |None| to this property removes any directly-applied table + style causing it to inherit the default table style of the document. + Note that the style name of a table style differs slightly from that + displayed in the user interface; a hyphen, if it appears, must be + removed. 
For example, `Light Shading - Accent 1` becomes `Light + Shading Accent 1`. """ - return self._tblPr.style + style_id = self._tbl.tblStyle_val + return self.part.get_style(style_id, WD_STYLE_TYPE.TABLE) @style.setter - def style(self, value): - self._tblPr.style = value + def style(self, style_or_name): + style_id = self.part.get_style_id( + style_or_name, WD_STYLE_TYPE.TABLE + ) + self._tbl.tblStyle_val = style_id + + @property + def table(self): + """ + Provide child objects with reference to the |Table| object they + belong to, without them having to know their direct parent is + a |Table| object. This is the terminus of a series of `parent._table` + calls from an arbitrary child through its ancestors. + """ + return self + + @property + def table_direction(self): + """ + A member of :ref:`WdTableDirection` indicating the direction in which + the table cells are ordered, e.g. `WD_TABLE_DIRECTION.LTR`. |None| + indicates the value is inherited from the style hierarchy. + """ + return self._element.bidiVisual_val + + @table_direction.setter + def table_direction(self, value): + self._element.bidiVisual_val = value + + @property + def _cells(self): + """ + A sequence of |_Cell| objects, one for each cell of the layout grid. + If the table contains a span, one or more |_Cell| object references + are repeated. + """ + col_count = self._column_count + cells = [] + for tc in self._tbl.iter_tcs(): + for grid_span_idx in range(tc.grid_span): + if tc.vMerge == ST_Merge.CONTINUE: + cells.append(cells[-col_count]) + elif grid_span_idx > 0: + cells.append(cells[-1]) + else: + cells.append(_Cell(tc, self)) + return cells + + @property + def _column_count(self): + """ + The number of grid columns in this table. + """ + return self._tbl.col_count @property def _tblPr(self): @@ -121,9 +218,20 @@ def add_table(self, rows, cols): added after the table because Word requires a paragraph element as the last element in every cell. 
""" - new_table = super(_Cell, self).add_table(rows, cols) + width = self.width if self.width is not None else Inches(1) + table = super(_Cell, self).add_table(rows, cols, width) self.add_paragraph() - return new_table + return table + + def merge(self, other_cell): + """ + Return a merged cell created by spanning the rectangular region + having this cell and *other_cell* as diagonal corners. Raises + |InvalidSpanError| if the cells do not define a rectangular region. + """ + tc, tc_2 = self._tc, other_cell._tc + merged_tc = tc.merge(tc_2) + return _Cell(merged_tc, self._parent) @property def paragraphs(self): @@ -141,7 +249,16 @@ def tables(self): """ return super(_Cell, self).tables - @write_only_property + @property + def text(self): + """ + The entire contents of this cell as a string of text. Assigning + a string to this property replaces all existing content with a single + paragraph containing the assigned text in a single run. + """ + return '\n'.join(p.text for p in self.paragraphs) + + @text.setter def text(self, text): """ Write-only. Set entire contents of cell to the string *text*. Any @@ -169,18 +286,23 @@ class _Column(Parented): """ Table column """ - def __init__(self, gridCol, tbl, parent): + def __init__(self, gridCol, parent): super(_Column, self).__init__(parent) self._gridCol = gridCol - self._tbl = tbl - @lazyproperty + @property def cells(self): """ Sequence of |_Cell| instances corresponding to cells in this column. - Supports ``len()``, iteration and indexed access. """ - return _ColumnCells(self._tbl, self._gridCol, self) + return tuple(self.table.column_cells(self._index)) + + @property + def table(self): + """ + Reference to the |Table| object this column belongs to. + """ + return self._parent.table @property def width(self): @@ -194,45 +316,12 @@ def width(self): def width(self, value): self._gridCol.w = value - -class _ColumnCells(Parented): - """ - Sequence of |_Cell| instances corresponding to the cells in a table - column. 
- """ - def __init__(self, tbl, gridCol, parent): - super(_ColumnCells, self).__init__(parent) - self._tbl = tbl - self._gridCol = gridCol - - def __getitem__(self, idx): + @property + def _index(self): """ - Provide indexed access, (e.g. 'cells[0]') + Index of this column in its table, starting from zero. """ - try: - tr = self._tr_lst[idx] - except IndexError: - msg = "cell index [%d] is out of range" % idx - raise IndexError(msg) - tc = tr.tc_lst[self._col_idx] - return _Cell(tc, self) - - def __iter__(self): - for tr in self._tr_lst: - tc = tr.tc_lst[self._col_idx] - yield _Cell(tc, self) - - def __len__(self): - return len(self._tr_lst) - - @property - def _col_idx(self): - gridCol_lst = self._tbl.tblGrid.gridCol_lst - return gridCol_lst.index(self._gridCol) - - @property - def _tr_lst(self): - return self._tbl.tr_lst + return self._gridCol.gridCol_idx class _Columns(Parented): @@ -253,15 +342,22 @@ def __getitem__(self, idx): except IndexError: msg = "column index [%d] is out of range" % idx raise IndexError(msg) - return _Column(gridCol, self._tbl, self) + return _Column(gridCol, self) def __iter__(self): for gridCol in self._gridCol_lst: - yield _Column(gridCol, self._tbl, self) + yield _Column(gridCol, self) def __len__(self): return len(self._gridCol_lst) + @property + def table(self): + """ + Reference to the |Table| object this column collection belongs to. + """ + return self._parent.table + @property def _gridCol_lst(self): """ @@ -280,45 +376,32 @@ def __init__(self, tr, parent): super(_Row, self).__init__(parent) self._tr = tr - @lazyproperty + @property def cells(self): """ Sequence of |_Cell| instances corresponding to cells in this row. - Supports ``len()``, iteration and indexed access. """ - return _RowCells(self._tr, self) - - -class _RowCells(Parented): - """ - Sequence of |_Cell| instances corresponding to the cells in a table row. 
- """ - def __init__(self, tr, parent): - super(_RowCells, self).__init__(parent) - self._tr = tr + return tuple(self.table.row_cells(self._index)) - def __getitem__(self, idx): + @property + def table(self): """ - Provide indexed access, (e.g. 'cells[0]') + Reference to the |Table| object this row belongs to. """ - try: - tc = self._tr.tc_lst[idx] - except IndexError: - msg = "cell index [%d] is out of range" % idx - raise IndexError(msg) - return _Cell(tc, self) - - def __iter__(self): - return (_Cell(tc, self) for tc in self._tr.tc_lst) + return self._parent.table - def __len__(self): - return len(self._tr.tc_lst) + @property + def _index(self): + """ + Index of this row in its table, starting from zero. + """ + return self._tr.tr_idx class _Rows(Parented): """ - Sequence of |_Row| instances corresponding to the rows in a table. - Supports ``len()``, iteration and indexed access. + Sequence of |_Row| objects corresponding to the rows in a table. + Supports ``len()``, iteration, indexed access, and slicing. """ def __init__(self, tbl, parent): super(_Rows, self).__init__(parent) @@ -328,15 +411,17 @@ def __getitem__(self, idx): """ Provide indexed access, (e.g. 'rows[0]') """ - try: - tr = self._tbl.tr_lst[idx] - except IndexError: - msg = "row index [%d] out of range" % idx - raise IndexError(msg) - return _Row(tr, self) + return list(self)[idx] def __iter__(self): return (_Row(tr, self) for tr in self._tbl.tr_lst) def __len__(self): return len(self._tbl.tr_lst) + + @property + def table(self): + """ + Reference to the |Table| object this row collection belongs to. 
+ """ + return self._parent.table diff --git a/docx/templates/default-styles.xml b/docx/templates/default-styles.xml new file mode 100644 index 000000000..b8b97bc70 --- /dev/null +++ b/docx/templates/default-styles.xml @@ -0,0 +1,190 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docx/templates/default.docx b/docx/templates/default.docx index 62c580eb5..85201dd1d 100644 Binary files a/docx/templates/default.docx and b/docx/templates/default.docx differ diff --git a/docx/text.py b/docx/text.py deleted file mode 100644 index 0c551beeb..000000000 --- a/docx/text.py +++ /dev/null @@ -1,489 +0,0 @@ -# encoding: utf-8 - -""" -Text-related proxy types for python-docx, such as Paragraph and Run. -""" - -from __future__ import absolute_import, print_function, unicode_literals - -from .enum.text import WD_BREAK -from .shared import Parented - - -def boolproperty(f): - """ - @boolproperty decorator. Decorated method must return the XML element - name of the boolean property element occuring under rPr. Causes - a read/write tri-state property to be added to the class having the name - of the decorated function. 
- """ - def _get_prop_value(parent, attr_name): - return getattr(parent, attr_name) - - def _remove_prop(parent, attr_name): - remove_method_name = '_remove_%s' % attr_name - remove_method = getattr(parent, remove_method_name) - remove_method() - - def _add_prop(parent, attr_name): - add_method_name = '_add_%s' % attr_name - add_method = getattr(parent, add_method_name) - return add_method() - - def getter(obj): - r, attr_name = obj._r, f(obj) - if r.rPr is None: - return None - prop_value = _get_prop_value(r.rPr, attr_name) - if prop_value is None: - return None - return prop_value.val - - def setter(obj, value): - if value not in (True, False, None): - raise ValueError( - "assigned value must be True, False, or None, got '%s'" - % value - ) - r, attr_name = obj._r, f(obj) - rPr = r.get_or_add_rPr() - _remove_prop(rPr, attr_name) - if value is not None: - elm = _add_prop(rPr, attr_name) - elm.val = value - - return property(getter, setter, doc=f.__doc__) - - -class Paragraph(Parented): - """ - Proxy object wrapping ```` element. - """ - def __init__(self, p, parent): - super(Paragraph, self).__init__(parent) - self._p = p - - def add_run(self, text=None, style=None): - """ - Append a run to this paragraph containing *text* and having character - style identified by style ID *style*. *text* can contain tab - (``\\t``) characters, which are converted to the appropriate XML form - for a tab. *text* can also include newline (``\\n``) or carriage - return (``\\r``) characters, each of which is converted to a line - break. - """ - r = self._p.add_r() - run = Run(r, self) - if text: - run.text = text - if style: - run.style = style - return run - - @property - def alignment(self): - """ - A member of the :ref:`WdParagraphAlignment` enumeration specifying - the justification setting for this paragraph. A value of |None| - indicates the paragraph has no directly-applied alignment value and - will inherit its alignment value from its style hierarchy. 
Assigning - |None| to this property removes any directly-applied alignment value. - """ - return self._p.alignment - - @alignment.setter - def alignment(self, value): - self._p.alignment = value - - def clear(self): - """ - Return this same paragraph after removing all its content. - Paragraph-level formatting, such as style, is preserved. - """ - self._p.clear_content() - return self - - def insert_paragraph_before(self, text=None, style=None): - """ - Return a newly created paragraph, inserted directly before this - paragraph. If *text* is supplied, the new paragraph contains that - text in a single run. If *style* is provided, that style is assigned - to the new paragraph. - """ - p = self._p.add_p_before() - paragraph = Paragraph(p, self._parent) - if text: - paragraph.add_run(text) - if style is not None: - paragraph.style = style - return paragraph - - @property - def runs(self): - """ - Sequence of |Run| instances corresponding to the elements in - this paragraph. - """ - return [Run(r, self) for r in self._p.r_lst] - - @property - def style(self): - """ - Paragraph style for this paragraph. Read/Write. - """ - style = self._p.style - return style if style is not None else 'Normal' - - @style.setter - def style(self, style): - self._p.style = None if style == 'Normal' else style - - @property - def text(self): - """ - String formed by concatenating the text of each run in the paragraph. - Tabs and line breaks in the XML are mapped to ``\\t`` and ``\\n`` - characters respectively. - - Assigning text to this property causes all existing paragraph content - to be replaced with a single run containing the assigned text. - A ``\\t`` character in the text is mapped to a ```` element - and each ``\\n`` or ``\\r`` character is mapped to a line break. - Paragraph-level formatting, such as style, is preserved. All - run-level formatting, such as bold or italic, is removed. 
- """ - text = '' - for run in self.runs: - text += run.text - return text - - @text.setter - def text(self, text): - self.clear() - self.add_run(text) - - -class Run(Parented): - """ - Proxy object wrapping ```` element. Several of the properties on Run - take a tri-state value, |True|, |False|, or |None|. |True| and |False| - correspond to on and off respectively. |None| indicates the property is - not specified directly on the run and its effective value is taken from - the style hierarchy. - """ - def __init__(self, r, parent): - super(Run, self).__init__(parent) - self._r = r - - def add_break(self, break_type=WD_BREAK.LINE): - """ - Add a break element of *break_type* to this run. *break_type* can - take the values `WD_BREAK.LINE`, `WD_BREAK.PAGE`, and - `WD_BREAK.COLUMN` where `WD_BREAK` is imported from `docx.enum.text`. - *break_type* defaults to `WD_BREAK.LINE`. - """ - type_, clear = { - WD_BREAK.LINE: (None, None), - WD_BREAK.PAGE: ('page', None), - WD_BREAK.COLUMN: ('column', None), - WD_BREAK.LINE_CLEAR_LEFT: ('textWrapping', 'left'), - WD_BREAK.LINE_CLEAR_RIGHT: ('textWrapping', 'right'), - WD_BREAK.LINE_CLEAR_ALL: ('textWrapping', 'all'), - }[break_type] - br = self._r.add_br() - if type_ is not None: - br.type = type_ - if clear is not None: - br.clear = clear - - def add_picture(self, image_path_or_stream, width=None, height=None): - """ - Return an |InlineShape| instance containing the image identified by - *image_path_or_stream*, added to the end of this run. - *image_path_or_stream* can be a path (a string) or a file-like object - containing a binary image. If neither width nor height is specified, - the picture appears at its native size. If only one is specified, it - is used to compute a scaling factor that is then applied to the - unspecified dimension, preserving the aspect ratio of the image. 
The - native size of the picture is calculated using the dots-per-inch - (dpi) value specified in the image file, defaulting to 72 dpi if no - value is specified, as is often the case. - """ - inline_shapes = self.part.inline_shapes - picture = inline_shapes.add_picture(image_path_or_stream, self) - - # scale picture dimensions if width and/or height provided - if width is not None or height is not None: - native_width, native_height = picture.width, picture.height - if width is None: - scaling_factor = float(height) / float(native_height) - width = int(round(native_width * scaling_factor)) - elif height is None: - scaling_factor = float(width) / float(native_width) - height = int(round(native_height * scaling_factor)) - # set picture to scaled dimensions - picture.width = width - picture.height = height - - return picture - - def add_tab(self): - """ - Add a ```` element at the end of the run, which Word - interprets as a tab character. - """ - self._r._add_tab() - - def add_text(self, text): - """ - Returns a newly appended |Text| object (corresponding to a new - ```` child element) to the run, containing *text*. Compare with - the possibly more friendly approach of assigning text to the - :attr:`Run.text` property. - """ - t = self._r.add_t(text) - return Text(t) - - @boolproperty - def all_caps(self): - """ - Read/write. Causes the text of the run to appear in capital letters. - """ - return 'caps' - - @boolproperty - def bold(self): - """ - Read/write. Causes the text of the run to appear in bold. - """ - return 'b' - - def clear(self): - """ - Return reference to this run after removing all its content. All run - formatting is preserved. - """ - self._r.clear_content() - return self - - @boolproperty - def complex_script(self): - """ - Read/write tri-state value. When |True|, causes the characters in the - run to be treated as complex script regardless of their Unicode - values. 
- """ - return 'cs' - - @boolproperty - def cs_bold(self): - """ - Read/write tri-state value. When |True|, causes the complex script - characters in the run to be displayed in bold typeface. - """ - return 'bCs' - - @boolproperty - def cs_italic(self): - """ - Read/write tri-state value. When |True|, causes the complex script - characters in the run to be displayed in italic typeface. - """ - return 'iCs' - - @boolproperty - def double_strike(self): - """ - Read/write tri-state value. When |True|, causes the text in the run - to appear with double strikethrough. - """ - return 'dstrike' - - @boolproperty - def emboss(self): - """ - Read/write tri-state value. When |True|, causes the text in the run - to appear as if raised off the page in relief. - """ - return 'emboss' - - @boolproperty - def hidden(self): - """ - Read/write tri-state value. When |True|, causes the text in the run - to be hidden from display, unless applications settings force hidden - text to be shown. - """ - return 'vanish' - - @boolproperty - def italic(self): - """ - Read/write tri-state value. When |True|, causes the text of the run - to appear in italics. - """ - return 'i' - - @boolproperty - def imprint(self): - """ - Read/write tri-state value. When |True|, causes the text in the run - to appear as if pressed into the page. - """ - return 'imprint' - - @boolproperty - def math(self): - """ - Read/write tri-state value. When |True|, specifies this run contains - WML that should be handled as though it was Office Open XML Math. - """ - return 'oMath' - - @boolproperty - def no_proof(self): - """ - Read/write tri-state value. When |True|, specifies that the contents - of this run should not report any errors when the document is scanned - for spelling and grammar. - """ - return 'noProof' - - @boolproperty - def outline(self): - """ - Read/write tri-state value. 
When |True| causes the characters in the - run to appear as if they have an outline, by drawing a one pixel wide - border around the inside and outside borders of each character glyph. - """ - return 'outline' - - @boolproperty - def rtl(self): - """ - Read/write tri-state value. When |True| causes the text in the run - to have right-to-left characteristics. - """ - return 'rtl' - - @boolproperty - def shadow(self): - """ - Read/write tri-state value. When |True| causes the text in the run - to appear as if each character has a shadow. - """ - return 'shadow' - - @boolproperty - def small_caps(self): - """ - Read/write tri-state value. When |True| causes the lowercase - characters in the run to appear as capital letters two points smaller - than the font size specified for the run. - """ - return 'smallCaps' - - @boolproperty - def snap_to_grid(self): - """ - Read/write tri-state value. When |True| causes the run to use the - document grid characters per line settings defined in the docGrid - element when laying out the characters in this run. - """ - return 'snapToGrid' - - @boolproperty - def spec_vanish(self): - """ - Read/write tri-state value. When |True|, specifies that the given run - shall always behave as if it is hidden, even when hidden text is - being displayed in the current document. The property has a very - narrow, specialized use related to the table of contents. Consult the - spec (§17.3.2.36) for more details. - """ - return 'specVanish' - - @boolproperty - def strike(self): - """ - Read/write tri-state value. When |True| causes the text in the run - to appear with a single horizontal line through the center of the - line. - """ - return 'strike' - - @property - def style(self): - """ - Read/write. The string style ID of the character style applied to - this run, or |None| if it has no directly-applied character style. 
- Setting this property to |None| causes any directly-applied character - style to be removed such that the run inherits character formatting - from its containing paragraph. - """ - return self._r.style - - @style.setter - def style(self, char_style): - self._r.style = char_style - - @property - def text(self): - """ - String formed by concatenating the text equivalent of each run - content child element into a Python string. Each ```` element - adds the text characters it contains. A ```` element adds - a ``\\t`` character. A ```` or ```` element each add - a ``\\n`` character. Note that a ```` element can indicate - a page break or column break as well as a line break. All ```` - elements translate to a single ``\\n`` character regardless of their - type. All other content child elements, such as ````, are - ignored. - - Assigning text to this property has the reverse effect, translating - each ``\\t`` character to a ```` element and each ``\\n`` or - ``\\r`` character to a ```` element. Any existing run content - is replaced. Run formatting is preserved. - """ - return self._r.text - - @text.setter - def text(self, text): - self._r.text = text - - @property - def underline(self): - """ - The underline style for this |Run|, one of |None|, |True|, |False|, - or a value from :ref:`WdUnderline`. A value of |None| indicates the - run has no directly-applied underline value and so will inherit the - underline value of its containing paragraph. Assigning |None| to this - property removes any directly-applied underline value. A value of - |False| indicates a directly-applied setting of no underline, - overriding any inherited value. A value of |True| indicates single - underline. The values from :ref:`WdUnderline` are used to specify - other outline styles such as double, wavy, and dotted. 
- """ - return self._r.underline - - @underline.setter - def underline(self, value): - self._r.underline = value - - @boolproperty - def web_hidden(self): - """ - Read/write tri-state value. When |True|, specifies that the contents - of this run shall be hidden when the document is displayed in web - page view. - """ - return 'webHidden' - - -class Text(object): - """ - Proxy object wrapping ```` element. - """ - def __init__(self, t_elm): - super(Text, self).__init__() - self._t = t_elm diff --git a/docx/text/__init__.py b/docx/text/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/docx/text/font.py b/docx/text/font.py new file mode 100644 index 000000000..d6dbe2927 --- /dev/null +++ b/docx/text/font.py @@ -0,0 +1,395 @@ +# encoding: utf-8 + +""" +Font-related proxy objects. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from ..dml.color import ColorFormat +from ..shared import ElementProxy + + +class Font(ElementProxy): + """ + Proxy object wrapping the parent of a ```` element and providing + access to character properties such as font name, font size, bold, and + subscript. + """ + + __slots__ = () + + @property + def all_caps(self): + """ + Read/write. Causes text in this font to appear in capital letters. + """ + return self._get_bool_prop('caps') + + @all_caps.setter + def all_caps(self, value): + self._set_bool_prop('caps', value) + + @property + def bold(self): + """ + Read/write. Causes text in this font to appear in bold. + """ + return self._get_bool_prop('b') + + @bold.setter + def bold(self, value): + self._set_bool_prop('b', value) + + @property + def color(self): + """ + A |ColorFormat| object providing a way to get and set the text color + for this font. + """ + return ColorFormat(self._element) + + @property + def complex_script(self): + """ + Read/write tri-state value. 
When |True|, causes the characters in the + run to be treated as complex script regardless of their Unicode + values. + """ + return self._get_bool_prop('cs') + + @complex_script.setter + def complex_script(self, value): + self._set_bool_prop('cs', value) + + @property + def cs_bold(self): + """ + Read/write tri-state value. When |True|, causes the complex script + characters in the run to be displayed in bold typeface. + """ + return self._get_bool_prop('bCs') + + @cs_bold.setter + def cs_bold(self, value): + self._set_bool_prop('bCs', value) + + @property + def cs_italic(self): + """ + Read/write tri-state value. When |True|, causes the complex script + characters in the run to be displayed in italic typeface. + """ + return self._get_bool_prop('iCs') + + @cs_italic.setter + def cs_italic(self, value): + self._set_bool_prop('iCs', value) + + @property + def double_strike(self): + """ + Read/write tri-state value. When |True|, causes the text in the run + to appear with double strikethrough. + """ + return self._get_bool_prop('dstrike') + + @double_strike.setter + def double_strike(self, value): + self._set_bool_prop('dstrike', value) + + @property + def emboss(self): + """ + Read/write tri-state value. When |True|, causes the text in the run + to appear as if raised off the page in relief. + """ + return self._get_bool_prop('emboss') + + @emboss.setter + def emboss(self, value): + self._set_bool_prop('emboss', value) + + @property + def hidden(self): + """ + Read/write tri-state value. When |True|, causes the text in the run + to be hidden from display, unless applications settings force hidden + text to be shown. + """ + return self._get_bool_prop('vanish') + + @hidden.setter + def hidden(self, value): + self._set_bool_prop('vanish', value) + + @property + def italic(self): + """ + Read/write tri-state value. When |True|, causes the text of the run + to appear in italics. |None| indicates the effective value is + inherited from the style hierarchy. 
+ """ + return self._get_bool_prop('i') + + @italic.setter + def italic(self, value): + self._set_bool_prop('i', value) + + @property + def imprint(self): + """ + Read/write tri-state value. When |True|, causes the text in the run + to appear as if pressed into the page. + """ + return self._get_bool_prop('imprint') + + @imprint.setter + def imprint(self, value): + self._set_bool_prop('imprint', value) + + @property + def math(self): + """ + Read/write tri-state value. When |True|, specifies this run contains + WML that should be handled as though it was Office Open XML Math. + """ + return self._get_bool_prop('oMath') + + @math.setter + def math(self, value): + self._set_bool_prop('oMath', value) + + @property + def name(self): + """ + Get or set the typeface name for this |Font| instance, causing the + text it controls to appear in the named font, if a matching font is + found. |None| indicates the typeface is inherited from the style + hierarchy. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.rFonts_ascii + + @name.setter + def name(self, value): + rPr = self._element.get_or_add_rPr() + rPr.rFonts_ascii = value + rPr.rFonts_hAnsi = value + + @property + def no_proof(self): + """ + Read/write tri-state value. When |True|, specifies that the contents + of this run should not report any errors when the document is scanned + for spelling and grammar. + """ + return self._get_bool_prop('noProof') + + @no_proof.setter + def no_proof(self, value): + self._set_bool_prop('noProof', value) + + @property + def outline(self): + """ + Read/write tri-state value. When |True| causes the characters in the + run to appear as if they have an outline, by drawing a one pixel wide + border around the inside and outside borders of each character glyph. + """ + return self._get_bool_prop('outline') + + @outline.setter + def outline(self, value): + self._set_bool_prop('outline', value) + + @property + def rtl(self): + """ + Read/write tri-state value. 
When |True| causes the text in the run + to have right-to-left characteristics. + """ + return self._get_bool_prop('rtl') + + @rtl.setter + def rtl(self, value): + self._set_bool_prop('rtl', value) + + @property + def shadow(self): + """ + Read/write tri-state value. When |True| causes the text in the run + to appear as if each character has a shadow. + """ + return self._get_bool_prop('shadow') + + @shadow.setter + def shadow(self, value): + self._set_bool_prop('shadow', value) + + @property + def size(self): + """ + Read/write |Length| value or |None|, indicating the font height in + English Metric Units (EMU). |None| indicates the font size should be + inherited from the style hierarchy. |Length| is a subclass of |int| + having properties for convenient conversion into points or other + length units. The :class:`docx.shared.Pt` class allows convenient + specification of point values:: + + >> font.size = Pt(24) + >> font.size + 304800 + >> font.size.pt + 24.0 + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.sz_val + + @size.setter + def size(self, emu): + rPr = self._element.get_or_add_rPr() + rPr.sz_val = emu + + @property + def small_caps(self): + """ + Read/write tri-state value. When |True| causes the lowercase + characters in the run to appear as capital letters two points smaller + than the font size specified for the run. + """ + return self._get_bool_prop('smallCaps') + + @small_caps.setter + def small_caps(self, value): + self._set_bool_prop('smallCaps', value) + + @property + def snap_to_grid(self): + """ + Read/write tri-state value. When |True| causes the run to use the + document grid characters per line settings defined in the docGrid + element when laying out the characters in this run. + """ + return self._get_bool_prop('snapToGrid') + + @snap_to_grid.setter + def snap_to_grid(self, value): + self._set_bool_prop('snapToGrid', value) + + @property + def spec_vanish(self): + """ + Read/write tri-state value. 
When |True|, specifies that the given run + shall always behave as if it is hidden, even when hidden text is + being displayed in the current document. The property has a very + narrow, specialized use related to the table of contents. Consult the + spec (§17.3.2.36) for more details. + """ + return self._get_bool_prop('specVanish') + + @spec_vanish.setter + def spec_vanish(self, value): + self._set_bool_prop('specVanish', value) + + @property + def strike(self): + """ + Read/write tri-state value. When |True| causes the text in the run + to appear with a single horizontal line through the center of the + line. + """ + return self._get_bool_prop('strike') + + @strike.setter + def strike(self, value): + self._set_bool_prop('strike', value) + + @property + def subscript(self): + """ + Boolean indicating whether the characters in this |Font| appear as + subscript. |None| indicates the subscript/superscript value is + inherited from the style hierarchy. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.subscript + + @subscript.setter + def subscript(self, value): + rPr = self._element.get_or_add_rPr() + rPr.subscript = value + + @property + def superscript(self): + """ + Boolean indicating whether the characters in this |Font| appear as + superscript. |None| indicates the subscript/superscript value is + inherited from the style hierarchy. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.superscript + + @superscript.setter + def superscript(self, value): + rPr = self._element.get_or_add_rPr() + rPr.superscript = value + + @property + def underline(self): + """ + The underline style for this |Font|, one of |None|, |True|, |False|, + or a value from :ref:`WdUnderline`. |None| indicates the font + inherits its underline value from the style hierarchy. |False| + indicates no underline. |True| indicates single underline.
The values + from :ref:`WdUnderline` are used to specify other underline styles such + as double, wavy, and dotted. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr.u_val + + @underline.setter + def underline(self, value): + rPr = self._element.get_or_add_rPr() + rPr.u_val = value + + @property + def web_hidden(self): + """ + Read/write tri-state value. When |True|, specifies that the contents + of this run shall be hidden when the document is displayed in web + page view. + """ + return self._get_bool_prop('webHidden') + + @web_hidden.setter + def web_hidden(self, value): + self._set_bool_prop('webHidden', value) + + def _get_bool_prop(self, name): + """ + Return the value of the boolean child of `w:rPr` having *name*. + """ + rPr = self._element.rPr + if rPr is None: + return None + return rPr._get_bool_val(name) + + def _set_bool_prop(self, name, value): + """ + Assign *value* to the boolean child *name* of `w:rPr`. + """ + rPr = self._element.get_or_add_rPr() + rPr._set_bool_val(name, value) diff --git a/docx/text/hyperlink.py b/docx/text/hyperlink.py new file mode 100644 index 000000000..0526c6984 --- /dev/null +++ b/docx/text/hyperlink.py @@ -0,0 +1,88 @@ +# encoding: utf-8 + +""" +Hyperlink proxy objects. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) +from .run import Run +from ..shared import Parented +from docx.opc.constants import RELATIONSHIP_TYPE as RT + + +class Hyperlink(Parented): + """ + Proxy object wrapping ``<w:hyperlink>`` element, which in turn contains a + ``<w:r>`` element. It has two main properties: the *url* it points to and + the *text* that is shown on the page. + """ + def __init__(self, hyperlink, parent): + super(Hyperlink, self).__init__(parent) + self._hyperlink = self.element = hyperlink + + @property + def url(self): + """ + Read/write. The URL this hyperlink points to, or an empty string if + it has no relationship.
Setting this property sets + the ``r:id`` attribute of the ``<w:hyperlink>`` element wrapped by this + hyperlink. + """ + part = self.part + rId = self._hyperlink.relationship + url = part.target_ref(rId) if rId else '' + return url + + @url.setter + def url(self, url): + part = self.part + rId = part.relate_to(url, RT.HYPERLINK, is_external=True) + self._hyperlink.relationship = rId + + @property + def runs(self): + """ + Sequence of |Run| instances corresponding to the ``<w:r>`` elements in + this hyperlink. + """ + return [Run(r, self) for r in self._hyperlink.r_lst] + + def add_run(self, text=None, style=None): + """ + Append a run to this hyperlink containing *text* and having character + style identified by style ID *style*. *text* can contain tab + (``\\t``) characters, which are converted to the appropriate XML form + for a tab. *text* can also include newline (``\\n``) or carriage + return (``\\r``) characters, each of which is converted to a line + break. + """ + r = self._hyperlink.add_r() + run = Run(r, self) + if text: + run.text = text + if style: + run.style = style + return run + + @property + def text(self): + text = '' + for run in self.runs: + text += run.text + return text + + @text.setter + def text(self, text): + self._hyperlink.clear_content() + self.add_run(text) + + +class Text(object): + """ + Proxy object wrapping ``<w:t>`` element. + """ + def __init__(self, t_elm): + super(Text, self).__init__() + self._t = t_elm diff --git a/docx/text/paragraph.py b/docx/text/paragraph.py new file mode 100644 index 000000000..2aecd67d6 --- /dev/null +++ b/docx/text/paragraph.py @@ -0,0 +1,171 @@ +# encoding: utf-8 + +""" +Paragraph-related proxy types.
+""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from ..enum.style import WD_STYLE_TYPE +from .parfmt import ParagraphFormat +from .run import Run +from .hyperlink import Hyperlink +from ..shared import Parented + + +class Paragraph(Parented): + """ + Proxy object wrapping ``<w:p>`` element. + """ + def __init__(self, p, parent): + super(Paragraph, self).__init__(parent) + self._p = self._element = p + + def add_run(self, text=None, style=None): + """ + Append a run to this paragraph containing *text* and having character + style identified by style ID *style*. *text* can contain tab + (``\\t``) characters, which are converted to the appropriate XML form + for a tab. *text* can also include newline (``\\n``) or carriage + return (``\\r``) characters, each of which is converted to a line + break. + """ + r = self._p.add_r() + run = Run(r, self) + if text: + run.text = text + if style: + run.style = style + return run + + def add_hyperlink(self, text=None, url=None, style=None): + """ + Append a hyperlink to this paragraph and return it. The hyperlink + contains a single run holding *text* and having character style + identified by style ID *style*; if *url* is given, the hyperlink + points to it. *text* can contain tab (``\\t``) characters, which are + converted to the appropriate XML form for a tab, and newline + (``\\n``) or carriage return (``\\r``) characters, each of which is + converted to a line break. + """ + + h = self._p.add_hyperlink() + hyperlink = Hyperlink(h, self) + + r = h.add_r() + run = Run(r, hyperlink) + + if text: + run.text = text + if style: + run.style = style + + if url: + hyperlink.url = url + return hyperlink + + @property + def alignment(self): + """ + A member of the :ref:`WdParagraphAlignment` enumeration specifying + the justification setting for this paragraph. A value of |None| + indicates the paragraph has no directly-applied alignment value and + will inherit its alignment value from its style hierarchy.
Assigning + |None| to this property removes any directly-applied alignment value. + """ + return self._p.alignment + + @alignment.setter + def alignment(self, value): + self._p.alignment = value + + def clear(self): + """ + Return this same paragraph after removing all its content. + Paragraph-level formatting, such as style, is preserved. + """ + self._p.clear_content() + return self + + def insert_paragraph_before(self, text=None, style=None): + """ + Return a newly created paragraph, inserted directly before this + paragraph. If *text* is supplied, the new paragraph contains that + text in a single run. If *style* is provided, that style is assigned + to the new paragraph. + """ + paragraph = self._insert_paragraph_before() + if text: + paragraph.add_run(text) + if style is not None: + paragraph.style = style + return paragraph + + @property + def paragraph_format(self): + """ + The |ParagraphFormat| object providing access to the formatting + properties for this paragraph, such as line spacing and indentation. + """ + return ParagraphFormat(self._element) + + @property + def runs(self): + """ + Sequence of |Run| instances corresponding to the elements in + this paragraph. + """ + return [Run(r, self) for r in self._p.r_lst] + + @property + def style(self): + """ + Read/Write. |_ParagraphStyle| object representing the style assigned + to this paragraph. If no explicit style is assigned to this + paragraph, its value is the default paragraph style for the document. + A paragraph style name can be assigned in lieu of a paragraph style + object. Assigning |None| removes any applied style, making its + effective value the default paragraph style for the document. 
+ """ + style_id = self._p.style + return self.part.get_style(style_id, WD_STYLE_TYPE.PARAGRAPH) + + @style.setter + def style(self, style_or_name): + style_id = self.part.get_style_id( + style_or_name, WD_STYLE_TYPE.PARAGRAPH + ) + self._p.style = style_id + + @property + def text(self): + """ + String formed by concatenating the text of each run in the paragraph. + Tabs and line breaks in the XML are mapped to ``\\t`` and ``\\n`` + characters respectively. + + Assigning text to this property causes all existing paragraph content + to be replaced with a single run containing the assigned text. + A ``\\t`` character in the text is mapped to a ```` element + and each ``\\n`` or ``\\r`` character is mapped to a line break. + Paragraph-level formatting, such as style, is preserved. All + run-level formatting, such as bold or italic, is removed. + """ + text = '' + for run in self.runs: + text += run.text + return text + + @text.setter + def text(self, text): + self.clear() + self.add_run(text) + + def _insert_paragraph_before(self): + """ + Return a newly created paragraph, inserted directly before this + paragraph. + """ + p = self._p.add_p_before() + return Paragraph(p, self._parent) diff --git a/docx/text/parfmt.py b/docx/text/parfmt.py new file mode 100644 index 000000000..710da9733 --- /dev/null +++ b/docx/text/parfmt.py @@ -0,0 +1,293 @@ +# encoding: utf-8 + +""" +Paragraph-related proxy types. +""" + +from __future__ import ( + absolute_import, division, print_function, unicode_literals +) + +from ..enum.text import WD_LINE_SPACING +from ..shared import ElementProxy, Emu, Length, Pt, Twips + + +class ParagraphFormat(ElementProxy): + """ + Provides access to paragraph formatting such as justification, + indentation, line spacing, space before and after, and widow/orphan + control. + """ + + __slots__ = () + + @property + def alignment(self): + """ + A member of the :ref:`WdParagraphAlignment` enumeration specifying + the justification setting for this paragraph. 
A value of |None| + indicates paragraph alignment is inherited from the style hierarchy. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.jc_val + + @alignment.setter + def alignment(self, value): + pPr = self._element.get_or_add_pPr() + pPr.jc_val = value + + @property + def first_line_indent(self): + """ + |Length| value specifying the relative difference in indentation for + the first line of the paragraph. A positive value causes the first + line to be indented. A negative value produces a hanging indent. + |None| indicates first line indentation is inherited from the style + hierarchy. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.first_line_indent + + @first_line_indent.setter + def first_line_indent(self, value): + pPr = self._element.get_or_add_pPr() + pPr.first_line_indent = value + + @property + def keep_together(self): + """ + |True| if the paragraph should be kept "in one piece" and not broken + across a page boundary when the document is rendered. |None| + indicates its effective value is inherited from the style hierarchy. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.keepLines_val + + @keep_together.setter + def keep_together(self, value): + self._element.get_or_add_pPr().keepLines_val = value + + @property + def keep_with_next(self): + """ + |True| if the paragraph should be kept on the same page as the + subsequent paragraph when the document is rendered. For example, this + property could be used to keep a section heading on the same page as + its first paragraph. |None| indicates its effective value is + inherited from the style hierarchy. 
+ """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.keepNext_val + + @keep_with_next.setter + def keep_with_next(self, value): + self._element.get_or_add_pPr().keepNext_val = value + + @property + def left_indent(self): + """ + |Length| value specifying the space between the left margin and the + left side of the paragraph. |None| indicates the left indent value is + inherited from the style hierarchy. Use an |Inches| value object as + a convenient way to apply indentation in units of inches. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.ind_left + + @left_indent.setter + def left_indent(self, value): + pPr = self._element.get_or_add_pPr() + pPr.ind_left = value + + @property + def line_spacing(self): + """ + |float| or |Length| value specifying the space between baselines in + successive lines of the paragraph. A value of |None| indicates line + spacing is inherited from the style hierarchy. A float value, e.g. + ``2.0`` or ``1.75``, indicates spacing is applied in multiples of + line heights. A |Length| value such as ``Pt(12)`` indicates spacing + is a fixed height. The |Pt| value class is a convenient way to apply + line spacing in units of points. Assigning |None| resets line spacing + to inherit from the style hierarchy. 
+ """ + pPr = self._element.pPr + if pPr is None: + return None + return self._line_spacing(pPr.spacing_line, pPr.spacing_lineRule) + + @line_spacing.setter + def line_spacing(self, value): + pPr = self._element.get_or_add_pPr() + if value is None: + pPr.spacing_line = None + pPr.spacing_lineRule = None + elif isinstance(value, Length): + pPr.spacing_line = value + if pPr.spacing_lineRule != WD_LINE_SPACING.AT_LEAST: + pPr.spacing_lineRule = WD_LINE_SPACING.EXACTLY + else: + pPr.spacing_line = Emu(value * Twips(240)) + pPr.spacing_lineRule = WD_LINE_SPACING.MULTIPLE + + @property + def line_spacing_rule(self): + """ + A member of the :ref:`WdLineSpacing` enumeration indicating how the + value of :attr:`line_spacing` should be interpreted. Assigning any of + the :ref:`WdLineSpacing` members :attr:`SINGLE`, :attr:`DOUBLE`, or + :attr:`ONE_POINT_FIVE` will cause the value of :attr:`line_spacing` + to be updated to produce the corresponding line spacing. + """ + pPr = self._element.pPr + if pPr is None: + return None + return self._line_spacing_rule( + pPr.spacing_line, pPr.spacing_lineRule + ) + + @line_spacing_rule.setter + def line_spacing_rule(self, value): + pPr = self._element.get_or_add_pPr() + if value == WD_LINE_SPACING.SINGLE: + pPr.spacing_line = Twips(240) + pPr.spacing_lineRule = WD_LINE_SPACING.MULTIPLE + elif value == WD_LINE_SPACING.ONE_POINT_FIVE: + pPr.spacing_line = Twips(360) + pPr.spacing_lineRule = WD_LINE_SPACING.MULTIPLE + elif value == WD_LINE_SPACING.DOUBLE: + pPr.spacing_line = Twips(480) + pPr.spacing_lineRule = WD_LINE_SPACING.MULTIPLE + else: + pPr.spacing_lineRule = value + + @property + def page_break_before(self): + """ + |True| if the paragraph should appear at the top of the page + following the prior paragraph. |None| indicates its effective value + is inherited from the style hierarchy. 
+ """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.pageBreakBefore_val + + @page_break_before.setter + def page_break_before(self, value): + self._element.get_or_add_pPr().pageBreakBefore_val = value + + @property + def right_indent(self): + """ + |Length| value specifying the space between the right margin and the + right side of the paragraph. |None| indicates the right indent value + is inherited from the style hierarchy. Use a |Cm| value object as + a convenient way to apply indentation in units of centimeters. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.ind_right + + @right_indent.setter + def right_indent(self, value): + pPr = self._element.get_or_add_pPr() + pPr.ind_right = value + + @property + def space_after(self): + """ + |Length| value specifying the spacing to appear between this + paragraph and the subsequent paragraph. |None| indicates this value + is inherited from the style hierarchy. |Length| objects provide + convenience properties, such as :attr:`~.Length.pt` and + :attr:`~.Length.inches`, that allow easy conversion to various length + units. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.spacing_after + + @space_after.setter + def space_after(self, value): + self._element.get_or_add_pPr().spacing_after = value + + @property + def space_before(self): + """ + |Length| value specifying the spacing to appear between this + paragraph and the prior paragraph. |None| indicates this value is + inherited from the style hierarchy. |Length| objects provide + convenience properties, such as :attr:`~.Length.pt` and + :attr:`~.Length.cm`, that allow easy conversion to various length + units. 
+ """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.spacing_before + + @space_before.setter + def space_before(self, value): + self._element.get_or_add_pPr().spacing_before = value + + @property + def widow_control(self): + """ + |True| if the first and last lines in the paragraph remain on the + same page as the rest of the paragraph when Word repaginates the + document. |None| indicates its effective value is inherited from the + style hierarchy. + """ + pPr = self._element.pPr + if pPr is None: + return None + return pPr.widowControl_val + + @widow_control.setter + def widow_control(self, value): + self._element.get_or_add_pPr().widowControl_val = value + + @staticmethod + def _line_spacing(spacing_line, spacing_lineRule): + """ + Return the line spacing value calculated from the combination of + *spacing_line* and *spacing_lineRule*. Returns a |float| number of + lines when *spacing_lineRule* is ``WD_LINE_SPACING.MULTIPLE``, + otherwise a |Length| object of absolute line height is returned. + Returns |None| when *spacing_line* is |None|. + """ + if spacing_line is None: + return None + if spacing_lineRule == WD_LINE_SPACING.MULTIPLE: + return spacing_line / Pt(12) + return spacing_line + + @staticmethod + def _line_spacing_rule(line, lineRule): + """ + Return the line spacing rule value calculated from the combination of + *line* and *lineRule*. Returns special members of the + :ref:`WdLineSpacing` enumeration when line spacing is single, double, + or 1.5 lines. 
+ """ + if lineRule == WD_LINE_SPACING.MULTIPLE: + if line == Twips(240): + return WD_LINE_SPACING.SINGLE + if line == Twips(360): + return WD_LINE_SPACING.ONE_POINT_FIVE + if line == Twips(480): + return WD_LINE_SPACING.DOUBLE + return lineRule diff --git a/docx/text/run.py b/docx/text/run.py new file mode 100644 index 000000000..97d6da7db --- /dev/null +++ b/docx/text/run.py @@ -0,0 +1,191 @@ +# encoding: utf-8 + +""" +Run-related proxy objects for python-docx, Run in particular. +""" + +from __future__ import absolute_import, print_function, unicode_literals + +from ..enum.style import WD_STYLE_TYPE +from ..enum.text import WD_BREAK +from .font import Font +from ..shape import InlineShape +from ..shared import Parented + + +class Run(Parented): + """ + Proxy object wrapping ```` element. Several of the properties on Run + take a tri-state value, |True|, |False|, or |None|. |True| and |False| + correspond to on and off respectively. |None| indicates the property is + not specified directly on the run and its effective value is taken from + the style hierarchy. + """ + def __init__(self, r, parent): + super(Run, self).__init__(parent) + self._r = self._element = self.element = r + + def add_break(self, break_type=WD_BREAK.LINE): + """ + Add a break element of *break_type* to this run. *break_type* can + take the values `WD_BREAK.LINE`, `WD_BREAK.PAGE`, and + `WD_BREAK.COLUMN` where `WD_BREAK` is imported from `docx.enum.text`. + *break_type* defaults to `WD_BREAK.LINE`. 
+ """ + type_, clear = { + WD_BREAK.LINE: (None, None), + WD_BREAK.PAGE: ('page', None), + WD_BREAK.COLUMN: ('column', None), + WD_BREAK.LINE_CLEAR_LEFT: ('textWrapping', 'left'), + WD_BREAK.LINE_CLEAR_RIGHT: ('textWrapping', 'right'), + WD_BREAK.LINE_CLEAR_ALL: ('textWrapping', 'all'), + }[break_type] + br = self._r.add_br() + if type_ is not None: + br.type = type_ + if clear is not None: + br.clear = clear + + def add_picture(self, image_path_or_stream, width=None, height=None): + """ + Return an |InlineShape| instance containing the image identified by + *image_path_or_stream*, added to the end of this run. + *image_path_or_stream* can be a path (a string) or a file-like object + containing a binary image. If neither width nor height is specified, + the picture appears at its native size. If only one is specified, it + is used to compute a scaling factor that is then applied to the + unspecified dimension, preserving the aspect ratio of the image. The + native size of the picture is calculated using the dots-per-inch + (dpi) value specified in the image file, defaulting to 72 dpi if no + value is specified, as is often the case. + """ + inline = self.part.new_pic_inline(image_path_or_stream, width, height) + self._r.add_drawing(inline) + return InlineShape(inline) + + def add_tab(self): + """ + Add a ```` element at the end of the run, which Word + interprets as a tab character. + """ + self._r._add_tab() + + def add_text(self, text): + """ + Returns a newly appended |_Text| object (corresponding to a new + ```` child element) to the run, containing *text*. Compare with + the possibly more friendly approach of assigning text to the + :attr:`Run.text` property. + """ + t = self._r.add_t(text) + return _Text(t) + + @property + def bold(self): + """ + Read/write. Causes the text of the run to appear in bold. 
+ """ + return self.font.bold + + @bold.setter + def bold(self, value): + self.font.bold = value + + def clear(self): + """ + Return reference to this run after removing all its content. All run + formatting is preserved. + """ + self._r.clear_content() + return self + + @property + def font(self): + """ + The |Font| object providing access to the character formatting + properties for this run, such as font name and size. + """ + return Font(self._element) + + @property + def italic(self): + """ + Read/write tri-state value. When |True|, causes the text of the run + to appear in italics. + """ + return self.font.italic + + @italic.setter + def italic(self, value): + self.font.italic = value + + @property + def style(self): + """ + Read/write. A |_CharacterStyle| object representing the character + style applied to this run. The default character style for the + document (often `Default Character Font`) is returned if the run has + no directly-applied character style. Setting this property to |None| + removes any directly-applied character style. + """ + style_id = self._r.style + return self.part.get_style(style_id, WD_STYLE_TYPE.CHARACTER) + + @style.setter + def style(self, style_or_name): + style_id = self.part.get_style_id( + style_or_name, WD_STYLE_TYPE.CHARACTER + ) + self._r.style = style_id + + @property + def text(self): + """ + String formed by concatenating the text equivalent of each run + content child element into a Python string. Each ```` element + adds the text characters it contains. A ```` element adds + a ``\\t`` character. A ```` or ```` element each add + a ``\\n`` character. Note that a ```` element can indicate + a page break or column break as well as a line break. All ```` + elements translate to a single ``\\n`` character regardless of their + type. All other content child elements, such as ````, are + ignored. 
+ + Assigning text to this property has the reverse effect, translating + each ``\\t`` character to a ```` element and each ``\\n`` or + ``\\r`` character to a ```` element. Any existing run content + is replaced. Run formatting is preserved. + """ + return self._r.text + + @text.setter + def text(self, text): + self._r.text = text + + @property + def underline(self): + """ + The underline style for this |Run|, one of |None|, |True|, |False|, + or a value from :ref:`WdUnderline`. A value of |None| indicates the + run has no directly-applied underline value and so will inherit the + underline value of its containing paragraph. Assigning |None| to this + property removes any directly-applied underline value. A value of + |False| indicates a directly-applied setting of no underline, + overriding any inherited value. A value of |True| indicates single + underline. The values from :ref:`WdUnderline` are used to specify + other underline styles such as double, wavy, and dotted. + """ + return self.font.underline + + @underline.setter + def underline(self, value): + self.font.underline = value + + +class _Text(object): + """ + Proxy object wrapping ```` element.
+ """ + def __init__(self, t_elm): + super(_Text, self).__init__() + self._t = t_elm diff --git a/features/api-add-heading.feature b/features/api-add-heading.feature deleted file mode 100644 index 145f818f3..000000000 --- a/features/api-add-heading.feature +++ /dev/null @@ -1,28 +0,0 @@ -Feature: Add a section heading with text - In order add a section heading to a document - As a programmer using the basic python-docx API - I need a method to add a heading with its text in a single step - - Scenario: Add a heading specifying only its text - Given a document - When I add a heading specifying only its text - Then the style of the last paragraph is 'Heading1' - And the last paragraph contains the heading text - - Scenario Outline: Add a heading specifying level - Given a document - When I add a heading specifying level= - Then the style of the last paragraph is '' - - Examples: Heading level styles - | heading level | paragraph style | - | 0 | Title | - | 1 | Heading1 | - | 2 | Heading2 | - | 3 | Heading3 | - | 4 | Heading4 | - | 5 | Heading5 | - | 6 | Heading6 | - | 7 | Heading7 | - | 8 | Heading8 | - | 9 | Heading9 | diff --git a/features/api-add-table.feature b/features/api-add-table.feature deleted file mode 100644 index 555385502..000000000 --- a/features/api-add-table.feature +++ /dev/null @@ -1,16 +0,0 @@ -Feature: Add a table - In order to include tablular information in a document - As a programmer using the basic python-docx API - I need a method that adds a table at the end of the document - - Scenario: Add a table specifying only row and column count - Given a document - When I add a 2 x 2 table specifying only row and column count - Then the document contains a 2 x 2 table - And the table style is 'LightShading-Accent1' - - Scenario: Add a table specifying style - Given a document - When I add a 2 x 2 table specifying style 'foobar' - Then the document contains a 2 x 2 table - And the table style is 'foobar' diff --git 
a/features/api-open-document.feature b/features/api-open-document.feature new file mode 100644 index 000000000..9f9f67c70 --- /dev/null +++ b/features/api-open-document.feature @@ -0,0 +1,16 @@ +Feature: Open a document + In order to work on a document + As a developer using python-docx + I need a way to open a document + + + Scenario: Open a specified document + Given I have python-docx installed + When I call docx.Document() with the path of a .docx file + Then document is a Document object + + + Scenario: Open the default document + Given I have python-docx installed + When I call docx.Document() with no arguments + Then document is a Document object diff --git a/features/blk-add-paragraph.feature b/features/blk-add-paragraph.feature index 73e42c4c2..f873b3775 100644 --- a/features/blk-add-paragraph.feature +++ b/features/blk-add-paragraph.feature @@ -1,6 +1,6 @@ Feature: Add a paragraph of text In order to populate the text of a document - As an python-docx developer + As a developer using python-docx I need the ability to add a paragraph Scenario: Add a paragraph using low-level text API diff --git a/features/blk-add-table.feature b/features/blk-add-table.feature index 3e3696a0f..e13143e56 100644 --- a/features/blk-add-table.feature +++ b/features/blk-add-table.feature @@ -1,6 +1,6 @@ Feature: Add a table In order to fulfill a requirement for a table in a document - As an python-docx developer + As a developer using python-docx I need the ability to add a table Scenario: Access a table diff --git a/features/cel-add-table.feature b/features/cel-add-table.feature new file mode 100644 index 000000000..5aabcee8f --- /dev/null +++ b/features/cel-add-table.feature @@ -0,0 +1,12 @@ +Feature: Add a table into a table cell + In order to nest a table within a table cell + As a developer using python-docx + I need a way to add a table to a table cell + + + Scenario: Add a table into a table cell + Given a table cell + When I add a 2 x 2 table into the first cell + Then
cell.tables[0] is a 2 x 2 table + And the width of each column is 1.5375 inches + And the width of each cell is 1.5375 inches diff --git a/features/cel-text.feature b/features/cel-text.feature index 2bd8fb055..8373f8ae7 100644 --- a/features/cel-text.feature +++ b/features/cel-text.feature @@ -1,6 +1,6 @@ Feature: Set table cell text In order to quickly populate a table cell with regular text - As an python-docx developer working with a table + As a developer using python-docx I need the ability to set the text of a table cell Scenario: Set table cell text diff --git a/features/doc-access-collections.feature b/features/doc-access-collections.feature new file mode 100644 index 000000000..0233d5989 --- /dev/null +++ b/features/doc-access-collections.feature @@ -0,0 +1,29 @@ +Feature: Access document collections + In order to operate on objects related to a document + As a developer using python-docx + I need a way to access each of the document's collections + + + Scenario: Access the inline shapes collection of a document + Given a document having inline shapes + Then document.inline_shapes is an InlineShapes object + + + Scenario: Access the paragraphs in the document body as a list + Given a document containing three paragraphs + Then document.paragraphs is a list containing three paragraphs + + + Scenario: Access the section collection of a document + Given a document having sections + Then document.sections is a Sections object + + + Scenario: Access the styles collection of a document + Given a document having styles + Then document.styles is a Styles object + + + Scenario: Access the tables collection of a document + Given a document having three tables + Then document.tables is a list containing three tables diff --git a/features/doc-access-sections.feature b/features/doc-access-sections.feature index 8cb836c42..ad2a58ad8 100644 --- a/features/doc-access-sections.feature +++ b/features/doc-access-sections.feature @@ -1,16 +1,11 @@ Feature: Access document 
sections - In order to discover and apply section-level settings + In order to operate on an individual section As a developer using python-docx - I need a way to access document sections - - - Scenario: Access section collection of a document - Given a document having three sections - Then I can access the section collection of the document - And the length of the section collection is 3 + I need access to each section in the section collection Scenario: Access section in section collection - Given a section collection - Then I can iterate over the sections + Given a section collection containing 3 sections + Then len(sections) is 3 + And I can iterate over the sections And I can access a section by index diff --git a/features/doc-add-heading.feature b/features/doc-add-heading.feature new file mode 100644 index 000000000..8c23137b7 --- /dev/null +++ b/features/doc-add-heading.feature @@ -0,0 +1,25 @@ +Feature: Add a heading paragraph + In order to add a heading to a document + As a developer using python-docx + I need a way to add a heading with its text and level in a single step + + + Scenario: Add a heading specifying only its text + Given a document having built-in styles + When I add a heading specifying only its text + Then the style of the last paragraph is 'Heading 1' + And the last paragraph contains the heading text + + + Scenario Outline: Add a heading specifying level + Given a document having built-in styles + When I add a heading specifying level= + Then the style of the last paragraph is '