diff --git a/Doc/data/stable_abi.dat b/Doc/data/stable_abi.dat
index 5cbf3771950fc0..95e032655cf0cc 100644
--- a/Doc/data/stable_abi.dat
+++ b/Doc/data/stable_abi.dat
@@ -160,6 +160,7 @@ func,PyDict_Merge,3.2,,
 func,PyDict_MergeFromSeq2,3.2,,
 func,PyDict_New,3.2,,
 func,PyDict_Next,3.2,,
+func,PyDict_SetDefaultRef,3.15,,
 func,PyDict_SetItem,3.2,,
 func,PyDict_SetItemString,3.2,,
 func,PyDict_Size,3.2,,
diff --git a/Doc/howto/a-conceptual-overview-of-asyncio.rst b/Doc/howto/a-conceptual-overview-of-asyncio.rst
index af1e39480cc1f6..6800a24bc9565d 100644
--- a/Doc/howto/a-conceptual-overview-of-asyncio.rst
+++ b/Doc/howto/a-conceptual-overview-of-asyncio.rst
@@ -1,7 +1,7 @@
 .. _a-conceptual-overview-of-asyncio:
 
 ****************************************
-A Conceptual Overview of :mod:`!asyncio`
+A conceptual overview of :mod:`!asyncio`
 ****************************************
 
 This :ref:`HOWTO ` article seeks to help you build a sturdy mental
@@ -37,15 +37,15 @@ In part 1, we'll cover the main, high-level building blocks of :mod:`!asyncio`:
 the event loop, coroutine functions, coroutine objects, tasks, and ``await``.
 
 ==========
-Event Loop
+Event loop
 ==========
 
 Everything in :mod:`!asyncio` happens relative to the event loop.
-It's the star of the show.
+It's the star of the show, but prefers to work behind the scenes, managing
+and coordinating resources.
 It's like an orchestra conductor.
-It's behind the scenes managing resources.
 Some power is explicitly granted to it, but a lot of its ability to get things
-done comes from the respect and cooperation of its worker bees.
+done comes from the respect and cooperation of its band members.
 
 In more technical terms, the event loop contains a collection of jobs to be run.
 Some jobs are added directly by you, and some indirectly by :mod:`!asyncio`.
@@ -59,7 +59,7 @@ This process repeats indefinitely, with the event loop cycling endlessly
 onwards.
 If there are no more jobs pending execution, the event loop is smart enough to
 rest and avoid needlessly wasting CPU cycles, and will come back when there's
-more work to be done.
+more work to be done, such as when I/O operations complete or timers expire.
 
 Effective execution relies on jobs sharing well and cooperating; a greedy job
 could hog control and leave the other jobs to starve, rendering the overall
@@ -170,14 +170,14 @@ Roughly speaking, :ref:`tasks ` are coroutines (not coroutine
 functions) tied to an event loop.
 A task also maintains a list of callback functions whose importance will become
 clear in a moment when we discuss :keyword:`await`.
-The recommended way to create tasks is via :func:`asyncio.create_task`.
 
 Creating a task automatically schedules it for execution (by adding a
 callback to run it in the event loop's to-do list, that is, collection of jobs).
+The recommended way to create tasks is via :func:`asyncio.create_task`.
 
-Since there's only one event loop (in each thread), :mod:`!asyncio` takes care of
-associating the task with the event loop for you. As such, there's no need
-to specify the event loop.
+Since there's only one event loop (in each thread), :mod:`!asyncio` takes
+care of associating the task with the event loop for you.
+As such, there's no need to specify the event loop.
 
 ::
 
@@ -250,6 +250,10 @@ different ways::
 In a crucial way, the behavior of ``await`` depends on the type of object
 being awaited.
 
+^^^^^^^^^^^^^^
+Awaiting tasks
+^^^^^^^^^^^^^^
+
 Awaiting a task will cede control from the current task or coroutine to
 the event loop.
 In the process of relinquishing control, a few important things happen.
@@ -281,6 +285,10 @@ This is a basic, yet reliable mental model.
 In practice, the control handoffs are slightly more complex, but not by much.
 In part 2, we'll walk through the details that make this possible.
 
+^^^^^^^^^^^^^^^^^^^
+Awaiting coroutines
+^^^^^^^^^^^^^^^^^^^
+
 **Unlike tasks, awaiting a coroutine does not hand control back to the event
 loop!**
 Wrapping a coroutine in a task first, then awaiting that would cede
@@ -347,8 +355,10 @@ The design intentionally trades off some conceptual clarity around usage of
 ``await`` for improved performance.
 Each time a task is awaited, control needs to be passed all the way up the
 call stack to the event loop.
-That might sound minor, but in a large program with many ``await`` statements and a deep
-call stack, that overhead can add up to a meaningful performance drag.
+Then, the event loop needs to manage its internal state and work through
+its processing logic to resume the next job.
+That might sound minor, but in a large program with many ``await``\ s, that
+overhead can add up to a non-negligible performance drag.
 
 ------------------------------------------------
 A conceptual overview part 2: the nuts and bolts
@@ -364,7 +374,8 @@ and how to make your own asynchronous operators.
 The inner workings of coroutines
 ================================
 
-:mod:`!asyncio` leverages four components to pass around control.
+:mod:`!asyncio` leverages four components of Python to pass
+around control.
 
 :meth:`coroutine.send(arg) ` is the method used to start or resume a
 coroutine.
@@ -448,9 +459,9 @@ That might sound odd to you. You might be thinking:
    That causes the error: ``SyntaxError: yield from not allowed in a coroutine.``
    This was intentionally designed for the sake of simplicity -- mandating only
    one way of using coroutines.
+   Despite that, ``yield from`` and ``await`` effectively do the same thing.
    Initially ``yield`` was barred as well, but was re-accepted to allow for
    async generators.
-   Despite that, ``yield from`` and ``await`` effectively do the same thing.
 
 =======
 Futures
diff --git a/Doc/library/gc.rst b/Doc/library/gc.rst
index 2ef5c4b35a25cc..8e6f2342a2869a 100644
--- a/Doc/library/gc.rst
+++ b/Doc/library/gc.rst
@@ -108,10 +108,16 @@ The :mod:`gc` module provides the following functions:
 
    * ``uncollectable`` is the total number of objects which were found
      to be uncollectable (and were therefore moved to the :data:`garbage`
-     list) inside this generation.
+     list) inside this generation;
+
+   * ``duration`` is the total time in seconds spent in collections for this
+     generation.
 
    .. versionadded:: 3.4
 
+   .. versionchanged:: next
+      Add ``duration``.
+
 
 .. function:: set_threshold(threshold0, [threshold1, [threshold2]])
 
@@ -313,6 +319,9 @@ values but should not rebind them):
       "uncollectable": When *phase* is "stop", the number of objects
       that could not be collected and were put in :data:`garbage`.
 
+      "duration": When *phase* is "stop", the time in seconds spent in the
+      collection.
+
    Applications can add their own callbacks to this list. The primary
    use cases are:
 
@@ -325,6 +334,9 @@ values but should not rebind them):
 
    .. versionadded:: 3.3
 
+   .. versionchanged:: next
+      Add "duration".
+
 
 The following constants are provided for use with :func:`set_debug`:
 
diff --git a/Doc/reference/datamodel.rst b/Doc/reference/datamodel.rst
index 882b05e87319fa..ebadbc215a0eed 100644
--- a/Doc/reference/datamodel.rst
+++ b/Doc/reference/datamodel.rst
@@ -2630,8 +2630,8 @@ Notes on using *__slots__*:
   descriptor directly from the base class). This renders the meaning of the
   program undefined. In the future, a check may be added to prevent this.
 
-* :exc:`TypeError` will be raised if nonempty *__slots__* are defined for a
-  class derived from a
+* :exc:`TypeError` will be raised if *__slots__* other than *__dict__* and
+  *__weakref__* are defined for a class derived from a
   :c:member:`"variable-length" built-in type ` such as
   :class:`int`, :class:`bytes`, and :class:`tuple`.
 
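A rough illustration of the ``__slots__`` rule reworded in the hunk above; it is a sketch and not part of the patch. The helper name ``try_subclass`` is hypothetical. Whether ``__dict__`` alone is accepted on a variable-length built-in type depends on the interpreter version, so the sketch only reports that result instead of assuming it.

```python
import weakref


def try_subclass(base, slots):
    """Return a subclass of *base* with the given __slots__, or the TypeError raised."""
    try:
        return type("Sub", (base,), {"__slots__": slots})
    except TypeError as exc:
        return exc


# A named slot on a variable-length built-in type is rejected, before and after
# the change described in the hunk.
named = try_subclass(tuple, ("x",))
print("named slot on tuple:", type(named).__name__)

# Per the versionchanged note, '__dict__' alone is accepted once the change
# lands; interpreters without it still raise TypeError here.
special = try_subclass(tuple, ("__dict__",))
print("'__dict__' slot on tuple accepted:", isinstance(special, type))

# For an ordinary class, a '__weakref__' slot enables weak references.
Weak = try_subclass(object, ("__weakref__",))
obj = Weak()
print("weak reference resolves:", weakref.ref(obj)() is obj)
```

The same checks appear in fuller form in ``test_slots_special_after_items`` later in this patch, which also covers ``int`` and ``bytes``.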
@@ -2656,6 +2656,10 @@ Notes on using *__slots__*:
   of the iterator's values. However, the *__slots__* attribute will be an empty
   iterator.
 
+.. versionchanged:: next
+   Allowed defining the *__dict__* and *__weakref__* *__slots__* for any class.
+
+
 .. _class-customization:
 
 Customizing class creation
diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst
index 5a98297d3f8847..d0af9212d55567 100644
--- a/Doc/whatsnew/3.15.rst
+++ b/Doc/whatsnew/3.15.rst
@@ -394,6 +394,10 @@ Other language changes
   syntax warnings by module name.
   (Contributed by Serhiy Storchaka in :gh:`135801`.)
 
+* Allowed defining the *__dict__* and *__weakref__* :ref:`__slots__ `
+  for any class.
+  (Contributed by Serhiy Storchaka in :gh:`41779`.)
+
 
 New modules
 ===========
diff --git a/Include/cpython/dictobject.h b/Include/cpython/dictobject.h
index df9ec7050fca1a..5f2f7b6d4f56bd 100644
--- a/Include/cpython/dictobject.h
+++ b/Include/cpython/dictobject.h
@@ -39,16 +39,6 @@ Py_DEPRECATED(3.14) PyAPI_FUNC(PyObject *) _PyDict_GetItemStringWithError(PyObje
 PyAPI_FUNC(PyObject *) PyDict_SetDefault(
     PyObject *mp, PyObject *key, PyObject *defaultobj);
 
-// Inserts `key` with a value `default_value`, if `key` is not already present
-// in the dictionary. If `result` is not NULL, then the value associated
-// with `key` is returned in `*result` (either the existing value, or the now
-// inserted `default_value`).
-// Returns:
-//   -1 on error
-//    0 if `key` was not present and `default_value` was inserted
-//    1 if `key` was present and `default_value` was not inserted
-PyAPI_FUNC(int) PyDict_SetDefaultRef(PyObject *mp, PyObject *key, PyObject *default_value, PyObject **result);
-
 /* Get the number of items of a dictionary. */
 static inline Py_ssize_t PyDict_GET_SIZE(PyObject *op) {
     PyDictObject *mp;
diff --git a/Include/dictobject.h b/Include/dictobject.h
index 1bbeec1ab699e7..0384e3131dcdb5 100644
--- a/Include/dictobject.h
+++ b/Include/dictobject.h
@@ -68,6 +68,18 @@ PyAPI_FUNC(int) PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **result
 PyAPI_FUNC(int) PyDict_GetItemStringRef(PyObject *mp, const char *key, PyObject **result);
 #endif
 
+#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030F0000
+// Inserts `key` with a value `default_value`, if `key` is not already present
+// in the dictionary. If `result` is not NULL, then the value associated
+// with `key` is returned in `*result` (either the existing value, or the now
+// inserted `default_value`).
+// Returns:
+//   -1 on error
+//    0 if `key` was not present and `default_value` was inserted
+//    1 if `key` was present and `default_value` was not inserted
+PyAPI_FUNC(int) PyDict_SetDefaultRef(PyObject *mp, PyObject *key, PyObject *default_value, PyObject **result);
+#endif
+
 #if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030A0000
 PyAPI_FUNC(PyObject *) PyObject_GenericGetDict(PyObject *, void *);
 #endif
diff --git a/Include/internal/pycore_interp_structs.h b/Include/internal/pycore_interp_structs.h
index f861d3abd96d48..d9f5d444a2dc07 100644
--- a/Include/internal/pycore_interp_structs.h
+++ b/Include/internal/pycore_interp_structs.h
@@ -179,6 +179,8 @@ struct gc_collection_stats {
     Py_ssize_t collected;
     /* total number of uncollectable objects (put into gc.garbage) */
     Py_ssize_t uncollectable;
+    // Duration of the collection in seconds:
+    double duration;
 };
 
 /* Running stats per generation */
@@ -189,6 +191,8 @@ struct gc_generation_stats {
     Py_ssize_t collected;
     /* total number of uncollectable objects (put into gc.garbage) */
     Py_ssize_t uncollectable;
+    // Duration of the collection in seconds:
+    double duration;
 };
 
 enum _GCPhase {
diff --git a/Lib/_colorize.py b/Lib/_colorize.py
index 57b712bc068d4e..7d573274328826 100644
--- a/Lib/_colorize.py
+++ b/Lib/_colorize.py
@@ -1,4 +1,3 @@
-import io
 import os
 import sys
 
@@ -330,7 +329,7 @@ def _safe_getenv(k: str, fallback: str | None = None) -> str | None:
 
     try:
         return os.isatty(file.fileno())
-    except io.UnsupportedOperation:
+    except OSError:
         return hasattr(file, "isatty") and file.isatty()
 
 
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index e50620de800d63..80fb8c3f929f6b 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -24,6 +24,7 @@
 
 entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')
 charref = re.compile('&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]')
+incomplete_charref = re.compile('&#(?:[0-9]|[xX][0-9a-fA-F])')
 attr_charref = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*)[;=]?')
 
 starttagopen = re.compile('<[a-zA-Z]')
@@ -304,10 +305,20 @@ def goahead(self, end):
                         k = k - 1
                     i = self.updatepos(i, k)
                     continue
+                match = incomplete_charref.match(rawdata, i)
+                if match:
+                    if end:
+                        self.handle_charref(rawdata[i+2:])
+                        i = self.updatepos(i, n)
+                        break
+                    # incomplete
+                    break
+                elif i + 3 < n:  # larger than "&#x"
+                    # not the end of the buffer, and can't be confused
+                    # with some other construct
+                    self.handle_data("&#")
+                    i = self.updatepos(i, i + 2)
                 else:
-                    if ";" in rawdata[i:]:  # bail by consuming &#
-                        self.handle_data(rawdata[i:i+2])
-                        i = self.updatepos(i, i+2)
                     break
             elif startswith('&', i):
                 match = entityref.match(rawdata, i)
@@ -321,15 +332,13 @@ def goahead(self, end):
                     continue
                 match = incomplete.match(rawdata, i)
                 if match:
-                    # match.group() will contain at least 2 chars
-                    if end and match.group() == rawdata[i:]:
-                        k = match.end()
-                        if k <= i:
-                            k = n
-                        i = self.updatepos(i, i + 1)
+                    if end:
+                        self.handle_entityref(rawdata[i+1:])
+                        i = self.updatepos(i, n)
+                        break
                     # incomplete
                     break
-                elif (i + 1) < n:
+                elif i + 1 < n:
                     # not the end of the buffer, and can't be confused
                     # with some other construct
                     self.handle_data("&")
diff --git a/Lib/test/test__colorize.py b/Lib/test/test__colorize.py
index 25012466840f18..67e0595943d356 100644
--- a/Lib/test/test__colorize.py
+++ b/Lib/test/test__colorize.py
@@ -166,6 +166,17 @@ def test_colorized_detection_checks_for_file(self):
             file.isatty.return_value = False
             self.assertEqual(_colorize.can_colorize(file=file), False)
 
+        # The documentation for file.fileno says:
+        # > An OSError is raised if the IO object does not use a file descriptor.
+        # gh-141570: Check OSError is caught and handled
+        with unittest.mock.patch("os.isatty", side_effect=ZeroDivisionError):
+            file = unittest.mock.MagicMock()
+            file.fileno.side_effect = OSError
+            file.isatty.return_value = True
+            self.assertEqual(_colorize.can_colorize(file=file), True)
+            file.isatty.return_value = False
+            self.assertEqual(_colorize.can_colorize(file=file), False)
+
 
 if __name__ == "__main__":
     unittest.main()
diff --git a/Lib/test/test_descr.py b/Lib/test/test_descr.py
index 14f94285d3f3c2..82a48ad4d1aced 100644
--- a/Lib/test/test_descr.py
+++ b/Lib/test/test_descr.py
@@ -1329,18 +1329,17 @@ class D(object):
         self.assertNotHasAttr(a, "__weakref__")
         a.foo = 42
         self.assertEqual(a.__dict__, {"foo": 42})
+        with self.assertRaises(TypeError):
+            weakref.ref(a)
 
         class W(object):
             __slots__ = ["__weakref__"]
         a = W()
         self.assertHasAttr(a, "__weakref__")
         self.assertNotHasAttr(a, "__dict__")
-        try:
+        with self.assertRaises(AttributeError):
             a.foo = 42
-        except AttributeError:
-            pass
-        else:
-            self.fail("shouldn't be allowed to set a.foo")
+        self.assertIs(weakref.ref(a)(), a)
 
         class C1(W, D):
             __slots__ = []
@@ -1349,6 +1348,7 @@ class C1(W, D):
         self.assertHasAttr(a, "__weakref__")
         a.foo = 42
         self.assertEqual(a.__dict__, {"foo": 42})
+        self.assertIs(weakref.ref(a)(), a)
 
         class C2(D, W):
             __slots__ = []
@@ -1357,6 +1357,77 @@ class C2(D, W):
         self.assertHasAttr(a, "__weakref__")
         a.foo = 42
         self.assertEqual(a.__dict__, {"foo": 42})
+        self.assertIs(weakref.ref(a)(), a)
+
+    @unittest.skipIf(_testcapi is None, 'need the _testcapi module')
+    def test_slots_special_before_items(self):
+        class D(_testcapi.HeapCCollection):
+            __slots__ = ["__dict__"]
+        a = D(1, 2, 3)
+        self.assertHasAttr(a, "__dict__")
+        self.assertNotHasAttr(a, "__weakref__")
+        a.foo = 42
+        self.assertEqual(a.__dict__, {"foo": 42})
+        with self.assertRaises(TypeError):
+            weakref.ref(a)
+        del a.__dict__
+        self.assertNotHasAttr(a, "foo")
+        self.assertEqual(a.__dict__, {})
+        self.assertEqual(list(a), [1, 2, 3])
+
+        class W(_testcapi.HeapCCollection):
+            __slots__ = ["__weakref__"]
+        a = W(1, 2, 3)
+        self.assertHasAttr(a, "__weakref__")
+        self.assertNotHasAttr(a, "__dict__")
+        with self.assertRaises(AttributeError):
+            a.foo = 42
+        self.assertIs(weakref.ref(a)(), a)
+
+        with self.assertRaises(TypeError):
+            class X(_testcapi.HeapCCollection):
+                __slots__ = ['x']
+
+        with self.assertRaises(TypeError):
+            class X(_testcapi.HeapCCollection):
+                __slots__ = ['__dict__', 'x']
+
+    @support.subTests(('base', 'arg'), [
+        (tuple, (1, 2, 3)),
+        (int, 9876543210**2),
+        (bytes, b'ab'),
+    ])
+    def test_slots_special_after_items(self, base, arg):
+        class D(base):
+            __slots__ = ["__dict__"]
+        a = D(arg)
+        self.assertHasAttr(a, "__dict__")
+        self.assertNotHasAttr(a, "__weakref__")
+        a.foo = 42
+        self.assertEqual(a.__dict__, {"foo": 42})
+        with self.assertRaises(TypeError):
+            weakref.ref(a)
+        del a.__dict__
+        self.assertNotHasAttr(a, "foo")
+        self.assertEqual(a.__dict__, {})
+        self.assertEqual(a, base(arg))
+
+        class W(base):
+            __slots__ = ["__weakref__"]
+        a = W(arg)
+        self.assertHasAttr(a, "__weakref__")
+        self.assertNotHasAttr(a, "__dict__")
+        with self.assertRaises(AttributeError):
+            a.foo = 42
+        self.assertIs(weakref.ref(a)(), a)
+        self.assertEqual(a, base(arg))
+
+        with self.assertRaises(TypeError):
+            class X(base):
+                __slots__ = ['x']
+        with self.assertRaises(TypeError):
+            class X(base):
+                __slots__ = ['__dict__', 'x']
 
     def test_slots_special2(self):
         # Testing __qualname__ and __classcell__ in __slots__
diff --git a/Lib/test/test_gc.py b/Lib/test/test_gc.py
index 10c3a622107714..e65da0f61d944f 100644
--- a/Lib/test/test_gc.py
+++ b/Lib/test/test_gc.py
@@ -847,10 +847,11 @@ def test_get_stats(self):
         for st in stats:
             self.assertIsInstance(st, dict)
             self.assertEqual(set(st),
-                             {"collected", "collections", "uncollectable"})
+                             {"collected", "collections", "uncollectable", "duration"})
             self.assertGreaterEqual(st["collected"], 0)
             self.assertGreaterEqual(st["collections"], 0)
             self.assertGreaterEqual(st["uncollectable"], 0)
+            self.assertGreaterEqual(st["duration"], 0)
         # Check that collection counts are incremented correctly
         if gc.isenabled():
             self.addCleanup(gc.enable)
@@ -861,11 +862,25 @@ def test_get_stats(self):
         self.assertEqual(new[0]["collections"], old[0]["collections"] + 1)
         self.assertEqual(new[1]["collections"], old[1]["collections"])
         self.assertEqual(new[2]["collections"], old[2]["collections"])
+        self.assertGreater(new[0]["duration"], old[0]["duration"])
+        self.assertEqual(new[1]["duration"], old[1]["duration"])
+        self.assertEqual(new[2]["duration"], old[2]["duration"])
+        for stat in ["collected", "uncollectable"]:
+            self.assertGreaterEqual(new[0][stat], old[0][stat])
+            self.assertEqual(new[1][stat], old[1][stat])
+            self.assertEqual(new[2][stat], old[2][stat])
         gc.collect(2)
-        new = gc.get_stats()
-        self.assertEqual(new[0]["collections"], old[0]["collections"] + 1)
+        old, new = new, gc.get_stats()
+        self.assertEqual(new[0]["collections"], old[0]["collections"])
         self.assertEqual(new[1]["collections"], old[1]["collections"])
         self.assertEqual(new[2]["collections"], old[2]["collections"] + 1)
+        self.assertEqual(new[0]["duration"], old[0]["duration"])
+        self.assertEqual(new[1]["duration"], old[1]["duration"])
+        self.assertGreater(new[2]["duration"], old[2]["duration"])
+        for stat in ["collected", "uncollectable"]:
+            self.assertEqual(new[0][stat], old[0][stat])
+            self.assertEqual(new[1][stat], old[1][stat])
+            self.assertGreaterEqual(new[2][stat], old[2][stat])
 
     def test_freeze(self):
         gc.freeze()
@@ -1298,9 +1313,10 @@ def test_collect(self):
         # Check that we got the right info dict for all callbacks
         for v in self.visit:
             info = v[2]
-            self.assertTrue("generation" in info)
-            self.assertTrue("collected" in info)
-            self.assertTrue("uncollectable" in info)
+            self.assertIn("generation", info)
+            self.assertIn("collected", info)
+            self.assertIn("uncollectable", info)
+            self.assertIn("duration", info)
 
     def test_collect_generation(self):
         self.preclean()
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
index 19dde9362a43b6..e4eff1ea17a670 100644
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -109,12 +109,13 @@ def get_events(self):
 
 class TestCaseBase(unittest.TestCase):
 
-    def get_collector(self):
-        return EventCollector(convert_charrefs=False)
+    def get_collector(self, convert_charrefs=False):
+        return EventCollector(convert_charrefs=convert_charrefs)
 
-    def _run_check(self, source, expected_events, collector=None):
+    def _run_check(self, source, expected_events,
+                   *, collector=None, convert_charrefs=False):
         if collector is None:
-            collector = self.get_collector()
+            collector = self.get_collector(convert_charrefs=convert_charrefs)
         parser = collector
         for s in source:
             parser.feed(s)
@@ -128,7 +129,7 @@ def _run_check(self, source, expected_events, collector=None):
 
     def _run_check_extra(self, source, events):
         self._run_check(source, events,
-                        EventCollectorExtra(convert_charrefs=False))
+                        collector=EventCollectorExtra(convert_charrefs=False))
 
 
 class HTMLParserTestCase(TestCaseBase):
@@ -187,10 +188,87 @@ def test_malformatted_charref(self):
         ])
 
     def test_unclosed_entityref(self):
-        self._run_check("&entityref foo", [
-            ("entityref", "entityref"),
-            ("data", " foo"),
-        ])
+        self._run_check('&gt &lt', [('entityref', 'gt'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&gt &lt', [('data', '> <')], convert_charrefs=True)
+
+        self._run_check('&undefined &lt',
+                        [('entityref', 'undefined'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&undefined &lt', [('data', '&undefined <')],
+                        convert_charrefs=True)
+
+        self._run_check('&gtundefined &lt',
+                        [('entityref', 'gtundefined'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&gtundefined &lt', [('data', '>undefined <')],
+                        convert_charrefs=True)
+
+        self._run_check('& &lt', [('data', '& '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('& &lt', [('data', '& <')], convert_charrefs=True)
+
+    def test_eof_in_entityref(self):
+        self._run_check('&gt', [('entityref', 'gt')], convert_charrefs=False)
+        self._run_check('&gt', [('data', '>')], convert_charrefs=True)
+
+        self._run_check('&g', [('entityref', 'g')], convert_charrefs=False)
+        self._run_check('&g', [('data', '&g')], convert_charrefs=True)
+
+        self._run_check('&undefined', [('entityref', 'undefined')],
+                        convert_charrefs=False)
+        self._run_check('&undefined', [('data', '&undefined')],
+                        convert_charrefs=True)
+
+        self._run_check('&gtundefined', [('entityref', 'gtundefined')],
+                        convert_charrefs=False)
+        self._run_check('&gtundefined', [('data', '>undefined')],
+                        convert_charrefs=True)
+
+        self._run_check('&', [('data', '&')], convert_charrefs=False)
+        self._run_check('&', [('data', '&')], convert_charrefs=True)
+
+    def test_unclosed_charref(self):
+        self._run_check('&#123 &lt', [('charref', '123'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&#123 &lt', [('data', '{ <')], convert_charrefs=True)
+        self._run_check('&#xab &lt', [('charref', 'xab'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&#xab &lt', [('data', '\xab <')], convert_charrefs=True)
+
+        self._run_check('&#123456789 &lt',
+                        [('charref', '123456789'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&#123456789 &lt', [('data', '\ufffd <')],
+                        convert_charrefs=True)
+        self._run_check('&#x123456789 &lt',
+                        [('charref', 'x123456789'), ('data', ' '), ('entityref', 'lt')],
+                        convert_charrefs=False)
+        self._run_check('&#x123456789 &lt', [('data', '\ufffd <')],
+                        convert_charrefs=True)
+
+        self._run_check('&# &lt', [('data', '&# '), ('entityref', 'lt')], convert_charrefs=False)
+        self._run_check('&# &lt', [('data', '&# <')], convert_charrefs=True)
+        self._run_check('&#x &lt', [('data', '&#x '), ('entityref', 'lt')], convert_charrefs=False)
+        self._run_check('&#x &lt', [('data', '&#x <')], convert_charrefs=True)
+
+    def test_eof_in_charref(self):
+        self._run_check('&#123', [('charref', '123')], convert_charrefs=False)
+        self._run_check('&#123', [('data', '{')], convert_charrefs=True)
+        self._run_check('&#xab', [('charref', 'xab')], convert_charrefs=False)
+        self._run_check('&#xab', [('data', '\xab')], convert_charrefs=True)
+
+        self._run_check('&#123456789', [('charref', '123456789')],
+                        convert_charrefs=False)
+        self._run_check('&#123456789', [('data', '\ufffd')], convert_charrefs=True)
+        self._run_check('&#x123456789', [('charref', 'x123456789')],
+                        convert_charrefs=False)
+        self._run_check('&#x123456789', [('data', '\ufffd')], convert_charrefs=True)
+
+        self._run_check('&#', [('data', '&#')], convert_charrefs=False)
+        self._run_check('&#', [('data', '&#')], convert_charrefs=True)
+        self._run_check('&#x', [('data', '&#x')], convert_charrefs=False)
+        self._run_check('&#x', [('data', '&#x')], convert_charrefs=True)
 
     def test_bad_nesting(self):
         # Strangely, this *is* supposed to test that overlapping
@@ -762,20 +840,6 @@ def test_correct_detection_of_start_tags(self):
         ]
         self._run_check(html, expected)
 
-    def test_EOF_in_charref(self):
-        # see #17802
-        # This test checks that the UnboundLocalError reported in the issue
-        # is not raised, however I'm not sure the returned values are correct.
-        # Maybe HTMLParser should use self.unescape for these
-        data = [
-            ('a&', [('data', 'a&')]),
-            ('a&b', [('data', 'ab')]),
-            ('a&b ', [('data', 'a'), ('entityref', 'b'), ('data', ' ')]),
-            ('a&b;', [('data', 'a'), ('entityref', 'b')]),
-        ]
-        for html, expected in data:
-            self._run_check(html, expected)
-
     def test_eof_in_comments(self):
         data = [
             ('