[go: up one dir, main page]

Menu

Code Merge Request #2: Implement HTML letters parsing with Beautiful Soup (rejected)

Merging...

Merged

Something went wrong. Please, merge manually

Checking if merge is possible...

Something went wrong. Please, merge manually

Igor Ippolitov wants to merge 1 commit from /u/iippolitov/roundup/ to default, 2017-10-14

Hello.

We are using roundup as an issue tracker mostly using email.
Most of our users send us emails in HTML. They either don't know how to switch to plain text or don't will to change email format to plain text.

This patch implements mostly trivial change: it parses HTML parts if there are any. Parsing relies on BS4 which becomes the only dependency of roundup.

If this is unacceptable solution - I can make this a conditional dependency.

Commit Date  
[2ee03a] (defaulttip) by Ippolitov A. Igor Ippolitov A. Igor

Implement HTML letters parsing with Beautiful Soup

2017-08-07 13:18:46 Tree

Discussion

  • Bernhard Reiter

    Bernhard Reiter - 2017-08-07

    Hi Igor, thanks for the merge request.

    Can you add some discussion about the security of this change?
    (Security concerns have been discussed a number of times regarding HTML attachments.)

    From my point of view I think that making BS and the HTML parsing optional will make it easier to merge the request.

    Best Regards,
    Bernhard

     
  • John Rouillard

    John Rouillard - 2017-10-09

    Beautiful soup is throwing errors on my system. Some undefined value from token.py.
    I do have a replacement html2text that works. I applied your patch and imported my html2text function.

    When I run: ./run_tests test/test_mailgw.py I get 5 failing tests.

    The failures are in two classes:

        elif content_type == 'text/html':
    
          content = html2text(self.getbody()).encode('utf-8')
    

    E UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

    which is a problem I assume with my html2text replacement. If I remove the encode things seems to work (tests are still running).

    in extract_content()
    ...
    elif unpack_rfc822 and content_type == 'message/rfc822':
    ...
    new_content, attachments = m.extract_content(m.gettype(), ig,

              unpack_rfc822)
    

    E ValueError: too many values to unpack

    This is easily fixable adding a html_part to the left hand side I hope.

    Is this merge request linked to the roundup issue tracker?

    -- rouilj

     
  • John Rouillard

    John Rouillard - 2017-10-14

    Igor, I merged/fixed your patch for mailgw.py. Since this was not a direct merge and required editing and creation of tests etc I am marking it as rejected.

    Also in writing tests, a text/html part followed by a text/plain part resulted in duplicate text/html attachment. To fix this I changed:

                 elif html_part_found:
    
    • text/plain part found after html

    • save html as attachment

    • attachments.append(cpart.as_attachment())

    to remove the second attachments call as the html file was attached in the code path that set html_part_found.

    Changesets that incorporate this patch and fixes are:

    changeset: 5305:e20f472fde7d
    changeset: 5306:91354bf0b683
    changeset: 5307:5b4931cfc182

    Bern, barring flaws in beautiful soup or html parsing libraries, I don't think there are any worse security implications to this patch. If there was a text/plain version of the file we can still attach the html version of the file to the issue which if downloaded as html could be used as an attack vector.

    -- rouilj

     
  • John Rouillard

    John Rouillard - 2017-10-14
    • Status: open --> rejected
     

Log in to post a comment.