formatting HTML -- anybody have a tool to do it ? -- the code I wrote - hope it helps.

Doug Easterbrook doug at artsman.com
Mon Oct 24 14:23:35 UTC 2022


A while back I asked if anybody had written an HTML formatter. I had a reply off list that I tried for a bit, but it didn’t pan out as well as it promised (thanks Kelly).

So I decided to try writing my own and, since I’d asked, I thought I should share the result. It requires TMOBJS version 2.59, which I posted about last week. Please feel free to use it if it helps you.

There are two sections to the email

1) the code (has two methods)
2) an example of how it cleans html up.


I’m in the habit of using the web site jsonlint.com to format JSON for readability

or, if you want to pretty up your JSON within Omnis, there is an Omnis command

Calculate convertedToJson as OJSON.$formatjson(pJsonField)


this is the HTML equivalent, as it were, within omnis.

Please also note that I have a separate routine that pre-parses the text and replaces accented characters with their standard HTML entity equivalents (for example, È becomes &Egrave;). That’s a separate topic; this is just the guts of the indentation and prettification code for HTML.
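
For readers who want a feel for that pre-parse step, here is a minimal Python sketch of the idea. It is not the Omnis routine mentioned above (which is not shown in this post); the function name `encode_accents` is made up for illustration, and it assumes that only non-ASCII characters should be replaced, using named entities where Python's standard `html.entities` table has one and numeric entities otherwise.

```python
from html.entities import codepoint2name


def encode_accents(text: str) -> str:
    """Replace non-ASCII characters with HTML entities; ASCII is left untouched."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII, including < > & and quotes, passes through as-is
            # so existing tags are not damaged.
            out.append(ch)
        elif code in codepoint2name:
            # Named entity from the standard table, e.g. 200 -> Egrave
            out.append(f"&{codepoint2name[code]};")
        else:
            # No named entity known: fall back to a numeric reference
            out.append(f"&#{code};")
    return "".join(out)


print(encode_accents("<p>2:00pm Matinée Show</p>"))
```

Leaving ASCII alone is a deliberate choice in this sketch: it means the function can be run over a whole HTML fragment without escaping the markup itself.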

It may not be 100% perfect, but it is doing the job at the moment.


————————————————————————————————— THE CODE ———————————————	
$FormatHTML(pFieldName,pErrorMessage)

Local variables are
lists:  commentList, tagList
Boolean:  stayAtSameLevel
Integers:  end, indent, indentCharCount, pos, start
char: content, contentNew, indentchars, originalContent, replaceHTMLtag, temp, textWithNoHTML, thisHTMLTag


# Try to format the HTML in a style similar to TemplateCleaner, for readability
#
# Parameters
# pFieldName->Reference to the HTML field we are cleaning up, if it in fact contains any HTML
# pErrorMessage->Returned error message

Calculate pErrorMessage as ''
Calculate indentCharCount as 4 ## Number of characters to add to cause an indent

# must be using latest TMOBJS 2.59 to get html words
If $ctask.$OmnisVersion<='10.2.00031315'
    Quit method kTrue
End If

# try to format the html and indent.  Remove any field tags by substituting placeholder text.  we'll put them back later
# we don't want them showing up as HTML tags
# note: low() lower-cases the whole string, text as well as tags, which is why the example output below is all lower case
Calculate originalContent as low(pFieldName)
Do tStringFields.$replace(originalContent,'<field>','***=field=***',originalContent)
Do tStringFields.$replace(originalContent,'</field>','---=field=---',originalContent)

# $findHTMLTags uses a regular expression, via TMObjs.$findpatterns(), to get all the tags in a text string.
# if there is no HTML to format, quit.  that leaves any markdown or non-html stuff untouched.
Calculate tagList as $cinst.$findHTMLTags(originalContent)
If tagList.$linecount=0
    Quit method kTrue
End If

# now try to substitute the items from the 'tagList' and format/indent
# there is a small trick to the indent.  to make it work, we add an indent for every tag that is not a closing </ tag (except BR's)
# and then we actually subtract one every time we build the display line... and that accounts for when there should be no indent
# level at all.

Calculate content as originalContent
Calculate contentNew as ''

For tagList.$line from 1 to tagList.$linecount
    Calculate thisHTMLTag as mid(originalContent,tagList.start+1,tagList.end-tagList.start)
    Calculate pos as pos(thisHTMLTag,low(content))
    If pos=0
        Calculate pErrorMessage as con('Unable to find tag ',thisHTMLTag,' when trying to format the HTML between positions ',tagList.start+1,' and ',tagList.end+1)
        Quit method kFalse
    End If

    # find out if there is anything preceding the tag.  if so, it's text and it needs to be indented by the current indent amount
    If pos>1
        # for text, remove leading and trailing CR's and spaces.  they'll come back with indents and a trailing CR in a moment
        Calculate temp as left(content,pos-1)
        Do commentList.$define(textWithNoHTML)

        # amongst the text with no html in it, there could be CR's.  It's fairly common since web pages
        # are there to display text.  Separate the text by CR's, then get rid of blank lines and strip leading and trailing spaces out
        # of the text.  We'll put the leading spaces back in based on the 'indent' level we are currently at.
        Do tStringFields.$stringtolist(temp,1,commentList,kCr,nam(textWithNoHTML),kFalse)
        For commentList.$line from 1 to commentList.$linecount
            # strip leading and trailing spaces and figure out what we want to add.  it is in each line of commentList.textWithNoHTML
            Do tStringFields.$trim(commentList.textWithNoHTML,' ')
            If len(commentList.textWithNoHTML)>0
                Calculate indentChars as jst('',con(indent,'P '))
                Calculate contentNew as con(contentNew,indentChars,commentList.textWithNoHTML,kCr)
            End If
        End For
    End If

    # process tags we want to replace or pretty up.  currently only various incarnations of the <br> tag
    Calculate stayAtSameLevel as kFalse
    Calculate replaceHTMLtag as thisHTMLTag
    If replaceHTMLtag='<br>'|replaceHTMLtag='</br>'|replaceHTMLtag='<br/>'
        Calculate stayAtSameLevel as kTrue
        Calculate replaceHTMLtag as '<br/>'
    End If

    # now see if we want to indent.  this occurs if the first two chars are NOT </
    # <br/> tags do not cause an indent as they are standalone and not paired.
    If not(stayAtSameLevel)
        If left(replaceHTMLtag,2)<>'</'
            Calculate indent as indent+indentCharCount
        End If
    End If

    # determine the number of characters to indent for this line of HTML
    Calculate indentChars as jst('',con(indent-indentCharCount+stayAtSameLevel*indentCharCount,'P '))
    Calculate contentNew as con(contentNew,indentChars,replaceHTMLtag,kCr)

    # chop the characters we just replaced from the original content based on length of original tag.
    # so that we can process the next group of characters
    Calculate content as mid(content,pos+len(thisHTMLTag))

    # now see if we want to un-indent, but only if first two chars are </
    # BR's don't count, or anything that is deemed as 'stayAtSameLevel'
    If not(stayAtSameLevel)
        If left(replaceHTMLtag,2)='</'
            Calculate indent as max(indent-indentCharCount,0)
        End If
    End If

End For

# anything left in 'content' after the last tag is trailing text.  add it at the current indent level so it is not lost
# (untested addition; it reuses the same calls as the text handling inside the loop)
Do commentList.$define(textWithNoHTML)
Do tStringFields.$stringtolist(content,1,commentList,kCr,nam(textWithNoHTML),kFalse)
For commentList.$line from 1 to commentList.$linecount
    Do tStringFields.$trim(commentList.textWithNoHTML,' ')
    If len(commentList.textWithNoHTML)>0
        Calculate contentNew as con(contentNew,jst('',con(indent,'P ')),commentList.textWithNoHTML,kCr)
    End If
End For

# put the field tags back since they are plain text and reformatted
Do tStringFields.$replace(contentNew,'***=field=***','<field>',contentNew)
Do tStringFields.$replace(contentNew,'---=field=---','</field>',contentNew)

# remove any double CR's from the updated html
While pos(con(kCr,kCr),contentNew)
    Do tStringFields.$replace(contentNew,con(kCr,kCr),kCr,contentNew)
End While

# trim out leading and trailing blank lines, if we find any and return the result
Do tStringFields.$trim(contentNew,kCr)
Calculate pFieldName as contentNew

Quit method kTrue

Calculate start as start
Calculate end as end
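
For anyone who wants to try the same idea outside Omnis, the method above can be approximated in Python. This is a rough sketch and not a port: the name `format_html` is made up, it tracks a nesting depth directly instead of using the add-then-subtract indent trick, it uses "\n" where the Omnis code uses kCr, it does not lower-case the input, and it also keeps any text that follows the last tag. The regular expression is the one used by $findHTMLTags.

```python
import re

TAG_PATTERN = re.compile(r"<\/?\s*([\w\d]+)\s*[^>]*>")   # same regex as $findHTMLTags
BR_FORMS = {"<br>", "</br>", "<br/>"}                    # all normalized to <br/>
FIELD_MARKERS = {"<field>": "***=field=***", "</field>": "---=field=---"}


def format_html(html: str, indent_size: int = 4) -> str:
    """Put every tag and every run of text on its own line, indented by depth."""
    original = html
    # Hide <field> markers so they are treated as text, not as tags.
    for tag, marker in FIELD_MARKERS.items():
        html = html.replace(tag, marker)

    matches = list(TAG_PATTERN.finditer(html))
    if not matches:
        return original          # no HTML: leave markdown or plain text untouched

    lines = []
    depth = 0                    # nesting level (the Omnis code counts characters)
    cursor = 0                   # position just past the previous tag

    def add_text(chunk: str) -> None:
        # Split on line breaks, drop blank lines, re-indent at the current depth.
        for piece in chunk.splitlines():
            piece = piece.strip()
            if piece:
                lines.append(" " * (depth * indent_size) + piece)

    for match in matches:
        add_text(html[cursor:match.start()])
        tag = match.group(0)
        if tag.lower() in BR_FORMS:
            # Standalone tag: same level as the surrounding text, no depth change.
            lines.append(" " * (depth * indent_size) + "<br/>")
        elif tag.startswith("</"):
            # Closing tag: step back out first, never below zero.
            depth = max(depth - 1, 0)
            lines.append(" " * (depth * indent_size) + tag)
        else:
            # Opening tag: print at the current level, then step in.
            lines.append(" " * (depth * indent_size) + tag)
            depth += 1
        cursor = match.end()

    add_text(html[cursor:])      # text after the last tag

    result = "\n".join(lines)
    for tag, marker in FIELD_MARKERS.items():
        result = result.replace(marker, tag)
    return result


print(format_html("<p>7:30pm Start Time<br>2:00pm Matinée Show on Sunday</p>"))
```

Because blank lines are dropped as the text is split, the sketch does not need the separate pass that removes double line breaks. Like the Omnis method, it only knows about `<br>` as a standalone tag, so other unpaired tags would push the indent one level deeper each time they appear.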


—————— and method used in above

$findHTMLTags (pText)

pText is a string

local variables
lists:   Patterns, Matches


# RegEx meaning:
#
#   <             # Left angle bracket
#   \/?           # Forward slash (escaped - optional)
#  \s*            # Spaces (optional)
#  ([\w\d]+)  # Any letter or digit (many - grouped)
#  \s*            # Spaces (optional)
#  [^>]*        # Any character that isn't a right angle bracket (many - optional)
#  >              # Right angle bracket
#
Calculate patterns as TMObjs.$makeparamrow('tag','<\/?\s*([\w\d]+)\s*[^>]*>')

# matches.$define(Order, Name, Content, Start, End)  -- "Name" will always be "tag", due to patterns above
Calculate matches as tStringFields.$findpatterns(patterns,pText)
Quit method matches
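
To experiment with the pattern outside Omnis, a small Python sketch of the same tag search follows. The function name `find_html_tags` and the tuple layout are assumptions made for illustration; the regular expression is copied from the method above, and the start and end values are Python's 0-based offsets with an exclusive end, which appears to line up with how $FormatHTML uses the start and end columns.

```python
import re

# Pattern copied from $findHTMLTags. The \d is redundant inside [\w\d]
# (since \w already covers digits) but is kept to match the original.
TAG_PATTERN = re.compile(r"<\/?\s*([\w\d]+)\s*[^>]*>")


def find_html_tags(text: str):
    """Return a list of (name, content, start, end) tuples, one per tag found."""
    return [
        (m.group(1), m.group(0), m.start(), m.end())
        for m in TAG_PATTERN.finditer(text)
    ]


sample = '<p class="x">Hi<br>there</p><!-- note -->'
for name, content, start, end in find_html_tags(sample):
    print(name, content, start, end)
```

Two things are worth knowing about this pattern. An HTML comment such as `<!-- note -->` is not matched, because `!` is not a word character, so comments are treated as ordinary text. On the other hand, anything that merely looks like a tag is matched, which is why `<class="text-center">` in the example further down is indented as if it were a real element.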




————————————————— Example output ————————————————————	


what does it do to text?

here’s what I put into a text field.

<p class="text-center"><class="text-center"></class="text-center"></p><p>Renewal Deadline is November 18th, 2022</p>
<p>7:30pm Start Time<br>2:00pm Matinée Show on Sunday</p>
<p>Season Ticket Prices<br>Adult - $90<br>Student $39</p><p>We are excited to welcome you back to the Powerhouse theatre! We will be kicking off our season with <strong>Miracle on 34th Street the Musical.</strong> Book, music, and lyrics by Meredith Willson. When a department store Santa claims he’s the real Kris Kringle, a wave of love spreads across New York City that melts even the most cynical hearts. A holiday classic based on the movie of the same name, Miracle on 34th Street is a joyous, heartwarming musical for the whole family.</p>
<p>We follow this up with the smash hit play “<strong>The Mousetrap</strong>” by Agatha Christie, a “whodunit” with a <em>twist</em>. This thrilling production is THE genre-defining murder mystery from the best-selling novelist of all time… case closed! <br> <br>Our final production of the season, <strong>My Old Lady</strong> by Israel Horovitz, is a thoughtful and charming emotional drama. Mathias shows up in Paris, having been left a handsome apartment by his father; but to his horror, he soon finds he has inherited more than simply bricks and mortar…<br></p>


and what comes out is a more readable version, with every <br> variant normalized to <br/> (the text is also lower-cased, because the method runs low() over the content):


<p class="text-center">
    <class="text-center">
    </class="text-center">
</p>
<p>
    renewal deadline is november 18th, 2022
</p>
<p>
    7:30pm start time
    <br/>
    2:00pm matinée show on sunday
</p>
<p>
    season ticket prices
    <br/>
    adult - $90
    <br/>
    student $39
</p>
<p>
    we are excited to welcome you back to the powerhouse theatre! we will be kicking off our season with
    <strong>
        miracle on 34th street the musical.
    </strong>
    book, music, and lyrics by meredith willson. when a department store santa claims he’s the real kris kringle, a wave of love spreads across new york city that melts even the most cynical hearts. a holiday classic based on the movie of the same name, miracle on 34th street is a joyous, heartwarming musical for the whole family.
</p>
<p>
    we follow this up with the smash hit play “
    <strong>
        the mousetrap
    </strong>
    ” by agatha christie, a “whodunit” with a
    <em>
        twist
    </em>
    . this thrilling production is the genre-defining murder mystery from the best-selling novelist of all time… case closed!
    <br/>
    <br/>
    our final production of the season,
    <strong>
        my old lady
    </strong>
    by israel horovitz, is a thoughtful and charming emotional drama. mathias shows up in paris, having been left a handsome apartment by his father; but to his horror, he soon finds he has inherited more than simply bricks and mortar…
    <br/>
</p>








Doug Easterbrook
Arts Management Systems Ltd.
mailto:doug at artsman.com
http://www.artsman.com
Phone (403) 650-1978

> On Oct 13, 2022, at 3:18 PM, Doug Easterbrook via omnisdev-en <omnisdev-en at lists.omnis-dev.com> wrote:
> 
> hi all.
> 
> there is a JSON method on omnis called oJSON.$formatJSON(string) returns formattedJson.
> 
> I’ve taken to using it to pretty up json for display in a multi line text field.   Very useful.   makes JSON easy to read.
> 
> 
> I want to do the same with HTML that will take code like
> 
> <div class="someclass"> <b> I am text with accentedcharacter </b></div> 
> 
> and turn it into a pretty format like below where, if there is an accented character, it replaces it with the accepted HTML equivalent, such as è becomes &egrave;
> 
> the pretty form would be like
> 
> <div class="someclass">
>  <b>
>   I am text with &egrave;
>  </b>
> </div> 
> 
> 
> 
> 
> I have a small python program/library that does this, so I can go through the trouble of writing a file and converting it and reading it back in.
> 
> 
> I wondered if anybody had done this within omnis using some tool or had some code to do it that might be shared?