GeekTips | United States America (US)

mkdir -p splitmd
for f in *.md; do
  awk '
    BEGIN {
      # Abbreviations that end in a period but do not end a sentence
      split("Mr. Mrs. Ms. Dr. U.S. U.S.A. i.e. e.g.", list, " ")
      for (k in list) abbrevs[list[k]] = 1
    }
    {
      for (i = 1; i <= NF; i++) {
        # Sentences may span input lines, so sentence is not reset per line
        sentence = (sentence == "") ? $i : sentence " " $i
        # A word ending in . ! or ? ends the sentence unless it is an abbreviation
        if (!($i in abbrevs) && $i ~ /[.!?]$/) {
          para = (para == "") ? sentence : para " " sentence
          sentence = ""
          # After every sixth sentence, print the paragraph and a blank line
          if (++count % 6 == 0) { print para; print ""; para = "" }
        }
      }
    }
    END {
      # Flush any unterminated sentence and the last partial paragraph, once
      if (sentence != "") para = (para == "") ? sentence : para " " sentence
      if (para != "") print para
    }' "$f" > splitmd/"$f"
done


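An awk script that regroups the text of every .md file in the current directory into paragraphs of six sentences each, writing the result to a file of the same name under splitmd/. A sentence ends only at a word ending in ., ? or !, and never at one of the listed abbreviations (Mr., Mrs., Ms., Dr., U.S., U.S.A., i.e., e.g.). The text is re-flowed, so Markdown structure such as headings, lists and code blocks is not preserved.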
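
To try the sentence splitting on a short sample before converting every file, the same idea can be wrapped in a shell function that reads standard input. This is a minimal sketch: the function name split_paragraphs and its optional paragraph-size argument (default 6) are illustrative assumptions, not part of the original tip. It keeps a sentence going across line breaks and prints any leftover text once, at the end of the input.

```shell
# split_paragraphs: hypothetical helper (not from the original tip).
# Reads text on stdin and prints it regrouped into paragraphs of N sentences.
# Usage: split_paragraphs [N]   (N defaults to 6)
split_paragraphs() {
  awk -v n="${1:-6}" '
    BEGIN {
      # Abbreviations that end in a period but never end a sentence
      split("Mr. Mrs. Ms. Dr. U.S. U.S.A. i.e. e.g.", list, " ")
      for (k in list) abbrevs[list[k]] = 1
    }
    {
      for (i = 1; i <= NF; i++) {
        # Append the word; the sentence may continue from the previous line
        sentence = (sentence == "") ? $i : sentence " " $i
        if (!($i in abbrevs) && $i ~ /[.!?]$/) {
          para = (para == "") ? sentence : para " " sentence
          sentence = ""
          # Every n-th sentence closes the paragraph, followed by a blank line
          if (++count % n == 0) { print para; print ""; para = "" }
        }
      }
    }
    END {
      # Print whatever is left over exactly once
      if (sentence != "") para = (para == "") ? sentence : para " " sentence
      if (para != "") print para
    }'
}

# Try it on a short sample with a paragraph size of 2
printf 'Dr. Smith arrived. He said hi!\nWas it late? Yes.\nThe end.\n' | split_paragraphs 2
```

With a size of 2 the sample prints three paragraphs separated by blank lines: "Dr. Smith arrived. He said hi!", "Was it late? Yes." and "The end.". The "Dr." does not end a sentence, and the sentences on the second line join the count started on the first.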
