Preface
Assuming all your files are in the same format, you could simply use a regex replace.
Answer
You can view this in use on regex101 here
Explanation
This regex [\t ]|-{2,}\s*|^\||\|$ will:
- Catch all tab or space characters
- Catch any run of two or more consecutive - characters (as well as any whitespace characters that follow the run)
- Catch the | character at the beginning of a line
- Catch the | character at the end of a line
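Note that you must ensure the global (g) and multi-line (m) modifiers are active. In Python, re.sub already replaces every match by default, so only re.MULTILINE needs to be passed.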
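To see why the multi-line modifier matters, here is a minimal sketch using only the two anchored alternatives of the regex. The two-line `sample` string is a made-up example, not part of the original data.

```python
import re

# Hypothetical two-line sample, not taken from the original file.
sample = "|a|\n|b|"
pattern = r"^\||\|$"

# Without re.MULTILINE, ^ and $ anchor only at the start and end of the whole string,
# so the pipes next to the inner newline survive.
without_m = re.sub(pattern, "", sample)

# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
with_m = re.sub(pattern, "", sample, flags=re.MULTILINE)

print(repr(without_m))
print(repr(with_m))
```

Only the second call strips the leading and trailing | from every line, which is the behaviour the answer relies on.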
Code
Your final code should resemble the following:
import re

regex = r"[\t ]|-{2,}\s*|^\||\|$"
subst = ""
# count=0 replaces every match; re.MULTILINE lets ^ and $ match at each line boundary
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)
Where test_str contains the file's contents (such as below)
---------------------------------------------------------------------------
| MANDT|BUKRS|NETWR |UMSKS|UMSKZ|AUGDT |AUGBL|ZUONR |
---------------------------------------------------------------------------
| 100 |1000 |23.321- | | | | |TEXT I WANT TO KEEP|
| 100 |1000 |0.12 | | | | |TEXT I WANT TO KEEP|
| 100 |1500 |90 | | | | |TEXT I WANT TO KEEP|
---------------------------------------------------------------------------
Output
MANDT|BUKRS|NETWR|UMSKS|UMSKZ|AUGDT|AUGBL|ZUONR
100|1000|23.321-|||||TEXTIWANTTOKEEP
100|1000|0.12|||||TEXTIWANTTOKEEP
100|1500|90|||||TEXTIWANTTOKEEP
Edit
Answer
You can view this in use on regex101 here
Explanation
(?:^\|[\t ]*)|(?:[\t ]*\|$)|(?:(?<=\|)[\t ]*)|(?:[\t ]*(?=\|))|(?:-{2,}\s*)
The regex above will:
- Catch | (only at the beginning of a line) followed by any number of tab or space characters
- Catch any number of tab or space characters followed by | (only at the end of a line)
- Catch any number of tab or space characters that follow |
- Catch any number of tab or space characters that precede |
- Catch any run of two or more consecutive - characters (as well as any whitespace characters that follow the run)
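Note that you must ensure the global (g) and multi-line (m) modifiers are active. In Python, re.sub already replaces every match by default, so only re.MULTILINE needs to be passed.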
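The difference between the two regexes is easiest to see side by side. The sketch below runs both on a single row; the shortened `row` string and the variable names are illustrative assumptions, not part of the original data.

```python
import re

# First version: removes every tab or space, including those inside a field.
strip_all = r"[\t ]|-{2,}\s*|^\||\|$"
# Edited version: removes tabs and spaces only where they touch a | delimiter.
keep_inner = r"(?:^\|[\t ]*)|(?:[\t ]*\|$)|(?:(?<=\|)[\t ]*)|(?:[\t ]*(?=\|))|(?:-{2,}\s*)"

# Hypothetical shortened row in the same layout as the sample data.
row = "| 100 |1000 |0.12 | |TEXT I WANT TO KEEP|"

stripped = re.sub(strip_all, "", row, flags=re.MULTILINE)
preserved = re.sub(keep_inner, "", row, flags=re.MULTILINE)

print(stripped)   # 100|1000|0.12||TEXTIWANTTOKEEP
print(preserved)  # 100|1000|0.12||TEXT I WANT TO KEEP
```

The lookbehind and lookahead tie each whitespace match to a neighbouring |, so spaces between words inside a field are never matched.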
Code
Your final code should resemble the following:
import re

regex = r"(?:^\|[\t ]*)|(?:[\t ]*\|$)|(?:(?<=\|)[\t ]*)|(?:[\t ]*(?=\|))|(?:-{2,}\s*)"
subst = ""
# count=0 replaces every match; re.MULTILINE lets ^ and $ match at each line boundary
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)
Where test_str contains the file's contents (such as below)
---------------------------------------------------------------------------
| MANDT|BUKRS|NETWR |UMSKS|UMSKZ|AUGDT |AUGBL|ZUONR |
---------------------------------------------------------------------------
| 100 |1000 |23.321- | | | | |TEXT I WANT TO KEEP|
| 100 |1000 |0.12 | | | | |TEXT I WANT TO KEEP|
| 100 |1500 |90 | | | | |TEXT I WANT TO KEEP|
---------------------------------------------------------------------------
Output
MANDT|BUKRS|NETWR|UMSKS|UMSKZ|AUGDT|AUGBL|ZUONR
100|1000|23.321-|||||TEXT I WANT TO KEEP
100|1000|0.12|||||TEXT I WANT TO KEEP
100|1500|90|||||TEXT I WANT TO KEEP
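If several files share this layout, the substitution could be wrapped in a small helper. This is only a sketch: the function names `clean_table_text` and `clean_file`, the UTF-8 default, and the idea of writing to a separate destination path are assumptions, not something the original question specifies.

```python
import re
from pathlib import Path

# Compile once so the same pattern can be reused for every file.
PATTERN = re.compile(
    r"(?:^\|[\t ]*)|(?:[\t ]*\|$)|(?:(?<=\|)[\t ]*)|(?:[\t ]*(?=\|))|(?:-{2,}\s*)",
    re.MULTILINE,
)


def clean_table_text(text: str) -> str:
    """Strip the border lines, outer pipes and delimiter padding from the table text."""
    return PATTERN.sub("", text)


def clean_file(src: Path, dst: Path, encoding: str = "utf-8") -> None:
    """Read src, clean it, and write the result to dst (hypothetical helper; encoding is assumed)."""
    dst.write_text(clean_table_text(src.read_text(encoding=encoding)), encoding=encoding)


if __name__ == "__main__":
    # Small made-up table in the same layout as the sample above.
    demo = "-----\n| A |B |\n-----\n| 1 |x y|\n-----\n"
    print(clean_table_text(demo))
```

The cleaned text is pipe-delimited, so each line can then be split on | or fed to a CSV reader configured with that delimiter.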