8

I want to clean up some data I copied from an Excel spreadsheet and wash off any formatting using Notepad++. I want to take advantage of line numbering of Notepad++ to count the number of lines I have. That is: the number of rows with actual data in Excel.

Cause for exporting data from Excel

Each row of a spreadsheet is it's own line number. It's indicated by the number in the left margin in Excel. So I could look at that and figure it out.

That is, if it looks like this:

  A    B
1 #### something
2 #### something
3 #### something
4 #### something
5 #### something
6 #### something

But if it looks like this...

  A    B
1 #### something
2 #### something
3 
4 #### something
5 
6 #### something

... and the list is very long, then it's really hard to say how many rows/lines there are in this list. I would have to go down the list and visually count the rows. I mean I know Word has the option to tell you how many lines or words you have in a document. But I'm not sure there even is such a thing in Excel.

And I don't know if you can do anything about this right in Excel, i.e. tell it to remove rows of cells without any data and move the rows of cells next to each other (top/bottom). If you know of a way, please share it. But I think Notepad++ is a great tool for manipulation of text data like this.

Problem I face in Notepad++

I essentially want to remove the extra line breaks. This is what it might look like.

Line 01:something
Line 02:something
Line 03:something
Line 05:
Line 06:
Line 07:something
Line 08:something
Line 09:
Line 10:
Line 11:
Line 12:
Line 13:something

I want to remove line 5, 6, 9, 10, 11, and 12.

What I have tried

  • TextFX Edit: I tried the option Delete Blank Lines from Edit menu. Didn't work.*1
  • Replace: I tried typing \n\r in find and nothing in replace field. "0 occurancies were found" *2
  • Replace: I tried typing \r\n in find and nothing in replace field. "402 occurancies were replaced" But it puts everything on a single line. Like this: something something something
  • Edit menu: I tried Remove Empty Lines from Edit menu.*3
  • Edit menu: I tried Join Lines from Edit menu.

So what else?... what else can I try?

This is the result I am after:

Line 01:something
Line 02:something
Line 03:something
Line 07:something
Line 08:something
Line 13:something

How can I achieve this?

Update

****1, 2, 3** = These three methods should work. I have tested and verified each. The other two do not and should not. They have a different purpose.*

Okay, I think I know now why this is not working. The text I am processing is copied and pasted from Excel 2010 into Notepad++ and there are some extra characters in there that throw off these commands. There are some tabs and CF or LF characters.

Here is a closer description of what I have.

Line 01:####[tab]something[CR][LF]
Line 02:####[tab]something[CR][LF]
Line 03:####[tab]something[CR][LF]
Line 05:[tab][CR][LF]
Line 06:[tab][CR][LF]
Line 07:####[tab]something[CR][LF]
Line 08:####[tab]something[CR][LF]
Line 09:[tab][CR][LF]
Line 10:[tab][CR][LF]
Line 11:[tab][CR][LF]
Line 12:[tab][CR][LF]
Line 13:####[tab]something

Where...

  • = 4-digit number
  • [tab] = tab
  • [CR] = carriage return
  • [LF] = line feed

So how do you deal with this?...

Oliver Salzburg
  • 89,072
  • 65
  • 269
  • 311
Samir
  • 21,235

5 Answers5

7

You can use the Regular Expression mode of Search->Replace to do this :

Your empty line can be described with the regular expression \s*\R following a new line \R, thus you could use the following replacement :

Find what : "(\R)(\s*\R)+"
Replace with : "\1"

\R is a universal new-line, it's equivalent to \r\n (Windows), \n (Unix) or \r (Mac), depending on the current new-line format.

\s is any spacing character, it's equivalent to [[:space:]] or [ \t\n\r\f\v]

zakinster
  • 2,567
7

Another response after the update...

If the text is in this format you should be able to do a find and replace using the following:

Set the search mode in Notepad++ to "Extended" first.

Set "Find What" to "\t\r\n" and "Replace With" left blank.

That should search and replace for the pattern TAB CR LF as in the text above.

3

In Excel (I'm using 2010) you can select and remove blank rows:


Select your data range

starting data


Under the Home tab go to Find & Select and choose Go to special

Go to special


In the options window choose blanks. This will select all blank cells within the data range. Right click within a blank cell and select delete, then select Move cells up

selected blanks


Done

CLockeWork
  • 2,167
2

If you don't care about retaining the order of the lines you could sort the lines and this will either put the empty lines at the top or the bottom of the document depending on which sort order is used?

2

For Notepad++, by far the easiest way to deal with this situation is a built-in function:

Edit > Line Operations > Remove Empty Lines (Containing Blank characters)

@Sammy said that he tried the option immediately above this one ("Remove Empty Lines"), but you need to choose the one including lines with blank characters (such as Tab and Space).

I love the Find/Replace features of Notepad++, but this built-in function is easier.

Dane
  • 1,945