Binary Files (2024)

Binary Files

In a sense, all files are "binary" in that they are just a collection of bytes stored in an operating system construct called a file. However, when we talk about binary files, we are really referring to the way VB opens and processes the file.

The other file types (sequential and random) have a definite structure, and there are mechanisms built into the language to read and write these files based on that structure. For example, the Input # statement reads a sequential comma-delimited file field by field, the Line Input statement reads a sequential file line by line, etc.

On the other hand, it is necessary to process a file in binary mode when that file does not have a simple line-based or record-based structure. For example, an Excel "xls" file contains a series of complex data structures to manage worksheets, formulas, charts, etc. If you really wanted to process an "xls" file at a very low level, you could open the file in binary mode and move to certain byte locations within the file to access data contained in the various internal data structures.

Fortunately, in the case of Excel, Microsoft provides us with the Excel object model, which makes it a relatively simple matter to process xls files in VB applications. But the concept should be clear: to process a file that does not contain simple line-oriented or record-oriented data, binary mode needs to be used and you must traverse or parse through the file to get at the data that you need.

The Open Statement

We have seen partial syntax for the Open statement in the first topic on sequential files. The full syntax for the Open statement, taken from MSDN, is:

Open pathname For mode [Access access] [lock] As [#]filenumber [Len=reclength]

The Open statement syntax has these parts:

Part          Description
pathname      Required. String expression that specifies a file name; may include directory or folder, and drive.
mode          Required. Keyword specifying the file mode: Append, Binary, Input, Output, or Random. If unspecified, the file is opened for Random access.
access        Optional. Keyword specifying the operations permitted on the open file: Read, Write, or Read Write.
lock          Optional. Keyword specifying the operations restricted on the open file by other processes: Shared, Lock Read, Lock Write, and Lock Read Write.
filenumber    Required. A valid file number in the range 1 to 511, inclusive. Use the FreeFile function to obtain the next available file number.
reclength     Optional. Number less than or equal to 32,767 (bytes). For files opened for random access, this value is the record length. For sequential files, this value is the number of characters buffered.

Remarks

You must open a file before any I/O operation can be performed on it. Open allocates a buffer for I/O to the file and determines the mode of access to use with the buffer.

If the file specified by pathname doesn't exist, it is created when a file is opened for Append, Binary, Output, or Random modes.

If the file is already opened by another process and the specified type of access is not allowed, the Open operation fails and an error occurs.

The Len clause is ignored if mode is Binary.

Important: In Binary, Input, and Random modes, you can open a file using a different file number without first closing the file. In Append and Output modes, you must close a file before opening it with a different file number.

(End of MSDN definition)

Given the information above, we would not use the optional Len clause when opening a file in binary mode, as it does not apply. In the sample programs to follow, the optional lock entry is not used either.

Thus, in the sample programs to follow, the following syntax will be used to open a binary file for input:

Open filename For Binary Access Read As #filenumber

and to open a binary file for output:

Open filename For Binary Access Write As #filenumber
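As a minimal sketch of the two Open forms above used together, the procedure below obtains file numbers with FreeFile, opens one file for binary input and another for binary output, and closes both. The procedure name and the two file paths are hypothetical and chosen only for illustration.

```vb
' Sketch only: OpenBinaryFilesDemo and both paths are hypothetical.
Private Sub OpenBinaryFilesDemo()
    Dim intInFileNbr As Integer
    Dim intOutFileNbr As Integer

    ' Ask VB for an unused file number, then open the input file.
    intInFileNbr = FreeFile
    Open "C:\Temp\Input.dat" For Binary Access Read As #intInFileNbr

    ' Call FreeFile again only after the first Open; before that
    ' it would hand back the same number.
    intOutFileNbr = FreeFile
    Open "C:\Temp\Output.dat" For Binary Access Write As #intOutFileNbr

    ' ... Get and Put statements would go here ...

    ' Release both file numbers.
    Close #intInFileNbr
    Close #intOutFileNbr
End Sub
```

Calling FreeFile immediately before each Open, rather than twice up front, is what keeps the two file numbers distinct.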

The Get Statement

The Get statement is used to read data from a file opened in binary mode. The syntax, as it applies to binary files, is:

Get [#]filenumber, [byte position], varname

The filenumber is any valid file number as defined above.

Byte position is the byte position within the file at which the reading begins. The byte position is "one-based", meaning the first byte position in the file is 1, the second position is 2, and so on. You can omit this entry, in which case reading begins at the next byte following the last Get or Put statement. If you omit the byte position entry, you must still include the delimiting commas in the Get statement, for example:

Get #intMyFile, , strData

Varname is a string variable into which the data will be read. (Get accepts variables of other types as well; this article works with strings.) This string variable is often referred to as a "buffer" when processing binary files. It is important to note that the length, or size, of this string variable determines how many bytes of data will be read from the file. Thus, it is necessary to set the length of the string variable prior to issuing the Get statement. This is commonly done by using the String$ function to pad the string variable with a number of blank spaces equal to the number of bytes you want to read at a given time.

For example, the following statement pads the string variable strData with 10,000 blank spaces:

strData = String$(10000, " ")

Now that VB "knows" how big strData is, the following Get statement will read the first (or next) 10,000 bytes from file number intMyFile and overlay strData with that file data:

Get #intMyFile, , strData
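The byte position and the buffer sizing described above can be combined. The following is a minimal sketch, not a definitive implementation: ReadBytesAt, strPath, lngStart and lngCount are hypothetical names introduced here, and no error handling is included.

```vb
' Hypothetical helper: reads lngCount bytes from the file strPath,
' starting at the one-based byte position lngStart.
Public Function ReadBytesAt(ByVal strPath As String, _
                            ByVal lngStart As Long, _
                            ByVal lngCount As Long) As String
    Dim intFileNbr As Integer
    Dim strBuffer As String

    intFileNbr = FreeFile
    Open strPath For Binary Access Read As #intFileNbr

    ' The length of the buffer tells Get how many bytes to read.
    strBuffer = String$(lngCount, " ")

    ' Supplying the byte position makes Get start reading there
    ' instead of continuing from the previous Get or Put.
    Get #intFileNbr, lngStart, strBuffer

    Close #intFileNbr
    ReadBytesAt = strBuffer
End Function
```

A call such as ReadBytesAt(strSomeFile, 101, 16) would ask for bytes 101 through 116 of the file, assuming the file is at least that long.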

Depending on the application, it is sometimes necessary to process the file in "chunks". Recall that you can omit the "byte position" entry, in which case VB will "keep track" of where it is in the file. For example, the first time the above Get statement is executed, bytes 1 through 10000 will be read; the second time the above Get statement is executed, bytes 10001 through 20000 will be read; and so on.
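The number of Get statements needed to work through a file in chunks follows from simple integer arithmetic. The sketch below shows one way to compute it; PassCount and its parameter names are hypothetical, and both arguments are assumed to be positive.

```vb
' Hypothetical helper: number of chunk-sized Get passes needed to
' read a file of lngFileSize bytes, lngChunkSize bytes at a time.
Public Function PassCount(ByVal lngFileSize As Long, _
                          ByVal lngChunkSize As Long) As Long
    ' The \ operator is integer division, which rounds down, so
    ' adding (chunk size - 1) first rounds the result up and
    ' accounts for a final, partially filled chunk.
    PassCount = (lngFileSize + lngChunkSize - 1) \ lngChunkSize
End Function
```

For a file of 60,500 bytes (a made-up size) read 10,000 bytes at a time, PassCount(60500, 10000) returns 7: six full chunks plus one final chunk of 500 bytes. For exactly 60,000 bytes it returns 6.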

Because a VB variable-length string can, in principle, hold on the order of 2 billion characters (available memory is the practical limit), it is reasonable in most cases to read in the whole file in "one shot", as opposed to reading it in "chunks" as described above. To do this, you can set the length of the "buffer" string variable to the size of the file using the LOF (length of file) function as the first argument of the String$ function. The LOF function takes the file number of the file to be processed as its argument, and returns the length of the file in bytes. Thus, the following statement will fill the variable strData with a number of blank spaces equal to the size of the file:

strData = String$(LOF(intMyFile), " ")

Then, when the subsequent Get statement is executed, the entire contents of the file will be stored in strData:

Get #intMyFile, , strData
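Wrapped into a reusable routine, the "one shot" technique might look like the sketch below. ReadWholeFile is a hypothetical name, the check for an empty file is an added precaution rather than something the article requires, and error handling (for example, for a missing file) is left out.

```vb
' Hypothetical helper: returns the entire contents of strPath.
Public Function ReadWholeFile(ByVal strPath As String) As String
    Dim intFileNbr As Integer
    Dim strBuffer As String

    intFileNbr = FreeFile
    Open strPath For Binary Access Read As #intFileNbr

    ' Only issue the Get when there is something to read.
    If LOF(intFileNbr) > 0 Then
        ' Size the buffer to the whole file, then read it in one Get.
        strBuffer = String$(LOF(intFileNbr), " ")
        Get #intFileNbr, , strBuffer
    End If

    Close #intFileNbr
    ReadWholeFile = strBuffer
End Function
```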

The Input Function

The Input function (not to be confused with the Input # or Line Input statements) can be used as an alternative to the Get statement. The syntax is:

varname = Input(number, [#]filenumber)

where varname is the string variable into which the file data will be stored, number is the number of characters to be read, and filenumber is a valid file number identifying the file from which you want to read.

The following table contains examples that contrast the Get statement and Input function as ways of reading data from a binary file:

String Setup and Get Statement               Input Function
strData = String$(10000, " ")                strData = Input(10000, #intMyFile)
Get #intMyFile, , strData

strData = String$(LOF(intMyFile), " ")       strData = Input(LOF(intMyFile), #intMyFile)
Get #intMyFile, , strData

The Put Statement

The Put statement is used to write data to a file opened in binary mode. The syntax, as it applies to binary files, is:

Put [#]filenumber, [byte position], varname

The filenumber is any valid file number as defined above.

Byte position is the byte position within the file at which the writing begins. The byte position is "one-based", meaning the first byte position in the file is 1, the second position is 2, and so on. You can omit this entry, in which case writing begins at the next byte following the last Get or Put statement. If you omit the byte position entry, you must still include the delimiting commas in the Put statement, for example:

Put #intMyFile, , strData

Varname is a string variable from which the data will be written. This string variable is often referred to as a "buffer" when processing binary files. It is important to note that the length, or size, of this string variable determines how many bytes of data will be written to the file.

For example, the following statements cause 1 byte of data to be written to file number intMyFile:

strCharacter = Mid$(strData, lngCurrentPos, 1)

Put #intMyFile, , strCharacter

Recall that you can omit the "byte position" entry, in which case VB will "keep track" of where it is in the file. For example, the first time the above Put statement is executed, byte 1 will be written; the second time the above Put statement is executed, byte 2 will be written; and so on.
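Because the length of the string determines how much Put writes, a whole string can also be written with a single Put. The following is a minimal sketch under that assumption; WriteStringToFile and its parameters are hypothetical names, and error handling is omitted.

```vb
' Hypothetical helper: writes strData to strPath, starting at byte 1.
Public Sub WriteStringToFile(ByVal strPath As String, _
                             ByVal strData As String)
    Dim intFileNbr As Integer

    ' Opening in Binary mode does not empty an existing file, so
    ' delete it first; otherwise bytes from a longer, older file
    ' would remain after the newly written data.
    If Dir$(strPath) <> "" Then
        Kill strPath
    End If

    intFileNbr = FreeFile
    Open strPath For Binary Access Write As #intFileNbr

    ' One Put writes Len(strData) bytes, because the size of the
    ' string variable determines how much data is written.
    Put #intFileNbr, , strData

    Close #intFileNbr
End Sub
```

Writing the data in one Put rather than one byte at a time is a design choice: it trades a little memory for far fewer file operations.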

Sample Programs

Three sample "Try It" programs will now be presented, using the statements and functions described above. All three read in the same input file and write out the same output file; the difference is in how the input file is read. The first sample program uses the Get statement to process the file in "chunks", the second uses the Get statement to process the file all at once, and the third uses the Input function to process the file all at once.

The job of the sample programs is to read in an HTML file, strip out all tags (i.e., everything between the "less than" and "greater than" angle brackets as well as the brackets themselves), and write out the remaining text.
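The tag-stripping rule itself can be isolated from the file handling. The function below is a sketch of that rule only, using the same "tag pending" flag idea as the sample programs; StripTags and its local variable names are hypothetical, and it works on a string already in memory rather than on a file.

```vb
' Hypothetical helper: returns strHTML with everything between
' "<" and ">" removed, along with the brackets themselves.
Public Function StripTags(ByVal strHTML As String) As String
    Dim lngX As Long
    Dim lngOutPos As Long
    Dim strChar As String
    Dim strResult As String
    Dim blnTagPending As Boolean

    ' The result can never be longer than the input, so pre-size it
    ' and overwrite characters in place with the Mid$ statement.
    strResult = Space$(Len(strHTML))

    For lngX = 1 To Len(strHTML)
        strChar = Mid$(strHTML, lngX, 1)
        Select Case strChar
            Case "<"
                blnTagPending = True    ' entering a tag
            Case ">"
                blnTagPending = False   ' leaving a tag
            Case Else
                If Not blnTagPending Then
                    ' Outside any tag: keep the character.
                    lngOutPos = lngOutPos + 1
                    Mid$(strResult, lngOutPos, 1) = strChar
                End If
        End Select
    Next

    ' Trim the unused tail of the pre-sized result.
    StripTags = Left$(strResult, lngOutPos)
End Function
```

For example, StripTags("<b>Hi</b> there") returns "Hi there". Like the sample programs, this treats every "<" as the start of a tag and does not decode entities such as &nbsp;.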

The excerpts below show both the HTML input file and the plain text output file. The HTML input excerpt comes first and the plain text output excerpt follows it. In the HTML excerpt, the text that was extracted (i.e., the "non-tag" data) is whatever lies outside the angle brackets.

HTML Input File (excerpt)

<html>

<head>

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

<meta name=Generator content="Microsoft Word 10 (filtered)">

<title>Working with Files</title>

<style>

. . .

<p class=MsoNormal align=center style='text-align:center'><b><span

style='font-size:12.0pt;font-family:Arial'>Working with Files – Part 1</span></b></p>

<p class=MsoNormal align=center style='text-align:center'><b><span

style='font-size:12.0pt;font-family:Arial'>Sequential File Processing

Statements and Functions</span></b></p>

<p class=MsoNormal align=center style='text-align:center'><b><span

style='font-size:12.0pt;font-family:Arial'>Processing a Comma-Delimited File</span></b></p>

<p class=MsoNormal align=center style='text-align:center'><span

style='font-size:12.0pt;font-family:Arial'>&nbsp;</span></p>

<p class=MsoNormal><span style='font-size:12.0pt;font-family:Arial'>Visual

Basic provides the capability of processing three types of files:</span></p>

<p class=MsoNormal><span style='font-size:12.0pt;font-family:Arial'>&nbsp;</span></p>

<p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span

style='font-size:12.0pt;font-family:Arial'>sequential files </span></b><span

style='font-size:12.0pt;font-family:Arial'>Files that must be read in the same

order in which they were written – one after the other with no skipping around</span></p>

<p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span

style='font-size:12.0pt;font-family:Arial'>&nbsp;</span></b></p>

<p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span

style='font-size:12.0pt;font-family:Arial'>binary files </span></b><span

style='font-size:12.0pt;font-family:Arial'>&quot;unstructured&quot; files which

are read from or written to as series of bytes, where it is up to the

programmer to specify the format of the file</span></p>

<p class=MsoNormal style='margin-left:.5in'><span style='font-size:12.0pt;

font-family:Arial'>&nbsp;</span></p>

<p class=MsoNormal style='margin-left:1.0in;text-indent:-.5in'><b><span

style='font-size:12.0pt;font-family:Arial'>random files </span></b><span

style='font-size:12.0pt;font-family:Arial'>files which support &quot;direct

access&quot; by record number</span></p>

. . .

Plain Text Output File (excerpt)

Working with Files

Working with Files – Part 1

Sequential File Processing

Statements and Functions

Processing a Comma-Delimited File

&nbsp;

Visual

Basic provides the capability of processing three types of files:

&nbsp;

sequential files Files that must be read in the same

order in which they were written – one after the other with no skipping around

&nbsp;

binary files &quot;unstructured&quot; files which

are read from or written to as series of bytes, where it is up to the

programmer to specify the format of the file

&nbsp;

random files files which support &quot;direct

access&quot; by record number

&nbsp;

These three

file types are &quot;native&quot; to Visual Basic and its predecessors (QBasic,

GW-BASIC, etc.). The next several topics address VB's sequential file

processing capabilities. Binary and Random files will be covered in later

topics.

The

following sequential file-related statements and functions will be discussed:

&nbsp;

Open Prepares a file to be processed by the VB

program.

App.Path Supplies the path of your application

FreeFile Supplies a file number that is not

already in use

Input # Reads fields from a comma-delimited sequential

file

. . .

Note: The sample programs use the Dir$ function and the Kill statement for the purpose of deleting the output file if it exists, prior to creating it anew. Dir$ and Kill are covered in the later topic of "File System Commands and Functions".

Sample Program 1 – Using the Get Statement to Read a Binary File in "Chunks"

The first sample program uses the technique of reading and processing a binary file one "chunk" at a time (in this case 10,000 bytes at a time) using the Get statement. Since the file size is a little over 60,000 bytes, you will see that it took seven passes to read through the file. The code listed below is heavily commented to aid in the understanding of how the program works.

"Try It" Code:

Private Sub cmdTryIt_Click()

    Dim strHTMFileName As String
    Dim strTextFileName As String
    Dim strBackSlash As String
    Dim intHTMFileNbr As Integer
    Dim intTextFileNbr As Integer
    Dim strBuffer As String
    Dim strCurrentChar As String * 1
    Dim blnTagPending As Boolean
    Dim lngX As Long
    Dim lngBytesRemaining As Long
    Dim lngCurrentBufferSize As Long

    Const lngMAX_BUFFER_SIZE As Long = 10000

    ' Prepare the file names ...
    strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
    strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
    strTextFileName = App.Path & strBackSlash & "TestOut.txt"

    Print "Opening files ..."

    ' Open the input file ...
    intHTMFileNbr = FreeFile
    Open strHTMFileName For Binary Access Read As #intHTMFileNbr

    ' If the file we want to open for output already exists, delete it ...
    If Dir$(strTextFileName) <> "" Then
        Kill strTextFileName
    End If

    ' Open the output file ...
    intTextFileNbr = FreeFile
    Open strTextFileName For Binary Access Write As #intTextFileNbr

    ' Initialize the "bytes remaining" variable to the length of the input file ...
    lngBytesRemaining = LOF(intHTMFileNbr)

    ' Set up a loop which will process the file in "chunks" of 10,000 bytes at a time.
    ' We will keep track of how many bytes we have remaining to process, and
    ' the loop will continue as long as there are bytes remaining.
    Do While lngBytesRemaining > 0

        Print "Processing 'chunk' ..."

        ' Note: The "buffer" is simply a string variable into which the "current
        ' chunk" of the file will be read.

        ' Set the current buffer size to the maximum size (10,000) as long as
        ' there are at least 10,000 bytes remaining. If there are fewer (as
        ' there would be the last time through the loop), set the buffer size
        ' equal to the number of bytes remaining.
        If lngBytesRemaining >= lngMAX_BUFFER_SIZE Then
            lngCurrentBufferSize = lngMAX_BUFFER_SIZE
        Else
            lngCurrentBufferSize = lngBytesRemaining
        End If

        ' Because the Get statement relies on the size of the string variable (the
        ' "buffer") into which the data will be read to know how many bytes to read
        ' from the file, we fill the buffer string variable with a number of blank
        ' spaces - where the number of blank spaces was determined in the statement
        ' above.
        strBuffer = String$(lngCurrentBufferSize, " ")

        ' The Get statement now reads the next chunk of data from the input file
        ' and stores it in the strBuffer variable.
        Get #intHTMFileNbr, , strBuffer

        ' The For loop below now processes the current chunk of data character by
        ' character, writing out only the characters that are NOT enclosed in the
        ' HTML tags (i.e., it is skipping every character between a pair of angle
        ' brackets "<" and ">") ...
        For lngX = 1 To lngCurrentBufferSize
            strCurrentChar = Mid$(strBuffer, lngX, 1)
            Select Case strCurrentChar
                Case "<"
                    blnTagPending = True
                Case ">"
                    blnTagPending = False
                Case Else
                    If Not blnTagPending Then
                        ' The current character is outside of the tag brackets, so
                        ' write it out ...
                        Put #intTextFileNbr, , strCurrentChar
                    End If
            End Select
        Next

        ' Adjust the "bytes remaining" variable by subtracting the current buffer size
        ' from it ...
        lngBytesRemaining = lngBytesRemaining - lngCurrentBufferSize

    Loop

    Print "Closing files ..."

    ' Close the input and output files ...
    Close #intHTMFileNbr
    Close #intTextFileNbr

    Print "Done."

End Sub

After the cmdTryIt_Click event procedure has run, the form should look like the screenshot below, and the output plain-text file should be present in the project directory.

Binary Files (1)

Download the VB project code for the example above here.

Sample Program 2 – Using the Get Statement to Read a Binary File All At Once

The second sample program uses the technique of reading and processing a binary file all at once, using the Get statement in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.

"Try It" Code:

Private Sub cmdTryIt_Click()

    Dim strHTMFileName As String
    Dim strTextFileName As String
    Dim strBackSlash As String
    Dim intHTMFileNbr As Integer
    Dim intTextFileNbr As Integer
    Dim strBuffer As String
    Dim strCurrentChar As String * 1
    Dim lngX As Long
    Dim blnTagPending As Boolean

    ' Prepare the file names ...
    strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
    strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
    strTextFileName = App.Path & strBackSlash & "TestOut.txt"

    Print "Opening files ..."

    ' Open the input file ...
    intHTMFileNbr = FreeFile
    Open strHTMFileName For Binary Access Read As #intHTMFileNbr

    ' If the file we want to open for output already exists, delete it ...
    If Dir$(strTextFileName) <> "" Then
        Kill strTextFileName
    End If

    ' Open the output file ...
    intTextFileNbr = FreeFile
    Open strTextFileName For Binary Access Write As #intTextFileNbr

    Print "Reading input file ..."

    ' Note: The "buffer" is simply a string variable into which the contents
    ' of the file will be read.

    ' Because the Get statement relies on the size of the string variable (the
    ' "buffer") into which the data will be read to know how many bytes to read
    ' from the file, we fill the buffer string variable with a number of blank
    ' spaces - where the number of blank spaces is equal to the size of the
    ' entire file (as determined by the LOF function) ...
    strBuffer = String$(LOF(intHTMFileNbr), " ")

    ' The Get statement now reads the entire contents of the input file
    ' and stores it in the strBuffer variable.
    Get #intHTMFileNbr, , strBuffer

    Print "Generating output file ..."

    ' The For loop below now processes the contents of the file character by
    ' character, writing out only the characters that are NOT enclosed in the
    ' HTML tags (i.e., it is skipping every character between a pair of angle
    ' brackets "<" and ">") ...
    For lngX = 1 To Len(strBuffer)
        strCurrentChar = Mid$(strBuffer, lngX, 1)
        Select Case strCurrentChar
            Case "<"
                blnTagPending = True
            Case ">"
                blnTagPending = False
            Case Else
                If Not blnTagPending Then
                    ' The current character is outside of the tags, so write it out ...
                    Put #intTextFileNbr, , strCurrentChar
                End If
        End Select
    Next

    Print "Closing files ..."

    ' Close the input and output files ...
    Close #intHTMFileNbr
    Close #intTextFileNbr

    Print "Done."

End Sub

After the cmdTryIt_Click event procedure has run, the form should look like the screenshot below, and the output plain-text file should be present in the project directory.

Binary Files (2)

Download the VB project code for the example above here.

Sample Program 3 – Using the Input Function to Read a Binary File All At Once

The third sample program uses the technique of reading and processing a binary file all at once, using the Input function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.

"Try It" Code:

Private Sub cmdTryIt_Click()

    Dim strHTMFileName As String
    Dim strTextFileName As String
    Dim strBackSlash As String
    Dim intHTMFileNbr As Integer
    Dim intTextFileNbr As Integer
    Dim strBuffer As String
    Dim strCurrentChar As String * 1
    Dim lngX As Long
    Dim blnTagPending As Boolean

    ' Prepare the file names ...
    strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
    strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
    strTextFileName = App.Path & strBackSlash & "TestOut.txt"

    Print "Opening files ..."

    ' Open the input file ...
    intHTMFileNbr = FreeFile
    Open strHTMFileName For Binary Access Read As #intHTMFileNbr

    ' If the file we want to open for output already exists, delete it ...
    If Dir$(strTextFileName) <> "" Then
        Kill strTextFileName
    End If

    ' Open the output file ...
    intTextFileNbr = FreeFile
    Open strTextFileName For Binary Access Write As #intTextFileNbr

    Print "Reading input file ..."

    ' Note: The "buffer" is simply a string variable into which the contents
    ' of the file will be read.

    ' The Input function reads a number of characters from a file. The first
    ' argument of the function specifies how many to read, which in this case is
    ' the size of the entire file (as determined by the LOF function). The second
    ' argument specifies the file number of the file from which the data is to be
    ' read. The resulting data is stored in the "strBuffer" variable.
    strBuffer = Input(LOF(intHTMFileNbr), #intHTMFileNbr)

    Print "Generating output file ..."

    ' The For loop below now processes the contents of the file character by
    ' character, writing out only the characters that are NOT enclosed in the
    ' HTML tags (i.e., it is skipping every character between a pair of angle
    ' brackets "<" and ">") ...
    For lngX = 1 To Len(strBuffer)
        strCurrentChar = Mid$(strBuffer, lngX, 1)
        Select Case strCurrentChar
            Case "<"
                blnTagPending = True
            Case ">"
                blnTagPending = False
            Case Else
                If Not blnTagPending Then
                    ' The current character is outside of the tags, so write it out ...
                    Put #intTextFileNbr, , strCurrentChar
                End If
        End Select
    Next

    Print "Closing files ..."

    ' Close the input and output files ...
    Close #intHTMFileNbr
    Close #intTextFileNbr

    Print "Done."

End Sub

After the cmdTryIt_Click event procedure has run, the form should look like the screenshot below, and the output plain-text file should be present in the project directory.

Binary Files (3)

Download the VB project code for the example above here.


Author: Horacio Brakus JD
