Reading EDI Data in Java

These days, most Java developers expect to exchange data with other systems and businesses as JSON. But what happens when JSON is not an option? It's easy to forget that there are other formats for data exchange, some of which are more difficult to handle. One such format is known as EDI. EDI itself comes in several flavors - for example X12 and EDIFACT - so code that reads one flavor will not necessarily read another.

On the surface, reading EDI data seems to be a simple endeavor. A developer may see a sample file and attempt to read it using the basic string parsing APIs in the programming language's standard library. Unfortunately, this approach breaks down quickly: delimiters can differ from one file to the next, and it becomes difficult to validate the data and to keep track of the document's structure.

What is EDI? An Introduction

EDI (Electronic Data Interchange) is a general term that covers several standard data formats for exchanging data between businesses (or any two parties). Two of the most commonly used standards are X12 and EDIFACT. Both of these standards represent data as a sequence of named segments, which are basically records containing individual fields. For example, a simple segment might look like this:

SEG*SMITH*JOHN*20190101~

In this example, the name of the segment is "SEG", and it contains three fields - two strings and a date. Each field (known as an element) is separated from the others by the delimiter * (asterisk), and the end of the segment is indicated with the delimiter ~ (tilde).
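
To make the anatomy of that segment concrete, here is a minimal sketch of the naive string-splitting approach using only the Java standard library. The class and method names are made up for illustration, and the delimiters * and ~ are hard-coded assumptions taken from the example above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any EDI library.
class SegmentSplitDemo {

    // Splits one raw segment into its name followed by its elements.
    // Assumes '*' separates elements and '~' terminates the segment.
    static List<String> splitSegment(String rawSegment) {
        String body = rawSegment.endsWith("~")
                ? rawSegment.substring(0, rawSegment.length() - 1)
                : rawSegment;
        // A limit of -1 keeps empty elements, including trailing ones.
        return Arrays.asList(body.split("\\*", -1));
    }

    public static void main(String[] args) {
        List<String> parts = splitSegment("SEG*SMITH*JOHN*20190101~");
        String segmentName = parts.get(0);
        // The last element is a date written as CCYYMMDD.
        LocalDate date = LocalDate.parse(parts.get(3), DateTimeFormatter.BASIC_ISO_DATE);
        System.out.println(segmentName + " " + parts.subList(1, parts.size()) + " " + date);
    }
}
```

This works for a single well-formed segment, but it knows nothing about which delimiters a given file actually uses, which elements are required, or how segments relate to each other.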

EDI also has a structure similar to XML or JSON, in which segments are nested within begin/end boundaries known as loops. In the example X12 acknowledgement exchange below, indentation is added to emphasize the structure. In practice, however, the structure is not apparent from an unformatted EDI file.

ISA*00*          *00*          *ZZ*ReceiverID     *ZZ*Sender         *191031*1301*^*00501*000000001*0*P*:~
  GS*FA*ReceiverDept*SenderDept*20191031*130123*000001*X*005010X230~
    ST*997*0001~
      AK1*HC*000001~
      AK2*837*0021~
        AK3*NM1*8**8~
          AK4*8*66*7*MI~
        AK5*R*5~
      AK9*R*1*1*0~
    SE*8*0001~
  GE*1*000001~
IEA*1*000000001~
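
As a rough illustration of how the outer levels of that nesting can be recovered from a flat file, the sketch below indents segments according to the three envelope pairs (ISA/IEA, GS/GE, ST/SE). It uses only the standard library, the class and method names are hypothetical, and the delimiters are assumed to be * and ~. It deliberately does not handle the inner loops (such as the one that starts at AK2), because detecting those requires knowledge of the transaction's definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of any EDI library.
class EnvelopeIndentDemo {

    // Segments that open and close the X12 envelopes.
    static final Set<String> OPENERS = Set.of("ISA", "GS", "ST");
    static final Set<String> CLOSERS = Set.of("IEA", "GE", "SE");

    // Returns one indented line per segment.
    // Assumes '~' terminates segments and '*' separates elements.
    static List<String> indent(String edi) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        for (String segment : edi.split("~")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip whitespace between segments
            }
            String name = trimmed.split("\\*", 2)[0];
            // A closing segment sits at the same level as its opener.
            if (CLOSERS.contains(name) && depth > 0) {
                depth--;
            }
            lines.add("  ".repeat(depth) + trimmed + "~");
            if (OPENERS.contains(name)) {
                depth++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The functional group from the sample exchange; ISA/IEA omitted for brevity.
        String edi = "GS*FA*ReceiverDept*SenderDept*20191031*130123*000001*X*005010X230~"
                + "ST*997*0001~AK1*HC*000001~AK2*837*0021~AK3*NM1*8**8~AK4*8*66*7*MI~"
                + "AK5*R*5~AK9*R*1*1*0~SE*8*0001~GE*1*000001~";
        indent(edi).forEach(System.out::println);
    }
}
```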

Reading EDI as a Stream of Events

One option for reading EDI is to process the data as a stream of events. The StAEDI (pronounced "steady") Java library takes the same approach to EDI that the standard Java StAX API takes to XML. A simple program that only lists the names of the segments would look something like this:

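import java.io.FileInputStream;
import java.io.InputStream;
import io.xlate.edi.stream.EDIInputFactory;
import io.xlate.edi.stream.EDIStreamEvent;
import io.xlate.edi.stream.EDIStreamReader;

EDIInputFactory factory = EDIInputFactory.newFactory();

// try-with-resources closes both the file and the reader when the loop ends
try (InputStream stream = new FileInputStream("my_edi_file.txt");
     EDIStreamReader reader = factory.createEDIStreamReader(stream)) {
    while (reader.hasNext()) {
        EDIStreamEvent event = reader.next();
        // At the start of a segment, getText() returns the segment name
        if (event == EDIStreamEvent.START_SEGMENT) {
            System.out.println("Segment: " + reader.getText());
        }
    }
}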
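
For comparison, here is a sketch of the same kind of loop written against StAX, which ships with the JDK. It collects element names instead of segment names. The class name, method name, and the XML sample are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class showing the StAX counterpart of the EDI loop.
class StaxElementNamesDemo {

    // Collects the name of every element in the XML, in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    // START_ELEMENT plays the role that START_SEGMENT plays for EDI
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalStateException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up XML sample for illustration
        String xml = "<person><last>SMITH</last><first>JOHN</first></person>";
        for (String name : elementNames(xml)) {
            System.out.println("Element: " + name);
        }
    }
}
```

The two loops have the same shape: create a factory, create a reader, then pull events one at a time and react only to the ones of interest.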

If you know the StAX API for XML, the EDI loop above will look familiar. The EDIStreamReader is used in a loop, much like an iterator or a database result set: each call to the next method returns the next data event from the EDI file. In addition to events for the start of a segment, there are events for handling:

  • the beginning/end of an interchange (ISA and IEA in X12)
  • the beginning/end of a message group (GS and GE)
  • the beginning/end of a transaction (ST and SE)
  • the beginning/end of a loop (depending on configuration)
  • the beginning/end of a segment
  • the beginning/end of a composite element
  • an individual data element
  • segment errors
  • data element errors
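
To give a feel for the order in which such events arrive, the following toy tokenizer turns raw EDI text into a flat list of segment and element events. It is only a sketch of the idea and is not how StAEDI is implemented: the class is hypothetical, the event labels are modelled on the START_SEGMENT constant used earlier (the other two labels are assumptions for this example), the delimiters are hard-coded, and envelopes, loops, composites, and errors are ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an event stream; not part of StAEDI.
class ToyEdiEvents {

    // Emits one START_SEGMENT event per segment, one ELEMENT_DATA event per
    // element, and one END_SEGMENT event when the segment terminator is reached.
    // Assumes '~' terminates segments and '*' separates elements.
    static List<String> events(String edi) {
        List<String> events = new ArrayList<>();
        for (String segment : edi.split("~")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip whitespace between segments
            }
            // A limit of -1 keeps empty elements so positions stay correct.
            String[] parts = trimmed.split("\\*", -1);
            events.add("START_SEGMENT " + parts[0]);
            for (int i = 1; i < parts.length; i++) {
                events.add("ELEMENT_DATA " + parts[i]);
            }
            events.add("END_SEGMENT");
        }
        return events;
    }

    public static void main(String[] args) {
        events("SEG*SMITH*JOHN*20190101~").forEach(System.out::println);
    }
}
```

A consumer of a real event stream reacts to each event as it arrives instead of building the whole list first, which is what keeps memory use low for large files.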